and delivery. Fetal monitor tracings update in real time, infusion pumps track medication flow, and maternal vital signs are recorded continuously. These signals must be interpreted simultaneously, often under time pressure to inform clinical decisions.
In practice, this data is fragmented across multiple systems. Nurses and physicians navigate separate interfaces, documentation workflows, and device outputs that are not always aligned. Information must be located, interpreted, and reconciled across screens, creating operational friction at the point of care. Within this environment, one of the most critical and least standardized tasks remains the interpretation of the fetal heart rate tracing.
Recently at the Society for Maternal-Fetal Medicine conference, GE HealthCare presented research on a fetal heart rate AI model[1] designed to analyze cardiotocography (CTG) waveforms and label baseline rate, accelerations, decelerations, and contraction activity using a combination of deep learning and clinical rules.

Objective
We set out to evaluate how the fetal heart rate AI interpretation algorithm performs when directly compared with practicing clinicians interpreting CTGs. Rather than focusing on sensitivity or specificity alone, the study used a clinically meaningful benchmark: whether a panel of expert clinicians would accept each interpretation as reasonable and accurate. This mirrors how CTG interpretation disagreements occur in real practice and provides a pragmatic measure of performance. The hypothesis was that the AI’s performance would be non-inferior to clinical readers, meaning its CTG interpretations would be accepted by the clinician review panel at a rate comparable to, or not meaningfully lower than, the acceptance rate of human readers.
Methods
The study was a retrospective non-inferiority evaluation using CTG data collected at five U.S. hospital facilities between 2013 and 2024. From a dataset of 2,639 de-identified fetal tracings, 800 were randomly selected for analysis. Each tracing was interpreted independently by the AI algorithm and by clinical readers, and both interpretations were reviewed by an expert clinician panel. The panel determined whether the clinician was correct, the AI was correct, both were correct, or neither was correct. Those judgments were used to calculate acceptance rates. The primary endpoints were baseline, accelerations, decelerations, contraction frequency, and contraction duration. The secondary endpoints were variability and deceleration type. A 15 percent non-inferiority margin was predefined.
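To make the acceptance-rate calculation concrete, here is a minimal sketch of how per-endpoint acceptance rates could be derived from panel judgments of the four kinds described above (clinician correct, AI correct, both, or neither). The data shapes and function names are illustrative assumptions, not the study's actual analysis code.

```python
# Illustrative sketch (assumed data shapes, not the study's code) of
# deriving acceptance rates from expert-panel judgments. Each judgment
# records whether the panel accepted the clinician's and/or the AI's
# interpretation of one tracing for one endpoint.
from collections import defaultdict

# Hypothetical panel judgments: (endpoint, clinician_accepted, ai_accepted)
judgments = [
    ("baseline", True, True),    # both correct
    ("baseline", True, False),   # clinician only
    ("baseline", False, True),   # AI only
    ("baseline", False, False),  # neither correct
]

def acceptance_rates(judgments):
    """Return, per endpoint, the fraction of tracings for which the panel
    accepted the clinician's and the AI's interpretation, respectively."""
    counts = defaultdict(lambda: {"n": 0, "clinician": 0, "ai": 0})
    for endpoint, clin_ok, ai_ok in judgments:
        c = counts[endpoint]
        c["n"] += 1
        c["clinician"] += clin_ok
        c["ai"] += ai_ok
    return {
        ep: {"clinician": c["clinician"] / c["n"], "ai": c["ai"] / c["n"]}
        for ep, c in counts.items()
    }

rates = acceptance_rates(judgments)
```

With the toy judgments above, both the clinician and the AI would have a 50 percent acceptance rate for the baseline endpoint; the real study applied this kind of tally across 800 tracings and seven endpoints.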
Results
In a retrospective analysis, outputs for four of five evaluated primary endpoints fell within the study’s pre-specified non-inferiority margin based on expert-panel acceptance rates. The difference in acceptance rates between clinical readers and the algorithm was 3.3 percentage points for baseline, 3.0 for accelerations, 7.1 for decelerations, and 8.2 for contraction frequency. All of those remained within the predefined margin.
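As a toy illustration of the margin check, the reported point differences can be compared against the 15-percentage-point margin. This is only a sketch of the decision rule; the study's actual statistical analysis presumably relies on confidence bounds rather than point estimates alone.

```python
# Hypothetical sketch of the non-inferiority check described above.
# The decision rule shown here (point difference vs. margin) is an
# illustrative simplification of the study's statistical analysis.

NON_INFERIORITY_MARGIN = 15.0  # pre-specified margin, percentage points

# Reported acceptance-rate differences (clinical readers minus AI),
# in percentage points, for the four primary endpoints that met the margin.
differences = {
    "baseline": 3.3,
    "accelerations": 3.0,
    "decelerations": 7.1,
    "contraction_frequency": 8.2,
}

def within_margin(diff, margin=NON_INFERIORITY_MARGIN):
    """Non-inferior if the acceptance-rate gap stays below the margin."""
    return diff < margin

for endpoint, diff in differences.items():
    status = "within margin" if within_margin(diff) else "outside margin"
    print(f"{endpoint}: {diff:+.1f} pp -> {status}")
```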
An important caveat: contraction duration did not meet the pre-specified margin in the base analysis. An exploratory analysis using a 20-second variability window produced different results, suggesting an area for further investigation.
The results also show that the task itself is variable, even among clinicians. Baseline interpretation was accepted at 69.2 percent for clinical readers, while contraction frequency was accepted at 43.0 percent. This is not a comparison against a perfect gold standard. It is a comparison within a domain where inter-observer variability is already well established.[2]
GE HealthCare is exploring whether AI could help standardize review of selected CTG parameters; potential workflow implications have not yet been established. The study did not evaluate clinical decision-making or maternal and neonatal outcomes.
These explorations follow the introduction of CareIntellect for Perinatal[3], which is designed to aggregate selected maternal and fetal data streams into a single view for clinician review. Supported by cloud-based infrastructure, the offering is designed to enable continuous data integration, real-time waveform processing, and a context-aware view. CareIntellect for Perinatal is a current data-integration application; separately, GE HealthCare is conducting research on fetal heart rate AI.
The study suggests that fetal heart rate AI may be able to support clinicians in one of the most variable aspects of intrapartum monitoring. At the same time, CareIntellect for Perinatal is designed to provide the clinical environment in which that intelligence can be surfaced and used. As labor and delivery units manage rising data complexity, staffing pressure, and the need for more consistent interpretation, this research has potential real-world significance for perinatal care.
[1] Technology in development that represents ongoing research and development efforts. These technologies are not products and may never become products.
[2] The study explored alignment between algorithm outputs and expert-accepted interpretations on selected parameters. The study was not intended to show direct improvement in maternal or neonatal outcomes.
[3] CareIntellect Perinatal (formerly Mural Perinatal Surveillance), a 510(k)-cleared device in the U.S., is the product name. "CareIntellect for Perinatal" may appear as descriptive content.
