GE HealthCare recently announced collaborations with Mass General Brigham and the University of Wisconsin–Madison to explore and evaluate its 3D MR research foundation model¹. GE HealthCare’s model is one of the early large-scale, full-body and multi-sequence 3D MR foundation models (MR FM) designed to support research that may inform future studies on clinical and operational applications. Trained on more than 200,000 MR images from multi-site studies², this 3D FM is built to learn the physics, patterns, and anatomy unique to MR, demonstrating adaptability to new imaging tasks with minimal additional data.
The GE HealthCare MR foundation model was built on AWS cloud infrastructure, which provides secure, scalable, high-performance computing for medical imaging AI. It uses flexible storage for large MRI datasets, fast data access, and multiple GPUs working together through Fully Sharded Data Parallel (FSDP), a method that splits the model and data across GPUs to save memory and improve training efficiency. The model learns from hundreds of thousands of MRI scans using advanced methods like 3D vision transformers (for understanding 3D images), multimodal contrastive pre-training (for linking different MRI types), and self-supervised student-teacher learning (where the model improves by teaching itself).
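To illustrate the FSDP pattern, the sketch below shows how a generic 3D model can be wrapped for sharded multi-GPU training in PyTorch. The model, data loader, and hyperparameters are placeholders for illustration, not GE HealthCare’s actual training code.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def train(rank, world_size, model, loader, epochs=1):
    # One process per GPU; NCCL handles the inter-GPU communication.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # so no single GPU has to hold the full model in memory.
    sharded = FSDP(model.cuda(rank))
    optim = torch.optim.AdamW(sharded.parameters(), lr=1e-4)

    for _ in range(epochs):
        for volumes, labels in loader:   # volumes: [B, 1, D, H, W] MR tensors
            logits = sharded(volumes.cuda(rank))
            loss = F.cross_entropy(logits, labels.cuda(rank))
            optim.zero_grad()
            loss.backward()              # gradients are reduce-scattered across ranks
            optim.step()

    dist.destroy_process_group()
```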
For clinicians and researchers, the adaptability of the MR FM shows promise for supporting research efficiency and studies in:
- Disease classification, which categorizes abnormalities
- Segmentation, which outlines anatomy or lesions for precise localization
- Grounding, which links text descriptions to image regions
- Report generation, which structures findings consistently

These capabilities are being explored for potential impact on diagnostic accuracy and workflow efficiency. The model’s multimodal foundation supports both clinical and operational research use cases: in clinical settings such as Mass General Brigham (MGB), researchers are analyzing the model’s ability to verify the right organ, measurement, and diagnostic context, while in operational settings such as the University of Wisconsin–Madison (UW), the focus is on sequence verification and scan quality checks to assess whether a scan was completed correctly. Through fine-tuning and evaluation, the model may support improved consistency, more reproducible assessments, and reduced reporting variability, helping strengthen confidence in both diagnostic and operational research.
Mass General Brigham collaboration explores potential for improved prostate MR precision
Mass General Brigham is working with GE HealthCare to evaluate and refine the MR FM for prostate imaging, one of the most data-intensive and technically demanding areas in radiology. The collaboration centers on three core research themes: disease classification, lesion segmentation, and quantitative measurement. These studies are designed to assess performance across medical device vendors and imaging sequences, capturing the complexity and variability of real-world clinical workflows.
Disease classification
Disease classification involves teaching AI systems to differentiate between healthy and abnormal tissue patterns on imaging. In our internal research study, the model was fine-tuned in a low-data regime using only 500 prostate MR studies that included T2-weighted, diffusion-weighted (DWI), and apparent diffusion coefficient (ADC) sequences.
The dataset was predominantly acquired on 3.0T systems (513/549, 93.4%) from multiple vendors, with 50.8% of studies from GE HealthCare scanners and 49.2% from other manufacturers. A binary PI-RADS classification task was defined: Class 0 (very low to low risk of clinically significant prostate cancer: PI-RADS 1/2, n=168) vs. Class 1 (intermediate to very high risk: PI-RADS 3/4/5, n=381). Classification was performed in a fully automated manner, without user input or ROI placement, unlike semi-automated approaches that require manual intervention.
The model was evaluated using 5-fold cross-validation. Training used the AdamW optimizer for up to 50 epochs with early stopping to prevent overfitting. Across folds, the model reached an average area under the ROC curve (AUC) of about 0.75 ± 0.03, reflecting good class discrimination. Its mean accuracy was approximately 75%.
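A minimal sketch of that evaluation protocol (5-fold cross-validation, AdamW, early stopping on validation AUC) follows. `build_classifier` and the tensors `X` and `y` are hypothetical placeholders, and training is full-batch here only for brevity; this is not the study’s actual pipeline.

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

def cross_validate(build_classifier, X, y, folds=5, max_epochs=50, patience=5):
    """X: [N, ...] float tensor of inputs; y: [N] long tensor of binary labels."""
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
    fold_aucs = []
    for train_idx, val_idx in skf.split(np.zeros(len(y)), y.numpy()):
        model = build_classifier()       # fresh model per fold
        optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
        best_auc, stale = 0.0, 0
        for _ in range(max_epochs):
            model.train()
            loss = F.cross_entropy(model(X[train_idx]), y[train_idx])
            optim.zero_grad()
            loss.backward()
            optim.step()

            model.eval()
            with torch.no_grad():
                probs = model(X[val_idx]).softmax(dim=1)[:, 1]
            auc = roc_auc_score(y[val_idx].numpy(), probs.numpy())
            if auc > best_auc:
                best_auc, stale = auc, 0
            else:
                stale += 1
                if stale >= patience:    # early stopping to prevent overfitting
                    break
        fold_aucs.append(best_auc)
    return np.mean(fold_aucs), np.std(fold_aucs)
```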
“Achieving around 75% accuracy in differentiating PI-RADS 1–2 from PI-RADS 3–5 based solely on raw MRI inputs demonstrates the research potential of foundation models,” said Dr. Mukesh Harisinghani, Director of Abdominal and Body Translational MRI, Massachusetts General Hospital. “This represents an early research step toward evaluating AI support for prostate MRI precision.”
When trained using center cropping—a technique that directs the model’s attention to the prostate gland—AUC improved modestly (+0.04 absolute), suggesting that a targeted field of view can enhance model focus and classification performance in research settings.
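As a concrete illustration, center cropping a 3D volume can be as simple as the helper below; the crop size is an assumption for illustration, not the study’s actual field of view.

```python
import torch

def center_crop_3d(volume: torch.Tensor, size=(64, 128, 128)) -> torch.Tensor:
    """volume: [C, D, H, W]; returns the central (d, h, w) block.
    Assumes the volume is at least as large as the crop in every dimension."""
    _, D, H, W = volume.shape
    d, h, w = size
    z0, y0, x0 = (D - d) // 2, (H - h) // 2, (W - w) // 2
    return volume[:, z0:z0 + d, y0:y0 + h, x0:x0 + w]
```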
Research at the University of Wisconsin–Madison explores operational efficiency potential
At the University of Wisconsin–Madison, researchers are evaluating the same MR foundation model for operational research use cases, such as brain tumor classification, sequence verification, image quality control, anatomy detection, and contrast agent recognition across multi-anatomy/sequence MR images.
By leveraging task-specific adapter networks such as multilayer perceptrons (MLPs), features extracted from MR images can be repurposed across different tasks, enabling research efficiency without compromising accuracy. These adapters make it feasible to personalize foundation models for diverse healthcare tasks quickly and cost-effectively, helping bridge the gap between general AI capability and clinical research precision.
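A minimal sketch of such an adapter is shown below, assuming features of a typical ViT-base width (768) that have already been extracted by the frozen foundation model; the dimensions and class count are illustrative.

```python
import torch.nn as nn

class MLPAdapter(nn.Module):
    """Two-layer MLP head trained on top of frozen foundation-model features."""
    def __init__(self, feat_dim=768, hidden=256, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, features):   # features: [B, feat_dim], precomputed once
        return self.net(features)
```

Because the backbone stays frozen, features can be computed once and cached, and only this small head is trained per task, which is what makes fast, per-task adaptation practical.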
Model evaluation performance
The Wisconsin team demonstrated that a two-layer MLP classifier built on features extracted with the MR FM achieved approximately 88% accuracy and 0.95 ROC-AUC for brain tumor classification on the public BraTS 2020 benchmark, closely matching an end-to-end trained 3D DenseNet-121 model (90% accuracy; 0.94 ROC-AUC).³
When using a proprietary institutional cohort, both models exhibited comparable performance across demographic classification tasks, with mean accuracies of 89.93% (3D DenseNet-121) and 89.90% (MR FM) and mean ROC-AUC values of 79.90% and 79.79%, respectively.
Model training efficiency
The team also demonstrated that adapter-based classifier training could be completed in approximately 0.5 hours per task on a CPU—up to 35x faster than conventional end-to-end deep learning models. This approach allows rapid research adaptation of large models for new scanners or updated protocols with minimal downtime.
Contrast agent recognition
Contrast agent recognition is important in MR because it enhances visibility of certain tissues, making it possible to detect and analyze abnormalities with greater detail. Researchers at the University of Wisconsin–Madison are conducting ongoing studies using GE HealthCare’s 3D MR FM for contrast agent recognition.
Together, these studies illustrate how foundation model research could reduce barriers to AI use across disease care pathways—supporting radiologists and technologists in maintaining research quality and consistency while exploring new applications.
Building a foundation for future MR AI research
GE HealthCare’s 3D MR foundation model was developed as part of ongoing research into volumetric imaging. During pretraining, the model learned image representations using vision transformer architecture (ViT-base), which analyzes images by dividing them into smaller patches to learn both local and global spatial relationships.
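The patch-tokenization step works roughly as in the sketch below, where a strided 3D convolution turns each non-overlapping volume patch into one token; the patch size and embedding width are illustrative (768 is the standard ViT-base width).

```python
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """Tokenize a 3D volume into a sequence of patch embeddings."""
    def __init__(self, in_ch=1, patch=16, dim=768):
        super().__init__()
        # A convolution with kernel == stride is equivalent to flattening
        # each patch and applying a shared linear projection.
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):          # x: [B, 1, D, H, W]
        tokens = self.proj(x)      # [B, dim, D/p, H/p, W/p]
        return tokens.flatten(2).transpose(1, 2)   # [B, n_patches, dim]
```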
A second-stage pretraining combined self-supervised vision learning with report-guided text supervision, helping the model align visual content with semantic meaning for more generalizable MR image representations.⁴
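Report-guided supervision of this kind is typically implemented as a symmetric contrastive objective over paired image and text embeddings (the CLIP-style pattern); the sketch below shows that general recipe, not GE HealthCare’s exact loss.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over paired image/report embeddings of shape [B, dim]."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature            # [B, B] similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # Matched image-report pairs sit on the diagonal; both directions count.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```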
This two-stage approach led to measurable gains on several research benchmarks, indicating the benefit of weak supervision from radiology reports. Improvements were observed in cardiac disease classification (+0.05), prostate lesion classification (+0.02), brain age regression (+0.03), body region detection (+0.04), and contrast detection (+0.02 AUC).
The model achieved higher AUC values than baseline DINOv2⁵ models (0.82 vs 0.70) in Alzheimer’s classification on the ADNI dataset⁶, indicating the contribution of domain-specific pretraining. The model also showed small AUC gains (+0.03) compared with other healthcare-related foundation models such as MedImageInsight⁷, suggesting complementary strengths.
For anomaly localization tasks, the model achieved +0.05 mIoU over DETR3D⁸ with a similar ViT decoder structure, indicating benefits from self-supervised initialization.
In segmentation tasks, when paired with a ResNet-based decoder, performance was comparable to leading end-to-end trained models such as nnU-Net (0.86 on AMOS⁹; 0.85 on internal pelvic organ segmentation; 0.92 vs. 0.91 on ACDC heart chamber segmentation). The model reached a Dice score of 0.82 on ACDC and 0.77 on AMOS within one epoch, demonstrating efficient pretrained initialization.
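For reference, the Dice score reported above measures volumetric overlap between a predicted mask and the reference mask, with 1.0 being a perfect match; a minimal implementation:

```python
import torch

def dice_score(pred: torch.Tensor, target: torch.Tensor, eps=1e-6) -> float:
    """pred, target: binary masks of the same shape."""
    pred, target = pred.bool(), target.bool()
    intersection = (pred & target).sum().item()
    return (2 * intersection + eps) / (pred.sum().item() + target.sum().item() + eps)
```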
These research results reflect progress toward faster AI development, improved interpretability, and consistent support in image review studies. For more details, please see our research paper.
1. Concept only. This work is in an early research phase and may never become a product. Not for sale. Any reported results are preliminary and subject to change. Not cleared or approved by the U.S. FDA or any other global regulator for commercial availability. All data points are internal to GE HealthCare and its academic collaborators. Results are from research use only and are not intended to represent clinical performance.
2. Details on multi-organ coverage and data distribution: https://arxiv.org/pdf/2509.21249
3. 3D DenseNet: Huang, Gao, et al. “Densely connected convolutional networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
4. For more details on pretraining, refer to Section 4.4 in https://arxiv.org/pdf/2509.21249
5. DINOv2: Oquab, Maxime, et al. “DINOv2: Learning robust visual features without supervision.” arXiv preprint arXiv:2304.07193 (2023).
6. ADNI: Petersen, Ronald Carl, et al. “Alzheimer’s Disease Neuroimaging Initiative (ADNI): clinical characterization.” Neurology 74.3 (2010): 201-209.
7. MedImageInsight: Codella, Noel CF, et al. “MedImageInsight: An open-source embedding model for general domain medical imaging.” arXiv preprint arXiv:2410.06542 (2024).
8. DETR3D: Wang, Yue, et al. “DETR3D: 3D object detection from multi-view images via 3D-to-2D queries.” Conference on Robot Learning. PMLR, 2022.
9. AMOS: Ji, Yuanfeng, et al. “AMOS: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation.” Advances in Neural Information Processing Systems 35 (2022): 36722-36732.



