Deloitte surveyed 180 health system executives last year and found that 51% had not measured return on investment from AI. Only 3% reported significant returns.[1] Thirty percent said they were running generative AI at scale in selected areas. Just 2% had deployed it across the enterprise.
That disconnect between procurement and measurement sits at the center of the healthcare AI conversation. The main challenge is no longer the sophistication of the algorithms or the promise of the models. It is that many organizations are purchasing AI tools without a dependable way to determine whether those tools are producing demonstrable clinical, operational, or financial value.
Over the past six months, I have reviewed more than 160 healthcare AI papers to understand where the evidence is well supported, where it remains incomplete, and where the field may overstate its progress. The broad conclusion is not that AI fails in healthcare. In some areas, it is delivering measurable value, and the supporting evidence is becoming more robust. The more important distinction is between treating AI as a product to buy and treating it as an organizational capability to build. That distinction often influences whether an investment generates sustained returns.
Where the evidence is strongest
Breast cancer screening currently offers one of the clearest examples of clinical AI associated with measurable benefit. Two large prospective studies published in 2025 evaluated AI-supported mammography in real screening populations rather than retrospective datasets or internal validation cohorts.
The PRAIM study, conducted across 12 German screening sites, enrolled 463,094 women and found that AI-supported reading detected 17.6% more cancers than standard double reading, with no increase in recall rates.[2] The ASSURE study, spanning 109 imaging centers in the United States and including 579,583 women, reported a 21.6% increase in cancer detection using an AI-driven workflow for digital breast tomosynthesis.[3]
These are not small pilot programs or isolated institutional reports. They represent prospective, population-scale evidence from two countries and more than one million women. In practical terms, this illustrates how clinical AI may contribute to measurable benefit when deployed in real-world settings, with results observed at scale.
The broader radiology market is also beginning to show where economic value can emerge. A 2026 systematic review in Radiology: Artificial Intelligence screened 1,879 studies on the economics of AI in radiology and found only 21 that reported quantifiable economic outcomes.[4] That amounts to roughly 1% of the literature. The limited number is instructive in itself. Even so, those 21 studies suggested a consistent pattern. Economic value was most apparent where diagnostic complexity was high, scan volumes were substantial, and pricing was based on fixed-cost licensing models.
Reimbursement is also beginning to shift. CPT Category I codes for AI-assisted imaging took effect in 2026, with CMS finalizing payment for them, creating a more established reimbursement pathway. This development may support a clearer route to economic sustainability than in prior years.
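The licensing economics described above lend themselves to a simple break-even sketch. All figures below are hypothetical assumptions for illustration, not data from the cited review:

```python
def per_scan_cost(annual_license_fee: float, annual_scan_volume: int) -> float:
    """Amortized AI cost per scan under a fixed-cost license (hypothetical)."""
    return annual_license_fee / annual_scan_volume

def break_even_volume(annual_license_fee: float, value_per_scan: float) -> float:
    """Scan volume at which the amortized license cost equals per-scan value."""
    return annual_license_fee / value_per_scan

# Hypothetical inputs: a $100,000/year license and $4 of value per AI-read scan
fee, value = 100_000.0, 4.0
print(break_even_volume(fee, value))   # 25000.0 scans/year to break even
print(per_scan_cost(fee, 10_000))      # 10.0 dollars per scan at low volume
print(per_scan_cost(fee, 100_000))     # 1.0 dollar per scan at high volume
```

Because the fee is fixed, the amortized cost per scan falls with volume, consistent with the review's finding that economic value concentrated at high-volume sites under fixed-cost licensing.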
Global context adds further perspective. An estimated 4.2 billion imaging exams are performed worldwide each year. More than 81% of healthcare systems report radiology technologist shortages. Nearly two-thirds of the global population still lacks access to X-ray and ultrasound.[5] In settings where the alternative is delayed interpretation or no interpretation at all, the threshold for what constitutes value may change materially.
Where systems let capable tools underperform
Yet a capable model does not guarantee a better outcome.
A research team at the University of Virginia conducted a randomized controlled trial of an AI-based clinical deterioration system. The model predicted deterioration accurately. Its predictions were displayed at the bedside. Outcomes did not improve.[6]
The reason was straightforward and revealing. There was no required response protocol attached to the alert. The system identified the appropriate patients, but the care pathway had not been redesigned to translate that signal into action.
This pattern appears repeatedly across healthcare AI. The technical tool performs as intended, but the surrounding system does not change sufficiently to capture the potential benefit.
Ambient AI scribes are one current example. They have been adopted by 62.6% of Epic EHR hospitals, making them one of the fastest-moving clinical technologies in recent memory.[7] Adoption at the institutional level, however, does not always translate into sustained value at the clinician level. In one large study, fewer than 30% of physicians who enabled a scribe used it for more than 100 encounters over a ten-week period.[8]
A 2025 analysis described a “triple tax” associated with ambient documentation tools. First, supervising AI-generated output may impose cognitive load. Second, editing may shift to later in the day, which can increase after-hours EHR burden. Third, reviewing AI notes introduces a new category of work that did not previously exist. The tool may save time in one moment while creating different forms of overhead elsewhere.
The challenge becomes even more pronounced after deployment. The Royal College of Radiologists issued guidance on post-deployment monitoring following documented cases in which a routine mammography AI software update increased recall rates, while a lung nodule detection tool showed substantial disagreement between software versions.[9] These are not marginal implementation details. They indicate that performance can change materially after go-live.
Too many health systems still treat deployment as the final milestone. In reality, it is the beginning of a new operational phase that requires active monitoring, version control, governance, and clinical oversight.
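One concrete piece of that post-deployment phase is statistical monitoring of key metrics across software versions. A minimal sketch, using a standard two-proportion z-test to flag a recall-rate shift after an update; the counts and escalation threshold are hypothetical:

```python
import math

def recall_rate_drift(recalls_before: int, screens_before: int,
                      recalls_after: int, screens_after: int) -> float:
    """Two-proportion z statistic comparing recall rates across two
    software versions. A large |z| flags a shift worth clinical review."""
    p_before = recalls_before / screens_before
    p_after = recalls_after / screens_after
    pooled = (recalls_before + recalls_after) / (screens_before + screens_after)
    se = math.sqrt(pooled * (1 - pooled)
                   * (1 / screens_before + 1 / screens_after))
    return (p_after - p_before) / se

# Hypothetical monitoring window: recall rate rises from 10% to 12% post-update
z = recall_rate_drift(1_000, 10_000, 1_200, 10_000)
if abs(z) > 2.58:  # roughly a 99% two-sided significance threshold
    print(f"recall rate shifted (z = {z:.2f}); escalate to governance review")
```

A check like this catches exactly the failure mode the RCR guidance describes: performance that was acceptable at go-live drifting after a routine version change.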
Most AI demonstrations focus on what the tool can do in isolation. Far less attention is given to what the hospital must build around the tool for it to function reliably in practice.
The costs that rarely make the budget
Economic modeling in healthcare AI also remains incomplete. A systematic review of cost-effectiveness studies found a recurring pattern: economic models often undercount indirect costs such as training, workflow redesign, change management, and infrastructure, while relying on static assumptions that may not capture learning curves over time.[10]
The implication is important: the benefits may be real, but they are frequently presented more favorably than the evidence supports, while the costs are understated in a systematic way.
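The gap between a static model and a fuller one can be made concrete with a toy first-year calculation. All figures and cost categories below are hypothetical illustrations, not estimates from the cited review:

```python
def net_value_static(gross_benefit: float, license_cost: float) -> float:
    """Naive model: steady-state benefit minus the license fee only."""
    return gross_benefit - license_cost

def net_value_full(gross_benefit: float, license_cost: float,
                   training: float, workflow_redesign: float,
                   infrastructure: float, ramp_up_fraction: float) -> float:
    """Fuller model: indirect costs plus a first-year learning curve.
    ramp_up_fraction is the share of steady-state benefit actually realized."""
    realized_benefit = gross_benefit * ramp_up_fraction
    total_cost = license_cost + training + workflow_redesign + infrastructure
    return realized_benefit - total_cost

# Hypothetical first-year figures for a single deployment
print(net_value_static(500_000, 200_000))   # 300000.0 -- looks attractive
print(net_value_full(500_000, 200_000,
                     80_000, 120_000, 60_000, 0.6))  # -160000.0 -- first year is negative
```

The same deployment flips from an apparent surplus to a first-year loss once indirect costs and the learning curve enter the model, which is precisely the undercounting pattern the review describes.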
Among those costs, deskilling may be underrecognized.
In a multicenter Polish study involving more than 23,000 colonoscopy procedures, adenoma detection rates fell from 28.4% to 22.4% when endoscopists returned to non-AI procedures after routine exposure to AI assistance.[11] In computational pathology, 28 pathologists abandoned 7% of initially correct diagnoses when they were exposed to incorrect AI suggestions under time pressure. In breast imaging, erroneous AI prompts increased false-positive recall rates by 12% to 15%, including among experienced readers.
This is not a new phenomenon in high-stakes environments. Aviation encountered a similar issue decades ago when reliance on autopilot contributed to erosion in manual flying skills. The industry responded with proficiency requirements, automation interruptions, and simulation-based training to preserve baseline competence.[12]
Healthcare has not yet developed an equivalent discipline at scale. The next phase may be even more complex. It is one thing for experienced clinicians to lose some edge through overreliance on automation. It is another for trainees entering AI-supported environments not to fully develop that foundational expertise. The issue extends beyond deskilling to the risk of incomplete skill acquisition.
That concern lands in a workforce already under significant strain. A 2023 GE HealthCare global study of 2,000 hospital-based clinicians found that 42% were actively considering leaving the profession.[13] In that context, any technology strategy that reduces professional confidence or increases dependence without appropriate safeguards may introduce longer-term organizational risk.
The question receiving too little attention
Most discussions about healthcare AI still center on clinical applications such as diagnostic support, documentation, or triage. Those categories matter, but they are not necessarily where the full economic opportunity sits.
A 2026 analysis estimated that operational AI could generate $200 billion to $360 billion in annual savings across the US healthcare system through areas such as scheduling, supply chain management, patient flow, and follow-up optimization.[14] This potential may exceed the size of the diagnostic AI market currently receiving the greatest attention.
At the same time, governance remains uneven. Only 61% of US hospitals evaluate AI models before deployment. Forty-seven percent perform no model accuracy checks. More than half have no bias evaluation in place.[15] These findings suggest gaps in institutional readiness.
Adoption also appears to follow financial capacity more closely than clinical need. Ambient AI adoption correlates with hospital operating margins. It stands at 70.2% in nonprofit hospitals and 28.8% in for-profit institutions. Metropolitan hospitals are ahead of non-metropolitan facilities.[7] The organizations operating under the greatest pressure are often the least able to invest in tools that might alleviate it.
This is likely to become more consequential as regulation matures. Under the EU AI Act, many healthcare AI applications fall into the high-risk category, with non-compliance penalties reaching up to €15 million or 3% of global annual turnover. Prohibited practices face higher tiers. Meanwhile, the FDA had authorized 1,430 AI-enabled medical devices as of March 2026, with 76% focused on radiology.[16] Regulatory direction is moving toward lifecycle management, pre-determined change control plans, and stricter transparency expectations.
The legal and regulatory framework is evolving quickly. Governance within many health systems may not be keeping pace.
The real question
NVIDIA’s 2025 survey found that 83% of healthcare leaders believe AI will reshape the industry within three to five years.[17] In the same survey, 68% said their organizations are not investing enough. These two views can coexist.
AI is already showing that it can deliver clinical value in specific settings. Breast cancer screening is one example, supported by prospective evidence at population scale. Radiology economics is another, particularly where the deployment conditions are well understood. Clinical documentation tools offer utility, though they also introduce risks and workload trade-offs that many organizations are still not measuring comprehensively. Operational AI may represent a significant opportunity, yet it remains relatively underexplored compared with clinical use cases.
So the essential question is not whether AI works.
The more consequential question is whether an organization has built the measurement discipline, governance structure, workflow integration, and workforce strategy required to distinguish between AI that is creating value and AI that is adding cost without measurable return.
Many organizations are still developing these capabilities.
That is where the work continues.
References
[1] Deloitte. 2026 Global Health Care Outlook. Deloitte Insights, 2025.
[2] Eisemann, N., Bunk, S., Mukama, T. et al. Nationwide real-world implementation of AI for cancer detection in population-based mammography screening. Nature Medicine 31, 917-924 (2025).
[3] Louis, L.D., Wakelin, E.A., McCabe, M.P. et al. Equitable impact of an AI-driven breast cancer screening workflow in real-world US-wide deployment. Nature Health 1, 58-66 (2026).
[4] Molwitz, I., Ristow, I., Erley, J. et al. Economic Value of AI in Radiology: A Systematic Review. Radiology: Artificial Intelligence (2026).
[5] GE HealthCare / NVIDIA. Autonomous Imaging Collaboration. GE HealthCare, March 2025.
[6] Keim-Malpass, J., Ratcliffe, S.J., Clark, M.T. et al. A randomized controlled trial of artificial intelligence-based analytics for clinical deterioration. Scientific Reports 16, 7345 (2026).
[7] Yang, F., Graetz, I. Ambient AI tool adoption in US hospitals and associated factors. American Journal of Managed Care 32(1): e25-e30 (2026).
[8] Goodson, D.A., Garcia, B., Hogarth, M., Tu, S.-P. Artificial intelligence and physician burnout: A productivity paradox. Learning Health Systems (2025).
[9] Royal College of Radiologists. Post-deployment monitoring and safety reporting of AI medical imaging devices in clinical practice. RCR, 2026.
[10] El Arab, R.A., Al Moosa, O.A. Systematic review of cost effectiveness and budget impact of artificial intelligence in healthcare. npj Digital Medicine 8, 548 (2025).
[11] Heudel, P.E., Crochet, H., Filori, Q., Bachelot, T., Blay, J.Y. Artificial intelligence in medicine: a scoping review of the risk of deskilling and loss of expertise among physicians. ESMO Real World Data and Digital Oncology (2026).
[12] Ong, A.Y., Merle, D.A., Pollreisz, A. et al. Flight rules for clinical AI: lessons from aviation for human-AI collaboration in medicine. npj Digital Medicine 9, 201 (2026).
[13] GE HealthCare. Reimagining Better Health: A Global Study of Clinicians and Patients. GE HealthCare, May 2023.
[14] Wong, A., Nallamothu, B.K., Longhurst, C.A. et al. Leveraging AI to reduce operational healthcare costs: lessons from other industries. npj Health Systems 3, 9 (2026).
[15] Hwang, Y.M., Ng, M.Y., Pillai, M. et al. The landscape of AI implementation in US hospitals. Nature Health 1, 99-112 (2026).
[16] U.S. Food and Drug Administration. Artificial Intelligence-Enabled Medical Devices.
[17] NVIDIA Corporation. State of AI in Healthcare and Life Sciences: 2025 Trends.