Focus on what matters: Enhancing medical vision-language models with automatic attention alignment tuning

Figure: Motivation for using A3MOE. The second column shows prompt-aware weak labels, with red bounding boxes and green inner segments. The third column shows the attention maps generated using shared parameters for the Query and Key matrices.

Medical Large Vision-Language Models (Med-LVLMs) often exhibit suboptimal attention distribution over visual inputs, leading to hallucinated or inaccurate outputs. Existing mitigation methods rely primarily on inference-time interventions, which offer limited attention adaptation or require additional supervision. To address this, we propose A3TUNE, a novel fine-tuning framework for Automatic Attention Alignment Tuning. A3TUNE leverages zero-shot weak labels from SAM, refines them into prompt-aware labels using BioMedCLIP, and then selectively modifies visually-critical attention heads to improve alignment while minimizing interference. Additionally, we introduce an A3MOE module, enabling adaptive parameter selection for attention tuning across diverse prompts and images. Extensive experiments on medical VQA and report generation benchmarks show that A3TUNE outperforms state-of-the-art baselines, achieving improved attention distributions and stronger performance in Med-LVLMs.
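To make the attention-alignment idea concrete, below is a minimal sketch (not the authors' implementation) of how prompt-aware weak labels could supervise selected attention heads during fine-tuning. It assumes the weak labels have already been converted into a binary mask over image patches (e.g., from SAM regions filtered with BioMedCLIP), and that a set of visually-critical head indices has been identified; the function and variable names are illustrative only.

```python
# Hedged sketch: align selected attention heads with prompt-aware weak labels.
# Assumes `attn` is the model's attention over image patches and `patch_mask`
# is a binary weak label per patch; both are stand-ins, not the paper's API.
import torch
import torch.nn.functional as F


def attention_alignment_loss(attn, patch_mask, head_idx, eps=1e-8):
    """attn:       (batch, heads, num_patches) attention over image patches.
       patch_mask: (batch, num_patches) binary weak label, 1 = relevant patch.
       head_idx:   indices of the visually-critical heads to tune."""
    sel = attn[:, head_idx, :]                            # (B, H_sel, P)
    # Target: uniform distribution over the weakly-labeled patches.
    target = patch_mask / (patch_mask.sum(-1, keepdim=True) + eps)
    target = target.unsqueeze(1).expand_as(sel)           # broadcast over heads
    # KL divergence pulls the selected heads' attention toward the target.
    return F.kl_div((sel + eps).log(), target, reduction="batchmean")


# Toy usage with random tensors standing in for model outputs.
B, H, P = 2, 12, 196
attn = torch.softmax(torch.randn(B, H, P), dim=-1)
mask = (torch.rand(B, P) > 0.8).float()
loss = attention_alignment_loss(attn, mask, head_idx=[3, 7, 11])
print(loss.item())
```

In such a setup the alignment term would be added to the standard language-modeling loss only for the chosen heads, which reflects the stated goal of improving visual grounding while minimizing interference with the rest of the model.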

Download the paper.

Concept only. This work is in the concept phase and may never become a product. Not for sale. Any reported results are preliminary and subject to change. Not cleared or approved by the U.S. FDA or any other global regulator for commercial availability.
