Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction
2026-04-09 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary
The authors study why large vision-language models sometimes describe images inaccurately, errors known as hallucinations. They find that earlier fixes often change how the models generate text, for example by making outputs shorter or shifting which words the model tends to produce. To address this, they propose MESA, a method that selectively adjusts only the parts of the model responsible for hallucinations without disturbing how it normally generates text. Their experiments show that MESA reduces hallucinations more effectively than earlier methods while preserving the model's natural language behavior.
Large Vision-Language Models, Hallucinations, Latent Space Steering, Cross-modal Tasks, Token Distribution, Controlled Latent Intervention, Generation Behavior, Plug-and-play Framework, Discriminative Benchmarks, Generative Benchmarks
Authors
Yuanhong Zhang, Zhaoyang Wang, Xin Zhang, Weizhan Zhang, Joey Tianyi Zhou
Abstract
Large Vision-Language Models (LVLMs) have achieved remarkable success across cross-modal tasks but remain hindered by hallucinations, producing textual outputs inconsistent with the visual content. Existing methods reduce hallucinations but often alter generation behavior, yielding shorter outputs and shifted token distributions, an effect especially evident in latent space steering approaches. We trace this issue to entangled steering signals: suppressing hallucinations inadvertently disrupts the model's intrinsic generation behavior. To address this, we propose MESA, an effective plug-and-play framework that performs controlled and selective latent intervention for hallucination mitigation. Specifically, MESA targets hallucination-relevant responses while preserving the model's original token distribution, enabling effective hallucination reduction without compromising generation behavior. Extensive experiments across diverse generative and discriminative benchmarks demonstrate that MESA consistently reduces hallucinations while better preserving generation behavior, outperforming prior methods across multiple LVLM families.
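The abstract attributes the side effects of latent steering to shifted token distributions. The sketch below is not MESA. It is a generic illustration, under stated assumptions, of how such a shift can be quantified. It assumes a toy linear unembedding matrix and the common additive steering update h' = h + αv. The helper names (`steer`, `token_shift`, `W_unembed`) are hypothetical. The sketch measures the KL divergence between the next-token distributions before and after steering a single hidden state.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable log-softmax over the vocabulary dimension.
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())


def steer(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    # Hypothetical generic additive steering: h' = h + alpha * v, with v unit-normalized.
    # This is an illustrative baseline, not the MESA intervention.
    v = direction / np.linalg.norm(direction)
    return hidden + alpha * v


def token_shift(hidden: np.ndarray, W_unembed: np.ndarray,
                direction: np.ndarray, alpha: float) -> float:
    # KL(P_original || P_steered) for one position, assuming a toy linear
    # unembedding W_unembed of shape (vocab_size, hidden_dim).
    log_p = log_softmax(W_unembed @ hidden)
    log_q = log_softmax(W_unembed @ steer(hidden, direction, alpha))
    return float(np.sum(np.exp(log_p) * (log_p - log_q)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = rng.normal(size=16)             # toy hidden state
    W_unembed = rng.normal(size=(50, 16))    # toy vocabulary of 50 tokens
    direction = rng.normal(size=16)          # hypothetical steering direction
    for alpha in (0.0, 1.0, 2.0, 4.0):
        print(f"alpha={alpha}: KL={token_shift(hidden, W_unembed, direction, alpha):.4f}")
```

For this additive update the divergence is zero at α = 0 and grows as α increases. A stronger intervention therefore moves the token distribution further from the original, which illustrates the trade-off the abstract describes.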