Adversarial Orthogonal Disentanglement for LVLM Hallucination Mitigation

2026-05-25 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence

AI summary

The authors address hallucination in large vision-language models (LVLMs), where generated details do not match the visual input. They introduce Adversarial Orthogonal Disentanglement (AOD), which learns a hallucination-related direction in the model's latent space through an adversarial minimax objective: a classifier concentrates hallucination signals along that direction, while an adversary removes them from the orthogonal remainder. Once learned, the direction drives a training-free contrastive decoding strategy that reduces hallucinations while preserving general capabilities. Across three LVLMs, AOD outperforms strong baselines on hallucination benchmarks and transfers robustly across datasets.

Vision-Language Models, Hallucination, Adversarial Learning, Orthogonal Disentanglement, Gradient Reversal Layer, Contrastive Decoding, Multimodal Understanding, Latent Representations, Minimax Objective, Model Robustness

Authors

Ruoxi Cheng, Haoxuan Ma, Zhengfei Hai, Yiyan Huang, Ranjie Duan, Tianle Zhang, Xu Yang, Ziyi Ye, Xingjun Ma

Abstract

Large Vision-Language Models (LVLMs) have advanced multimodal understanding, yet their reliability is limited by hallucination, where generated content conflicts with visual facts. Existing mitigation methods either rely on costly external interventions, such as instruction tuning and retrieval, or use internal mechanisms that remain limited by flawed attention weights and entangled hidden representations. We propose Adversarial Orthogonal Disentanglement (AOD), a latent geometric framework for mitigating LVLM hallucinations. AOD learns a hallucination-related direction through a minimax objective: a classifier concentrates hallucination signals into the projected component, while an adversary removes them from the orthogonal residual space via a Gradient Reversal Layer. The learned direction enables a training-free dual-forward-pass contrastive decoding strategy that suppresses hallucinations while preserving general capabilities. Experiments on three LVLMs across four hallucination and four utility benchmarks show that AOD consistently outperforms strong baselines. It improves POPE accuracy by over 6% on average, boosts AMBER by 6%, and maintains strong performance on utility tasks such as MMMU. Further analysis shows robust transfer across datasets, suggesting that AOD captures general hallucination-related biases rather than dataset-specific artifacts. Our source code and datasets are available at https://github.com/Hunter-Wrynn/AOD.
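
To make the geometry in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It splits a hidden state into its component along a direction and the orthogonal residual, then combines the logits of two forward passes in the generic contrastive-decoding form. The function names, the `alpha` weight, and the assumption that the second pass is the one steered along the hallucination-related direction are all hypothetical and chosen for illustration only.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Return v scaled to unit length; a zero vector has no direction."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def split_along_direction(hidden, direction):
    """Decompose `hidden` into (projected, residual).

    `projected` lies along `direction`; `residual` is orthogonal to it,
    so hidden == projected + residual.
    """
    unit = normalize(direction)
    coeff = dot(hidden, unit)
    projected = [coeff * u for u in unit]
    residual = [h - p for h, p in zip(hidden, projected)]
    return projected, residual


def contrastive_logits(logits_plain, logits_steered, alpha=1.0):
    """Generic contrastive combination of two forward passes.

    Assumption (hypothetical): `logits_steered` comes from a pass whose
    hidden states were pushed along the hallucination-related direction,
    so subtracting it penalizes tokens that the direction favors.
    """
    return [(1.0 + alpha) * p - alpha * s
            for p, s in zip(logits_plain, logits_steered)]


if __name__ == "__main__":
    projected, residual = split_along_direction([3.0, 4.0], [2.0, 0.0])
    print(projected, residual)
    print(contrastive_logits([2.0, 1.0], [1.0, 3.0], alpha=0.5))
```

In this toy setting the residual carries no component along the chosen direction, which is the property the adversary in the abstract is described as enforcing for hallucination signals. How the direction is actually learned, and which layers are steered, is not specified in the abstract and is left out here.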
