Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models

2026-05-08 · Computer Vision and Pattern Recognition

AI summary

The authors study how to make vision-language models reliably forget specific sensitive information. Unlike previous methods that only adjust the language part and cause object hallucinations, their method, HFRU, works on the vision part to remove the unwanted knowledge more thoroughly. It uses a two-step process with special rewards that encourage the model to forget properly without hallucinating objects. Tests on object recognition and face identity recognition show that HFRU forgets over 98% of the targeted information while keeping other knowledge intact and making fewer errors.

vision-language models · machine unlearning · vision encoder · language decoder · reinforcement learning · object hallucination · semantic forgetting · GRPO optimization · alignment disruption · face identity recognition
Authors
Kaidi Jia, Yujie Lin, Chengyi Yang, Jiayao Ma, Jinsong Su
Abstract
Vision-language models (VLMs) raise growing concerns about privacy, copyright, and bias, motivating machine unlearning to remove sensitive knowledge. However, existing methods primarily fine-tune the language decoder, leading to superficial forgetting that fails to erase the underlying visual representations and often introduces object hallucination. We propose HFRU, a reinforcement unlearning framework that operates on the vision encoder for deep semantic removal. Our two-stage approach combines alignment disruption with GRPO-based optimization under a composite reward, including an abstraction reward that encourages semantically valid substitutions and mitigates hallucinations. Experiments on object recognition and face identity tasks show that HFRU achieves over 98% on both forgetting and retention metrics while introducing negligible object hallucination, significantly outperforming prior methods. Our code and implementation details are available at https://github.com/XMUDeepLIT/HFRU.
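
The abstract names the key ingredients (a composite reward including an abstraction term, optimized with GRPO) but not their exact form. The sketch below is a rough keyword-level illustration only, assuming a bag-of-words reward: `composite_reward`, `grpo_advantages`, the term sets, and the weights `w_forget`/`w_abstract`/`w_halluc` are all hypothetical names and assumptions, not the authors' released implementation (see the repository above for that).

```python
# Hypothetical sketch of a composite unlearning reward and GRPO-style
# group advantages; all names and weights are illustrative assumptions.
from typing import List, Set


def composite_reward(
    response: str,
    forget_terms: Set[str],    # words naming the concept to erase
    abstract_terms: Set[str],  # valid coarser substitutes, e.g. "animal"
    image_objects: Set[str],   # objects actually present in the image
    vocab_objects: Set[str],   # object vocabulary for hallucination checks
    w_forget: float = 1.0,
    w_abstract: float = 0.5,
    w_halluc: float = 1.0,
) -> float:
    tokens = set(response.lower().split())
    # Forgetting reward: the response must not name the erased concept.
    r_forget = 1.0 if not (tokens & forget_terms) else 0.0
    # Abstraction reward: prefer a semantically valid coarser label
    # over an arbitrary (possibly hallucinated) replacement.
    r_abstract = 1.0 if tokens & abstract_terms else 0.0
    # Hallucination penalty: mentioning objects absent from the image.
    hallucinated = (tokens & vocab_objects) - image_objects - abstract_terms
    r_halluc = -float(len(hallucinated))
    return w_forget * r_forget + w_abstract * r_abstract + w_halluc * r_halluc


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    # GRPO scores each sampled response relative to its group:
    # A_i = (r_i - mean(group)) / std(group).
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = max(var ** 0.5, eps)
    return [(r - mean) / std for r in rewards]
```

In this reading, the abstraction term is what separates "call the dog an animal" (rewarded) from "call the dog a suitcase" (penalized as a hallucination), which matches the abstract's claim that semantically valid substitutions mitigate object hallucination.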