Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders
2026-06-05 • Sound
Sound, Artificial Intelligence
AI summary
The authors study Whisper, a speech recognition model that sometimes hallucinates: it produces coherent transcriptions for audio that contains no speech. They examine the model's audio encoder activations, both directly and through Sparse AutoEncoder (SAE) latents, and find that both representations carry linearly separable signals of hallucination, concentrated in a small set of features and stronger in deeper encoder layers. Building on this, they propose two steering methods that reduce hallucinations with only a small increase in word error rate (WER) on speech. SAE-based steering lowers the hallucination rate on non-speech audio from 72.63% to 14.11% for Whisper small and from 86.88% to 27.33% for Whisper large-v3, making Whisper more reliable on non-speech sounds.
Whisper, automatic speech recognition (ASR), hallucinations, encoder activations, Sparse AutoEncoder (SAE), latent space, linear separability, error rate, fine-tuning
Authors
Georgii Aparin, Vadim Popov, Tasnima Sadekova, Assel Yermekova
Abstract
Whisper, a widely adopted ASR model, is known to suffer from hallucinations: coherent transcriptions, entirely disconnected from the input, that are generated for non-speech audio. We investigate whether hallucinations can be detected and mitigated through Whisper's internal representations. We extract audio encoder activations and evaluate two representation spaces: raw Whisper activations and Sparse AutoEncoder (SAE) latents. We show that both spaces encode linearly separable hallucination-related information, with discriminative power concentrated in a sparse feature subset and increasing toward deeper encoder layers. We propose two steering strategies: activation-space steering and SAE latent-space steering. SAE-based steering reduces the hallucination rate from 72.63% to 14.11% for Whisper small and from 86.88% to 27.33% for Whisper large-v3 on the full non-speech test set, with only a small WER degradation on speech data, approaching the performance of fine-tuning-based methods.
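The abstract does not spell out how the probes or the steering operations are built, so the following is only a minimal sketch of the general ideas, not the authors' method. It assumes a difference-of-means direction as the linear probe, steering by shifting activations along that direction, and SAE steering by rescaling one latent and adding the decoded change back. The data is synthetic, the SAE weights are untrained random placeholders, and every name (`mean_difference_direction`, `steer`, `sae_steer`, the hidden size `d`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hypothetical hidden size; real Whisper encoders are much wider

# Synthetic stand-ins for pooled encoder activations (NOT real Whisper data).
# One coordinate carries the "hallucination" signal, mimicking a sparse feature subset.
true_dir = np.zeros(d)
true_dir[3] = 1.0
speech = rng.normal(size=(200, d))
halluc = rng.normal(size=(200, d)) + 4.0 * true_dir

def mean_difference_direction(pos, neg):
    """Unit vector pointing from the negative-class mean to the positive-class mean."""
    v = pos.mean(axis=0) - neg.mean(axis=0)
    return v / np.linalg.norm(v)

def probe_accuracy(pos, neg, direction):
    """Accuracy of a 1-D linear probe thresholded midway between the class means."""
    threshold = 0.5 * (pos.mean(axis=0) + neg.mean(axis=0)) @ direction
    correct = (pos @ direction > threshold).sum() + (neg @ direction <= threshold).sum()
    return correct / (len(pos) + len(neg))

def steer(acts, direction, target):
    """Shift each activation along the unit `direction` so its projection equals `target`."""
    proj = acts @ direction
    return acts + np.outer(target - proj, direction)

def sae_steer(acts, w_enc, b_enc, w_dec, latent_idx, scale):
    """Rescale one SAE latent and add only the decoded change back to the activations."""
    z = np.maximum(acts @ w_enc + b_enc, 0.0)  # ReLU SAE encoder
    z_new = z.copy()
    z_new[:, latent_idx] *= scale
    return acts + (z_new - z) @ w_dec

# Activation-space probe and steering.
direction = mean_difference_direction(halluc, speech)
acc = probe_accuracy(halluc, speech, direction)
top_feature = int(np.argmax(np.abs(direction)))  # most discriminative coordinate
target = (speech @ direction).mean()
steered = steer(halluc, direction, target)

# SAE latent-space steering with untrained placeholder weights (illustration only).
n_latents = 32
w_enc = rng.normal(size=(d, n_latents))
b_enc = np.zeros(n_latents)
w_dec = rng.normal(size=(n_latents, d))
suppressed = sae_steer(halluc, w_enc, b_enc, w_dec, latent_idx=5, scale=0.0)

print(f"probe accuracy: {acc:.3f}, top feature index: {top_feature}")
```

In a real setting the activations would come from a chosen encoder layer and the SAE would be trained on them; adding back only the decoded difference, rather than the full reconstruction, is one design choice that leaves the SAE's reconstruction error untouched.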