Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction

2026-04-09

Cryptography and Security
AI summary

The authors studied how attackers could secretly manipulate models that predict where people look during visual search, which mobile systems use for foveated rendering and attention-based interaction. They showed that naive attacks forcing a single fixed scanpath are easy to spot because their outputs cluster together, so they designed attacks whose output depends on the input scene: one redirects predicted fixations toward an attacker-chosen object, and one inflates fixation durations. These attacks work with visual, textual, and multimodal triggers, and they remain effective after five post-training defenses, after quantization, and when deployed on real smartphones. The findings highlight a new security risk for gaze-driven systems on mobile devices.

scanpath prediction, visual search, backdoor attacks, vision-language models, GazeFormer, COCO-Search18, foveated rendering, trigger modalities, model poisoning, quantization
Authors
Diana Romero, Mutahar Ali, Momin Ahmad Khan, Habiba Farrukh, Fatima Anwar, Salma Elmalaki
Abstract
Scanpath prediction models forecast the sequence and timing of human fixations during visual search, driving foveated rendering and attention-based interaction in mobile systems where their integrity is a first-class security concern. We present the first study of backdoor attacks against VLM-based scanpath prediction, evaluated on GazeFormer and COCO-Search18. We show that naive fixed-path attacks, while effective, create detectable clustering in the continuous output space. To overcome this, we design two variable-output attacks: an input-aware spatial attack that redirects predicted fixations toward an attacker-chosen target object, and a scanpath duration attack that inflates fixation durations to delay visual search completion. Both attacks condition their output on the input scene, producing diverse and plausible scanpaths that evade cluster-based detection. We evaluate across three trigger modalities (visual, textual, and multimodal), multiple poisoning ratios, and five post-training defenses, finding that no defense simultaneously suppresses the attacks and preserves clean performance across all configurations. We further demonstrate that backdoor behavior survives quantization and deployment on both flagship and legacy commodity smartphones, confirming practical threat viability for edge-deployed gaze-driven systems.
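
The contrast between fixed-path and variable-output attacks can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the normalized coordinate convention, the straight-line path toward the target, and the 1.5x duration factor are all assumptions made for illustration. It only shows why identical backdoor outputs collapse into a single point in output space, while outputs conditioned on each scene's target location stay spread out.

```python
import math
import random
import statistics

def fixed_path_label(_target_xy, n_fix=4):
    """Naive backdoor label (hypothetical): the same scanpath for every scene."""
    return [(0.1 + 0.2 * i, 0.5) for i in range(n_fix)]

def input_aware_label(target_xy, n_fix=4, start=(0.5, 0.5)):
    """Variable backdoor label (hypothetical): fixations step from the start
    point to this scene's target location, so the path differs per scene."""
    tx, ty = target_xy
    sx, sy = start
    return [
        (sx + (tx - sx) * (i + 1) / n_fix, sy + (ty - sy) * (i + 1) / n_fix)
        for i in range(n_fix)
    ]

def inflate_durations(durations_ms, factor=1.5):
    """Duration-attack label (hypothetical): scale each fixation duration."""
    return [d * factor for d in durations_ms]

def mean_pairwise_distance(paths):
    """Mean Euclidean distance between flattened scanpaths.
    A value near zero means the outputs form one tight cluster."""
    flat = [[c for xy in p for c in xy] for p in paths]
    dists = [math.dist(a, b) for i, a in enumerate(flat) for b in flat[i + 1:]]
    return statistics.mean(dists)

# Assumed setup: 50 scenes, each with a target at a random normalized location.
rng = random.Random(0)
targets = [(rng.random(), rng.random()) for _ in range(50)]

fixed = [fixed_path_label(t) for t in targets]
aware = [input_aware_label(t) for t in targets]

print(mean_pairwise_distance(fixed))  # 0.0: every poisoned output is identical
print(mean_pairwise_distance(aware))  # positive: outputs vary with the scene
```

A cluster-based detector keyed on tightly grouped outputs would flag the first set immediately; the second set gives it no such signal in this toy setting, which is the intuition behind conditioning the backdoor output on the input scene.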