Giving AI a Headache: Acoustic Adversarial Attacks to Computer Vision Applications
2026-06-12 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary
The authors studied how sound waves can shake a camera and confuse AI systems that recognize objects in images. They found that lower-pitched sounds (audible frequencies) can still cause the camera's image-stabilizing system to introduce errors. These errors make AI models, like YOLO, misidentify or miss objects in the pictures. Their experiments show this type of attack can work at longer distances than previous high-pitched sound attacks. The authors also explored which camera and image features make AI systems more likely to be tricked, which could inform future defenses.
Artificial Intelligence, Computer Vision, Acoustic Attacks, Camera Stabilization, Object Detection, YOLO, Ultrasonic Frequencies, Audible Frequencies, Image Artifacts, Resonance
Authors
Nicole Villavicencio-Garduño, Maksim Ekin Eren, Milo Prisbrey, Ben Migliori, Michael Teti
Abstract
Artificial Intelligence (AI) is increasingly used to automate a variety of real-world computer vision (CV) applications, such as autonomous vehicle control, facial recognition, and security cameras. Recent research has shown that acoustic vibration can induce real physical motion in cameras, interfering with their internal stabilization mechanisms. Because the motion falls outside the conditions the stabilization system was designed to handle, the system introduces artifacts into the frame, causing AI-based CV models to misclassify objects, miss targets, or hallucinate objects. Previous work used ultrasonic frequencies (>20 kHz), which limits attacks to short distances because high frequencies attenuate rapidly. In this work, we investigate acoustic attacks using lower frequencies in the audible range (<20 kHz), and we extend our analysis to how various image and object features are affected by the attacks. Specifically, we performed physical experiments to demonstrate the viability of our attacks on an off-the-shelf object detection model (YOLO11) by exciting resonance in a commercially available camera at various frequencies. Based on our results, we provide insights into several factors that make an AI CV system more vulnerable to these attacks, which could help inform the development of future mitigation strategies.
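The range argument in the abstract (high frequencies attenuate faster, so ultrasonic attacks are short-range) can be illustrated with a rough sketch. The model below is not from the paper: it assumes spherical spreading plus an absorption term that grows with the square of frequency, and the reference coefficient `ALPHA_REF_DB_PER_M` is a hypothetical placeholder. Real absorption in air also depends on temperature and humidity and does not follow a pure square law.

```python
import math

# Hypothetical placeholder values, chosen only for illustration.
F_REF_HZ = 1_000.0          # reference frequency for the absorption coefficient
ALPHA_REF_DB_PER_M = 0.005  # assumed absorption at F_REF_HZ, in dB per metre


def absorption_db_per_m(freq_hz: float) -> float:
    """Assumed absorption coefficient, scaling with frequency squared."""
    return ALPHA_REF_DB_PER_M * (freq_hz / F_REF_HZ) ** 2


def level_at_distance_db(source_level_db: float, freq_hz: float,
                         distance_m: float, ref_distance_m: float = 1.0) -> float:
    """Sound level at distance_m, given the level measured at ref_distance_m."""
    # Spherical spreading: 20 dB lost per tenfold increase in distance.
    spreading_loss = 20.0 * math.log10(distance_m / ref_distance_m)
    # Frequency-dependent absorption accumulated over the extra path length.
    absorption_loss = absorption_db_per_m(freq_hz) * (distance_m - ref_distance_m)
    return source_level_db - spreading_loss - absorption_loss


if __name__ == "__main__":
    for freq_hz in (5_000.0, 25_000.0):  # one audible and one ultrasonic tone
        level = level_at_distance_db(100.0, freq_hz, distance_m=10.0)
        print(f"{freq_hz / 1000:.0f} kHz at 10 m: {level:.1f} dB")
```

Under these assumptions the spreading loss is identical for both tones, so any difference in received level comes entirely from the absorption term, which is the effect the abstract cites as limiting ultrasonic attacks.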