Navig-AI-tion: Navigation by Contextual AI and Spatial Audio
2026-03-13 • Human-Computer Interaction
AI summary
The authors created a walking navigation system that uses sounds from specific directions to help people know which way to turn. It works by recognizing landmarks around the user and giving instructions based on them, instead of just using basic directions like 'turn left.' When users face the wrong way, the system plays a sound from the correct direction to guide them. In tests, this approach helped people make fewer mistakes compared to both regular audio directions and a system without these special sounds.
Vision Language Model • spatial audio • landmarks • navigation instructions • route deviations • audio-only navigation • directional cues • real-time correction
Authors
Mathias N. Lystbæk, Haley Adams, Ranjith Kagathi Ananda, Eric J Gonzalez, Luca Ballan, Qiuxuan Wu, Andrea Colaço, Peter Tan, Mar Gonzalez-Franco
Abstract
Audio-only walking navigation can leave users disoriented: instructions rely on vague cardinal directions and lack real-time environmental context, leading to frequent errors. To address this, we present a novel system that integrates a Vision Language Model (VLM) with a spatial audio cue. Our system extracts environmental landmarks to anchor navigation instructions and, crucially, plays a directional spatial audio signal from the correct direction whenever the user faces the wrong way, indicating the precise turn to make. In a user study (n=12), the spatial audio cue combined with the VLM reduced route deviations compared to both a VLM-only system and an audio-only Google Maps baseline. Users reported that the spatial audio cue effectively supported orientation and that landmark-anchored instructions provided a better navigation experience than audio-only Google Maps. This work offers an initial look at how future audio-only navigation systems can incorporate directional cues, especially real-time corrective spatial audio.
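To make the corrective-cue idea concrete, here is a minimal sketch (not the authors' implementation) of the triggering logic the abstract describes: compare the user's heading against the bearing of the next route segment and, if the deviation exceeds a tolerance, render a sound at the azimuth the user should turn toward. The threshold value and function names are illustrative assumptions.

```python
import math

# Assumed tolerance (in degrees) before a corrective cue fires;
# the paper does not specify this value.
DEVIATION_THRESHOLD_DEG = 45.0


def signed_angle_deg(heading_deg: float, target_bearing_deg: float) -> float:
    """Smallest signed angle from the user's heading to the target
    bearing, in degrees, in the range (-180, 180]."""
    delta = (target_bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
    return 180.0 if delta == -180.0 else delta


def corrective_cue_azimuth(heading_deg: float,
                           target_bearing_deg: float) -> float | None:
    """Return the head-relative azimuth at which to spatialize a
    corrective audio cue, or None if the user already faces roughly
    the right way."""
    delta = signed_angle_deg(heading_deg, target_bearing_deg)
    if abs(delta) <= DEVIATION_THRESHOLD_DEG:
        return None  # on course: no corrective cue needed
    # Place the cue in the direction the user should turn toward, so
    # that turning to face the sound reorients them onto the route.
    return delta


# Example: the user faces north (0 deg) but the route continues east
# (90 deg), so the cue is rendered 90 deg to the user's right.
print(corrective_cue_azimuth(0.0, 90.0))   # 90.0
print(corrective_cue_azimuth(80.0, 90.0))  # None (within tolerance)
```

The azimuth returned here would then be passed to a spatial audio renderer (e.g., a head-related transfer function engine) so the cue is perceived as coming from the turn direction.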