RA-LWLM: Retrieval-Augmented In-Context Localization with Wireless Foundation Models

2026-06-01 | Artificial Intelligence

AI summary

The authors address wireless localization in 6G networks, where model-based methods degrade under multipath and non-line-of-sight propagation and learning-based methods need costly retraining whenever the base station configuration or the environment changes. They propose RA-LWLM, which moves scene-specific information into a per-scene fingerprint database, so the core model stays frozen and adapts to new scenes without retraining. A retrieval step selects the most relevant references from that database, and a transformer-based in-context learning module fuses them with the query to predict the user's position. In ray-tracing experiments across scenes with different base station setups, RA-LWLM reaches nearly identical accuracy on seen and unseen scenes and outperforms end-to-end and foundation-model-based baselines.
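
As a rough illustration of the retrieval step, the sketch below ranks fingerprint entries by similarity to a query embedding and keeps the top k. It is not the authors' implementation: the use of cosine similarity, the `(embedding, position)` database layout, and the names `cosine_similarity` and `retrieve_top_k` are all assumptions made for this example, and the toy vectors stand in for the output of the frozen encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_embedding, database, k):
    """Return the k entries most similar to the query, best first.

    `database` is a list of (embedding, position) pairs. This layout is
    hypothetical; the source does not specify the fingerprint format.
    Each result is a (similarity, embedding, position) tuple.
    """
    scored = [
        (cosine_similarity(query_embedding, emb), emb, pos)
        for emb, pos in database
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Toy per-scene database: 2-D embeddings paired with known 2-D positions.
database = [
    ([1.0, 0.0], (0.0, 0.0)),
    ([0.0, 1.0], (10.0, 0.0)),
    ([0.9, 0.1], (1.0, 0.5)),
]
references = retrieve_top_k([1.0, 0.05], database, k=2)
retrieved_positions = [pos for _, _, pos in references]
```

Because the database is the only scene-specific part, swapping in a different list of fingerprints changes the scene without touching the functions, which mirrors the training-free adaptation idea in the summary.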

Wireless Localization, 6G Networks, Multipath Propagation, Non-Line-of-Sight, Channel State Information, Retrieval-Augmented Learning, Transformer, In-Context Learning, Fingerprint Database, Ray Tracing
Authors
Guangjin Pan, Hui Chen, Hei Victor Cheng, Henk Wymeersch
Abstract
Wireless localization is a fundamental capability of sixth-generation (6G) networks. Conventional model-based methods require accurate modeling of the propagation environment and degrade in complex multipath and non-line-of-sight scenarios, while learning-based methods couple model parameters tightly to the training scene, requiring costly retraining whenever the base station (BS) configuration or propagation environment changes. In this paper, we propose RA-LWLM, a retrieval-augmented in-context localization framework that achieves training-free cross-scene adaptation by externalizing scene-specific information into a per-scene fingerprint database rather than encoding it in model weights. The framework consists of three components: a frozen wireless foundation model (FM) encoder that maps raw channel state information into a scene-agnostic representation; a retrieval module that selects the most informative references from the per-scene database via similarity search in the representation space; and a transformer-based in-context learning (ICL) module that fuses the query with the retrieved references to predict the user equipment (UE) position. To accommodate varying retrieval quality and propagation complexity across queries, the ICL module adopts a mixture-of-experts design in which experts specialize in different context sizes and are softly combined by a learnable selector. Extensive ray-tracing-based experiments across heterogeneous scenes with diverse BS configurations show that RA-LWLM achieves nearly identical accuracy on seen and unseen scenes without any per-scene retraining, substantially outperforming end-to-end and FM-based baselines. These results validate the proposed retrieval-augmented in-context paradigm as a scalable solution for cross-scene localization in 6G networks.
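
The soft combination of experts over different context sizes can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's architecture: each expert here is a placeholder that averages the positions of its first `context_size` references, whereas the abstract describes transformer-based ICL experts, and the selector logits are passed in by hand rather than produced by a learned selector. The names `softmax`, `expert_predict`, and `moe_predict` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expert_predict(reference_positions, context_size):
    """Placeholder expert: mean of the first `context_size` positions.

    Assumption: stands in for a transformer ICL expert that would fuse
    the query with this many retrieved references.
    """
    context = reference_positions[:context_size]
    dims = len(context[0])
    return tuple(sum(p[d] for p in context) / len(context) for d in range(dims))


def moe_predict(reference_positions, context_sizes, selector_logits):
    """Softly combine expert predictions using selector weights."""
    weights = softmax(selector_logits)
    predictions = [expert_predict(reference_positions, k) for k in context_sizes]
    dims = len(predictions[0])
    return tuple(
        sum(w * pred[d] for w, pred in zip(weights, predictions))
        for d in range(dims)
    )


# Retrieved reference positions, ordered from most to least similar.
reference_positions = [(0.0, 0.0), (2.0, 0.0), (4.0, 6.0)]

# Two experts: one uses 1 reference, the other uses 3. Equal logits give
# each expert a weight of 0.5.
position = moe_predict(
    reference_positions, context_sizes=[1, 3], selector_logits=[0.0, 0.0]
)
```

With equal logits the output is the midpoint of the two expert predictions. Raising the first logit would favor the small-context expert, which is the kind of per-query weighting that could help when only the closest reference is reliable.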