Regime-Aware Peer Specialization for Robust RAG under Heterogeneous Knowledge Conflicts

2026-06-29 • Computation and Language

AI summary

The authors study how language models use retrieved outside information to improve responses, and what happens when that context conflicts with what the model already knows. To handle these conflicts, they propose a method called RAPS-DA that splits conflicting cases into three types and trains a specialized version of the model for each type. Their approach also selects which tokens to focus on during training, helping the model learn more effectively from difficult examples. They report that this method improves performance without needing a bigger teacher model or any extra cost at inference time.

Retrieval-augmented generation, Language models, Conflict regimes, Peer specialization, Reverse KL supervision, Token-level filtering, On-policy training, Out-of-distribution benchmarks, Fine-tuning, Model grounding

Authors

Bo Wang, Heyan Huang, Yaolin Li, Yanghao Zhou, Jiahao Teng, Ziyi Yang, Ge Shi, Chong Feng

Abstract

Retrieval-augmented generation (RAG) improves language models by grounding generation in external context. However, it can be fragile when the retrieved context conflicts with the model's parametric knowledge. Such conflicts span a reliability spectrum, ranging from reliable and partially reliable evidence to adversarial context. Existing remedies often handle such heterogeneous conflicts with regime-agnostic supervision, which can conflate incompatible learning signals across reliability regimes. To disentangle these signals, we propose RAPS-DA, a regime-aware peer specialization framework that addresses conflict at two complementary granularities. At the sample level, conflicts are divided into three regimes, namely Grounding, Arbitration, and Resistance, with one same-scale peer specialist trained per regime from a shared base model. Each sample is then hard-routed to its regime-matched peer for on-policy reverse-KL supervision. At the token level, a dual-layer selector uses inter-teacher disagreement, student-teacher divergence, and student entropy to filter uninformative or unstable tokens, upweight confidently misaligned ones, and gradually focus supervision on high-conflict tokens as the student matures. Gains stem from specialization at a fixed model scale, not from a stronger teacher, and the peer specialists exist only during training, so the deployed student requires no regime labels or peer access. Experiments on five conflict scenarios and two out-of-distribution benchmarks show that RAPS-DA surpasses all prompting, decoding, fine-tuning, RL, and single-teacher baselines.
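
To make the two granularities in the abstract concrete, the following is a minimal Python sketch of hard routing plus token-weighted reverse-KL supervision. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the function names (`reverse_kl`, `token_weight`, `sample_loss`), the thresholds (`min_div`, `max_entropy`, `confident_entropy`), the `boost` factor, and the way the three signals are combined are all hypothetical. The curriculum that gradually shifts focus to high-conflict tokens is omitted.

```python
import math

# Regime names follow the abstract; everything else below is an assumption.
REGIMES = ("grounding", "arbitration", "resistance")

def reverse_kl(student, teacher, eps=1e-12):
    """KL(student || teacher) for one token's distribution over the vocabulary."""
    return sum(s * math.log((s + eps) / (t + eps)) for s, t in zip(student, teacher))

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def token_weight(student, teachers, regime,
                 min_div=0.01, max_entropy=1.0, confident_entropy=0.5, boost=2.0):
    """Hypothetical selector combining the three signals named in the abstract."""
    matched = teachers[regime]  # hard routing: only the regime-matched peer supervises
    divergence = reverse_kl(student, matched)
    student_entropy = entropy(student)
    # Drop tokens that are uninformative (student already agrees with the peer)
    # or unstable (student is too uncertain).
    if divergence < min_div or student_entropy > max_entropy:
        return 0.0
    # Inter-teacher disagreement: how far the matched peer is from the other peers.
    others = [teachers[r] for r in REGIMES if r != regime]
    disagreement = sum(reverse_kl(matched, o) for o in others) / len(others)
    weight = 1.0 + disagreement
    # Upweight tokens where the student is confident yet misaligned.
    if student_entropy < confident_entropy:
        weight *= boost
    return weight

def sample_loss(student_tokens, teacher_tokens, regime):
    """Weighted mean reverse KL over the tokens of one sample.

    student_tokens: list of per-token student distributions.
    teacher_tokens: list of dicts mapping regime name -> peer distribution.
    """
    total, norm = 0.0, 0.0
    for student, teachers in zip(student_tokens, teacher_tokens):
        w = token_weight(student, teachers, regime)
        total += w * reverse_kl(student, teachers[regime])
        norm += w
    return total / norm if norm > 0 else 0.0
```

In this sketch the regime label is consumed only inside the training loss, which mirrors the abstract's point that the deployed student needs neither regime labels nor access to the peers.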
