SCAR: Semantic Continuity-Aware Retrieval for Efficient Context Expansion in RAG
2026-06-15 • Information Retrieval
Information Retrieval, Computation and Language
AI summary
The authors address a problem in retrieval-augmented generation: splitting documents into fixed-length chunks can cut important evidence in two, which makes the relevant passages harder to retrieve. They propose SCAR, a method that selectively expands a retrieved chunk to its neighbors based on how relevant each neighbor is to the query, while penalizing expansion so that extra context stays limited. Tested on four document types, SCAR keeps recall high on boundary-fragmented queries while retrieving fewer chunks than static windowing. It works across three embedding models with the same hyperparameter setting, and on the 10-K corpus it reduces context tokens by 27.1% while preserving generation faithfulness.
Retrieval-Augmented Generation, Chunking, Boundary Fragmentation, Information Retrieval, Embedding Models, Recall, Token Overhead, Semantic Relevance, Expansion Threshold, Bootstrap Testing
Authors
Nathanaël Langlois
Abstract
Fixed-length chunking in Retrieval-Augmented Generation (RAG) often leads to boundary fragmentation, where critical evidence is split across segments, degrading retrieval recall. While static windowing and parent retrieval improve recall, they introduce significant token overhead. We propose SCAR (Semantic Continuity-Aware Retrieval), an adaptive retrieval policy that selectively expands neighboring chunks by weighing query-neighbor relevance against a structural continuity penalty. SCAR uses a relative expansion threshold tied to each retrieved chunk's own query-relevance, yielding an approximately scale-invariant decision rule that transfers across embedding models without recalibration. Across four diverse corpora (RFC, GDPR, a 10-K report, and a merger agreement; N=320 queries, of which 160 are boundary-fragmented), SCAR achieves 92.8% recall on boundary-fragmented queries while retrieving only 7.84 chunks, a 22.9% reduction compared to static windowing (10.16 chunks). Paired bootstrap tests (B=10,000) confirm that the chunk reduction is highly significant (p<0.0001, Cohen's d=-1.49, a large effect), with a small recall difference (Cohen's d=-0.33). The policy transfers across three embedding models (text-embedding-3-large, BGE-large-en-v1.5, zembed-1) using the same single hyperparameter setting, and downstream RAGAS evaluation on the 10-K corpus confirms that SCAR preserves generation faithfulness while reducing context tokens by 27.1%.
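To make the expansion rule concrete, the following is a minimal sketch of a relative-threshold neighbor expansion in Python. It is not the paper's implementation: the abstract does not give the exact scoring formula, so the function name `expand_with_neighbors`, the parameters `tau` (relative threshold) and `lam` (penalty weight), the use of cosine similarity, and the choice of `1 - cosine(anchor, neighbor)` as the continuity penalty are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_with_neighbors(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    retrieved: Sequence[int],
    tau: float = 0.8,   # hypothetical relative threshold
    lam: float = 0.1,   # hypothetical weight of the continuity penalty
) -> List[int]:
    """Return retrieved chunk indices plus any adjacent chunks that pass the rule.

    Assumed rule: add neighbor j of retrieved chunk i when
        rel(query, j) - lam * penalty(i, j) >= tau * rel(query, i)
    The threshold is relative to the anchor chunk's own relevance, so it does
    not depend on the absolute similarity scale of the embedding model.
    """
    selected = set(retrieved)
    for i in retrieved:
        anchor_rel = cosine(query_vec, chunk_vecs[i])
        for j in (i - 1, i + 1):  # only immediate neighbors in document order
            if j < 0 or j >= len(chunk_vecs) or j in selected:
                continue
            neighbor_rel = cosine(query_vec, chunk_vecs[j])
            # Assumed penalty: larger when the neighbor drifts from the anchor.
            penalty = 1.0 - cosine(chunk_vecs[i], chunk_vecs[j])
            if neighbor_rel - lam * penalty >= tau * anchor_rel:
                selected.add(j)
    return sorted(selected)


if __name__ == "__main__":
    query = [1.0, 0.0]
    chunks = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]  # toy 2-D embeddings
    print(expand_with_neighbors(query, chunks, retrieved=[1]))
```

In this toy setup, chunk 1 is the retrieved anchor; chunk 2 is nearly as relevant to the query and is added, while chunk 0 is unrelated and is skipped. Because the right-hand side scales with the anchor's relevance, a model that produces uniformly lower similarity scores lowers both sides of the comparison, which is one way to read the abstract's "approximately scale-invariant" claim; the additive penalty term is why the invariance would only be approximate under this assumed formula.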