SrDetection: A Self-Referential Framework for Data Leakage Detection in Code Large Language Models

2026-06-29
Computation and Language

AI summary

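The authors present SrDetection, a method for detecting when code language models have already seen benchmark test data during pre-training, an exposure that artificially inflates their scores. Rather than requiring access to training corpora or relying on brittle heuristics and hand-tuned thresholds, SrDetection generates semantically equivalent variants of each test example and checks whether the model finds the original disproportionately easier than its variants. In a controlled testbed spanning different models and training stages, it improved average F1 over strong baselines by 21.52 points in the gray-box setting and 14.46 points in the black-box setting. Applying SrDetection in the gray-box setting to 15 widely used code models on four popular benchmarks, the authors found benchmark-specific leakage patterns.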

Code Large Language Models, Data Leakage, Benchmarking, Gray-box Setting, Black-box Setting, Model Logits, Semantic Equivalence, Leakage Detection, F1 Score, Pre-training Data
Authors
Shuaimin Li, Liyang Fan, Zeyang Li, Zhuoyue Wan, Yufang Lin, Shiwen Ni, Feiteng Fang, Hamid Alinejad-Rokny, Yuanfeng Song, Kun Jing, Chen Jason Zhang, Min Yang
Abstract
Evaluating code large language models (Code LLMs) requires reliable detection of data leakage, where benchmark performance is artificially inflated by exposure to benchmark data during pre-training. Existing approaches either assume access to proprietary training corpora, rely on brittle heuristics such as timestamp filtering, or use external reference sets with manually tuned, non-generalizable thresholds. To address these limitations, we introduce SrDetection, a unified self-referential leakage detection framework for both gray-box (access to model logits) and black-box (access to model outputs) settings. SrDetection generates semantically equivalent variants of a benchmark sample and detects leakage by contrasting the model's behavior on the original versus its variants, flagging cases where the original is disproportionately easier for the model. We further design a controlled leakage detection testbed and evaluate SrDetection in this environment. Across different models and training stages, SrDetection improves average F1 by 21.52 points in the gray-box setting and 14.46 points in the black-box setting over strong baselines, demonstrating robust, threshold-independent leakage detection. Finally, a gray-box study of 15 widely used Code LLMs on four popular benchmarks reveals benchmark-specific leakage patterns beyond prior overlap-based analyses.

Source code and data are available at https://github.com/SMinL/SrDetectionCode
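
The original-versus-variants contrast described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes a per-sample difficulty score has already been computed for the original and for each variant (for example, mean token negative log-likelihood from model logits in the gray-box setting), and the function names, the rank-based rule, and the `min_fraction` parameter are all hypothetical choices made for illustration.

```python
from statistics import mean


def difficulty_gap(original_nll: float, variant_nlls: list[float]) -> float:
    """Mean variant score minus the original's score.

    Scores are assumed to be "lower is easier" (e.g. negative log-likelihood),
    so a positive gap means the original is easier than its variants on average.
    """
    if not variant_nlls:
        raise ValueError("need at least one variant score")
    return mean(variant_nlls) - original_nll


def harder_variant_fraction(original_nll: float, variant_nlls: list[float]) -> float:
    """Fraction of variants the model finds strictly harder than the original."""
    if not variant_nlls:
        raise ValueError("need at least one variant score")
    return sum(v > original_nll for v in variant_nlls) / len(variant_nlls)


def flag_possible_leakage(
    original_nll: float,
    variant_nlls: list[float],
    min_fraction: float = 1.0,  # hypothetical: require every variant to be harder
) -> bool:
    """Flag a sample when the original is easier than (nearly) all of its variants.

    The comparison is made against the sample's own variants rather than an
    external reference set, which is the self-referential idea being sketched.
    """
    return harder_variant_fraction(original_nll, variant_nlls) >= min_fraction


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    suspicious = flag_possible_leakage(0.4, [1.2, 0.9, 1.5])
    ordinary = flag_possible_leakage(1.0, [1.2, 0.9, 1.5])
    print(suspicious, ordinary)
```

In a black-box setting, where only model outputs are available, the same comparison could in principle be run on an output-derived score in place of the log-likelihood; how the paper actually scores each setting is not specified in the abstract.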