Data, Not Model: Explaining Bias toward LLM Texts in Neural Retrievers
2026-04-07 • Information Retrieval
AI summary
The authors find that neural retrievers prefer passages generated by large language models (LLMs) over human-written ones not because of a flaw in the retrievers themselves, but because of artifacts in their training data. Non-semantic differences between positive and negative training documents, such as fluency and term specificity, mirror the differences between LLM and human texts, so retrievers absorb this preference during training. The authors propose two fixes: reducing these artifact differences in the training data, and removing the bias component from LLM-generated text embeddings. Their work helps alleviate fairness concerns about AI-generated text in search systems.
neural retrievers, source bias, large language models (LLMs), contrastive learning, embedding space, training data artifacts, fluency, term specificity, information retrieval, bias mitigation
Authors
Wei Huang, Keping Bi, Yinqiong Cai, Wei Chen, Jiafeng Guo, Xueqi Cheng
Abstract
Recent studies show that neural retrievers often display source bias, favoring passages generated by LLMs over human-written ones, even when both are semantically similar. This bias has been considered an inherent flaw of retrievers, raising concerns about the fairness and reliability of modern information access systems. Our work challenges this view by showing that source bias stems from supervision in retrieval datasets rather than from the models themselves. We find that non-semantic differences, such as fluency and term specificity, exist between positive and negative documents, mirroring the differences between LLM and human texts. In the embedding space, the bias direction from negatives to positives aligns with the direction from human-written to LLM-generated texts. We theoretically show that retrievers inevitably absorb the artifact imbalances in the training data during contrastive learning, which leads to their preference for LLM texts. To mitigate this effect, we propose two approaches: 1) reducing artifact differences in the training data and 2) adjusting LLM text vectors by removing their projection onto the bias vector. Both methods substantially reduce source bias. We hope our study alleviates some concerns regarding LLM-generated texts in information access systems.
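The second mitigation described in the abstract, removing an LLM embedding's projection onto the bias vector, can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it assumes the bias direction is estimated as the mean difference between LLM-generated and human-written text embeddings, and the authors' exact estimator may differ.

```python
import numpy as np

def remove_bias_projection(llm_vecs: np.ndarray, human_vecs: np.ndarray) -> np.ndarray:
    """Project out the source-bias direction from LLM text embeddings.

    llm_vecs, human_vecs: arrays of shape (n, d) holding text embeddings.
    Returns debiased copies of llm_vecs with zero component along the
    estimated bias direction.
    """
    # Assumed estimator: bias direction = mean(LLM) - mean(human), normalized.
    bias = llm_vecs.mean(axis=0) - human_vecs.mean(axis=0)
    bias = bias / np.linalg.norm(bias)
    # Subtract each vector's projection onto the (unit-norm) bias direction:
    # v' = v - (v . b) b
    return llm_vecs - np.outer(llm_vecs @ bias, bias)
```

After this adjustment, the debiased LLM vectors are orthogonal to the estimated bias direction, so a dot-product retriever can no longer exploit that direction to rank LLM passages above human ones.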