Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution

2026-06-11 · Computation and Language

AI summary

The authors discuss how specific training examples can influence large language models (LLMs), which matters when trying to understand why a model produces unwanted outputs such as toxic language. Existing methods based on influence functions help identify which training samples affect the model's behavior, but they are often slow and require a lot of storage. To address this, the authors propose a new method, Influcoder, that makes this process faster and cheaper, allowing data attribution on very large datasets. This could help curate and understand the training data of LLMs.

Large Language Models, Data Attribution, Influence Functions, Training Data, Model Conditioning, Toxicity in AI, Dataset Filtering, Efficiency in ML, Influence-based Methods, Scalability
Authors
Dimitri Kachler, Damien Sileo, Pascal Denis
Abstract
As the capabilities of LLMs (Large Language Models) have grown, there has been an increasing push to curate high-quality datasets by filtering samples out of the training data. In general, Data Attribution (DA) methods aim to estimate how individual samples in a training dataset can precondition a model to generate certain outputs. For example, one might want to know which training samples could be the source of an LLM's toxic behavior after training. Many methods quantify this conditioning through the paradigm of influence functions. While methods in this family are effective, they lack the processing speed and storage compactness needed to be practical on large datasets. We propose Influcoder, a fast and cost-effective approach to influence-based Data Attribution at scale.
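To make "influence-based Data Attribution" more concrete, here is a minimal sketch of one common gradient-based influence score: a single-checkpoint, TracIn-style estimate. This is a swapped-in illustration, not necessarily the estimator the authors use. It assumes a toy logistic-regression model in place of an LLM, and the names `loss_grad` and `influence_ranking` are hypothetical. Each training sample is scored by the dot product of its loss gradient with the loss gradient of a test example, and samples are then ranked by that score. Every score needs a per-sample gradient as large as the model's parameter vector, which hints at the speed and storage costs the abstract describes for LLMs. The sketch does not reproduce Influcoder, which, per its title, distills decoder gradient influence rankings into an encoder.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy setup: logistic regression stands in for an LLM, and the
# influence score is a single-checkpoint TracIn-style gradient dot product.

Vector = List[float]


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def loss_grad(w: Sequence[float], x: Sequence[float], y: int) -> Vector:
    """Gradient of the binary cross-entropy loss w.r.t. the weights w.

    For logistic regression, dL/dw = (sigmoid(w . x) - y) * x.
    """
    p = sigmoid(dot(w, x))
    return [(p - y) * xi for xi in x]


def influence_ranking(
    w: Sequence[float],
    train: Sequence[Tuple[Sequence[float], int]],
    test_x: Sequence[float],
    test_y: int,
    lr: float = 1.0,
) -> List[Tuple[int, float]]:
    """Rank training samples by a single-checkpoint TracIn-style score.

    score(z_train, z_test) = lr * grad L(w, z_train) . grad L(w, z_test).
    A positive score means a gradient step on z_train would, to first order,
    lower the loss on z_test; a negative score means it would raise it.
    """
    g_test = loss_grad(w, test_x, test_y)
    scores = [
        (i, lr * dot(loss_grad(w, x, y), g_test))
        for i, (x, y) in enumerate(train)
    ]
    # Most helpful ("proponent") samples first.
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    train = [([1.0, 0.0], 1), ([0.0, 1.0], 1), ([1.0, 0.0], 0)]
    print(influence_ranking([0.0, 0.0], train, [1.0, 0.0], 1))
    # → [(0, 0.25), (1, 0.0), (2, -0.25)]
```

With weights `[0.0, 0.0]`, the training sample identical to the test example ranks first with a positive score. The sample with an orthogonal input scores zero. The sample with the same input but the opposite label ranks last with a negative score, which is the kind of "harmful" candidate a toxicity audit would look for.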