Have You Ever Seen Them? Entity-level Membership Inference through Interrogating Large Language Models

2026-06-22 | Computation and Language

AI summary

The authors study whether large language models (LLMs) reveal that they were trained on information about specific real-world entities, much as a person recalls facts about someone from many separate conversations. Instead of checking whether exact text was memorized, they test whether the model shows knowledge of an entity accumulated from scattered mentions. Their approach questions the model with prompts built from limited clues about the entity and analyzes the semantic features of its responses, in a setting where only the generated text is visible. In experiments on person entities, the method reaches an AUC of up to 0.97 and improves Balanced Accuracy by 6.0% to 17.5% over the best adapted baseline.

Large Language Models, Privacy Leakage, Membership Inference, Entity-Level Inference, Black-Box Setting, Prompt Engineering, Semantic Analysis, Balanced Accuracy, Area Under Curve, Training Data Exposure
Authors
Yiran Zhu, Ziqi Yang
Abstract
Large Language Models (LLMs) raise growing concerns about privacy leakage and copyright compliance. Membership inference is a key tool for assessing such risks, but existing studies mainly focus on whether specific samples or sample-based data units were used for training. We argue that LLMs exhibit a human-memory-like behavior: an LLM may not memorize a specific sample verbatim, yet it can accumulate and reveal knowledge about a real-world entity from scattered mentions. This analogy motivates us to examine whether an LLM can be interrogated like a human interviewee to reveal its exposure to entity-related information. To answer this question, we propose entity-level membership inference, which determines whether information related to a target entity was used in LLM training. We study this task in the practical label-only black-box setting, where only generated texts are observable. We formalize the task under clue, input, and model constraints, establish the necessary and sufficient conditions for its feasibility, and instantiate five interrogation strategies based on this formalization. The strategies use limited entity clues to construct prompts, elicit entity-related responses, and infer membership from semantic features among the generated texts. We construct entity-level datasets and adapt state-of-the-art sample-level label-only methods to the entity-level setting as baselines. Experiments on person entities show that our methods achieve AUC up to 0.97 and bring gains of 6.0% to 17.5% in Balanced Accuracy over the best adapted baseline.
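
As a rough illustration of the label-only pipeline described in the abstract, the sketch below scores an entity from its generated texts alone and then evaluates the scores with Balanced Accuracy and AUC. It is not the authors' method: the abstract does not specify the five interrogation strategies or the semantic features they use. The token-set Jaccard similarity, the idea that more mutually consistent responses indicate membership, and all function names here are assumptions chosen only to keep the example self-contained.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; a crude stand-in (assumption) for a
    real semantic similarity measure between two generated texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def consistency_score(responses: list[str]) -> float:
    """Mean pairwise similarity among the responses generated for one entity.
    Assumption: higher agreement across responses suggests membership."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def balanced_accuracy(labels: list[bool], scores: list[float], threshold: float) -> float:
    """Average of true-positive rate and true-negative rate at a threshold."""
    preds = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))
    tn = sum((not p) and (not y) for p, y in zip(preds, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    return 0.5 * (tp / pos + tn / neg)


def auc(labels: list[bool], scores: list[float]) -> float:
    """Probability that a random member outscores a random non-member
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In use, one would collect several responses per entity from the target model (the querying step is omitted here because it depends on the model's interface), call `consistency_score` on each entity's responses, and pass the resulting scores with ground-truth membership labels to `auc` and `balanced_accuracy`. Because the score depends only on generated text, the sketch stays within the label-only black-box setting.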