The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing
2026-06-01 • Digital Libraries
Digital Libraries, Machine Learning
AI summary
The authors show that AI language models create fake expert names that often appear together in predictable groups, and these groupings differ by model type and version. They found many false academic records online with these nonexistent authors, some deliberately backdated and published in bulk with real identifiers. These fake records can be tracked to specific AI models and their release times. The study reveals how AI-generated fiction can unintentionally create misleading scholarly data.
large language models, AI-generated text, fictitious authors, data provenance, DOI, DataCite, academic publishing, ghost authorship, model fingerprinting, Zenodo repository
Authors
Michał Brzozowski, Neo Christopher Chung
Abstract
These names do not exist. Elena Vasquez and Marcus Chen have appeared as volcano experts, astronauts, thriller protagonists, podcast hosts, and academic co-authors across hundreds of independently produced AI-generated documents, yet neither has ever lived. We show that large language models do not merely default to high-probability individual names when generating fictional experts: they produce correlated character ensembles, pairs and trios whose co-occurrence rates far exceed chance and are consistent across independent generations. These priors are model-family-specific (Claude: Elena Vasquez + Marcus Chen + Amara Okafor; Gemini: Aris Thorne + Lena Petrova; GPT: Elara Voss with no fixed partner), version-specific, and actively suppressed at model release boundaries, leaving dateable behavioral fingerprints in the content they produced. We document a downstream consequence at scale. On Zenodo, a CERN-operated repository that mints real DataCite DOIs, we identify 1,655 ghost-authored records that claim nonexistent journals and carry fabricated publication dates. Server-side DataCite timestamps prove deliberate backdating, and 991 of the records were registered in a single month. Because these records carry real DOIs registered in DataCite, they are harvestable by any scholarly aggregator that ingests DOI metadata. Ghost names additionally appear on ResearchGate, forming synthetic research groups with collaborators drawn from multiple model families; publication dates on these records provide a reliable temporal proxy for model deployment windows.
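To make "co-occurrence rates far exceed chance" concrete, the sketch below computes a simple lift score for each name pair: the observed share of documents containing both names, divided by the share expected if the two names appeared independently. This is an illustrative measure and not necessarily the statistic the authors use. The function name `pair_lift` and the four-document toy corpus are assumptions made for the example; the names come from the abstract, but the counts are invented.

```python
from collections import Counter
from itertools import combinations


def pair_lift(docs):
    """Return {(name_a, name_b): lift} for every pair seen together.

    docs is a list of sets, each holding the expert names found in one
    generated document. A lift of 1.0 means the pair co-occurs as often
    as independence would predict; values above 1.0 mean more often.
    """
    n = len(docs)
    name_counts = Counter()
    pair_counts = Counter()
    for names in docs:
        unique = sorted(set(names))
        name_counts.update(unique)
        # combinations over a sorted list yields each pair once, as (a, b) with a < b
        pair_counts.update(combinations(unique, 2))

    lifts = {}
    for (a, b), together in pair_counts.items():
        expected = (name_counts[a] / n) * (name_counts[b] / n)
        lifts[(a, b)] = (together / n) / expected
    return lifts


# Hypothetical toy corpus: counts are invented for illustration only.
docs = [
    {"Elena Vasquez", "Marcus Chen"},
    {"Elena Vasquez", "Marcus Chen"},
    {"Aris Thorne"},
    {"Elara Voss"},
]
lifts = pair_lift(docs)
print(lifts[("Elena Vasquez", "Marcus Chen")])
```

In this toy corpus each of the two names appears in half of the documents, and they always appear together, so the pair is observed twice as often as independence would predict. On real data a significance test over many independent generations would be needed before calling a pair a correlated ensemble; lift alone is unstable for rare names.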