Argument Collapse: LLMs Flatten Long-Form Public Debate
2026-06-01 • Computation and Language
Computation and Language • Artificial Intelligence
AI summary
The authors study how essays written by large language models (LLMs) tend to repeat the same main points and argument structures, whereas human essays show more varied ideas. LLM-generated arguments are far less often unique and lean on generalized, hedged points, while humans give more concrete, topic-specific ones. Even when asked to produce diverse answers, a typical LLM recovers only about half of the distinct main arguments found in human debates. The pattern holds for longer essays as well as short ones, pointing to a narrowing of argument styles in LLM writing compared with human writing.
Large Language Models (LLMs) • Argument Collapse • Public Debate • Main Arguments • Sub-arguments • Essay Structure • Diversity in Text Generation • Polished Arguments • Human vs Machine Writing • New York Times Debates
Authors
Yekyung Kim, Yapei Chang, Chau Minh Pham, Mohit Iyyer
Abstract
As LLMs are increasingly used to draft public-facing arguments, they may flatten public debate by repeatedly introducing the same polished, plausible arguments. We study argument collapse, the tendency of essays generated by different LLMs to converge on a smaller set of main arguments, sub-arguments, and paragraph-level structures. We compare 1,039 human responses from 195 New York Times (NYT) debates, 448 human responses from 61 longer-form Boston Review (BR) forums, and 23,384 LLM-generated essays. In the NYT corpus, 65.3% of human main arguments are unique within a debate, compared to 3.4% of LLM main arguments. Asking LLMs to generate diverse answers adds variation, but a typical model recovers only about half of the distinct human main arguments, with much of the added variation falling outside the observed human argument space. Collapse also appears at the sub-argument level: among essays with the same main argument, 41.0% of human sub-arguments are unique, versus 9.1% of LLM sub-arguments. Qualitatively, LLMs often reuse generalized and hedged sub-arguments, while humans prefer more concrete and topic-specific ones. Structurally, LLM-generated essays tend to follow a more fixed arc, often opening with a direct claim and moving quickly toward proposals. The same patterns hold in the longer BR essays, suggesting that argument collapse extends beyond short-form responses.
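The abstract reports three kinds of quantities: the share of arguments that are unique within a debate, how many distinct human main arguments a model recovers, and how much model variation lies outside the human argument space. The sketch below shows one plausible way to compute such rates. It is not the authors' code. It assumes that every argument has already been mapped to a canonical label (the abstract does not say how arguments are matched), and the function names `uniqueness_rate`, `coverage`, and `outside_share` are hypothetical. "Unique" is read here as "the label occurs exactly once within its group", counted over argument instances.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable, Iterable


def uniqueness_rate(records: Iterable[tuple[Hashable, str]]) -> float:
    """Fraction of argument instances whose label occurs exactly once in its group.

    records: (group_id, argument_label) pairs. For main arguments the group is
    assumed to be the debate; for sub-arguments it could be a
    (debate, main_argument) tuple.
    """
    counts: dict[Hashable, Counter] = defaultdict(Counter)
    for group, label in records:
        counts[group][label] += 1
    total = sum(sum(c.values()) for c in counts.values())
    # Each label with count 1 corresponds to exactly one unique instance.
    unique = sum(1 for c in counts.values() for n in c.values() if n == 1)
    return unique / total if total else 0.0


def coverage(human: dict[str, set[str]], model: dict[str, set[str]]) -> float:
    """Share of distinct human labels (per debate) that the model also produces."""
    hits = sum(len(labels & model.get(d, set())) for d, labels in human.items())
    total = sum(len(labels) for labels in human.values())
    return hits / total if total else 0.0


def outside_share(human: dict[str, set[str]], model: dict[str, set[str]]) -> float:
    """Share of distinct model labels (per debate) never produced by humans."""
    extra = sum(len(labels - human.get(d, set())) for d, labels in model.items())
    total = sum(len(labels) for labels in model.values())
    return extra / total if total else 0.0


if __name__ == "__main__":
    # Toy labels, purely illustrative (not data from the paper).
    records = [("d1", "cost"), ("d1", "cost"), ("d1", "equity"), ("d2", "safety")]
    human = {"d1": {"cost", "equity"}, "d2": {"safety"}}
    model = {"d1": {"cost", "jobs"}}
    print(uniqueness_rate(records))        # 0.5
    print(round(coverage(human, model), 3))  # 0.333
    print(outside_share(human, model))     # 0.5
```

Passing a (debate, main argument) tuple as the group id gives the conditional sub-argument rate in the same function, which is why the group key is typed as `Hashable` rather than `str`.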