Easier to Mislead Than to Correct: Harmful and Beneficial Revision in LLM Conformity

2026-06-01 • Computation and Language

Computation and Language, Artificial Intelligence

AI summary

The authors studied how language models revise their answers after seeing other models' responses. They found that peer agreement misleads initially correct models more easily than it corrects initially wrong ones, and that authority labels make models more likely to choose the endorsed answer whether or not it is correct. They also found that generic reasoning interventions such as chain-of-thought and reflection do not reliably reduce these harmful revisions. The authors suggest that multi-agent systems should verify peer answers rather than simply follow the majority.

large language models, multi-agent systems, peer agreement, consensus structure, authority labels, answer revision, chain-of-thought, reflection, question answering datasets

Authors

Jiaming Qu, Lucheng Fu, Yibo Hu

Abstract

Large language models are increasingly used in multi-agent systems, where they see and respond to other agents' answers. A key risk is conformity: a model may abandon its own answer simply because others agree on a different one. Prior studies show that LLMs often revise toward a majority answer, but it remains unclear whether these revisions help correct mistakes as often as they introduce new errors. In this paper, we conduct a controlled study in which an LLM first answers a question, then sees simulated peer responses before making a final decision. We manipulate two social cues: consensus structure and authority labels assigned to peers, and measure how they influence beneficial and harmful revisions. Across four open-weight LLMs and seven QA datasets, we find that peer agreement makes it much easier to mislead initially correct models than to correct initially wrong ones. Authority labels make models more likely to choose the endorsed answer, regardless of whether it is correct. More concerningly, generic reasoning interventions such as chain-of-thought and reflection do not reliably reduce harmful revision while preserving beneficial revision. These findings suggest that multi-agent LLM systems should verify peer answers rather than simply aggregate them.
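
The distinction between harmful and beneficial revision can be made concrete with a small sketch. The abstract does not give the exact metric definitions, so the ones below are assumptions: the harmful revision rate is taken as the share of initially correct answers that end up wrong, and the beneficial revision rate as the share of initially wrong answers that end up correct. The `Trial` record and the `revision_rates` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Trial:
    """One question: gold label, answer before peers, answer after peers."""
    gold: str
    initial: str
    final: str


def revision_rates(trials: Iterable[Trial]) -> dict[str, Optional[float]]:
    """Compute assumed harmful/beneficial revision rates.

    Assumption (not confirmed by the abstract):
      harmful    = initially correct -> finally wrong
      beneficial = initially wrong   -> finally correct
    Each rate is conditioned on the initial correctness group.
    """
    trials = list(trials)
    initially_correct = [t for t in trials if t.initial == t.gold]
    initially_wrong = [t for t in trials if t.initial != t.gold]

    harmful = sum(t.final != t.gold for t in initially_correct)
    beneficial = sum(t.final == t.gold for t in initially_wrong)

    return {
        # None signals an empty group, where the rate is undefined.
        "harmful_rate": harmful / len(initially_correct) if initially_correct else None,
        "beneficial_rate": beneficial / len(initially_wrong) if initially_wrong else None,
    }


example_trials = [
    Trial(gold="A", initial="A", final="B"),  # harmful revision
    Trial(gold="A", initial="A", final="A"),  # correct answer kept
    Trial(gold="A", initial="B", final="A"),  # beneficial revision
    Trial(gold="A", initial="B", final="B"),  # wrong answer kept
    Trial(gold="A", initial="B", final="C"),  # revised, but still wrong
    Trial(gold="A", initial="C", final="C"),  # wrong answer kept
]
example_rates = revision_rates(example_trials)
```

Conditioning each rate on the initial correctness group keeps the two numbers comparable even when a model starts out correct far more often than wrong, which is what a claim like "easier to mislead than to correct" requires.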
