Causal Tongue-Tie: LLMs Can Encode Causal Direction, But Their Yes/No Outputs Fail to Express

2026-05-25 • Computation and Language

Computation and Language • Artificial Intelligence

AI summary

The authors found that large language models sometimes 'know' the right answer to a causal question internally, yet state a different answer that follows common sense. This happens on questions whose correct answer runs against common sense. They show it by comparing two measurements. A linear probe, a simple tool that reads the answer directly from the model's hidden state, is highly accurate. The yes/no answers the model actually gives are much less accurate. They call this difference 'Causal Tongue-Tie', meaning the model cannot always say what it internally encodes. This suggests that judging whether a model truly understands cause and effect from its yes/no answers alone may be misleading.
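In its simplest form, a linear probe is a logistic-regression classifier trained on frozen hidden-state vectors and then kept fixed. The sketch below illustrates only that mechanic, under loud assumptions. It uses synthetic vectors in place of real LLM hidden states, and every name in it (`make_synthetic_states`, `train_linear_probe`, `probe_predict`, the dimension, the signal strength) is hypothetical. It is not the authors' probe and does not reproduce their reported accuracy.

```python
import math
import random


def make_synthetic_states(n, dim, signal, seed=0):
    """Hypothetical stand-in for LLM hidden states (NOT real model activations).

    Each vector is Gaussian noise plus a label-dependent shift of size `signal`
    along axis 0, so the label is linearly decodable by construction.
    """
    rng = random.Random(seed)
    states, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)  # 1 = evidence-supported answer is "Yes", 0 = "No"
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += signal if y == 1 else -signal
        states.append(x)
        labels.append(y)
    return states, labels


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_linear_probe(states, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression probe (weights w, bias b) by full-batch gradient descent."""
    dim, n = len(states[0]), len(states)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(states, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_predict(w, b, x):
    """Linear decision rule: predict 1 ("Yes") when w.x + b >= 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0 else 0


# Train on one split, then keep the probe fixed while scoring held-out items.
states, labels = make_synthetic_states(n=400, dim=8, signal=3.0, seed=0)
w, b = train_linear_probe(states[:300], labels[:300])
preds = [probe_predict(w, b, x) for x in states[300:]]
probe_acc = sum(p == y for p, y in zip(preds, labels[300:])) / len(preds)
print(f"held-out probe accuracy: {probe_acc:.2f}")
```

The synthetic classes are well separated by construction, so the held-out accuracy should land near 1. That number shows the mechanics of probing, not anything about a real model. With a real LLM, the vectors would instead be hidden states taken at some chosen layer and token position, which this sketch leaves open.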

large language models, causal reasoning, linear probe, hidden state, commonsense, causal question, model output, accuracy, benchmark, Causal Tongue-Tie

Authors

Ziyi Ding, Xiao-Ping Zhang

Abstract

We find a mismatch between what large language models encode about a causal question and what they answer. On anti-commonsense CLadder items, a fixed linear probe recovers the evidence-supported answer from the model's hidden state (accuracy approximately 0.97), while the spoken Yes/No reverts to the commonsense answer (accuracy approximately 0.5). We call this gap of approximately +0.5 Causal Tongue-Tie. Under this view, a wrong Yes/No decomposes into two separable failure modes: no internal signal, or a signal that the verbal interface cannot express. The implication cuts both ways for output-only causal benchmarks: a benchmark "correct" need not mean the model has understood, and a benchmark "wrong" need not mean it cannot. Sweeping claims about whether LLMs can do causal reasoning, drawn from a single accuracy number, deserve a second look.
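The bookkeeping described above can be sketched as follows. Probe accuracy and spoken accuracy are computed on the same items, their difference is the gap, and each wrong spoken answer is sorted by whether the probe recovered the evidence-supported answer. This is a minimal sketch under assumptions. The `ItemResult` record, the category names, and the ten toy items are hypothetical and invented for illustration, not the paper's data or code. The fourth category, a correct spoken answer without probe support, is one illustrative reading of the remark that a benchmark "correct" need not mean the model has understood.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemResult:
    """One benchmark item (hypothetical record layout)."""
    gold: str    # evidence-supported answer: "yes" or "no"
    probe: str   # answer decoded from the hidden state by the fixed linear probe
    spoken: str  # the model's verbal Yes/No answer


def categorize(item: ItemResult) -> str:
    """Sort an item by whether the probe and the spoken answer match the gold answer."""
    probe_ok = item.probe == item.gold
    spoken_ok = item.spoken == item.gold
    if spoken_ok:
        return "both_correct" if probe_ok else "correct_without_probe_signal"
    # A wrong spoken answer falls into one of the two failure modes.
    return "tongue_tie" if probe_ok else "no_internal_signal"


def summarize(items):
    """Return probe accuracy, spoken accuracy, their gap, and per-category counts."""
    n = len(items)
    probe_acc = sum(i.probe == i.gold for i in items) / n
    spoken_acc = sum(i.spoken == i.gold for i in items) / n
    counts = Counter(categorize(i) for i in items)
    return probe_acc, spoken_acc, probe_acc - spoken_acc, counts


# Ten made-up items (gold, probe, spoken); NOT data from the paper.
toy = [ItemResult(*t) for t in [
    ("yes", "yes", "yes"), ("no", "no", "yes"), ("yes", "yes", "no"),
    ("no", "no", "no"), ("yes", "no", "no"), ("no", "no", "yes"),
    ("yes", "yes", "no"), ("no", "no", "no"), ("yes", "no", "yes"),
    ("no", "no", "yes"),
]]
probe_acc, spoken_acc, gap, counts = summarize(toy)
print(f"probe {probe_acc:.2f}, spoken {spoken_acc:.2f}, gap {gap:+.2f}")
# prints "probe 0.80, spoken 0.40, gap +0.40"
print(counts["tongue_tie"], counts["no_internal_signal"])  # prints "5 1"
```

Separating the two failure modes needs both measurements. Spoken accuracy alone would report the same 0.40 whether the wrong answers came from a missing internal signal or from a tongue-tie.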
