AI summary
The authors studied how large language models (LLMs) think through problems at test time, noticing that the same starting point can lead to different answers. They explored having a model (the student) ask questions to another (the teacher) during reasoning to better understand what the student 'knows' internally. Their findings suggest that the student's own questioning reveals useful signals about whether it is on the right track, even before getting the teacher's answer. However, while this method can detect when the model is likely correct or uncertain, using the questions to fix mistakes doesn't always work and can sometimes make things worse, showing limits in the model’s ability to improve itself during reasoning. The authors highlight that recognizing a problem doesn't always mean the model can fix it effectively.
large language models, chain-of-thought reasoning, test-time reasoning, inference-time intervention, student-teacher setting, hidden state, self-diagnosis, question-asking, self-consistency, model uncertainty
Authors
Chu Fei Luo, Samuel Dahan, Xiaodan Zhu
Abstract
Test-time reasoning has become a significant field of study since the introduction of chain-of-thought reasoning in large language models (LLMs). However, the mechanisms of this reasoning process are still under-explored: from the same input prompt, and even the same partial solution, LLMs can produce varied answers when sampled multiple times. We propose to leverage question-asking as an inference-time intervention that articulates information about the model's hidden state. To that end, we present a student-teacher setting in which a student poses questions to a teacher. We train a probe on the student's hidden state before and after asking a question and find that it is predictive of the trajectory's final correctness, even before the teacher's answer is generated. This suggests that the meaningful signal comes from the self-diagnosis that occurs during question generation rather than from information transfer from the teacher. We then frame question-asking as a sequential decision problem, using this probe as a quality score, and define a gating policy to ask questions that maximize the likelihood of correctness. We find that the success of question-asking as an intervention depends largely on the model's self-consistency. Our empirical results show a gap between detection and recovery: while our gating policy captures model correctness and uncertainty, interventions are as likely to harm correct trajectories as they are to recover incorrect ones. This gap between diagnosis and correction has broader implications for language models' capacity for self-refinement under uncertainty.
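The probe-then-gate idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not state the probe architecture, the hidden-state features, or the gating rule. The sketch uses a logistic-regression probe, synthetic 4-dimensional vectors standing in for student hidden states, and a simple score-difference gate. The names `train_probe`, `probe_score`, `should_ask`, and `margin` are hypothetical and not taken from the paper.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(states, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with batch gradient descent.

    Assumption: a linear probe; the paper's actual probe may differ.
    """
    dim = len(states[0])
    w, b, n = [0.0] * dim, 0.0, len(states)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(states, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_score(probe, state):
    """Estimated probability that the trajectory ends in a correct answer."""
    w, b = probe
    return sigmoid(sum(wi * xi for wi, xi in zip(w, state)) + b)


def should_ask(probe, state_before, state_after, margin=0.0):
    """Hypothetical gate: keep a question only if the probe score
    after generating it exceeds the score before it by more than `margin`."""
    gain = probe_score(probe, state_after) - probe_score(probe, state_before)
    return gain > margin


# Synthetic stand-in for hidden states (NOT real model activations):
# correct trajectories are shifted to +2 on the first coordinate, incorrect to -2.
random.seed(0)


def make_state(correct):
    first = random.gauss(2.0 if correct else -2.0, 1.0)
    return [first] + [random.gauss(0.0, 1.0) for _ in range(3)]


labels = [i % 2 for i in range(200)]
states = [make_state(y == 1) for y in labels]

probe = train_probe(states, labels)
accuracy = sum(
    (probe_score(probe, x) > 0.5) == (y == 1) for x, y in zip(states, labels)
) / len(labels)
print(f"training accuracy of the toy probe: {accuracy:.2f}")
```

In this toy setup the gate accepts a question whenever the post-question state looks more like a correct trajectory than the pre-question state. Note that a gate of this kind only scores trajectories; consistent with the gap between detection and recovery reported above, a good score does not guarantee that intervening will repair an incorrect trajectory.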