Unveiling the Entropy Dynamics of Chain-of-Thought Reasoning

2026-06-01 · Computation and Language

Computation and Language · Machine Learning
AI summary

The authors studied how language models reason step by step (Chain-of-Thought) and found two phases: first, an uncertainty phase in which the model explores ideas, then a confidence phase in which it settles on an answer. Once the model is confident, its answers are very reliable, yet it keeps generating unnecessary words. Building on this, the authors created methods that stop the reasoning early or give more weight to answers from reasoning paths that have become confident. They used CUSUM, a classical change-point detection method, to detect when the model becomes confident, improving accuracy and efficiency without extra training.

Chain-of-Thought, Entropy dynamics, Confidence Region, Uncertainty Region, Change-point detection, CUSUM algorithm, Early exit, Test-time scaling, Self-consistency, Inference efficiency
Authors
Ting Xu, Xu He, Yupu Lu, Jiankai Sun, Dong Li, Wai Lam, Jianye Hao
Abstract
This paper investigates the entropy dynamics of Chain-of-Thought (CoT) reasoning and uncovers a consistent two-phase structure: an Uncertainty Region of exploration that transitions sharply into a Confidence Region of convergence. We demonstrate that the Confidence Region has two critical properties: 1) High Reliability (answers in the Confidence Region become highly accurate and stable) and 2) High Redundancy (models keep generating unnecessary tokens long after reaching the correct answer). These properties enable more efficient and reliable inference strategies: 1) Early Exit exploits reliability and redundancy to terminate computation safely once returns diminish, and 2) Test-Time Scaling uses the Confidence Region signal to prioritize converged trajectories. To operationalize these insights, we formulate Confidence Region detection as a sequential change-point detection problem and are the first to apply classical change-point methods to monitoring CoT reasoning. Using the Cumulative Sum (CUSUM) algorithm, a statistically optimal change-point detector, we develop a training-free framework for real-time inference control. Experiments show that our approach establishes a superior Pareto frontier for early exit: CUSUM achieves 63.06% accuracy with an 11.1% token reduction, outperforming DEER and Dynasor by 3.28% and 4.36% in accuracy, respectively. For test-time scaling, CUSUM-weighted voting consistently outperforms self-consistency.
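The abstract does not specify which entropy signal is monitored, how the CUSUM parameters are chosen, or how votes are weighted, so the following is only a minimal sketch under stated assumptions: per-token Shannon entropy is the monitored signal; a one-sided (lower) CUSUM with a known pre-change mean `target_mean`, slack `drift`, and alarm level `threshold` flags the drop into the Confidence Region via S_t = max(0, S_{t-1} + target_mean - H_t - drift); and voting uses a simple binary weight. All names (`token_entropy`, `EntropyCusum`, `detect_change`, `cusum_weighted_vote`, `unconverged_weight`) are hypothetical illustrations, not the authors' implementation, and the entropy traces are made-up toy numbers.

```python
import math
from collections import defaultdict


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


class EntropyCusum:
    """Streaming one-sided (lower) CUSUM that flags a drop in per-token entropy.

    Recursion: S_t = max(0, S_{t-1} + (target_mean - H_t - drift));
    an alarm is raised the first time S_t > threshold.
    """

    def __init__(self, target_mean, drift, threshold):
        self.target_mean = target_mean  # assumed mean entropy in the Uncertainty Region
        self.drift = drift              # slack k: drops smaller than this are ignored
        self.threshold = threshold      # alarm level h
        self.stat = 0.0                 # current CUSUM statistic S_t
        self.t = -1                     # index of the last token seen
        self.change_at = None           # token index of the first alarm, if any

    def update(self, entropy):
        """Feed one token's entropy; return True once a change has been flagged."""
        self.t += 1
        if self.change_at is None:
            self.stat = max(0.0, self.stat + (self.target_mean - entropy - self.drift))
            if self.stat > self.threshold:
                self.change_at = self.t
        return self.change_at is not None


def detect_change(entropies, target_mean, drift, threshold):
    """Scan an entropy stream and stop at the first alarm (the early-exit point)."""
    detector = EntropyCusum(target_mean, drift, threshold)
    for h in entropies:
        if detector.update(h):
            break  # in a live decoder, generation would be halted here
    return detector.change_at


def cusum_weighted_vote(trajectories, target_mean, drift, threshold,
                        unconverged_weight=0.5):
    """Weighted majority vote over (answer, entropies) pairs.

    Hypothetical weighting: a trajectory whose entropy stream triggers the
    CUSUM alarm counts 1.0; any other trajectory counts `unconverged_weight`.
    Setting unconverged_weight=1.0 recovers plain self-consistency voting.
    """
    scores = defaultdict(float)
    for answer, entropies in trajectories:
        converged = detect_change(entropies, target_mean, drift, threshold) is not None
        scores[answer] += 1.0 if converged else unconverged_weight
    return max(scores, key=scores.get)


# Toy entropy traces (made-up numbers, not data from the paper).
CONFIDENT = [2.0, 1.8, 2.2, 1.9, 0.3, 0.2, 0.1, 0.2]  # sharp drop at index 4
EXPLORING = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0]            # stays high throughout
TRAJS = [("42", CONFIDENT), ("42", CONFIDENT),
         ("17", EXPLORING), ("17", EXPLORING), ("17", EXPLORING)]

if __name__ == "__main__":
    print(detect_change(CONFIDENT, 2.0, 0.5, 2.0))  # prints 5
    print(detect_change(EXPLORING, 2.0, 0.5, 2.0))  # prints None
    print(cusum_weighted_vote(TRAJS, 2.0, 0.5, 2.0))  # prints 42
    print(cusum_weighted_vote(TRAJS, 2.0, 0.5, 2.0, unconverged_weight=1.0))  # prints 17
```

The alarm index doubles as an early-exit point: on the confident toy trace the detector fires at index 5, so the remaining tokens would not be generated. In the voting example, down-weighting the three unconverged trajectories changes the winner from "17" (plain majority) to "42". In practice `target_mean` would have to be estimated, for example from an initial warm-up window, which this sketch leaves out.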