Think Less, Know More: State-Aware Reasoning Compression with Knowledge Guidance for Efficient Reasoning

2026-04-10 • Computation and Language


AI summary

The authors address the problem of large reasoning models taking many unnecessary steps when solving complex tasks, which increases inference latency. They propose STACK, a method that shortens the reasoning process step by step: it uses retrieved knowledge to guide compression when the reasoning is uncertain or biased, and self-prompted compression when the reasoning is confident but overly long. STACK also stops early once the answer has converged and further verification would be redundant. Their experiments on three mathematical reasoning benchmarks show that STACK produces shorter responses and higher accuracy than previous methods.

Large Reasoning Models, Chain-of-Thought, Reasoning Compression, Knowledge Guidance, Proximal Policy Optimization, Direct Preference Optimization, Early Stopping, Retrieval-Augmented Models, Mathematical Reasoning Benchmarks, Inference Latency

Authors

Yi Sui, Chaozhuo Li, Dawei Song

Abstract

Large Reasoning Models (LRMs) achieve strong performance on complex tasks by leveraging long Chain-of-Thought (CoT) reasoning, but they often suffer from overthinking, which leads to excessive reasoning steps and high inference latency. Existing CoT compression methods struggle to balance accuracy and efficiency, and they lack fine-grained, step-level adaptation to redundancy and reasoning bias. We therefore propose State-Aware Reasoning Compression with Knowledge Guidance (STACK), a framework that performs step-wise CoT compression by explicitly modeling stage-specific redundancy sources and integrating retrieval-augmented guidance. STACK constructs online long-short contrastive samples and dynamically switches between knowledge-guided compression for uncertain or biased reasoning states and self-prompted compression for overly long but confident states, complemented by an answer-convergence-based early stopping mechanism that suppresses redundant verification. We further propose a reward-difference-driven training strategy that combines Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), enabling models to learn state-conditioned compression strategies. Experiments on three mathematical reasoning benchmarks show that STACK achieves a superior accuracy-efficiency balance, reducing average response length by 59.9% while improving accuracy by 4.8 points over existing methods.
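The state-dependent switching and the answer-convergence early stop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how a reasoning state is scored or when answers count as converged, so the confidence signal, the thresholds, and the names `choose_mode` and `converged_answer` are all hypothetical.

```python
from typing import Optional, Sequence


def choose_mode(
    confidence: float,
    num_tokens: int,
    conf_threshold: float = 0.7,   # assumed value, not from the paper
    length_budget: int = 512,      # assumed value, not from the paper
) -> str:
    """Pick a compression mode for the current reasoning state.

    Assumption: the state is summarized by a scalar confidence in [0, 1]
    and by the number of tokens generated so far.
    """
    if confidence < conf_threshold:
        # Uncertain or biased state: compress with retrieved knowledge.
        return "knowledge_guided"
    if num_tokens > length_budget:
        # Confident but overly long state: let the model compress itself.
        return "self_prompted"
    # Confident and still within budget: keep reasoning unchanged.
    return "continue"


def converged_answer(
    intermediate_answers: Sequence[Optional[str]],
    patience: int = 3,             # assumed value, not from the paper
) -> Optional[str]:
    """Return the answer if the last `patience` steps all agree on it.

    Each entry is the answer extracted after one reasoning step, or None
    when that step produced no answer. A non-None result signals that
    further verification steps can be skipped.
    """
    if len(intermediate_answers) < patience:
        return None
    tail = intermediate_answers[-patience:]
    first = tail[0]
    if first is not None and all(answer == first for answer in tail):
        return first
    return None
```

In a decoding loop, one would call `choose_mode` after each reasoning step to select the compression prompt and call `converged_answer` on the running list of extracted answers, stopping generation as soon as it returns a value.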
