When Does Intrinsic Self-Correction Help? A Task-Sensitive Analysis

2026-06-22 • Computation and Language

Computation and Language • Artificial Intelligence

AI summary

The authors studied intrinsic self-correction (SC), in which a language model revisits and revises its own answer without outside help. They found that whether SC helps depends on the kind of task, for example checking explicit rules, revisiting complicated reasoning steps, or choosing between competing strategies. In their experiments, SC produced consistent gains when the task structure let the model benefit from this kind of second look. The authors therefore suggest treating SC as a strategy that suits some tasks, not as a guaranteed fix for every answer.

intrinsic self-correction • large language models • model revision • task structure • reasoning process • inference-time strategy • performance gains • word-game tasks • explicit constraints

Authors

Elroy Stav, Dvir Berlowitz, Maayan Orner, Sarit Kraus

Abstract

Intrinsic self-correction (SC) aims to improve large language model outputs by prompting a model to revisit its own initial answer without external feedback. Recent studies have questioned the reliability of this approach, showing that models often struggle to judge whether their initial responses are correct. In this work, we take a task-sensitive view of SC. Rather than asking whether it works in general, we examine settings where SC may operate through different mechanisms: verifying explicit constraints, revisiting a complex reasoning process, or providing a second opinion over competing strategies in word-game tasks. Across multiple benchmarks and models, we find that SC can yield consistent performance gains when the underlying task structure facilitates these modes of revision. These results suggest that SC is best understood as a task-dependent inference-time strategy whose usefulness depends on the role the revision stage can play in a given task, rather than as a uniformly reliable method for improving initial model outputs.
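To make the setting concrete, the following is a minimal sketch of a generic two-stage intrinsic SC loop together with a simple way to measure its effect. It is not the authors' implementation. The model interface `generate(prompt) -> str`, the review prompt wording, and the helper names `self_correct`, `sc_gain`, and `toy_model` are all hypothetical placeholders. The key property it illustrates is that the revision prompt contains only the question and the model's own answer, with no external feedback. Counting answers "fixed" (wrong to right) and "broken" (right to wrong) separately shows why a net gain is possible only when the revision stage does more good than harm on a given task.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model interface: takes a prompt string, returns the model's reply.
Generate = Callable[[str], str]

# Illustrative review instruction; this is NOT the prompt used in the paper.
REVIEW_PROMPT = (
    "Review your previous answer and find any problems with it. "
    "If you find problems, give an improved final answer; "
    "otherwise restate your answer."
)


@dataclass
class SCResult:
    initial: str
    revised: str


def self_correct(generate: Generate, question: str) -> SCResult:
    """Answer once, then ask the same model to revisit its answer (no external feedback)."""
    initial = generate(question)
    # Only the question and the model's own answer are shown: no gold label,
    # verifier output, or tool result enters the revision stage.
    revision_prompt = f"{question}\n\nYour answer: {initial}\n\n{REVIEW_PROMPT}"
    revised = generate(revision_prompt)
    return SCResult(initial=initial, revised=revised)


def sc_gain(results: List[SCResult], gold: List[str]) -> Dict[str, float]:
    """Compare exact-match accuracy before and after revision and count answer flips."""
    n = len(gold)
    if n == 0 or len(results) != n:
        raise ValueError("results and gold must be non-empty and the same length")
    init_correct = rev_correct = fixed = broken = 0
    for r, g in zip(results, gold):
        a = r.initial.strip() == g
        b = r.revised.strip() == g
        init_correct += a
        rev_correct += b
        fixed += (not a) and b   # wrong -> right
        broken += a and (not b)  # right -> wrong
    return {
        "initial_acc": init_correct / n,
        "revised_acc": rev_correct / n,
        "gain": (rev_correct - init_correct) / n,
        "fixed": fixed,
        "broken": broken,
    }


def toy_model(prompt: str) -> str:
    # Stand-in for a real LLM call: answers "5" first, then "4" when asked to review.
    return "4" if "Review your previous answer" in prompt else "5"


if __name__ == "__main__":
    res = self_correct(toy_model, "What is 2 + 2?")
    print(res)  # SCResult(initial='5', revised='4')
    print(sc_gain([res], ["4"]))
```

In a real evaluation, `toy_model` would be replaced by an actual model call and exact-match scoring by a task-appropriate checker. Under the paper's framing, the interesting quantity is how `fixed` and `broken` trade off across task types, not whether the gain is positive on a single benchmark.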
