REFLECT: Intervention-Supported Error Attribution for Silent Failures in LLM Agent Traces

2026-06-08 | Artificial Intelligence

AI summary

The authors studied how to find mistakes in long step-by-step traces produced by large language model agents, especially when the trace looks fine but the result is actually wrong (a silent failure). They developed a method called REFLECT that tests a suspected mistake by replaying the steps with a targeted fix and checking whether the outcome changes. That result is then used to better pinpoint where the real error is. The approach gave the most accurate error localization on tests involving multi-step reasoning and tool use, and it still works when the right answer isn't known.

large language models, error localization, silent failure, multi-hop reasoning, trace replay, diagnosis-specific patch, contrastive evidence, tool-use traces
Authors
Xiaofeng Lin, Yingxu Wang, Tung Sum Thomas Kwok, Daniel Guo, Sahil Arun Nale, Charles Fleming, Guang Cheng
Abstract
Large language model (LLM) agents now solve complex tasks through long plan-and-execution traces, yet the ability to locate errors in completed traces still lags far behind, especially in the silent failure regime. Existing approaches predict suspect steps via classifiers or LLM judges, or recover correct answers via retry, but none feeds the intervention outcome back to refine the attribution itself. We propose REFLECT, a method that closes this gap by diagnosing a candidate error step, testing it through controlled replay with a diagnosis-specific patch, and using the verified outcome flip as contrastive evidence to refine the final attribution. On four localization benchmarks spanning multi-hop reasoning across domains, REFLECT achieves the highest localization accuracy among same-auditor methods on every benchmark, with the largest gains on structured tool-use traces, while providing actionable localization even when ground-truth answers are unavailable.
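
The diagnose, replay, and refine loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the trace format, the `Step`, `replay`, and `attribute` names, the hard-coded candidate list standing in for an auditor's diagnosis, and the definition of an outcome flip as "the final answer changes" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, float]


@dataclass
class Step:
    name: str                      # variable this step writes into the state
    run: Callable[[State], float]  # computes the value from earlier steps


def replay(steps: List[Step], patch_at: Optional[int] = None,
           patched_value: Optional[float] = None) -> State:
    """Re-execute the trace, optionally overriding one step's output."""
    state: State = {}
    for i, step in enumerate(steps):
        if i == patch_at:
            state[step.name] = patched_value  # diagnosis-specific patch
        else:
            state[step.name] = step.run(state)
    return state


def attribute(steps: List[Step], candidates: List[Tuple[int, float]],
              answer_key: str) -> Tuple[int, bool]:
    """Return (step index, flipped).

    candidates holds (step index, patched value) pairs in the order a
    hypothetical auditor suspects them. The first candidate whose patch
    changes the final answer wins; otherwise fall back to the top suspect.
    """
    baseline = replay(steps)[answer_key]
    for idx, value in candidates:
        if replay(steps, idx, value)[answer_key] != baseline:
            return idx, True
    return candidates[0][0], False


# Toy trace: the tax rate is silently wrong (0.5 instead of 0.05), and the
# discount is computed but never used in the total.
trace = [
    Step("price", lambda s: 20.0),
    Step("qty", lambda s: 3.0),
    Step("discount", lambda s: 0.0),
    Step("tax_rate", lambda s: 0.5),
    Step("total", lambda s: s["price"] * s["qty"] * (1 + s["tax_rate"])),
]

# Assumed auditor output: it suspects the discount step first, then the tax rate.
suspects = [(2, 0.1), (3, 0.05)]
result = attribute(trace, suspects, "total")
print(result)  # (3, True)
```

In this toy, patching the auditor's first suspect leaves the total unchanged, so the attribution moves to the tax-rate step, whose patch does change the outcome. Note that the check compares the patched answer with the original answer only, so it does not require a ground-truth answer.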