CausalFlow: Causal Attribution and Counterfactual Repair for LLM Agent Failures

2026-05-25 | Machine Learning

Machine Learning, Artificial Intelligence
AI summary

The authors present CausalFlow, a method for diagnosing and repairing failures of large language model agents on multi-step tasks by analyzing their step-by-step execution traces. It identifies which step caused a failure and proposes a minimal edit to that step that makes the task succeed, producing pairs of wrong and corrected steps. These pairs can be used both to repair failures at test time and as supervision during training. The authors evaluate CausalFlow on four benchmarks covering mathematical reasoning, code generation, question answering, and medical browsing, and report that it outperforms heuristic refinement in complex retrieval settings while producing more localized repairs.

Large Language Model, Multi-step Reasoning, Counterfactual Intervention, Execution Trace, Causal Responsibility, Minimal Repair, Agent Failure, Reward Modeling, Offline Preference Optimization, Test-time Repair
Authors
Akash Bonagiri, Devang Borkar, Gerard Janno Anderias, Setareh Rafatirad, Houman Homayoun
Abstract
Large language model (LLM) agents frequently fail on multi-step tasks involving reasoning, tool use, and environment interaction. While such failures are typically logged or retried heuristically, they contain structured signals about where execution broke down. We introduce CausalFlow, an interventional framework that converts failed agent traces into minimal counterfactual repairs and reusable supervision. CausalFlow models execution traces as sequential chains of dependent steps and computes Causal Responsibility Scores (CRS) via step-level counterfactual intervention to identify failure-inducing steps. For these steps, we generate minimally edited repairs that flip the final outcome to success, producing validated contrastive pairs of the form (wrong step, corrected step). CausalFlow supports two complementary uses: targeted test-time repair that recovers from failures with minimal behavioral drift, and training-time supervision suitable for offline preference optimization or reward modeling. Across four benchmarks spanning mathematical reasoning, code generation, question answering, and medical browsing, CausalFlow converts failed executions into validated minimal repairs with high minimality and causal-consensus scores, and demonstrates that causal attribution is necessary for reliable improvement across diverse agent tasks, outperforming heuristic refinement in complex retrieval settings while producing more localized repairs throughout. These results demonstrate that interventional analysis over structured execution traces provides a principled and scalable mechanism for transforming agent failures into reliability gains and learning-ready supervision.
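To make the attribution-then-repair idea concrete, the following is a minimal sketch on a toy arithmetic trace. It is not the authors' implementation. The abstract does not give the CRS formula, so the sketch assumes a step's score is the fraction of candidate replacements for that step that flip the outcome to success once the downstream steps are re-run. The step format, the function names (`run`, `intervene`, `responsibility_scores`, `minimal_repair`), the candidate sets, and the use of `difflib` string similarity as a stand-in for repair minimality are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Tuple

Step = str  # toy step format (assumption): an operator plus an integer, e.g. "+3"


def run(trace: List[Step], start: int = 0) -> int:
    """Execute the toy trace; each step transforms the running value."""
    value = start
    for step in trace:
        op, arg = step[0], int(step[1:])
        if op == "+":
            value += arg
        elif op == "-":
            value -= arg
        elif op == "*":
            value *= arg
        else:
            raise ValueError(f"unknown step: {step!r}")
    return value


def intervene(trace: List[Step], i: int, alt: Step) -> List[Step]:
    """Replace step i with alt; downstream steps are kept and re-executed."""
    return trace[:i] + [alt] + trace[i + 1:]


def responsibility_scores(
    trace: List[Step],
    candidates: Dict[int, List[Step]],
    succeeds: Callable[[List[Step]], bool],
) -> List[float]:
    """Assumed score: share of interventions at each step that yield success."""
    scores = []
    for i in range(len(trace)):
        alts = candidates.get(i, [])
        flips = sum(1 for alt in alts if succeeds(intervene(trace, i, alt)))
        scores.append(flips / len(alts) if alts else 0.0)
    return scores


def minimal_repair(
    trace: List[Step],
    i: int,
    candidates: Dict[int, List[Step]],
    succeeds: Callable[[List[Step]], bool],
) -> Optional[Tuple[Step, Step]]:
    """Return (wrong step, corrected step) using the most similar flipping edit."""
    flipping = [a for a in candidates.get(i, []) if succeeds(intervene(trace, i, a))]
    if not flipping:
        return None
    best = max(flipping, key=lambda a: SequenceMatcher(None, trace[i], a).ratio())
    return trace[i], best


# Failed trace: (0 + 2) * 3 + 1 = 7, but the task requires 9.
trace = ["+2", "*3", "+1"]
candidates = {0: ["+1", "+5"], 1: ["*4", "+6"], 2: ["+3", "-1"]}


def succeeds(t: List[Step]) -> bool:
    return run(t) == 9


scores = responsibility_scores(trace, candidates, succeeds)
culprit = max(range(len(trace)), key=scores.__getitem__)
pair = minimal_repair(trace, culprit, candidates, succeeds)
```

In this toy setting the highest-scoring step is the one selected for repair, and the returned tuple has the (wrong step, corrected step) shape described in the abstract. In a real agent the candidate replacements would come from a model and success from a task-specific validator, neither of which the sketch attempts to model.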