GRADE: Graph Representation of LLM Agent Dependency and Execution

2026-06-22 · Machine Learning

AI summary

The authors examine how to represent the actions of large language model (LLM) agents as graphs. They propose GRADE, a method that builds a graph over a run's steps with two types of links: execution edges, which record the order in which steps ran, and dependency edges, which record what each step relied on. Because dependencies are rarely logged, each dependency edge is graded by how it is known: observed, declared, or inferred. Across six corpora of LLM agents spanning tool use, coding, and the web, dependency edges can predict run failure in cases where run size is a weak signal, and execution edges help pinpoint the step at which a failed multi-agent run went wrong. The authors also explain why common graph neural networks may misread the dependency data and offer a feature-based alternative. This graph-based view can help diagnose failures and improve the efficiency and robustness of LLM agents.
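To make the two edge layers concrete, here is a minimal Python sketch of one run's graph. It is not the authors' implementation: the names `RunGraph`, `Provenance`, `from_trace`, `add_dependency`, and `dependencies_by_grade` are hypothetical, and it assumes a single sequential trace, so execution edges simply chain each step to the next. The three provenance grades are the ones the abstract names; their precise definitions are not assumed here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Provenance(Enum):
    """How a dependency edge is known. Grade names follow the abstract;
    their exact definitions are not assumed in this sketch."""
    OBSERVED = "observed"
    DECLARED = "declared"
    INFERRED = "inferred"


@dataclass
class RunGraph:
    """Hypothetical two-layer graph over the step nodes of one agent run."""
    steps: list[str]
    # Execution layer: (earlier_step, later_step) pairs taken from the trace order.
    execution: list[tuple[str, str]] = field(default_factory=list)
    # Dependency layer: (step, depends_on, grade) triples, usually not in the trace.
    dependency: list[tuple[str, str, Provenance]] = field(default_factory=list)

    @classmethod
    def from_trace(cls, steps: list[str]) -> "RunGraph":
        # Assumption: one sequential trace, so each step links to the next one.
        return cls(steps=list(steps), execution=list(zip(steps, steps[1:])))

    def add_dependency(self, step: str, depends_on: str, grade: Provenance) -> None:
        # Both endpoints must already be step nodes of this run.
        if step not in self.steps or depends_on not in self.steps:
            raise ValueError("both endpoints must be step nodes of this run")
        self.dependency.append((step, depends_on, grade))

    def dependencies_by_grade(self) -> dict[Provenance, int]:
        # Count dependency edges per provenance grade (zero for unused grades).
        counts = {grade: 0 for grade in Provenance}
        for _, _, grade in self.dependency:
            counts[grade] += 1
        return counts
```

For a three-step trace `["plan", "search", "answer"]`, `from_trace` yields the execution edges `("plan", "search")` and `("search", "answer")`; dependency edges are then added separately, each carrying its grade.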

Large language models, LLM agents, Graph representation, Execution edges, Dependency edges, Trace data, Failure prediction, Graph neural networks, Run analysis, Multi-agent systems
Authors
Yue Zhao
Abstract
Can one graph represent every kind of LLM agent run? A trace records what each step did, but never what it relied on: the state it read and the results it reused. GRADE recovers that missing layer. It models any run as one graph over its step nodes with two edge layers: execution edges (what ran in what order), read from the trace for free, and dependency edges (what each step relied on), which are rarely logged, so each is graded by how it is known: observed, declared, or inferred. One representation, and each layer earns its place. Across six corpora of LLM agents spanning tool use, coding, and the web, the dependency layer can predict failure where run size is weak and, under leave-one-corpus-out transfer, stays above chance on every held-out class while run size fails. Meanwhile, the execution layer localizes the faulting step in a failed multi-agent run. This work also analyzes in depth why generic graph neural networks may misread the dependency layer, which our feature-based alternative avoids. The same graph representation opens further uses, extending from failure diagnosis in a single run to efficiency and robustness optimization at scale.
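The leave-one-corpus-out transfer test mentioned above can be sketched as a split generator: each corpus is held out once as the test set while a model is trained on the remaining corpora. This is the generic protocol, not the paper's code; the record layout (a dict with a `"corpus"` key) and the function name `leave_one_corpus_out` are hypothetical.

```python
from collections.abc import Iterator, Sequence
from typing import Any

# Hypothetical record layout: each run is a dict with at least a "corpus" key.
Run = dict[str, Any]


def leave_one_corpus_out(
    runs: Sequence[Run],
) -> Iterator[tuple[str, list[Run], list[Run]]]:
    """Yield (held_out_corpus, train_runs, test_runs), holding out each corpus once."""
    corpora = sorted({run["corpus"] for run in runs})
    if len(corpora) < 2:
        raise ValueError("need at least two corpora to hold one out")
    for held_out in corpora:
        # Train only on other corpora, so the test corpus is never seen in training.
        train = [run for run in runs if run["corpus"] != held_out]
        test = [run for run in runs if run["corpus"] == held_out]
        yield held_out, train, test
```

With runs drawn from three hypothetical corpora, the generator yields three splits, one per held-out corpus, in sorted order of corpus name. A library alternative is scikit-learn's `LeaveOneGroupOut`, using the corpus name as the group label.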