Tool-Call Dependency Structure is Linearly Decodable in LLM Agent Residual Streams

2026-05-25 · Computation and Language

AI summary

The authors study how tool-using large language model (LLM) agents produce sequences of tool calls in which earlier outputs supply arguments to later calls, forming a directed dependency graph. They test whether the model represents this structure internally and find that a low-capacity probe on the residual stream of Qwen3-32B decodes the call dependencies well above both a random-label control and a positional baseline. A counterfactual analysis indicates that the representation tracks the abstract connection pattern rather than specific identifier values, and the signal beyond position weakens when call order alone is enough to predict dependency. Patching activations at one layer shifts the probe's readout at a later, unpatched boundary, which suggests the representation propagates through the model, although the tool call the model actually emits does not change. To the authors' knowledge, this is the first structural probe of an LLM agent's runtime tool-call dependency graph; the claims concern internal representation, not behavioural control.

LLM (Large Language Model), tool-use in AI, dependency graph, structural probing, residual stream, Qwen3-32B, activation patching, chain-of-thought, runtime representation, multi-hop reasoning
Authors
Tianda Sun, Dimitar Kazakov
Abstract
Tool-using LLM agents produce trajectories whose calls form a directed dependency graph: earlier tool outputs supply arguments to later calls. Whether this execution structure is represented inside the model is unknown; prior structural probes have targeted static code or chain-of-thought text, not an agent's run-time call graph. A low-capacity edge probe on the residual stream of Qwen3-32B decodes the tool-call dependency graph well above both a Hewitt–Liang random-label control and a positional baseline. A counterfactual contrast between value corruption and structural perturbation indicates the signal tracks abstract topology rather than identifier values, and replicates under an independent, non-substring oracle. The non-positional component replicates on three further interactive multi-hop benchmarks and attenuates as call order alone becomes a sufficient proxy for dependency, vanishing in single-shot planning. Per-layer activation patching shifts the probe at a later, non-patched boundary, evidence that the representation propagates rather than passively reads out, though the realised tool call does not move. To our knowledge this is the first structural probe of an LLM agent's runtime tool-call dependency graph. Our claims concern representation, not behavioural control, and span two model families and one primary domain.
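
As a rough illustration of the probe-versus-control comparison described in the abstract, the sketch below trains a linear probe on synthetic "edge features" and compares its held-out accuracy with that of the same probe trained on random labels. It is not the authors' implementation. The features are random vectors standing in for residual-stream activations, the dependency labels come from a hypothetical linear rule (`w_true`), and the control uses independent random labels, which is a simplification of the Hewitt–Liang control-task design. All names (`fit_probe`, `probe_accuracy`, `held_out_accuracy`) are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for edge features of (earlier call, later call) pairs.
# ASSUMPTION: these are random vectors, not real model activations.
n_pairs, dim = 2000, 16
X = rng.normal(size=(n_pairs, dim))

# ASSUMPTION: a hypothetical linear rule decides whether a dependency edge exists.
w_true = rng.normal(size=dim)
y_task = (X @ w_true > 0).astype(int)

# Simplified control: labels drawn independently of the features.
y_control = rng.integers(0, 2, size=n_pairs)

split = 1500  # first 1500 pairs for training, the rest held out


def fit_probe(X_train, y_train, lr=0.5, steps=300):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    w = np.zeros(X_train.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))
        grad = p - y_train
        w -= lr * (X_train.T @ grad) / len(y_train)
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(w, b, X_eval, y_eval):
    """Fraction of pairs whose predicted edge label matches the target."""
    pred = (X_eval @ w + b > 0).astype(int)
    return float((pred == y_eval).mean())


def held_out_accuracy(y):
    """Train on the first `split` pairs and evaluate on the remainder."""
    w, b = fit_probe(X[:split], y[:split])
    return probe_accuracy(w, b, X[split:], y[split:])


task_acc = held_out_accuracy(y_task)
control_acc = held_out_accuracy(y_control)
selectivity = task_acc - control_acc  # gap between real task and control

print(f"task={task_acc:.3f} control={control_acc:.3f} selectivity={selectivity:.3f}")
```

A large gap between the two accuracies suggests the probe is reading structure from the features and is not simply fitting arbitrary labels. A positional baseline could follow the same pattern by training the same probe on position-only features, such as the indices of the two calls.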