Streaming Communication in Multi-Agent Reasoning

2026-06-03 • Computation and Language

Computation and Language, Artificial Intelligence, Multiagent Systems

AI summary

The authors propose StreamMA, a new multi-agent reasoning system that sends each reasoning step to the next agent immediately instead of waiting for the whole reasoning chain to finish. This approach speeds up the overall process and surprisingly makes the reasoning more accurate because early steps tend to be more reliable. The authors mathematically analyze and prove when streaming helps and test their system on various benchmark tasks, showing improved performance over existing methods. They also find that giving each agent more steps consistently boosts both speed and accuracy.

multi-agent reasoning, pipeline latency, streaming, large language models, reasoning benchmarks, scaling laws, Claude Opus, GPT-5, algorithmic efficiency, step-level scaling

Authors

Zhen Yang, Xiaogang Xu, Wen Wang, Cong Chen, Xander Xu, Ying-Cong Chen

Abstract

Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent agents and thus reducing latency. Surprisingly, this pipelining also improves effectiveness: because multi-step reasoning quality is non-uniform and early steps are more reliable than later ones, working with these reliable early steps instead of the full chain prevents error-prone late steps from misleading downstream agents. We formalize both advantages with the first closed-form joint analysis of stream, serial, and single protocols, deriving the effectiveness ordering, speedup upper bound, and cost ratio. Across eight reasoning benchmarks spanning mathematics, science, and code, two frontier LLMs (Claude Opus 4.6 and GPT-5.4), and three topologies (Chain, Tree, Graph), StreamMA outperforms both baselines (avg. +7.3 pp, max +22.4 pp on HMMT 2026; Claude Opus 4.6-high). Beyond these contributions, we discover a "step-level scaling law": increasing per-agent steps consistently improves both effectiveness and efficiency, a new scaling dimension orthogonal to and composable with agent-count scaling.
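
The latency argument in the abstract can be illustrated with a toy timing model. The sketch below is not the paper's closed-form analysis. It assumes a chain topology, a fixed time per reasoning step, and that an agent can begin its k-th step as soon as the upstream agent has emitted its k-th step. The function names and the `step_time` parameter are hypothetical and chosen only for illustration.

```python
def serial_latency(num_agents: int, steps_per_agent: int, step_time: float = 1.0) -> float:
    """Generate-then-transfer: each agent waits for the full upstream chain."""
    return num_agents * steps_per_agent * step_time


def stream_latency(num_agents: int, steps_per_agent: int, step_time: float = 1.0) -> float:
    """Streaming: each step is forwarded downstream as soon as it is produced.

    Assumption (not from the paper): step k of an agent depends only on
    step k of the agent directly upstream, plus the agent's own step k-1.
    """
    # Time at which each upstream step becomes available; the first agent
    # has no upstream, so all of its inputs are ready at time 0.
    upstream_ready = [0.0] * steps_per_agent
    for _ in range(num_agents):
        emitted = []
        clock = 0.0  # time at which this agent finished its previous step
        for k in range(steps_per_agent):
            start = max(clock, upstream_ready[k])
            clock = start + step_time
            emitted.append(clock)
        upstream_ready = emitted
    return upstream_ready[-1]


if __name__ == "__main__":
    agents, steps = 4, 8
    serial = serial_latency(agents, steps)
    stream = stream_latency(agents, steps)
    print(f"serial={serial}, stream={stream}, speedup={serial / stream:.2f}")
```

Under these assumptions, a chain of 4 agents with 8 steps each finishes in 32 time units serially and 11 time units when streamed. In this toy model the speedup grows with the number of steps per agent and stays below the number of agents, which matches the direction of the efficiency claims in the abstract but says nothing about the effectiveness results.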
