A Token/KV-Cache Communication Media Selection and Resource Allocation Strategy for Multi-Agent Collaboration

2026-05-25 · Artificial Intelligence

Artificial Intelligence · Information Theory
AI summary

The authors study how LLM-based agents can collaborate over future 6G wireless networks, where agents may exchange information through different media. They find that sending natural-language tokens and sending cached key-value (KV) pairs each have advantages and drawbacks, and that which one is faster depends on channel conditions and the computational resources available. To exploit this trade-off, they propose a method that jointly selects the communication medium and allocates wireless bandwidth to minimize end-to-end latency. Their numerical results show lower collaboration latency than baselines that use only natural language or only the KV cache, which could make multi-agent systems more efficient in future wireless networks.

large language models, 6G networks, multi-agent cooperation, latent-space interaction, end-to-end latency, resource allocation, token-based transmission, key-value cache, wireless communication, optimization algorithm
Authors
Lipeng Dai, Luping Xiang, Kun Yang
Abstract
The convergence of large language models (LLMs) with 6G networks is fostering a paradigm of autonomous multi-agent cooperation, which in turn is expected to substantially increase east-west traffic. Although latent-space interaction mechanisms can enable more efficient collaboration than symbolic natural-language (NL) exchanges, prior work often abstracts away the associated communication overhead under practical wireless constraints. In embodied multi-agent settings, heterogeneous interaction media incur disparate inference and transmission costs, thereby inducing an inherent end-to-end (E2E) latency trade-off. To address this, we propose a joint design that integrates communication-media selection with wireless resource allocation. Through analytical characterization and simulation-based evaluation, we show that neither token-based transmission nor key-value (KV) cache-based transmission is uniformly optimal across operating regimes, as performance depends critically on system parameters such as available computational resources and channel conditions. Accordingly, we formulate a joint optimization problem aimed at minimizing the E2E latency of multi-agent collaboration and develop a low-complexity joint media selection and resource allocation (JMSRA) algorithm. Numerical results further confirm that, by adaptively coordinating the interaction media and bandwidth allocation over heterogeneous links, the proposed scheme achieves markedly reduced E2E latency relative to conventional NL-only and KV-cache-only baselines, enabling efficient and robust multi-agent collaboration in future wireless networks.
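
The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the authors' JMSRA algorithm or their latency model; it is a brute-force baseline under simplifying assumptions made here for illustration: each link's latency is a fixed compute time plus payload divided by rate, the rate is bandwidth times a fixed spectral efficiency, and collaboration latency is the completion time of the slowest link. All names (`min_makespan`, `select_media`, `agents`) and all numbers are hypothetical.

```python
from itertools import product


def min_makespan(links, total_bw_hz, tol=1e-9):
    """Smallest completion time T for a fixed media choice on every link.

    links: list of (compute_s, payload_bits, spectral_eff_bps_per_hz),
           with payload_bits > 0 (assumed).
    Link i finishes by T iff it gets at least
    payload_i / (se_i * (T - compute_i)) Hz, so T is feasible when these
    requirements sum to at most total_bw_hz. Bisection finds the smallest T.
    """
    def bw_needed(T):
        return [p / (se * (T - c)) for c, p, se in links]

    lo = max(c for c, _, _ in links)  # unreachable: would need infinite bandwidth
    step = 1.0
    while sum(bw_needed(lo + step)) > total_bw_hz:  # grow until feasible
        step *= 2.0
    hi = lo + step
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(bw_needed(mid)) <= total_bw_hz:
            hi = mid
        else:
            lo = mid
    return hi, bw_needed(hi)


def select_media(agents, total_bw_hz):
    """Try every token/KV assignment (2^N of them) and keep the fastest."""
    best = None
    for choice in product(("token", "kv"), repeat=len(agents)):
        links = [(a[m][0], a[m][1], a["se"]) for a, m in zip(agents, choice)]
        T, bw = min_makespan(links, total_bw_hz)
        if best is None or T < best[0]:
            best = (T, choice, bw)
    return best


# Hypothetical numbers: (compute seconds, payload bits) per medium.
# Agent 0: weak compute, good channel. Agent 1: strong compute, poor channel.
agents = [
    {"se": 6.0, "token": (2.0, 2e4), "kv": (0.1, 4e7)},
    {"se": 0.5, "token": (0.3, 2e4), "kv": (0.05, 4e7)},
]
TOTAL_BW_HZ = 20e6

best_T, best_choice, best_bw = select_media(agents, TOTAL_BW_HZ)
print(best_choice, round(best_T, 3), [round(b) for b in best_bw])
```

With these illustrative numbers the mixed assignment (KV cache on the compute-limited link with the good channel, tokens on the link with the poor channel) finishes sooner than either single-medium assignment, which mirrors the abstract's claim that neither medium is uniformly optimal. Exhaustive search scales as 2^N in the number of links, which is one reason a low-complexity algorithm is of interest.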