Scaling Agentic Capabilities via Grounded Interaction Synthesis
2026-06-01 • Computation and Language
AI summary
The authors present GAIS, a method for creating diverse and challenging environments and tasks for training agents without costly human annotation. Rather than relying entirely on large language models, which can produce repetitive or unrealistic tasks, GAIS builds environments from real-world Model Context Protocol (MCP) servers and uses structure-guided planning to generate complex tasks. In experiments on BFCL, τ²-Bench, and ACEBench, models trained on GAIS data outperform those trained with existing synthesis methods. The approach also reaches strong performance with less data and continues to improve as the data scales, where baselines stagnate.
agentic intelligence, large language models, environment synthesis, Model Context Protocol (MCP), task generation, structure-guided planning, adversarial policy, instruction tuning, data efficiency, benchmarking
Authors
Wenhang Shi, Jinhao Dong, Yiren Chen, Zhe Zhao, Shuqing Bian, Wei Lu, Xiaoyong Du
Abstract
General agentic intelligence hinges on the ability to interact with diverse real-world tools to complete complex tasks, a capability fundamentally tied to the quality of interaction data. To bypass the prohibitive costs of human annotation, prevailing paradigms depend entirely on Large Language Models (LLMs) to scale the synthesis of agentic environments and tasks. However, such unconstrained generation often degenerates into biased random sampling of LLMs' internal priors, failing to capture the diversity and difficulty of real-world domains or construct high-fidelity, long-horizon tasks. In this work, we introduce Grounded Agentic Interaction Synthesis (GAIS), a framework that automates the scalable construction of diverse environments and complex tasks via a two-phase grounding mechanism. Specifically, we construct protocol-anchored environments derived from real-world Model Context Protocol (MCP) servers to ensure functional diversity and difficulty. Subsequently, we employ structure-guided planning to navigate these environments, actively enforcing logical dependencies and adversarial policies to generate complex tasks. Experiments on BFCL, $τ^2$-Bench, and ACEBench demonstrate that GAIS-synthesized data significantly outperforms state-of-the-art baselines, enabling base models to match or even surpass their official instruction-tuned counterparts. Furthermore, GAIS exhibits superior data efficiency and scalability, achieving exceptional capabilities with significantly less data while maintaining continuous growth where baselines stagnate. Our code and dataset are publicly available at https://github.com/Eric8932/GAIS.
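To make the idea of enforcing logical dependencies between tool calls concrete, here is a minimal illustrative sketch. It is not the GAIS implementation, and the abstract does not specify how the planning works. The `Tool` class, the four example tools, their field names, and the rule that a tool depends on any other tool producing one of its required fields are all assumptions introduced for illustration. The sketch also assumes the resulting dependency graph is acyclic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool description; not an MCP or GAIS data structure."""
    name: str
    requires: frozenset  # input field names the tool needs
    produces: frozenset  # output field names the tool returns


def dependency_edges(tools):
    """Map each tool name to the names of tools that produce one of its inputs."""
    deps = {t.name: set() for t in tools}
    for consumer in tools:
        for producer in tools:
            if producer.name != consumer.name and producer.produces & consumer.requires:
                deps[consumer.name].add(producer.name)
    return deps


def plan_chain(tools, goal):
    """Return a call order ending at `goal` in which every dependency comes first.

    Depth-first post-order traversal; assumes the dependency graph is acyclic.
    """
    deps = dependency_edges(tools)
    ordered, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in sorted(deps[name]):  # sorted for a deterministic order
            visit(dep)
        ordered.append(name)

    visit(goal)
    return ordered


# Hypothetical toy environment with four tools.
TOOLS = [
    Tool("search_flights", frozenset({"origin", "destination"}), frozenset({"flight_id"})),
    Tool("get_user", frozenset({"user_email"}), frozenset({"user_id"})),
    Tool("book_flight", frozenset({"flight_id", "user_id"}), frozenset({"booking_id"})),
    Tool("send_receipt", frozenset({"booking_id"}), frozenset({"receipt_status"})),
]

if __name__ == "__main__":
    print(plan_chain(TOOLS, "send_receipt"))
    # ['get_user', 'search_flights', 'book_flight', 'send_receipt']
```

In this toy setting, a task whose goal is `send_receipt` cannot be solved in a single call: the plan forces a four-step trajectory in which each call consumes an output of an earlier one, which is one simple way to obtain a long-horizon task from tool schemas.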