NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

2026-05-01

Machine Learning
AI summary

The authors address the problem that evaluating every possible joint move for a team of agents is too slow and complex. They introduce NonZero, a method that, instead of trying every combination, focuses on small sets of candidate moves guided by predictions, ranking potential changes by how much they are expected to improve team coordination. In testing on cooperative tasks, NonZero used samples more efficiently and achieved better results than existing approaches. Overall, the authors provide a way to explore multi-agent decisions selectively without checking every possibility.

Monte Carlo Tree Search, multi-agent systems, cooperative games, joint-action space, bandit problem, sample efficiency, local regret, interaction score, model-based methods, model-free methods
Authors
Sizhe Tang, Zuyuan Zhang, Mahdi Imani, Tian Lan
Abstract
Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint actions, severely limiting exploration under realistic search budgets. We propose NonZero, which keeps multi-agent MCTS tractable by running surrogate-guided selection over a low-dimensional nonlinear representation using an interaction-guided proposal rule, instead of directly exploring the full joint-action space. Our exploration uses an interaction score: single-agent deviations are ranked by predicted gain, while two-agent deviations are scored by a mixed-difference measure that reveals coordination benefits even when no single agent can improve alone. We formalize candidate proposal as a bandit problem over local deviations and derive a proposal rule, NonZero, with a sublinear local-regret guarantee for reaching approximate graph-local optima without enumerating the joint-action space. Empirically, NonZero improves sample efficiency and final performance on MatGame, SMAC, and SMACv2 relative to strong model-based and model-free baselines under matched search budgets.
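The abstract's proposal rule can be illustrated with a small sketch. This is a hypothetical reconstruction, not the paper's implementation: it assumes a surrogate value function `f` over joint actions, scores single-agent deviations by predicted gain, and scores two-agent deviations by a mixed second difference, one plausible reading of the "mixed-difference measure" that can be positive even when no single-agent change improves the surrogate value alone. All names (`propose_candidates`, `action_sets`, `top_k`) are illustrative.

```python
# Hypothetical sketch of interaction-guided candidate proposal
# (an assumption about the method, not the authors' code).
from itertools import combinations

def propose_candidates(joint_action, action_sets, f, top_k=5):
    """Rank local deviations of a joint action under a surrogate f.

    Single-agent deviations are scored by predicted improvement;
    two-agent deviations by a mixed second difference that can be
    positive even when neither single-agent change helps on its own.
    """
    base = f(joint_action)
    scored = []

    # Single-agent deviations: predicted gain from changing one agent's action.
    for i, actions in enumerate(action_sets):
        for a in actions:
            if a == joint_action[i]:
                continue
            dev = list(joint_action)
            dev[i] = a
            scored.append((f(tuple(dev)) - base, tuple(dev)))

    # Two-agent deviations: a mixed-difference "interaction score".
    n = len(joint_action)
    for i, j in combinations(range(n), 2):
        for ai in action_sets[i]:
            for aj in action_sets[j]:
                if ai == joint_action[i] or aj == joint_action[j]:
                    continue
                di = list(joint_action); di[i] = ai
                dj = list(joint_action); dj[j] = aj
                dij = list(joint_action); dij[i] = ai; dij[j] = aj
                # f(both) - f(i alone) - f(j alone) + f(base):
                # large when the pair helps jointly beyond either change alone.
                score = f(tuple(dij)) - f(tuple(di)) - f(tuple(dj)) + base
                scored.append((score, tuple(dij)))

    # Propose the highest-scoring deviations without enumerating
    # the full joint-action space at the tree-expansion step.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [a for _, a in scored[:top_k]]
```

As a usage example, with two agents and a surrogate that only rewards both agents picking action 1, every single-agent deviation from (0, 0) scores zero, but the mixed difference surfaces the coordinated pair (1, 1) first.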