Hierarchical Reinforcement Learning in StarCraft Micromanagement with Influence Maps and Cluster-based Scripts
2026-06-29 • Artificial Intelligence
AI summary
The authors address challenges in real-time strategy games like StarCraft, where many units must be controlled at once and rewards arrive late, making learning hard. They propose HRL-IM/CBS, a system that breaks big decisions into smaller steps by encoding battlefield maps as compact hexadecimal codes and grouping units dynamically for better teamwork. The method uses a layered set of Q-tables, which makes its decisions easier to inspect, and a reward allocation scheme that provides denser learning signals. Their tests on six asymmetric scenarios show that it competes well with deep reinforcement learning baselines while being easier to interpret and requiring fewer training samples.
hierarchical reinforcement learning, influence map hashing, cluster-based scripts, StarCraft micromanagement, multi-Q-table, sample efficiency, reward allocation, state-action space, tactical execution, deep reinforcement learning
Authors
Chunhui Bai, Changhe Li, Dequan Li, Xinye Cai, Shengxiang Yang
Abstract
Real-time strategy (RTS) games present significant AI challenges, characterized by expansive state-action spaces arising from multi-unit coordination in continuous battlefields, and sparse delayed rewards stemming from final win/lose signals. Existing approaches face a trade-off between managing the dimensionality explosion of joint actions and maintaining the interpretability of complex state representations. This complexity is further intensified by the limitation of traditional hierarchical structures in adaptively decomposing tasks into effective tactical modules. Such difficulties are compounded by the black-box nature of deep learning models and their reliance on sparse rewards, which together result in limited sample efficiency and a lack of decision-making transparency. To address these limitations, this paper proposes HRL-IM/CBS, a hierarchical reinforcement learning framework with influence map hashing and cluster-based scripts for StarCraft micromanagement. Influence map hashing encodes global battlefield situations into compact hexadecimal codes, capturing spatial control and relative advantage. Cluster-based scripts enable dynamic local coordination through adaptive unit partitioning. The hierarchical multi-Q-table architecture decomposes decision-making into upper-level clustering strategy selection and lower-level tactical execution, with reward allocation providing dense learning signals. Experiments across six asymmetric scenarios demonstrate competitive performance against deep RL baselines while offering advantages in sample efficiency and interpretability through transparent Q-table representations.
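The abstract does not give implementation details, so the following is only a minimal Python sketch of how the two central ideas could fit together: an influence map reduced to a hexadecimal code that serves as the state key, and a two-level Q-table in which the upper level picks a clustering strategy and the lower level picks a tactic. Everything concrete here is an assumption made for illustration: the names `influence_hash` and `TwoLevelQ`, the grid size, the clamping of net influence to one hex digit per cell, the strategy and tactic labels, and the use of a standard tabular Q-learning update with the same reward fed to both levels (the paper's reward allocation scheme is not reproduced).

```python
import random
from collections import defaultdict


def influence_hash(own, enemy, grid=4, size=32.0):
    """Encode a coarse influence map as a hex string (hypothetical encoding).

    own / enemy are lists of (x, y) unit positions, assumed to lie in [0, size].
    """
    cells = [0] * (grid * grid)
    step = size / grid
    for sign, units in ((1, own), (-1, enemy)):
        for x, y in units:
            col = min(max(int(x / step), 0), grid - 1)
            row = min(max(int(y / step), 0), grid - 1)
            cells[row * grid + col] += sign  # net influence: own minus enemy
    # Clamp each cell to [-7, 8] and shift to 0..15 so it fits one hex digit.
    return "".join(format(max(-7, min(8, c)) + 7, "x") for c in cells)


class TwoLevelQ:
    """Two tabular Q-functions: upper picks a strategy, lower picks a tactic."""

    def __init__(self, strategies, tactics, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
        self.strategies, self.tactics = strategies, tactics
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = random.Random(seed)
        self.upper = defaultdict(float)  # (state, strategy) -> value
        self.lower = defaultdict(float)  # (state, strategy, tactic) -> value

    def _pick(self, table, prefix, options):
        # Epsilon-greedy choice among options sharing the same key prefix.
        if self.rng.random() < self.eps:
            return self.rng.choice(options)
        return max(options, key=lambda o: table[prefix + (o,)])

    def act(self, state):
        strategy = self._pick(self.upper, (state,), self.strategies)
        tactic = self._pick(self.lower, (state, strategy), self.tactics)
        return strategy, tactic

    def update(self, state, strategy, tactic, reward, next_state):
        # Standard Q-learning update applied to each level.
        # Assumption: both levels receive the same reward signal.
        best_u = max(self.upper[(next_state, s)] for s in self.strategies)
        key_u = (state, strategy)
        self.upper[key_u] += self.alpha * (reward + self.gamma * best_u - self.upper[key_u])

        best_l = max(self.lower[(next_state, strategy, t)] for t in self.tactics)
        key_l = (state, strategy, tactic)
        self.lower[key_l] += self.alpha * (reward + self.gamma * best_l - self.lower[key_l])
```

Under this encoding, `influence_hash([(1, 1)], [(31, 31)], grid=2)` yields `"8776"`: one cell leans toward the player, two are neutral, and one leans toward the enemy. Because both tables are plain dictionaries keyed by that string, the learned values can be read off directly, which is the kind of transparency the abstract attributes to Q-table representations.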