Superhuman AI for Generals.io Using Self-Play Reinforcement Learning

2026-06-22 | Machine Learning

AI summary

The authors built an AI agent that plays Generals.io, a real-time strategy game with hidden information, at a level above the top human players. It was trained for four days on four NVIDIA H200 GPUs using a JAX-native game simulator that runs roughly 10,000x faster than the prior simulator. The agent learns entirely by playing against itself, with a policy-gradient method driven only by win/loss outcomes. The work suggests that a sufficiently fast simulator removes the data bottleneck and changes which training choices matter.

Generals.io, real-time strategy game, imperfect information, JAX, simulator, vision transformer, self-play, policy-gradient, policy parameters, reinforcement learning
Authors
Matej Straka, Viliam Lisý, Martin Schmid
Abstract
We present a superhuman AI agent for Generals.io, a real-time strategy game that requires both long-horizon planning and short-term tactics under strong imperfect information. Trained for four days on 4x NVIDIA H200 GPUs, our agent reaches #1 on the public 1v1 leaderboard of over 5,000 human players, leading the second-ranked player by the same margin that separates second place from 25th, and beats the two top-ranked humans head-to-head with a combined 199-70 record across 269 ladder matches. A key enabler is a JAX-native simulator that reaches tens of millions of frames per second on a single GPU, roughly a 10,000x speedup over the prior simulator. On top of this, we train a vision transformer policy end-to-end by self-play with a policy-gradient loop and sparse win/loss reward, using top-advantage sample filtering and an exponential moving average of the policy parameters. Taken together, our findings highlight what matters, and what does not, once a fast simulator removes the data bottleneck.
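
The abstract names two training ingredients, top-advantage sample filtering and an exponential moving average (EMA) of the policy parameters, without giving their exact form. The following is a minimal illustrative sketch in plain Python, not the authors' implementation. It assumes that "top-advantage" means keeping the fraction of samples with the largest advantage estimates; the function names, the `keep_fraction` and `decay` hyperparameters, and the dict-of-lists parameter layout are all hypothetical.

```python
from typing import Dict, List, Sequence


def top_advantage_indices(advantages: Sequence[float],
                          keep_fraction: float) -> List[int]:
    """Return the indices of the samples with the largest advantages.

    Assumption: filtering ranks by signed advantage; the abstract does not
    say whether the ranking is signed or absolute. `keep_fraction` is a
    hypothetical hyperparameter in (0, 1].
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Keep at least one sample when the batch is non-empty.
    n_keep = max(1, int(len(advantages) * keep_fraction))
    ranked = sorted(range(len(advantages)),
                    key=lambda i: advantages[i],
                    reverse=True)
    # Return the kept indices in their original batch order.
    return sorted(ranked[:n_keep])


def ema_update(ema_params: Dict[str, List[float]],
               params: Dict[str, List[float]],
               decay: float) -> Dict[str, List[float]]:
    """Blend the current parameters into the running average.

    Computes ema <- decay * ema + (1 - decay) * params, elementwise,
    and returns a new dict without mutating the inputs.
    """
    return {
        name: [decay * e + (1.0 - decay) * p
               for e, p in zip(ema_params[name], params[name])]
        for name in params
    }


if __name__ == "__main__":
    # Hypothetical batch of four advantage estimates; keep the top half.
    kept = top_advantage_indices([0.5, -1.0, 2.0, 0.1], keep_fraction=0.5)
    print(kept)

    # Hypothetical single-tensor policy with two weights.
    averaged = ema_update({"w": [0.0, 1.0]}, {"w": [1.0, 3.0]}, decay=0.5)
    print(averaged)
```

In a training loop of this shape, the policy-gradient loss would be computed only over the kept indices, and the EMA copy of the parameters would be updated after each optimizer step. How the averaged parameters are used (for example, as the evaluated agent or as a self-play opponent) is not stated in the abstract.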