SAGA: Scene-Aware, Goal-Evolving Agents for Long-Horizon CivRealm Strategy Planning

2026-06-29 • Artificial Intelligence

AI summary

The authors introduce SAGA, a system that helps AI agents plan better long-term strategies in complex games by fixing three problems seen in earlier models: difficulty understanding spatial layouts, handling too much information at once, and learning that does not carry over from one game to the next. SAGA uses a semantic map description, handles different parts of the game separately with specialist helpers, and learns from past games to improve over time without needing custom rewards. When tested on the game FreeCiv, SAGA scored higher and more consistently than the strongest competing methods, especially in building infrastructure. Ablation studies also show that each part of SAGA contributes to its success independently.

Long-horizon planning, Large Language Models (LLM), Scene graph, Sparse rewards, FreeCiv, Multi-agent systems, Strategic planning, Cross-game learning, Ablation study, Context management

Authors

Tianyu Jin, Shuo Chen, Yida Wang, Liuyu Xiang, Yingzhuo Liu, Zhiyao Jiang, Yexin Li, Zhaofeng He

Abstract

Long-horizon strategic planning in complex strategy games demands concurrent reasoning across multiple decision domains under imperfect information and sparse rewards. Existing LLM-based agents suffer from three systematic failures: scene blindness from raw tile coordinates, context overflow and domain coupling from monolithic state dumps, and shallow cross-game learning that treats each episode in isolation. We present SAGA, an LLM multi-agent framework with three mechanisms, each directly targeting one class of failure: (i) a Map-Semantic Scene Graph that encodes typed spatial relations among game entities into per-unit natural-language context, resolving spatial blindness without global token inflation; (ii) a Tool-Augmented Planner that pulls fine-grained domain state on demand and dispatches per-domain directives to dedicated specialist controllers, eliminating context overflow, domain coupling, and mechanical constraint violations; and (iii) a Dual-Horizon Feedback Loop that combines periodic within-game goal generation with a structured cross-game causal post-mortem, enabling principled strategic evolution without manual reward engineering. Evaluated on FreeCiv, SAGA attains the highest mean civilization score (the environment's sole sparse objective reward) with lower variance than the two strongest baselines, and is the only method that significantly surpasses every baseline on infrastructure construction, the resource axis most readily sacrificed under multi-objective conflict. It outscores the two strongest baselines in most head-to-head games while cutting output tokens (the dominant decoding cost) by 27%. Equipped with the cross-game evolution module, SAGA reaches the highest end-of-chain score across five successive episodes. Ablation studies confirm that each architectural component contributes independently to this advantage.
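
As an illustration of mechanism (i), the following Python sketch shows one way typed spatial relations could be stored in a graph and rendered as natural-language context for a single unit. It is not the authors' implementation: the `Relation` and `SceneGraph` classes, the relation labels, and the entity names are all hypothetical, chosen only to make the idea concrete.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str  # entity the relation is described from (hypothetical naming)
    kind: str    # typed spatial relation label, e.g. "adjacent_to"
    target: str  # entity the relation points to


class SceneGraph:
    """Toy scene graph: outgoing typed relations indexed by source entity."""

    def __init__(self):
        self._out = defaultdict(list)

    def add(self, source, kind, target):
        self._out[source].append(Relation(source, kind, target))

    def describe(self, unit):
        # Render only the relations that start at this unit, so the text
        # grows with the unit's neighbourhood rather than with the whole map.
        rels = self._out.get(unit, [])
        if not rels:
            return f"{unit} has no recorded spatial relations."
        parts = [f"{r.kind.replace('_', ' ')} {r.target}" for r in rels]
        return f"{unit} is " + "; ".join(parts) + "."


# Illustrative entities and relations (assumed, not taken from the paper).
graph = SceneGraph()
graph.add("Settlers#1", "adjacent_to", "river tile")
graph.add("Settlers#1", "two_tiles_from", "city Rome")
graph.add("Warriors#2", "inside", "city Rome")

context = graph.describe("Settlers#1")
print(context)
```

Because `describe` reads only the edges leaving the requested unit, relations belonging to other units never enter that unit's prompt, which is the sense in which per-unit context avoids inflating the global token count.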
