AI summaryⓘ
The authors study how much memory and randomness players need to create the best possible strategies in certain two-player turn-based games with randomness. They confirm that sometimes exponentially more memory is necessary when going from simpler decision scenarios to two-player games, even if players use random moves. For a specific type of objective called mean-payoff-parity, they show that in two-player games, the amount of memory needed for the best random strategies grows only linearly with the problem size, which is simpler than the worst case for related objectives. They also show some limitations in methods that try to build strategies for two-player games based on simpler single-player games, demonstrating cases where infinite memory is still required. Overall, the authors clarify how complex strategies must be in these game settings.
turn-based stochastic gameszero-sum gamesstrategy complexitymemoryless strategiesrandomized strategiesmean-payoff-parity objectiveshift-invariant objectivesinverse-submixingMarkov decision processes (MDPs)finite memory
Abstract
We consider the strategy complexity (i.e., memory and randomization) of optimal strategies in turn-based 2-player zero-sum stochastic games. Results in [Gimbert,Kelmendi:2023] show how to lift optimal memoryless strategies for shift-invariant inverse-submixing objectives from MDPs to 2-player stochastic games with an exponential increase in the number of memory modes. We show the corresponding lower bound, i.e., the extra exponential memory is required in general, even for randomized strategies. Moreover, we solve the strategy complexity of the well-studied mean-payoff-parity objective in 2-player stochastic games. This objective is also shift-invariant inverse-submixing, but easier than the worst case for this class. In MDPs, Maximizer has optimal memoryless randomized strategies, while optimal deterministic strategies require exponential memory. However, in stochastic games, optimal randomized strategies require, at least and at most, linear memory (equal to the number of even colors). Finally, we show that a different construction for lifting memoryless (resp. finite-memory) deterministic strategies from MDPs (resp. 1-player games) to 2-player games cannot be generalized even to memoryless randomized strategies. We construct a shift-invariant objective where Max and Min each have optimal memoryless randomized strategies in all MDPs, but optimal (randomized) Max strategies still require infinite memory in deterministic 2-player games.