Randomization for Faster Exact Optimization of Discounted Markov Decision Processes

2026-06-03 • Data Structures and Algorithms

Data Structures and Algorithms

AI summaryⓘ

The authors developed faster methods to solve discounted Markov Decision Processes (DMDPs), which are problems used to make decisions over time with future rewards considered less important. They improved the solving process by breaking it down into simpler steps: evaluating a given policy and finding nearly best values. They created both a clear-cut method and a quicker, random-based one, using recent improvements in approximate solutions to enhance their results.

discounted Markov Decision ProcessDMDPpolicy evaluationoptimal valuesrandomized algorithmsdeterministic algorithmsapproximate solutionspolicy optimization

Authors

Andrei Graur, Aaron Sidford, Ta-Wei Tu

Abstract

We provide faster deterministic and randomized algorithms for exactly solving discounted Markov Decision Processes (DMDPs). We obtain our results by efficiently reducing computing optimal values and policies in DMDPs to the easier tasks of policy evaluation and computing approximately optimal values in DMDPs. We provide both a straightforward deterministic reduction and a more efficient randomized variant that, together with advances in approximately solving DMDPs, yield our results.

View PDFOpen arXiv