A Comparative Study of Dynamic Programming and Reinforcement Learning in Finite Horizon Dynamic Pricing
2026-04-15 • Machine Learning
AI summary
The authors compare two methods, Fitted Dynamic Programming (DP) and Reinforcement Learning (RL), to set prices in situations where demand changes over time. They test these methods in simple to more complex scenarios involving different product types and rules about revenue over time. Unlike past work focusing only on simple cases, they apply DP to complex, multi-product scenarios with constraints. They examine how well each method performs in terms of revenue, stability, following rules, and computing time, showing the trade-offs between working with explicit models versus learning from experience.
Fitted Dynamic Programming, Reinforcement Learning, Dynamic Pricing, Finite-horizon Problems, Demand Estimation, Multi-typology Settings, Inter-temporal Constraints, Revenue Optimization, Computational Scaling, Trajectory-based Learning
Authors
Lev Razumovskiy, Nikolay Karenin
Abstract
This paper provides a systematic comparison between Fitted Dynamic Programming (DP), where demand is estimated from data, and Reinforcement Learning (RL) methods in finite-horizon dynamic pricing problems. We analyze their performance across environments of increasing structural complexity, ranging from a single-typology benchmark to multi-typology settings with heterogeneous demand and inter-temporal revenue constraints. Unlike simplified comparisons that restrict DP to low-dimensional settings, we apply dynamic programming in richer, multi-dimensional environments with multiple product types and constraints. We evaluate revenue performance, stability, constraint-satisfaction behavior, and computational scaling, highlighting the trade-offs between explicit expectation-based optimization and trajectory-based learning.
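To make the "fitted DP" side of the comparison concrete, here is a minimal sketch of finite-horizon backward induction for a single-typology pricing problem. This is not the authors' code: the logistic purchase-probability curve stands in for a demand model estimated from data, and the horizon, inventory, and price grid are illustrative assumptions.

```python
import math

T = 10                                        # selling horizon (periods); illustrative
C = 5                                         # initial inventory (units); illustrative
prices = [1.0 + 0.5 * k for k in range(19)]   # candidate price grid

def demand_prob(p):
    # stand-in for a purchase-probability model fitted from sales data
    return 1.0 / (1.0 + math.exp(0.8 * (p - 5.0)))

# V[t][s]: optimal expected revenue-to-go at period t with s units left
V = [[0.0] * (C + 1) for _ in range(T + 1)]
policy = [[None] * (C + 1) for _ in range(T)]

for t in range(T - 1, -1, -1):                # backward induction over time
    for s in range(1, C + 1):
        best_val, best_p = 0.0, prices[0]
        for p in prices:
            q = demand_prob(p)
            # sell one unit with probability q, otherwise carry inventory forward
            val = q * (p + V[t + 1][s - 1]) + (1 - q) * V[t + 1][s]
            if val > best_val:
                best_val, best_p = val, p
        V[t][s] = best_val
        policy[t][s] = best_p

print(round(V[0][C], 3))                      # optimal expected revenue from the start
```

The key contrast with RL is visible in the inner loop: the update takes an explicit expectation over the demand model at every state, whereas an RL agent would estimate the same quantity from sampled sales trajectories. Multi-typology settings and inter-temporal constraints enlarge the state space this loop must sweep, which drives the computational-scaling trade-off the paper evaluates.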