When Does Non-Uniform Replay Matter in Reinforcement Learning?

2026-05-11 · Machine Learning · Artificial Intelligence

AI summary

The authors study how the way past experiences are replayed in off-policy reinforcement learning affects learning. They find that non-uniform replay helps most when replay volume, the number of replayed transitions per environment step, is low, and that keeping variety in which experiences are sampled (high entropy) matters even when sampling favors recent experiences. Based on these insights, they adopt a simple method called Truncated Geometric replay that biases sampling toward recent experiences while preserving variety, improving sample efficiency at negligible computational cost. The approach works well across tasks and algorithms, especially when replay volume is small.
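To make the idea concrete, here is a minimal sketch of Truncated Geometric sampling over a replay buffer: sampling probability decays geometrically with a transition's age, truncated at the buffer size and renormalized. The function name, the decay parameter `p`, and the flat index-by-age view of the buffer are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def truncated_geometric_indices(buffer_size, batch_size, p=1e-4, rng=None):
    """Sample transition ages with probability decaying geometrically in age.

    Age 0 is the newest transition; the geometric distribution is truncated
    at `buffer_size` and renormalized. Small `p` keeps the distribution
    near-uniform (high entropy); larger `p` concentrates on recent data.
    `p` is an illustrative knob, not the paper's exact parameterization.
    """
    rng = rng or np.random.default_rng()
    ages = np.arange(buffer_size)
    log_probs = ages * np.log1p(-p)              # log P(age) ∝ (1 - p)^age
    probs = np.exp(log_probs - log_probs.max())  # stable exponentiation
    probs /= probs.sum()                         # renormalize the truncated tail
    return rng.choice(buffer_size, size=batch_size, p=probs)

# Usage: sampled ages must then be mapped to positions in the buffer's
# own storage order (e.g. relative to a ring buffer's write pointer).
ages = truncated_geometric_indices(buffer_size=100_000, batch_size=256, p=5e-5)
```

In practice the probability vector can be precomputed and reused across batches, so the per-batch overhead over uniform sampling is negligible, consistent with the low-cost claim above.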

off-policy reinforcement learning · experience replay · replay volume · expected recency · entropy · sampling distribution · Truncated Geometric replay · sample efficiency · multi-task learning · parallel simulation
Authors
Michal Korniak, Mikołaj Czarnecki, Yarden As, Piotr Miłoś, Pieter Abbeel, Michal Nauman
Abstract
Modern off-policy reinforcement learning algorithms often rely on simple uniform replay sampling, and it remains unclear when and why non-uniform replay improves over this strong baseline. Across diverse RL settings, we show that the effectiveness of non-uniform replay is governed by three factors: replay volume, the number of replayed transitions per environment step; expected recency, how recent the sampled transitions are; and the entropy of the replay sampling distribution. Our main contribution is clarifying when non-uniform replay is beneficial and providing practical guidance for replay design in modern off-policy RL. Namely, we find that non-uniform replay is most beneficial when replay volume is low, and that high-entropy sampling is important even at comparable expected recency. Motivated by these findings, we adopt a simple Truncated Geometric replay that biases sampling toward recent experience while preserving high entropy and incurring negligible computational overhead. Across large-scale parallel simulation, single-task, and multi-task settings, including three modern algorithms evaluated on five RL benchmark suites, this replay sampling strategy improves sample efficiency in low-volume regimes while remaining competitive when replay volume is high.
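One way to write down the three factors, in notation of our own rather than the paper's (treat $q$, $t$, and $t_i$ as assumptions): with $q$ the replay sampling distribution over buffer indices, $t$ the current environment step, and $t_i$ the step at which transition $i$ was collected,

$$
\text{replay volume} = \frac{\#\,\text{replayed transitions}}{\#\,\text{environment steps}}, \qquad
\text{expected age} = \mathbb{E}_{i \sim q}\,[\,t - t_i\,], \qquad
H(q) = -\sum_i q(i)\log q(i),
$$

where a lower expected age corresponds to higher expected recency, and uniform sampling maximizes $H(q)$ for a fixed buffer.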