When Should a Robot Replan? Regret-Guided Update Scheduling in Time-Varying MDPs

2026-06-15 · Robotics

AI summary

The authors study how robots can best use limited energy and computing power to update their plans when the environment keeps changing. They focus on situations where the robot cannot re-plan at every step and must decide when to refresh its model of how the world behaves. They model the problem as a time-varying Markov decision process with a known bound on how quickly the dynamics drift. On that basis they derive a rule that chooses update times to limit the regret caused by acting on outdated plans. They test the method in a simulated Mars-rover navigation task and on a Crazyflie quadrotor. In both settings, their adaptive schedule outperforms other budget-limited baselines.

non-stationary environments, Markov decision processes, transition kernel, dynamic regret, maximum likelihood estimation, policy updating, finite-horizon policy, adaptive planning, robot navigation, resource-constrained computing
Authors
Negin Musavi, Gokul Puthumanaillam, Ruben Hernandez, William Schafer, Melkior Ornik
Abstract
Robots operating in non-stationary environments must continually adapt their policies as the dynamics drift, but onboard energy and compute budgets cap how often a full state-estimation and re-planning step can be performed. This raises the question: when, along a horizon, should a robot spend its limited budget? We formulate this problem as a time-varying Markov decision process (TVMDP) with a known bound on the rate of transition drift. We model execution as a skip-update scheme. At chosen update times, the agent estimates the transition kernel by maximum likelihood and computes a finite-horizon policy. Between updates, it reuses this policy under a propagated state estimate. We analyze the dynamic regret of this scheme and show how it grows during skip intervals as a function of the properties of the TVMDP and the skip lengths. The resulting bound answers the opening question through an online, regret-guided update rule that allocates the budget adaptively. We evaluate the rule in a simulated Mars-rover navigation task with time-varying slip dynamics and on a Crazyflie quadrotor in indoor obstacle fields. Adaptive allocation outperforms the other budgeted baselines.
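To make the skip-update scheme concrete, here is a minimal tabular sketch. It is not the authors' implementation. The function names (`mle_kernel`, `finite_horizon_policy`, `propagate_belief`, `regret_guided_schedule`) are hypothetical. The uniform fallback for unvisited state-action pairs is an assumption. The surrogate regret that triggers updates is also a stand-in of our own, not the paper's bound. The sketch shows the three pieces the abstract names: a maximum-likelihood kernel estimate from transition counts, a finite-horizon policy by backward induction, and belief propagation through the stale estimate during skip intervals. It then adds a budgeted trigger that fires once the accumulated surrogate, driven by a known per-step drift bound, crosses a threshold.

```python
import numpy as np


def mle_kernel(counts: np.ndarray) -> np.ndarray:
    """Maximum-likelihood estimate of P(s' | s, a) from counts[s, a, s'].

    Assumption: unvisited (s, a) pairs, where the MLE is undefined,
    fall back to a uniform next-state distribution.
    """
    n_states = counts.shape[2]
    totals = counts.sum(axis=2, keepdims=True)
    uniform = np.full_like(counts, 1.0 / n_states, dtype=float)
    return np.where(totals > 0, counts / np.maximum(totals, 1), uniform)


def finite_horizon_policy(P: np.ndarray, R: np.ndarray, H: int):
    """Backward induction on kernel P[s, a, s'] and reward R[s, a].

    Returns pi[t, s] (greedy action at step t) and V[t, s], with V[H] = 0.
    """
    n_states = P.shape[0]
    V = np.zeros((H + 1, n_states))
    pi = np.zeros((H, n_states), dtype=int)
    for t in range(H - 1, -1, -1):
        # Q[s, a] = R[s, a] + sum_{s'} P[s, a, s'] * V[t + 1, s']
        Q = R + P @ V[t + 1]
        pi[t] = Q.argmax(axis=1)
        V[t] = Q.max(axis=1)
    return pi, V


def propagate_belief(b: np.ndarray, P: np.ndarray, a: int) -> np.ndarray:
    """Push a state distribution one step through the (possibly stale) kernel."""
    return b @ P[:, a, :]


def regret_guided_schedule(drift_bound, budget: int, tau: float) -> list[int]:
    """Hypothetical budgeted trigger: update when a surrogate regret reaches tau.

    drift_bound[t] bounds how much the true kernel changes from step t to t+1.
    Surrogate (our assumption): skipping at step t costs the drift accumulated
    since the last update, and these costs are summed over the skip interval.
    """
    assert budget >= 1, "need at least the initial update at t = 0"
    updates = [0]            # always estimate and plan at t = 0
    remaining = budget - 1
    stale_drift = 0.0        # drift accumulated since the last update
    surrogate = 0.0          # surrogate regret of the current skip interval
    for t in range(1, len(drift_bound)):
        stale_drift += drift_bound[t - 1]
        surrogate += stale_drift
        if remaining > 0 and surrogate >= tau:
            updates.append(t)
            remaining -= 1
            stale_drift = 0.0
            surrogate = 0.0
    return updates
```

With zero drift bound for the first half of a ten-step horizon and a bound of 1 afterwards, `regret_guided_schedule([0.0] * 5 + [1.0] * 5, budget=10, tau=1.0)` places every update after step 0 in the second half. That is the kind of adaptive allocation a fixed, evenly spaced schedule cannot express. In a full loop, the robot would call `mle_kernel` and `finite_horizon_policy` at each scheduled time. Between those times it would act on its current state estimate and advance that estimate with `propagate_belief`.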