RAMP: Hybrid DRL for Online Learning of Numeric Action Models

2026-04-09, Artificial Intelligence

AI summary

The authors present RAMP, a method for learning how to plan in domains whose states include numeric quantities. Instead of relying on experts to supply the action rules in advance, the system learns them online by acting in the environment and observing the results. RAMP combines three components: a deep reinforcement learning policy trained from experience, an action model learned from past interactions, and a planner that uses that model whenever it can. In experiments on standard IPC numeric domains, RAMP solves more problems and produces better plans than PPO, a widely used deep reinforcement learning algorithm.

automated planning, action model, numeric planning, reinforcement learning, deep reinforcement learning, PPO algorithm, online learning, environment interaction, Gym environments, IPC numeric domains
Authors
Yarin Benyamin, Argaman Mordoch, Shahaf S. Shperberg, Roni Stern
Abstract
Automated planning algorithms require an action model specifying the preconditions and effects of each action, but obtaining such a model is often hard. Learning action models from observations is feasible, but existing algorithms for numeric domains are offline, requiring expert traces as input. We propose the Reinforcement learning, Action Model learning, and Planning (RAMP) strategy for learning numeric planning action models online via interactions with the environment. RAMP simultaneously trains a Deep Reinforcement Learning (DRL) policy, learns a numeric action model from past interactions, and uses that model to plan future actions when possible. These components form a positive feedback loop: the RL policy gathers data to refine the action model, while the planner generates plans to continue training the RL policy. To facilitate this integration of RL and numeric planning, we developed Numeric PDDLGym, an automated framework for converting numeric planning problems to Gym environments. Experimental results on standard IPC numeric domains show that RAMP significantly outperforms PPO, a well-known DRL algorithm, in terms of solvability and plan quality.
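
The feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and it uses neither Numeric PDDLGym, Gym, nor PPO. Everything named here is an assumption made for illustration: the `CounterEnv` domain with a Gym-style `reset`/`step` interface, an action model that stores one constant numeric delta per action, a breadth-first planner (`bfs_plan`), and a random exploration policy standing in for the DRL policy.

```python
import random
from collections import deque

ACTIONS = ("inc", "dec", "jump")
TRUE_EFFECTS = {"inc": 1, "dec": -1, "jump": 3}  # hidden from the learner


class CounterEnv:
    """Hypothetical toy numeric domain with a Gym-style reset/step interface."""

    def __init__(self, goal=10, max_steps=50):
        self.goal, self.max_steps = goal, max_steps
        self.x, self.t = 0, 0

    def reset(self):
        self.x, self.t = 0, 0
        return self.x

    def step(self, action):
        self.x += TRUE_EFFECTS[action]
        self.t += 1
        solved = self.x == self.goal
        done = solved or self.t >= self.max_steps
        return self.x, (1.0 if solved else 0.0), done


def bfs_plan(start, goal, model, lo=-100, hi=100):
    """Return a shortest action sequence under the learned deltas, or None."""
    frontier, parent = deque([start]), {start: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:
            plan = []
            while parent[s] is not None:
                s, a = parent[s]
                plan.append(a)
            return plan[::-1]
        for a, delta in model.items():
            n = s + delta
            if lo <= n <= hi and n not in parent:
                parent[n] = (s, a)
                frontier.append(n)
    return None


def run(episodes=5, seed=0):
    rng = random.Random(seed)
    env, model, solved = CounterEnv(), {}, []
    for _ in range(episodes):
        state, done, reward = env.reset(), False, 0.0
        while not done:
            plan = bfs_plan(state, env.goal, model)
            # Follow the plan when the learned model yields one; otherwise
            # fall back to the exploration policy (random here, DRL in RAMP).
            action = plan[0] if plan else rng.choice(ACTIONS)
            next_state, reward, done = env.step(action)
            # Action-model learning: record the observed numeric effect.
            model[action] = next_state - state
            # A full system would also use this transition to train the policy.
            state = next_state
        solved.append(reward > 0)
    return model, solved


if __name__ == "__main__":
    learned, outcomes = run()
    print("learned effects:", learned)
    print("episodes solved:", outcomes)
```

In this sketch the early steps are driven by exploration because the model is empty and no plan exists. Once enough effects have been observed, the planner takes over. That is the direction of the loop the abstract describes, reduced to a single integer variable.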