Task-Error Residual Learning for Real-Robot Five-Ball Juggling

2026-06-15 • Robotics

Robotics, Machine Learning

AI summary

The authors show how a robot can quickly learn to juggle three, four, and five balls by refining an existing skill with feedback that says in which direction each attempt missed. Instead of a single scalar reward, their method uses this directional task error, together with a task error model that chooses which samples to try next. With this approach the robot drops only on its first attempt and improves steadily from the second, whereas five-ball juggling typically takes humans years of practice. They also found that neither directional feedback nor an informative starting model is enough alone: reliable learning needs both.

residual learning, reinforcement learning, directional task error, sample efficiency, Newton update, Bayesian Optimization, stochastic search, robot manipulation, Barrett WAM arm, convergence

Authors

Kai Ploeger, Jan Peters

Abstract

For residual learning that refines existing behavior, sample efficiency depends on two things: how much information each rollout returns, and how efficiently the learner uses that information. Reinforcement learning's standard scalar reward carries far less information than the directional task error that defines the task. Random exploration further discards whatever information each rollout returns. Through residual learning with directional task-error supervision and a task error model that drives sample selection, we achieve stable three-, four-, and five-ball juggling on anthropomorphic Barrett WAM arms. Despite planning and controlling through a simple, idealized stack, the system converges from the second attempt. The first attempt drops, after which task error decreases monotonically without further failures. In comparison, five-ball juggling typically takes humans years of practice. We compare residual learners across two ternary axes, the directional information in the learning feedback and the commitment of the analytic prior, spanning Newton-style Jacobian updates, Composite Bayesian Optimization, and stochastic search methods. Both axes prove necessary: neither directional feedback nor an informative prior suffices alone, and the simplest method that combines them, a fixed-Jacobian Newton update, is the most reliable. The learned residual tolerates substantial prior misalignment and degraded joint tracking, affecting mainly convergence speed. The bottleneck for residual learning on real robots is therefore the information content of the supervision signal and how the learner uses it, not the accuracy of the surrounding stack. Video documentation of all experiments is available at https://kai-ploeger.com/residual-juggling.
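To make the idea of a fixed-Jacobian Newton update concrete, the following is a minimal sketch and not the authors' implementation. It assumes one common form of the update, θ ← θ − α J⁺ e(θ), where e(θ) is the vector-valued (directional) task error of one rollout and J is a Jacobian taken once from an analytic prior; the abstract does not state the exact formulation. The names `fixed_jacobian_newton`, `toy_rollout`, `A_true`, `J_prior`, and `target` are hypothetical, and the two-dimensional "plant" is an invented toy, not a juggling model.

```python
import numpy as np


def fixed_jacobian_newton(rollout, theta0, J_prior, steps=20, step_size=1.0):
    """Refine parameters from directional task error with a fixed prior Jacobian.

    rollout:  callable mapping parameters to a task-error vector (hypothetical).
    J_prior:  Jacobian of the task error w.r.t. parameters from an analytic prior.
    """
    # The pseudo-inverse is computed once; the Jacobian is never re-estimated.
    J_pinv = np.linalg.pinv(J_prior)
    theta = np.asarray(theta0, dtype=float).copy()
    errors = []
    for _ in range(steps):
        e = rollout(theta)                      # one rollout returns a full error vector
        errors.append(float(np.linalg.norm(e)))
        theta = theta - step_size * (J_pinv @ e)  # Newton-style step along the prior
    return theta, errors


# Invented toy plant (assumption): the true map is mildly nonlinear and
# deliberately differs from the prior Jacobian used by the learner.
A_true = np.array([[1.2, 0.3],
                   [-0.2, 0.9]])
target = np.array([0.5, -0.4])


def toy_rollout(theta):
    """Task error of the toy plant: achieved outcome minus desired outcome."""
    return A_true @ theta + 0.05 * np.sin(theta) - target


J_prior = np.eye(2)  # misaligned prior: identity instead of the true Jacobian
theta, errors = fixed_jacobian_newton(toy_rollout, np.zeros(2), J_prior)
```

In this toy setting the prior Jacobian is wrong but still points roughly the right way, so the error norm in `errors` contracts at every step. This illustrates, without reproducing, the abstract's claim that prior misalignment mainly affects convergence speed. A scalar reward would expose only the norm of `e`, not its direction, so the same step could not be formed from it.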
