Reinforcement Learning with Inner-loop Dynamics Estimator for Aerial Manipulation under Uncertainty

2026-06-15 • Robotics

AI summary

The authors developed a control method for aerial manipulators (flying robots with arms) that handles rapid arm motion and changing payloads without a full model of the robot's dynamics. It has two layers: an outer loop trained with Reinforcement Learning that turns a desired end-effector pose into coordinated arm and body commands, and an inner loop that tracks those commands while compensating for unexpected changes during flight. In hardware tests on a quadrotor with a 3-DoF arm under varying payloads, the method achieved lower end-effector tracking error and a higher task success rate than the RL+PID and RL+INDI+PID baselines. The results suggest that pairing learned coordination with low-level compensation makes aerial manipulation more precise and robust.

Aerial Manipulators, Reinforcement Learning, Dynamics Estimator, 6-DoF End-Effector, Quadrotor, Payload Variation, Whole-Body Control, PID Control, INDI (Incremental Nonlinear Dynamic Inversion), End-Effector Tracking

Authors

Shivansh Pratap Singh, Samaksh Ujjwal, Ishita Chaudhary, V R Vasudevan, Rishabh Dev Yadav, Spandan Roy

Abstract

Aerial manipulators enable physical interaction in hard-to-reach environments; however, direct whole-body aerial manipulation under rapid arm motion, payload changes, and the associated unknown dynamic uncertainty remains largely unsolved. We present a hierarchical control framework that combines Reinforcement Learning (RL) with an inner-loop dynamics estimator to address this problem. The RL outer loop maps desired 6-degree-of-freedom (DoF) end-effector targets to coordinated whole-body commands, enabling direct task-driven control without relying on a fully accurate coupled dynamic model in the policy layer. An inner loop then tracks these commands while compensating for transient inertial shifts and uncertainty during execution via a dynamics estimator that requires no knowledge of the system model. We validate the proposed approach on a custom quadrotor equipped with a 3-DoF manipulator through hardware experiments under varying payload conditions. Compared with RL+PID and RL+INDI+PID baselines, the proposed method reduces end-effector tracking error and improves task success rate across the tested hardware conditions. These results show that combining learned whole-body coordination with estimator-based low-level compensation improves the precision and robustness of aerial manipulation under changing operating conditions.
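
The two-layer structure described in the abstract can be illustrated with a deliberately small toy. The sketch below is not the authors' controller or estimator: it assumes a 1-D point mass, replaces the learned RL policy with a hypothetical pass-through function (`outer_policy`), and swaps in a simple low-pass-filtered lumped-disturbance estimate for the paper's dynamics estimator. All names, gains, masses, and forces are illustrative assumptions. It only shows why an inner loop that estimates unmodelled forces can remove the steady-state error that a fixed-gain tracker leaves under an unknown payload.

```python
# Toy 1-D illustration of an outer command layer plus an inner tracking
# loop with a disturbance estimate. Assumed setup, NOT the paper's method.

def outer_policy(target, position):
    """Hypothetical stand-in for the learned RL outer loop.

    In the paper this layer maps 6-DoF end-effector targets to whole-body
    commands; here it simply forwards a 1-D position setpoint.
    """
    return target


def simulate(use_estimator, steps=2000, dt=0.005):
    """Return the final absolute tracking error of the toy system."""
    mass_true = 1.5      # true mass with payload (unknown to the controller)
    mass_nominal = 1.0   # mass the controller assumes
    disturbance = -3.0   # unmodelled constant force, e.g. payload weight [N]
    kp, kd = 20.0, 8.0   # PD gains of the inner tracking loop
    alpha = 0.05         # low-pass factor of the disturbance estimate

    pos, vel = 0.0, 0.0
    d_hat = 0.0          # estimated lumped disturbance force
    target = 1.0

    for _ in range(steps):
        setpoint = outer_policy(target, pos)
        accel_des = kp * (setpoint - pos) - kd * vel

        # Inner loop: nominal inversion, optionally cancelling the estimate.
        force = mass_nominal * accel_des - (d_hat if use_estimator else 0.0)

        # True plant response (stands in for a measured acceleration).
        accel = (force + disturbance) / mass_true

        # Whatever the nominal model cannot explain is treated as a
        # lumped disturbance and low-pass filtered.
        residual = mass_nominal * accel - force
        d_hat += alpha * (residual - d_hat)

        # Semi-implicit Euler integration of the plant.
        vel += accel * dt
        pos += vel * dt

    return abs(target - pos)


if __name__ == "__main__":
    print("final error, PD only:       ", simulate(use_estimator=False))
    print("final error, PD + estimator:", simulate(use_estimator=True))
```

In this toy, the PD-only loop settles with an offset of roughly `|disturbance| / kp`, while the loop with the estimate converges to the setpoint even though the controller assumes the wrong mass. The hardware system in the paper is a coupled quadrotor and 3-DoF arm, so this sketch only conveys the division of labour between the two loops.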
