Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation

2026-06-08 • Robotics

Robotics, Artificial Intelligence

AI summary

The authors studied how to teach a computer to race a motorbike by itself in a realistic video game simulator. They combined two learning methods so that the computer improves gradually, facing harder challenges as its performance gets better. Their setup included a special way to track the bike's balance and position on the track, and rewards that encouraged good racing behavior while avoiding crashes. Their tests showed that this new approach helps the computer learn faster and ride more safely than the same learning method used without the gradual curriculum. This work is an early step in teaching AI to handle the tricky balance and controls of motorbike racing.

Autonomous Racing, Deep Reinforcement Learning, Soft Actor-Critic (SAC), Self-Paced Curriculum Learning, Motorbike Dynamics, Proprioceptive Features, Lean Angle, Reward Shaping, Physics-based Simulation, Unity Simulator

Authors

Luca Ghisi, Jacopo Essenziale, Carlo D'Eramo, Matteo Luperto

Abstract

Autonomous racing has seen remarkable progress through deep Reinforcement Learning (RL), primarily for four-wheeled vehicles. However, motorbikes introduce substantially greater complexity: the rider must manage balance and lean angle, steering and throttle control are more reactive, and the vehicle is lighter. In this work, we present a framework for training an autonomous agent to race a superbike in VRider SBK, a physics-accurate Unity-based motorbike simulator. Our approach integrates Soft Actor-Critic (SAC) with Self-Paced curriculum Deep reinforcement Learning (SPDL), which dynamically generates progressively more challenging tasks based on the agent's performance, without requiring manual curriculum design. The agent's state space comprises proprioceptive features extended with lean-angle history, along with global track features via course points. The reward signal is shaped to encourage progress along the track while penalizing instability-inducing behaviors specific to two-wheeled dynamics. Preliminary experimental results demonstrate that SPDL outperforms SAC alone in training efficiency, lap time, and driving stability across multiple tracks and motorbike models, establishing a first baseline for RL-based autonomous motorbike racing.
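
To make the state and reward description above more concrete, the following is a minimal Python sketch of a lean-angle history buffer and a shaped reward. It is not the authors' implementation: the abstract gives no formulas, so the class `LeanHistory`, the function `shaped_reward`, the history length, the choice of lean rate as the instability signal, and all weights are hypothetical placeholders chosen only for illustration.

```python
from collections import deque


class LeanHistory:
    """Fixed-length buffer of the most recent lean angles (radians).

    Hypothetical helper: the history length is an assumption.
    """

    def __init__(self, length=4):
        # Start with an upright bike (lean angle 0) in every slot.
        self.buf = deque([0.0] * length, maxlen=length)

    def push(self, lean_angle):
        # Appending to a full deque drops the oldest entry.
        self.buf.append(lean_angle)

    def features(self):
        # Oldest first, newest last; would be concatenated to the
        # other proprioceptive and track features.
        return list(self.buf)


def shaped_reward(progress_m, lean_rate, crashed,
                  w_progress=1.0, w_lean_rate=0.1, crash_penalty=10.0):
    """Reward track progress, penalize abrupt lean changes and crashes.

    All terms and weights are illustrative assumptions.
    """
    reward = w_progress * progress_m - w_lean_rate * abs(lean_rate)
    if crashed:
        reward -= crash_penalty
    return reward


history = LeanHistory(length=3)
history.push(0.1)
history.push(0.2)
state_tail = history.features()
step_reward = shaped_reward(progress_m=2.0, lean_rate=0.5, crashed=False)
```

In a setup like this, the penalty on lean rate is one plausible way to discourage instability specific to two-wheeled dynamics; the actual penalties used in the paper may differ.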
