CoPark: Learning Reactive Parking via Self-Play

2026-06-02 • Robotics


AI summary

The authors tackle the challenge of teaching autonomous cars to park precisely in an assigned slot while reacting safely to other nearby cars. They introduce CoPark, a method that combines a fixed, precomputed parking plan with a learned component that adjusts the car's motion when other cars get close. This lets cars park accurately and still yield to others when needed. Evaluated zero-shot on a new benchmark covering several parking lots, CoPark outperforms classical, imitation-learning, and reinforcement-learning baselines and shows interactive behaviors such as letting other cars go first and passing through tight corridors.

Autonomous Parking • Reinforcement Learning • Multi-Agent Systems • Residual Policy • Self-Play • Reactive Control • Action Prior • Zero-Shot Evaluation • Collision Avoidance • Lane-Level Precision

Authors

Jiarong Wei, Yanxing Chen, Sinuo Song, Yin Wu, Anna Rehr, Abhinav Valada

Abstract

Learning a single policy that reaches a goal with high geometric precision while interacting safely with nearby agents involves conflicting objectives. Precision favors commitment to a fixed geometric plan, whereas interaction requires immediate deviation when another agent intrudes, so policies optimized for one objective often fail at the other. We study this problem in the context of reactive autonomous parking, where multiple vehicles must reach assigned slots with sub-meter terminal accuracy while remaining responsive to neighboring vehicles throughout the maneuver. We propose CoPark, a multi-agent self-play reinforcement learning (RL) approach built on a residual-policy architecture. A precomputed offline plan provides a fixed action prior, while a residual head learns reactive corrections. Through self-play, the residual head learns interaction behaviors that are hard to obtain from data or scripting, while the fixed prior encodes the slot-frame geometry that prior-free policies struggle to reach reliably. The key design is a partner-threat-modulated, channel-asymmetric release of the prior: a continuous threat signal shifts authority over the longitudinal channel to the residual head to enable yielding, while the lateral channel remains anchored to the precomputed reference to preserve sub-meter slot alignment. A closed-loop refinement layer corrects the remaining terminal error caused by action-grid discretization. We train our policy on six parking lots and evaluate it zero-shot on our new reactive-parking benchmark spanning the Dragon Lake Parking (DLP) and DeepScenario Open 3D (DSC3D) datasets. CoPark achieves roughly 70–85% success with a collision rate of only 3–6%, substantially outperforming classical, imitation-learning, and large-scale RL baselines. The results also show emergent interaction behaviors such as reverse-yielding, mid-maneuver yielding, tight-corridor passing, and queuing.
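The channel-asymmetric release of the prior can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the authors' implementation: the `Action` fields, the distance-based `threat_signal` with its 6 m / 2 m thresholds, the linear blend on the longitudinal channel, and the fixed clip on the lateral residual are all hypothetical. The sketch also uses continuous commands and omits the action-grid discretization and the closed-loop refinement layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    accel: float  # longitudinal command (hypothetical units: m/s^2)
    steer: float  # lateral command (hypothetical units: rad)


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def threat_signal(gap_m: float, safe_gap_m: float = 6.0,
                  critical_gap_m: float = 2.0) -> float:
    """Hypothetical continuous threat in [0, 1] from the gap to the nearest
    partner vehicle: 0 beyond safe_gap_m, 1 inside critical_gap_m, linear
    in between. The thresholds are illustrative, not taken from the paper."""
    if gap_m >= safe_gap_m:
        return 0.0
    if gap_m <= critical_gap_m:
        return 1.0
    return (safe_gap_m - gap_m) / (safe_gap_m - critical_gap_m)


def compose_action(prior: Action, residual: Action, threat: float,
                   max_lateral_residual: float = 0.05) -> Action:
    """Channel-asymmetric composition (a sketch under the assumptions above).

    Longitudinal: authority moves linearly from the prior to the residual
    head as threat rises, so at threat = 1 the residual head alone sets the
    acceleration (e.g. to brake and yield).
    Lateral: always the prior plus a small clipped residual, independent of
    threat, so steering stays anchored to the precomputed reference.
    """
    w = clip(threat, 0.0, 1.0)
    accel = (1.0 - w) * prior.accel + w * residual.accel
    steer = prior.steer + clip(residual.steer, -max_lateral_residual,
                               max_lateral_residual)
    return Action(accel=accel, steer=steer)


if __name__ == "__main__":
    prior = Action(accel=1.0, steer=0.30)      # stand-in for the offline plan
    residual = Action(accel=-2.0, steer=0.20)  # stand-in for the learned head
    for gap in (10.0, 4.0, 1.5):  # far, approaching, intruding partner
        print(gap, compose_action(prior, residual, threat_signal(gap)))
```

With a distant partner the longitudinal command follows the plan; as the partner closes in, the residual head's braking command takes over, while the steering command is identical in all three cases. A linear blend is only one plausible way to "shift authority"; a residual added on top of a down-weighted prior would be another.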
