Set-Supervised Diffusion Policy: Learning Action-Chunking Diffusion through Corrections

2026-06-01 • Robotics

AI summary

The authors introduce Set-Supervised Diffusion Policy (SDP), a method for improving robot learning from human corrections. Unlike previous methods, which largely discard the robot's mistaken actions, their approach uses both the robot's undesired actions and the human's corrections as training signal. This helps the robot learn more reliably and cope better with noisy data. Their experiments show that SDP improves policy performance on robotic manipulation tasks and makes learning from human corrections more efficient.

Keywords: diffusion policies, behavior cloning, robotic manipulation, human-in-the-loop, distributional shift, contrastive learning, data aggregation, policy learning, action-chunks

Authors

Zhaoting Li, Gang Chen, Javier Alonso-Mora, Cosimo Della Santina, Jens Kober

Abstract

Diffusion policies have recently emerged as a powerful framework for robotic manipulation. However, like other behavior cloning methods, they remain vulnerable to distributional shift, often requiring human-in-the-loop interventions to correct failures during deployment. These interactions naturally provide paired supervision in the form of the robot's undesired actions and the human teacher's corrective actions. Yet existing data aggregation pipelines and standard behavior cloning losses largely ignore this negative signal from undesired actions, leading to overfitting to the teacher's actions and an increasing reliance on costly expert data. To address this limitation, we propose Set-Supervised Diffusion Policy (SDP), a novel learning framework that uses contrastive action-chunk data to train diffusion policies from human corrections. From paired positive and negative action-chunks, SDP constructs a set of desired action-chunks and designs a training pipeline that encourages the diffusion policy to align with the set. Through extensive experiments across multiple robotic manipulation tasks, we demonstrate that SDP consistently improves policy performance, with particularly strong gains in robustness to noisy data. Moreover, SDP induces high-quality aggregated datasets, enabling more efficient and reliable policy learning from human-in-the-loop corrections. Our code is available at https://set-supervised-diffusion-policy.github.io/.
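
The paired supervision described in the abstract can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation: the names `CorrectionPair`, `ContrastiveBuffer`, and `positives_only` are hypothetical, and the abstract does not specify how SDP builds its set of desired action-chunks or its training loss, so neither is attempted here. The sketch only illustrates keeping the robot's undesired action-chunk alongside the teacher's correction instead of discarding it.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# All names below are hypothetical illustrations, not taken from the SDP code.
ActionChunk = List[List[float]]  # horizon x action dimension


@dataclass
class CorrectionPair:
    """One human-in-the-loop intervention."""
    observation: Sequence[float]
    negative_chunk: ActionChunk  # the robot's undesired action-chunk
    positive_chunk: ActionChunk  # the teacher's corrective action-chunk


@dataclass
class ContrastiveBuffer:
    """Aggregated dataset that retains the negative signal."""
    pairs: List[CorrectionPair] = field(default_factory=list)

    def add(self, observation: Sequence[float],
            negative_chunk: ActionChunk,
            positive_chunk: ActionChunk) -> None:
        # Assumption for this sketch: both chunks cover the same horizon.
        if len(negative_chunk) != len(positive_chunk):
            raise ValueError("chunks must cover the same horizon")
        self.pairs.append(
            CorrectionPair(observation, negative_chunk, positive_chunk)
        )

    def positives_only(self) -> List[Tuple[Sequence[float], ActionChunk]]:
        """The view a standard behavior cloning pipeline would train on."""
        return [(p.observation, p.positive_chunk) for p in self.pairs]


buffer = ContrastiveBuffer()
buffer.add(
    observation=[0.0, 1.0],
    negative_chunk=[[0.5, 0.0], [0.5, 0.0]],
    positive_chunk=[[0.0, 0.5], [0.0, 0.5]],
)
```

A standard behavior cloning loss would consume only `positives_only()`. A contrastive approach such as the one the abstract describes would also read `negative_chunk` from each stored pair.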
