Unified Learning of Temporal Task Structure and Action Timing for Bimanual Robot Manipulation

2026-03-06 · Robotics

AI summary

The authors address how robots can better coordinate two-handed tasks by learning both the order and the exact timing of actions from human demonstrations. They develop a probabilistic representation that captures detailed timing between pairs of actions, and a method that enumerates all logically consistent orderings of a task's actions. By combining these with optimization techniques, their system creates plans for robots whose timing closely matches that of humans during bimanual tasks. Their experiments show the method produces more human-like timing than previous approaches, bridging the gap between high-level planning and precise movement coordination.
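The timing-learning idea above can be sketched with a multivariate Gaussian Mixture Model fitted to 3-D timing samples between two actions. This is an illustrative approximation, not the authors' implementation: it assumes scikit-learn, and the three feature axes used here (start offset, end offset, duration difference) are placeholders, since the page does not spell out the exact dimensions.

```python
# Hedged sketch: fit a multivariate GMM to 3-D timing features of an
# action pair, then score a candidate timing under the learned model.
# Feature axes and data are illustrative assumptions, not the paper's.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic "demonstrations": each row is (start offset, end offset,
# duration difference) between two actions, drawn from two timing modes.
samples = np.vstack([
    rng.normal([0.5, 0.8, 0.2], 0.05, size=(30, 3)),   # timing mode 1
    rng.normal([1.2, 1.5, -0.1], 0.05, size=(30, 3)),  # timing mode 2
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(samples)

# A candidate timing near a learned mode scores a much higher
# log-likelihood than an implausible one far from both modes.
near_mode = gmm.score_samples(np.array([[0.5, 0.8, 0.2]]))[0]
far_away = gmm.score_samples(np.array([[5.0, 5.0, 5.0]]))[0]
```

A planner could use such per-pair likelihoods as soft subsymbolic constraints, preferring candidate schedules whose pairwise timings score highly under the learned mixtures.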

bimanual manipulation, temporal task structure, symbolic temporal relations, subsymbolic temporal constraints, Gaussian Mixture Models, Davis-Putnam-Logemann-Loveland algorithm, Allen relations, task planning, robot execution, optimization-based planning
Authors
Christian Dreher, Patrick Dormanns, Andre Meixner, Tamim Asfour
Abstract
Temporal task structure is fundamental for bimanual manipulation: a robot must not only know that one action precedes or overlaps another, but also when each action should occur and how long it should take. While symbolic temporal relations enable high-level reasoning about task structure and alternative execution sequences, concrete timing parameters are equally essential for coordinating two hands at the execution level. Existing approaches address these two levels in isolation, leaving a gap between high-level task planning and low-level movement synchronization. This work presents an approach for learning both symbolic and subsymbolic temporal task constraints from human demonstrations and deriving executable, temporally parametrized plans for bimanual manipulation. Our contributions are (i) a 3-dimensional representation of timings between two actions with methods based on multivariate Gaussian Mixture Models to represent temporal relationships between actions on a subsymbolic level, (ii) a method based on the Davis-Putnam-Logemann-Loveland (DPLL) algorithm that finds and ranks all contradiction-free assignments of Allen relations to action pairs, representing different modes of a task, and (iii) an optimization-based planning system that combines the identified symbolic and subsymbolic temporal task constraints to derive temporally parametrized plans for robot execution. We evaluate our approach on several datasets, demonstrating that our method generates temporally parametrized plans closer to human demonstrations than the most characteristic demonstration baseline.
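The symbolic layer described in contribution (ii) assigns Allen relations to pairs of action intervals. As a minimal sketch of that vocabulary (not the authors' DPLL-based enumeration), the relation between two concrete time intervals can be classified by comparing their endpoints:

```python
# Hedged sketch: classify the Allen relation between two closed time
# intervals a = (start, end) and b = (start, end). This covers the 13
# standard Allen relations; the paper's DPLL-based method reasons over
# assignments of such relations, which this simple classifier does not.
def allen_relation(a, b):
    (a_s, a_e), (b_s, b_e) = a, b
    if a_e < b_s:
        return "before"
    if b_e < a_s:
        return "after"
    if a_e == b_s:
        return "meets"
    if b_e == a_s:
        return "met-by"
    if a_s == b_s and a_e == b_e:
        return "equals"
    if a_s == b_s:
        return "starts" if a_e < b_e else "started-by"
    if a_e == b_e:
        return "finishes" if a_s > b_s else "finished-by"
    if b_s < a_s and a_e < b_e:
        return "during"
    if a_s < b_s and b_e < a_e:
        return "contains"
    return "overlaps" if a_s < b_s else "overlapped-by"
```

Classifying the relations observed in each demonstration is a natural first step; finding all contradiction-free assignments across every action pair, as the paper does with DPLL, additionally requires propagating consistency constraints between pairs.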