TiPToP: A Modular Open-Vocabulary Planning System for Robotic Manipulation

2026-03-10

Robotics
AI summary

The authors developed TiPToP, a modular system that combines pretrained vision models with a task and motion planner to perform multi-step robot manipulation tasks from RGB images and natural-language instructions. The system is designed to be easy to set up and can be adapted to different robots without any robot-specific training data. They tested TiPToP on 28 tabletop manipulation tasks in simulation and the real world, where it performed as well as or better than a model fine-tuned on 350 hours of embodiment-specific demonstrations. Because the system is modular, the authors can trace failures to individual components, which helps guide future improvements.

vision foundation models, task and motion planning (TAMP), robot manipulation, natural language instructions, multi-step tasks, robot embodiment, pretrained models, simulation, robot learning, modular systems
Authors
William Shen, Nishanth Kumar, Sahit Chintalapudi, Jie Wang, Christopher Watson, Edward Hu, Jing Cao, Dinesh Jayaraman, Leslie Pack Kaelbling, Tomás Lozano-Pérez
Abstract
We present TiPToP, an extensible modular system that combines pretrained vision foundation models with an existing Task and Motion Planner (TAMP) to solve multi-step manipulation tasks directly from input RGB images and natural-language instructions. Our system aims to be simple and easy to use: it can be installed and run on a standard DROID setup in under one hour and adapted to new embodiments with minimal effort. We evaluate TiPToP, which requires zero robot data, on 28 tabletop manipulation tasks in simulation and the real world and find that it matches or outperforms $\pi_{0.5}$-DROID, a vision-language-action (VLA) model fine-tuned on 350 hours of embodiment-specific demonstrations. TiPToP's modular architecture enables us to analyze the system's failure modes at the component level. We analyze results from an evaluation of 173 trials and identify directions for improvement. We release TiPToP as open source to further research on modular manipulation systems and tighter integration between learning and planning. Project website and code: https://tiptop-robot.github.io