KPGrasp: Scalable Keypoint Flow Matching for Dexterous Grasp Generation

2026-06-08

Robotics
AI summary

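The authors developed KPGrasp, a method for generating grasps for dexterous robot hands more reliably. Instead of relying on carefully tuned contact losses or costly contact-based refinement at inference time, it learns grasp priors from large datasets by representing the hand as a set of 3D keypoints and predicting them with a Transformer flow-matching model. Because the keypoints are expressed in the same frame as the object point cloud, the model can reason directly about where the fingers sit relative to the object. In simulation, the method reaches a 76.3% grasp success rate on the Dexonomy benchmark with a penetration depth of 2.4 mm, and the same model achieves the best average performance on the DexGrasp Anything benchmark without fine-tuning. Batched inference takes 0.032 s per grasp, and the pipeline was also tested in a real-world setup on 20 diverse objects.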

dexterous grasping, hand keypoints, Transformer model, flow matching, 3D object point cloud, robotic grasping, SE(3) pose, penetration depth, simulation benchmark
Authors
Yuansen Huang, Jiayi Chen, Haoran Liu, Yubin Ke, Bing Han, Jiangran Lyu, Mi Yan, Li Yi, He Wang
Abstract
Generating high-quality dexterous grasps remains challenging for learning-based methods, which often depend on carefully tuned contact losses or costly contact-based test-time refinement. We present KPGrasp, a flow-matching framework that instead learns dexterous grasp priors from large-scale data. KPGrasp couples an all-Euclidean 3D hand-keypoint parameterization with a simple yet scalable Transformer flow model. The parameterization avoids the drawbacks of the conventional mixed SE(3) pose and joint-angle output space, expresses grasps in the same frame as the object point cloud, and thus enables native spatial reasoning; the Transformer flow model is trained with only the standard flow-matching loss and scales effectively with data, model capacity, and batch size. Experiments demonstrate state-of-the-art performance on two simulation benchmarks. On the Dexonomy benchmark, KPGrasp reaches a 76.3% grasp success rate, improving over the strongest directly comparable baseline by 47.4% while reducing penetration depth to 2.4 mm. The same model also achieves the best average performance on the DexGrasp Anything benchmark without fine-tuning. For batched inference, KPGrasp requires only 0.032 s per grasp. Finally, experiments on 20 diverse objects demonstrate that the pipeline can be deployed in a real-world setup.
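
To make the "standard flow-matching loss" over Euclidean keypoints concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes the common linear interpolation path between Gaussian noise and data, which the abstract does not specify. The keypoint count (NUM_KEYPOINTS = 21), the function names, and the Euler sampler are hypothetical choices made for illustration, and the Transformer is replaced by an arbitrary velocity function.

```python
import numpy as np

NUM_KEYPOINTS = 21  # hypothetical; the abstract does not state the keypoint count


def flow_matching_batch(x1, rng):
    """Build one training batch for linear-path flow matching (assumed path).

    x1: (B, K, 3) target hand keypoints, expressed in the object point-cloud frame.
    Returns the interpolated sample x_t, the time t, and the target velocity.
    """
    x0 = rng.standard_normal(x1.shape)          # Gaussian noise sample
    t = rng.uniform(size=(x1.shape[0], 1, 1))   # one time value per grasp
    x_t = (1.0 - t) * x0 + t * x1               # point on the straight noise-to-data path
    v_target = x1 - x0                          # constant velocity along that path
    return x_t, t, v_target


def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))


def sample_keypoints(velocity_fn, batch, rng, steps=10):
    """Integrate dx/dt = v(x, t) from noise (t=0) to keypoints (t=1) with Euler steps.

    velocity_fn stands in for the trained, object-conditioned flow model.
    """
    x = rng.standard_normal((batch, NUM_KEYPOINTS, 3))
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full((batch, 1, 1), i * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

In training, a model would predict v_pred from x_t, t, and the object point cloud, and flow_matching_loss would be the only objective. Because every quantity is a 3D point in the object's frame, no separate handling of rotations or joint angles is needed in this sketch.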