Flow6D: Discrete-to-Continuous Flow Matching for Efficient and Accurate Category-Level 6D Pose Estimation
2026-06-22 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Robotics
AI summary
The authors address 6D pose estimation, the problem of estimating an object's 3D position and orientation, which is important for tasks such as robot control and augmented reality. They introduce Flow6D, a method that splits the problem into two stages: it first narrows down the candidate poses by discretizing rotation and translation into bins, then refines the estimate continuously within those bins. This coarse-to-fine approach reduces search complexity and improves accuracy, and it extends to articulated objects (objects with moving parts). The authors report that it is faster and more accurate than previous methods on both synthetic and real datasets.
6D pose estimation, computer vision, embodied AI, rotation and translation, discrete latent space, flow matching, pose regression, articulated objects, real-time inference, robotic manipulation
Authors
Mingyu Mei, Li Zhang, Zibo Dai, Han Sun, Xinyue Zhao, Huiliang Shen, Zaixing He
Abstract
6D pose estimation is a key task in computer vision and embodied AI, with wide use in applications such as robotic manipulation and augmented reality. Existing methods regress poses directly in a high-dimensional continuous space and therefore face two key challenges in category-level pose estimation: limited accuracy caused by noise and local optima, and inefficient search over an infinite space, which hinders real-time performance. This paper proposes Flow6D, a hierarchical flow matching framework with a two-stage strategy: localization in a discrete latent space followed by continuous pose regression. Rotation and translation parameters are first discretized into bins, and a discrete flow matching model localizes the region of the latent space around the true pose, reducing search complexity. Then, sampling from this latent space, a continuous flow matching model predicts local pose residuals that refine the estimate into an accurate pose. The framework also extends naturally to articulated objects and outperforms state-of-the-art methods on synthetic and real datasets, with real-time inference at 70 FPS. Project website: https://flow6d.github.io/.
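
The discrete-then-continuous decomposition described in the abstract can be illustrated with a minimal sketch for the translation component only. This is not the authors' implementation: the value range, the bin count, and the names `to_bin`, `bin_center`, and `integrate_flow` are assumptions made for illustration, and both learned models are replaced by oracles (the true bin index and a straight-line velocity field) so that the example is self-contained.

```python
import numpy as np

# Hypothetical settings (not from the paper): translation range and bin count.
LO, HI, N_BINS = -1.0, 1.0, 20


def to_bin(x, lo=LO, hi=HI, n_bins=N_BINS):
    """Map continuous values in [lo, hi] to integer bin indices in [0, n_bins - 1]."""
    width = (hi - lo) / n_bins
    idx = np.floor((np.asarray(x, dtype=float) - lo) / width).astype(int)
    return np.clip(idx, 0, n_bins - 1)


def bin_center(idx, lo=LO, hi=HI, n_bins=N_BINS):
    """Return the continuous value at the center of each bin."""
    width = (hi - lo) / n_bins
    return lo + (np.asarray(idx) + 0.5) * width


def integrate_flow(x0, velocity_fn, n_steps=10):
    """Euler-integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


t_true = np.array([0.123, -0.456, 0.789])  # ground-truth translation (made up)

# Stage 1 (coarse): an oracle stands in for the discrete flow matching model.
idx = to_bin(t_true)
center = bin_center(idx)

# Stage 2 (fine): an oracle stands in for the continuous flow matching model.
# For a straight-line path x_t = (1 - t) * x0 + t * x1, the velocity is x1 - x0.
residual_true = t_true - center
x0 = 0.01 * np.random.default_rng(0).normal(size=3)  # noise sample at t = 0


def velocity(x, t):
    return residual_true - x0  # constant along the straight-line path


residual_pred = integrate_flow(x0, velocity)
t_pred = center + residual_pred
```

In this sketch the residual is bounded by half a bin width, so the continuous stage only has to cover a small local range instead of the whole pose space. Rotation is omitted because binning and refining rotations requires a parameterization choice that the abstract does not specify.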