Pose Anything Anywhere: Model-free Object Poses from Arbitrary References

2026-06-22 · Computer Vision and Pattern Recognition

AI summary

The authors introduced PANY, a new method that helps robots figure out the exact position and orientation of objects they've never seen before. Unlike older techniques that need detailed 3D models or work poorly when parts of the object are hidden, PANY uses multiple views and learns to understand the object's shape from different angles, even if they don't perfectly match up. It works with regular or depth cameras and can use several reference images without needing pose information. Tests showed PANY is more accurate than previous methods, especially when the robot has just one or a few photos to work from, making it reliable in real-world situations.

6D pose estimation, model-free methods, RGB-D inputs, multi-view transformer, cross-view alignment, pose-graph registration, view-consistent geometry, open-world robotics, object localization, pose accuracy
Authors
Hongli Xu, Jiaqi Hu, Junwen Huang, Boyang Zhong, Peter KT Yu, Nassir Navab, Benjamin Busam, Slobodan Ilic
Abstract
Estimating the 6D pose of unseen objects is a fundamental yet challenging problem for open-world robotics and embodied perception. Model-based methods are accurate but depend on CAD assets or heavy onboarding, while most model-free approaches are still limited to pairwise single-anchor matching and thus fail under occlusion and under large viewpoint changes with low query-reference overlap. We present PANY, a unified model-free framework that supports both RGB and RGB-D inputs, operates on a single reference view or a sparse set of pose-free reference views, and generalizes to novel objects. Built on a multi-view transformer geometry backbone, PANY moves beyond pairwise matching by learning view-consistent geometry and cross-view alignment cues that remain stable under wide baselines and limited overlap. When additional unposed assist views are available, PANY aggregates them via pose-graph canonical registration to increase geometric coverage and reinforce the final pose. Extensive experiments show that PANY achieves state-of-the-art performance across multiple benchmarks and substantially outperforms existing model-free methods, improving pose accuracy by 12% on YCB-V and by more than 20% on LM-O. PANY also performs consistently well under both single-reference and sparse-reference settings, demonstrating strong robustness in real-world environments.
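
To make the idea of aggregating assist views in a canonical frame more concrete, the following is a minimal sketch and not the authors' implementation; the abstract does not specify how PANY's pose-graph canonical registration is computed. It assumes each view already has an estimated rigid transform into the reference (canonical) frame and an estimated transform from the query into that view, composes the two along each path, and fuses the resulting candidates with a chordal rotation mean and a translation mean. All function names (make_pose, rot_z, fuse_poses, register_query) and the toy transforms are hypothetical.

```python
import numpy as np

def make_pose(R, t):
    """Build a 4x4 rigid transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(deg):
    """Rotation about the z-axis by `deg` degrees."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def fuse_poses(poses):
    """Fuse candidates: chordal mean of rotations (SVD projection onto SO(3))
    plus the arithmetic mean of translations. An assumed, simple fusion rule."""
    M = sum(T[:3, :3] for T in poses)
    U, _, Vt = np.linalg.svd(M)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt
    t = np.mean([T[:3, 3] for T in poses], axis=0)
    return make_pose(R, t)

def register_query(T_ref_from_view, T_view_from_query):
    """For every view k, chain query -> view k -> reference, then fuse."""
    candidates = [T_ref_from_view[k] @ T_view_from_query[k]
                  for k in T_view_from_query]
    return fuse_poses(candidates)

# Toy, noise-free example with one reference view and one assist view.
T_true = make_pose(rot_z(30.0), [0.1, 0.0, 0.5])  # query -> reference
T_ref_from_view = {
    "ref": np.eye(4),                               # reference defines the frame
    "assist": make_pose(rot_z(90.0), [0.2, 0.0, 0.0]),
}
# Simulated relative estimates from the query into each view.
T_view_from_query = {k: np.linalg.inv(T) @ T_true
                     for k, T in T_ref_from_view.items()}

T_est = register_query(T_ref_from_view, T_view_from_query)
print(np.round(T_est, 3))
```

In this noise-free toy case every path yields the same candidate, so the fused transform equals the ground truth; with noisy per-view estimates, the averaging step is what would let extra assist views stabilize the result.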