Z-FLoc: Zero-Shot Floorplan Localization via Geometric Primitives

2026-06-03

Computer Vision and Pattern Recognition, Robotics
AI summary

The authors study how to determine where a camera is inside a building using only its floorplan. Instead of relying on large amounts of training data, they detect basic shapes that are common in buildings, namely lines and circles, and match them to the floorplan. The shapes are extracted from a top-down (bird's-eye-view) projection of a 3D reconstruction computed from monocular camera input, then aligned with the floorplan using minimal solvers inside a robust estimation framework. The method needs no retraining for new buildings and outperforms state-of-the-art learning-based methods on unseen environments.

Visual Localization, Camera Pose Estimation, Floorplan, Domain Gap, Geometric Primitives, Bird's-Eye-View Projection, Monocular 3D Reconstruction, Minimal Solvers, Robust Estimation, Zero-shot Learning
Authors
Ayumi Umemura, Toshinori Kuwahara, Marc Pollefeys, Daniel Barath
Abstract
Visual localization -- estimating a camera pose within a pre-existing map -- is a fundamental problem in computer vision. Floorplans are an attractive map representation: they are readily available for most buildings, compact, and inherently invariant to visual appearance changes. However, bridging the severe domain gap between camera observations and floorplan geometry remains challenging. Existing methods address this gap through data-driven learning, yet they require large-scale training data and environment-specific retraining, limiting their practical deployment. We propose a zero-shot floorplan localization method that generalizes to novel environments without any retraining. Our key insight is that dominant geometric primitives -- lines and circles -- are ubiquitous in human-made environments and provide appearance-invariant structural constraints. We extract these primitives from a bird's-eye-view (BEV) projection of monocular 3D reconstructions and match them to the floorplan via dedicated minimal solvers within a robust estimation framework. Experiments on both simulated and real-world datasets show that our approach outperforms state-of-the-art learning-based methods on unseen environments, while using a single fixed set of hyperparameters across all experiments. The source code will be made publicly available.
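To make the idea of a minimal solver over line primitives concrete, the following is an illustrative sketch only, not the authors' implementation. It assumes the camera pose in the floorplan reduces to a 2D rigid transform (rotation `theta` and translation `t`) once the reconstruction is projected to BEV, that the reconstruction is already at metric scale, and that matched lines have consistently oriented unit normals. Each line is written as a pair `(n, d)` meaning all points `x` with `n . x = d`. The function names `transform_line` and `pose_from_two_lines` are hypothetical. Under these assumptions, two non-parallel line correspondences are enough to fix the pose: the first pair of normals gives the rotation, and the two offset differences give a 2x2 linear system for the translation.

```python
import math


def rotate(theta, v):
    """Rotate the 2D vector v counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def transform_line(line, theta, t):
    """Map the line {x : n . x = d} under x' = R(theta) x + t.

    The normal rotates with R, and the offset shifts by n' . t.
    """
    n, d = line
    n_new = rotate(theta, n)
    return n_new, d + n_new[0] * t[0] + n_new[1] * t[1]


def pose_from_two_lines(bev_lines, map_lines, eps=1e-9):
    """Estimate (theta, t) from two BEV-to-floorplan line correspondences.

    Assumes unit normals with matching orientation (hypothetical setup).
    Returns None when the two floorplan lines are (near-)parallel, since
    the translation along the lines is then unconstrained.
    """
    (n_c1, d_c1), (_n_c2, d_c2) = bev_lines
    (n_f1, d_f1), (n_f2, d_f2) = map_lines

    # Rotation: angle that takes the first BEV normal onto its floorplan match.
    theta = math.atan2(n_f1[1], n_f1[0]) - math.atan2(n_c1[1], n_c1[0])

    # Translation: solve n_fi . t = d_fi - d_ci for i = 1, 2 (Cramer's rule).
    det = n_f1[0] * n_f2[1] - n_f1[1] * n_f2[0]
    if abs(det) < eps:
        return None
    b1, b2 = d_f1 - d_c1, d_f2 - d_c2
    tx = (b1 * n_f2[1] - n_f1[1] * b2) / det
    ty = (n_f1[0] * b2 - b1 * n_f2[0]) / det
    return theta, (tx, ty)


# Synthetic check: build floorplan lines from a known pose, then recover it.
bev = [((1.0, 0.0), 3.0), ((0.0, 1.0), 1.5)]
true_theta, true_t = 0.7, (2.0, -1.0)
floor = [transform_line(line, true_theta, true_t) for line in bev]
theta, t = pose_from_two_lines(bev, floor)
```

In a robust estimation framework, a solver of this kind would typically be called on many randomly sampled correspondence pairs, with each hypothesis scored by how many other primitives it aligns. That loop, the handling of normal sign ambiguity, and circle primitives are left out of this sketch.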