Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation

2026-06-22 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Graphics

AI summary

The authors present Arbor, a trainable add-on that helps 3D generative models respect spatial constraints when generating objects. Instead of relying only on text or images, Arbor uses 3D shapes called constraint meshes to specify where an object should have geometry, which regions it must leave empty, and where it should make contact. These constraints guide the model during generation as local requirements rather than exact blueprints, which gives more precise control over where geometry ends up. Tests show that Arbor improves how well these spatial constraints are followed while keeping the generated objects varied and of high quality.
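
The three constraint types can be made concrete with a small compliance check over points sampled from a generated object. This is a minimal sketch, not Arbor's benchmark metric: it assumes each constraint mesh is approximated by an axis-aligned box, and the scoring rules (`hull_coverage`, `avoid_score`, `touch_ok`), the grid resolution `n`, and the contact tolerance `tol` are hypothetical choices for illustration only.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


class Kind(Enum):
    HULL = "hull"    # geometry should exist inside the region
    AVOID = "avoid"  # the region should remain empty
    TOUCH = "touch"  # the object should make contact with the region


@dataclass
class Box:
    """Axis-aligned box standing in for a constraint mesh (hypothetical simplification)."""
    kind: Kind
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        return all(lo <= x <= hi for lo, x, hi in zip(self.lo, p, self.hi))

    def distance(self, p: Point) -> float:
        # Euclidean distance from p to the box; 0.0 when p lies inside it.
        d = [max(lo - x, 0.0, x - hi) for lo, x, hi in zip(self.lo, p, self.hi)]
        return math.sqrt(sum(v * v for v in d))


def hull_coverage(box: Box, points: Sequence[Point], n: int = 2) -> float:
    # Fraction of the box's n^3 sub-cells that contain at least one point.
    # Assumes a non-degenerate box (hi > lo on every axis).
    occupied = set()
    for p in points:
        if box.contains(p):
            occupied.add(tuple(
                min(int((x - lo) / (hi - lo) * n), n - 1)
                for lo, x, hi in zip(box.lo, p, box.hi)
            ))
    return len(occupied) / n ** 3


def avoid_score(box: Box, points: Sequence[Point]) -> float:
    # 1.0 when no sampled point falls inside the avoidance region.
    return 1.0 - sum(box.contains(p) for p in points) / len(points)


def touch_ok(box: Box, points: Sequence[Point], tol: float = 0.01) -> bool:
    # True if the closest sampled point is within tol of the touch region.
    return min(box.distance(p) for p in points) <= tol


def evaluate(constraints: Sequence[Box], points: Sequence[Point]) -> List[float]:
    """One score in [0, 1] per constraint, in input order."""
    scores = []
    for c in constraints:
        if c.kind is Kind.HULL:
            scores.append(hull_coverage(c, points))
        elif c.kind is Kind.AVOID:
            scores.append(avoid_score(c, points))
        else:
            scores.append(1.0 if touch_ok(c, points) else 0.0)
    return scores


# Toy example: three points sampled from a "generated" object.
pts = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (0.5, 0.5, 1.0)]
hull = Box(Kind.HULL, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
avoid = Box(Kind.AVOID, (2.0, 2.0, 2.0), (3.0, 3.0, 3.0))
touch = Box(Kind.TOUCH, (0.0, 0.0, 1.0), (1.0, 1.0, 1.2))
print(evaluate([hull, avoid, touch], pts))  # [0.25, 1.0, 1.0]
```

Here the hull score is low because the points occupy only two of the eight hull sub-cells, while the avoidance region stays empty and one point lies on the touch region. Real constraint meshes would need point-in-mesh and point-to-mesh distance queries instead of box tests.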

3D generation, latent space, constraint meshes, text-conditioned models, denoiser, spatial control, 3D modeling, tokenization, mesh constraints, object generation

Authors

Jan-Niklas Dihlmann, Andreas Engelhardt, Simon Donne, Hendrik P. A. Lensch, Mark Boss

Abstract

Text- and image-conditioned 3D models now generate convincing assets, but they still offer little direct control over the space an object should occupy or avoid. In authoring, this spatial intent is often known before generation starts: a chair should fit a seating envelope, a prop should leave clearance for motion, or a part should expose a contact surface. Prompts and image views are poor carriers for such constraints, which calls for an explicit control interface. We present Arbor, a trainable attachment for text-conditioned latent 3D generation. Arbor introduces constraint meshes as a native 3D control interface with three region types: hull regions where geometry should exist, avoidance regions that should remain empty, and touch regions the object should contact. Unlike completion or whole-object scaffold control, these meshes are not target evidence; they are local, typed requirements and can include regions where no surface should appear. Arbor keeps this signal as geometry by converting constraint meshes into tokens and learning a routed attachment inside a frozen denoiser, so each latent region can receive the part of the constraint that matters for its spatial location. We evaluate Arbor on automatic and artist-curated control benchmarks with hull, avoidance, and touch constraints, and compare the metric trends against a user preference study. Even without dedicated compliance losses, Arbor improves constraint obedience while preserving object quality and variation under fixed constraints.
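
The abstract describes turning constraint meshes into tokens and a routed attachment inside a frozen denoiser, so that each latent region receives only the constraint information near it. The sketch below illustrates one plausible reading under assumptions that are ours, not the paper's: one token per triangle (centroid plus a one-hot constraint type), routing as a hard radius mask followed by a softmax over negative squared distances in place of learned query/key projections, and a residual gate initialized at zero so the frozen features are unchanged at the start of training. The names `tokenize`, `routed_attention`, and `attach`, and the parameters `radius`, `tau`, `proj`, and `gate`, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

TYPES = ("hull", "avoid", "touch")
Vec3 = Tuple[float, float, float]
Token = Tuple[Vec3, List[float]]


def tokenize(triangles: Sequence[Sequence[Vec3]], kind: str) -> List[Token]:
    """One token per triangle: its centroid plus a one-hot constraint type."""
    onehot = [1.0 if t == kind else 0.0 for t in TYPES]
    return [
        (tuple(sum(v[i] for v in tri) / 3.0 for i in range(3)), onehot)
        for tri in triangles
    ]


def routed_attention(latent_pos: Sequence[Vec3], tokens: Sequence[Token],
                     radius: float = 0.5, tau: float = 0.1) -> List[List[float]]:
    """Each latent attends only to constraint tokens within `radius` of it.

    Logits are -squared_distance / tau, a fixed stand-in for learned
    query/key projections. Latents with no token in range receive zeros.
    """
    out = []
    for p in latent_pos:
        logits, values = [], []
        for centroid, value in tokens:
            d2 = sum((a - b) ** 2 for a, b in zip(p, centroid))
            if d2 <= radius ** 2:  # hard locality mask (the "routing")
                logits.append(-d2 / tau)
                values.append(value)
        if not logits:
            out.append([0.0] * len(TYPES))
            continue
        m = max(logits)  # subtract the max for a numerically stable softmax
        w = [math.exp(x - m) for x in logits]
        z = sum(w)
        out.append([sum(wi * v[k] for wi, v in zip(w, values)) / z
                    for k in range(len(TYPES))])
    return out


def attach(features: Sequence[Sequence[float]], routed: Sequence[Sequence[float]],
           proj: Sequence[Sequence[float]], gate: float = 0.0) -> List[List[float]]:
    """Residual update h + gate * (proj @ r); gate = 0 reproduces the frozen features."""
    return [
        [h + gate * sum(pk * rk for pk, rk in zip(proj[j], r)) for j, h in enumerate(row)]
        for row, r in zip(features, routed)
    ]


# Toy example: one hull triangle near the origin, one avoidance triangle near (2, 2, 2).
hull_tri = [((0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.0, 0.3, 0.0))]
avoid_tri = [((2.0, 2.0, 2.0), (2.3, 2.0, 2.0), (2.0, 2.3, 2.0))]
tokens = tokenize(hull_tri, "hull") + tokenize(avoid_tri, "avoid")
latent_pos = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0), (5.0, 5.0, 5.0)]
routed = routed_attention(latent_pos, tokens)
print(routed)  # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

features = [[0.5, -0.5] for _ in latent_pos]  # stand-in for frozen denoiser features
proj = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]     # stand-in for a trainable projection
```

In this toy setup the latent near the origin sees only the hull token, the latent near (2, 2, 2) sees only the avoidance token, and the distant latent sees nothing. In this sketch only `proj` and `gate` would be trained while the denoiser stays frozen; starting `gate` at zero is a common way to make such an attachment a no-op at initialization, assumed here rather than taken from the paper.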
