OMGTex: One-stage Multi-style Facial Texture Reconstruction without Geometry Guidance

2026-05-25 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition

AI summaryⓘ

The authors propose OMGTex, a new method that creates detailed and editable face textures from different styled images without relying on 3D geometry, which is usually hard to estimate. They introduce a way to fix texture alignment problems during image generation and design a training approach that helps the model understand and edit specific facial regions. To support this, they created CANVAS, a dataset with a variety of realistic and stylized face textures. Their approach is robust, works across diverse image styles, and achieves top results on several benchmarks.

facial UV texturediffusion models3D geometry priorstexture reconstructionsemantic disentanglementgradient-guided refinementmulti-style imagestexture editingCANVAS datasetstyle transfer

Authors

Zitong Xiao, Yuda Qiu, Zisheng Ye, Xiaoguang Han

Abstract

We propose OMGTex, an end-to-end diffusion-based framework for reconstructing high-quality and editable facial UV textures from multi-style facial images. Existing texture reconstruction methods face two major limitations: (1) Fragility due to reliance on 3D geometry priors, which are difficult to estimate accurately, especially under facial occlusions or in stylized domains; and (2) A lack of semantic disentanglement, inhibiting region-specific texture editing and style transfer. Our work addresses both challenges simultaneously. Our core innovation is a geometry-free pipeline that directly maps a 2D face image to its corresponding editable UV texture. We introduce two key techniques: First, to address the challenge of UV misalignment common in diffusion generation, we introduce a gradient-guided refinement strategy at inference time, which explicitly corrects structural consistency. Second, we leverage the inherent semantic distribution capability of diffusion models and design a novel training paradigm to enhance this tendency, enabling semantic-aware editing of facial texture. Furthermore, to address the data scarcity in multi-style texture reconstruction, we construct CANVAS, the first comprehensive paired texture reconstruction dataset covering realistic and diverse stylized domains. To the best of our knowledge, OMGTex is the first geometry-free inference framework that achieves robust, style-consistent, and editable facial texture reconstruction across diverse domains. Our method achieves state-of-the-art performance on multiple facial texture benchmarks.

View PDFOpen arXiv