HAFMat: Hybrid Priors Guided Adaptive Fusion for Single-Image Human Material Estimation
2026-06-15 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Graphics
AI summary
The authors present HAFMat, a method for estimating the materials of a person's clothing and skin from a single photo. The problem is difficult because lighting, body shape, and surface reflectance are entangled in the observed image. The method draws on several kinds of hints: the person's appearance, body geometry and structure, and material predictions from pre-trained models. It then fuses these hints with the network's own features at different decoding stages, so that each kind of hint is used at the level where it helps most. Experiments on synthetic and real images show that it outperforms existing methods, both in material estimation and in relighting.
Physically Based Rendering (PBR), Material Estimation, Appearance Decomposition, Single-Image Analysis, Guidance Maps, Multi-layer Adaptive Feature Fusion, Semantic Cues, Texture Cues, Relighting, Decoder Features
Authors
Yu Jiang, Jiahao Xia, Jiongming Qin, Jianchi Sun, Chunxia Xiao
Abstract
Physically based rendering (PBR) material estimation is a fundamental appearance decomposition task with broad applications in virtual content creation, relighting, and digital human rendering. However, estimating PBR materials from a single human image remains highly ill-posed, since illumination, geometry, and reflectance are heavily entangled in the observed appearance. To mitigate this ambiguity, we propose HAFMat, a hybrid-prior-guided framework for single-image human material estimation. Our method introduces guidance maps that encode complementary cues, including appearance, body geometry, structure, and prior material predictions from pre-trained models. A key observation is that these guidance cues are heterogeneous: some cues mainly provide texture-level constraints, while others convey higher-level semantic information. To exploit this property, we design a Multi-layer Adaptive Feature Fusion Mechanism, which adaptively fuses guidance features with decoder features at different stages. This design enables texture-dominant and semantic-dominant cues to guide material decoding at appropriate levels, leading to more accurate and physically plausible material estimation. Extensive experiments on both synthetic and real data demonstrate that our method achieves state-of-the-art performance in material estimation and downstream relighting.
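The abstract does not say how the Multi-layer Adaptive Feature Fusion Mechanism computes its fusion weights. The following is a rough illustration only, assuming a simple gated scheme that is not confirmed as the authors' design. At each decoder stage, every guidance cue is scored against the decoder feature using a scaled dot product plus a per-stage bias. The scores are normalized with a softmax, and the weighted guidance features are added to the decoder feature. The function name `fuse_stage`, the `stage_bias` argument, and the cue labels `texture` and `semantic` are hypothetical. Features are flattened to short vectors rather than spatial feature maps.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def fuse_stage(
    decoder_feat: Vector,
    guidance: Dict[str, Vector],
    stage_bias: Dict[str, float],
) -> Tuple[Vector, Dict[str, float]]:
    """Hypothetical adaptive fusion for ONE decoder stage (not the paper's exact method).

    Each guidance cue gets a score = scaled similarity to the decoder feature
    + a per-stage bias (a stand-in for learned stage preferences). Softmax turns
    the scores into weights, and the weighted cues are added residually.
    """
    d = len(decoder_feat)
    for name, g in guidance.items():
        if len(g) != d:
            raise ValueError(f"cue {name!r} has dim {len(g)}, expected {d}")

    names = sorted(guidance)
    scores = [
        dot(decoder_feat, guidance[n]) / math.sqrt(d) + stage_bias.get(n, 0.0)
        for n in names
    ]
    weights = softmax(scores)

    fused = list(decoder_feat)  # residual: start from the decoder feature
    for w, n in zip(weights, names):
        for i, v in enumerate(guidance[n]):
            fused[i] += w * v
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    cues = {"texture": [1.0, 0.0], "semantic": [0.0, 1.0]}
    # Shallow stage leans on texture cues; deep stage leans on semantic cues.
    shallow, w_shallow = fuse_stage([0.0, 0.0], cues, {"texture": 2.0})
    deep, w_deep = fuse_stage([0.0, 0.0], cues, {"semantic": 2.0})
    print("shallow weights:", w_shallow)
    print("deep weights:", w_deep)
```

Here the shallow stage is biased toward the texture cue and the deep stage toward the semantic cue. This mimics the idea of letting texture-dominant and semantic-dominant guidance act at different decoding levels. In a trained network, both the scoring and the biases would be learned rather than hand-set.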