SplitAvatar: One-shot Head Avatar with Autoregressive Gaussian Splitting

2026-05-25 · Computer Vision and Pattern Recognition

AI summary

The authors present a new way to create detailed, animatable 3D head models from a single image using 3D Gaussian Splatting. They introduce a Graph splitting network that adds Gaussians step by step, from coarse to fine, to better capture facial expressions. To keep the model consistent as it grows, they extend the mesh connectivity to cover the newly split Gaussians and control their density so that no facial region becomes over-crowded. A delayed filtering strategy speeds up training, and the approach produces sharper, more accurate faces in its 3D reconstructions.

3D Gaussian Splatting, anisotropic Gaussians, Graph Neural Network (GNN), autoregressive architecture, mesh topology, density control, gating mechanism, facial expression reconstruction, image-based 3D modeling, 3D Morphable Models (3DMM)
Authors
Hongzhe Liao, Chuhua Xian, Hongmin Cai, Haiyang Liu, Fa-Ting Hong
Abstract
3D Gaussian Splatting (3DGS) provides an efficient method for high-quality scene reconstruction using anisotropic Gaussians. Recently, 3DGS-based methods have significantly improved the rendering quality of human avatars while enabling real-time performance. However, existing methods suffer from a mismatch in magnitude between the number of Gaussians generated by image-based approaches and the number generated by 3DMM-based approaches. This discrepancy results in reconstructed expressions that lack fine-grained detail. In this paper, we introduce a novel method for reconstructing an animatable head avatar from a single image. We propose a Graph splitting network that progressively generates Gaussians from coarse to fine using an autoregressive architecture. To address the graph inconsistency caused by split Gaussians, we employ a mesh topology extension method that aligns the connectivity of the graph neural network (GNN) with the increased Gaussian count. Furthermore, we introduce a novel density control method with a gating mechanism that generates soft masks for the Gaussians, preventing over-densification after the splitting operation. This allows dynamic control of Gaussian density across different facial regions. For smooth and rapid training, we employ a delayed filtering strategy that avoids re-computing the graph topology during training. Experimental results demonstrate that our autoregressive structure effectively improves the ability to represent expressions by progressively splitting Gaussians. Guided by the GNN, this splitting process synthesizes more precise facial details and achieves higher reconstruction quality.
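The abstract does not say how splitting, topology extension, or gating are implemented, so the following is only a minimal Python sketch under loudly stated assumptions, not the authors' method. It assumes that each Gaussian is anchored to a mesh vertex, that one splitting step is a midpoint subdivision (each triangle becomes four, with one new Gaussian per edge midpoint), and that the GNN's edges are read directly from the subdivided faces so that connectivity always matches the Gaussian count. It further assumes that the gate is a sigmoid over per-Gaussian logits that scales opacity, and that "delayed filtering" means removing low-mask Gaussians only after training. All function names (`subdivide`, `graph_edges`, `mean_aggregate`, `soft_mask`, `gated_opacity`, `delayed_filter`) are hypothetical, and `mean_aggregate` is an untrained stand-in for a learned GNN layer.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def subdivide(verts: List[Vec3], faces: List[Face]) -> Tuple[List[Vec3], List[Face]]:
    """Hypothetical splitting step: midpoint subdivision, one triangle -> four.
    Each new vertex stands in for a split Gaussian placed at its parent edge's midpoint."""
    new_verts = list(verts)
    midpoint: Dict[Tuple[int, int], int] = {}

    def mid(a: int, b: int) -> int:
        key = (min(a, b), max(a, b))  # adjacent faces share one midpoint per edge
        if key not in midpoint:
            new_verts.append(tuple((x + y) / 2 for x, y in zip(verts[a], verts[b])))
            midpoint[key] = len(new_verts) - 1
        return midpoint[key]

    new_faces: List[Face] = []
    for a, b, c in faces:
        ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return new_verts, new_faces


def graph_edges(faces: List[Face]) -> List[Tuple[int, int]]:
    """Hypothetical topology extension: undirected GNN edges taken from the extended mesh."""
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


def mean_aggregate(feats: List[float], edges: List[Tuple[int, int]]) -> List[float]:
    """Untrained stand-in for one GNN layer: mean of each node's neighbour features."""
    sums, counts = [0.0] * len(feats), [0] * len(feats)
    for u, v in edges:
        sums[u] += feats[v]; counts[u] += 1
        sums[v] += feats[u]; counts[v] += 1
    return [s / c if c else f for s, c, f in zip(sums, counts, feats)]


def soft_mask(logits: List[float]) -> List[float]:
    """Hypothetical gate: sigmoid maps per-Gaussian logits to soft masks in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def gated_opacity(opacity: List[float], logits: List[float]) -> List[float]:
    """During training the mask only scales opacity; no Gaussian is removed,
    so the graph topology never needs recomputing."""
    return [o * m for o, m in zip(opacity, soft_mask(logits))]


def delayed_filter(mask: List[float], threshold: float = 0.5) -> List[int]:
    """Assumed post-training filtering: indices of Gaussians whose mask passes the threshold."""
    return [i for i, m in enumerate(mask) if m >= threshold]


if __name__ == "__main__":
    # Toy input: one triangle, one Gaussian per vertex.
    verts: List[Vec3] = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    faces: List[Face] = [(0, 1, 2)]
    for level in range(2):  # two coarse-to-fine splitting steps
        verts, faces = subdivide(verts, faces)
        edges = graph_edges(faces)
        print(level, len(verts), len(faces), len(edges))  # "0 6 4 9", then "1 15 16 30"
```

Because the edge list is rebuilt from the subdivided faces at every step, message passing always indexes a valid set of Gaussians. Deferring removal to `delayed_filter` keeps that edge list fixed for the whole of training.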