Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures

2026-05-05 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Machine Learning

AI summary

The authors present HeadsUp, a method that creates detailed 3D models of heads from many camera images. Their approach compresses all the input pictures into a compact internal representation and then decodes it into 3D Gaussians laid out on a standard head template. Because the size of the 3D model does not depend on how many images go in, the method can be trained with many high-resolution views, and it works on new faces the model hasn't seen before. They trained and tested it on a large dataset with over 10,000 people and showed it produces high-quality results without per-subject tuning at test time. The authors also demonstrate how their model can generate new 3D head identities and animate facial expressions.

3D reconstruction, Gaussian representation, encoder-decoder architecture, UV parameterization, multi-view imaging, latent space, feed-forward method, expression blendshapes, model generalization, scalability

Authors

Evangelos Ntavelis, Sean Wu, Mohamad Shahbazi, Fabio Maninchedda, Dmitry Kostiaev, Artem Sevastopolsky, Vittorio Megaro, Trevor Phillips, Alejandro Blumentals, Shridhar Ravikumar, Mehak Gupta, Reinhard Knothe, Jeronimo Bayer, Matthias Vestner, Simon Schaefer, Thomas Etterlin, Christian Zimmermann, Mathias Deschler, Peter Kaufmann, Stefan Brugger, Sebastian Martin, Brian Amberg, Tom Runia

Abstract

We propose HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. Our method employs an efficient encoder-decoder architecture that compresses input views into a compact latent representation. This latent representation is then decoded into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. This UV representation decouples the number of 3D Gaussians from the number and resolution of input images, enabling training with many high-resolution input views. We train and evaluate our model on an internal dataset with more than 10,000 subjects, which is an order of magnitude larger than existing multi-view human head datasets. HeadsUp achieves state-of-the-art reconstruction quality and generalizes to novel identities without test-time optimization. We extensively analyze the scaling behavior of our model across identities, views, and model capacity, revealing practical insights for quality-compute trade-offs. Finally, we highlight the strength of our latent space by showcasing two downstream applications: generating novel 3D identities and animating the 3D heads with expression blendshapes.
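The decoupling described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' architecture: the encoder here is simple statistics pooling, the decoder is a random linear map, the template is a sphere, and every name and size (`encode_views`, `decode_uv_gaussians`, `animate`, `LATENT_DIM`, `UV_RES`) is a hypothetical choice made for illustration. It only shows why anchoring Gaussians to a fixed UV grid makes their count independent of the number and resolution of input views, and how linear blendshape offsets could move them.

```python
import numpy as np

LATENT_DIM = 32  # hypothetical latent size
UV_RES = 16      # hypothetical UV map resolution: UV_RES * UV_RES Gaussians

rng = np.random.default_rng(0)

def encode_views(views: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Toy encoder: pool V views of shape (V, H, W, 3) into one fixed-size latent."""
    # Per-view channel statistics, shape (V, 6); independent of image resolution.
    stats = np.concatenate(
        [views.mean(axis=(1, 2)), views.std(axis=(1, 2))], axis=1
    )
    # Averaging over views removes the dependence on the view count V.
    return np.tanh(stats.mean(axis=0) @ proj)

def make_template(res: int) -> np.ndarray:
    """Stand-in for a neutral head template: a unit sphere sampled on a UV grid."""
    u = np.linspace(0.0, 2.0 * np.pi, res, endpoint=False)
    v = np.linspace(0.1, np.pi - 0.1, res)
    uu, vv = np.meshgrid(u, v)
    return np.stack(
        [np.sin(vv) * np.cos(uu), np.sin(vv) * np.sin(uu), np.cos(vv)], axis=-1
    )  # shape (res, res, 3)

def decode_uv_gaussians(latent, template_xyz, w_offset):
    """Toy decoder: one Gaussian centre per UV texel, offset from the template."""
    offsets = (latent @ w_offset).reshape(template_xyz.shape)
    return (template_xyz + 0.01 * offsets).reshape(-1, 3)  # (UV_RES**2, 3)

def animate(centres, blendshapes, weights):
    """Add a weighted sum of per-Gaussian blendshape offsets (K, N, 3)."""
    return centres + np.tensordot(weights, blendshapes, axes=1)

# Fixed (random) parameters standing in for trained weights.
proj = rng.normal(size=(6, LATENT_DIM))
w_offset = rng.normal(size=(LATENT_DIM, UV_RES * UV_RES * 3))
template = make_template(UV_RES)

# Two captures with different view counts and resolutions.
few_views = rng.random((4, 64, 64, 3))
many_views = rng.random((16, 128, 128, 3))

g_few = decode_uv_gaussians(encode_views(few_views, proj), template, w_offset)
g_many = decode_uv_gaussians(encode_views(many_views, proj), template, w_offset)
print(g_few.shape, g_many.shape)  # both have UV_RES * UV_RES rows
```

In this sketch, only `UV_RES` controls how many Gaussians are produced, so adding more or larger input views changes the encoder's workload but not the size of the decoded representation. A real system would also predict per-Gaussian scale, rotation, opacity, and colour, which are omitted here.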
