ZipSplat: Fewer Gaussians, Better Splats

2026-06-03 · Computer Vision and Pattern Recognition

AI summary

The authors present ZipSplat, a method that reconstructs 3D scenes from images more efficiently. Unlike earlier feed-forward methods that predict one 3D Gaussian per pixel, ZipSplat groups similar visual information into fewer tokens, which are then decoded into 3D Gaussians at positions not tied to the pixel grid. This reduces the number of Gaussians while maintaining or improving rendering quality, even without ground-truth camera poses or intrinsics. The model sets a new state of the art on DL3DV and RealEstate10K, generalizes zero-shot to Mip-NeRF360 and ScanNet++, and can trade quality for efficiency at inference time without retraining.
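The grouping step described in the summary can be illustrated with plain k-means (Lloyd's algorithm) over token features. This is a minimal NumPy sketch, not the authors' implementation: the helper name `compress_tokens`, the random initialization, the iteration count, and the toy token sizes are assumptions for illustration, and the actual model may cluster different features or initialize differently.

```python
import numpy as np


def compress_tokens(tokens: np.ndarray, k: int, iters: int = 20, seed: int = 0):
    """Hypothetical helper (not from the paper): cluster N dense tokens of shape (N, D)
    into k scene tokens of shape (k, D) with Lloyd's k-means. Requires iters >= 1.

    Returns (centroids, assignments), where assignments[i] is the cluster of token i.
    """
    rng = np.random.default_rng(seed)
    n = tokens.shape[0]
    # Initialize the centroids from k distinct tokens chosen uniformly at random.
    centroids = tokens[rng.choice(n, size=k, replace=False)].astype(np.float64)
    for _ in range(iters):
        # Squared Euclidean distances, shape (N, k), via ||x||^2 - 2 x.c + ||c||^2.
        d2 = (
            (tokens ** 2).sum(axis=1)[:, None]
            - 2.0 * tokens @ centroids.T
            + (centroids ** 2).sum(axis=1)[None, :]
        )
        assign = d2.argmin(axis=1)
        # Move each centroid to the mean of its members; empty clusters keep their old centroid.
        for j in range(k):
            members = tokens[assign == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, assign


# Toy example: 2 views x 16x16 patch tokens with 32-dim features (illustrative sizes only).
rng = np.random.default_rng(1)
dense = rng.normal(size=(2 * 16 * 16, 32))
scene, assign = compress_tokens(dense, k=64)
print(dense.shape, "->", scene.shape)  # (512, 32) -> (64, 32)
```

Here `k` plays the role of the scene-token budget: lowering it yields fewer, coarser scene tokens from the same dense input.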

3D Gaussian Splatting, feed-forward model, visual tokens, k-means clustering, cross-attention, self-attention, PSNR, pose-free reconstruction, Mip-NeRF360, ScanNet++
Authors
Alexander Veicht, Sunghwan Hong, Dániel Baráth, Marc Pollefeys
Abstract
Feed-forward 3D Gaussian Splatting methods reconstruct a scene from posed or pose-free images in a single forward pass, yet current approaches predict one Gaussian per input pixel, tying the representation budget to camera resolution rather than scene complexity. A flat wall and a richly textured object thus produce equally many Gaussians despite very different geometric needs. We propose ZipSplat, a token-based feed-forward model that decouples Gaussian placement from the pixel grid. A multi-view backbone extracts dense visual tokens, and k-means clustering compresses them into a compact set of scene tokens. Cross- and self-attention refine these tokens, and a lightweight MLP decodes each into a group of Gaussians with unconstrained 3D positions. Because clustering is applied at inference time, the number of scene tokens can be chosen at test time, so a single trained model spans the quality-efficiency curve without retraining. ZipSplat operates without ground-truth poses or intrinsics, yet sets a new state of the art on DL3DV and RealEstate10K with ~6× fewer Gaussians than pixel-aligned methods, surpassing the best pose-free baseline by 2.1 dB and 1.2 dB PSNR, respectively. It further generalizes zero-shot to Mip-NeRF360 and ScanNet++, outperforming all comparable baselines. Our project page is at https://veichta.com/zipsplat.
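The decoding stage in the abstract can be sketched as a per-token MLP whose output is reshaped into a fixed-size group of Gaussians, so the same weights accept any number of scene tokens. This is a toy NumPy illustration under stated assumptions, not ZipSplat's architecture: the class name `GroupDecoder`, the 14-value Gaussian parameterization, the layer sizes, and the activation choices are hypothetical, the weights are random rather than trained, and the cross- and self-attention refinement is omitted.

```python
import numpy as np

# Hypothetical Gaussian parameterization (not taken from the paper):
# 3 position + 3 log-scale + 4 quaternion + 1 opacity logit + 3 RGB logits = 14 values.
PARAMS_PER_GAUSSIAN = 14


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class GroupDecoder:
    """Hypothetical toy two-layer MLP that maps each scene token (D,) to G Gaussians.
    Weights are random here; in a trained model they would be learned."""

    def __init__(self, dim: int, hidden: int, gaussians_per_token: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.g = gaussians_per_token
        out = gaussians_per_token * PARAMS_PER_GAUSSIAN
        self.w1 = rng.normal(scale=dim ** -0.5, size=(dim, hidden))
        self.w2 = rng.normal(scale=hidden ** -0.5, size=(hidden, out))

    def __call__(self, scene_tokens: np.ndarray) -> dict:
        k = scene_tokens.shape[0]
        h = np.maximum(scene_tokens @ self.w1, 0.0)  # ReLU hidden layer, shape (k, hidden)
        # Each token yields G rows of raw parameters: (k, G * 14) -> (k * G, 14).
        raw = (h @ self.w2).reshape(k * self.g, PARAMS_PER_GAUSSIAN)
        quat = raw[:, 6:10]
        return {
            "xyz": raw[:, 0:3],  # unconstrained 3D position, not tied to any pixel
            "scale": np.exp(raw[:, 3:6]),  # exp keeps scales positive
            "rotation": quat / np.linalg.norm(quat, axis=1, keepdims=True),  # unit quaternion
            "opacity": sigmoid(raw[:, 10]),
            "rgb": sigmoid(raw[:, 11:14]),
        }


# One decoder, two token budgets: only the number of emitted Gaussians changes.
decoder = GroupDecoder(dim=64, hidden=128, gaussians_per_token=8)
rng = np.random.default_rng(1)
for k in (256, 1024):
    gaussians = decoder(rng.normal(size=(k, 64)))
    print(k, gaussians["xyz"].shape[0])  # 256 2048, then 1024 8192

# A pixel-aligned model always emits views x height x width Gaussians, e.g. 2 views at 256x256.
print(2 * 256 * 256)  # 131072
```

Because the decoder acts on each token independently, changing the number of clusters at inference only changes how many groups are emitted, whereas a pixel-aligned model's count stays fixed by the input resolution. The toy sizes are arbitrary and are not meant to reproduce the reported ~6× reduction.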