Local-GS: Accelerating 3D Gaussian Splatting via Tile-Local Warp Coherence

2026-06-15 | Computer Vision and Pattern Recognition

AI summary

The authors address an efficiency problem in 3D Gaussian Splatting (3DGS), a method that renders images from many small 3D primitives called Gaussians. Because the Gaussians are distributed irregularly, GPU threads that execute in lockstep often diverge or repeat work. The authors introduce Local-GS, which organizes the Gaussians around the GPU's warp (SIMT) execution boundaries rather than around scene geometry. It has three stages: hoisting shared parameters to the tile level, culling warps that contribute nothing, and blending with a uniform instruction stream instead of per-pixel branches. Benchmarks on multiple datasets show faster rendering without loss of image quality. As a plug-and-play optimization it speeds up every tested baseline, with a 7.76× speedup reported on Deep Blending scenes.

3D Gaussian Splatting, GPU utilization, SIMT, warp divergence, novel view synthesis, parallel rendering, efficient GPU rendering, anisotropic Gaussians, Deep Blending, warp-coherent rendering
Authors
Yang Luo, Yan Gong, Yongsheng Gao, Jie Zhao, Xinyu Zhang, Huaping Liu
Abstract
3D Gaussian Splatting (3DGS) has significantly advanced real-time novel view synthesis by representing scenes as dense collections of anisotropic 3D Gaussian primitives. However, the irregular spatial distribution of Gaussians often leads to poor GPU utilization, as warp divergence and redundant computation degrade rendering performance. To address this, we present Local-GS, a warp-coherent rendering paradigm that organizes Gaussian primitives with respect to SIMT (Single Instruction, Multiple Threads) execution boundaries rather than scene geometry. Specifically, we propose three warp-coherent stages: a hoisting stage that precomputes shared parameters at the tile level, a culling stage that discards warps with no contribution, and a blending stage that replaces per-pixel branching with a uniform instruction stream. Across extensive benchmarks on multiple datasets, Local-GS improves efficiency without compromising quality. As a plug-and-play optimization, it provides additional performance gains to all tested baselines, culminating in a $7.76\times$ speedup on Deep Blending scenes.
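
The three stages named in the abstract can be illustrated with a small CPU-side simulation of a single tile. This is only one possible reading of the abstract, not the authors' implementation. The 16x16 tile, the layout of one 32-thread warp as two pixel rows, the 1/255 alpha threshold and 0.99 alpha clamp (both borrowed from the reference 3DGS rasterizer), the tuple layout of a projected Gaussian, and all function names are assumptions made for illustration. Gaussians are assumed to be already projected to 2D and sorted front to back.

```python
import math

TILE = 16                 # assumed tile size in pixels
WARP_ROWS = 2             # assumed: one 32-thread warp covers 2 rows of 16 pixels
ALPHA_MIN = 1.0 / 255.0   # assumed contribution threshold

# A projected Gaussian is (mx, my, a, b, c, opacity, color), where (a, b, c) is
# the symmetric inverse 2D covariance and color is a single grey channel.

def hoist(gaussians, tile_x, tile_y):
    """Stage 1 (hoisting): compute values shared by every pixel of the tile once."""
    hoisted = []
    for mx, my, a, b, c, opacity, color in gaussians:
        if opacity <= ALPHA_MIN:
            continue  # can never reach the threshold anywhere
        # alpha >= ALPHA_MIN  <=>  quadratic form q <= tau
        tau = 2.0 * math.log(opacity / ALPHA_MIN)
        det = a * c - b * b
        # Half-extents of the contributing ellipse, padded against rounding.
        rx = math.sqrt(tau * c / det) + 1e-6
        ry = math.sqrt(tau * a / det) + 1e-6
        hoisted.append((mx - tile_x, my - tile_y, a, b, c, opacity, color, rx, ry))
    return hoisted

def warp_overlaps(g, row0):
    """Stage 2 (culling): conservative box test of one Gaussian against one warp."""
    lx, ly, rx, ry = g[0], g[1], g[7], g[8]
    return (lx + rx >= 0 and lx - rx <= TILE - 1 and
            ly + ry >= row0 and ly - ry <= row0 + WARP_ROWS - 1)

def render_tile(gaussians, tile_x=0, tile_y=0):
    hoisted = hoist(gaussians, tile_x, tile_y)
    image = [[0.0] * TILE for _ in range(TILE)]
    culled = 0  # number of (warp, Gaussian) pairs skipped entirely
    for row0 in range(0, TILE, WARP_ROWS):
        active = [g for g in hoisted if warp_overlaps(g, row0)]
        culled += len(hoisted) - len(active)
        for y in range(row0, row0 + WARP_ROWS):
            for x in range(TILE):
                T, out = 1.0, 0.0
                for lx, ly, a, b, c, opacity, color, _, _ in active:
                    dx, dy = x - lx, y - ly
                    q = a * dx * dx + 2.0 * b * dx * dy + c * dy * dy
                    alpha = min(0.99, opacity * math.exp(-0.5 * q))
                    # Stage 3 (blending): mask instead of a per-pixel `continue`,
                    # so every lane runs the same instruction sequence.
                    alpha *= float(alpha >= ALPHA_MIN)
                    out += color * alpha * T
                    T *= 1.0 - alpha
                image[y][x] = out
    return image, culled

def render_tile_reference(gaussians, tile_x=0, tile_y=0):
    """Per-pixel branching baseline used only to check the sketch above."""
    image = [[0.0] * TILE for _ in range(TILE)]
    for y in range(TILE):
        for x in range(TILE):
            T, out = 1.0, 0.0
            for mx, my, a, b, c, opacity, color in gaussians:
                dx, dy = tile_x + x - mx, tile_y + y - my
                q = a * dx * dx + 2.0 * b * dx * dy + c * dy * dy
                alpha = min(0.99, opacity * math.exp(-0.5 * q))
                if alpha < ALPHA_MIN:
                    continue  # divergent branch
                out += color * alpha * T
                T *= 1.0 - alpha
            image[y][x] = out
    return image

# Hypothetical two-Gaussian scene, sorted front to back.
scene = [(4.0, 3.0, 0.5, 0.1, 0.4, 0.8, 1.0),
         (10.0, 12.0, 0.3, 0.0, 0.3, 0.6, 0.5)]
image, culled = render_tile(scene)
reference = render_tile_reference(scene)
```

In this sketch the masked blend produces the same pixels as the branching baseline, while the warp-level test skips Gaussians whose footprint never reaches a given pair of rows. On a real GPU the inner loops would map to warp lanes, and any speedup would depend on details the abstract does not specify.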