AdaptSplat: Adapting Vision Foundation Models for Feed-Forward 3D Gaussian Splatting

2026-05-11 · Computer Vision and Pattern Recognition

AI summary

The authors present AdaptSplat, a lightweight addition to feed-forward 3D Gaussian Splatting models. Rather than introducing complex, architecture-specific components, they insert a single adapter of only 1.5 million parameters into the generic pipeline. The adapter extracts direction-aware high-frequency information from the shallow features of a vision foundation model and blends it into the reconstruction process, compensating for the over-smoothing of deep features so that fine texture, shape detail, and sharp edges are preserved. Their experiments report state-of-the-art feed-forward reconstruction on multiple standard benchmarks and stable generalization across domains.

3D Gaussian Splatting, Feed-forward reconstruction, High-frequency structural priors, Vision foundation models, Frequency-preserving adapter, Positional encoding, Residual modulation, Cross-domain generalization, High-frequency attenuation, Over-smoothing
Authors
Mingwei Xing, Xinliang Wang, Yifeng Shi
Abstract
This work explores a simple yet powerful lightweight adapter design for feed-forward 3D Gaussian Splatting (3DGS). Existing methods typically apply complex, architecture-specific designs on top of the generic pipeline of image feature extraction $\rightarrow$ multi-view interaction $\rightarrow$ feature decoding. However, constrained by the scale bottleneck of 3D training data and the low-pass filtering effect of deep networks, these methods still fall short in cross-domain generalization and high-frequency geometric fidelity. To address these problems, we propose AdaptSplat, which demonstrates that without complex component engineering, introducing a single adapter of only 1.5M parameters into the generic architecture is sufficient to achieve superior performance. Specifically, we design a lightweight Frequency-Preserving Adapter (FPA) that extracts direction-aware high-frequency structural priors from the shallow features of a powerful vision foundation model backbone, and seamlessly integrates them into the generic pipeline via high-frequency positional encodings and adaptive residual modulation. This effectively compensates for the high-frequency attenuation caused by over-smoothing in deep features, improving the fitting accuracy of Gaussian primitives on complex surfaces and sharp boundaries. Extensive experiments demonstrate that AdaptSplat achieves state-of-the-art feed-forward reconstruction performance on multiple standard benchmarks, with stable generalization across domains. Code available at: https://github.com/xmw666/AdaptSplat.
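
To make the idea of direction-aware high-frequency priors and residual modulation concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a single-channel feature grid, uses central differences as a stand-in for the direction-aware high-pass step, and uses a sigmoid gate on the prior magnitude as a stand-in for adaptive residual modulation. The function names and the parameters `wx`, `wy`, `gate_scale`, and `gate_bias` are hypothetical; the abstract does not specify the actual filters, gating, or positional encodings used by the FPA.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # one feature channel as an H x W grid (toy assumption)


def directional_high_pass(feat: Grid) -> Tuple[Grid, Grid]:
    """Central differences along x and y with edge-replicated borders.

    Each response is zero on constant regions and non-zero across edges of
    the matching orientation, so the pair acts as a simple direction-aware
    high-pass filter. This is an assumed stand-in for the paper's extractor.
    """
    h, w = len(feat), len(feat[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gx[i][j] = 0.5 * (feat[i][min(j + 1, w - 1)] - feat[i][max(j - 1, 0)])
            gy[i][j] = 0.5 * (feat[min(i + 1, h - 1)][j] - feat[max(i - 1, 0)][j])
    return gx, gy


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def residual_modulation(deep: Grid, gx: Grid, gy: Grid,
                        wx: float = 1.0, wy: float = 1.0,
                        gate_scale: float = 4.0, gate_bias: float = -2.0) -> Grid:
    """Add a gated high-frequency residual to a (smooth) deep feature.

    out = deep + gate * (wx * gx + wy * gy), where
    gate = sigmoid(gate_scale * |prior| + gate_bias).
    wx, wy, gate_scale, gate_bias are hypothetical learned scalars.
    """
    h, w = len(deep), len(deep[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            magnitude = math.hypot(gx[i][j], gy[i][j])
            gate = sigmoid(gate_scale * magnitude + gate_bias)
            out[i][j] = deep[i][j] + gate * (wx * gx[i][j] + wy * gy[i][j])
    return out


# Toy usage: a vertical step edge in the shallow feature, a blurred ramp in the deep one.
shallow = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
deep = [[0.2, 0.4, 0.6, 0.8] for _ in range(3)]
gx, gy = directional_high_pass(shallow)
out = residual_modulation(deep, gx, gy)
print(out[1])
```

In this toy setting the residual is exactly zero wherever the shallow feature is locally constant, so the deep feature passes through unchanged there, and only positions next to the edge receive the injected high-frequency signal. A real adapter would operate on multi-channel feature maps with learned projections rather than scalar weights.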