Supercharging Thermal Gaussian Splatting with Depth Estimation

2026-05-28 • Computer Vision and Pattern Recognition

AI summary

The authors build 3D scene models from thermal infrared images alone, rather than fusing several modalities such as visible light and thermal. Their method, Thermal-to-Depth Gaussian Splatting (TDg), uses only thermal images and depth estimation, and it trains faster and renders slightly better than the MSMG (Multiple Single-Modal Gaussians) baseline. Training time falls by more than half, while the image quality metrics (LPIPS, SSIM, PSNR) improve marginally. The technique is relevant where heat detection matters, such as search and rescue or machine monitoring.

3D scene representation, thermal infrared imaging, Gaussian splatting, radiance fields, depth estimation, novel view synthesis, LPIPS, SSIM, PSNR, multimodal data fusion

Authors

Manoj Biswanath, Chenxin Cai, Hannah Schieber, Daniel Roth, Benjamin Busam

Abstract

Efficient and robust 3D scene representation is crucial in autonomous driving, robotics, and related fields. While RGB images provide valuable content for 3D reconstruction, other modalities such as thermal or depth can provide additional information about the environment. Recently, novel view synthesis methods such as 3D Gaussian Splatting have started using multiple modalities to further boost their performance. However, fusing or combining multimodal data can slow the process down and introduce additional challenges. Our project therefore aims to use a single modality, thermal infrared, removing the reliance on visible light as much as possible. A single-modality pipeline can be expected to be faster because it does not depend on multimodal data. We propose a method, Thermal-to-Depth Gaussian Splatting (TDg), that uses only thermal images and depth estimation in its architecture to derive the radiance fields. TDg outperforms the MSMG (Multiple Single-Modal Gaussians) baseline in most cases on our test datasets, RGBT-Scenes and ThermalMix. On average, the rendering quality metrics of TDg, namely learned perceptual image patch similarity (LPIPS), structural similarity index measure (SSIM), and peak signal-to-noise ratio (PSNR), are 1.12%, 0.034%, and 0.01% better than the MSMG baseline values, respectively. TDg also reduces the training time significantly, by 12 min 47 s (a 55% improvement). Overall, our method succeeds in deriving thermal radiance fields, which have several potential applications, such as identifying heat sources in surveillance, in search and rescue operations, and in industrial inspections where temperature is widely used to monitor machines.
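The percentages quoted in the abstract are relative changes against the baseline, and for LPIPS a lower score is the better one. The following is a minimal sketch of how such figures could be computed, not the authors' evaluation code. The function names `psnr` and `relative_improvement` are hypothetical, the PSNR formula is the standard definition, and the numbers in the usage lines are made up for illustration and are not results from the paper.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Standard PSNR in dB between two equally sized flat pixel sequences."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_improvement(baseline, ours, higher_is_better=True):
    """Percentage by which `ours` improves on `baseline` (hypothetical helper).

    For metrics where lower is better (such as LPIPS), pass
    higher_is_better=False so that a decrease counts as an improvement.
    """
    change = (ours - baseline) / abs(baseline) * 100.0
    return change if higher_is_better else -change


# Illustrative values only; these are NOT numbers from the paper.
example_psnr = psnr([0.0, 0.0], [0.1, 0.1])               # about 20 dB
example_lpips_gain = relative_improvement(0.200, 0.198,
                                          higher_is_better=False)  # about 1 %
```

Reporting relative rather than absolute differences makes metrics with different scales comparable, but it also means that a small percentage on PSNR (measured in dB) and a small percentage on SSIM (bounded by 1) are not directly equivalent in perceptual terms.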
