EndoVGGT: GNN-Enhanced Depth Estimation for Surgical 3D Reconstruction

2026-03-25 · Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary

The authors present EndoVGGT, a method for building accurate 3D models of soft tissue during surgery. This is hard because tissue surfaces can be shiny or low in texture, and instruments often block the view. Their approach adds a module named DeGAT (Deformation-aware Graph Attention) that links tissue regions by feature similarity rather than by fixed spatial position, so the model can reason about parts of the tissue that are hidden or changing shape. Experiments show more detailed and consistent 3D reconstructions than previous methods (PSNR up 24.6% and SSIM up 9.1%), and the method also works well on datasets it was not trained on. Overall, the authors demonstrate that dynamically connecting tissue features improves surgical 3D modeling.

3D reconstruction, Deformable soft tissues, Surgical robotics, Graph attention networks, Deformation-aware, Feature-space semantic graphs, Occlusions, Non-rigid deformation, PSNR, SSIM
Authors
Falong Fan, Yi Xie, Arnis Lektauers, Bo Liu, Jerzy Rozenblit
Abstract
Accurate 3D reconstruction of deformable soft tissues is essential for surgical robotic perception. However, low-texture surfaces, specular highlights, and instrument occlusions often fragment geometric continuity, posing a challenge for existing fixed-topology approaches. To address this, we propose EndoVGGT, a geometry-centric framework equipped with a Deformation-aware Graph Attention (DeGAT) module. Rather than using static spatial neighborhoods, DeGAT dynamically constructs feature-space semantic graphs to capture long-range correlations among coherent tissue regions. This enables robust propagation of structural cues across occlusions, enforcing global consistency and improving non-rigid deformation recovery. Extensive experiments on SCARED show that our method significantly improves fidelity, increasing PSNR by 24.6% and SSIM by 9.1% over prior state-of-the-art. Crucially, EndoVGGT exhibits strong zero-shot cross-dataset generalization to the unseen SCARED and EndoNeRF domains, confirming that DeGAT learns domain-agnostic geometric priors. These results highlight the efficacy of dynamic feature-space modeling for consistent surgical 3D reconstruction.
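The abstract does not specify how DeGAT is implemented, so the following is only a minimal illustrative sketch of the general idea it describes: building a graph from nearest neighbours in feature space (rather than from fixed spatial neighbourhoods) and then aggregating neighbour features with attention weights. The function names `feature_knn_graph` and `graph_attention`, the use of cosine similarity, the scaled dot-product scoring, the residual update, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def feature_knn_graph(feats: np.ndarray, k: int) -> np.ndarray:
    """Return an (N, k) array with each token's k nearest neighbours,
    ranked by cosine similarity in feature space (hypothetical helper)."""
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a token is never its own neighbour
    return np.argsort(-sim, axis=1)[:, :k]


def graph_attention(feats: np.ndarray, neighbors: np.ndarray):
    """Aggregate neighbour features with softmax attention weights
    (assumed scaled dot-product scoring, with a residual connection)."""
    _, dim = feats.shape
    neigh_feats = feats[neighbors]  # (N, k, D)
    scores = np.einsum("nd,nkd->nk", feats, neigh_feats) / np.sqrt(dim)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    aggregated = np.einsum("nk,nkd->nd", weights, neigh_feats)
    return feats + aggregated, weights


# Toy data: six tokens in spatial order, alternating between two
# "tissue" types, so spatially adjacent tokens are NOT similar in feature space.
rng = np.random.default_rng(0)
tissue_a = np.array([1.0, 0.0, 0.0, 0.0])
tissue_b = np.array([0.0, 1.0, 0.0, 0.0])
labels = np.array([0, 1, 0, 1, 0, 1])
feats = np.where(labels[:, None] == 0, tissue_a, tissue_b)
feats = feats + 0.05 * rng.standard_normal((6, 4))

neighbors = feature_knn_graph(feats, k=2)
updated, weights = graph_attention(feats, neighbors)
print(neighbors)
```

In this toy setup each token's neighbours come from the same tissue type even though those tokens are not spatially adjacent, which is the property that would let structural cues pass across an occluded or fragmented region. A fixed spatial neighbourhood would instead connect each token to the dissimilar tokens next to it.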