PointTPA: Dynamic Network Parameter Adaptation for 3D Scene Understanding

2026-04-06 · Computer Vision and Pattern Recognition

AI summary

The authors address the difficulty of understanding complex 3D scenes represented as point clouds, which exhibit diverse geometries and spatial arrangements. They propose PointTPA, a method that dynamically adapts parts of a neural network to each input scene at test time, rather than relying on fixed parameters. The approach groups points into locally coherent patches and adapts the network weights for each patch, improving scene understanding with very little parameter overhead. PointTPA outperforms other parameter-efficient fine-tuning techniques on standard benchmarks, showing better adaptability to diverse 3D scenes.

point cloud, 3D scene understanding, test-time adaptation, parameter-efficient fine-tuning, mIoU, local patch grouping, dynamic network parameters, ScanNet benchmark, neural network backbone, fine-tuning
Authors
Siyuan Liu, Chaoqun Zheng, Xin Zhou, Tianrui Feng, Dingkang Liang, Xiang Bai
Abstract
Scene-level point cloud understanding remains challenging due to diverse geometries, imbalanced category distributions, and highly varied spatial layouts. Existing methods improve object-level performance but rely on static network parameters during inference, limiting their adaptability to dynamic scene data. We propose PointTPA, a Test-time Parameter Adaptation framework that generates input-aware network parameters for scene-level point clouds. PointTPA adopts a Serialization-based Neighborhood Grouping (SNG) to form locally coherent patches and a Dynamic Parameter Projector (DPP) to produce patch-wise adaptive weights, enabling the backbone to adjust its behavior according to scene-specific variations while maintaining a low parameter overhead. Integrated into the PTv3 structure, PointTPA demonstrates strong parameter efficiency by introducing two lightweight modules of less than 2% of the backbone's parameters. Despite this minimal parameter overhead, PointTPA achieves 78.4% mIoU on ScanNet validation, surpassing existing parameter-efficient fine-tuning (PEFT) methods across multiple benchmarks, highlighting the efficacy of our test-time dynamic network parameter adaptation mechanism in enhancing 3D scene understanding. The code is available at https://github.com/H-EmbodVis/PointTPA.
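The two modules the abstract names can be pictured concretely: SNG serializes points along a space-filling curve and chunks consecutive points into locally coherent patches, and DPP pools each patch's features and projects them into patch-wise weights. The sketch below is a hypothetical NumPy illustration of that idea, not the authors' implementation; the function names, the Morton-order serialization, and the linear projector are all assumptions made for clarity.

```python
import numpy as np

def morton_key(points, bits=10):
    """Simplified Z-order (Morton) code: quantize coordinates to a grid
    and interleave their bits, so nearby points get nearby keys."""
    lo, hi = points.min(0), points.max(0)
    q = ((points - lo) / (hi - lo + 1e-9) * (2**bits - 1)).astype(np.uint64)
    key = np.zeros(len(points), dtype=np.uint64)
    for b in range(bits):
        for d in range(3):
            key |= ((q[:, d] >> np.uint64(b)) & np.uint64(1)) << np.uint64(3 * b + d)
    return key

def group_patches(points, patch_size=4):
    """Serialization-based grouping (sketch): sort points by Morton key,
    then chunk consecutive points into locally coherent patches."""
    order = np.argsort(morton_key(points))
    n = len(points) // patch_size * patch_size
    return order[:n].reshape(-1, patch_size)  # (num_patches, patch_size)

def dynamic_weights(feats, patches, proj):
    """Dynamic parameter projection (sketch): mean-pool each patch's
    features, then map them to a per-patch weight/modulation vector."""
    pooled = feats[patches].mean(axis=1)  # (num_patches, C)
    return pooled @ proj                  # (num_patches, C_out)

# Toy usage: 32 random points with 16-dim features, patches of 4
rng = np.random.default_rng(0)
pts = rng.random((32, 3))
patches = group_patches(pts, patch_size=4)
feats = rng.random((32, 16))
w = dynamic_weights(feats, patches, rng.random((16, 8)))
print(patches.shape, w.shape)  # (8, 4) (8, 8)
```

The key property is that `w` depends on the input scene itself, so the backbone's behavior changes per patch at test time while the only learned parameters are those of the small projector.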