ForestMamba: Sparse Mamba with Geometry-guided Queries for 3D Forest Point Cloud Segmentation

2026-06-01Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition
AI summary

The authors developed ForestMamba, a new AI method for analyzing complex 3D forest data captured by drones and ground-based scanners. Their approach uses forest-specific knowledge to better identify trees and canopy structures, improving accuracy compared to previous methods. They designed special ways to organize the data and generate meaningful starting points for the analysis, making the process faster and less memory-heavy. Tests in multiple forest areas showed that ForestMamba works better and more efficiently than current Transformer-based techniques.

LiDARsemantic segmentationinstance segmentation3D point cloudsTransformer modelsstate-space modelingCanopy Height Model (CHM)Farthest Point Sampling (FPS)k-nearest neighbors (kNN)GPU memory efficiency
Authors
Trung Thanh Nguyen, Tuan-Anh Vu, Duc Viet Le, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide, Teja Kattenborn
Abstract
AI-based semantic and instance segmentation of terrestrial and drone LiDAR point clouds is emerging as a transformative approach for converting the complex 3D structure of forests into actionable information for forest monitoring and biodiversity assessment. However, forest LiDAR scenes remain highly challenging due to their large data volumes, irregular sampling density, overlapping and complex canopy structure, and geographic variability. Existing methods based on sparse convolutions or Transformers achieve promising results, but suffer from two key limitations: Quadratic complexity of attention scales poorly to large forest scenes, and Generic context modeling does not exploit forest structural priors, limiting tree separation in complex regions. To address these challenges, we propose ForestMamba, a structure-aware method that incorporates forest-specific priors into feature encoding, query generation, and query refinement, while replacing quadratic attention with linear-time state-space modeling. First, we introduce a sparse encoder with vertical-priority slab serialization that organizes sparse voxels into vertically coherent sequences for efficient long-range context modeling. Second, we propose a geometry-guided query initialization strategy based on an on-the-fly multi-scale Canopy Height Model (CHM), where canopy maxima provide ecologically meaningful query seeds, supplemented by Farthest Point Sampling (FPS) to cover understory trees. Third, we design a Mamba-based query decoder that combines local kNN voxel aggregation with a spatial dual-path Mamba for query refinement with linear computational complexity. Extensive experiments across seven forest regions demonstrate that ForestMamba consistently outperforms existing baselines in both segmentation tasks, while achieving 3 times faster inference and 2.3 times lower GPU memory than Transformer-based methods.