SFR-Net: Learning Scale-Frustum Representations for Ultra-Wide Area Remote Sensing Image Segmentation
2026-05-25 • Computer Vision and Pattern Recognition
AI summary
The authors introduce a new remote sensing task, ultra-wide area (UWA) image segmentation, which targets images with very large pixel counts that cover vast geographic regions containing objects of widely varying sizes. They propose SFR-Net, which models ground objects and their context at multiple scales through "scale-frustum" representations, inspired by the viewing frustums of images captured from different altitudes. A cascaded cross-scale fusion mechanism then combines these representations, improving local semantic understanding while preserving long-range contextual continuity. On the GID and FBPS datasets, SFR-Net improves mIoU by 1.72% and 4.29% over the strongest competing methods, and the scale-frustum representations can be added to generic segmentation networks to improve both accuracy and convergence speed.
Remote sensing • Image segmentation • Scale variation • Contextual semantic continuity • Ultra-wide area images • mIoU (mean Intersection over Union) • Scale-frustum representation • Cross-scale fusion • Convolutional neural networks • Geographical coverage
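mIoU, the metric behind the reported gains, averages the per-class intersection over union, IoU = TP / (TP + FP + FN), computed from a confusion matrix. The sketch below is a generic pure-Python version for illustration, not the evaluation code used for SFR-Net. The function names, the `ignore_index=255` convention for unlabeled pixels, and skipping classes absent from both prediction and ground truth are assumptions; segmentation toolkits differ on these details, and in practice the matrix is accumulated over the whole test set before averaging.

```python
from typing import List, Sequence

def confusion_matrix(pred: Sequence[int], gt: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count (ground truth, prediction) pairs over flattened label maps.

    Hypothetical helper; ignore_index=255 is an assumed convention for
    unlabeled pixels, not taken from the SFR-Net paper.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]  # rows: ground truth, cols: prediction
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # unlabeled pixels do not count toward any class
        cm[g][p] += 1
    return cm

def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU = TP / (TP + FP + FN) over classes seen in pred or gt."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                 # class-c pixels predicted as another class
        fp = sum(row[c] for row in cm) - tp  # other-class pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:                        # assumption: skip classes absent everywhere
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0

# Toy example with 3 classes and 6 pixels.
gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(pred, gt, num_classes=3)
print(cm)            # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(mean_iou(cm))  # 0.5 (per-class IoU: 1/3, 2/3, 1/2)
```

Building the confusion matrix first keeps the metric additive: matrices from separate images or tiles can be summed before IoU is computed, which matters when a single UWA scene is processed in pieces.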
Authors
Chuyu Zhong, Keyan Chen, Qinzhe Yang, Bowen Chen, Zhengxia Zou, Zhenwei Shi
Abstract
Pixel count and geographical coverage are two key characteristics of remote sensing images. Existing remote sensing image segmentation methods typically focus on images with either a small pixel count or a large pixel count but limited geographical coverage. In this paper, we introduce a novel segmentation task targeting ultra-wide area (UWA) remote sensing images, characterized by both a large pixel count and extremely wide geographical coverage. The core challenges of UWA segmentation lie in simultaneously handling ground objects with significantly varying scales and maintaining long-range contextual semantic continuity. To address these challenges, we propose the Scale-Frustum Representation Network (SFR-Net). Inspired by the viewing frustums of remote sensing images captured from different altitudes, we construct scale-frustum representations, enabling unified modeling of ground objects and contextual features at different scales. Furthermore, we design a cascaded cross-scale fusion mechanism to effectively integrate these representations, enhancing local semantic understanding while ensuring long-range contextual continuity. Experimental results on GID and FBPS demonstrate that SFR-Net achieves state-of-the-art performance, improving mIoU by 1.72% and 4.29%, respectively, over the strongest competing methods. In addition, the proposed scale-frustum representations can be integrated into generic segmentation networks to improve both segmentation accuracy and convergence speed. The implementation code will be publicly available at https://github.com/ChuyuZhong/SFR-Net.
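The abstract does not say how the scale-frustum representations are constructed. One way to picture the altitude analogy is that a sensor viewing the same ground point from higher up sees a wider footprint at a coarser ground sampling distance. The sketch below renders that picture as co-centred crops of growing size, each resampled to the same pixel grid. It is an illustrative assumption, not SFR-Net's method: `frustum_crops`, `resize_nearest`, the integer scale factors, border clamping, and nearest-neighbour resampling are all hypothetical choices.

```python
from typing import List, Sequence

Image = List[List[int]]

def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resampling of a 2-D grid (assumed, for simplicity)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
            for y in range(out_h)]

def frustum_crops(img: Image, cy: int, cx: int, base: int,
                  factors: Sequence[int], out_size: int) -> List[Image]:
    """Hypothetical multi-altitude views: co-centred square crops of side
    base * factor, each resampled to out_size x out_size. Not SFR-Net's code."""
    h, w = len(img), len(img[0])
    views = []
    for f in factors:
        side = base * f  # larger factor ~ higher "altitude", wider ground footprint
        top, left = cy - side // 2, cx - side // 2
        # Clamp indices to the border so crops near image edges stay square.
        crop = [[img[min(max(top + y, 0), h - 1)][min(max(left + x, 0), w - 1)]
                 for x in range(side)] for y in range(side)]
        views.append(resize_nearest(crop, out_size, out_size))  # same grid, coarser sampling
    return views

img = [[y * 8 + x for x in range(8)] for y in range(8)]  # toy 8x8 "scene"
for view in frustum_crops(img, cy=4, cx=4, base=2, factors=(1, 2, 4), out_size=2):
    print(view)
# [[27, 28], [35, 36]]   footprint 2x2 pixels
# [[18, 20], [34, 36]]   footprint 4x4 pixels
# [[0, 4], [32, 36]]     footprint 8x8 pixels (whole scene)
```

Every view has the same 2x2 grid, but its ground footprint doubles at each step, so finer views keep detail and coarser views carry context. How SFR-Net actually encodes and fuses such views through its cascaded cross-scale fusion is not described in the abstract and is not modeled here.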