PolyBuild: An End-to-End Method for Polygonal Building Contour Extraction from High-Resolution Remote Sensing Images

2026-06-08Computer Vision and Pattern Recognition

Computer Vision and Pattern RecognitionArtificial Intelligence
AI summary

The authors developed a new method called PolyBuild to draw building outlines directly from high-resolution satellite images without extra editing steps. Their approach uses one part to find initial building shapes and another part to make these shapes more accurate by combining local image details with overall shape information. This method uses a mix of CNN and Transformer technologies to better understand building boundaries. They tested PolyBuild on several datasets and found it works better than other leading methods. Overall, it simplifies and improves the process of detecting building outlines in images.

building extractionremote sensingpolygon contourConvolutional Neural Network (CNN)Transformerobject detectionvector polygonsimage segmentationspatial relationshipsbounding box
Authors
Yaoteng Zhang, Julin Zhang, Guangshuai Wang, Jiwei Deng, Hui Sheng, Yasir Muhammad, Shiqing Wei
Abstract
Extracting building polygon contours from high-resolution remote sensing images is a fundamental task for various mapping applications. However, the presence of varying imaging conditions and complex building structures, makes automatic contour extraction extremely challenging. Mainstream approaches for building extraction often rely on pixel-level segmentation followed by multiple post-processing steps to produce building contour, which can be computationally intensive and prone to errors. In this paper, we propose an end-to-end method named PolyBuild, which can directly extract building vector polygons from high-resolution remote sensing images without the need for any post-processing operations. The proposed method leverages two primary modules: an Initial Contour Generation Module (ICGM) and a Contour Optimization Module (COM). The ICGM is designed to generate an initial building contour by utilizing concatenated sub-region center features for each building instance. It performs simultaneous object detection and initial contour extraction by generating bounding boxes and using the center features of four sub-regions to represent each building. The Contour Optimization Module (COM) further refines the generated building contours by iteratively integrating Convolutional Neural Network (CNN) features and contour positional information in a Transformer-based decoder. The hybrid CNN-Transformer architecture effectively captures both local and global spatial relationships within the building contour, ensuring high-quality boundary delineation. Extensive experiments are conducted on three building datasets to evaluate the performance of PolyBuild. The results demonstrate that PolyBuild significantly outperforms state-of-the-art methods, including mask-based and contour-based approaches.