LALE: Lightweight-Transformer Architecture for Land-Cover Estimation

2026-06-01 • Artificial Intelligence

Artificial Intelligence, Computer Vision and Pattern Recognition

AI summary

The authors developed LALE, a method for segmenting remote sensing images that captures both big-picture context and fine detail without heavy compute. Their approach splits the job by resolution: lightweight convolutions (ConvMixer stages) handle the detailed, high-resolution features, while transformers handle the broader, low-resolution features, so the expensive self-attention step only runs on small feature maps. They also use a simple all-MLP decoder, along with RMSNorm normalization and StarReLU activation, to keep the model small and fast. On the large-scale ARAS400k benchmark, the smallest LALE variant comes within 2.6 F1 points of the best baseline while using far fewer resources, showing a strong balance between speed, size, and accuracy.
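The efficiency argument in the summary above can be made concrete with a little arithmetic. Full self-attention compares every token with every other token, so its cost grows with the square of the token count. The sketch below is only an illustration: the 256x256 input size and the strides 4 and 16 are assumed values, not figures taken from the paper, and the helper names are hypothetical.

```python
def attention_tokens(height, width, stride):
    """Number of tokens when an image is downsampled by `stride` per side."""
    return (height // stride) * (width // stride)


def attention_pair_count(height, width, stride):
    """Token-pair interactions in full self-attention: N * N."""
    n = attention_tokens(height, width, stride)
    return n * n


# Hypothetical 256x256 input; strides 4 and 16 are illustrative choices,
# not values taken from the paper.
high_res = attention_pair_count(256, 256, stride=4)   # 4096 tokens
low_res = attention_pair_count(256, 256, stride=16)   # 256 tokens
ratio = high_res // low_res
print(ratio)  # 256
```

Under these assumptions, moving attention from a stride-4 map to a stride-16 map cuts the token count by 16x and the pairwise attention work by 256x, which is the kind of saving a resolution-split encoder is designed to exploit.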

semantic segmentation, remote sensing imagery, transformer, ConvMixer, self-attention, MLP decoder, RMSNorm, StarReLU, ARAS400k benchmark, model efficiency

Authors

Ümit Mert Çağlar, Alptekin Temizel

Abstract

Semantic segmentation of remote sensing imagery requires models that capture both global context and local detail under tight computational budgets. Prior work typically optimizes for one of these axes: attention for global context, convolution for local detail, or compactness for efficiency. While hybrid approaches aim to capture both, they require architectural changes and encoder backbones with computational overhead, limiting efficiency and performance. We present LALE (Lightweight-transformer Architecture for Land-cover Estimation), an end-to-end remote sensing image segmentation architecture that bifurcates its encoder by resolution: lightweight ConvMixer stages handle high-resolution local features, while transformer stages handle low-resolution global context, confining the quadratic cost of self-attention to deep, downsampled feature maps. An all-MLP multi-scale decoder, together with RMSNorm and StarReLU throughout, further reduces compute and parameter count. On the large-scale ARAS400k remote-sensing segmentation benchmark, LALE establishes a strong efficiency-performance trade-off against CNN, transformer, and hybrid baselines. Our smallest variant, with just 1.6M parameters, reaches within 2.6 F1 points of the best baseline (UPerNet) while using 4.5x fewer parameters, 7x less storage, and 17x fewer GMACs, and delivering 1.8x higher throughput.
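For readers unfamiliar with the two operators named in the abstract, the following is a minimal pure-Python sketch of their standard definitions: RMSNorm rescales a vector by its root mean square, and StarReLU computes scale * relu(x)^2 + bias. It is not the authors' implementation. The gain, scale, and bias are normally learned parameters; they are fixed to illustrative defaults here, and how LALE places or initializes them is not stated in the abstract.

```python
import math


def rms_norm(x, gain=None, eps=1e-6):
    """RMSNorm: divide by the root mean square, then apply a per-channel gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    if gain is None:
        gain = [1.0] * len(x)  # assumed default; normally a learned vector
    return [g * v / rms for g, v in zip(gain, x)]


def star_relu(x, scale=1.0, bias=0.0):
    """StarReLU: scale * relu(x)**2 + bias, with scalar scale and bias."""
    # scale and bias are normally learned scalars; fixed here for illustration.
    return [scale * max(v, 0.0) ** 2 + bias for v in x]


features = [3.0, -4.0, 0.0, 0.0]   # toy channel vector, RMS = 2.5
normed = rms_norm(features)        # approximately [1.2, -1.6, 0.0, 0.0]
activated = star_relu(normed)      # approximately [1.44, 0.0, 0.0, 0.0]
print(normed, activated)
```

Unlike LayerNorm, RMSNorm skips mean subtraction, and StarReLU avoids the exponentials or error functions used by activations such as GELU. Both are cheap per-element operations.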
