SP-MoMamba: Superpixel-driven Mixture of State Space Experts for Efficient Image Super-Resolution

2026-05-25 • Computer Vision and Pattern Recognition


AI summary

The authors introduce SP-MoMamba, a method for improving single-image super-resolution by grouping pixels into meaningful regions called superpixels before processing. Instead of analyzing images as fixed grids, their approach uses these superpixels as basic units, which preserves important spatial relationships. They also combine experts that operate at different superpixel scales, selected by a dynamic router, and add a dedicated local component to keep fine details sharp. Experiments on standard benchmarks show that the method delivers higher reconstruction quality and a better efficiency-performance trade-off than other efficient super-resolution methods.
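The idea of treating superpixels as basic units can be illustrated with a minimal NumPy sketch. It assumes a precomputed integer superpixel label map, averages the features inside each region to form one token per superpixel, runs a simple diagonal linear state-space recurrence over those tokens, and broadcasts the results back to every pixel of the region. The function names, the mean-pooling tokenizer, the scan order (by label index), and the recurrence parameters `a`, `b`, `c` are hypothetical choices for illustration, not the authors' actual module.

```python
import numpy as np

def superpixel_pool(features, labels):
    """Average an (H, W, C) feature map inside each superpixel -> (K, C) tokens.

    Assumption: `labels` is a precomputed (H, W) integer map with values 0..K-1.
    """
    ch = features.shape[-1]
    flat = labels.reshape(-1)
    k = int(flat.max()) + 1
    sums = np.zeros((k, ch))
    np.add.at(sums, flat, features.reshape(-1, ch))  # per-superpixel feature sums
    counts = np.bincount(flat, minlength=k)          # pixels per superpixel
    return sums / np.maximum(counts, 1)[:, None]

def linear_ssm_scan(tokens, a, b, c):
    """Diagonal linear recurrence over (K, C) tokens, cost O(K * C):
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t  (element-wise per channel)."""
    h = np.zeros(tokens.shape[1])
    out = np.empty_like(tokens)
    for t, x_t in enumerate(tokens):
        h = a * h + b * x_t
        out[t] = c * h
    return out

def superpixel_ssm(features, labels, a, b, c):
    tokens = superpixel_pool(features, labels)  # regions -> tokens
    mixed = linear_ssm_scan(tokens, a, b, c)    # scan over K tokens instead of H*W pixels
    return mixed[labels]                        # scatter each token back to its region

# Toy example: a 2x3 map split into two superpixels.
labels = np.array([[0, 0, 1],
                   [0, 1, 1]])
features = np.arange(6, dtype=float).reshape(2, 3, 1)
tokens = superpixel_pool(features, labels)      # -> [[4/3], [11/3]]
out = superpixel_ssm(features, labels,
                     a=np.array([0.5]), b=np.array([1.0]), c=np.array([1.0]))
```

Because the recurrence runs over K superpixel tokens rather than H·W pixels in a fixed raster order, neighboring pixels of the same region are never split across distant sequence positions.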

State Space Models, Single-Image Super-Resolution, Superpixels, Gestalt Perceptual Grouping, Mixture of Experts, Multi-Scale Modeling, Dynamic Routing, High-Frequency Details, Spatial Modulation, Image Reconstruction

Authors

Wenbin Zou, Yawen Cui, Yi Wang, Lap-Pui Chau, Liang Chen, Jinshan Pan, Huiping Zhuang, Guanbin Li

Abstract

State space models (SSMs) have emerged as a powerful paradigm for efficient single-image super-resolution (SR) due to their linear complexity and long-range modeling capabilities. However, existing Mamba-based methods typically rely on data-agnostic rigid scanning, which reshapes 2D images into 1D sequences over a fixed grid, inevitably disrupting spatial-semantic topology and introducing artifacts. Inspired by Gestalt perceptual grouping theory, we propose SP-MoMamba, a superpixel-driven mixture of state space experts designed for content-aware SR. Our core idea is to replace traditional rigid scanning with semantic-level interaction by treating superpixels as the fundamental units. Specifically, we introduce the Superpixel-driven State Space Model (SP-SSM), which compresses semantically homogeneous regions into high-order tokens to preserve global topological consistency. To address the conflict between fixed scanning scales and diverse semantic granularities, we develop the Multi-Scale Superpixel Mixture of State Space Experts (MSS-MoE). This module uses a dynamic routing mechanism to adaptively assign scale-specific experts, capturing multi-scale textures while reducing computational redundancy. Furthermore, to prevent the loss of high-frequency details during global abstraction, we introduce a Local Spatial Modulation Expert (LSME) that complements the global modeling and ensures precise reconstruction of sharp edges and fine structures. Extensive experiments on standard benchmarks demonstrate that SP-MoMamba achieves superior reconstruction fidelity and a more favorable efficiency-performance trade-off than state-of-the-art efficient SR methods.
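The dynamic routing in MSS-MoE can be sketched as top-k gating over scale-specific experts. In this minimal NumPy sketch, the router scores every expert from a globally pooled feature vector and runs only the k highest-scoring ones, which is where sparse routing saves computation. As a hypothetical stand-in for superpixel experts at different granularities, each expert averages features over fixed square blocks of a given size. The gating rule, the `w_gate` matrix, and all function names are assumptions for illustration, not the paper's design.

```python
import numpy as np

def block_average_expert(scale):
    """Hypothetical scale-specific expert: replace each pixel with the mean of its
    (scale x scale) block. H and W must be divisible by `scale`."""
    def expert(x):
        h, w, c = x.shape
        blocks = x.reshape(h // scale, scale, w // scale, scale, c).mean(axis=(1, 3))
        return np.repeat(np.repeat(blocks, scale, axis=0), scale, axis=1)
    return expert

def route_top_k(x, w_gate, k):
    """Softmax weights over the k highest-scoring experts, scored from a global descriptor."""
    descriptor = x.mean(axis=(0, 1))              # (C,) global average pooling
    logits = descriptor @ w_gate                  # (E,) one score per expert
    top = np.argsort(logits)[::-1][:k]            # indices of the k largest logits
    z = np.exp(logits[top] - logits[top].max())   # numerically stable softmax
    return top, z / z.sum()

def mixture_of_experts(x, experts, w_gate, k=2):
    top, weights = route_top_k(x, w_gate, k)
    out = np.zeros_like(x)
    for idx, wgt in zip(top, weights):            # only the selected experts are evaluated
        out += wgt * experts[idx](x)
    return out, top, weights

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
experts = [block_average_expert(s) for s in (1, 2, 4)]  # fine, medium, coarse
w_gate = rng.standard_normal((4, len(experts)))
y, chosen, weights = mixture_of_experts(x, experts, w_gate, k=2)
```

Gating on a global descriptor selects experts per image; routing per token or per region is an equally plausible reading of the abstract and would only change where `route_top_k` is applied.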
