FaceMoE: Mixture of Experts for Low-Resolution Face Recognition

2026-06-30 • Computer Vision and Pattern Recognition

AI summary

The authors address the problem of recognizing faces in very blurry or low-quality images, which is hard because such images lose important details. They propose FaceMoE, a new type of model that uses multiple specialized parts (experts) inside a transformer to focus on different face regions, helping it extract better features depending on image quality. Their design also helps the model remember what it learned from high-quality images while adapting to low-quality ones, without needing much extra computation. They tested FaceMoE on many datasets and found it outperforms existing methods for low-resolution face recognition.

Low-resolution face recognition, Mixture of Experts (MoE), Transformer architecture, Feature extraction, Top-k router, Feed-forward network (FFN), Domain gap, Catastrophic forgetting, Sparse activation, Load balancing loss

Authors

Kartik Narayan, Vishal M. Patel

Abstract

Low-resolution face recognition (LR-FR) remains a challenging task due to poor feature extraction and aggregation, as probe images often contain limited identity information resulting from extreme degradations such as blur, occlusion, and low contrast. Additionally, the domain gap between high-resolution (HR) gallery images and low-resolution (LR) probe images poses a significant challenge. A single feature encoder struggles to generalize effectively across both domains when fine-tuned on an LR dataset, and this issue is further magnified by catastrophic forgetting. To address these challenges, we propose FaceMoE, an effective adaptation of the Mixture of Experts (MoE) transformer architecture for low-resolution face recognition. Specifically, we introduce multiple specialized feed-forward network (FFN) experts and incorporate a top-k router, which dynamically assigns tokens to appropriate experts. With this design, expert specialization for different semantic regions of the face emerges during training, which enables FaceMoE to perform resolution-aware feature extraction. Moreover, the top-k router facilitates sparse expert activation, enabling the model to preserve pretrained knowledge when fine-tuned on an LR dataset, while increasing model capacity without proportional computational overhead. FaceMoE is trained with a combined face recognition loss, router z-loss, and load balancing loss to ensure expert specialization and stable training. To the best of our knowledge, this is the first work leveraging MoE for LR-FR. Extensive experiments across eleven datasets, spanning HR, mixed-quality, and LR benchmarks, demonstrate that FaceMoE significantly outperforms state-of-the-art methods. Code: https://github.com/Kartik-3004/FaceMoE
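
The routing and auxiliary losses mentioned in the abstract can be illustrated with a small, dependency-free sketch. It is not taken from the FaceMoE repository: the abstract does not give the exact formulas, so this assumes the formulations commonly used in the MoE literature (a load balancing loss of the form N * sum_i f_i * P_i, and a router z-loss equal to the mean squared log-sum-exp of the router logits). The function names (`route_top_k`, `total_loss`) and the loss weights are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one token's expert logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logsumexp(logits):
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))

def route_top_k(router_logits, k):
    """Assign each token to its k highest-probability experts.

    router_logits: one list of expert logits per token.
    Returns (assignments, load_balance_loss, z_loss), where assignments[t]
    is a list of (expert_index, gate_weight) pairs for token t.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    assignments = []
    for p in probs:
        top = sorted(range(num_experts), key=lambda e: p[e], reverse=True)[:k]
        norm = sum(p[e] for e in top)  # renormalize gates over the chosen experts
        assignments.append([(e, p[e] / norm) for e in top])

    # f[e]: fraction of routing slots sent to expert e (assumed normalization by k).
    f = [0.0] * num_experts
    for chosen in assignments:
        for e, _ in chosen:
            f[e] += 1.0 / (num_tokens * k)
    # P[e]: mean router probability given to expert e.
    P = [sum(p[e] for p in probs) / num_tokens for e in range(num_experts)]
    load_balance_loss = num_experts * sum(fi * Pi for fi, Pi in zip(f, P))

    # Router z-loss: penalizes large logits to keep routing numerically stable.
    z_loss = sum(logsumexp(row) ** 2 for row in router_logits) / num_tokens
    return assignments, load_balance_loss, z_loss

def total_loss(recognition_loss, load_balance_loss, z_loss,
               lb_weight=0.01, z_weight=0.001):
    # Hypothetical weights; the abstract does not state the values used.
    return recognition_loss + lb_weight * load_balance_loss + z_weight * z_loss
```

Under these assumptions, the load balancing term equals 1 when tokens and router probability are spread evenly across experts and grows as routing collapses onto a few experts, which is why minimizing it encourages all experts to be used. Only the k selected experts run their FFN for a given token, which is the sparse activation the abstract refers to.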
