EfficientSign: An Attention-Enhanced Lightweight Architecture for Indian Sign Language Recognition
2026-04-09 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition • Machine Learning
AI summary
The authors created EfficientSign, a small and fast sign language recognition model intended for phones. It is based on EfficientNet-B0 and adds attention mechanisms that help the model focus on the important parts of hand gestures. Tested on Indian Sign Language letters, it was nearly as accurate as the larger ResNet18 (99.94% vs. 99.97%) while using 62% fewer parameters. The authors also showed that classical classifiers (SVM, Logistic Regression, KNN) trained on deep features from EfficientNet-B0 outperformed older SURF-based methods. Overall, the work suggests that attention-based models can recognize sign language efficiently without hand-crafted features or very large models.
EfficientNet-B0 • Squeeze-and-Excitation • spatial attention • Indian Sign Language • cross-validation • ResNet18 • SVM • Logistic Regression • K-Nearest Neighbors (KNN) • deep features
Authors
Rishabh Gupta, Shravya R. Nalla
Abstract
How do you build a sign language recognizer that works on a phone? That question drove this work. We built EfficientSign, a lightweight model that augments EfficientNet-B0 with two attention modules: Squeeze-and-Excitation for channel-wise focus, and a spatial attention layer that focuses on the hand gestures. We compared it against five other approaches on 12,637 images of Indian Sign Language (ISL) alphabet signs covering all 26 classes, using 5-fold cross-validation. EfficientSign achieves an accuracy of 99.94% (±0.05%), essentially matching ResNet18's 99.97%, but with 62% fewer parameters (4.2M vs. 11.2M). We also fed deep features (1,280-dimensional vectors taken from EfficientNet-B0's pooling layer) into classical classifiers: SVM reached 99.63% accuracy, Logistic Regression 99.03%, and KNN 96.33%. All of these exceed the 92% that SURF-based methods achieved on a similar dataset in 2015. Our results show that an attention-enhanced model provides an efficient, deployable solution for ISL recognition without requiring a massive model or hand-tuned feature pipelines.
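To make the two attention modules concrete, here is a minimal, dependency-free sketch of channel gating (Squeeze-and-Excitation) followed by spatial gating on a tiny feature map. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say where the modules sit in the network or how the spatial layer is built. The function names, the weights `w_reduce`, `w_expand`, `w_avg`, `w_max`, and the toy feature map are all hypothetical. The spatial gate is simplified to a sigmoid over a weighted sum of the per-pixel channel mean and channel max, whereas a trained model would typically use a learned convolution in a deep learning framework.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap, w_reduce, w_expand):
    """Squeeze-and-Excitation style gating: one gate in (0, 1) per channel.

    fmap:     C channels, each an H x W nested list of floats
    w_reduce: R x C bottleneck weights (hypothetical, untrained)
    w_expand: C x R expansion weights (hypothetical, untrained)
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excite: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden)))
             for row in w_expand]
    # Rescale every value in a channel by that channel's gate.
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(fmap, gates)]


def spatial_attention(fmap, w_avg=1.0, w_max=1.0):
    """Spatial gating: one gate in (0, 1) per pixel location.

    Assumption: the gate mixes the channel-wise mean and max at each
    location with two scalar weights instead of a learned convolution.
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = []
    for i in range(height):
        mask_row = []
        for j in range(width):
            column = [fmap[c][i][j] for c in range(channels)]
            mean_val = sum(column) / channels
            mask_row.append(sigmoid(w_avg * mean_val + w_max * max(column)))
        mask.append(mask_row)
    # Rescale every channel by the shared spatial mask.
    return [[[fmap[c][i][j] * mask[i][j] for j in range(width)]
             for i in range(height)] for c in range(channels)]


# Toy input: 2 channels of size 2 x 2 (hypothetical values).
fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.5], [1.0, 1.5]]]
w_reduce = [[0.5, -0.5]]      # 2 channels -> 1 hidden unit
w_expand = [[1.0], [-1.0]]    # 1 hidden unit -> 2 channel gates

out = spatial_attention(channel_attention(fmap, w_reduce, w_expand))
print(len(out), len(out[0]), len(out[0][0]))  # shape is preserved
```

Both gates only rescale activations, so the output keeps the input's shape, and for non-negative inputs every value can only shrink. That is the sense in which attention "focuses" the network: informative channels and locations are attenuated less than the rest.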