Surpassing Scale by Efficiency: A Compact 135M Parameter Foundational LLM Natively Adapted for the Bangla Language

2026-06-15 · Computation and Language

AI summary

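The author built a small, efficient language model called bangla-smollm-135m specifically for text in the Bangla script. Large multi-billion parameter models are hard to run on low-power devices such as phones and local hardware, which motivates the compact design. The model combines the token vocabularies of two existing models (TituLLMs and SmolLM2-135M) so that Bangla text is no longer split into many small fragments, while leaving the pretrained parameters stable. In zero-shot tests on four Bangla benchmarks, it matched or beat a model twice its size (Gemma-3-270m) and was on par with models of about 1B parameters. This makes it useful for more accessible language technology in Bangla.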

language model, Bangla script, decoder-only model, token merging, zero-shot evaluation, subword fragmentation, parameter efficiency, low-resource NLP, foundational model, multitask benchmarks
Authors
Rabindra Nath Nandi
Abstract
While the NLP landscape is dominated by multi-billion-parameter architectures, deploying them for low-resource languages written in non-Latin scripts remains computationally prohibitive for edge configurations, mobile systems, and decentralized local hardware. This paper presents bangla-smollm-135m, a highly compact 135-million-parameter decoder-only foundational model engineered explicitly for high-efficiency language modeling in the Bangla script. By leveraging a deterministic intersect-and-append token merging strategy between TituLLMs and SmolLM2-135M, the model overcomes subword script fragmentation without destabilizing early pretrained parameter states. In zero-shot multi-task benchmark evaluations (PIQA_bn, OpenBookQA_bn, CommonsenseQA_bn, and Bangla_MMLU), bangla-smollm-135m matches or outperforms a model twice its size (Gemma-3-270m) and achieves parity with models in the 1B parameter tier. The model is available at rnnandi/bangla-smollm-135m.
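The abstract names an intersect-and-append token merging strategy but does not spell out its steps. The following is a minimal sketch of one plausible reading, not the author's implementation. It assumes that SmolLM2-135M supplies the base vocabulary, that TituLLMs supplies the Bangla tokens, that tokens shared by both keep their base IDs, and that the remaining donor tokens are appended in donor-ID order. The function name `intersect_and_append` and the toy vocabularies are hypothetical.

```python
def intersect_and_append(base_vocab, donor_vocab):
    """Merge a donor vocabulary into a base vocabulary (illustrative sketch).

    base_vocab, donor_vocab: dict mapping token string -> integer ID.
    Returns (merged_vocab, shared_tokens, appended_tokens).

    Assumption: base IDs are never changed, so pretrained embedding rows
    of the base model would stay aligned with their tokens.
    """
    merged = dict(base_vocab)

    # Walk donor tokens in donor-ID order so the result is deterministic.
    donor_order = sorted(donor_vocab, key=lambda tok: donor_vocab[tok])

    # Intersect: tokens present in both vocabularies keep their base IDs.
    shared = [tok for tok in donor_order if tok in base_vocab]

    # Append: donor-only tokens receive fresh IDs after the last base ID.
    appended = [tok for tok in donor_order if tok not in base_vocab]
    next_id = max(base_vocab.values()) + 1 if base_vocab else 0
    for tok in appended:
        merged[tok] = next_id
        next_id += 1

    return merged, shared, appended


# Toy vocabularies (hypothetical, not the real tokenizers).
base = {"<s>": 0, "a": 1, "b": 2}
donor = {"b": 0, "ক": 1, "বাং": 2, "a": 3}
merged, shared, appended = intersect_and_append(base, donor)
```

Under these assumptions, only the appended tokens would need new embedding rows; how those rows are initialized is not stated in the abstract and is left out of the sketch.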