A Lightweight Hybrid Transformer-CRF Architecture for Multi-Type Bangla Medical Entity Recognition

2026-05-25 | Computation and Language

AI summary

The authors created a system to identify medical terms in Bangla text, which is important for organizing medical information from free-form writing. They first built a strong model using BanglaBERT and a Conditional Random Field (CRF) layer to detect the exact boundaries of medical terms. To make it easier to use on devices with limited power, they shrank this model into a smaller one by teaching it to mimic the bigger one. They also made the smaller model use less memory and run faster by applying a technique called quantization. As a result, their final model runs over eight times faster on a CPU and needs nearly 48 percent less storage than the original model.

Medical Entity Recognition, BanglaBERT, Conditional Random Field, Knowledge Distillation, Quantization, Transformer Models, Natural Language Processing, Unstructured Medical Text, Model Compression
Authors
Peyal Saha, Ahsanul Haque Hasib, Shoumik Barman Polok
Abstract
Medical Entity Recognition (MedER) is the identification of medical entities in text. It is crucial for extracting structured clinical information from unstructured medical text. Many existing systems rely on transformer-based models, which are computationally expensive and difficult to deploy in resource-constrained environments. Furthermore, earlier works often use relaxed evaluation metrics that artificially inflate performance by rewarding correct prediction of the dominant "Outside" (O) tokens. In this paper, we propose a lightweight MedER framework for the Bangla language. We establish a rigorous baseline using a 12-layer BanglaBERT model combined with a Conditional Random Field (CRF) layer for exact-boundary entity detection. To address deployment constraints, we compress this teacher model into a 4-layer student network through Knowledge Distillation (KD), where the student learns from the teacher's pre-CRF soft emission logits. Finally, we apply INT8 dynamic quantization to further reduce model size and inference cost. Our final quantized student achieves an 8.6x CPU speedup while requiring nearly 48 percent less storage than the CRF teacher model.
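
To illustrate why token-level scoring can overstate performance when "O" tokens dominate, the following minimal sketch compares token accuracy with exact-boundary entity F1 on a toy BIO-tagged sentence. The tag names (`Disease`, `Drug`), the example sequences, and the helper functions are illustrative assumptions, not the labels, data, or evaluation code used in the paper.

```python
def extract_spans(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence.

    `end` is exclusive. An I- tag that does not continue an open span of
    the same type is ignored, which keeps the matching strict.
    """
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def token_accuracy(gold, pred):
    """Relaxed metric: fraction of tokens whose tag matches, O tokens included."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def exact_match_f1(gold, pred):
    """Strict metric: an entity counts only if boundaries and type both match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 10-token sentence with two gold entities.
gold = ["O", "O", "B-Disease", "I-Disease", "O", "O", "O", "B-Drug", "O", "O"]
# The prediction truncates the first entity and misses the second one.
pred = ["O", "O", "B-Disease", "O", "O", "O", "O", "O", "O", "O"]

print(token_accuracy(gold, pred))   # 8 of 10 tags agree
print(exact_match_f1(gold, pred))   # no predicted span matches a gold span exactly
```

In this toy case the prediction recovers no entity exactly, yet most token tags still agree because the sentence is mostly "O". This gap is what an exact-boundary metric is meant to expose.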
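
The distillation step, in which the student learns from the teacher's pre-CRF emission logits, can be sketched as a per-token loss between softened tag distributions. The abstract does not state the exact loss, so this sketch assumes the common temperature-scaled KL divergence formulation; the temperature value, the toy logits, and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one token's tag logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_emissions, student_emissions, temperature=2.0):
    """Mean per-token KL(teacher || student) on softened distributions.

    Each argument is a list of per-token logit vectors (one score per tag),
    taken before any CRF layer. The T^2 factor is the usual rescaling that
    keeps gradient magnitudes comparable across temperatures.
    """
    total = 0.0
    for t_logits, s_logits in zip(teacher_emissions, student_emissions):
        p = softmax(t_logits, temperature)  # teacher soft targets
        q = softmax(s_logits, temperature)  # student predictions
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * total / len(teacher_emissions)


# Toy emissions for a 2-token sentence with 3 tags (values are made up).
teacher = [[4.0, 1.0, 0.0], [0.5, 3.0, 0.2]]
student_near = [[3.5, 1.2, 0.1], [0.6, 2.8, 0.3]]
student_far = [[0.0, 1.0, 4.0], [3.0, 0.5, 0.2]]

print(distillation_loss(teacher, student_near))  # small: student mimics the teacher
print(distillation_loss(teacher, student_far))   # larger: student disagrees
```

In practice a term like this is often combined with the ordinary supervised loss on gold labels; whether and how the authors weight the two is not specified in the abstract.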