AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models

2026-04-09 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition • Artificial Intelligence

AI summary

The authors created AtlasOCR, the first open-source model for recognizing written Darija, the Moroccan Arabic dialect, from images. They built a Darija-specific dataset by combining synthetic samples generated with their OCRSmith library and carefully sourced real-world text images. They fine-tuned a 3B-parameter vision language model, Qwen2.5-VL 3B, using parameter-efficient training with QLoRA and Unsloth. On the newly curated AtlasOCRBench and the established KITAB-Bench, AtlasOCR performs strongly on both Darija and standard Arabic text and competes with larger models. This fills a gap, since no specialized OCR tools existed for Darija.

Darija • Optical Character Recognition (OCR) • Vision Language Model (VLM) • Fine-tuning • Synthetic Data Generation • QLoRA • Unsloth • AtlasOCRBench • KITAB-Bench • Parameter-efficient Training

Authors

Imane Momayiz, Soufiane Ait Elaouad, Abdeljalil Elmajjodi, Haitame Bouanane

Abstract

Darija, the Moroccan Arabic dialect, is rich in visual content yet lacks specialized Optical Character Recognition (OCR) tools. This paper introduces AtlasOCR, the first open-source Darija OCR model built by fine-tuning a 3B parameter Vision Language Model (VLM). We detail our comprehensive approach, from curating a unique Darija-specific dataset leveraging both synthetic generation with our OCRSmith library and carefully sourced real-world data, to implementing efficient fine-tuning strategies. We utilize QLoRA and Unsloth for parameter-efficient training of Qwen2.5-VL 3B and present comprehensive ablation studies optimizing key hyperparameters. Our evaluation on the newly curated AtlasOCRBench and the established KITAB-Bench demonstrates state-of-the-art performance, challenging larger models and highlighting AtlasOCR's robustness and generalization capabilities for both Darija and standard Arabic OCR tasks.
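
To make "parameter-efficient training" concrete, the following is a minimal sketch of the arithmetic behind LoRA-style adapters, the mechanism QLoRA builds on (QLoRA additionally keeps the frozen base weights in 4-bit precision). The layer dimensions and rank below are hypothetical values chosen for illustration; they are not taken from the paper and are not claimed to match Qwen2.5-VL 3B or the authors' configuration.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA freezes the original weight W and learns a low-rank update B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_in * d_out


# Hypothetical layer size and adapter rank (assumptions, not from the paper).
D_IN, D_OUT, RANK = 2048, 2048, 16

lora = lora_param_count(D_IN, D_OUT, RANK)
full = full_param_count(D_IN, D_OUT)
fraction = lora / full

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {RANK}): {lora:,} trainable parameters")
print(f"fraction trained: {fraction:.4%}")
```

Under these assumed dimensions, the adapter trains 65,536 parameters instead of 4,194,304 for that layer, about 1.56% of the full matrix. The actual savings depend on which layers receive adapters and the rank chosen, both of which the abstract says were explored in the ablation studies.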
