Little Brains, Big Feats: Exploring Compact Language Models

2026-06-29 • Computation and Language

Computation and Language, Artificial Intelligence

AI summary

The authors looked at how small language models work when used with a system that helps answer questions by pulling in information (called Retrieval-Augmented Generation or RAG). They tested these small models with various question types and topics using both public and private datasets. Their results show that small models can run directly on regular devices without needing powerful GPUs and still respond in a reasonable time. The authors also shared their code and materials online for others to use.

small language models, Retrieval-Augmented Generation, RAG, on-device NLP, GPU, open-source datasets, proprietary datasets, question answering, language model benchmarking, natural language processing

Authors

Dari Baturova, Elena Bruches, Ivan Chernov, Roman Derunets, Arsenii Fomin, Andrey Kostin

Abstract

While large language models have dominated the research landscape recently, small language models remain highly relevant across various domains, yet they receive far less attention. In this study, we investigate how smaller language models perform during the generation stage within a Retrieval-Augmented Generation (RAG) system. To benchmark these models effectively, we utilised both open-source and proprietary datasets covering diverse subject areas and question types. Our findings demonstrate that a RAG system with small language models can be executed directly on-device, without any GPU hardware, in a reasonable time. The experimental code and links to the supplementary materials can be accessed through the GitHub repository: https://github.com/SibNN/SLM-RAG-EVAL.
