CARTE: A Benchmark for Mapping Language Model Knowledge Across France
2026-06-01 • Computation and Language
AI summary
The authors created CARTE, a test of how well large language models (LLMs) know fine-grained facts about different regions of France. Tests that look at whole countries miss these distinctions. This one checks whether models can tell closely related regions apart on topics such as culture and language. It contains over 2,400 multiple-choice questions covering France's 13 metropolitan regions and 14 topics. Testing 27 models showed that accuracy varies across regions and model sizes. This points to gaps in the data these models learn from.
large language models, benchmark, regional variation, geographical knowledge, few-shot learning, cultural knowledge, linguistic variation, France, model evaluation
Authors
Sarah Almeida Carneiro, Christos Xypolopoulos, Xiao Fei, Yang Zhang, Michalis Vazirgiannis
Abstract
We introduce CARTE (Culturally Anchored Regional-Territorial Evaluation), a multiple-choice benchmark for evaluating the ability of large language models (LLMs) to perform fine-grained reasoning over geographically grounded, regionally differentiated knowledge within France. Prior benchmarks focus on national-level cultural understanding. They largely overlook intra-country variation and the need to distinguish between closely related regional contexts. CARTE addresses this gap with 2,431 questions spanning the 13 metropolitan regions of France and covering 14 thematic domains, including culture, language, demographics, economy, environment, and mobility. We further introduce CARTE-LV, a subset targeting Linguistic Variation across French regions, enabling focused evaluation of language-related differences. We evaluate 27 LLMs ranging from 1B to 12B parameters in a few-shot setting. Our experiments reveal performance disparities across regions and model scales. These disparities suggest systematic gaps in pretraining coverage and limited robustness to intra-national variation.
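The abstract does not describe the evaluation harness. The following is therefore only a minimal sketch of how few-shot multiple-choice prompts could be assembled and how accuracy could be broken down per region to expose the disparities it reports. Several details are assumptions rather than CARTE's actual format: four answer options labeled A to D, a plain "Question/Answer" prompt template, and the hypothetical `Item` record with `region`, `domain`, and `answer` fields.

```python
from collections import defaultdict
from dataclasses import dataclass

LETTERS = "ABCD"  # assumption: four options per question


@dataclass
class Item:
    """Hypothetical question record; field names are illustrative only."""
    region: str          # one of the 13 metropolitan regions, e.g. "Bretagne"
    domain: str          # thematic domain, e.g. "language"
    question: str
    choices: list[str]
    answer: str          # gold option letter, e.g. "B"


def format_item(item: Item, with_answer: bool) -> str:
    """Render one question; shots include the gold letter, the target does not."""
    lines = [f"Question: {item.question}"]
    for letter, choice in zip(LETTERS, item.choices):
        lines.append(f"{letter}. {choice}")
    lines.append(f"Answer: {item.answer}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(shots: list[Item], target: Item) -> str:
    """Concatenate solved examples, then the unsolved target question."""
    blocks = [format_item(s, with_answer=True) for s in shots]
    blocks.append(format_item(target, with_answer=False))
    return "\n\n".join(blocks)


def accuracy_by_region(items: list[Item], predictions: list[str]) -> dict[str, float]:
    """Fraction of correctly predicted letters, grouped by region."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.region] += 1
        correct[item.region] += int(pred == item.answer)
    return {region: correct[region] / total[region] for region in total}


if __name__ == "__main__":
    items = [
        Item("Bretagne", "language", "q1", ["a", "b", "c", "d"], "A"),
        Item("Bretagne", "culture", "q2", ["a", "b", "c", "d"], "B"),
        Item("Corse", "economy", "q3", ["a", "b", "c", "d"], "C"),
    ]
    print(build_few_shot_prompt(items[:1], items[2]))
    acc = accuracy_by_region(items, ["A", "C", "C"])
    print(acc)                                  # {'Bretagne': 0.5, 'Corse': 1.0}
    print(max(acc.values()) - min(acc.values()))  # 0.5 (largest regional gap)
```

In this toy run, the model's predictions are scored against the gold letters, and the gap between the best and worst region is one simple way to summarize regional disparity. A real evaluation would also need to parse the model's generated letter (or compare option log-likelihoods), which this sketch leaves out.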