SEA-NLI: Natural Language Inference as a Lens into Southeast Asian Cultural Understanding

2026-06-02 • Computation and Language

AI summary

The authors found that large language models work well in Western contexts but struggle with Southeast Asian (SEA) cultures. They built a new benchmark, SEA-NLI, that tests how well models understand SEA cultural knowledge through natural language inference examples in English and native SEA languages. Across the 17 models they tested, performance was low, especially on topics that require cultural or specialized knowledge. Models adapted to SEA and prompts that supply cultural context did better, while chain-of-thought prompting helped only a little.

Large Language Models, Natural Language Inference, Southeast Asia, Cultural Knowledge, Benchmarking, Multilingual NLP, Model Adaptation, Prompt Engineering

Authors

Peerawat Chomphooyod, Jian Gang Ngui, Yosephine Susanto, Attapol T. Rutherford, Alham Fikri Aji, Sarana Nutanong, Can Udomcharoenchaikit, Peerat Limkonchotiwat

Abstract

Frontier LLMs perform well in Western contexts, but remain poorly tested on underrepresented cultures such as those in Southeast Asia (SEA). Existing NLI benchmarks are largely Western-centric, translation-derived, or monolingual, limiting their ability to measure culturally grounded reasoning. We introduce SEA-NLI, a native, culturally grounded NLI benchmark covering eight SEA countries in English and native regional languages, verified by native speakers. Across 17 encoder and decoder models, we observe low performance from all models, especially in knowledge-intensive categories such as Languages and Science and Technology. Our analysis shows that failure cases mainly stem from missing SEA cultural knowledge: SEA-adapted models and culture-aware prompting improve performance, while chain-of-thought (CoT) prompting offers limited gains.
