Causally Evaluating the Learnability of Formal Language Tasks

2026-06-08 • Computation and Language

Computation and Language, Formal Languages and Automata Theory

AI summary

The authors ask how much task-specific data a language model needs to learn a given task. They study the question in controlled formal languages rather than natural language, where tasks are hard to delineate and can confound one another. They show that standard correlational evaluation can give wrong answers because of confounders. To enable causal analysis, they introduce an algebraic object called the binning semiring, which controls how often a targeted property occurs in a sampled corpus. Their experiments show that evaluating learnability without causal intervention leads to incorrect conclusions, and they warn of the same correlational pitfalls in natural-language settings.

language models, multi-task learning, formal languages, probabilistic finite automata, correlational analysis, causal inference, binning semiring, Kullback-Leibler divergence, learnability, causal graphical model

Authors

Vésteinn Snæbjarnarson, Anej Svete, Josef Valvoda, Reda Boumasmoud, Brian DuSell, Ryan Cotterell

Abstract

Language models, as multi-task learners, acquire a wide range of abilities during training. A fundamental question is how much task-specific data is needed to learn a given task. Answering this for natural language is difficult: tasks are hard to delineate and can confound one another. To rigorously investigate the relationship between data frequency and learnability, we turn to a controlled setting using formal languages induced from probabilistic finite automata. These serve as a methodological testbed to demonstrate that standard correlational evaluation practices are inherently flawed. To enable causal analysis, we introduce the binning semiring, an algebraic object that lets us control how often a targeted property occurs in a sampled corpus. We formulate the experimental pipeline as a causal graphical model and derive decomposed Kullback-Leibler divergence metrics to measure the learnability of specific sub-tasks. Our experiments show that evaluating learnability without causal intervention leads to incorrect conclusions due to confounders in correlational analysis, and serve as a warning about correlational pitfalls in natural-language settings.
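
As a rough illustration of the setting the abstract describes, the sketch below samples strings from a toy probabilistic finite automaton and tallies how often one targeted transition fires per string. The automaton, the function names, and the choice of a single transition as the "targeted property" are all assumptions made for illustration. This is plain observational counting over samples, not the binning semiring, whose definition the abstract does not give.

```python
import random
from collections import Counter

# Hypothetical toy PFA (not taken from the paper). Each state maps to a list
# of arcs (probability, symbol, next_state); next_state None means "stop".
PFA = {
    0: [(0.5, "a", 1), (0.3, "b", 0), (0.2, "", None)],
    1: [(0.6, "b", 0), (0.4, "", None)],
}


def sample_string(pfa, rng, start=0):
    """Sample one string and the list of (state, symbol) transitions it used."""
    state, symbols, used = start, [], []
    while state is not None:
        arcs = pfa[state]
        weights = [p for p, _, _ in arcs]
        _, symbol, nxt = rng.choices(arcs, weights=weights, k=1)[0]
        if nxt is not None:  # a stopping arc emits no symbol
            symbols.append(symbol)
            used.append((state, symbol))
        state = nxt
    return "".join(symbols), used


def property_frequency(pfa, target, n_samples, seed=0):
    """Histogram over samples: how many strings used `target` exactly k times."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        _, used = sample_string(pfa, rng)
        counts[used.count(target)] += 1
    return counts


if __name__ == "__main__":
    # Targeted property (assumed): the transition reading "a" out of state 0.
    histogram = property_frequency(PFA, target=(0, "a"), n_samples=1000)
    for k in sorted(histogram):
        print(k, histogram[k])
```

In a sketch like this the occurrence count is only observed after sampling, so it varies together with other features of the string, such as its length. That is the kind of confounding the abstract warns about, and it is why the authors want direct control over how often the property occurs.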
