Light or Full Verb? A Minimal-Pair Dataset for Probing Phraseological Competence in Language Models

2026-06-03 • Computation and Language

AI summary

The authors studied how language models handle frequent English verbs that can be used in two ways: as light verbs (as in 'make a decision') or as full verbs (as in 'make a cake'). They built a large, controlled dataset of sentence series that differ minimally, with the same verb appearing in the same context in both uses. Two probing experiments showed that language models distinguish the two uses even in minimal contexts, and that the patterns separate by object type. The dataset, generation code, and materials are released for others to reuse and extend.

light-verb construction, full lexical predicate, language models, English verbs, probing experiments, dataset, collocates, minimal contexts, natural language processing, verb usage

Authors

Francesca Franzon, Nicolas Rosàs Gómez, Leo Wanner

Abstract

Frequent English verbs such as 'have' and 'make' can function either as collocates in light-verb constructions or as full lexical predicates, as in 'make a decision' vs. 'make a cake'. Whether language models represent this distinction remains unclear. We introduce a large-scale controlled dataset of minimally varying English sentence series in which the same context contains the same verb in light-verb and full-verb uses. Two probing experiments show that language models differentiate between these uses even in minimal contexts and exhibit separable patterns across object types. We release the dataset, generation code, and materials as a reusable resource. The framework supports extensions to broader contexts, additional verbs, and other languages.
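
To make the minimal-pair idea concrete, the following is a small illustrative sketch, not the authors' released generation code. It assumes a toy lexicon in which each verb has light-verb objects and full-verb objects, and it fills one fixed sentence frame so that only the object changes. The names `OBJECTS`, `PAST`, `build_series`, and `minimal_pairs` are hypothetical; the 'make' objects come from the abstract, while the 'have' objects are assumed examples.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Item:
    verb: str
    obj: str
    use: str       # "light" or "full"
    sentence: str


# Hypothetical toy lexicon. The 'make' objects are taken from the abstract;
# the 'have' objects are assumptions added for illustration only.
OBJECTS = {
    "make": {"light": ["a decision"], "full": ["a cake"]},
    "have": {"light": ["a look"], "full": ["a car"]},
}

# Past-tense forms used by the fixed sentence frame (assumed frame).
PAST = {"make": "made", "have": "had"}


def build_series(subject: str, verb: str) -> list[Item]:
    """Fill one frame, '<subject> <verb-past> <object>.', for every object."""
    items = []
    for use, objs in OBJECTS[verb].items():
        for obj in objs:
            sentence = f"{subject} {PAST[verb]} {obj}."
            items.append(Item(verb, obj, use, sentence))
    return items


def minimal_pairs(series: list[Item]) -> list[tuple[Item, Item]]:
    """Pair each light-verb sentence with each full-verb sentence."""
    light = [item for item in series if item.use == "light"]
    full = [item for item in series if item.use == "full"]
    return list(product(light, full))


if __name__ == "__main__":
    for light_item, full_item in minimal_pairs(build_series("She", "make")):
        print(light_item.sentence, "|", full_item.sentence)
```

Each resulting pair shares subject, verb, and frame, so any difference a probe detects between the two sentences can be attributed to the object and the verb use it selects.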
