TalkTag: Fine-Grained Morphosyntactic Error Annotation for Transcribed Speech

2026-06-01 | Computation and Language

AI summary

The authors created TalkTag, a tool that uses a fine-tuned large language model to automatically annotate grammatical (morphosyntactic) errors in transcripts of children's spoken stories. Detailed error annotation of this kind usually requires experts and a lot of time. The tool was developed with very little training data and can flag tricky cases where it is hard to decide whether there is an error. This makes error annotation easier and more scalable.

morphosyntactic error, language annotation, large language model (LLM), CHAT format, spoken-language transcript, low-resource setting, children's narratives, natural language processing, automated tagging, linguistic ambiguity
Authors
Shamira Venturini, Oliver Hennhöfer, Steffen Kinkel, Jannik Strötgen
Abstract
Fine-grained morphosyntactic error annotation is important in clinical and developmental language research, yet it is labour-intensive, expert-dependent, and difficult to scale. We present TalkTag, a lightweight LLM-based tool fine-tuned to automate CHAT-style error annotation in spoken-language transcripts. Developed under conditions of extreme data scarcity using children's narrative data, the system demonstrates that such linguistic analysis is feasible in low-resource settings. Our evaluation shows that TalkTag produces encouragingly precise annotations while effectively identifying instances where linguistic ambiguity makes automated tagging genuinely complex. TalkTag thus offers a scalable alternative to manual annotation and practically viable support for morphosyntactic error annotation.
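
For readers unfamiliar with CHAT-style error annotation, the following minimal Python sketch shows what such markers can look like and how they might be read back out of a transcript line. It is an illustration only, not TalkTag's implementation: the sample utterance, the error code `m:=ed`, and the helper `extract_error_codes` are assumptions made for this example, based on the general CHAT convention of marking an error with `[*]` or `[* code]` after the erroneous word.

```python
import re

# Matches a CHAT-style error marker such as "[*]" or "[* m:=ed]".
# Replacement markers such as "[: went]" are deliberately not matched.
ERROR_MARKER = re.compile(r"\[\*\s*([^\]]*)\]")


def extract_error_codes(utterance: str) -> list[str]:
    """Return the code inside each [* ...] marker; a bare [*] yields ''."""
    return [m.group(1).strip() for m in ERROR_MARKER.finditer(utterance)]


# Hypothetical transcript line (not taken from the paper's data).
sample = "*CHI: he goed [: went] [* m:=ed] to the store ."
print(extract_error_codes(sample))
```

A helper of this kind could be used to compare automatically produced markers against manual ones line by line, although the abstract does not describe the evaluation procedure at that level of detail.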