Comparing Human and Automatic Recognition of Dutch Dysarthric Continuous Speech: A Case Study

2026-06-29 · Computation and Language

AI summary

The authors compared how well human listeners and three state-of-the-art speech recognition systems understand Dutch speech from a person with severe dysarthria, a speech disorder. Both humans and machines made many mistakes, with word error rates above 70% on average. When the authors fine-tuned the machines on this person's speech, errors dropped significantly, and the personalized systems even outperformed the human listeners, although errors remained common (word error rates still above 23%). They suggest future work on improving these personalized systems, especially for spontaneous speech and longer read utterances.

dysarthria, speech recognition, word error rate, automatic speech recognition (ASR), personalized models, fine-tuning, spontaneous speech, read speech, phonemes
Authors
Yuanyuan Zhang, Dimme de Groot, Jorge Martinez, Odette Scharenborg
Abstract
Toward our goal of developing personalised dysarthric speech recognition (DSR) models, this study compared the recognition performance of human listeners with that of three state-of-the-art, off-the-shelf automatic speech recognition (ASR) systems (Whisper-large-V3, Google Chirp 3, and Omnilingual) on Dutch continuous read and spontaneous speech from a single speaker with severe dysarthria. Results showed that both the human listeners and the three off-the-shelf ASR systems had word error rates (WERs) exceeding 70% on average, indicating that DSR is highly challenging for humans and ASR systems alike. Fine-tuning on the dysarthric speech significantly reduced WER. Although overall WERs remain high (>23%), the personalised DSR models outperformed the human listeners, and their performance is approaching a level that could support the day-to-day communication of dysarthric speakers. Future research should focus on improving personalised DSR for spontaneous speech and, in the case of read speech, for longer utterances, with particular attention to specific phonemes.
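All results above are reported as word error rate (WER). As a minimal sketch, assuming the standard definition (substitutions + deletions + insertions, divided by the number of reference words, taken from a word-level Levenshtein alignment) and plain whitespace tokenization with no text normalization, WER can be computed as below. The function names are illustrative only; the study's actual scoring pipeline (text normalization, per-utterance versus pooled averaging) is not described here and may differ.

```python
from typing import Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn `ref` into `hyp` (word-level Levenshtein distance)."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row kept at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Per-utterance WER; can exceed 1.0 when there are many insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


def corpus_word_error_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled WER: total edits divided by total reference words
    (one common convention; an assumption, not the study's stated method)."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        errors += word_edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words")
    return errors / words


if __name__ == "__main__":
    # One substitution (zit -> zat) and one deletion (de) over 6 words.
    print(word_error_rate("de kat zit op de mat", "de kat zat op mat"))
```

Pooling edits over all utterances weights long utterances more heavily than averaging per-utterance WERs does, which matters when, as noted above, longer read utterances are harder to recognize.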