Cross-lingual Retrieval-Augmented Classification for Dysarthria Severity Assessment

2026-06-22

Sound, Artificial Intelligence, Computation and Language
AI summary

The authors propose CRAC, a method for assessing the severity of dysarthria, a motor speech disorder, when little labeled data is available in the target language. It draws on speech data from a second language: embeddings from both languages are aligned by severity, the most similar cross-lingual examples are retrieved, and they are combined with the input. On Korean and Italian dysarthria datasets, CRAC reached balanced accuracies of 87.3% and 86.7%, improving over monolingual baselines by 8.4 and 20.0 percentage points, respectively.

Keywords: dysarthria, speech pathology, cross-lingual learning, contrastive learning, embedding space, vector database, cross-attention, speech classification, post-stroke speech, ALS speech dataset
Authors
Taeyoung Jeong, Insung Lee, Du-Seong Chang, Myoung-Wan Koo
Abstract
Automatic dysarthria severity assessment is limited by the scarcity of labeled pathological speech data. To address this, we propose Cross-lingual Retrieval-Augmented Classification (CRAC), which leverages speech from a different language via an align-retrieve-fuse pipeline. Supervised contrastive learning first shapes a severity-focused embedding space, then a vector database is built from the opposite-language corpus. During both training and inference, the classifier retrieves top-k references from the aligned space and fuses them with the input via cross-attention. Evaluated on Korean post-stroke and Italian ALS dysarthria datasets under a speaker-independent three-class protocol, CRAC achieves balanced accuracies of 87.3% on Korean and 86.7% on Italian, improving over monolingual baselines by 8.4 and 20.0 percentage points, respectively.
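
The retrieve and fuse steps described in the abstract, together with the balanced-accuracy metric, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes embeddings are already aligned, uses cosine similarity for retrieval, and replaces the learned cross-attention module with a single-head scaled dot-product attention without projection weights. All function names, dimensions, and the random data are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve_top_k(query, db_embeddings, k):
    """Return indices and cosine similarities of the k nearest database entries.

    Assumption: a brute-force search stands in for the vector database.
    """
    sims = l2_normalize(db_embeddings) @ l2_normalize(query)  # shape (N,)
    idx = np.argsort(-sims)[:k]                               # best match first
    return idx, sims[idx]

def cross_attention_fuse(query, references):
    """Let the query attend over the retrieved references.

    Assumption: single head, no learned projections; the fused vector is the
    query concatenated with the attention-weighted reference summary.
    """
    d = query.shape[-1]
    scores = references @ query / np.sqrt(d)   # shape (k,)
    scores = scores - scores.max()             # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ references             # shape (d,)
    return np.concatenate([query, context]), weights

def balanced_accuracy(y_true, y_pred, num_classes):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            recalls.append((y_pred[mask] == c).mean())
    return float(np.mean(recalls))

# Toy data: random vectors stand in for opposite-language embeddings.
rng = np.random.default_rng(0)
db = rng.normal(size=(50, 16))
query = db[7] + 0.01 * rng.normal(size=16)     # a query close to entry 7

idx, sims = retrieve_top_k(query, db, k=5)
fused, weights = cross_attention_fuse(query, db[idx])
```

In a full system the fused vector would feed a three-class severity classifier, and the attention would carry trainable query, key, and value projections. Balanced accuracy averages recall across the severity classes, so a classifier cannot score well by favoring the most frequent class.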