Confidence Score Guided Incremental and Speaker Adaptive Pseudo-Labeling for Semi-Supervised Elderly Speech Recognition

2026-06-15 • Sound

AI summary

The authors developed a new way to improve speech recognition for elderly speakers by carefully choosing which unlabeled speech data to trust and use. Their system ranks unlabeled recordings by how confident it is in its own transcriptions, then gradually adds data in order from most to least confident. Their method also adjusts for individual speaker differences using learnable prompts during training. Tests showed their approach made fewer mistakes than a baseline that used neither confidence ranking nor speaker adaptation.

semi-supervised learning, pseudo-labeling, confidence estimation, incremental learning, speaker adaptation, word error rate, character error rate, curriculum learning, elderly speech recognition

Authors

Chengxi Deng, Xurong Xie, Shujie Hu, Jiajun Deng, Mengzhe Geng, Youjun Chen, Huimeng Wang, Haoning Xu, Guinan Li, Xunying Liu

Abstract

This paper proposes a novel confidence score guided incremental and speaker adaptive pseudo-labeling approach for semi-supervised elderly speech recognition. It facilitates higher-quality pseudo-label selection and progressive refinement, while also mitigating speaker heterogeneity. A confidence estimation module is designed to rank the reliability of untranscribed data, enabling a curriculum learning trajectory that progressively folds in unlabeled data subsets from high to low confidence. Speaker-specific characteristics are captured through speaker adaptive training with learnable prompts. Experiments on the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech datasets suggest that the proposed method outperforms a semi-supervised baseline that uses neither confidence score guided incremental nor speaker adaptive pseudo-labeling, with statistically significant word error rate (WER) and character error rate (CER) reductions of 1.45% and 2.27% absolute (6.21% and 6.98% relative), respectively.
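The high-to-low confidence curriculum described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `confidence_curriculum`, the equal-sized stages, the cumulative subsets, and the use of a single scalar score per utterance are all assumptions made for illustration. The confidence estimation module itself, the retraining step between stages, and the speaker adaptive prompts are not shown.

```python
from typing import List, Sequence, Tuple


def confidence_curriculum(
    scored_utterances: Sequence[Tuple[str, float]],
    num_stages: int,
) -> List[List[str]]:
    """Return cumulative training subsets ordered from high to low confidence.

    Hypothetical helper: `scored_utterances` holds (utterance_id, confidence)
    pairs for pseudo-labeled data. Stage k contains the top k/num_stages
    fraction of utterances by confidence, so each stage extends the last.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")

    # Rank utterances from most to least confident (ties keep input order).
    ranked = sorted(scored_utterances, key=lambda pair: pair[1], reverse=True)
    total = len(ranked)

    stages: List[List[str]] = []
    for stage in range(1, num_stages + 1):
        # Assumed schedule: equal-sized increments; the last stage uses all data.
        cutoff = (total * stage) // num_stages
        stages.append([utt_id for utt_id, _ in ranked[:cutoff]])
    return stages


if __name__ == "__main__":
    # Made-up scores, purely for demonstration.
    demo = [("utt_a", 0.9), ("utt_b", 0.4), ("utt_c", 0.7), ("utt_d", 0.2)]
    for index, subset in enumerate(confidence_curriculum(demo, num_stages=2), 1):
        print(f"stage {index}: {subset}")
```

In a full pipeline, one would presumably retrain or fine-tune the recognizer on the labeled data plus each stage's subset, then re-decode and re-score the remaining unlabeled data before moving to the next stage; that loop is omitted here because the abstract does not specify its details.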
