Child-Centric Voice Anonymization in Single and Multi-Speaker Speech via Domain-Adapted SSL Models

2026-06-29 • Sound

Sound, Artificial Intelligence

AI summary

The authors studied how to make voice anonymization work better for children's speech. They adapted a speech anonymization system, of a kind usually developed on adult speech, to child voices using child speech from the MyST corpus. Their tests showed that this adaptation improved how intelligible and natural the anonymized speech sounds while still protecting the speaker's identity. They also showed that the method carries over to two-speaker mixtures when combined with target speaker extraction. This work shows it’s important to tailor voice privacy tools specifically to children's voices.

voice anonymization, self-supervised learning, child speech, speech privacy, intelligibility, multi-speaker mixtures, speaker extraction, MyST corpus, speech usability, speech processing

Authors

Pranav Tushar, Xiao Xiao Miao, Rong Tong

Abstract

Voice anonymization aims to protect speaker identity while preserving linguistic content and speech usability. However, most anonymization systems are developed on adult speech, leading to degraded performance when applied to child speech. This paper investigates child-centric anonymization by adapting a self-supervised learning (SSL) based anonymization pipeline to the child speech domain. The system is adapted using child speech from the MyST corpus and evaluated under both single-speaker and two-speaker mixture conditions. Experimental results show that child-domain adaptation improves intelligibility and perceptual quality while maintaining strong privacy protection. Extending the approach to multi-speaker mixtures further demonstrates that combining target speaker extraction with child-adapted anonymization provides privacy protection while preserving conversational structure. These findings highlight the importance of child-specific adaptation for practical speech anonymization systems.
