Child-Centric Voice Anonymization in Single and Multi-Speaker Speech via Domain-Adapted SSL Models

2026-06-29

Sound, Artificial Intelligence
AI summary

The authors studied how to make voice anonymization work better for children's speech. They adapted a speech anonymization system, originally designed for adults, to child voices using child speech from the MyST corpus. Their tests showed that this adaptation improved intelligibility and perceptual quality while still protecting the speaker's identity. They also showed that the method carries over to two-speaker mixtures when it is combined with target speaker extraction. This work shows that voice privacy tools need to be tailored specifically to children's voices.

voice anonymization, self-supervised learning, child speech, speech privacy, intelligibility, multi-speaker mixture, speaker extraction, MyST corpus, speech usability, speech processing
Authors
Pranav Tushar, Xiao Xiao Miao, Rong Tong
Abstract
Voice anonymization aims to protect speaker identity while preserving linguistic content and speech usability. However, most anonymization systems are developed on adult speech, which degrades their performance on child speech. This paper investigates child-centric anonymization by adapting a self-supervised learning (SSL)-based anonymization pipeline to the child speech domain. The system is adapted using child speech from the MyST corpus and evaluated under both single-speaker and two-speaker mixture conditions. Experimental results show that child-domain adaptation improves intelligibility and perceptual quality while maintaining strong privacy protection. Extending the approach to multi-speaker mixtures further demonstrates that combining target speaker extraction with child-adapted anonymization provides privacy protection while preserving conversational structure. These findings highlight the importance of child-specific adaptation for practical speech anonymization systems.
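
The abstract does not say how the two-speaker mixtures were constructed. As a rough illustration only, the sketch below shows one common way such a condition could be simulated: two waveforms are overlapped so that the target speaker sits a chosen number of decibels above the interfering speaker. The function names `rms` and `mix_two_speakers`, the `snr_db` parameter, and the truncate-to-shorter policy are all assumptions made for this example, not details taken from the paper.

```python
import math
from typing import List, Sequence


def rms(signal: Sequence[float]) -> float:
    """Root-mean-square level of a waveform given as float samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_two_speakers(
    target: Sequence[float],
    interferer: Sequence[float],
    snr_db: float,
) -> List[float]:
    """Overlap two waveforms so the target is snr_db above the interferer.

    Hypothetical helper: both inputs are truncated to the shorter length,
    which is an assumption of this sketch rather than the paper's protocol.
    """
    n = min(len(target), len(interferer))
    if n == 0:
        raise ValueError("both signals must contain at least one sample")
    target, interferer = target[:n], interferer[:n]

    interferer_level = rms(interferer)
    if interferer_level == 0.0:
        raise ValueError("interferer is silent; cannot set a level ratio")

    # Scale the interferer so that rms(target) / rms(scaled interferer)
    # equals 10 ** (snr_db / 20).
    gain = rms(target) / (interferer_level * 10 ** (snr_db / 20))
    return [t + gain * i for t, i in zip(target, interferer)]
```

In a pipeline like the one the abstract describes, a mixture of this kind would first pass through target speaker extraction, and the extracted stream would then be anonymized; both of those stages are model-based and are not sketched here.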