ArtBoost: Synthetic Articulatory Data Augmentation for Acoustic-to-Articulatory Inversion

2026-06-15 • Sound

Sound, Artificial Intelligence

AI summary

The authors designed a method called ArtBoost to improve models that predict how the mouth moves during speech, a task whose training data is usually hard and expensive to collect. They took large datasets originally made for animating 3D faces from speech and used the visible face points in them to estimate mouth movements. They pre-trained their models on this estimated data, then fine-tuned them on the limited real mouth movement data. Their tests showed higher correlation with, and lower error against, real measurements, and the estimated movements matched the visible part of real mouth movements well. They also found that ArtBoost works with different types of prediction models, making it a helpful tool for this research area.

Acoustic-to-Articulatory Inversion (AAI), Electromagnetic Articulography (EMA), Data Augmentation, 3D Facial Animation, Speech-Mesh Datasets, Pseudo Articulatory Trajectories, Pre-training, Pearson Correlation Coefficient (PCC), Root Mean Square Error (RMSE), Articulatory Dynamics

Authors

Hyung Kyu Kim, Byungchan Hwang, Hak Gu Kim

Abstract

Recent acoustic-to-articulatory inversion (AAI) models rely on electromagnetic articulography (EMA) data, which are costly and limited in scale. To address this limitation, we propose ArtBoost, a novel data augmentation strategy that leverages large-scale speech-mesh datasets originally developed for speech-driven 3D facial animation to improve AAI under limited EMA supervision. ArtBoost extracts pseudo articulatory trajectories from visible facial anchors and uses them for pre-training before fine-tuning on real EMA data. Experiments show consistent improvements in PCC and RMSE. Trajectory analyses confirm that the pseudo articulatory signals reflect physically meaningful visible articulatory dynamics. Additional evaluations across different AAI architectures demonstrate stable performance gains, indicating that ArtBoost can be integrated into diverse AAI models. These results suggest that speech-mesh data provide an effective and scalable source of articulatory supervision for AAI. Project page: https://cau-irislab.github.io/Interspeech26-ArtBoost/
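As a rough illustration of two ideas named in the abstract, the sketch below shows how anchor vertices might be picked out of a sequence of mesh frames to form a pseudo trajectory, and how PCC and RMSE are conventionally computed between a predicted and a reference trajectory. This is not the authors' implementation: the function names (`select_anchors`, `pcc`, `rmse`), the data layout (a list of frames, each a list of (x, y, z) vertices), and the anchor indices are all assumptions made for the example. The abstract does not specify which facial anchors are used or how they are post-processed.

```python
import math


def select_anchors(mesh_frames, anchor_indices):
    """Keep only the chosen vertices from each frame.

    Assumed layout (hypothetical): mesh_frames is a list of frames,
    and each frame is a list of (x, y, z) vertex tuples.
    """
    return [[frame[i] for i in anchor_indices] for frame in mesh_frames]


def rmse(pred, target):
    """Root mean square error between two equal-length sequences (lower is better)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(pred, target))
    return math.sqrt(squared / len(pred))


def pcc(pred, target):
    """Pearson correlation coefficient between two sequences (higher is better)."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_p = sum(pred) / len(pred)
    mean_t = sum(target) / len(target)
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_t)


if __name__ == "__main__":
    # Toy two-frame mesh with three vertices; index 1 stands in for a lip anchor.
    frames = [
        [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)],
        [(0.0, 1.0, 0.0), (1.0, 1.0, 2.0), (2.0, 2.0, 2.0)],
    ]
    print(select_anchors(frames, [1]))

    # Toy one-dimensional trajectories for a single articulator channel.
    reference = [1.0, 2.0, 3.0]
    predicted = [1.0, 2.0, 5.0]
    print(pcc(predicted, reference), rmse(predicted, reference))
```

In AAI evaluations, both metrics are typically computed per articulator channel and then averaged; the abstract does not state the averaging scheme, so the sketch leaves that step out.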
