OpenBibleTTS: Large-Scale Speech Resources and TTS Models for Low-Resource Languages

2026-06-08 • Computation and Language


AI summary

The authors create OpenBibleTTS, a large collection of speech data for 37 underrepresented languages, to better test text-to-speech (TTS) systems. They compare different TTS models on Bible text and on other material to see how well each works across these languages. Their results show that no single model works best everywhere: some are easier to understand, others sound better, and some struggle with unfamiliar text. They also release all their data and models publicly to help future research improve TTS for languages with little technological support. This addresses the problem that most TTS advances focus on widely spoken languages, leaving low-resource ones behind.

neural text-to-speech, low-resource languages, multilingual speech generation, OpenBibleTTS, speech synthesis, phonetic coverage, text-to-speech architectures, intelligibility, multilingual models, subjective human evaluation

Authors

David Guzmán, Luel Hagos Beyene, Jesujoba Oluwadara Alabi, Yejin Jeon, Dietrich Klakow, David Ifeoluwa Adelani

Abstract

Recent advances in neural text-to-speech (TTS) and multilingual speech generation have substantially improved synthetic speech quality, yet these gains remain unevenly distributed across the world's languages. Existing models are still dominated by a small set of high-resource languages, and many studies of low-resource TTS are simulated on artificially downsampled high-resource corpora that do not reflect the orthographic variation and limited phonetic coverage encountered in genuinely underrepresented settings. To address this gap, we introduce OpenBibleTTS, a large-scale benchmark for low-resource speech synthesis spanning 37 underrepresented languages. We systematically compare various TTS architectures and large-scale speech generation models on in-domain Biblical text and on out-of-domain material. Results show that no single system dominates across languages and metrics. Gemini-TTS achieves the highest listener ratings on most evaluated languages, but monolingual EveryVoice models trained on OpenBibleTTS remain strongest for intelligibility and are preferred in several African languages. Open from-scratch systems degrade sharply on out-of-domain text, revealing a persistent gap between broad multilingual coverage and reliable synthesis quality in underserved linguistic communities. We complement automatic evaluation with subjective human judgments, and we open-source all processed datasets, alignments, and trained models to support future low-resource TTS research.
