TMASC: Transmasculine Attitude and Speech Corpus

2026-06-15
Computation and Language

AI summary

The authors created the Transmasculine Attitudes and Speech Corpus (TMASC), a collection of data from 196 transmasculine people. It includes answers to questionnaire items about their vocal health and 66 audio recordings, such as coughs, throat clearing, and reading aloud. The authors explain how they gathered this information and present three examples of how the data can be used: combining perceptual and acoustic speech data, identifying common voice traits across the group, and calibrating voice measurement methods. The corpus is intended to help researchers better understand and support the voices of transmasculine individuals.

transmasculine, vocal health, multimodal corpus, speech corpus, acoustic measurements, perceptual data, group characteristics, audio recordings, voice analysis
Authors
Sidney Wong
Abstract
We introduce the Transmasculine Attitudes and Speech Corpus (TMASC), a multimodal corpus of 196 transmasculine individuals, including questionnaire responses and 66 audio recordings. The questionnaire includes items exploring the vocal health of transmasculine individuals. The audio recordings include cough and throat-clearing samples, a reading passage, and additional session-specific questions. This paper outlines the development of this corpus and the data collection procedures. To illustrate the utility of this corpus, we present three case studies demonstrating how this crowd-sourced multimodal corpus can be used to support transmasculine individuals. These include the integration of perceptual and acoustic data, the identification of group-level characteristics, and the calibration of acoustic measurements.