TRACE: Temporal Relationship-Aware Conversational Entrainment Detection in Dyadic Speech

2026-06-29Computation and Language

Computation and LanguageArtificial Intelligence
AI summary

The authors focus on understanding how people emotionally sync up during conversations, called emotional entrainment. They created DyadEE, a dataset containing real and fake conversations where emotions either match or are purposely mismatched. They also developed TRACE, a method that looks at short segments of speech to better detect emotional syncing by using advanced audio features. Their experiments showed that including who is talking and the conversation context helps find emotional entrainment more accurately, with TRACE reaching about 97% accuracy.

emotional entrainmentdyadic interactionspeech AI agentsacoustic embeddingsWhisper representationsemotion fine-tuningconversational contextsynthetic interactionsemotion resynthesispartner swapping
Authors
Sathvik Manikantan Napa Ugandhar, Hao Zhang, Alison Gunzler, Yuzhe Wang, Thomas Thebaud, Georgi Tinchev, Venkatesh Ravichandran, Laureano Moro-Velázquez
Abstract
With the proliferation of speech AI agents, understanding emotional entrainment in conversational interaction has become increasingly important. Emotional entrainment is shaped by social relationships and conversational context, influencing affective coordination over time. We introduce DyadEE, a dataset for emotional entrainment detection in dyadic speech interactions, containing both emotionally entrained conversations and synthetic interactions where entrainment is disrupted through partner swapping and emotion resynthesis. We further propose TRACE, a window-level framework that models dyadic interaction as ordered sequences of acoustic embeddings derived from emotion fine-tuned Whisper representations, treating each sample as an interaction trace rather than pooled utterances. Experimental results on DyadEE show that incorporating conversational context and relationship information improves emotional entrainment detection, with TRACE achieving the best accuracy of 97.01%.