SIMAX: A Scalable and Interpretable Framework for Multi-Fidelity and Annotated Clinician-Patient Dialogue Simulation

2026-06-29 | Computation and Language

Computation and Language, Artificial Intelligence
AI summary

The authors created SIMAX, a system that simulates doctor-patient conversations with specific behaviors and scenarios, making it easier to study communication in healthcare. SIMAX produces realistic dialogues with annotations about communication quality that help test and improve AI tools for analyzing these talks. Their tests showed the simulations are natural, accurate, and helpful for evaluating communication coding systems. This approach provides a scalable way to generate data that is usually hard to collect and label by hand.

clinical dialogue, communication coding, simulation, behavioral annotations, ambient digital scribe, speech naturalness, WER (word error rate), clinical realism, communication quality, multi-fidelity simulation
Authors
Zhuhan Bao, Rui Yang, Bohao Yang, Zhiyi Liu, Sicheng Shu, Ruio Heerschap, Le Li, Doris Yang, Elisabeth Bond, Haoyuan Wang, Nicoleta Economou-Zavlanos, Joshua M. Biro, Matthew McDermott, Nan Liu, Anand Chowdhury, Kai Sun, Kathryn Pollak, Ed Hammond, Chuan Hong
Abstract
Background. The widespread deployment of ambient digital scribes is driving large-scale capture of clinician-patient dialogues. Human coding of clinical communication data remains costly, inconsistent, and difficult to scale, motivating AI-driven communication coding systems. However, evaluating these systems requires real-world dialogues and human-coded labels, both hard to obtain at scale.

Methods. We developed SIMAX (Scalable and Interpretable Framework for Multi-Fidelity and Annotated Clinician-Patient Dialogue Simulation), a framework for generating controlled clinical dialogue data with reference behavioral annotations. SIMAX generates clinician-patient dialogues from predefined clinical scenarios, personas and voice conditions, and target communication behaviors. Behaviors are controlled using two codebooks: the Global Codebook for overall communication quality and the WISER Codebook for specific countable behaviors. We evaluated SIMAX using automated and human quality assessments and an example communication coding system.

Results. SIMAX generated 3,388 simulated dialogues across three specialties, multiple visit stages, persona characteristics, and accent conditions. Automated assessment showed mean UTMOS and WV-MOS scores of 3.03 and 2.61, WER and CER of 0.07 and 0.05, and CLAP cosine similarity of 0.41, suggesting reasonable speech naturalness, high transcription fidelity, and positive text-audio correspondence. Human evaluation showed a median MOS of 4.67 and a median clinical realism score of 3.00. Downstream evaluation suggests that SIMAX can assess how a communication coding system responds to behavioral targets and reveal insufficient sensitivity in some dimensions.

Conclusions. SIMAX generates controlled and reproducible simulated clinician-patient dialogues, providing a data foundation for developing, validating, and refining communication coding systems.
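
The WER and CER values reported in the Results are transcription error rates. The abstract does not say which tool or text normalization was used to compute them, so the following is only a minimal sketch of the standard definition: the edit distance between a reference transcript and a recognized transcript, divided by the reference length in words (WER) or characters (CER). The function names and the example sentences are hypothetical and do not come from SIMAX.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, deletions, and
    insertions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical script line and its transcription (illustration only).
    script = "the patient reports chest pain since monday"
    transcript = "the patient report chest pain since monday"
    print(f"WER = {wer(script, transcript):.3f}")
    print(f"CER = {cer(script, transcript):.3f}")
```

Under this definition, a WER of 0.07 corresponds to roughly 7 word-level errors per 100 reference words. Real evaluations usually normalize case and punctuation before scoring, which this sketch leaves out.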