Generating synthetic electronic health record data using agent-based models to evaluate machine learning robustness under mass casualty incidents

2026-05-11 · Machine Learning
AI summary

The authors developed a computer simulation to create fake hospital data reflecting unusual situations like disasters with many injured people. They used this fake data to test how well machine learning models, which predict patient treatment times, work when hospitals experience sudden changes. They found that these models often miss patients with longer hospital stays during these times. Their work shows that using simulations can help check if medical AI tools stay reliable during emergencies, even when real data from such events is scarce.

machine learning, electronic health records, agent-based modeling, emergency department, mass casualty incident, synthetic data, model robustness, length of stay prediction, clinical workflow, healthcare systems
Authors
Roben Delos Reyes, Daniel Capurro, Nicholas Geard
Abstract
Machine learning (ML) models in healthcare are typically evaluated using curated real-world electronic health record (EHR) data. A key limitation of such evaluations is that they may fail to assess the robustness of ML models to changes in the data at deployment, a common issue because the EHR data used for model development cannot capture all such changes. Mass casualty incidents (MCIs) caused by disasters are critical instances where this is an issue, as they induce rare, uncertain, and novel changes to routine system conditions. Because real-world EHR data from MCIs are often limited or unavailable, assessing ML robustness under such conditions before deployment remains challenging. Here, we propose an agent-based modelling approach for generating synthetic EHR data to evaluate the robustness of ML models under MCI scenarios. We use real-world EHR data to develop and calibrate an agent-based model (ABM) of an emergency department (ED) that explicitly models patient arrivals, resource capacity, and clinical workflow. By changing these system conditions to reflect plausible MCI scenarios, the ED model generates synthetic versions of the real-world EHR data that exhibit shifts in system behaviour. Using these synthetic data, we tested ML models for predicting length of stay. We observed consistent declines in recall under MCI conditions relative to baseline system conditions, resulting in an increase in the number of patients with prolonged length of stay who were missed by the ML models. These results highlight the impact of changes in system conditions on patient outcomes, EHR data, and ML model performance. Our work establishes ABM-based synthetic EHR data generation as a proactive and systematic approach for evaluating the robustness of ML models under MCI or other system conditions not captured in real-world EHR data, supporting the safer and more effective deployment of ML models in healthcare systems.
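The core idea of the approach — an ED model with stochastic patient arrivals and finite resource capacity that emits synthetic length-of-stay records under baseline versus surge (MCI-like) conditions — can be sketched as a minimal queueing simulation. Everything below is an illustrative assumption: the function `simulate_ed`, its parameters, and the arrival/treatment rates are placeholders, not the paper's calibrated ABM or its real-world values.

```python
import random

def simulate_ed(arrival_rate, n_beds, mean_treatment, horizon, seed=0):
    """Toy ED simulation: Poisson arrivals, finite bed capacity.

    Returns synthetic records with arrival time, wait, and length of
    stay (wait + treatment). Illustrative only -- not the paper's model.
    """
    rng = random.Random(seed)
    bed_free = [0.0] * n_beds  # time at which each bed next becomes free
    records = []
    t = 0.0
    while True:
        t += rng.expovariate(arrival_rate)  # inter-arrival time
        if t > horizon:
            break
        i = min(range(n_beds), key=lambda b: bed_free[b])  # earliest-free bed
        start = max(t, bed_free[i])  # patient waits if all beds are busy
        treat = rng.expovariate(1.0 / mean_treatment)
        bed_free[i] = start + treat
        records.append({"arrival": t,
                        "wait": start - t,
                        "los": (start - t) + treat})
    return records

def mean_los(recs):
    return sum(r["los"] for r in recs) / len(recs)

# Baseline vs. an MCI-like surge: same capacity, sharply increased arrivals.
baseline = simulate_ed(arrival_rate=0.5, n_beds=10, mean_treatment=4.0, horizon=500)
surge = simulate_ed(arrival_rate=3.0, n_beds=10, mean_treatment=4.0, horizon=500)

print(f"baseline mean LOS: {mean_los(baseline):.1f}")
print(f"surge mean LOS:    {mean_los(surge):.1f}")
```

Under the surge scenario, demand exceeds capacity, so waits accumulate and synthetic lengths of stay shift upward — the kind of distribution shift on which a model trained only on baseline records would be expected to lose recall for prolonged-stay patients.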