TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs

2026-06-08 | Machine Learning

Machine Learning, Artificial Intelligence, Computation and Language
AI summary

The authors address a problem with clinical early warning systems built on electronic health records: large language models tend to collapse graded patient risk into overconfident yes-or-no predictions, which hurts calibration and makes patients hard to compare for triage. They propose TRIAGE, a method that trains the model to reason over competing clinical outcomes and give a rationale for each, so that a single model yields continuous risk scores backed by explicit reasoning. On three benchmarks, TRIAGE improves AUPRC by 3.3% on average and reduces calibration error by 81% relative to competitive baselines, and an LLM-as-a-judge assessment rates its rationales 20% higher in clinical reasoning quality than the baseline's post-hoc explanations. Overall, the authors demonstrate a way to make AI risk assessments both more accurate and easier for clinicians to verify.

clinical early warning systems, electronic health records, irregularly sampled medical time series, large language models, risk calibration, risk polarization, dialectical reasoning, AUPRC, model interpretability, clinical triage
Authors
Hyeongwon Jang, Gyouk Chu, Changhun Kim, Joonhyung Park, Hangyul Yoon, Eunho Yang
Abstract
Clinical early warning systems built on electronic health records, in which clinical observations are recorded as irregularly sampled medical time series (ISMTS), must deliver both calibrated risk scores for patient triage and interpretable rationales that clinicians can verify. Large Language Models (LLMs) have been explored for this task, yet they collapse graded clinical risk into overconfident binary predictions. This risk polarization undermines both calibration and cross-patient comparability. To address this, we propose TRIAGE, a framework that trains an LLM to generate dialectical reasoning over competing clinical outcomes by eliciting outcome-specific rationales. This dialectical formulation mitigates risk polarization, enabling a single LLM to yield continuous risk scores grounded in explicit clinical reasoning. Evaluated on three ISMTS benchmarks, TRIAGE achieves an average AUPRC improvement of 3.3% and reduces calibration error by 81% compared with competitive baselines. An LLM-as-a-judge assessment further shows that our rationales surpass post-hoc explanations from the baseline by 20% in clinical reasoning quality. The source code is available at https://github.com/HyeongWon-Jang/TRIAGE.
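To make the risk polarization claim concrete, the following is a minimal sketch on synthetic data. It is not the TRIAGE method and is not taken from the linked repository. It only illustrates why collapsing graded risk into near-0/near-1 scores can hurt both calibration and ranking. The helper names (`expected_calibration_error`, `average_precision`), the equal-width binning, the 0.01/0.99 polarized values, and the toy cohort are all assumptions made for illustration; the abstract does not specify which calibration metric the authors use.

```python
# Illustrative only: synthetic cohort, hypothetical helper names.

def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |mean score - event rate| per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_p = sum(p for p, _ in b) / len(b)
        rate = sum(y for _, y in b) / len(b)
        ece += len(b) / total * abs(mean_p - rate)
    return ece


def average_precision(scores, labels):
    """Tie-aware average precision: sum over distinct thresholds of
    (increase in recall) * (precision at that threshold)."""
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    tp = seen = 0
    ap = 0.0
    for thr in sorted(set(scores), reverse=True):
        group = [y for s, y in zip(scores, labels) if s == thr]
        tp_new = sum(group)
        tp += tp_new
        seen += len(group)
        ap += (tp_new / n_pos) * (tp / seen)
    return ap


# Assumed toy cohort: three risk strata of 10 patients each, with event
# rates that match the graded score exactly (2/10, 5/10, 8/10).
graded, labels = [], []
for risk, n_events in [(0.2, 2), (0.5, 5), (0.8, 8)]:
    graded += [risk] * 10
    labels += [1] * n_events + [0] * (10 - n_events)

# Polarized variant: every score is pushed to an overconfident extreme.
polarized = [0.99 if p >= 0.5 else 0.01 for p in graded]

ece_graded = expected_calibration_error(graded, labels)
ece_polarized = expected_calibration_error(polarized, labels)
ap_graded = average_precision(graded, labels)
ap_polarized = average_precision(polarized, labels)

print(f"ECE   graded={ece_graded:.3f}  polarized={ece_polarized:.3f}")
print(f"AUPRC graded={ap_graded:.3f}  polarized={ap_polarized:.3f}")
```

In this toy setting the polarized scores merge the medium- and high-risk strata into one tied group, so they can no longer be ranked against each other, and the stated confidence no longer matches the observed event rate. The numbers say nothing about the magnitude of the effect on real ISMTS benchmarks.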