Towards end-to-end LLM-based censoring-aware survival analysis

2026-05-25, Artificial Intelligence

AI summary

The authors created LLMSurvival, a method that uses large language models (LLMs) to predict survival from medical data, even when some patients' outcomes are only partially observed (censoring). They reframed the problem as comparing pairs of patients (which of the two is at higher risk) instead of predicting survival times directly, which makes it possible to fine-tune the model despite censoring. Tested on ICU mortality and fracture risk, their approach outperformed both traditional and deep learning survival models. The method also transfers across different hospitals and can be run locally with smaller models. Overall, the authors show that LLMs can be adapted for survival analysis in medicine by focusing on relative risk instead of exact times.
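The pairwise reformulation depends on which pairs of patients can be compared at all under censoring. The block below is a minimal sketch that assumes Harrell's standard comparability rule: a pair counts only when the earlier time is an observed event. The source does not spell out its exact rule. The names `comparable_pairs` and `pair_to_examples`, the prompt template, and the toy cohort are hypothetical illustrations, not the authors' code.

```python
from itertools import permutations


def comparable_pairs(times, events):
    """Return (i, j) pairs where subject i is known to have the event before subject j.

    Under right censoring a pair is usable only when the earlier time is an
    observed event (events[i] == 1). If the earlier subject was censored, we
    cannot tell who failed first, so the pair is skipped. Tied times are skipped.
    """
    return [
        (i, j)
        for i, j in permutations(range(len(times)), 2)
        if events[i] == 1 and times[i] < times[j]
    ]


def pair_to_examples(features, i, j):
    """Hypothetical prompt format in which patient i is the higher-risk one.

    Both orderings are emitted so the model cannot learn to favor a position.
    """
    def prompt(a, b):
        return (
            f"Patient A: {a}\n"
            f"Patient B: {b}\n"
            "Which patient experiences the event first? Answer A or B."
        )

    return [
        {"prompt": prompt(features[i], features[j]), "answer": "A"},
        {"prompt": prompt(features[j], features[i]), "answer": "B"},
    ]


# Toy cohort: follow-up time in days and event indicator (0 = censored).
times = [5, 8, 3, 10]
events = [1, 0, 1, 1]
features = ["age=71", "age=55", "age=80", "age=40"]

pairs = comparable_pairs(times, events)
print(pairs)  # [(0, 1), (0, 3), (2, 0), (2, 1), (2, 3)]
examples = [ex for i, j in pairs for ex in pair_to_examples(features, i, j)]
print(len(examples))  # 10
```

Subject 1 was censored at day 8, so it only ever appears as the lower-risk member of a pair. Its pairing with subject 3 (event at day 10) is dropped because the data cannot tell which of the two failed first. This is how censored records still contribute supervision without being assigned a fabricated event time.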

survival analysis, large language models, censoring, time-to-event prediction, pairwise ranking, ICU mortality, fragility fracture, Cox proportional hazards, deep learning, clinical prediction
Authors
Yishu Wei, Hexin Dong, Yi Lin, Jiahe Qian, Yi Liu, Yifan Peng
Abstract
Objective: Survival analysis is central to medical prediction, yet large language models (LLMs) are rarely used as end-to-end survival models because censoring prevents straightforward supervised fine-tuning. Here we present LLMSurvival, a framework that enables censoring-aware survival analysis with unmodified LLMs operating directly on tabular clinical data. Materials and Methods: LLMSurvival reformulates time-to-event prediction as pairwise ranking among comparable subjects and derives test-time risk by aggregating comparisons against anchor individuals from the training cohort. Results: Across two clinical tasks (ICU mortality prediction in MIMIC-IV and fragility fracture prediction in a NewYork-Presbyterian/Weill Cornell Medicine cohort), LLMSurvival improves overall concordance over Cox proportional hazards modeling by 3.1% for ICU mortality and 0.5% for fracture risk, and over three established deep learning survival models by an average of 2.1% for ICU mortality and 2.8% for fracture risk. Discussion: The results show that survival modeling with censoring can be made compatible with LLM fine-tuning through a comparison-based reformulation. The framework demonstrates high portability and outperforms expert-curated scores such as SAPS-II and FRAX across diverse clinical contexts. Furthermore, it supports local deployment, since compact, publicly available base models provide sufficient performance. Conclusion: The LLMSurvival framework serves as a proof of concept for an integrated, censoring-aware approach to survival analysis with LLMs.
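The test-time step turns pairwise judgments into a single risk score per patient, which can then be scored with a concordance metric. The block below is a minimal sketch under several assumptions:

- The fine-tuned LLM is wrapped as a function that returns the probability that one patient has the event before another.
- Risk is aggregated as a simple mean over anchors. The abstract does not state the aggregation rule.
- "Overall concordance" is read as Harrell's C-index.

`anchor_risk`, `toy_compare` (a stand-in for the LLM), and the toy data are hypothetical.

```python
from typing import Callable, Dict, Sequence

Patient = Dict[str, float]
# Hypothetical comparator: probability that patient `a` has the event before
# patient `b`. In LLMSurvival this role would be played by the fine-tuned LLM.
Comparator = Callable[[Patient, Patient], float]


def anchor_risk(patient: Patient, anchors: Sequence[Patient], compare: Comparator) -> float:
    """Risk score = mean 'fails first' probability against every anchor (assumed rule)."""
    return sum(compare(patient, a) for a in anchors) / len(anchors)


def harrell_c_index(times, events, risks) -> float:
    """Fraction of comparable pairs whose risk ordering matches the observed ordering.

    A pair (i, j) is comparable when i has an observed event before time j.
    Ties in predicted risk count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def toy_compare(a: Patient, b: Patient) -> float:
    """Stand-in for the LLM: the older patient is judged to fail first."""
    if a["age"] == b["age"]:
        return 0.5
    return 1.0 if a["age"] > b["age"] else 0.0


anchors = [{"age": 50.0}, {"age": 60.0}, {"age": 70.0}]  # drawn from the training cohort
test_patients = [{"age": 80.0}, {"age": 65.0}, {"age": 40.0}]
risks = [anchor_risk(p, anchors, toy_compare) for p in test_patients]

test_times = [2, 5, 9]
test_events = [1, 1, 0]  # the last test patient is censored
print(risks)  # [1.0, 0.6666666666666666, 0.0]
print(harrell_c_index(test_times, test_events, risks))  # 1.0
```

Averaging against a fixed anchor set gives every test patient a score on the same scale, so standard survival metrics apply unchanged. The cost is one model call per anchor per test patient. The number and selection of anchors used by the authors are not given in the abstract, and this sketch makes no claim about them.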