From Hazard Functions to Language Space: Cox-Supervised Distillation of Survival Risk into a Large Language Model
2026-06-08 • Machine Learning
AI summary
The authors explore whether a large language model (LLM) can learn to predict patient survival risk from the risk estimates of a traditional survival analysis method, the Cox proportional hazards model. They convert structured patient data into text prompts and fine-tune the LLM to generate survival risk predictions that match the Cox model's outputs. On three datasets (GBSG2, ACTG320, and WHAS500), the approach achieves competitive held-out discrimination and calibration, even though the LLM is trained as a text generator rather than with a conventional survival-analysis loss. t-SNE visualizations of the model's hidden states show smooth risk gradients, suggesting that it represents survival risk as a continuous structure rather than as isolated risk categories. Together, these results indicate that language models can be adapted to reason about time-to-event medical risks.
Cox proportional hazards model, survival analysis, large language model, text-based modeling, fine-tuning, risk prediction, calibration, discrimination, t-SNE visualization, latent space
Authors
Nicholas I-Hsien Kuo, Blanca Gallego, Louisa Jorm
Abstract
We investigate whether information about time-to-event risk estimated by a Cox proportional hazards model can be transferred into a generative large language model. We propose a text-based survival modelling pipeline in which structured clinical covariates are converted into text prompts and a Qwen-based large language model is fine-tuned to generate patient-specific survival risk using Cox model predictions as a training target. Across GBSG2, ACTG320, and WHAS500, the model achieves competitive held-out discrimination and calibration despite being trained as a text-generation task rather than with a conventional survival-analysis loss. We further analyse the geometry of the model's hidden states, where t-SNE visualisations reveal smooth risk gradients in latent space, suggesting that the model represents survival risk as a continuous structure rather than isolated risk categories. Together, these findings suggest that large language models can internalise survival-risk structure while supporting calibrated prediction, providing a route towards time-to-event reasoning in language models.
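The data-preparation step described in the abstract (covariates serialised as a text prompt, with a Cox prediction as the generation target) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the covariate names, the coefficient values, the baseline survival of 0.8, the prompt wording, and the three-decimal target format are all assumptions made for the example. It relies only on the standard proportional-hazards identity S(t | x) = S0(t)^exp(β·x).

```python
import math

# Hypothetical Cox coefficients (log hazard ratios). Illustrative values only;
# in practice they would come from a Cox model fitted on the training split.
COX_BETA = {"age": 0.03, "tumor_size_mm": 0.01, "hormone_therapy": -0.35}


def cox_linear_predictor(patient, beta=COX_BETA):
    """Return the Cox log-risk score beta . x for one patient."""
    return sum(beta[name] * float(patient[name]) for name in beta)


def cox_survival(patient, baseline_survival, beta=COX_BETA):
    """Survival probability S(t | x) = S0(t) ** exp(beta . x)."""
    return baseline_survival ** math.exp(cox_linear_predictor(patient, beta))


def to_prompt(patient):
    """Serialise structured covariates into a plain-text prompt (assumed wording)."""
    fields = ", ".join(f"{name} = {value}" for name, value in patient.items())
    return f"Patient record: {fields}. Estimated risk of the event:"


def to_training_pair(patient, baseline_survival=0.8):
    """Build one (prompt, completion) pair with the Cox risk as the text target.

    baseline_survival is an assumed S0(t) at a single fixed horizon t.
    """
    risk = 1.0 - cox_survival(patient, baseline_survival)
    return {"prompt": to_prompt(patient), "completion": f" {risk:.3f}"}


patient = {"age": 60, "tumor_size_mm": 25, "hormone_therapy": 1}
pair = to_training_pair(patient)
print(pair["prompt"])
print(pair["completion"])
```

Pairs of this form could then be passed to any standard supervised fine-tuning loop for a causal language model. Writing the target as a short fixed-precision number keeps the generated output easy to parse back into a risk value at evaluation time, which is one plausible design choice among several.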