Why Do Self-Harm Prediction Models Struggle to Generalise? Lexical and Semantic Variations in Emergency Department Triage Notes

2026-06-01 • Computation and Language

AI summary

The authors studied emergency department notes about self-harm from two different hospitals to see why computer models that detect self-harm work well in one place but not in another. They found that while the main topics like self-poisoning and self-injury were similar, the words and important clues used to describe self-harm varied between hospitals. These differences made it harder for models to work well across both sites. Their work helps us understand why models struggle with different hospitals and suggests ways to make detection more reliable everywhere.

self-harm, emergency department, natural language processing, triage notes, cross-site performance, lexical analysis, feature importance, model generalisability, clinical text, self-poisoning

Authors

Liuliu Chen, Mike Conway, Jo Robinson, Vlada Rozova

Abstract

Self-harm presentations to emergency departments (EDs) are strongly associated with higher suicide risk. NLP models have shown robust performance in detecting self-harm from triage notes within single hospitals, yet performance often declines across institutions. To examine potential causes, we compare ED triage notes from two hospitals by analysing lexical characteristics, highly associated predictive features, and salient topics. Our results reveal variation in lexical expression and feature importance related to self-harm across hospitals, despite consistent core themes such as self-poisoning and self-injury. These documentation differences are associated with reduced cross-site performance. Our findings provide insight into how institutional variation affects the identification of self-harm in clinical text and highlight potential methods to improve model generalisability.
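
As a rough illustration of what a cross-site lexical comparison can look like, the sketch below measures vocabulary overlap between two small sets of notes and lists the terms unique to each site. It is not the authors' method: the tokenizer, the Jaccard overlap measure, the function names, and the example notes are all assumptions invented for illustration, and no real triage data is used.

```python
import re
from collections import Counter


def tokenize(note: str) -> list[str]:
    """Lowercase a note and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", note.lower())


def vocabulary(notes: list[str]) -> Counter:
    """Count token frequencies across all notes from one site."""
    counts: Counter = Counter()
    for note in notes:
        counts.update(tokenize(note))
    return counts


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of the combined vocabulary that both sites use."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical notes, invented for illustration only (not real data).
site_a = ["pt took od of paracetamol", "self inflicted laceration to forearm"]
site_b = ["intentional overdose paracetamol", "cut wrist with razor, self harm"]

vocab_a = vocabulary(site_a)
vocab_b = vocabulary(site_b)

overlap = jaccard(set(vocab_a), set(vocab_b))
only_a = sorted(set(vocab_a) - set(vocab_b))  # terms seen only at site A
only_b = sorted(set(vocab_b) - set(vocab_a))  # terms seen only at site B

print(f"Vocabulary overlap (Jaccard): {overlap:.3f}")
print("Site A only:", only_a)
print("Site B only:", only_b)
```

In this toy case both sites describe the same theme (paracetamol self-poisoning), yet one writes "od" and the other "overdose", so a model relying on site A's surface forms would miss site B's wording. This is the kind of lexical variation under a shared theme that the abstract describes.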
