Next-Token Prediction Learns Generalisable Representations of Sleep Physiology

2026-06-08 | Artificial Intelligence

AI summary

The authors present Hypnos, a model that learns representations of signals from multiple physiological sensors, such as brain activity (EEG) and heart activity (ECG), by predicting the next token in a sequence of discretized sensor data. They argue that this objective suits physiological signals better than earlier approaches: masked reconstruction copes poorly with the signals' stochastic nature, and contrastive learning depends on positive-pair definitions that are hard to justify for such signals. Hypnos was trained on more than 20,000 overnight sleep study (polysomnography) recordings covering eight sensing modalities, and it can produce embeddings from any subset of those modalities for downstream tasks such as sleep stage classification. The authors report that Hypnos outperforms existing foundation models, matches strong supervised baselines in sleep staging while using 100 times less labeled data, and generalizes to daytime heart monitoring, where it surpasses a dedicated ECG foundation model at detecting atrial fibrillation.

foundation models, multi-modal physiological signals, next-token prediction, auto-regressive models, residual vector quantization, polysomnography, sleep stage classification, self-supervised learning, atrial fibrillation, representation learning
Authors
Jonathan F. Carter, Lionel Tarassenko
Abstract
Foundation models offer a promising route to compress multi-modal physiological signals into compact representations of human health, with broad applications across sleep medicine, cardiology, neurology and other healthcare domains. Existing models have typically been trained with masked-reconstruction or contrastive objectives. However, masked reconstruction may be poorly suited to the stochastic nature of these signals, while contrastive approaches rely on positive-pair definitions despite the semantic invariances of physiological signals being poorly understood. In this work, we show that next-token prediction is a simple and scalable alternative. We develop Hypnos, a multi-modal sleep foundation model trained using eight different sensing modalities (e.g. EEG, ECG, respiratory signals) drawn from over 20,000 overnight polysomnography recordings. We tokenize each modality into streams of discrete tokens using residual vector quantization, then train a large auto-regressive RQ-Transformer to jointly predict the next token across all modalities in parallel. After training, Hypnos can be applied to continuous streams of sensor data from any subset of supported modalities, generating embeddings for downstream tasks. Across a range of benchmarks, Hypnos significantly outperforms existing foundation models. In sleep stage classification, we match the performance of strong supervised baselines on held-out test sets whilst using \(100\times\) less labelled data. Hypnos even generalises to daytime physiology, surpassing a dedicated ECG foundation model at detecting atrial fibrillation. Our results demonstrate that next-token prediction is a strong self-supervised objective for representation learning from multi-modal physiological signals.
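
As an illustration of the tokenization step mentioned in the abstract, the following is a minimal sketch of residual vector quantization in plain Python. It is not the authors' implementation: the function names (`nearest`, `rvq_encode`, `rvq_decode`), the number of levels, the codebook size and the vector dimension are all hypothetical, and the codebooks here are random, whereas in practice they would be learned. The idea shown is only the general one: each level quantizes the residual left by the previous level, so one input vector becomes a short stack of discrete tokens.

```python
import random


def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared Euclidean)."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(vec, codebooks):
    """Turn one vector into one token per level; each level encodes the previous residual."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Subtract the chosen entry so the next level only sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(tokens, codebooks):
    """Approximate the original vector by summing the selected entry from every level."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical setup (assumption, not from the paper): 3 levels, 16 entries per
# level, 4-dimensional vectors, random codebooks whose scale shrinks per level.
rng = random.Random(0)
DIM, LEVELS, SIZE = 4, 3, 16
codebooks = [
    [[rng.gauss(0, 1.0 / (2 ** level)) for _ in range(DIM)] for _ in range(SIZE)]
    for level in range(LEVELS)
]
frame = [rng.gauss(0, 1) for _ in range(DIM)]  # stands in for one encoded signal frame
tokens = rvq_encode(frame, codebooks)
recon = rvq_decode(tokens, codebooks)

# Tiny hand-checkable case: one-dimensional vectors, two levels of two entries.
toy_codebooks = [[[0.0], [1.0]], [[0.0], [0.25]]]
toy_tokens = rvq_encode([1.2], toy_codebooks)
toy_recon = rvq_decode(toy_tokens, toy_codebooks)
```

In the toy case, the first level picks 1.0 as the closest entry to 1.2, and the second level picks 0.25 as the closest entry to the remaining 0.2, giving a reconstruction of 1.25. An auto-regressive model trained with next-token prediction would then operate on token stacks like `tokens` rather than on the raw signal.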