Rethinking Random Transformers as Adaptive Sequence Smoothers for Sleep Staging
2026-05-11 • Machine Learning
Machine Learning · Artificial Intelligence
AI summary
The authors studied how Transformers, a type of AI model, are used to classify sleep stages. They found that even without training, a randomly initialized Transformer improves sleep stage detection by smoothing the data while keeping important stage changes visible. This suggests the model's architecture contributes more than what it actually learns. Their work indicates that simpler smoothing-focused techniques may be sufficient for sleep analysis, which could make healthcare monitoring faster and easier to run on small devices.
Transformers, sleep staging, self-attention, temporal continuity, random initialization, smoothing, inductive bias, physiological monitoring, local smoothness, transition entropy
Authors
Guisong Liu, Xin Gao, Martin Dresler, Jiansong Zhang, Pengfei Wei
Abstract
Automatic sleep staging commonly adopts Transformers under the assumption that they learn complex long-range dependencies. We challenge this view by revealing a neglected property of sleep sequences: strong local temporal continuity. We show that a randomly initialized Transformer, without any training, substantially improves sleep staging performance and consistently outperforms heuristic smoothing. We formalize this effect via a Random Attention Prior Kernel (RAPK), showing that random self-attention acts as an adaptive smoother by balancing global averaging and content-based similarity while preserving stage transitions. Using two metrics, the Local Smoothness Influence Index (LSII) and the Weighted Transition Entropy (WTE), we provide evidence that most performance gains in Transformer-based sleep staging arise from architectural inductive bias rather than parameter learning. Our results suggest that sleep staging can be effectively addressed with structure-driven smoothing mechanisms rather than complex dependency modeling, enabling more efficient and edge-deployable healthcare systems for large-scale physiological monitoring.