Stanford Researchers Build SleepFM Clinical: A Multimodal Sleep Foundation AI Model for 130+ Disease Prediction

A team of Stanford Medicine researchers has introduced SleepFM Clinical, a multimodal sleep foundation model that learns from clinical polysomnography and predicts long-term disease risk from a single night of sleep. The work is published in Nature Medicine, and the team has released the clinical code as the open-source sleepfm-clinical repository on GitHub under the MIT license.

From overnight polysomnography to a general representation

Polysomnography records brain activity, eye movements, heart signals, muscle tone, breathing effort and oxygen saturation during a full night in a sleep lab. It is the gold-standard test in sleep medicine, but most clinical workflows use it only for sleep staging and sleep apnea diagnosis. The research team treats these multichannel signals as a dense physiological time series and trains a foundation model to learn a shared representation across all modalities.

SleepFM is trained on about 585,000 hours of sleep recordings from about 65,000 people, drawn from multiple cohorts. The largest cohort comes from the Stanford Sleep Medicine Center, where about 35,000 adults and children had overnight studies between 1999 and 2024. That clinical cohort is linked to electronic health records, which later enables survival analysis for hundreds of disease categories.

https://www.nature.com/articles/s41591-025-04133-4

Model architecture and pretraining objective

At the modeling level, SleepFM uses a convolutional backbone to extract local features from each channel, followed by attention-based aggregation across channels and a temporal transformer that operates over short segments of the night. The same core architecture appeared in earlier work on SleepFM for sleep staging and sleep-disordered breathing detection, which showed that learning joint embeddings across brain activity, electrocardiography and respiratory signals improves downstream performance.
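To make the three stages concrete, here is a minimal PyTorch sketch of the described pipeline. The class name, layer sizes, layer counts and the segment layout are illustrative assumptions, not the published configuration; the sleepfm-clinical repository is the authoritative reference.

```python
import torch
import torch.nn as nn

class SleepEncoderSketch(nn.Module):
    """Sketch of the described pipeline: per-channel 1D convolutions,
    attention pooling across channels, then a transformer over segments."""

    def __init__(self, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        # convolutional backbone, applied to each channel independently
        self.conv = nn.Sequential(
            nn.Conv1d(1, d_model, kernel_size=7, stride=2, padding=3),
            nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=7, stride=2, padding=3),
            nn.GELU(),
            nn.AdaptiveAvgPool1d(1),  # one feature vector per channel per segment
        )
        # attention-based aggregation across channels via a learned query
        self.channel_query = nn.Parameter(torch.randn(1, 1, d_model))
        self.channel_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # temporal transformer over the sequence of segment embeddings
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):
        # x: (batch, segments, channels, samples_per_segment)
        b, s, c, t = x.shape
        feats = self.conv(x.reshape(b * s * c, 1, t)).squeeze(-1)
        feats = feats.reshape(b * s, c, -1)            # channels become tokens
        q = self.channel_query.expand(b * s, -1, -1)
        pooled, _ = self.channel_attn(q, feats, feats)  # (b*s, 1, d_model)
        segs = pooled.reshape(b, s, -1)
        return self.temporal(segs)                      # (batch, segments, d_model)

# toy usage: 2 nights, 8 segments, 6 channels, 1,500 samples per segment
out = SleepEncoderSketch()(torch.randn(2, 8, 6, 1500))  # (2, 8, 128)
```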


The pretraining objective is leave-one-out contrastive learning. For each short time segment, the model builds separate embeddings for each modality group, such as brain signals, heart signals and respiratory signals, and then learns to align these modality embeddings so that any subset predicts the representation of the remaining modalities. This approach makes the model robust to missing channels and heterogeneous recording montages, which are common in real-world sleep labs.
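A minimal sketch of a leave-one-out style contrastive loss is shown below: each modality embedding is pulled toward the mean embedding of the held-out modalities for the same segment and pushed away from other segments in the batch. The function name, the mean pooling of the remaining modalities and the InfoNCE form are assumptions; the published loss may differ in detail.

```python
import torch
import torch.nn.functional as F

def leave_one_out_contrastive(embs, temperature=0.1):
    """Sketch: embs is a list of (batch, dim) tensors, one per modality group."""
    total = 0.0
    for i, anchor in enumerate(embs):
        # positive: mean embedding of the held-out (remaining) modalities
        rest = torch.stack([e for j, e in enumerate(embs) if j != i]).mean(dim=0)
        a = F.normalize(anchor, dim=-1)
        p = F.normalize(rest, dim=-1)
        logits = a @ p.t() / temperature      # (batch, batch) similarity matrix
        targets = torch.arange(a.size(0))     # diagonal pairs are the positives
        total = total + F.cross_entropy(logits, targets)
    return total / len(embs)

# toy usage: three modality groups, batch of 8 segments, 128-dim embeddings
brain, heart, resp = (torch.randn(8, 128) for _ in range(3))
loss = leave_one_out_contrastive([brain, heart, resp])
```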

After pretraining on unlabeled polysomnography, the backbone is frozen and small task-specific heads are trained. For standard sleep tasks, a lightweight recurrent or linear head maps embeddings to sleep stages or apnea labels. For clinical risk prediction, the model aggregates the full night into a single patient-level embedding, concatenates basic demographics such as age and sex, and then feeds this representation into a Cox proportional hazards layer for time-to-event modeling.
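The survival layer can be captured in a few lines. Below is a hedged sketch of a linear Cox head over the frozen night-level embedding plus demographics, paired with the standard negative Cox partial log-likelihood (Breslow form, no tie handling). CoxHead and cox_partial_nll are hypothetical names for illustration, not identifiers from the released code.

```python
import torch
import torch.nn as nn

class CoxHead(nn.Module):
    """Linear risk score over the frozen night-level embedding plus demographics."""
    def __init__(self, emb_dim, n_demo):
        super().__init__()
        self.linear = nn.Linear(emb_dim + n_demo, 1, bias=False)

    def forward(self, emb, demo):
        return self.linear(torch.cat([emb, demo], dim=-1)).squeeze(-1)

def cox_partial_nll(scores, times, events):
    """Negative Cox partial log-likelihood.
    scores: (batch,) risk scores; times: (batch,) follow-up durations;
    events: (batch,) 1 if diagnosed during follow-up, 0 if censored."""
    order = torch.argsort(times, descending=True)   # latest follow-up first
    s, e = scores[order], events[order].float()
    # after the descending sort, the risk set for subject i is positions 0..i,
    # so a cumulative logsumexp gives the log denominator of each risk set
    log_risk = torch.logcumsumexp(s, dim=0)
    return -((s - log_risk) * e).sum() / e.sum().clamp(min=1.0)
```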

Benchmarks on sleep staging and apnea

Before moving to disease prediction, the research team verified that SleepFM competes with specialist models on standard sleep analysis tasks. Prior work had already shown that a simple classifier on top of SleepFM embeddings outperforms end-to-end convolutional networks for sleep stage classification and for detection of sleep-disordered breathing, with gains in macro AUROC and AUPRC on several public datasets.

In the clinical study, the same pretrained backbone is reused for sleep staging and apnea severity classification across multi-center cohorts. Results reported in the paper show that SleepFM matches or exceeds existing tools such as traditional convolutional models and other automated sleep staging systems, which validates that the representation captures core sleep physiology rather than statistical artifacts of a single dataset.

Predicting 130 diseases and mortality from one night of sleep

The core contribution of the paper is disease prediction. The researchers map diagnosis codes in the Stanford electronic health records to phecodes and define more than 1,000 candidate disease groupings. For each phecode, they compute the time to first diagnosis after the sleep study and fit a Cox model on top of SleepFM embeddings.
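The labeling step amounts to building right-censored time-to-event targets per phecode. A sketch in pandas, assuming hypothetical column names rather than the released code's schema:

```python
import pandas as pd

def time_to_event_labels(sleep_studies, diagnoses, phecode, censor_date):
    """sleep_studies: DataFrame[patient_id, study_date]
    diagnoses:     DataFrame[patient_id, phecode, dx_date]"""
    first_dx = (diagnoses[diagnoses["phecode"] == phecode]
                .groupby("patient_id")["dx_date"].min()
                .rename("first_dx").reset_index())
    df = sleep_studies.merge(first_dx, on="patient_id", how="left")
    # event = first matching diagnosis occurs after the sleep study;
    # prevalent cases (diagnosed before the study) would be excluded upstream
    df["event"] = (df["first_dx"] > df["study_date"]).astype(int)
    # diagnosed patients stop at first diagnosis, the rest are censored
    end = df["first_dx"].where(df["event"] == 1, pd.Timestamp(censor_date))
    df["duration_years"] = (end - df["study_date"]).dt.days / 365.25
    return df[["patient_id", "duration_years", "event"]]
```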

SleepFM identifies 130 disease outcomes whose risks are predictable from a single night of polysomnography with strong discrimination. These include all-cause mortality, dementia, myocardial infarction, heart failure, chronic kidney disease, stroke, atrial fibrillation, several cancers and multiple psychiatric and metabolic disorders. For many of these conditions, performance metrics such as the concordance index and the area under the receiver operating characteristic curve are in ranges comparable to established risk scores, even though the model uses only sleep recordings plus basic demographics.

The paper also reports that for some cancers, pregnancy complications, circulatory conditions and mental health disorders, SleepFM-based predictions reach accuracy levels around 80 percent for multi-year risk windows. This suggests that subtle patterns in the coordination between brain, heart and breathing signals carry information about latent disease processes that are not yet clinically visible.

Comparison with simpler baselines

To assess added value, the research team compared SleepFM-based risk models with two baselines. The first uses only demographic features such as age, sex and body mass index. The second trains an end-to-end model directly on polysomnography and outcomes, without unsupervised pretraining. Across most disease categories, the pretrained SleepFM representation combined with a simple survival head yields higher concordance and higher long-horizon AUROC than both baselines.
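The demographics-only comparison corresponds to fitting a Cox model on age, sex and body mass index alone and checking concordance against a model that also sees the night-level embedding features. A toy sketch with the lifelines library and synthetic data, where random features stand in for real embeddings and all names are illustrative:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n, d = 500, 8  # toy cohort; d stands in for a much larger embedding dimension

base = pd.DataFrame({"duration_years": rng.exponential(5.0, n),
                     "event": rng.integers(0, 2, n)})
demo = pd.DataFrame({"age": rng.normal(55, 10, n),
                     "sex": rng.integers(0, 2, n),
                     "bmi": rng.normal(27, 4, n)})
emb = pd.DataFrame(rng.normal(size=(n, d)), columns=[f"e{i}" for i in range(d)])

df_demo = pd.concat([base, demo], axis=1)
df_full = pd.concat([base, demo, emb], axis=1)

baseline = CoxPHFitter().fit(df_demo, duration_col="duration_years", event_col="event")
full = CoxPHFitter(penalizer=0.1).fit(df_full, duration_col="duration_years", event_col="event")
print(f"demographics-only C-index:        {baseline.concordance_index_:.3f}")
print(f"embedding + demographics C-index: {full.concordance_index_:.3f}")
```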

The results indicate that the gain comes less from a complex prediction head and more from the foundation model that has learned a general representation of sleep physiology. In practice, this means that clinical centers can reuse a single pretrained backbone, train small site-specific heads on relatively modest labeled cohorts and still approach state-of-the-art performance.

Check out the Paper and the FULL CODES on GitHub.




