TY - JOUR
T1 - Multivariate Contrastive Predictive Coding with Sliding Windows for Disease Prediction from Electronic Health Records
AU - Yuan, Hongxu
AU - Jing, Xiaozhu
AU - Yan, Yuzheng
AU - Luo, Wuman
N1 - Publisher Copyright: © 2025 The Author(s). Advanced Intelligent Systems published by Wiley-VCH GmbH.
PY - 2025
Y1 - 2025
N2 - Learning effective patient representations from electronic health records (EHRs) is crucial for improving disease prediction models. However, existing supervised learning methods are hindered by high labeling costs. Moreover, capturing complex temporal and multi-indicator relationships, as well as localized temporal pattern shifts in clinical settings, remains a significant challenge. To address these issues, the adaptive multi-indicator contrastive predictive coding (AMCPC) framework, a self-supervised pretraining approach tailored for EHR data, is proposed. AMCPC uses an adaptive optimal window-size selection algorithm to segment patient visit sequences into temporal sub-windows, enabling the model to focus on localized, context-specific health patterns. Furthermore, by extending contrastive predictive coding (CPC) through a multi-indicator approach, AMCPC employs a 2D convolutional neural network to capture global correlations among medical indicators within each sub-window. Extensive experiments on three real-world clinical datasets demonstrate that AMCPC outperforms both fully supervised and existing self-supervised methods, particularly when trained with limited labeled data. AMCPC establishes an effective self-supervised pretraining framework for unlabeled EHR data that can be fine-tuned with minimal labeled data, significantly enhancing downstream predictive performance and reducing reliance on large-scale labeled datasets.
KW - contrastive predictive coding
KW - disease prediction
KW - electronic health records
KW - patient representation
UR - https://www.scopus.com/pages/publications/105025817634
U2 - 10.1002/aisy.202500818
DO - 10.1002/aisy.202500818
M3 - Article
AN - SCOPUS:105025817634
SN - 2640-4567
JO - Advanced Intelligent Systems
JF - Advanced Intelligent Systems
ER -