TY - JOUR
T1 - Development of a Machine Learning Model for Predicting Treatment-Related Amenorrhea in Young Women with Breast Cancer
AU - Song, Long
AU - Edib, Zobaida
AU - Aickelin, Uwe
AU - Akbarzadeh Khorshidi, Hadi
AU - Hamy, Anne Sophie
AU - Jayasinghe, Yasmin
AU - Hickey, Martha
AU - Anderson, Richard A.
AU - Lambertini, Matteo
AU - Condorelli, Margherita
AU - Demeestere, Isabelle
AU - Ignatiadis, Michail
AU - Pistilli, Barbara
AU - Su, H. Irene
AU - Chang, Shanton
AU - Pang, Patrick Cheong Iao
AU - Reyal, Fabien
AU - Nelson, Scott M.
AU - Sukumvanich, Paniti
AU - Minisini, Alessandro
AU - Puglisi, Fabio
AU - Ruddy, Kathryn J.
AU - Couch, Fergus J.
AU - Olson, Janet E.
AU - Stern, Kate
AU - Agresta, Franca
AU - Stafford, Lesley
AU - Chin-Lenn, Laura
AU - Cui, Wanda
AU - Anazodo, Antoinette
AU - Gorelik, Alexandra
AU - Nguyen, Tuong L.
AU - Partridge, Ann
AU - Saunders, Christobel
AU - Sullivan, Elizabeth
AU - Macheras-Magias, Mary
AU - Peate, Michelle
N1 - Publisher Copyright: © 2025 by the authors.
PY - 2025/11
Y1 - 2025/11
N2 - Treatment-induced ovarian function loss is a significant concern for many young patients with breast cancer. Accurately predicting this risk is crucial for counselling young patients and informing their fertility-related decision-making. However, current risk prediction models for treatment-related ovarian function loss have limitations. To provide a broader representation of patient cohorts and improve feature selection, we combined retrospective data from six datasets within the FoRECAsT (Infertility after Cancer Predictor) databank, including 2679 pre-menopausal women diagnosed with breast cancer. This combined dataset presented notable missingness, prompting us to employ cross imputation using the k-nearest neighbours (KNN) machine learning (ML) algorithm. Employing Lasso regression, we developed an ML model to forecast the risk of treatment-related amenorrhea as a surrogate marker of ovarian function loss at 12 months after starting chemotherapy. Our model identified 20 variables significantly associated with risk of developing amenorrhea. Internal validation resulted in an area under the receiver operating characteristic curve (AUC) of 0.820 (95% CI: 0.817–0.823), while external validation with another dataset demonstrated an AUC of 0.743 (95% CI: 0.666–0.818). A cutoff of 0.20 was chosen to achieve higher sensitivity in validation, as false negatives—patients incorrectly classified as likely to regain menses—could miss timely opportunities for fertility preservation if desired. At this threshold, internal validation yielded sensitivity and precision rates of 91.3% and 61.7%, respectively, while external validation showed 92.9% and 60.0%. Leveraging ML methodologies, we not only devised a model for personalised risk prediction of amenorrhea, demonstrating substantial enhancements over existing models but also showcased a robust framework for maximally harnessing available data sources.
KW - breast cancer
KW - cross imputation
KW - machine learning
KW - risk prediction model
KW - treatment-related amenorrhea
UR - https://www.scopus.com/pages/publications/105023448012
U2 - 10.3390/bioengineering12111171
DO - 10.3390/bioengineering12111171
M3 - Article
AN - SCOPUS:105023448012
SN - 2306-5354
VL - 12
JO - Bioengineering
JF - Bioengineering
IS - 11
M1 - 1171
ER -