TY - JOUR
T1 - CerviFusionNet
T2 - A multi-modal, hybrid CNN-transformer-GRU model for enhanced cervical lesion multi-classification
AU - Sha, Yuyang
AU - Zhang, Qingyue
AU - Zhai, Xiaobing
AU - Hou, Menghui
AU - Lu, Jingtao
AU - Meng, Weiyu
AU - Wang, Yuefei
AU - Li, Kefeng
AU - Ma, Jing
N1 - Publisher Copyright: © 2024 The Author(s)
PY - 2024/12/20
Y1 - 2024/12/20
AB - Cervical lesions pose a significant threat to women's health worldwide. Colposcopy is essential for screening and treating cervical lesions, but its effectiveness depends on the doctor's experience. Artificial intelligence-based solutions using colposcopy images have shown great potential in cervical lesion screening. However, some challenges still need to be addressed, such as low algorithm performance and a lack of high-quality multi-modal datasets. Here, we established a multi-modal colposcopy dataset of 2,273 HPV+ patients, comprising original colposcopy images, acetic acid reactions at 60 s and 120 s, iodine staining, diagnostic reports, and pathological results. Using this dataset, we developed CerviFusionNet, a hybrid architecture that merges convolutional neural networks and vision transformers to learn robust representations. We designed a temporal module to capture dynamic changes in acetic acid sequences, which boosts model performance without sacrificing inference speed. Compared with several existing methods, CerviFusionNet demonstrated excellent accuracy and efficiency.
KW - Artificial intelligence
KW - Cervical smear
KW - Classification of bioinformatical subject
KW - Medical imaging
UR - http://www.scopus.com/inward/record.url?scp=85209141200&partnerID=8YFLogxK
U2 - 10.1016/j.isci.2024.111313
DO - 10.1016/j.isci.2024.111313
M3 - Article
AN - SCOPUS:85209141200
SN - 2589-0042
VL - 27
JO - iScience
JF - iScience
IS - 12
M1 - 111313
ER -