Abstract
Breast cancer poses a significant threat to women's health, and ultrasound plays a critical role in the assessment of breast lesions. This study introduces a novel deep learning architecture, termed the “Multi-modal Multi-task Network” (3MT-Net), which integrates clinical data with B-mode and color Doppler ultrasound images. Specifically, an AM-CapsNet is employed to extract key features from the ultrasound images, while a cascaded cross-attention mechanism fuses in the clinical data. Moreover, an ensemble learning approach with an optimization algorithm dynamically assigns weights to the different modalities, accommodating both high-dimensional and low-dimensional data. The 3MT-Net performs binary classification of benign versus malignant lesions and further classifies the pathological subtypes. Data were retrospectively collected from nine medical centers to ensure the broad applicability of the 3MT-Net. Two separate test sets were created, and extensive experiments were conducted. Comparative analyses demonstrated that the AUC of the 3MT-Net outperforms the industry-standard computer-aided detection product, S-Detect, by 1.4% to 3.8%.
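The abstract only names the fusion components, but the general pattern it describes (image features attending to clinical tokens via cross-attention, followed by two task-specific heads) can be illustrated. Below is a minimal PyTorch sketch under stated assumptions: the module names, feature dimensions, token layout, and the particular two-stage cascade are all hypothetical, not the authors' released implementation, and the AM-CapsNet backbone and ensemble weighting step are omitted.

```python
# Hypothetical sketch of cascaded cross-attention fusion with two task heads.
# Not the 3MT-Net authors' code; dimensions and cascade order are assumptions.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """One cross-attention stage: image tokens attend to clinical tokens."""
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens, context_tokens):
        fused, _ = self.attn(query=img_tokens, key=context_tokens, value=context_tokens)
        return self.norm(img_tokens + fused)  # residual connection

class MultiTaskHead(nn.Module):
    """Shared pooled representation with two outputs:
    benign/malignant and pathological subtype."""
    def __init__(self, dim=256, num_subtypes=4):
        super().__init__()
        self.benign_malignant = nn.Linear(dim, 2)
        self.subtype = nn.Linear(dim, num_subtypes)

    def forward(self, x):
        pooled = x.mean(dim=1)  # average-pool over tokens
        return self.benign_malignant(pooled), self.subtype(pooled)

# Toy usage with random tensors standing in for backbone outputs.
dim = 256
stage1, stage2 = CrossAttentionFusion(dim), CrossAttentionFusion(dim)  # cascade
head = MultiTaskHead(dim)
bmode = torch.randn(8, 49, dim)    # e.g. a 7x7 B-mode feature map, flattened
doppler = torch.randn(8, 49, dim)  # color Doppler features
clinical = torch.randn(8, 1, dim)  # clinical record projected to one token
x = stage1(bmode, clinical)                            # stage 1: B-mode x clinical
x = stage2(x, torch.cat([doppler, clinical], dim=1))   # stage 2: add Doppler
bm_logits, subtype_logits = head(x)
print(bm_logits.shape, subtype_logits.shape)  # torch.Size([8, 2]) torch.Size([8, 4])
```

In this reading, the cascade lets each later stage attend over both the remaining modality and the clinical tokens; the described ensemble step would then weight the per-modality predictions, which is not shown here.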
| Original language | English |
| --- | --- |
| Pages (from-to) | 4680-4691 |
| Number of pages | 12 |
| Journal | IEEE Journal of Biomedical and Health Informatics |
| Volume | 29 |
| Issue number | 7 |
| DOIs | |
| Publication status | Published - 2025 |
Keywords
- Breast cancer
- ensemble learning
- multi-task learning
- multimodal
- ultrasound imaging