Abstract
Breast ultrasound (BUS) offers a low-cost, radiation-free imaging alternative for breast cancer diagnostics, particularly suitable for point-of-care use. Despite promising results from deep learning (DL) models for BUS lesion classification, most models show significant performance drops on external datasets, suggesting overfitting to dataset-specific features. This lack of generalizability is concerning, especially given disparities in breast cancer outcomes across demographic groups and the diversity of ultrasound acquisition conditions. In this study, we investigate the impact of training dataset diversity on the robustness of DL models for BUS lesion classification. We compare three model architectures: ResNet50 (CNN), MViTv2 (Vision Transformer), and MambaOut (Vision Mamba), using 8403 B-mode BUS images from ten publicly available datasets originating from seven countries. Models were evaluated under three scenarios: single-dataset training, leave-one-dataset-out (LODO), and limited-data all-source training. Our results show that performance strongly depends on training set composition, with certain datasets consistently yielding better model performance, and substantial variability in cross-dataset generalization. This study provides new insights into the design of fair and generalizable DL systems for breast cancer diagnostics.
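The leave-one-dataset-out (LODO) scenario named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `lodo_splits` and the `(dataset_name, item)` pair representation are assumptions made for illustration only. The sketch shows how each source dataset is held out in turn while all remaining datasets form the training pool.

```python
from collections import defaultdict


def lodo_splits(samples):
    """Yield (held_out_name, train_items, test_items) for each source dataset.

    `samples` is an iterable of (dataset_name, item) pairs. This representation
    is a hypothetical one chosen for the sketch, not taken from the paper.
    """
    # Group items by the dataset they came from, preserving insertion order.
    by_dataset = defaultdict(list)
    for name, item in samples:
        by_dataset[name].append(item)

    # Hold out one dataset at a time; train on the union of all the others.
    for held_out in sorted(by_dataset):
        train = [
            item
            for name, items in by_dataset.items()
            if name != held_out
            for item in items
        ]
        test = list(by_dataset[held_out])
        yield held_out, train, test
```

With ten source datasets, iterating over `lodo_splits(samples)` would produce ten train/test pairs, each testing on a dataset the model has never seen during training.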
| Original language | English |
|---|---|
| Title of host publication | Artificial Intelligence and Imaging for Diagnostic and Treatment Challenges in Breast Care - 2nd Deep Breast Workshop, Deep-Breath 2025, Held in Conjunction with MICCAI 2025, Proceedings |
| Editors | Tianyu Zhang, Oliver Lester Saldanha, Luyi Han, Nika Rasoolzadeh, Lidia Garrucho Moras, Jarek van Dijk, Tao Tan, Jakob Nikolas Kather, Ritse Mann |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 21-30 |
| Number of pages | 10 |
| ISBN (Print) | 9783032055583 |
| Publication status | Published - 2026 |
| Event | 2nd Deep Breast Workshop on Artificial Intelligence and Imaging for Diagnostic and Treatment Challenges in Breast Care, Deep-Breath 2025, held in conjunction with the 28th International Conference on Medical Imaging and Computer-Assisted Intervention, MICCAI 2025 - Daejeon, Korea, Republic of; Duration: 23 Sept 2025 → 23 Sept 2025 |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Volume | 16142 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 2nd Deep Breast Workshop on Artificial Intelligence and Imaging for Diagnostic and Treatment Challenges in Breast Care, Deep-Breath 2025, held in conjunction with the 28th International Conference on Medical Imaging and Computer-Assisted Intervention, MICCAI 2025 |
|---|---|
| Country/Territory | Korea, Republic of |
| City | Daejeon |
| Period | 23/09/25 → 23/09/25 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- Breast cancer
- Data diversity
- Deep learning
- Domain generalization
- Lesion classification
- Ultrasound
Fingerprint
Dive into the research topics of 'Training Set Diversity: A Key Factor in AI-Driven Breast Ultrasound Classification'. Together they form a unique fingerprint.