Training Set Diversity: A Key Factor in AI-Driven Breast Ultrasound Classification

  • Rebecca Mes
  • Mark Wijkhuizen
  • Lennard M. van Karnenbeek
  • Tao Tan
  • Ritse Mann
  • Theo Ruers
  • Freija Geldof
  • Behdad Dashtbozorg

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Breast ultrasound (BUS) offers a low-cost, radiation-free imaging alternative for breast cancer diagnostics and is particularly suitable for point-of-care use. Despite promising results from deep learning (DL) models for BUS lesion classification, most models show significant performance drops on external datasets, suggesting overfitting to dataset-specific features. This lack of generalizability is concerning, especially given disparities in breast cancer outcomes across demographic groups and the diversity of ultrasound acquisition conditions. In this study, we investigate the impact of training dataset diversity on the robustness of DL models for BUS lesion classification. We compare three model architectures: ResNet50 (CNN), MViTv2 (Vision Transformer), and MambaOut (Vision Mamba), using 8,403 B-mode BUS images from ten publicly available datasets originating from seven countries. Models were evaluated under three scenarios: single-dataset training, leave-one-dataset-out (LODO) training, and limited-data all-source training. Our results show that performance strongly depends on training set composition: certain datasets consistently yield better-performing models, and cross-dataset generalization varies substantially. This study provides new insights into the design of fair and generalizable DL systems for breast cancer diagnostics.
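The single-dataset and LODO scenarios described in the abstract can be expressed as train/test splits over dataset identifiers. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the function names and placeholder dataset labels are hypothetical, and single-dataset training is assumed to be evaluated on every other dataset. The limited-data all-source scenario is not sketched because the abstract does not state how the training data were subsampled.

```python
from typing import List, Tuple

# A split is (training dataset names, evaluation dataset names).
Split = Tuple[List[str], List[str]]


def single_dataset_splits(datasets: List[str]) -> List[Split]:
    """Train on one dataset; evaluate on each of the remaining datasets.

    Assumption (not stated in the source): evaluation targets are all
    other datasets, i.e. purely external testing.
    """
    return [([src], [d for d in datasets if d != src]) for src in datasets]


def lodo_splits(datasets: List[str]) -> List[Split]:
    """Leave-one-dataset-out: train on all datasets except one, test on the held-out one."""
    return [([d for d in datasets if d != held], [held]) for held in datasets]


if __name__ == "__main__":
    # Hypothetical placeholder names, not the ten public datasets used in the study.
    names = [f"dataset_{i}" for i in range(10)]
    for train, test in lodo_splits(names):
        # Sanity check: the held-out dataset never appears in the training set.
        assert not set(train) & set(test)
        print(f"train on {len(train)} datasets, test on {test[0]}")
```

With ten source datasets, each of these two protocols would yield ten train/test configurations per architecture. At the image level, scikit-learn's `LeaveOneGroupOut` produces the same LODO partition when each image is labeled with its source dataset as the group.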

Original language: English
Title of host publication: Artificial Intelligence and Imaging for Diagnostic and Treatment Challenges in Breast Care - 2nd Deep Breast Workshop, Deep-Breath 2025, Held in Conjunction with MICCAI 2025, Proceedings
Editors: Tianyu Zhang, Oliver Lester Saldanha, Luyi Han, Nika Rasoolzadeh, Lidia Garrucho Moras, Jarek van Dijk, Tao Tan, Jakob Nikolas Kather, Ritse Mann
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 21-30
Number of pages: 10
ISBN (Print): 9783032055583
DOIs
Publication status: Published - 2026
Event: 2nd Deep Breast Workshop on Artificial Intelligence and Imaging for Diagnostic and Treatment Challenges in Breast Care, Deep-Breath 2025, held in conjunction with the 28th International Conference on Medical Imaging and Computer-Assisted Intervention, MICCAI 2025 - Daejeon, Korea, Republic of
Duration: 23 Sept 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16142 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd Deep Breast Workshop on Artificial Intelligence and Imaging for Diagnostic and Treatment Challenges in Breast Care, Deep-Breath 2025, held in conjunction with the 28th International Conference on Medical Imaging and Computer-Assisted Intervention, MICCAI 2025
Country/Territory: Korea, Republic of
City: Daejeon
Period: 23/09/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs):

  1. SDG 3 - Good Health and Well-being

Keywords

  • Breast cancer
  • Data diversity
  • Deep learning
  • Domain generalization
  • Lesion classification
  • Ultrasound
