TY - JOUR
T1 - ImageNet pre-training and two-step transfer learning in chromosome image classification
AU - Chen, Tianhao
AU - Xie, Can
AU - Zhang, Wenhua
AU - Li, Yufei
AU - Ke, Wei
AU - Li, Tian
AU - Huang, Xiujing
AU - Li, Kefeng
N1 - Publisher Copyright:
© The Author(s) 2026.
PY - 2026/12
Y1 - 2026/12
N2 - Chromosome image classification typically relies on ImageNet pre-training, yet the potential of leveraging intermediate domains from related staining techniques remains largely underexplored. Here, we evaluate two-step transfer learning, where classifiers are first fine-tuned on an intermediate domain before targeting the final classification task, across Q-band (BioImLab dataset) and G-band (CIR dataset) chromosome classification. Each dataset serves as an intermediate domain for the other. Across 11 architecture families and three training approaches, models achieve improvements when domain similarity is high and data quality is limited: modern architectures (ConvNeXt, Swin Transformer, ViT, MobileNetV3) show +0.8 to +3.3 percentage point gains in Macro-F1 on Q-band classification, while traditional CNNs benefit less or show no improvement. On the higher-quality G-band dataset, all architectures approach performance saturation, with minimal gains from two-step transfer (+0.1 to +0.7 percentage points). Consistent results across both transfer directions demonstrate that, with appropriate architecture selection and intermediate domain similarity, two-step transfer learning can boost performance when target datasets are challenging, while ImageNet pre-training alone suffices for high-quality data. The code is publicly available at https://github.com/MuscleOne/chromosome_TL.
AB - Chromosome image classification typically relies on ImageNet pre-training, yet the potential of leveraging intermediate domains from related staining techniques remains largely underexplored. Here, we evaluate two-step transfer learning, where classifiers are first fine-tuned on an intermediate domain before targeting the final classification task, across Q-band (BioImLab dataset) and G-band (CIR dataset) chromosome classification. Each dataset serves as an intermediate domain for the other. Across 11 architecture families and three training approaches, models achieve improvements when domain similarity is high and data quality is limited: modern architectures (ConvNeXt, Swin Transformer, ViT, MobileNetV3) show +0.8 to +3.3 percentage point gains in Macro-F1 on Q-band classification, while traditional CNNs benefit less or show no improvement. On the higher-quality G-band dataset, all architectures approach performance saturation, with minimal gains from two-step transfer (+0.1 to +0.7 percentage points). Consistent results across both transfer directions demonstrate that, with appropriate architecture selection and intermediate domain similarity, two-step transfer learning can boost performance when target datasets are challenging, while ImageNet pre-training alone suffices for high-quality data. The code is publicly available at https://github.com/MuscleOne/chromosome_TL.
KW - Chromosome karyotyping
KW - Image classification
KW - ImageNet
KW - Pre-training
KW - Two-step transfer learning
UR - https://www.scopus.com/pages/publications/105031088143
U2 - 10.1038/s41598-026-38662-w
DO - 10.1038/s41598-026-38662-w
M3 - Article
C2 - 41652092
AN - SCOPUS:105031088143
SN - 2045-2322
VL - 16
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 7572
ER -