TY - JOUR
T1 - Adaptive multi-teacher knowledge distillation framework with foundation models for medical image analysis
AU - Liu, Dudu
AU - Gao, Yuan
AU - Zhang, Ningyi
AU - Wang, Xin
AU - Zhang, Tianyu
AU - Fan, Ming
AU - Sun, Yue
AU - Li, Shuo
AU - Tan, Tao
N1 - Publisher Copyright: © 2026 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
PY - 2026/4
Y1 - 2026/4
N2 - Foundation models (FMs) in medical imaging are increasingly specialized across vertical domains, yet their substantial computational demands and large parameter scales hinder deployment on resource-limited edge devices. Retraining these models for new tasks requires scarce high-quality data and significant computational resources, while cross-model knowledge transfer often introduces notable information loss. To address model coordination, capability migration, knowledge preservation, and practical edge deployment, we introduce MultiMedDistill, an adaptive multi-teacher distillation framework that integrates multiple heterogeneous FMs into a single lightweight student model. A dual-level gating mechanism enables dynamic teacher coordination, and a return decoder preserves semantic fidelity during feature projection. Across six benchmark datasets spanning ultrasound, endoscopy, fundus imaging, CT, and MRI, MultiMedDistill achieves 94.77% and 97.06% Dice on BUSI and Kvasir-SEG — improvements of 25.76% and 13.04% over baselines — while compressing the student model to 8.8M parameters (18× reduction). Ablation studies show that adaptive gating and reconstruction-based knowledge preservation contribute gains of 3.2% and 1.4%, respectively. These results demonstrate the framework’s effectiveness in transferring FM capabilities with minimal computational cost, enabling practical deployment on clinical edge devices.
KW - Adaptive gating mechanism
KW - Deep learning
KW - Foundation models
KW - Medical image analysis
KW - Model compression
KW - Multi-teacher knowledge distillation
UR - https://www.scopus.com/pages/publications/105034498401
U2 - 10.1016/j.compmedimag.2026.102739
DO - 10.1016/j.compmedimag.2026.102739
M3 - Article
C2 - 41797111
AN - SCOPUS:105034498401
SN - 0895-6111
VL - 130
JO - Computerized Medical Imaging and Graphics
JF - Computerized Medical Imaging and Graphics
M1 - 102739
ER -