RefineNet: Elevating Medical Foundation Models Through Quality-Centric Data Curation by MLLM-Annotated Proxy Distillation

Ningyi Zhang, Yuan Gao, Xin Wang, Ka Hou Chan, Jian Wu, Chan Tong Lam, Shanshan Wang, Yue Sun, Sio Kei Im, Tao Tan

Research output: Conference contribution (peer-reviewed)

Abstract

The rapid advancement of medical foundation models creates unprecedented demand for large-scale training data, yet existing medical repositories remain contaminated by heterogeneous mixtures of high- and low-quality image-text pairs, a severe data pollution problem that significantly bottlenecks model performance and optimization. While manual curation could in theory ensure quality, it is impractical at this scale. To address this challenge, we introduce RefineNet, a scalable framework that systematically refines data quality by distilling multimodal large language model (MLLM) insights into an offline reward model. RefineNet decomposes human quality assessment into two key dimensions: image-text fidelity and semantic consistency. By filtering and curating datasets along these dimensions, RefineNet improves performance across diagnostic tasks. Specifically, our method selects 50% high-quality data subsets that outperform full-data baselines, improving Recall@10 by 9.15% (retrieval) and reaching 85.59 AUC (classification) and 72.59% accuracy (visual question answering). Moreover, RefineNet achieves notable agreement with human expert judgments (Pearson’s r = 0.67), providing clinicians with an auditable bridge between automated curation and validation.
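The selection and validation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the two quality dimensions are combined or how the reward model is built, so the equal-weight average, the function names (`combine_scores`, `select_top_fraction`, `pearson_r`), and the toy scores below are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def combine_scores(fidelity: Sequence[float], consistency: Sequence[float]) -> List[float]:
    """Merge two per-pair quality scores into one.

    Equal weighting is an assumption; the abstract does not say how the
    image-text fidelity and semantic consistency scores are combined.
    """
    if len(fidelity) != len(consistency):
        raise ValueError("score lists must have the same length")
    return [(f + c) / 2.0 for f, c in zip(fidelity, consistency)]


def select_top_fraction(scores: Sequence[float], fraction: float = 0.5) -> List[int]:
    """Return the (sorted) indices of the highest-scoring `fraction` of items."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    keep = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between model scores and expert ratings."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for four image-text pairs (illustrative values only).
fidelity = [0.9, 0.2, 0.7, 0.4]
consistency = [0.8, 0.3, 0.5, 0.6]
kept_indices = select_top_fraction(combine_scores(fidelity, consistency), fraction=0.5)
```

In this sketch, `kept_indices` holds the positions of the pairs retained for training, and `pearson_r` could be applied to the combined scores and a set of expert ratings to measure agreement of the kind the abstract reports.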

Original language: English
Title of host publication: Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 - 28th International Conference, 2025, Proceedings
Editors: James C. Gee, Jaesung Hong, Carole H. Sudre, Polina Golland, Jinah Park, Daniel C. Alexander, Juan Eugenio Iglesias, Archana Venkataraman, Jong Hyo Kim
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 498-508
Number of pages: 11
ISBN (Print): 9783032051400
DOIs
Publication status: Published - 2026
Event: 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 - Daejeon, Korea, Republic of
Duration: 23 Sep 2025 to 27 Sep 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15970 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025
Country/Territory: Korea, Republic of
City: Daejeon
Period: 23/09/25 to 27/09/25
