TY - GEN
T1 - TRRG
T2 - 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025
AU - Wang, Yuhao
AU - Sun, Yue
AU - Tan, Tao
AU - Hao, Chao
AU - Cui, Yawen
AU - Su, Xinqi
AU - Xie, Weichen
AU - Shen, Linlin
AU - Yu, Zitong
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
N2 - The vision-language capabilities of multi-modal large language models have gained attention, but radiology report generation still faces challenges due to imbalanced data distribution and weak alignment between reports and radiographs. To address these issues, we propose TRRG, a stage-wise training framework for truthful radiology report generation. In the pre-training stage, contrastive learning enhances the visual encoder’s ability to capture fine-grained disease details. In the fine-tuning stage, our clue injection module improves disease perception by integrating robust zero-shot disease recognition. Finally, the cross-modal clue interaction module enables effective multi-granular fusion of visual and disease clue embeddings, significantly improving report generation and clinical effectiveness. Experiments on IU-Xray and MIMIC-CXR show that TRRG achieves state-of-the-art performance, enhancing disease perception and clinical utility.
AB - The vision-language capabilities of multi-modal large language models have gained attention, but radiology report generation still faces challenges due to imbalanced data distribution and weak alignment between reports and radiographs. To address these issues, we propose TRRG, a stage-wise training framework for truthful radiology report generation. In the pre-training stage, contrastive learning enhances the visual encoder’s ability to capture fine-grained disease details. In the fine-tuning stage, our clue injection module improves disease perception by integrating robust zero-shot disease recognition. Finally, the cross-modal clue interaction module enables effective multi-granular fusion of visual and disease clue embeddings, significantly improving report generation and clinical effectiveness. Experiments on IU-Xray and MIMIC-CXR show that TRRG achieves state-of-the-art performance, enhancing disease perception and clinical utility.
KW - Chest X-ray
KW - Large Language Model
KW - Radiology Report Generation
UR - https://www.scopus.com/pages/publications/105017851329
U2 - 10.1007/978-3-032-04981-0_61
DO - 10.1007/978-3-032-04981-0_61
M3 - Conference contribution
AN - SCOPUS:105017851329
SN - 9783032049803
T3 - Lecture Notes in Computer Science
SP - 647
EP - 657
BT - Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 - 28th International Conference, Proceedings
A2 - Gee, James C.
A2 - Hong, Jaesung
A2 - Sudre, Carole H.
A2 - Golland, Polina
A2 - Alexander, Daniel C.
A2 - Iglesias, Juan Eugenio
A2 - Venkataraman, Archana
A2 - Kim, Jong Hyo
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 23 September 2025 through 27 September 2025
ER -