TY - GEN
T1 - Region-Based Text-Consistent Augmentation for Multimodal Medical Segmentation
AU - Cai, Kunyan
AU - Yan, Chenggang
AU - He, Min
AU - Qu, Liangqiong
AU - Wang, Shuai
AU - Tan, Tao
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
N2 - Medical image segmentation is crucial for various clinical applications, and deep learning has significantly advanced this field. To further enhance performance, recent research explores multimodal data integration, combining medical images and textual reports. However, a critical challenge lies in image data augmentation for multimodal medical data, specifically in maintaining text-image consistency. Traditional augmentation techniques, designed for unimodal images, can introduce mismatches between augmented images and text, hindering effective multimodal learning. To address this, we introduce Region-Based Text-Consistent Augmentation (RBTCA), a novel framework for coherent multimodal augmentation. Our approach performs region-based image augmentation by first identifying image regions described in associated text reports and then extracting textual cues grounded in these regions. These cues are integrated into the image, and augmentation is subsequently performed on this modality-aware representation, ensuring inherent text-cue consistency. Notably, RBTCA’s plug-and-play design allows for straightforward integration into existing medical image analysis pipelines, enhancing its practical utility. We demonstrate the efficacy of our framework on the QaTa-Covid19 dataset and our in-house Lung Tumor CT Segmentation (LTCT) dataset, achieving substantial gains, with a Dice coefficient improvement of up to 7.24% when integrated into baseline segmentation models. Our code will be released at https://github.com/KunyanCAI/RBTCA.
KW - Data Augmentation
KW - Medical Image Segmentation
KW - Multimodal Learning
KW - Text-Image Consistency
UR - https://www.scopus.com/pages/publications/105017855237
U2 - 10.1007/978-3-032-04947-6_51
DO - 10.1007/978-3-032-04947-6_51
M3 - Conference contribution
AN - SCOPUS:105017855237
SN - 9783032049469
T3 - Lecture Notes in Computer Science
SP - 533
EP - 543
BT - Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 - 28th International Conference, 2025, Proceedings
A2 - Gee, James C.
A2 - Hong, Jaesung
A2 - Sudre, Carole H.
A2 - Golland, Polina
A2 - Alexander, Daniel C.
A2 - Iglesias, Juan Eugenio
A2 - Venkataraman, Archana
A2 - Kim, Jong Hyo
PB - Springer Science and Business Media Deutschland GmbH
T2 - 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025
Y2 - 23 September 2025 through 27 September 2025
ER -