Region-Based Text-Consistent Augmentation for Multimodal Medical Segmentation

Kunyan Cai, Chenggang Yan, Min He, Liangqiong Qu, Shuai Wang, Tao Tan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Medical image segmentation is crucial for various clinical applications, and deep learning has significantly advanced this field. To further enhance performance, recent research explores multimodal data integration, combining medical images and textual reports. However, a critical challenge lies in image data augmentation for multimodal medical data, specifically in maintaining text-image consistency. Traditional augmentation techniques, designed for unimodal images, can introduce mismatches between augmented images and text, hindering effective multimodal learning. To address this, we introduce Region-Based Text-Consistent Augmentation (RBTCA), a novel framework for coherent multimodal augmentation. Our approach performs region-based image augmentation by first identifying image regions described in associated text reports and then extracting textual cues grounded in these regions. These cues are integrated into the image, and augmentation is subsequently performed on this modality-aware representation, ensuring inherent text-cue consistency. Notably, RBTCA’s plug-and-play design allows for straightforward integration into existing medical image analysis pipelines, enhancing its practical utility. We demonstrate the efficacy of our framework on the QaTa-Covid19 dataset and our in-house Lung Tumor CT Segmentation (LTCT) dataset, achieving substantial gains, with a Dice coefficient improvement of up to 7.24% when integrated into baseline segmentation models. Our code will be released at https://github.com/KunyanCAI/RBTCA.
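
The consistency problem described in the abstract can be illustrated with a small sketch. It is not the RBTCA implementation: the abstract does not say how textual cues are encoded or integrated, so the binary cue channel, the bounding-box region format, and the function names region_cue_map, attach_cue, and flip_horizontal are all assumptions made for illustration. The idea shown is only that once a region cue is attached to the image, a spatial augmentation applied to the combined array keeps the cue aligned with the image content and the segmentation mask.

```python
import numpy as np

def region_cue_map(shape, boxes):
    """Return a float map that is 1.0 inside each box and 0.0 elsewhere.

    boxes is a list of (row_start, row_stop, col_start, col_stop) tuples.
    This box format is a hypothetical stand-in for "regions described in
    the report"; the paper may ground text in regions differently.
    """
    cue = np.zeros(shape, dtype=np.float32)
    for r0, r1, c0, c1 in boxes:
        cue[r0:r1, c0:c1] = 1.0
    return cue

def attach_cue(image, cue):
    """Stack image and cue map into one (2, H, W) array (assumed encoding)."""
    return np.stack([image.astype(np.float32), cue], axis=0)

def flip_horizontal(stacked, mask):
    """Flip the stacked image+cue array and the mask along the width axis."""
    return stacked[:, :, ::-1].copy(), mask[:, ::-1].copy()

# Toy example: the report is assumed to mention the left half of the image,
# and the lesion mask lies inside that half.
image = np.arange(16, dtype=np.float32).reshape(4, 4)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 0:2] = 1
cue = region_cue_map(image.shape, [(0, 4, 0, 2)])

stacked = attach_cue(image, cue)
aug, aug_mask = flip_horizontal(stacked, mask)

# After the flip, the cue channel and the mask have both moved to the
# right half, so the cue still covers the lesion.
print(aug[1])
print(aug_mask)
```

By contrast, flipping only the image while leaving a report that says "left" unchanged would produce the kind of text-image mismatch the abstract attributes to unimodal augmentation.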

Original language: English
Title of host publication: Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 - 28th International Conference, 2025, Proceedings
Editors: James C. Gee, Jaesung Hong, Carole H. Sudre, Polina Golland, Daniel C. Alexander, Juan Eugenio Iglesias, Archana Venkataraman, Jong Hyo Kim
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 533-543
Number of pages: 11
ISBN (Print): 9783032049469
Publication status: Published - 2026
Event: 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 - Daejeon, Korea, Republic of
Duration: 23 Sept 2025 to 27 Sept 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15962 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025
Country/Territory: Korea, Republic of
City: Daejeon
Period: 23/09/25 to 27/09/25

Keywords

  • Data Augmentation
  • Medical Image Segmentation
  • Multimodal Learning
  • Text-Image Consistency
