TY - GEN
T1 - C2MAOT: Cross-modal Complementary Masked Autoencoder with Optimal Transport for PET-CT Cancer Segmentation
T2 - 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025
AU - Huang, Jiaju
AU - Chen, Shaobin
AU - Liang, Xinglong
AU - Yang, Xiao
AU - Zhang, Zhuoneng
AU - Sun, Yue
AU - Wang, Ying
AU - Tan, Tao
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
AB - Accurate cancer segmentation in PET-CT images is crucial for oncology, yet remains challenging due to lesion diversity, data scarcity, and modality heterogeneity. Existing methods often struggle to fuse cross-modal information effectively and to leverage self-supervised learning for improved representations. In this paper, we introduce C2MAOT, a Cross-modal Complementary Masked Autoencoder with Optimal Transport framework for PET-CT cancer segmentation. Our method employs a novel modality-complementary masking strategy during pre-training to explicitly encourage cross-modal learning between the PET and CT encoders. Furthermore, we integrate an optimal transport loss to guide the alignment of feature distributions across modalities, facilitating robust multi-modal fusion. Experimental results on two datasets demonstrate that C2MAOT outperforms existing state-of-the-art methods, achieving significant improvements in segmentation accuracy across five cancer types. These results establish our proposed method as an effective approach for tumor segmentation in PET-CT imaging. Our code is available at https://github.com/hjj194/c2maot.
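N1 - Implementation sketch: the abstract names two concrete mechanisms, a modality-complementary masking scheme and an optimal transport alignment loss. The PyTorch-style sketch below is one plausible reading of each, written only from the abstract's description; it is not the authors' implementation (see the GitHub link above for that), and the function names, mask ratio, cost function, and Sinkhorn hyper-parameters are all assumptions.

    import math
    import torch

    def complementary_masks(num_patches, mask_ratio=0.5, device="cpu"):
        # Sample a random set of visible patches for the PET encoder and
        # give the CT encoder the exact complement. Every patch is then
        # visible to exactly one modality, so reconstructing the hidden
        # half forces each encoder to borrow context from the other.
        scores = torch.rand(num_patches, device=device)
        n_keep = int(num_patches * (1.0 - mask_ratio))
        pet_visible = torch.zeros(num_patches, dtype=torch.bool, device=device)
        pet_visible[scores.topk(n_keep).indices] = True
        return pet_visible, ~pet_visible

    def sinkhorn_ot_loss(feat_pet, feat_ct, eps=0.05, n_iters=50):
        # Entropy-regularized optimal transport (Sinkhorn iterations in
        # log space) between [n, d] PET and [m, d] CT token features,
        # usable as a cross-modal distribution-alignment loss.
        n, m = feat_pet.size(0), feat_ct.size(0)
        cost = torch.cdist(feat_pet, feat_ct) ** 2        # squared L2 cost
        log_mu = torch.full((n,), -math.log(n), device=feat_pet.device)  # uniform
        log_nu = torch.full((m,), -math.log(m), device=feat_ct.device)   # marginals
        log_K = -cost / eps                               # Gibbs kernel (log space)
        u = torch.zeros(n, device=feat_pet.device)
        v = torch.zeros(m, device=feat_ct.device)
        for _ in range(n_iters):                          # alternating scalings
            u = log_mu - torch.logsumexp(log_K + v[None, :], dim=1)
            v = log_nu - torch.logsumexp(log_K + u[:, None], dim=0)
        plan = torch.exp(u[:, None] + log_K + v[None, :]) # transport plan
        return (plan * cost).sum()

    Under these assumptions, pre-training would combine a masked-reconstruction loss over the complementary halves with sinkhorn_ot_loss applied to the two encoders' token features; the paper's exact cost, marginals, and loss weighting are not specified in this record.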
KW - Cross-modal Fusion
KW - PET-CT Segmentation
KW - Self-supervised Learning
UR - https://www.scopus.com/pages/publications/105017849849
DO - 10.1007/978-3-032-04927-8_9
M3 - Conference contribution
AN - SCOPUS:105017849849
SN - 9783032049261
T3 - Lecture Notes in Computer Science
SP - 87
EP - 97
BT - Medical Image Computing and Computer Assisted Intervention, MICCAI 2025 - 28th International Conference, 2025, Proceedings
A2 - Gee, James C.
A2 - Hong, Jaesung
A2 - Sudre, Carole H.
A2 - Golland, Polina
A2 - Alexander, Daniel C.
A2 - Iglesias, Juan Eugenio
A2 - Venkataraman, Archana
A2 - Kim, Jong Hyo
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 23 September 2025 through 27 September 2025
ER -