TY - JOUR
T1 - M3 SegNet
T2 - A Multi-Modal and Multi-Branch Framework for Nasopharyngeal Carcinoma Segmentation in Radiotherapy Planning
AU - Ma, Junqiang
AU - Han, Luyi
AU - Tong, Henry H.Y.
AU - Jia, Dengqiang
AU - Xie, Hui
AU - Lee, Anne W.M.
AU - Hung, Hing Ming
AU - Tan, Tao
AU - Soong, Sung Inda
AU - Sun, Yue
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2026
Y1 - 2026
N2 - Accurate and simultaneous labeling of multiple structures, including gross tumor volumes, clinical target volumes, and organs at risk (OARs), is a fundamental multi-task requirement for radiotherapy planning in nasopharyngeal carcinoma (NPC). However, conventional manual labeling is labor-intensive and suffers from substantial inter-observer variability, which poses a significant challenge to the multi-modal interpretation of CT and MRI scans. Against this backdrop, automated approaches, particularly multi-modal and multi-task learning, are promising solutions. Their clinical adoption, however, is hindered by three unmet needs: attention mechanisms that fuse multi-modal information at both local and global scales, explicit incorporation of anatomical priors to regularize predictions, and a unified framework that enables concurrent segmentation of all desired structures. To overcome these limitations, we propose M3 SegNet, a novel multi-modal and multi-branch framework that concurrently performs all clinically relevant segmentation tasks by integrating feature fusion and anatomical guidance. Our primary contributions are threefold. First, we introduce a Synergistic Global-Local Attention module that extracts informative features from multiple imaging modalities (CT, T1-weighted, T2-weighted, and T1 contrast-enhanced). Second, we propose an Anatomy-Aware Hierarchical Learning strategy that uses OAR spatial information to guide tumor segmentation. Third, we integrate Random Modality Dropout to enhance robustness against missing modalities. We validated M3 SegNet on an internal 257-patient NPC dataset and confirmed its generalizability on three external datasets, where our framework significantly outperformed state-of-the-art methods. By leveraging multi-modal information and anatomical priors, M3 SegNet offers a reliable, automated, and clinically translatable solution for NPC radiotherapy planning.
AB - Accurate and simultaneous labeling of multiple structures, including gross tumor volumes, clinical target volumes, and organs at risk (OARs), is a fundamental multi-task requirement for radiotherapy planning in nasopharyngeal carcinoma (NPC). However, conventional manual labeling is labor-intensive and suffers from substantial inter-observer variability, which poses a significant challenge to the multi-modal interpretation of CT and MRI scans. Against this backdrop, automated approaches, particularly multi-modal and multi-task learning, are promising solutions. Their clinical adoption, however, is hindered by three unmet needs: attention mechanisms that fuse multi-modal information at both local and global scales, explicit incorporation of anatomical priors to regularize predictions, and a unified framework that enables concurrent segmentation of all desired structures. To overcome these limitations, we propose M3 SegNet, a novel multi-modal and multi-branch framework that concurrently performs all clinically relevant segmentation tasks by integrating feature fusion and anatomical guidance. Our primary contributions are threefold. First, we introduce a Synergistic Global-Local Attention module that extracts informative features from multiple imaging modalities (CT, T1-weighted, T2-weighted, and T1 contrast-enhanced). Second, we propose an Anatomy-Aware Hierarchical Learning strategy that uses OAR spatial information to guide tumor segmentation. Third, we integrate Random Modality Dropout to enhance robustness against missing modalities. We validated M3 SegNet on an internal 257-patient NPC dataset and confirmed its generalizability on three external datasets, where our framework significantly outperformed state-of-the-art methods. By leveraging multi-modal information and anatomical priors, M3 SegNet offers a reliable, automated, and clinically translatable solution for NPC radiotherapy planning.
KW - Clinical target volume
KW - Gross tumor volumes
KW - Medical Image Segmentation
KW - Modality Contribution
KW - Missing Modality
KW - Multi-Modal Segmentation
KW - Multi-Task
KW - Organs at Risk
UR - https://www.scopus.com/pages/publications/105028851101
U2 - 10.1109/JBHI.2026.3658081
DO - 10.1109/JBHI.2026.3658081
M3 - Article
AN - SCOPUS:105028851101
SN - 2168-2194
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
ER -