Abstract
Accurate segmentation of breast cancer in PET-CT images is crucial for precise staging, monitoring treatment response, and guiding personalized therapy. The task is challenging, however: metastatic lesions are small and dispersed, annotated data are scarce, and heterogeneity between the two modalities hinders effective information fusion. This paper proposes a novel anatomy-guided cross-modal learning framework to address these issues. Our approach first generates organ pseudo-labels through a teacher-student learning paradigm; these serve as anatomical prompts that guide cancer segmentation. We then introduce a self-aligning cross-modal pre-training method that aligns PET and CT features in a shared latent space through masked 3D patch reconstruction, enabling effective cross-modal feature fusion. Finally, we initialize the segmentation network’s encoder with the pre-trained encoder weights and incorporate the organ labels through a Mamba-based prompt encoder and a Hypernet-Controlled Cross-Attention mechanism for dynamic anatomical feature extraction and fusion. Notably, our method outperforms eight state-of-the-art methods, including CNN-based, transformer-based, and Mamba-based approaches, on two datasets encompassing primary breast cancer, metastatic breast cancer, and other cancer segmentation tasks.
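To make the cross-modal self-alignment idea concrete, the following is a minimal PyTorch sketch of masked 3D patch reconstruction over a shared latent space. All names, the patch size, the transformer-based shared encoder, and the per-modality reconstruction heads are illustrative assumptions, not the paper's actual implementation; the sketch only shows the general mechanism of hiding patches in both modalities, encoding PET and CT tokens jointly, and reconstructing the masked voxels so the two feature spaces align.

```python
import torch
import torch.nn as nn


class MaskedCrossModalPretrainer(nn.Module):
    """Illustrative sketch: masked 3D patch reconstruction for PET-CT alignment."""

    def __init__(self, patch_vox=16 ** 3, dim=256, depth=4, heads=8):
        super().__init__()
        # Per-modality patch embeddings into the shared latent dimension.
        self.embed_pet = nn.Linear(patch_vox, dim)
        self.embed_ct = nn.Linear(patch_vox, dim)
        # Shared transformer encoder over the concatenated PET+CT tokens.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Learnable token substituted for masked patches of either modality.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        # Per-modality heads that reconstruct the voxels of masked patches.
        self.recon_pet = nn.Linear(dim, patch_vox)
        self.recon_ct = nn.Linear(dim, patch_vox)

    def forward(self, pet_patches, ct_patches, mask):
        # pet_patches, ct_patches: (B, N, patch_vox) flattened 3D patches.
        # mask: (B, N) bool, True where a patch is hidden from the encoder.
        pet = self.embed_pet(pet_patches)
        ct = self.embed_ct(ct_patches)
        m = mask.unsqueeze(-1)
        pet = torch.where(m, self.mask_token.expand_as(pet), pet)
        ct = torch.where(m, self.mask_token.expand_as(ct), ct)
        # Joint encoding lets visible CT tokens help recover masked PET
        # patches (and vice versa), which is what aligns the two modalities.
        tokens = self.encoder(torch.cat([pet, ct], dim=1))
        pet_lat, ct_lat = tokens.chunk(2, dim=1)
        # Reconstruction loss is computed only on the masked positions.
        loss_pet = ((self.recon_pet(pet_lat) - pet_patches) ** 2)[mask].mean()
        loss_ct = ((self.recon_ct(ct_lat) - ct_patches) ** 2)[mask].mean()
        return loss_pet + loss_ct
```

A pre-training loop would sample a random patch mask per volume and minimize the returned loss; the pre-trained encoder weights would then initialize the segmentation network's encoder, as the abstract describes.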
| Original language | English |
|---|---|
| Article number | 103956 |
| Journal | Medical Image Analysis |
| Volume | 110 |
| DOIs | |
| Publication status | Published - May 2026 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- Breast cancer
- Cross-modal fusion
- PET-CT
- Volumetric image segmentation
Fingerprint
Dive into the research topics of 'Anatomy-guided prompting with cross-modal self-alignment for whole-body PET-CT breast cancer segmentation'. Together they form a unique fingerprint.