CoTrFuse: A novel framework by fusing CNN and transformer for medical image segmentation

Yuanbin Chen, Tao Wang, Hui Tang, Longxuan Zhao, Xinlin Zhang, Tao Tan, Qinquan Gao, Min Du, Tong Tong

Research output: Contribution to journal > Article > peer-review

13 Citations (Scopus)

Abstract

Medical image segmentation is a crucial and intricate step in medical image processing and analysis. With advances in artificial intelligence, deep learning techniques have been widely used for medical image segmentation in recent years. One such technique is the U-Net framework, based on U-shaped convolutional neural networks (CNNs), and its variants. However, these methods struggle to capture global and long-range semantic information because the convolution operation inherently restricts the receptive field. Transformers are attention-based models with excellent global modeling capabilities, but their ability to capture local information is limited. To address this, we propose CoTrFuse, a network that combines the strengths of both CNNs and Transformers. CoTrFuse uses EfficientNet and Swin Transformer as dual encoders. A Swin Transformer and CNN Fusion module fuses the features of both branches before the skip connection structure. We evaluated the proposed network on two datasets: the ISIC-2017 challenge dataset and the COVID-QU-Ex dataset. Our experimental results show that CoTrFuse outperforms several state-of-the-art segmentation methods on these datasets. The code is available at https://github.com/BinYCn/CoTrFuse.
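
The idea of fusing a CNN branch with a Transformer branch before a skip connection can be illustrated with a minimal NumPy sketch. This is not the paper's fusion module, whose internals the abstract does not describe. It is a generic concatenate-then-project fusion, and the function names (tokens_to_map, fuse), the channel counts, and the random weights are all assumptions chosen for illustration. It assumes both branches produce features at the same spatial resolution.

```python
import numpy as np


def tokens_to_map(tokens, height, width):
    """Reshape transformer tokens (H*W, C) into a feature map (C, H, W)."""
    n, c = tokens.shape
    assert n == height * width, "token count must equal H * W"
    return tokens.reshape(height, width, c).transpose(2, 0, 1)


def fuse(cnn_map, trans_map, weight, bias):
    """Generic fusion (an assumption, not the paper's module).

    Concatenates the two maps along the channel axis, then applies a
    per-pixel linear projection (equivalent to a 1x1 convolution) and ReLU.
    weight has shape (C_out, C_cnn + C_trans); bias has shape (C_out,).
    """
    stacked = np.concatenate([cnn_map, trans_map], axis=0)
    mixed = np.einsum("oc,chw->ohw", weight, stacked) + bias[:, None, None]
    return np.maximum(mixed, 0.0)


# Hypothetical sizes: 16 CNN channels, 24 transformer channels, 8x8 grid.
rng = np.random.default_rng(0)
H, W = 8, 8
cnn_feat = rng.standard_normal((16, H, W))
trans_tokens = rng.standard_normal((H * W, 24))
trans_feat = tokens_to_map(trans_tokens, H, W)

# Random placeholder weights; in a real network these would be learned.
weight = rng.standard_normal((32, 16 + 24)) * 0.1
bias = np.zeros(32)

fused = fuse(cnn_feat, trans_feat, weight, bias)
print(fused.shape)  # (32, 8, 8)
```

In a U-shaped network, a fused map like this would be passed along the skip connection to the decoder stage of matching resolution. The released repository linked in the abstract is the reference for the actual design.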

Original language: English
Article number: 175027
Journal: Physics in Medicine and Biology
Volume: 68
Issue number: 17
Publication status: Published - 7 Sept 2023

Keywords

  • convolutional neural network
  • medical image segmentation
  • transformer
