Medical image segmentation is a crucial and intricate process in medical image processing and analysis. With the advancements in artificial intelligence, deep learning techniques have been widely used in recent years for medical image segmentation. One such technique is the U-Net framework based on the U-shaped convolutional neural networks (CNN) and its variants. However, these methods have limitations in simultaneously capturing both the global and the remote semantic information due to the restricted receptive domain caused by the convolution operation's intrinsic features. Transformers are attention-based models with excellent global modeling capabilities, but their ability to acquire local information is limited. To address this, we propose a network that combines the strengths of both CNN and Transformer, called CoTrFuse. The proposed CoTrFuse network uses EfficientNet and Swin Transformer as dual encoders. The Swin Transformer and CNN Fusion module are combined to fuse the features of both branches before the skip connection structure. We evaluated the proposed network on two datasets: the ISIC-2017 challenge dataset and the COVID-QU-Ex dataset. Our experimental results demonstrate that the proposed CoTrFuse outperforms several state-of-the-art segmentation methods, indicating its superiority in medical image segmentation. The codes are available at https://github.com/BinYCn/CoTrFuse.
- convolutional neural network
- medical image segmentation