TY - JOUR
T1 - CoTrFuse
T2 - A novel framework by fusing CNN and transformer for medical image segmentation
AU - Chen, Yuanbin
AU - Wang, Tao
AU - Tang, Hui
AU - Zhao, Longxuan
AU - Zhang, Xinlin
AU - Tan, Tao
AU - Gao, Qinquan
AU - Du, Min
AU - Tong, Tong
N1 - Publisher Copyright:
© 2023 The Author(s). Published on behalf of Institute of Physics and Engineering in Medicine by IOP Publishing Ltd.
PY - 2023/9/7
Y1 - 2023/9/7
N2 - Medical image segmentation is a crucial and intricate task in medical image processing and analysis. With advances in artificial intelligence, deep learning techniques have been widely applied to medical image segmentation in recent years. One prominent family of methods is the U-Net framework, based on U-shaped convolutional neural networks (CNNs), and its variants. However, these methods struggle to capture global and long-range semantic information simultaneously because the convolution operation inherently restricts the receptive field. Transformers are attention-based models with excellent global modeling capabilities, but their ability to capture local information is limited. To address this, we propose CoTrFuse, a network that combines the strengths of both CNNs and Transformers. CoTrFuse uses EfficientNet and Swin Transformer as dual encoders, and a Swin Transformer and CNN fusion module fuses the features of the two branches before the skip-connection structure. We evaluated the proposed network on two datasets: the ISIC-2017 challenge dataset and the COVID-QU-Ex dataset. The experimental results demonstrate that CoTrFuse outperforms several state-of-the-art segmentation methods, indicating its superiority in medical image segmentation. The code is available at https://github.com/BinYCn/CoTrFuse.
AB - Medical image segmentation is a crucial and intricate task in medical image processing and analysis. With advances in artificial intelligence, deep learning techniques have been widely applied to medical image segmentation in recent years. One prominent family of methods is the U-Net framework, based on U-shaped convolutional neural networks (CNNs), and its variants. However, these methods struggle to capture global and long-range semantic information simultaneously because the convolution operation inherently restricts the receptive field. Transformers are attention-based models with excellent global modeling capabilities, but their ability to capture local information is limited. To address this, we propose CoTrFuse, a network that combines the strengths of both CNNs and Transformers. CoTrFuse uses EfficientNet and Swin Transformer as dual encoders, and a Swin Transformer and CNN fusion module fuses the features of the two branches before the skip-connection structure. We evaluated the proposed network on two datasets: the ISIC-2017 challenge dataset and the COVID-QU-Ex dataset. The experimental results demonstrate that CoTrFuse outperforms several state-of-the-art segmentation methods, indicating its superiority in medical image segmentation. The code is available at https://github.com/BinYCn/CoTrFuse.
KW - convolutional neural network
KW - medical image segmentation
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85168460074&partnerID=8YFLogxK
U2 - 10.1088/1361-6560/acede8
DO - 10.1088/1361-6560/acede8
M3 - Article
C2 - 37605997
AN - SCOPUS:85168460074
SN - 0031-9155
VL - 68
JO - Physics in Medicine and Biology
JF - Physics in Medicine and Biology
IS - 17
M1 - 175027
ER -
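
Note: the abstract above describes a dual-encoder design in which a CNN branch (EfficientNet) and a Transformer branch (Swin Transformer) are fused before the skip connections of a U-shaped decoder. The sketch below is a minimal, schematic PyTorch illustration of that dual-branch fusion idea only; the branch backbones, module names, and decoder head are simplified stand-ins, not the authors' EfficientNet/Swin implementation, which is available at the repository linked in the abstract (https://github.com/BinYCn/CoTrFuse).

# Illustrative sketch only: a dual-branch encoder whose features are fused
# before a U-Net-style decoder head, loosely following the CoTrFuse abstract.
# The backbones here are simple stand-ins, NOT EfficientNet / Swin Transformer.
import torch
import torch.nn as nn

class ConvBranch(nn.Module):
    """Stand-in for the CNN (EfficientNet-like) branch: two stride-2 conv blocks."""
    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)                       # (B, ch, H/4, W/4)

class TransformerBranch(nn.Module):
    """Stand-in for the Swin branch: patch embedding + plain self-attention layers."""
    def __init__(self, in_ch=3, dim=64, patch=4):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        t = self.embed(x)                          # (B, dim, H/4, W/4)
        b, c, h, w = t.shape
        seq = t.flatten(2).transpose(1, 2)         # (B, H*W/16, dim)
        seq = self.encoder(seq)
        return seq.transpose(1, 2).reshape(b, c, h, w)

class FuseModule(nn.Module):
    """Fuse CNN and Transformer features before they feed the skip connections."""
    def __init__(self, ch=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 1),
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
        )

    def forward(self, f_cnn, f_tr):
        return self.fuse(torch.cat([f_cnn, f_tr], dim=1))

class DualEncoderSeg(nn.Module):
    """Dual encoders -> fusion -> minimal decoder head producing a segmentation map."""
    def __init__(self, num_classes=1):
        super().__init__()
        self.cnn = ConvBranch()
        self.tr = TransformerBranch()
        self.fuse = FuseModule()
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(64, num_classes, 1),
        )

    def forward(self, x):
        fused = self.fuse(self.cnn(x), self.tr(x))
        return self.decoder(fused)

if __name__ == "__main__":
    model = DualEncoderSeg(num_classes=1)
    mask = model(torch.randn(1, 3, 224, 224))
    print(mask.shape)                              # torch.Size([1, 1, 224, 224])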