TY - GEN
T1 - RST-UNet
T2 - 31st International Conference on Neural Information Processing, ICONIP 2024
AU - Ye, Songhang
AU - Feng, Zhoule
AU - Huang, Guoheng
AU - Ke, Jinghong
AU - Chen, Xuhang
AU - Pun, Chi Man
AU - Zhong, Guo
AU - Yuan, Xiaochen
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
AB - Medical image segmentation has advanced with models like UCTransNet, TransUNet, and TransClaw U-Net, which integrate Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). However, these models face limitations due to the locality of convolutions and the computational demands of Transformers. To overcome these challenges, we introduce RST-UNet, an innovative encoder-decoder network that balances effectiveness with computational efficiency. RST-UNet features two groundbreaking innovations: the Compact Representation Block (CRB) and the Compact Dependency Modeling Block (CDMB). The CRB utilizes superpixel pooling to capture long-range dependencies while minimizing parameters and computation time. The CDMB integrates superpixel unpooling with attention mechanisms and Rotary Position Embedding (RoPE) to enhance long-range dependency modeling. This approach emphasizes critical regions and leverages RoPE to capture extensive image dependencies effectively. Our experimental results on publicly available Synapse datasets highlight RST-UNet’s exceptional performance, particularly in segmenting small organs such as the gallbladder, right kidney, and pancreas. Remarkably, RST-UNet achieves superior results without pre-training, showcasing its high adaptability for diverse medical image segmentation tasks. This work represents a significant advancement in developing efficient and effective algorithms for medical image analysis.
KW - Medical image segmentation
KW - Rotary Position Embedding
KW - Superpixel
KW - TransUNet
UR - https://www.scopus.com/pages/publications/105009907877
U2 - 10.1007/978-981-96-6969-1_23
DO - 10.1007/978-981-96-6969-1_23
M3 - Conference contribution
AN - SCOPUS:105009907877
SN - 9789819669684
T3 - Communications in Computer and Information Science
SP - 336
EP - 350
BT - Neural Information Processing - 31st International Conference, ICONIP 2024, Proceedings
A2 - Mahmud, Mufti
A2 - Doborjeh, Maryam
A2 - Wong, Kevin
A2 - Leung, Andrew Chi Sing
A2 - Doborjeh, Zohreh
A2 - Tanveer, M.
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 2 December 2024 through 6 December 2024
ER -