
SQ-ViT: A Multi-Scale Vision Transformer With Quaternion for Endoscopic Images Classification

  • Zhanjun Jin
  • Guoheng Huang
  • Feng Zhang
  • Xiaochen Yuan
  • Dingzhou Zhu
  • Zhe Tan
  • Chi Man Pun
  • Guo Zhong
  • Guangdong University of Technology
  • Sun Yat-sen University
  • University of Macau
  • Guangdong University of Foreign Studies

Research output: Article (peer-reviewed)

Citations: 1 (Scopus)

Abstract

In the field of medical consumer electronics, endoscopic imaging technology, especially electronic nasopharyngoscope imaging, often suffers from low resolution, and the resulting loss of image detail makes endoscopic image classification difficult. Recent Vision Transformer (ViT) based methods have shown promise in addressing this problem. However, ViT relies heavily on global context information to maintain performance, and the limited pixel count of low-resolution images makes it difficult to capture adequate global context. To address these challenges, we propose the Sequential Quaternion Vision Transformer (SQ-ViT), which improves multi-scale feature utilization by feeding sampled features into the subsequent encoder layers. Specifically, we introduce the Multi-scale Visual Feature Fusion (MVFF) module, which segments the image into multiple superpixel blocks and refines the contour and color information of the processed image, enhancing the representation of visual features. Additionally, our proposed Quaternion Interactive Encoder (QIE) captures visual information more effectively. Experiments demonstrate the effectiveness of SQ-ViT in improving multi-scale feature utilization and in addressing the challenges of low-resolution endoscopic image classification.

Original language: English
Pages (from-to): 828-838
Number of pages: 11
Journal: IEEE Transactions on Consumer Electronics
Volume: 71
Issue number: 1
DOIs
Publication status: Published - 2025
