SQ-ViT: A Multi-Scale Vision Transformer With Quaternion For Endoscopic Images Classification

Zhanjun Jin, Guoheng Huang, Feng Zhang, Xiaochen Yuan, Dingzhou Zhu, Zhe Tan, Chi Man Pun, Guo Zhong

Research output: Contribution to journal › Article › peer-review

Abstract

In the field of medical consumer electronics, endoscopic imaging technology, especially electronic nasopharyngoscope imaging, often suffers from low resolution, and the resulting loss of image detail makes endoscopic image classification difficult. Recent Vision Transformer (ViT) based methods have shown promise in addressing this problem. However, ViT relies heavily on global context information to maintain performance, and the limited pixel count of low-resolution images makes it hard to capture adequate global context. To address these challenges, we propose the Sequential Quaternion Vision Transformer (SQ-ViT), which improves multi-scale feature utilization by feeding sampled features into the subsequent encoder layers. Specifically, we introduce the Multi-scale Visual Feature Fusion (MVFF) module, which segments the image into multiple superpixel blocks and refines the contour and color information of the processed image, enhancing the representation of visual features. Additionally, our proposed Quaternion Interactive Encoder (QIE) captures visual information more effectively. Experiments demonstrate the effectiveness of SQ-ViT in improving multi-scale feature utilization and in addressing the challenges of low-resolution endoscopic image classification. The source code will be released at https://github.com/jinzhanjun625/SQViT.
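
The abstract does not describe the internals of the QIE, so the following is background only and is not taken from the SQ-ViT code. Quaternion-valued layers in general are built on the Hamilton product of two quaternions. A minimal sketch, assuming quaternions are stored as (w, x, y, z) tuples and using the hypothetical function name `hamilton_product`:

```python
def hamilton_product(p, q):
    """Multiply two quaternions given as (w, x, y, z) tuples.

    Illustrative helper only; the name and tuple layout are assumptions,
    not part of the SQ-ViT implementation.
    """
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


# Basis elements, used to check the defining identities i*j = k and j*i = -k.
i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
ij = hamilton_product(i, j)
ji = hamilton_product(j, i)
```

The product is not commutative, which is why `ij` and `ji` differ in sign. One common convention in quaternion image processing, assumed here rather than stated in the abstract, is to encode an RGB pixel as the pure quaternion (0, r, g, b) so that the three color channels are transformed jointly.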

Original language: English
Journal: IEEE Transactions on Consumer Electronics
Publication status: Accepted/In press - 2024

Keywords

  • Endoscopic images Classification
  • Endoscopy
  • Interpretability
  • Quaternion Convolution
  • Superpixel
  • Vision Transformer
