Abstract
In medical consumer electronics, endoscopic imaging, especially electronic nasopharyngoscope imaging, often suffers from low resolution, and the resulting loss of image detail makes endoscopic image classification difficult. Recent Vision Transformer (ViT) based methods have shown promise in addressing this problem. However, ViT relies heavily on global context information to maintain performance, and the limited pixel count of low-resolution images makes it hard to capture adequate global context. To address these challenges, we propose the Sequential Quaternion Vision Transformer (SQ-ViT), which improves multi-scale feature utilization by feeding sampled features into the subsequent encoder layers. Specifically, we introduce the Multi-scale Visual Feature Fusion (MVFF) module, which segments the image into multiple superpixel blocks and refines the contour and color information of the processed image, thereby enhancing the representation of visual features. Additionally, our proposed Quaternion Interactive Encoder (QIE) captures visual information more effectively. Experiments demonstrate the effectiveness of SQ-ViT in improving multi-scale feature utilization and in addressing the challenges of low-resolution endoscopic imaging for endoscopic image classification.
| Original language | English |
|---|---|
| Pages (from-to) | 828-838 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Consumer Electronics |
| Volume | 71 |
| Issue number | 1 |
| Publication status | Published - 2025 |
Keywords
- Vision transformer
- endoscopic images classification
- endoscopy
- interpretability
- quaternion convolution
- superpixel
Fingerprint
Dive into the research topics of 'SQ-ViT: A Multi-Scale Vision Transformer With Quaternion for Endoscopic Images Classification'. Together they form a unique fingerprint.