Abstract
Visual tracking that combines RGB and thermal infrared modalities (RGB-T) aims to exploit the useful information in each modality to localize objects more robustly. Most existing tracking methods based on convolutional neural networks (CNNs) and Transformers focus on integrating multi-modal features through cross-modal attention, but overlook that the complementary information learned by cross-modal attention can itself be used to enhance the features of each modality. In this paper, we propose a novel hierarchical progressive fusion network for RGB-T tracking based on cross-modal attention guided enhancement. Specifically, the complementary information generated by cross-modal attention implicitly reflects the regions of interest on which the modalities agree, and it is used to enhance the modal features in a targeted manner. In addition, a modal feature refinement module and a fusion module, both based on dynamic routing, are designed to suppress noise in the enhanced multi-modal features and to integrate them adaptively. Extensive experiments on GTOT, RGBT234, LasHeR and VTUAV show that our method performs competitively with recent state-of-the-art methods.
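The idea of using cross-modal attention output to enhance each modality can be illustrated with a minimal NumPy sketch. This is not the network described in the abstract: the function names (`cross_modal_attention`, `enhance_features`), the absence of learned projections, and the sigmoid-gated residual used for the enhancement are all assumptions made for illustration. Each modality is represented as a set of `N` tokens with `C` channels; one modality queries the other, and the attended result is treated as the complementary information that re-weights the querying modality's own features.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(query_feat, context_feat):
    """Scaled dot-product attention from one modality to the other.

    query_feat, context_feat: arrays of shape (N, C).
    Returns the attended (complementary) features, shape (N, C),
    and the attention map, shape (N, N).
    Assumption: identity projections stand in for learned Q/K/V layers.
    """
    channels = query_feat.shape[-1]
    scores = query_feat @ context_feat.T / np.sqrt(channels)
    attn = softmax(scores, axis=-1)
    return attn @ context_feat, attn


def enhance_features(rgb, tir):
    """Enhance each modality using complementary information from the other.

    Assumption: the enhancement is a sigmoid gate computed from the
    complementary features and applied as a residual re-weighting.
    """
    comp_for_rgb, _ = cross_modal_attention(rgb, tir)
    comp_for_tir, _ = cross_modal_attention(tir, rgb)
    gate_rgb = 1.0 / (1.0 + np.exp(-comp_for_rgb))
    gate_tir = 1.0 / (1.0 + np.exp(-comp_for_tir))
    rgb_enhanced = rgb + gate_rgb * rgb
    tir_enhanced = tir + gate_tir * tir
    return rgb_enhanced, tir_enhanced


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_tokens = rng.standard_normal((6, 8))  # 6 tokens, 8 channels
    tir_tokens = rng.standard_normal((6, 8))
    rgb_out, tir_out = enhance_features(rgb_tokens, tir_tokens)
    print(rgb_out.shape, tir_out.shape)
```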
| Original language | English |
|---|---|
| Pages (from-to) | 276-280 |
| Number of pages | 5 |
| Journal | IEEE Signal Processing Letters |
| Volume | 33 |
| DOIs | |
| Publication status | Published - Nov 2025 |
Fingerprint
Dive into the research topics of "Cross-Modal Attention Guided Enhanced Fusion Network for RGB-T Tracking". Together they form a unique fingerprint.