Cross-Modal Attention Guided Enhanced Fusion Network for RGB-T Tracking

  • Jun Liu
  • Wei Ke
  • Shuai Wang
  • Da Yang
  • Hao Sheng

Research output: Article (peer-reviewed)

Abstract

Visual tracking that combines RGB and thermal infrared modalities (RGB-T) aims to utilize the useful information of each modality to achieve more robust object localization. Most existing tracking methods based on convolutional neural networks (CNNs) and Transformers emphasize integrating multi-modal features through cross-modal attention, but ignore the potential exploitability of complementary information learned by cross-modal attention for enhancing modal features. In this paper, we propose a novel hierarchical progressive fusion network based on cross-modal attention guided enhancement for RGB-T tracking. Specifically, the complementary information generated by cross-modal attention implicitly reflects the consistent regions of interest of important information between different modalities, which is used to enhance modal features in a targeted manner. In addition, a modal feature refinement module and a fusion module are designed based on dynamic routing to perform noise suppression and adaptive integration on the enhanced multi-modal features. Extensive experiments on GTOT, RGBT234, LasHeR and VTUAV show that our method has competitive performance compared with recent state-of-the-art methods.
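The enhancement idea in the abstract can be illustrated with a minimal sketch. It is not the authors' network: it assumes plain scaled dot-product attention with no learned projections, and a hypothetical residual update with a fixed weight `alpha`. The hierarchical structure, the refinement module, and the dynamic-routing fusion are not reproduced. The names `cross_attention`, `enhance`, `alpha`, and the toy token values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Tokens of one modality (queries) attend to tokens of the other (context).

    Assumption: keys and values are the raw context tokens (no learned projections).
    Returns one attended vector per query token.
    """
    d = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        attended.append([sum(w * k[j] for w, k in zip(weights, context)) for j in range(d)])
    return attended


def enhance(features, complementary, alpha=0.5):
    """Hypothetical residual enhancement: add the attended complement, scaled by alpha."""
    return [[f + alpha * c for f, c in zip(fv, cv)]
            for fv, cv in zip(features, complementary)]


# Toy token features: 2 RGB tokens and 3 thermal tokens, each of dimension 2.
rgb = [[1.0, 0.0], [0.0, 1.0]]
tir = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]

# Each modality is enhanced with the information it gathers from the other one.
rgb_enhanced = enhance(rgb, cross_attention(rgb, tir))
tir_enhanced = enhance(tir, cross_attention(tir, rgb))
```

In this sketch the attended vectors play the role of the complementary information, and the residual form leaves the original modal features intact when `alpha` is zero. How the paper actually weights, refines, and fuses the enhanced features is not stated in the abstract.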

Original language: English
Pages (from-to): 276-280
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 33
DOIs
Publication status: Published - November 2025
