
Enhancing Video Grounding with Dual-Path Modality Fusion on Animal Kingdom Datasets

  • Chengpeng Xiong
  • Zhengxuan Chen
  • Nuoer Long
  • Kin Seong Un
  • Zhuolin Li
  • Shaobin Chen
  • Tao Tan
  • Chan Tong Lam
  • Yue Sun

Research output: Conference contribution, peer-reviewed

1 citation (Scopus)

Abstract

Video grounding, which involves aligning spoken language descriptions with corresponding video segments, plays a critical role in advancing multimedia content understanding. Despite progress enabled by deep learning in multi-modal learning, this task faces significant challenges within complex datasets such as the Animal Kingdom, which features diverse and intricate natural scenes. Motivated by the need to enhance cross-modal alignment and achieve robust localization, this study introduces a refined approach based on the Uni-VTG model. We enhance the model through the integration of dual-path modality fusion and a sophisticated multi-modal encoder. This method employs a dual-path mechanism to effectively fuse modalities and an advanced training strategy tailored for the complex requirements of the Animal Kingdom dataset. The evaluation on this dataset shows significant improvements in accuracy and robustness, as well as an enhanced mean Intersection over Union (IoU), validating the effectiveness of our approach in navigating the complexities of natural environment video grounding.
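The abstract reports mean Intersection over Union (IoU) but does not define it. As a rough illustration only, the sketch below assumes the conventional temporal form of the metric used in video grounding: each prediction and each ground-truth annotation is a (start, end) interval in seconds, and the per-query IoU values are averaged. The names `temporal_iou` and `mean_iou` are hypothetical and are not taken from the paper or its code.

```python
from typing import Sequence, Tuple

# Assumption: a grounded segment is a (start_seconds, end_seconds) pair.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over Union of two time intervals."""
    # Overlap length, clamped at zero when the intervals are disjoint.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union length = sum of both lengths minus the overlap.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Segment], gts: Sequence[Segment]) -> float:
    """Average temporal IoU over paired predictions and ground truths."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    return sum(temporal_iou(p, g) for p, g in zip(preds, gts)) / len(preds)
```

For instance, a prediction of (0, 10) against a ground truth of (5, 15) overlaps for 5 seconds out of a 15-second union, giving an IoU of 1/3. The paper's actual evaluation protocol may differ from this sketch.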

Original language: English
Title of host publication: 2024 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350379815
Publication status: Published - 2024
Event: 2024 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2024 - Niagara Falls, Canada
Duration: 15 Jul 2024 to 19 Jul 2024

Publication series

Name: 2024 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2024

Conference

Conference: 2024 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2024
Country/Territory: Canada
City: Niagara Falls
Period: 15/07/24 to 19/07/24
