Enhancing Video Grounding with Dual-Path Modality Fusion on Animal Kingdom Datasets

Chengpeng Xiong, Zhengxuan Chen, Nuoer Long, Kin Seong Un, Zhuolin Li, Shaobin Chen, Tao Tan, Chan Tong Lam, Yue Sun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Video grounding, which involves aligning spoken language descriptions with corresponding video segments, plays a critical role in advancing multimedia content understanding. Despite progress enabled by deep learning in multi-modal learning, this task faces significant challenges within complex datasets such as the Animal Kingdom, which features diverse and intricate natural scenes. Motivated by the need to enhance cross-modal alignment and achieve robust localization, this study introduces a refined approach based on the UniVTG model. We enhance the model through the integration of dual-path modality fusion and a sophisticated multi-modal encoder. This method employs a dual-path mechanism to effectively fuse modalities and an advanced training strategy tailored for the complex requirements of the Animal Kingdom dataset. The evaluation on this dataset shows significant improvements in accuracy and robustness, as well as an enhanced mean Intersection over Union (IoU), validating the effectiveness of our approach in navigating the complexities of natural environment video grounding.

Original language: English
Title of host publication: 2024 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350379815
Publication status: Published - 2024
Event: 2024 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2024 - Niagara Falls, Canada
Duration: 15 Jul 2024 → 19 Jul 2024

Publication series

Name: 2024 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2024

Conference

Conference: 2024 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2024
Country/Territory: Canada
City: Niagara Falls
Period: 15/07/24 → 19/07/24

Keywords

  • Animal Kingdom Dataset
  • Modality Fusion
  • Video Grounding
