Enhanced Video Caption Generation Based on Multimodal Features

  • Xuefei Huang
  • Wei Ke
  • Hao Sheng

Research output: Conference contribution, peer-reviewed

3 citations (Scopus)

Abstract

Video captioning is the automatic generation of concise textual descriptions of the content of a video. It draws on two fields, computer vision and natural language processing, and has become a significant research topic in smart-living applications. Deep learning methods have achieved good results on this task. Video carries information in several modalities, yet most existing solutions consider only its visual content and ignore the equally important audio modality. How to exploit cues beyond visual information therefore remains a major challenge. In this work, we propose a video caption generation method that fuses multimodal features extracted from videos and adds an attention mechanism to improve the quality of the generated description sentences. Experiments on the MSR-VTT dataset validate the effectiveness of the method.
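
The abstract does not specify the feature extractors or the exact form of the attention mechanism. As a minimal sketch, assuming that visual and audio frame features have already been projected to the same dimension as a decoder hidden state, the Python code below shows one common way to combine them: dot-product soft attention over each modality separately, followed by concatenation of the two context vectors. The function names (`attend`, `fuse_multimodal`) and the toy vectors are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, features: List[Vector]) -> Vector:
    """Dot-product soft attention: weight each frame feature by its similarity to the query."""
    if not features:
        raise ValueError("features must contain at least one frame")
    weights = softmax([dot(query, f) for f in features])
    dim = len(features[0])
    # Context vector = attention-weighted sum of the frame features.
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def fuse_multimodal(query: Vector, visual: List[Vector], audio: List[Vector]) -> Vector:
    """Hypothetical fusion: attend over each modality, then concatenate the two contexts."""
    visual_ctx = attend(query, visual)
    audio_ctx = attend(query, audio)
    return visual_ctx + audio_ctx  # list concatenation gives the fused context vector


if __name__ == "__main__":
    decoder_state = [1.0, 0.0]                  # hypothetical decoder hidden state
    visual_frames = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical per-frame visual features
    audio_frames = [[0.5, 0.5]]                 # hypothetical per-segment audio features
    print(fuse_multimodal(decoder_state, visual_frames, audio_frames))
```

Concatenation is only one fusion choice; summation or a learned gate over the two context vectors are equally plausible alternatives, and the abstract does not indicate which one the paper uses.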

Original language: English
Host publication title: 6th IEEE International Conference on Universal Village, UV 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9781665474771
Publication status: Published - 2022
Event: 6th IEEE International Conference on Universal Village, UV 2022 - Hybrid, Boston, United States
Duration: 22 Oct 2022 to 25 Oct 2022

Publication series

Name: 6th IEEE International Conference on Universal Village, UV 2022

Conference

Conference: 6th IEEE International Conference on Universal Village, UV 2022
Country/Territory: United States
City: Hybrid, Boston
Period: 22/10/22 to 25/10/22
