Towards Further Comprehension on Referring Expression with Rationale

Rengang Li, Baoyu Fan, Xiaochuan Li, Runze Zhang, Zhenhua Guo, Kun Zhao, Yaqian Zhao, Weifeng Gong, Endong Wang

Research output: Conference contribution, peer-reviewed

1 Citation (Scopus)

Abstract

Referring Expression Comprehension (REC) is an important research branch of visual grounding: given a textual expression that describes a specific object, the goal is to localize that object in the image. However, existing REC tasks focus on filtering text content and locating image objects, and they are evaluated only by the precision of the detection boxes. This may allow models to achieve good performance while skipping the learning of multimodal comprehension altogether. In this paper, we study how to enable an artificial agent to understand referring expressions (REs) more deeply, and we propose a more comprehensive task called Further Comprehension on Referring Expression (FREC). In this task, we mainly focus on three sub-tasks: 1) correcting an erroneous text expression based on visual information; 2) generating the rationale for the input expression; and 3) localizing the proper object based on the corrected expression. Accordingly, we construct a new dataset, Further-RefCOCOs, from the RefCOCO, RefCOCO+, and RefCOCOg benchmark datasets for this task and make it publicly available. We then design a novel end-to-end pipeline that performs these sub-tasks simultaneously. Experimental results demonstrate the validity of the proposed pipeline. We believe this work will motivate more researchers to explore this direction and promote the development of visual grounding.
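The box-precision evaluation the abstract criticizes is commonly implemented on RefCOCO-style benchmarks as accuracy at an intersection-over-union (IoU) threshold of 0.5: a prediction counts as correct if its box overlaps the ground-truth box with IoU of at least 0.5. The Python sketch below illustrates that general convention under stated assumptions; the (x1, y1, x2, y2) box format, the function names `iou` and `rec_accuracy`, and the default threshold are illustrative choices, not the paper's own evaluation code.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping region (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rec_accuracy(preds: Sequence[Box], gts: Sequence[Box], thresh: float = 0.5) -> float:
    """Fraction of expressions whose predicted box has IoU >= thresh with the ground truth.

    Hypothetical helper: only the final box is scored, so nothing here checks
    whether the model understood or corrected the expression.
    """
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not gts:
        return 0.0
    hits = sum(iou(p, g) >= thresh for p, g in zip(preds, gts))
    return hits / len(gts)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    print(rec_accuracy(preds, gts))  # 0.5: one exact hit, one miss
```

Because such a metric looks only at the output box, a model can score well without reasoning about the text, which is the gap the FREC sub-tasks (expression correction and rationale generation) are meant to expose.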

Original language: English
Title of host publication: MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 4336-4344
Number of pages: 9
ISBN (Electronic): 9781450392037
Publication status: Published - 10 Oct 2022
Externally published
Event: 30th ACM International Conference on Multimedia, MM 2022 - Lisboa, Portugal
Duration: 10 Oct 2022 to 14 Oct 2022

Publication series

Name: MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia

Conference

Conference: 30th ACM International Conference on Multimedia, MM 2022
Country/Territory: Portugal
City: Lisboa
Period: 10/10/22 to 14/10/22
