Inexactly Matched Referring Expression Comprehension With Rationale

  • Xiaochuan Li
  • Baoyu Fan
  • Runze Zhang
  • Kun Zhao
  • Zhenhua Guo
  • Yaqian Zhao
  • Rengang Li

Research output: Article, peer-reviewed

2 citations (Scopus)

Abstract

Referring Expression Comprehension (REC) is a multimodal comprehension task that aims to locate an object in an image given a text description. Existing REC tasks rest on the basic assumption that the given text expression and the image exactly match each other. In real-world scenarios, however, how well the image and the text match is uncertain. Illegible objects in the image or ambiguous phrases in the text can significantly degrade performance on conventional REC tasks. To overcome these limitations, we consider a more practical and comprehensive REC task in which the given image and its referring text expression can be inexactly matched. Our models aim to correct such inexact matching and supply corresponding interpretations. We refer to this task as Further REC (FREC). The task is divided into three subtasks: 1) correcting the erroneous text expression using visual information, 2) generating the rationale for the input expression, and 3) localizing the proper object based on the corrected expression. We introduce three new datasets for FREC: Further-RefCOCOs, Further-Copsref, and Further-Talk2Car. These datasets are built on existing REC datasets, including RefCOCO and Talk2Car. We develop a novel pipeline architecture that executes the three subtasks simultaneously in an end-to-end fashion, together with an elastic masked language modeling (EMLM) training head that rectifies text errors of uncertain length. Our experimental results demonstrate the validity of the proposed pipeline. We hope this work sparks more research on inexactly matched REC.
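
The three subtasks in the abstract imply a particular input and output shape for FREC: an image and a possibly mismatched expression go in, and a corrected expression, a rationale, and an object location come out. The sketch below only illustrates that shape. The class names, field names, box format, and example values are all hypothetical and are not taken from the paper, its datasets, or its code.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max). This is an assumption
# made for illustration; the abstract does not specify a format.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class FRECInput:
    image_path: str  # image to search
    expression: str  # referring expression, possibly inexactly matched


@dataclass(frozen=True)
class FRECOutput:
    corrected_expression: str  # subtask 1: expression corrected using the image
    rationale: str             # subtask 2: rationale for the input expression
    box: Box                   # subtask 3: object located from the corrected text


def was_corrected(sample: FRECInput, output: FRECOutput) -> bool:
    """Return True when the output expression differs from the input one."""
    return sample.expression.strip() != output.corrected_expression.strip()


# Invented example values, for illustration only.
sample = FRECInput("street.jpg", "the red car on the left")
output = FRECOutput(
    corrected_expression="the blue car on the left",
    rationale="The car on the left is blue, not red.",
    box=(12.0, 40.0, 180.0, 150.0),
)
```

Keeping the three outputs in one record mirrors the abstract's description of the subtasks being executed together, although how the authors' pipeline actually represents them is not stated here.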

Original language: English
Pages (from-to): 3937-3950
Number of pages: 14
Journal: IEEE Transactions on Multimedia
Volume: 26
Publication status: Published - 2024
Externally published
