GDN-CMCF: A Gated Disentangled Network With Cross-Modality Consensus Fusion for Multimodal Named Entity Recognition

Guoheng Huang, Qin He, Zihao Dai, Guo Zhong, Xiaochen Yuan, Chi Man Pun

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Multimodal named entity recognition (MNER) is a crucial task in social systems of artificial intelligence that requires precise identification of named entities in sentences using both visual and textual information. Previous methods have focused on capturing fine-grained visual features and developing complex fusion procedures. However, these approaches overlook the heterogeneity gap and loss of original modality uniqueness that may occur during fusion, leading to incorrect entity identification. This article proposes a novel approach for MNER called a gated disentangled network with cross-modality consensus fusion (GDN-CMCF) to address the above challenges. Specifically, to eliminate cross-modality variation, we propose a cross-modality consensus fusion module that generates a consensus representation by learning inter- and intramodality interactions with a designed commonality constraint. We then introduce a gated disentanglement module to separate modality-relevant features from support and auxiliary modalities, which further filters out extraneous information while retaining the uniqueness of unimodal features. Experimental results on two real public datasets are provided to verify the effectiveness of our proposed GDN-CMCF. The source code of this article can be found at https://github.com/HaoDavis/GDN-CMCF.

Original language: English
Pages (from-to): 3944-3954
Number of pages: 11
Journal: IEEE Transactions on Computational Social Systems
Volume: 11
Issue number: 3
Publication status: Published - 1 Jun 2024

Keywords

  • Common space learning
  • feature disentanglement
  • multimodal named entity recognition (MNER)
