A multi-modal speech emotion recognition method based on graph neural networks

Yan Li, Yapeng Wang, Xu Yang, Lap Man Hoi, Sio Kei Im

Research output: Contribution to journal › Article › peer-review

Abstract

For emotion recognition tasks, combining text and speech offers better modal interactivity and less feature redundancy, which can effectively improve performance. However, existing methods often neglect the potential relationships and differences between modalities, making it difficult to fully understand and utilize multi-modal, multi-level emotional information. This article therefore proposes a novel multi-modal speech emotion recognition method based on graph neural networks. Specifically, a reconstructed-graph fusion mechanism is designed to achieve cross-modal interaction and enhance the interpretability of the fused features, and a gating update mechanism is designed to eliminate modal redundancy while preserving emotional characteristics. A weighted accuracy of 78.09% and an unweighted accuracy of 78.44% are achieved on the IEMOCAP dataset, which is comparable to or better than the baseline methods.
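The abstract does not detail the architecture, so the following is only a minimal sketch of how a reconstructed-graph fusion step combined with a gating update over text and speech node features might look. All names (ReconstructedGraphFusion, the gate layer, the hidden size) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: simplified cross-modal graph fusion with a gating
# update, assuming per-utterance text and speech embeddings of equal dimension.
# Module and parameter names are hypothetical, not the paper's implementation.
import torch
import torch.nn as nn


class ReconstructedGraphFusion(nn.Module):
    """Build a learned cross-modal adjacency and propagate features over it."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        # Gating layer: decides how much fused information replaces the input.
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, text: torch.Tensor, speech: torch.Tensor) -> torch.Tensor:
        # text, speech: (num_nodes, dim) node features for one conversation.
        nodes = torch.cat([text, speech], dim=0)               # (2N, dim)
        # Reconstruct edge weights from feature similarity (softmax-normalized).
        adj = torch.softmax(
            self.query(nodes) @ self.key(nodes).T / nodes.size(-1) ** 0.5, dim=-1
        )
        fused = adj @ self.value(nodes)                        # message passing
        # Gated update: keep emotional content, suppress redundant modal info.
        g = torch.sigmoid(self.gate(torch.cat([nodes, fused], dim=-1)))
        return g * fused + (1.0 - g) * nodes                   # (2N, dim)


if __name__ == "__main__":
    # Random features stand in for pretrained speech/text utterance embeddings.
    fusion = ReconstructedGraphFusion(dim=256)
    text_feats = torch.randn(8, 256)    # 8 utterances, text modality
    speech_feats = torch.randn(8, 256)  # 8 utterances, speech modality
    print(fusion(text_feats, speech_feats).shape)  # torch.Size([16, 256])
```

In this sketch the gate is a per-dimension sigmoid interpolation between the original and fused node features; the actual gating update in the paper may be defined differently.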

Original language: English
Article number: 1051
Journal: Applied Intelligence
Volume: 55
Issue number: 16
DOIs
Publication status: Published - Nov 2025

Keywords

  • Emotion recognition
  • Graph neural networks
  • Multi-modality

