Abstract
For emotion recognition tasks, combining text and speech offers better cross-modal interactivity and less feature redundancy, which can effectively improve performance. However, existing methods tend to neglect the potential relationships and differences between modalities, making it difficult to fully understand and exploit multi-modal and multi-level emotional information. Therefore, this article proposes a novel multi-modal speech emotion recognition method based on graph neural networks. Specifically, it designs a reconstructed-graph fusion mechanism to achieve cross-modal interaction and enhance the interpretability of the fused features, and a gating update mechanism to eliminate modal redundancy while preserving emotional characteristics. The method achieves a weighted accuracy of 78.09% and an unweighted accuracy of 78.44% on the IEMOCAP dataset, comparable to, or even superior to, the baseline methods.
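The gating update mechanism described above can be illustrated with a minimal NumPy sketch: a sigmoid gate, computed from the concatenated text and speech features, decides per dimension how much of each modality to keep, which is one common way such redundancy-suppressing gates are built. The function name, the gate form, and all parameters here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gated_fusion(text_feat, speech_feat, W, b):
    """Hypothetical gating update: a sigmoid gate g in (0, 1) blends the
    two modalities per dimension, so the fused feature keeps whichever
    modality the gate favors and suppresses the redundant one."""
    z = np.concatenate([text_feat, speech_feat], axis=-1) @ W + b
    gate = 1.0 / (1.0 + np.exp(-z))  # elementwise sigmoid
    return gate * text_feat + (1.0 - gate) * speech_feat

d = 4  # illustrative feature dimension
text_feat = rng.standard_normal(d)
speech_feat = rng.standard_normal(d)
W = rng.standard_normal((2 * d, d)) * 0.1  # gate projection (learned in practice)
b = np.zeros(d)

fused = gated_fusion(text_feat, speech_feat, W, b)
```

Because the gate lies in (0, 1), each fused dimension is a convex combination of the corresponding text and speech values, so the fused feature never leaves the range spanned by the two modalities.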
| Original language | English |
|---|---|
| Article number | 1051 |
| Journal | Applied Intelligence |
| Volume | 55 |
| Issue number | 16 |
| DOIs | |
| Publication status | Published - Nov 2025 |
Keywords
- Emotion recognition
- Graph neural networks
- Multi-modality
Press/Media
Researchers from Faculty of Applied Sciences Provide Details of New Studies and Findings in the Area of Networks (A Multi-modal Speech Emotion Recognition Method Based On Graph Neural Networks)
YANG, X., IM, S. K., WANG, Y. & HOI, L. M.
19/11/25
1 item of Media coverage