Abstract
For emotion recognition tasks, combining text and speech offers better cross-modal interactivity and less feature redundancy, which can effectively improve performance. However, existing methods tend to neglect the latent relationships and differences between modalities, making it difficult to fully understand and exploit multi-modal, multi-level emotional information. This article therefore proposes a novel multi-modal speech emotion recognition method based on graph neural networks. Specifically, it designs a reconstructed-graph fusion mechanism to achieve cross-modal interaction and enhance the interpretability of the fused features, and a gating update mechanism to eliminate modal redundancy while preserving emotional characteristics. The method achieves a weighted accuracy of 78.09% and an unweighted accuracy of 78.44% on the IEMOCAP dataset, comparable to or better than baseline methods.
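As a rough illustration of the gating idea described in the abstract, the PyTorch sketch below fuses text and speech utterance features with a learned sigmoid gate. The class name `GatedFusion`, the feature dimension, and the exact gating form are assumptions for illustration only; the paper's reconstructed-graph fusion and its precise gating update are not reproduced here.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Hypothetical sketch of a gated fusion of text and speech features.

    Not the paper's implementation: the graph construction step is
    omitted and the gating form is a generic, commonly used variant.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Gate computed from the two modalities concatenated.
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, text: torch.Tensor, speech: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) decides, per feature dimension, how much of each
        # modality to keep, suppressing redundant dimensions while
        # retaining emotionally salient ones.
        g = torch.sigmoid(self.gate(torch.cat([text, speech], dim=-1)))
        return g * text + (1.0 - g) * speech

# Usage: a batch of 8 utterances with 256-dim features per modality
# (sizes assumed for the example).
fusion = GatedFusion(dim=256)
fused = fusion(torch.randn(8, 256), torch.randn(8, 256))
print(fused.shape)  # torch.Size([8, 256])
```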
| Original language | English |
|---|---|
| Article number | 1051 |
| Journal | Applied Intelligence |
| Volume | 55 |
| Issue number | 16 |
| DOIs | |
| Publication status | Published - Nov 2025 |
Keywords
- Emotion recognition
- Graph neural networks
- Multi-modality