Abstract
For emotion recognition tasks, combining text and speech offers better modal interactivity and less feature redundancy, which can effectively improve performance. However, existing methods tend to neglect the potential relationships and differences between modalities, making it difficult to fully understand and exploit multi-modal, multi-level emotional information. Therefore, this article proposes a novel multi-modal speech emotion recognition method based on graph neural networks. Specifically, it designs a reconstructed-graph fusion mechanism to achieve cross-modal interaction and enhance the interpretability of the fused features, and a gating update mechanism to eliminate modal redundancy while preserving emotional characteristics. The method achieves a weighted accuracy of 78.09% and an unweighted accuracy of 78.44% on the IEMOCAP dataset, comparable or even superior to the baseline methods.
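The abstract does not detail the gating update mechanism; purely as an illustration of the general idea of gated cross-modal fusion (not the authors' actual formulation), the sketch below computes an element-wise gate over a text feature vector and a speech feature vector and blends them, so that redundant dimensions from one modality can be suppressed. The function name, weights, and dimensions are all hypothetical.

```python
import math

def sigmoid(x):
    # Standard logistic function, maps any real value into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(text_feat, speech_feat, w_text, w_speech, bias):
    """Illustrative gated fusion of two same-length feature vectors.

    For each dimension i:
        g_i = sigmoid(w_text * t_i + w_speech * s_i + bias)
        fused_i = g_i * t_i + (1 - g_i) * s_i
    The gate g_i decides how much of each modality survives, which is
    one simple way to reduce cross-modal redundancy.
    """
    fused = []
    for t, s in zip(text_feat, speech_feat):
        g = sigmoid(w_text * t + w_speech * s + bias)
        fused.append(g * t + (1.0 - g) * s)
    return fused
```

In a real model the gate would be a learned linear layer over the concatenated modalities rather than fixed scalar weights; this sketch only shows the blending pattern.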
| Field | Value |
|---|---|
| Original language | English |
| Article number | 1051 |
| Journal | Applied Intelligence |
| Volume | 55 |
| Issue number | 16 |
| DOIs | |
| Publication status | Published - Nov 2025 |
Fingerprint
Dive into the research topics of "A multi-modal speech emotion recognition method based on graph neural networks". Together they form a unique fingerprint.