TY - JOUR
T1 - MISA-GMC
T2 - An Enhanced Multimodal Sentiment Analysis Framework with Gated Fusion and Momentum Contrastive Modality Relationship Modeling
AU - Du, Zheng
AU - Wang, Yapeng
AU - Yang, Xu
AU - Im, Sio Kei
AU - Wang, Zhiwen
N1 - Publisher Copyright: © 2025 by the authors.
PY - 2026/1
Y1 - 2026/1
N2 - Multimodal sentiment analysis jointly exploits textual, acoustic, and visual signals to recognize human emotions more accurately than unimodal models. However, real-world data often contain noisy or partially missing modalities, and naive fusion may allow unreliable signals to degrade overall performance. To address this, we propose an enhanced framework named MISA-GMC, a lightweight extension of the widely used MISA backbone that explicitly accounts for modality reliability. The core idea is to adaptively reweight modalities at the sample level while regularizing cross-modal representations during training. Specifically, a reliability-aware gated fusion module down-weights unreliable modalities, and two auxiliary training-time regularizers (momentum contrastive learning and a lightweight correlation graph) help stabilize and refine multimodal representations without adding inference-time overhead. Experiments on three benchmark datasets—CMU-MOSI, CMU-MOSEI, and CH-SIMS—demonstrate the effectiveness of MISA-GMC. For instance, on CMU-MOSI, the proposed model improves 7-class accuracy from 43.29 to 45.92, reduces the mean absolute error (MAE) from 0.785 to 0.712, and increases the Pearson correlation coefficient (Corr) from 0.764 to 0.795. This indicates more accurate fine-grained sentiment prediction and better sentiment-intensity estimation. On CMU-MOSEI and CH-SIMS, MISA-GMC also achieves consistent gains over MISA and strong baselines such as LMF, ALMT, and MMIM across both classification and regression metrics. Ablation studies and missing-modality experiments further verify the contribution of each component and the robustness of MISA-GMC under partial-modality settings.
KW - MISA
KW - gated fusion
KW - momentum contrastive learning
KW - multimodal fusion
KW - multimodal sentiment analysis
KW - robustness to missing modalities
UR - https://www.scopus.com/pages/publications/105028495139
U2 - 10.3390/math14010115
DO - 10.3390/math14010115
M3 - Article
AN - SCOPUS:105028495139
SN - 2227-7390
VL - 14
JO - Mathematics
JF - Mathematics
IS - 1
M1 - 115
ER -