Keywords: multimodal learning, sentiment analysis, modality, multimodal interaction, machine learning, adversarial learning, artificial intelligence, conditional random field, human-computer interaction, computer science
Authors
Lingyong Fang,Gongshen Liu,Ru Zhang
Identifier
DOI:10.1109/icassp48485.2024.10446351
Abstract
Multimodal sentiment analysis aims to utilize different modalities, including language, visual, and audio, to identify human emotions in videos. The multimodal interaction mechanism is the key challenge. Previous works lack modeling of multimodal interaction at different grain levels and do not suppress redundant information in multimodal interaction, which leads to incomplete multimodal representations with noisy information. To address these issues, we propose the Multi-grained Multimodal Interaction Network (MMIN) to provide a more complete view of multimodal representation. The Coarse-grained Interaction Network (CIN) exploits the unique characteristics of different modalities at a coarse-grained level, and adversarial learning is used to reduce redundancy. The Fine-grained Interaction Network (FIN) employs a sparse-attention mechanism to capture fine-grained interactions between multimodal sequences across distinct time steps and to reduce irrelevant fine-grained multimodal interaction. Experimental results on two public datasets demonstrate the effectiveness of our model in multimodal sentiment analysis.
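To illustrate the kind of sparse cross-modal attention the abstract describes, here is a minimal NumPy sketch: one modality's sequence attends over another's, and all but the top-k attention scores per query step are masked out to suppress irrelevant fine-grained interactions. The function name, the top-k sparsification scheme, and the toy dimensions are illustrative assumptions, not the paper's actual FIN implementation.

```python
import numpy as np

def sparse_cross_attention(query, key, value, k=2):
    """Cross-modal scaled dot-product attention that keeps only the
    top-k scores per query step (illustrative sketch, not the paper's FIN)."""
    scores = query @ key.T / np.sqrt(query.shape[-1])      # (Tq, Tk)
    # Mask all but the k largest scores in each row with -inf
    low_idx = np.argsort(scores, axis=-1)[:, :-k]
    np.put_along_axis(scores, low_idx, -np.inf, axis=-1)
    # Softmax over the surviving scores; masked entries get weight 0
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ value                                  # (Tq, d)

# Toy example: 3 language time steps attending over 4 audio time steps
rng = np.random.default_rng(0)
lang = rng.standard_normal((3, 8))
audio = rng.standard_normal((4, 8))
fused = sparse_cross_attention(lang, audio, audio, k=2)
print(fused.shape)  # (3, 8)
```

Each fused language step is then a mixture of only its two most relevant audio steps, which is one simple way to realize the "reduce irrelevant fine-grained multimodal interaction" idea.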