Keywords
Computer science; Artificial intelligence; Machine learning; Natural language processing; Sentiment analysis; Multimodal learning; Feature learning; Representation; Modality; Benchmark; Pattern recognition; Linguistics; Generalizability
Authors
Lan Wang,Junjie Peng,Cangzhi Zheng,Tong Zhao,Lian Zhu
Identifier
DOI:10.1016/j.ipm.2024.103675
Abstract
Humans often express emotions and intentions through multiple channels when communicating, involving text, audio, and vision modalities. Relying on a single modality to determine the sentiment state can be biased, whereas combining multiple cues yields more comprehensive information. Effective fusion of heterogeneous data is one of the core problems in multimodal sentiment analysis. Most cross-modal fusion strategies inevitably introduce noisy information, resulting in low-quality joint feature representations and degrading the accuracy of sentiment classification. Considering modality-specific cues, information shared across modalities, and sentiment variability among different layers, we introduce multi-task learning and propose a cross-modal hierarchical fusion method for multimodal sentiment analysis. The model combines unimodal, bimodal, and trimodal tasks to enhance the multimodal feature representation used for the final sentiment prediction. We conduct extensive experiments on CH-SIMS, CMU-MOSI, and CMU-MOSEI, where the first is in Chinese and the last two are in English. The results demonstrate the generalizability of the proposed method: compared with existing models, it improves sentiment analysis accuracy while reducing the adverse impact of noise.
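To make the hierarchical, multi-task setup described in the abstract concrete, the sketch below (PyTorch-style Python, not the authors' released code) shows one plausible reading: each modality gets its own encoder and a unimodal prediction head, pairwise concatenations feed bimodal heads, and a trimodal head produces the final sentiment score, with all seven predictions trained jointly. The feature dimensions, the concatenation-plus-MLP fusion, and the auxiliary loss weight are illustrative assumptions, not details taken from the paper.

# Minimal sketch of multi-task unimodal/bimodal/trimodal sentiment fusion.
# All sizes and the fusion scheme are assumptions for illustration only.
import torch
import torch.nn as nn


class HierarchicalFusionSketch(nn.Module):
    def __init__(self, dim_text=768, dim_audio=74, dim_vision=35, hidden=128):
        super().__init__()
        # Per-modality encoders project heterogeneous features to a shared size.
        self.enc_t = nn.Sequential(nn.Linear(dim_text, hidden), nn.ReLU())
        self.enc_a = nn.Sequential(nn.Linear(dim_audio, hidden), nn.ReLU())
        self.enc_v = nn.Sequential(nn.Linear(dim_vision, hidden), nn.ReLU())
        # Unimodal heads (auxiliary tasks).
        self.head_t = nn.Linear(hidden, 1)
        self.head_a = nn.Linear(hidden, 1)
        self.head_v = nn.Linear(hidden, 1)
        # Bimodal fusion heads (auxiliary tasks).
        def pair_head():
            return nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.head_ta, self.head_tv, self.head_av = pair_head(), pair_head(), pair_head()
        # Trimodal head: the main sentiment prediction.
        self.head_tav = nn.Sequential(nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, text, audio, vision):
        ht, ha, hv = self.enc_t(text), self.enc_a(audio), self.enc_v(vision)
        return {
            "t": self.head_t(ht), "a": self.head_a(ha), "v": self.head_v(hv),
            "ta": self.head_ta(torch.cat([ht, ha], dim=-1)),
            "tv": self.head_tv(torch.cat([ht, hv], dim=-1)),
            "av": self.head_av(torch.cat([ha, hv], dim=-1)),
            "m": self.head_tav(torch.cat([ht, ha, hv], dim=-1)),  # main task
        }


def multitask_loss(preds, label, aux_weight=0.3):
    # A single shared label per sample is assumed here; CH-SIMS additionally
    # provides unimodal labels, which could replace `label` in the aux terms.
    mse = nn.functional.mse_loss
    main = mse(preds["m"], label)
    aux = sum(mse(preds[k], label) for k in ("t", "a", "v", "ta", "tv", "av"))
    return main + aux_weight * aux

Hypothetical usage: with batched features t, a, v and sentiment scores y of shape (batch, 1), model = HierarchicalFusionSketch(); loss = multitask_loss(model(t, a, v), y); loss.backward(). The auxiliary unimodal and bimodal losses act as the extra tasks; only the trimodal output "m" would be used at inference time.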