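Modal verb
Computer science
Modality (human-computer interaction)
Sentiment analysis
Artificial intelligence
Natural language processing
Representation (politics)
Expression (computer science)
Feature learning
Modal analysis
Feature (linguistics)
Speech recognition
Pattern recognition (psychology)
Machine learning
Engineering
Linguistics
Finite element method
Politics
Political science
Polymer chemistry
Law
Chemistry
Philosophy
Structural engineering
Programming language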
Authors
Jianguo Bai,Haiqing Yang,Cheng Feng,Shuxian Wang,Xue Li
Abstract
With the rapid development of the Internet and multimedia technology, people increasingly express their feelings and views through video and other media. The key to sentiment analysis of user videos on social media is to fully exploit the embedded multimodal features, such as text, audio, and facial expressions, in order to build effective deep learning models. Traditional approaches, which simply fuse feature vectors or combine the predictions of multiple models, cannot effectively extract the intra-modal characteristics and inter-modal commonalities of multimodal data, which leads to unsatisfactory sentiment analysis accuracy. To address these issues, this article takes monologue videos posted by users on social media as its research object and proposes CMRL, a cross-modal sentiment analysis model based on modal representation learning. By establishing constraints for both the independent and fused modal modules, the fused modal module can fully account for the intrinsic characteristics of each modality. To enable the model to fully learn the intra-modal characteristics, a loss function based on the Pearson correlation coefficient is established between the sentiment analysis results that the independent modal module produces for the speech, text, and expression image modalities and the sentiment analysis results of the fused modal module. To prevent intra-modal features from being lost or confused after feature fusion, the speech, text, and expression image features extracted by the Transformer in the independent modal module are fused, and a loss function based on the Spearman correlation coefficient is established between these features and the fused features of the fused modal module.
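The abstract names two correlation coefficients but does not give the form of the resulting loss terms. As a rough illustration only, the sketch below computes the Pearson and Spearman coefficients in plain Python and turns each into a loss by assuming the form 1 - r, so that perfect positive correlation yields zero loss. The function names, the 1 - r form, and the sample values are assumptions made for illustration and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / norm


def average_ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: the Pearson coefficient of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_loss(corr):
    """Assumed loss form: 0 for perfect agreement, 2 for perfect disagreement."""
    return 1.0 - corr


# Hypothetical sentiment scores for one batch (illustrative values only).
text_scores = [0.9, -0.2, 0.4, -0.7]
fused_scores = [0.8, -0.1, 0.5, -0.6]
pearson_term = correlation_loss(pearson(text_scores, fused_scores))
spearman_term = correlation_loss(spearman(text_scores, fused_scores))
```

Hard ranking is piecewise constant, so a gradient-based training loop would need a differentiable surrogate for the rank step; the version here is only suitable for inspecting the quantities offline.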