Computer science
Expression (computer science)
Artificial intelligence
Emotion recognition
Speech recognition
Pattern recognition (psychology)
Natural language processing
Programming language
Authors
Zebang Cheng, Yuxiang Lin, Zhaoru Chen, Xiang Li, Shuyi Mao, Fan Zhang, Daijun Ding, Bowen Zhang, Xiaojiang Peng
Identifiers
DOI: 10.1145/3581783.3612840
Abstract
The Multimodal Emotion Recognition (MER 2023) challenge aims to recognize emotion from audio, language, and visual signals, facilitating innovative technologies in affective computing. This paper presents our submission to the Semi-Supervised Learning Sub-Challenge (MER-SEMI). First, using large-scale unlabeled emotional videos, we train both image-based and video-based Masked Autoencoders to extract visual features, which we term expression MAE (expMAE) for simplicity. The expMAE features are found to be largely complementary to the official baseline features. Second, since only a small amount of labeled data is available, we use a classifier to generate pseudo labels for those unlabeled videos that it predicts with high confidence for a certain category. In addition, we explore several advanced large models, such as CLIP, for feature extraction, and apply factorized bilinear pooling (FBP) for multimodal feature fusion. Our method achieved an F1 score of 88.55% on MER-SEMI, ranking second among all participating teams.
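Two components mentioned in the abstract, confidence-thresholded pseudo-labeling and factorized bilinear pooling, are standard techniques that can be sketched concisely. The PyTorch sketch below is a generic, hypothetical illustration, not the authors' released implementation: the factor rank, output dimension, and confidence threshold are placeholder values, and the module and function names are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedBilinearPooling(nn.Module):
    """Generic low-rank (MFB-style) factorized bilinear pooling for fusing
    two modality feature vectors; hyperparameters are illustrative only."""
    def __init__(self, dim_a, dim_b, factor_k=4, out_dim=512, dropout=0.1):
        super().__init__()
        self.factor_k = factor_k
        self.out_dim = out_dim
        self.proj_a = nn.Linear(dim_a, factor_k * out_dim)
        self.proj_b = nn.Linear(dim_b, factor_k * out_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, a, b):
        # Low-rank bilinear interaction: elementwise product of projections.
        joint = self.proj_a(a) * self.proj_b(b)            # (N, k * o)
        joint = self.dropout(joint)
        # Sum-pool over the rank-k factor dimension.
        joint = joint.view(-1, self.out_dim, self.factor_k).sum(dim=2)
        # Power normalization (signed sqrt) followed by L2 normalization.
        joint = torch.sign(joint) * torch.sqrt(joint.abs() + 1e-12)
        return F.normalize(joint, dim=1)

def select_pseudo_labels(logits, threshold=0.9):
    """Keep unlabeled samples whose max softmax probability exceeds a
    confidence threshold; 0.9 is a placeholder, not a reported setting."""
    probs = logits.softmax(dim=1)
    conf, labels = probs.max(dim=1)
    mask = conf >= threshold
    return labels[mask], mask

As a usage example, fusing a 768-dim visual vector v with a 512-dim audio vector a would be FactorizedBilinearPooling(768, 512)(v, a); the sum-pooling over the rank-k factors is what keeps the bilinear interaction low-rank and tractable.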