Audio–Visual Fusion for Emotion Recognition in the Valence–Arousal Space Using Joint Cross-Attention

Authors
R. Gnana Praveen, Patrick Cardinal, Éric Granger
Source
Journal: IEEE Transactions on Biometrics, Behavior, and Identity Science [Institute of Electrical and Electronics Engineers]
Volume/Issue: 5 (3): 360-373 · Citations: 22
Identifier
DOI: 10.1109/tbiom.2022.3233083
Abstract

Automatic emotion recognition (ER) has recently gained much interest due to its potential in many real-world applications. In this context, multimodal approaches have been shown to improve performance (over unimodal approaches) by combining diverse and complementary sources of information, providing some robustness to noisy and missing modalities. In this paper, we focus on dimensional ER based on the fusion of facial and vocal modalities extracted from videos, where complementary audio-visual (A-V) relationships are explored to predict an individual's emotional states in valence-arousal space. Most state-of-the-art fusion techniques rely on recurrent networks or conventional attention mechanisms that do not effectively leverage the complementary nature of A-V modalities. To address this problem, we introduce a joint cross-attentional model for A-V fusion that extracts the salient features across A-V modalities, allowing it to effectively leverage the inter-modal relationships while retaining the intra-modal relationships. In particular, it computes the cross-attention weights based on the correlation between the joint feature representation and that of the individual modalities. Deploying the joint A-V feature representation in the cross-attention module helps to simultaneously leverage both the intra- and inter-modal relationships, thereby significantly improving the performance of the system over the vanilla cross-attention module. The effectiveness of our proposed approach is validated experimentally on challenging videos from the RECOLA and AffWild2 datasets. Results indicate that our joint cross-attentional A-V fusion model provides a cost-effective solution that can outperform state-of-the-art approaches, even when the modalities are noisy or absent. Code is available at https://github.com/praveena2j/Joint-Cross-Attention-for-Audio-Visual-Fusion.
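The fusion scheme described in the abstract can be sketched as follows. This is a minimal NumPy illustration of the idea, not the authors' exact model: the random projection matrices, the tanh/softmax normalization, and the scaling factor are illustrative assumptions standing in for learned parameters; see the linked repository for the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def joint_cross_attention(x_a, x_v):
    """Sketch of joint cross-attentional A-V fusion.

    x_a: audio features,  shape (L, d)  -- L time steps, d-dim features
    x_v: visual features, shape (L, d)
    """
    L, d = x_a.shape
    # Joint representation: concatenate the two modalities feature-wise.
    j = np.concatenate([x_a, x_v], axis=1)            # (L, 2d)

    # Illustrative projections (learned weights in the real model).
    w_ja = rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)
    w_jv = rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)

    # Correlation between the joint representation and each modality,
    # so each modality's attention sees both intra- and inter-modal cues.
    c_a = np.tanh((j @ w_ja) @ x_a.T / np.sqrt(d))    # (L, L)
    c_v = np.tanh((j @ w_jv) @ x_v.T / np.sqrt(d))    # (L, L)

    def softmax(m):
        e = np.exp(m - m.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    # Attention weights over time re-weight each modality's features.
    x_a_att = softmax(c_a) @ x_a                      # attended audio  (L, d)
    x_v_att = softmax(c_v) @ x_v                      # attended visual (L, d)

    # Fused representation for downstream valence-arousal regression.
    return np.concatenate([x_a_att, x_v_att], axis=1)

fused = joint_cross_attention(rng.standard_normal((8, 16)),
                              rng.standard_normal((8, 16)))
print(fused.shape)  # (8, 32)
```

The key design point this sketch mirrors is that the attention weights for each modality are computed against the *joint* representation `j` rather than against the other modality alone, which is what distinguishes the approach from vanilla cross-attention.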