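Computer science
Artificial intelligence
Prior probability
Encoder
Facial expression
Audio-visual
Speech recognition
Computer vision
Pattern recognition (psychology)
Multimedia
Bayesian probability
Operating system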
Authors
Yuchen Pan, Yuanyuan Shang, Zhuhong Shao, Tie Liu, Guodong Guo, Hui Ding
Identifier
DOI: 10.1109/taffc.2023.3296318
Abstract
Automatic depression diagnosis is a challenging problem that requires integrating spatial-temporal information and extracting features from audio-visual signals. With respect to privacy protection, the trend toward recognition algorithms based on facial landmarks has created additional challenges. In this paper, we propose an audio-visual attention network (AVA-DepressNet) for depression recognition. It is a novel multimodal framework with facial privacy protection that uses attention-based modules to enhance audio-visual spatial and temporal features. In addition, an adversarial multistage (AMS) training strategy is developed to optimize the encoder-decoder structure, and prior knowledge of facial structure is incorporated into AMS training. AVA-DepressNet is evaluated on popular audio-visual depression datasets: AVEC 2013, AVEC 2014, and AVEC 2017. The results show that our approach achieves state-of-the-art or competitive performance for depression recognition.