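Computer science
Facial expression
Artificial intelligence
Expression (computer science)
Key (lock)
Pattern recognition (psychology)
Representation (politics)
Convolutional neural network
Task (project management)
Generality
Facial expression recognition
Modality (human-computer interaction)
Facial recognition system
Speech recognition
Politics
Economics
Management
Computer security
Programming language
Law
Psychotherapist
Political science
Psychology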
Authors
Xiaoye Qu, Zhikang Zou, Xinxing Su, Pan Zhou, Wei Wei, Shiping Wen, Dapeng Wu
Identifier
DOI:10.1109/tetci.2021.3070713
Abstract
Recognizing human expressions in videos is a challenging task due to dynamic changes in facial actions and diverse visual appearances. The key to designing a reliable video-based expression recognition system is to extract robust spatial features and make full use of temporal modality characteristics. In this paper, we present a novel network architecture called Cascaded Attention Network (CAN), a cascaded spatiotemporal model that incorporates both spatial and temporal attention and is tailored to video-level facial expression recognition. The underlying cascaded model consists of a transfer convolutional network and a Bidirectional Long Short-Term Memory (BiLSTM) network. Spatial attention is derived from facial landmarks, since facial expressions depend on the actions of key regions of the face (eyebrows, eyes, nose, and mouth). Focusing on these key regions helps to reduce the effect of person-specific attributes. Meanwhile, temporal attention is applied to automatically select the peak of an expression and aggregate the video-level representation. Our proposed CAN achieves state-of-the-art performance on the three most widely used facial expression datasets: CK+ (99.03%), Oulu-CASIA (88.33%), and MMI (83.55%). Moreover, we conduct an extended experiment on the much more complex in-the-wild dataset AFEW, and the experimental results further verify the generality of our attention mechanisms.
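The temporal attention step described in the abstract (weighting frames so that the expression peak dominates the video-level representation) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the scoring function, so a plain linear score followed by a softmax is assumed here, and the names `temporal_attention_pool`, `frame_features`, and `score_weights` are hypothetical. The per-frame vectors stand in for the BiLSTM outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features, score_weights):
    """Aggregate T per-frame vectors into one video-level vector.

    frame_features: list of T feature vectors, each of length D
                    (assumed stand-in for per-frame BiLSTM outputs).
    score_weights:  length-D vector for a linear scoring function
                    (an assumption; the abstract does not give the form).
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per frame.
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in frame_features]
    # Normalize scores so the attention weights sum to 1 across time.
    weights = softmax(scores)
    dim = len(frame_features[0])
    # Attention-weighted sum over frames, computed per feature dimension.
    pooled = [
        sum(a * f[d] for a, f in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy example: three frames whose activations grow toward the expression peak.
frames = [[0.1, 0.0], [0.9, 0.2], [2.0, 1.0]]
w = [1.0, 1.0]
video_vec, attn = temporal_attention_pool(frames, w)
```

In this toy input the last frame receives the largest attention weight, so the pooled vector lies closest to that frame's features, which mirrors the idea of letting the peak frame dominate the video-level representation.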