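Facial expression
Computer science
Facial expression recognition
Modal verb
Expression (computer science)
Emotion recognition
Speech recognition
Artificial intelligence
Facial recognition system
Computer vision
Pattern recognition (psychology)
Chemistry
Polymer chemistry
Programming language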
Authors
Haoliang Zhou, Shucheng Huang, Feifei Zhang, Changsheng Xu
Identifier
DOI:10.1109/tcsvt.2024.3424777
Abstract
Facial expression recognition (FER) remains a challenging task due to the ambiguity and subtlety of expressions. To address this challenge, current FER methods predominantly prioritize visual cues while inadvertently neglecting the potential insights that can be gleaned from other modalities. Recently, vision-language pre-training (VLP) models have integrated textual cues as guidance, culminating in a powerful multi-modal solution that has proven effective for a range of computer vision tasks. In this paper, we propose a Cross-Modal Emotion-Aware Prompting (CEPrompt) framework for FER based on VLP models. To make VLP models sensitive to expression-relevant visual discrepancies, we devise an Emotion Conception-guided Visual Adapter (EVA) to capture the category-specific appearance representations with emotion conception guidance. Moreover, knowledge distillation is employed to prevent the model from forgetting the pre-trained category-invariant knowledge. In addition, we design a Conception-Appearance Tuner (CAT) to facilitate the interaction of multi-modal information via cooperative tuning between emotion conception and appearance prompts. In this way, semantic information about emotion text conception is infused directly into facial appearance images, thereby enabling a more comprehensive and precise understanding of expression-related facial details. Quantitative and qualitative experiments show that our CEPrompt outperforms state-of-the-art approaches on three real-world FER datasets. The code is available at https://github.com/HaoliangZhou/CEPrompt.
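The abstract names two generic ingredients that can be sketched without reference to the authors' code: scoring an image feature against per-class text features, as VLP models commonly do, and a knowledge distillation term that keeps a tuned model close to a frozen pre-trained one. The following is a minimal pure-Python sketch of those generic ideas only, not the CEPrompt implementation. The function names, the logit scale of 100, the temperature of 2.0, and the temperature-softened KL form of the distillation loss are all assumptions made for illustration; the paper may define its losses differently.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(image_feat, text_feats, scale=100.0):
    """Class probabilities from image-to-text cosine similarities.

    `scale` is an assumed logit scale, not a value taken from the paper.
    """
    logits = [scale * cosine(image_feat, t) for t in text_feats]
    return softmax(logits)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic distillation term: KL(teacher || student) on softened outputs.

    This is an assumed, textbook form; the paper's exact loss is not
    specified in the abstract.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical toy features: one image vector, two class text vectors.
probs = classify([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
same = distillation_loss([2.0, 1.0, 0.5], [2.0, 1.0, 0.5])
diff = distillation_loss([0.5, 1.0, 2.0], [2.0, 1.0, 0.5])
```

In this toy setup the image feature matches the first text feature exactly, so the first class receives nearly all of the probability mass, and the distillation term is zero when student and teacher logits agree and positive when they differ.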