Computer Science
Emotion Detection
Artificial Intelligence
Machine Learning
Data Science
Emotion Recognition
Authors
Priyanka Thakur, Nirmal Kaur, Naveen Aggarwal, Sarbjeet Singh
Abstract
Emotion detection from face and speech is integral to human–computer interaction, mental health assessment, social robotics, and emotional intelligence. Traditional machine learning methods typically depend on handcrafted features and are primarily centred on unimodal systems. However, the unique characteristics of facial expressions and the variability of speech features make complex emotional states difficult to capture. Accordingly, deep learning models have been instrumental in automatically extracting intrinsic emotional features with greater accuracy across multiple modalities. This article presents a comprehensive review of recent progress in emotion detection, spanning unimodal to multimodal systems, with a focus on the facial and speech modalities. It examines state-of-the-art machine learning, deep learning, and the latest transformer-based approaches to emotion detection. The review provides an in-depth analysis of both unimodal and multimodal emotion detection techniques, highlighting their limitations, popular datasets, challenges, and the best-performing models. Such analysis helps researchers make a judicious selection of the most appropriate dataset and audio-visual emotion detection models. Key findings suggest that integrating multimodal data significantly improves emotion recognition, particularly when deep learning methods are trained on synchronised audio and video datasets. By assessing recent advancements and current challenges, this article serves as a fundamental resource for researchers and practitioners in the field of emotional AI, aiding the creation of more intuitive and empathetic technologies.