Computer science
Artificial intelligence
Fuzzy logic
Emotion recognition
Sensor fusion
Machine learning
Pattern recognition (psychology)
Speech recognition
Authors
Xiao Han,Fuyang Chen,Junrong Ban
Identifier
DOI:10.1109/tfuzz.2024.3373125
Abstract
Conducting and interacting with an orchestra is a multimodal process that integrates channels such as music, visual cues, posture, and gestures to convey artistic intent accurately. For robots, discerning human emotions from these channels can enhance human-machine interaction. Current gesture recognition systems for orchestral conducting focus mainly on rhythm, tempo, and dynamics, while the emotional factors conveyed in conducting remain underexplored. We introduce the Facial Expression and Orchestra Gesture Emotion (FEGE) dataset, covering eight emotion categories. This paper proposes a Fuzzy Multimodal Fusion Network (FMFN) based on fuzzy logic, which operates in multi-feature spaces and is designed for emotion recognition in bimodal tasks involving facial expressions and orchestra-conducting gestures. The network maps facial expressions and gestures into a multi-feature space through bimodal processing, learns modality-specific and shared representations, and decodes them using classifiers optimized jointly with the FMFN parameters. Finally, a fuzzy logic system handles uncertainty and fuzziness in the data, refining the classification decision process to improve the robustness and adaptability of emotion recognition across the two visual modalities. Experimental results on the FEGE dataset confirm the effectiveness of our network: the proposed bimodal fusion network achieves an accuracy of 89.16% in bimodal emotion recognition, approximately a 21% improvement over single-modal recognition results. The approach is also well suited to human-machine interaction systems, particularly orchestra-conducting training, where it can reinforce the critical emotional factors conveyed during conducting and thus deepen the expression of artistic intent.
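The abstract describes fusing two visual modalities (facial expressions and conducting gestures) and using a fuzzy logic system to handle uncertainty in the classification decision. The paper's actual FMFN architecture is not specified here, so the sketch below is only a minimal, hypothetical illustration of the general idea: per-modality class scores are weighted by fuzzy "high confidence" memberships before late fusion. All parameters (the eight emotion labels, the triangular membership bounds) are illustrative assumptions, not values from the paper.

```python
import math

# Hypothetical emotion labels; the FEGE dataset defines eight categories,
# but their exact names are not given in the abstract.
EMOTIONS = ["happy", "sad", "angry", "calm", "fear", "surprise", "disgust", "neutral"]

def softmax(logits):
    """Convert raw per-class scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def triangular_membership(x, a, b, c):
    """Triangular fuzzy membership: 0 at a, peak 1 at b, 0 again at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_fuse(face_logits, gesture_logits):
    """Late-fuse two modality score vectors with fuzzy confidence weights.

    Each modality's weight is its membership in a (hypothetical)
    'high confidence' fuzzy set over the top softmax probability.
    """
    p_face = softmax(face_logits)
    p_gest = softmax(gesture_logits)
    # Confidence of each modality = its top class probability.
    conf_face = max(p_face)
    conf_gest = max(p_gest)
    # Assumed membership bounds: chance level (1/8) up to near-certainty.
    w_face = triangular_membership(conf_face, 1 / 8, 0.9, 1.5)
    w_gest = triangular_membership(conf_gest, 1 / 8, 0.9, 1.5)
    total = w_face + w_gest
    if total == 0.0:  # both modalities at or below chance: equal weights
        w_face = w_gest = 0.5
        total = 1.0
    w_face, w_gest = w_face / total, w_gest / total
    fused = [w_face * f + w_gest * g for f, g in zip(p_face, p_gest)]
    return EMOTIONS[fused.index(max(fused))]

# Face channel is confident about class 0; gesture channel is nearly
# uninformative, so the fused decision follows the face channel.
face_scores = [3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
gesture_scores = [0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(fuzzy_fuse(face_scores, gesture_scores))  # → happy
```

This is deliberately a late-fusion sketch; the paper's FMFN instead learns shared and modality-specific representations in a multi-feature space before the fuzzy decision stage, which this toy example does not attempt to reproduce.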