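Computer science
Scalability
Stage (stratigraphy)
Fusion
Artificial intelligence
Emotion recognition
Net (polyhedron)
Speech recognition
Pattern recognition (psychology)
Database
Geometry
Mathematics
Linguistics
Biology
Philosophy
Paleontology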
Authors
Md. Milon Islam, Fakhri Karray, Ghulam Muhammad
Identifier
DOI:10.1016/j.inffus.2025.103028
Abstract
Automatic emotion recognition has attracted significant interest in healthcare, thanks to remarkable recent developments in smart and innovative technologies. A real-time emotion recognition system allows for continuous monitoring, comprehension, and enhancement of the physical entity’s capacities, along with ongoing advice for improving quality of life and well-being in the context of personalized healthcare. Multimodal emotion recognition presents a significant challenge in terms of efficiently using the diverse modalities present in the data. In this article, we introduce a Multi-Stage Fusion Network (MSF-Net) for emotion recognition that extracts multimodal information and achieves strong performance. We propose a transformer-based structure to extract deep features from facial expressions. We exploit two visual descriptors, local binary pattern and Oriented FAST and Rotated BRIEF, to retrieve computer vision-based features from the facial videos. A feature-level fusion network integrates the features extracted by these modules and passes the output to a triplet attention module. This module employs a three-branch architecture to compute attention weights that capture cross-dimensional interactions efficiently. The temporal dependencies in physiological signals are modeled by a Bi-directional Gated Recurrent Unit (Bi-GRU) in forward and backward directions at each time step. Lastly, the output feature representations from the triplet attention module and the high-level patterns extracted by the Bi-GRU are fused and fed into the classification module to recognize emotion. Extensive experimental evaluations revealed that the proposed MSF-Net outperformed state-of-the-art approaches on two popular datasets, BioVid Emo DB and MGEED. Finally, we tested the proposed MSF-Net in an Internet of Things environment to facilitate real-world, scalable smart healthcare applications.
• Introduce a multi-stage fusion network to recognize emotion in a multimodal context.
• Propose an efficient approach to extract visual features and temporal dependencies.
• Exploit triplet attention to capture key emotional features via three-branch fusion.
• Achieve state-of-the-art results on multimodal data and validate in IoT networks.
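The abstract names local binary pattern (LBP) as one of the two visual descriptors applied to the facial videos, but it does not state which LBP variant, radius, or neighbourhood size the authors use. As a rough illustration only, the sketch below computes the basic 3x3, 8-neighbour LBP code for each interior pixel of a small grayscale image and summarizes the codes as a normalized histogram. The function names `lbp_codes` and `lbp_histogram`, the clockwise bit ordering, and the `>=` comparison are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter

# Clockwise 8-neighbour offsets (row, col), starting at the top-left pixel.
# The bit order is an assumption; any fixed order gives a valid LBP code.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image):
    """Return basic 3x3 LBP codes for the interior pixels of a 2-D grayscale image.

    `image` is a list of equal-length rows of numbers. Border pixels are
    skipped because they lack a full 8-pixel neighbourhood.
    """
    height, width = len(image), len(image[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3")
    codes = []
    for r in range(1, height - 1):
        row_codes = []
        for c in range(1, width - 1):
            centre = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(image, bins=256):
    """Return the normalized histogram of LBP codes, usable as a texture feature vector."""
    counts = Counter(code for row in lbp_codes(image) for code in row)
    total = sum(counts.values())
    return [counts.get(i, 0) / total for i in range(bins)]


if __name__ == "__main__":
    patch = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    print(lbp_codes(patch))  # [[120]]
```

For the 3x3 patch above, only the right, bottom-right, bottom, and bottom-left neighbours are at least as bright as the centre value 5, so bits 3 through 6 are set and the code is 8 + 16 + 32 + 64 = 120. In a pipeline like the one the abstract describes, a histogram of this kind would be one hand-crafted feature vector among those passed to the feature-level fusion stage; how the authors actually combine it with the ORB and transformer features is not specified here.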