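Keywords
Computer science
Speech recognition
Arousal
Artificial intelligence
Predictive coding
Sensor fusion
Emotion recognition
Transformer
Valence
Fusion
Training set
Information integration
Hidden Markov model
Affective computing
Machine learning
Pattern recognition
Representation
Data integration
Feature learning
Artificial neural network
Coding
Computational model
Cognition
Audio signal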
Authors
A. Padmini,K. Sharmila
Identifier
DOI:10.1109/iciss67859.2026.11453644
Abstract
Music has a strong impact on human emotions, which are expressed through complex physiological reactions. Accurate real-time prediction of music-induced emotion remains challenging for three reasons: multimodal signals are heterogeneous, responses have temporal dependencies, and responses vary between individuals. This work proposes a coherent and interpretable framework for real-time emotion prediction that fuses physiological signals with music features. The proposed hybrid architecture combines Contrastive Predictive Coding (CPC) with a Temporal Fusion Transformer (TFT). CPC provides self-supervised representation learning of the physiological signals, and the TFT handles long-range temporal modeling. The multimodal signals (EEG, ECG, GSR, respiration, and music audio) are temporally synchronized and adaptively preprocessed. The model predicts continuous valence and arousal states using adaptive attention and online personalization. Experiments on the DEAP dataset yield mean absolute errors (MAE) of 0.084 for valence and 0.091 for arousal and an overall accuracy of 89.6%, outperforming state-of-the-art methods. These results confirm the effectiveness of hybrid self-supervised and transformer-based fusion for real-time affective computing applications.
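In its standard form, CPC is trained with the InfoNCE objective. A context vector c_t summarizes past samples. It must score the true future latent z_{t+k} higher than a set of negative latents, using the log-bilinear score z^T W_k c_t. The Python sketch below computes that loss for a single context and a single prediction step. It illustrates generic CPC, not the authors' implementation. The abstract does not specify the encoder, prediction horizon, or negative sampling. The function names bilinear_score and info_nce_loss, the matrix W, and the toy vectors are therefore hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical helper names: a generic CPC InfoNCE sketch, not the paper's code.
Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def bilinear_score(z: Vector, W: Matrix, c: Vector) -> float:
    """Log-bilinear CPC score z^T W c (W has shape len(z) x len(c))."""
    # Compute W c first, then take its dot product with z.
    wc = [sum(w_ij * c_j for w_ij, c_j in zip(row, c)) for row in W]
    return sum(z_i * wc_i for z_i, wc_i in zip(z, wc))


def info_nce_loss(context: Vector, positive: Vector,
                  negatives: List[Vector], W: Matrix) -> float:
    """InfoNCE loss for one context vector and one prediction step.

    `positive` is the true future latent; `negatives` are latents from
    other time steps or recordings (sampling scheme is an assumption).
    Returns -log(exp(s_pos) / sum_j exp(s_j)).
    """
    scores = [bilinear_score(positive, W, context)]
    scores += [bilinear_score(n, W, context) for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(scores)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - scores[0]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical W_k
    c_t = [1.0, 0.5]                      # toy context vector
    z_future = [0.9, 0.6]                 # toy true future latent
    z_negatives = [[-1.0, 0.2], [0.1, -0.8]]
    print(info_nce_loss(c_t, z_future, z_negatives, identity))
```

Subtracting the largest score before exponentiating keeps the log-sum-exp numerically stable. In full CPC, a separate W_k is learned for each prediction step k, and the loss is averaged over steps and over a batch of sequences. The encoder, the context network, and each W_k are trained jointly by minimizing it.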