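Keywords
Computer science
Perceptron
Representation (politics)
Estimation
Modality (human-computer interaction)
Machine learning
Artificial intelligence
Construct (Python library)
Vectorization (mathematics)
Interaction network
Object (grammar)
Pattern recognition (psychology)
Process (computing)
Mutual information
Independent component analysis
Interaction information
Multilayer perceptron
Artificial neural network
Multimodal interaction
Attention network
Interaction model
Affective computing
Mechanism (biology)
Component (thermodynamics)
Interpersonal interaction
Sensor fusion
Classifier (UML)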
Authors
Mingyue Niu, Zhuhong Shao, Yongjun He, Jianhua Tao, Björn W. Schuller
Identifier
DOI: 10.1109/tcsvt.2025.3612697
Abstract
Physiological studies have shown that differences between depressed and healthy individuals manifest in both the audio and video modalities. Hence, some researchers have combined local and global information from the audio or video modality to obtain a unimodal representation, and then used attention mechanisms or Multi-Layer Perceptrons (MLPs) to fuse the different representations. However, attention mechanisms and MLPs are essentially linear aggregation methods: they lack the ability to explore element-wise interactions between local and global representations within and across modalities, which limits the accuracy of depression severity estimation. To this end, we propose a Representation Interaction (RI) module, which uses mutual linear adjustment to achieve element-wise interaction between representations. The RI module can thus be seen as a mutual observation between two representations, which lets them complement each other and improves the model's ability to characterize depression cues. Furthermore, since the interaction process generates multiple representations, we propose a Multi-representation Prediction (MP) module. This module vectorizes the multiple representations hierarchically, from summarizing each single representation to aggregating multiple representations, and adopts an attention mechanism to estimate an individual's depression severity. Using the RI and MP modules, we construct the Multimodal Local Global Interaction (MLGI) network. Experimental results on the AVEC 2013 and AVEC 2014 depression datasets demonstrate the effectiveness of our method.
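The abstract does not give the equations of the RI or MP modules, so the following is only a minimal, hypothetical sketch in plain Python of how the described pipeline could fit together. It assumes that "mutual linear adjustment" means FiLM-style feature-wise modulation, where each representation produces an element-wise scale and shift for the other, and that MP consists of a per-representation tanh projection, softmax attention against a learned query vector, and a linear regression head. All function and parameter names (`representation_interaction`, `multi_representation_prediction`, `gamma_from_b`, `query`, and so on) are illustrative inventions, the weights are random and untrained, and this is not the authors' MLGI implementation.

```python
import math
import random

# Hypothetical sketch: vectors are plain Python lists of floats, and every
# "layer" is an untrained affine map stored as (weights, bias).


def affine(weights, bias, x):
    """Return W @ x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_affine(in_dim, out_dim, rng, scale=0.1):
    """Random Gaussian weights and zero bias (illustrative initialisation)."""
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def adjust(target, source, gamma_layer, beta_layer):
    """Assumed FiLM-style adjustment: (1 + gamma(source)) * target + beta(source),
    applied element by element."""
    gamma = affine(*gamma_layer, source)
    beta = affine(*beta_layer, source)
    return [(1.0 + g) * t + b for g, t, b in zip(gamma, target, beta)]


def representation_interaction(a, b, p):
    """Hypothetical RI step: b adjusts a, and a adjusts b (mutual adjustment)."""
    a_out = adjust(a, b, p["gamma_from_b"], p["beta_from_b"])
    b_out = adjust(b, a, p["gamma_from_a"], p["beta_from_a"])
    return a_out, b_out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_representation_prediction(reps, p):
    """Hypothetical MP step: summarise each representation, then attend over
    the summaries and regress one severity score."""
    # Level 1: summarise each representation on its own.
    summaries = [[math.tanh(v) for v in affine(*p["summary"], r)] for r in reps]
    # Level 2: score each summary against a learned query vector.
    query = p["query"]
    weights = softmax([sum(q * h_i for q, h_i in zip(query, h)) for h in summaries])
    # Attention-weighted sum of the summaries.
    pooled = [sum(w * h[i] for w, h in zip(weights, summaries)) for i in range(len(query))]
    # Linear regression head producing a single scalar.
    score = affine(*p["head"], pooled)[0]
    return score, weights


# Usage with random stand-ins for local/global audio and video features.
rng = random.Random(0)
D = 4
ri_params = {
    k: init_affine(D, D, rng)
    for k in ("gamma_from_b", "beta_from_b", "gamma_from_a", "beta_from_a")
}
mp_params = {
    "summary": init_affine(D, 3, rng),
    "query": [rng.gauss(0.0, 1.0) for _ in range(3)],
    "head": init_affine(3, 1, rng),
}
audio_local, audio_global, video_local, video_global = (
    [rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(4)
)
# Intra-modal (audio, video) and cross-modal (audio-video) pairs.
pairs = [(audio_local, audio_global), (video_local, video_global), (audio_global, video_global)]
reps = []
for x, y in pairs:
    reps.extend(representation_interaction(x, y, ri_params))
score, attn = multi_representation_prediction(reps, mp_params)
print(f"score={score:.3f}", [round(w, 3) for w in attn])
```

The `(1 + gamma)` form is a design choice of this sketch, not something stated in the abstract: when the modulation layers output zeros, each representation passes through unchanged, so the interaction only adds to the original features. The element-wise product is what distinguishes this from a purely linear weighted sum of representations. Sharing one set of RI parameters across all three pairs is a further simplification.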