Multimodal Local Global Interaction Networks for Automatic Depression Severity Estimation

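Computer science; Perceptron; Representation (politics); Estimation; Modality (human-computer interaction); Machine learning; Artificial intelligence; Construct (python library); Vectorization (mathematics); Interaction network; Object (grammar); Pattern recognition (psychology); Process (computing); Mutual information; Independent component analysis; Interaction information; Multilayer perceptron; Artificial neural network; Multimodal interaction; Attention network; Interaction model; Affective computing; Mechanism (biology); Component (thermodynamics); Interpersonal interaction; Sensor fusion; Classifier (UML)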

Authors

Mingyue Niu, Zhuhong Shao, Yongjun He, Jianhua Tao, Björn W. Schuller

Source

Journal: IEEE Transactions on Circuits and Systems for Video Technology [Institute of Electrical and Electronics Engineers]
Date: 2025-09-22 Volume (Issue): 36 (2): 2649-2664 Citations: 1

Identifier

DOI: 10.1109/tcsvt.2025.3612697

Abstract

Physiological studies have shown that differences between depressed and healthy individuals are manifested in the audio and video modalities. Hence, some researchers have combined local and global information from the audio or video modality to obtain a unimodal representation. Attention mechanisms or Multi-Layer Perceptrons (MLPs) are then used to fuse the different representations. However, attention mechanisms and MLPs are essentially linear aggregation schemes and lack the ability to explore the element-wise interaction between local and global representations within and across modalities, which affects the accuracy of depression severity estimation. To this end, we propose a Representation Interaction (RI) module, which uses mutual linear adjustment to achieve element-wise interaction between representations. The RI module can thus be seen as a mutual observation of two representations, which helps them complement each other and improves the model's ability to characterize depression cues. Furthermore, since the interaction process generates multiple representations, we propose a Multi-representation Prediction (MP) module. This module vectorizes the multiple representations hierarchically, from summarizing a single representation to aggregating multiple representations, and adopts an attention mechanism to estimate an individual's depression severity. In this way, we use the RI and MP modules to construct the Multimodal Local Global Interaction (MLGI) network. Experimental results on the AVEC 2013 and AVEC 2014 depression datasets demonstrate the effectiveness of our method.
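The abstract does not give the equations of the RI or MP modules, so the following is only a loose illustrative sketch of the contrast it draws between element-wise interaction and attention-style aggregation. It assumes that "mutual linear adjustment" means each representation produces an element-wise scale and shift for the other, which is one possible reading and not necessarily the authors' formulation. All names (`mutual_adjust`, `attention_pool`, `make_linear`), the dimensions, and the random weights are hypothetical.

```python
import math
import random


def linear(x, weight, bias):
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def make_linear(dim, rng):
    """Hypothetical dim x dim layer with random weights (stand-in for learned ones)."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
    return weight, [0.0] * dim


def mutual_adjust(local, global_, params):
    """ASSUMED form of mutual linear adjustment: each vector yields an
    element-wise scale and shift that is applied to the other vector."""
    scale_l = linear(global_, *params["g2l_scale"])
    shift_l = linear(global_, *params["g2l_shift"])
    scale_g = linear(local, *params["l2g_scale"])
    shift_g = linear(local, *params["l2g_shift"])
    new_local = [(1.0 + s) * x + t for s, x, t in zip(scale_l, local, shift_l)]
    new_global = [(1.0 + s) * x + t for s, x, t in zip(scale_g, global_, shift_g)]
    return new_local, new_global


def attention_pool(vectors, query):
    """Softmax-weighted sum of vectors; scores are dot products with a query."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * vec[i] for w, vec in zip(weights, vectors))
              for i in range(len(vectors[0]))]
    return pooled, weights


rng = random.Random(0)
dim = 4
names = ("g2l_scale", "g2l_shift", "l2g_scale", "l2g_shift")
params = {name: make_linear(dim, rng) for name in names}

# Toy stand-ins for a local and a global representation of one modality.
local = [rng.gauss(0, 1) for _ in range(dim)]
global_ = [rng.gauss(0, 1) for _ in range(dim)]

new_local, new_global = mutual_adjust(local, global_, params)

# Aggregate the original and adjusted representations with attention weights.
query = [rng.gauss(0, 1) for _ in range(dim)]
pooled, weights = attention_pool([local, global_, new_local, new_global], query)
```

In this sketch the adjustment multiplies each element of one representation by a quantity derived from the other, whereas `attention_pool` only forms a weighted sum of whole vectors. With all-zero weights the adjustment reduces to the identity, so the interaction acts as a correction on top of the original representation.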
