Computer science
Task (project management)
Mutual information
Joint (building)
Maximization
Artificial intelligence
Multi-task learning
Machine learning
Pattern recognition (psychology)
Human–computer interaction
Psychology
Engineering
Architectural engineering
Social psychology
Systems engineering
Authors
Yanjie Liu, Yulan Liu, Xinran Ma, Rui Wang, Qin Yang, Yijun Mo, Salman A. AlQahtani, Min Chen
Identifier
DOI: 10.1109/tbme.2025.3582806
Abstract
Depression is a serious mental health disorder, characterized by persistent sadness and hopelessness, that poses a potential hazard to individuals and society. Multimodal information, including vision, audio, and text, is critical for depression diagnosis and treatment. Most studies focus on designing sophisticated feature extraction methods but neglect feature enhancement and fusion within and across modalities. In this paper, a Chinese Multimodal Depression Corpus (CMD-Corpus) dataset is established with the assistance of clinical experts to support further depression research. Furthermore, we propose a multimodal depression recognition framework based on Mutual Information Maximization with Multi-task Learning (MIMML) to enhance feature representation and fusion among the video, audio, and text modalities. MIMML employs a mutual-information-maximization strategy to strengthen modality-invariant representations, while multi-task learning improves the representation of each single modality and thereby enhances modality-specific features. Meanwhile, a gated structure combining bidirectional gated recurrent units and convolutional neural networks is designed to achieve multimodal feature fusion, which is key to boosting completeness among modalities. Experimental results show that the proposed MIMML effectively captures representations that increase depression recognition accuracy, achieving 84% and 89% accuracy on the DAIC-WOZ and our self-collected CMD-Corpus datasets, respectively.
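The gated fusion idea in the abstract can be sketched in a few lines. This is a minimal NumPy illustration of modality gating under stated assumptions, not the authors' MIMML implementation: the BiGRU/CNN encoders are replaced by precomputed placeholder feature vectors, the gate parameters are random rather than learned, and all names (`gated_fusion`, `W`, `b`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # per-modality feature dimension (placeholder value)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(video, audio, text, W, b):
    """Fuse three modality feature vectors with learned scalar gates.

    Each gate is computed from the concatenation of all three features,
    so complementary modalities can be up- or down-weighted depending
    on the full multimodal context.
    """
    z = np.concatenate([video, audio, text])   # shape (3d,)
    gates = sigmoid(W @ z + b)                 # shape (3,), one gate per modality
    fused = gates[0] * video + gates[1] * audio + gates[2] * text
    return fused, gates

# Placeholder encoder outputs standing in for BiGRU/CNN features.
video = rng.standard_normal(d)
audio = rng.standard_normal(d)
text = rng.standard_normal(d)

# Randomly initialized gate parameters (learned jointly in practice).
W = rng.standard_normal((3, 3 * d)) * 0.1
b = np.zeros(3)

fused, gates = gated_fusion(video, audio, text, W, b)
print(fused.shape, gates)
```

In a full system the fused vector would feed a depression classifier, and the gate parameters would be trained end to end alongside the modality encoders and the mutual-information objective.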