Computer science
Depression (economics)
Psychology
Artificial intelligence
Speech recognition
Economics
Macroeconomics
Authors
Yifan Kou, Fangzhen Ge, Debao Chen, Longfeng Shen, Huaiyu Liu
Abstract
Depression, a prevalent mental disorder in modern society, significantly impacts people's daily lives. Recent work has advanced automated diagnosis models for detecting depression, but data scarcity, primarily due to privacy concerns, remains a challenge. Traditional speech features are limited in how well they represent knowledge for depression diagnosis, and the complexity of deep learning algorithms demands substantial data. Furthermore, existing multimodal methods based on neural networks overlook the heterogeneity gap between modalities, which can introduce redundant information. To address these issues, we propose a multimodal depression detection model based on the Enhanced Cross‐Attention (ECA) mechanism. The model explores text‐speech interactions while accounting for modality heterogeneity. Data scarcity is mitigated by fine‐tuning pre‐trained models. We also design a modal fusion module based on ECA that emphasizes similarity responses and updates the weight of each modal feature according to the similarity between modal features. For speech feature extraction, we reduce the computational complexity of the model by integrating a multi‐window self‐attention mechanism with the Fourier transform. The proposed model is evaluated on the public DAIC‐WOZ dataset, achieving an accuracy of 80.0% and an average F1 improvement of 4.3% over related methods.
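The abstract does not give the equations behind the ECA fusion module, so the following is only a minimal sketch of the general idea it describes: cross-attention between text and speech features, followed by a similarity-based reweighting of the two modalities. The function names (`cross_attention`, `fuse`), the mean pooling, and the sigmoid gate on cosine similarity are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context):
    # Scaled dot-product attention: each query row attends over all context rows.
    # query: (n_q, d), context: (n_c, d) -> output: (n_q, d)
    d = query.shape[-1]
    scores = query @ context.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context

def fuse(text, speech):
    # Hypothetical fusion step (an assumption, not the paper's ECA module).
    # 1. Let each modality attend to the other.
    text_att = cross_attention(text, speech)    # (n_t, d)
    speech_att = cross_attention(speech, text)  # (n_s, d)
    # 2. Mean-pool each attended sequence to a single vector.
    t = text_att.mean(axis=0)
    s = speech_att.mean(axis=0)
    # 3. Turn the cosine similarity of the pooled vectors into a gate in (0, 1)
    #    and use it to reweight the two modalities before concatenation.
    sim = float(t @ s / (np.linalg.norm(t) * np.linalg.norm(s) + 1e-8))
    gate = 1.0 / (1.0 + np.exp(-sim))
    fused = np.concatenate([gate * t, (1.0 - gate) * s])
    return fused, gate

# Toy inputs: 5 text tokens and 7 speech frames, both with 8-dimensional features.
rng = np.random.default_rng(0)
text = rng.normal(size=(5, 8))
speech = rng.normal(size=(7, 8))
fused, gate = fuse(text, speech)
print(fused.shape)
```

In a trained model the queries, keys, and values would pass through learned projections and the fused vector would feed a classifier; the sketch omits both to keep the reweighting step visible.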