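Computer science
Transformer
Artificial intelligence
Deep learning
Natural language processing
Machine learning
Engineering
Electrical engineering
Voltage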
Authors
Yuheng Fan, Zhengkai Zhou, Jialing Zhao, Jing Kong, Yunxi Liu, Jiaqi Li
Identifier
DOI:10.1109/mlise66443.2025.11100256
Abstract
This study proposes a novel multimodal deep learning framework for depression detection, integrating visual, audio, and textual data. Using OpenFace and Librosa for feature extraction, the system employs Vision Transformers (ViTs) to model cross-modal dependencies, while a large language model (Qwen3-32B) analyzes transcribed speech for interpretable symptom reasoning. The outputs of the ViT branch and the LLM branch are fused via a confidence-based mechanism. Experimental validation on the LMVD dataset demonstrates superior performance, overcoming the limitations of unimodal approaches.
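The abstract does not say how the confidence-based fusion is computed. As a rough illustration only, the sketch below assumes each branch emits a probability of depression and weights it by its distance from the undecided point 0.5. The function names `confidence` and `fuse`, the weighting rule, and the fallback value are all assumptions made for this example, not the authors' method.

```python
def confidence(p: float) -> float:
    """Assumed confidence measure: distance from 0.5, scaled to [0, 1]."""
    return abs(p - 0.5) * 2.0


def fuse(p_vit: float, p_llm: float) -> float:
    """Confidence-weighted average of two branch probabilities.

    p_vit: hypothetical probability from the ViT (visual/audio) branch.
    p_llm: hypothetical probability from the LLM (text) branch.
    """
    for p in (p_vit, p_llm):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")

    w_vit = confidence(p_vit)
    w_llm = confidence(p_llm)
    total = w_vit + w_llm

    # Both weights are zero only when both inputs equal 0.5,
    # so the fused estimate stays undecided as well.
    if total == 0.0:
        return 0.5

    return (w_vit * p_vit + w_llm * p_llm) / total


if __name__ == "__main__":
    # A confident ViT branch dominates an undecided LLM branch.
    print(fuse(0.9, 0.5))
    # Equally confident but opposite predictions cancel out.
    print(fuse(0.9, 0.1))
```

Under this assumed rule, a branch that is close to 0.5 contributes almost nothing, so the more decisive branch drives the final score. Other choices (entropy-based weights, learned gating) would fit the same description in the abstract equally well.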