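Spectrogram
Deep learning
Artificial intelligence
Transfer of learning
Convolutional neural network
Computer science
Depression (economics)
Mental health
Speech recognition
Segmentation
Psychology
Machine learning
Pattern recognition (psychology)
Feature extraction
Artificial neural network
Anxiety
Supervised learning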
Authors
Muhammad Hamza Khan, Muhammad Majid, Aamir Arsalan, Marius George Linguraru, Syed Muhammad Anwar
Identifier
DOI:10.1109/embc58623.2025.11253416
Abstract
Depression is one of the most prevalent mental health disorders, significantly impacting an individual's well-being and daily functioning. Early detection is critical for timely intervention, yet traditional diagnostic methods remain subjective and resource-intensive. This paper explores a speech-based approach to depression detection using the multi-modal open dataset for mental-disorder analysis (MODMA), which contains speech recordings from 52 subjects: 23 depressed individuals and 29 healthy controls. The dataset was preprocessed by segmenting the speech signals into non-overlapping 1-second audio segments, yielding a total of 26,590 samples. Spectrograms of size 224 × 224 were generated using the short-time Fourier transform to capture time-frequency representations suitable as input to deep learning models. Three pre-trained convolutional neural networks (ResNet-50, VGG-19, and EfficientNet-B0) were fine-tuned and evaluated on their ability to classify depressive states. Among these models, EfficientNet-B0 achieved the highest classification accuracy, 95.68%, demonstrating the effectiveness of transfer learning for speech-based depression detection. The results highlight the potential of deep learning for developing objective, scalable, and non-invasive diagnostic tools for mental health assessment. This research contributes to advancing automated depression detection and highlights the feasibility of using a 1-second speech signal segment as a reliable biomarker for mental health disorders.
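The preprocessing described in the abstract (non-overlapping 1-second segments, then a short-time Fourier transform rendered as a 224 × 224 image) can be illustrated with a minimal NumPy sketch. This is not the authors' code. The 16 kHz sample rate, the Hann window, the 512-sample FFT size, the 128-sample hop, the log scaling, and the nearest-neighbour resizing are all assumptions chosen for illustration, since the abstract does not state them. The function names are hypothetical.

```python
import numpy as np


def split_into_segments(signal, sample_rate, seconds=1.0):
    """Split a 1-D signal into non-overlapping segments; the trailing remainder is dropped."""
    seg_len = int(round(sample_rate * seconds))
    n_segments = len(signal) // seg_len
    return signal[: n_segments * seg_len].reshape(n_segments, seg_len)


def stft_magnitude(segment, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window (assumed settings); shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(segment) - n_fft) // hop
    frames = np.stack(
        [segment[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


def to_image(spec, size=224):
    """Log-scale, normalise to [0, 1], and resample to size x size by nearest-neighbour indexing."""
    log_spec = np.log1p(spec)
    span = log_spec.max() - log_spec.min()
    norm = (log_spec - log_spec.min()) / (span if span > 0 else 1.0)
    rows = np.linspace(0, norm.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, norm.shape[1] - 1, size).round().astype(int)
    return norm[np.ix_(rows, cols)]


if __name__ == "__main__":
    sample_rate = 16000  # assumed; not given in the abstract
    t = np.arange(int(3.5 * sample_rate)) / sample_rate
    audio = np.sin(2 * np.pi * 220.0 * t)  # synthetic stand-in for a speech recording

    segments = split_into_segments(audio, sample_rate)
    images = np.stack([to_image(stft_magnitude(seg)) for seg in segments])
    print(segments.shape, images.shape)
```

Each resulting 224 × 224 array would still need to be replicated or colour-mapped to three channels before being passed to an ImageNet-pretrained network such as those named in the abstract. How the authors handled that step is not stated.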