Keywords
Mental health, Set (abstract data type), Computer science, Task (project management), Diversity (cybernetics), Artificial intelligence, Psychology, Psychiatry, Engineering, Systems engineering, Programming language
Authors
Xuhai Xu, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James Hendler, Marzyeh Ghassemi, Anind K. Dey, Dakuo Wang
Source
Journal: Cornell University - arXiv
Date: 2023-01-01
Citations: 33
Identifier
DOI:10.48550/arxiv.2307.14385
Abstract
Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present the first comprehensive evaluation of multiple LLMs, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4, on various mental health prediction tasks via online text data. We conduct a broad range of experiments, covering zero-shot prompting, few-shot prompting, and instruction fine-tuning. The results indicate a promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for the mental health tasks. More importantly, our experiments show that instruction fine-tuning can significantly boost the performance of LLMs for all tasks simultaneously. Our best fine-tuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 (25 and 15 times bigger) by 10.9% on balanced accuracy and the best of GPT-4 (250 and 150 times bigger) by 4.8%. They further perform on par with the state-of-the-art task-specific language model. We also conduct an exploratory case study on LLMs' capability on mental health reasoning tasks, illustrating the promising capability of certain models such as GPT-4. We summarize our findings into a set of action guidelines for potential methods to enhance LLMs' capability for mental health tasks. Meanwhile, we also emphasize important limitations that must be addressed before deployment in real-world mental health settings, such as known racial and gender bias. We highlight the important ethical risks accompanying this line of research.
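To make the zero-shot vs. few-shot prompt designs mentioned in the abstract concrete, here is a minimal illustrative sketch. The template wording, the `build_prompt` helper, and the depression-detection framing are assumptions for illustration only, not the paper's actual prompt templates.

```python
# Hypothetical sketch of zero-shot vs. few-shot prompting for a mental
# health prediction task over online text. The instruction wording and
# labels are assumed; they are not taken from the paper.

def build_prompt(post: str, examples=None) -> str:
    """Build a classification prompt for an LLM.

    post     -- the online text to classify
    examples -- optional list of (text, label) pairs; when provided the
                prompt is few-shot, otherwise zero-shot
    """
    instruction = (
        "Decide whether the author of the following post shows signs of "
        "depression. Answer with 'yes' or 'no'.\n\n"
    )
    shots = ""
    if examples:
        # Few-shot: prepend labeled demonstrations before the query.
        for text, label in examples:
            shots += f"Post: {text}\nAnswer: {label}\n\n"
    return instruction + shots + f"Post: {post}\nAnswer:"

zero_shot = build_prompt("I can't sleep and nothing feels worth doing.")
few_shot = build_prompt(
    "I can't sleep and nothing feels worth doing.",
    examples=[("Had a great day hiking with friends!", "no")],
)
```

The resulting string would be sent to any of the evaluated models (e.g. GPT-3.5 or FLAN-T5); instruction fine-tuning, by contrast, updates the model's weights on many such instruction-response pairs rather than packing demonstrations into the prompt.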