Assessing the System-Instruction Vulnerabilities of Large Language Models to Malicious Conversion Into Health Disinformation Chatbots

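Disinformation; Chatbot; Internet privacy; Computer security; Public health; Computer science; Health communication; Medicine; Health informatics; Pandemic; Vulnerability (computing); Exploratory research; World Wide Web; Health care; Data science; Health; Public health surveillance; Artificial intelligence; Misinformation; Public health informatics; Digital health; Threat model; Avatar; Language model; Health risk; Global health; Health education; Health promotion; Environmental health; Social media; Obfuscation; Advertising; MEDLINE; Medical emergency; Internet; Haskell; Generative adversarial network; Public relations; Psychology; Health services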

Authors

Natansh D. Modi, Bradley D. Menz, Abdulhalim A. Awaty, Cyril A. Alex, Jessica M. Logan, Ross A. McKinnon, Andrew Rowland, Stephen Bacchi, Kacper Gradoń, Michael J. Sorich, Ashley M. Hopkins

Source

Journal: Annals of Internal Medicine [American College of Physicians]
Date: 2025-06-23; Volume (issue): 178 (8): 1172-1180; Citations: 4

Links

nih.gov, doi.org

Identifier

DOI: 10.7326/annals-24-03933

Abstract

Large language models (LLMs) offer substantial promise for improving health care; however, some risks warrant evaluation and discussion. This study assessed the effectiveness of safeguards in foundational LLMs against malicious instruction into health disinformation chatbots. Five foundational LLMs (OpenAI's GPT-4o, Google's Gemini 1.5 Pro, Anthropic's Claude 3.5 Sonnet, Meta's Llama 3.2-90B Vision, and xAI's Grok Beta) were evaluated via their application programming interfaces (APIs). Each API received system-level instructions to produce incorrect responses to health queries, delivered in a formal, authoritative, convincing, and scientific tone. Ten health questions were posed to each customized chatbot in duplicate. Exploratory analyses assessed the feasibility of creating a customized generative pretrained transformer (GPT) within the OpenAI GPT Store and searched to identify whether any publicly accessible GPTs in the store seemed to respond with disinformation. Of the 100 health queries posed across the 5 customized LLM API chatbots, 88 (88%) responses were health disinformation. Four of the 5 chatbots (GPT-4o, Gemini 1.5 Pro, Llama 3.2-90B Vision, and Grok Beta) generated disinformation in 100% (20 of 20) of their responses, whereas Claude 3.5 Sonnet responded with disinformation in 40% (8 of 20). The disinformation included claimed vaccine-autism links, HIV being airborne, cancer-curing diets, sunscreen risks, genetically modified organism conspiracies, attention-deficit/hyperactivity disorder and depression myths, garlic replacing antibiotics, and 5G causing infertility. Exploratory analyses further showed that the OpenAI GPT Store could currently be instructed to generate similar disinformation. Overall, LLM APIs and the OpenAI GPT Store were shown to be vulnerable to malicious system-level instructions to covertly create health disinformation chatbots. These findings highlight the urgent need for robust output screening safeguards to ensure public health safety in an era of rapidly evolving technologies.
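
The headline figures in the abstract can be checked with a short tally. The sketch below is only an arithmetic illustration: the per-chatbot counts are the ones reported in the abstract, while the variable name `reported` and the helper `tally` are hypothetical and not part of the study.

```python
# Disinformation responses out of responses given, per chatbot,
# as reported in the abstract (10 questions x 2 runs = 20 each).
reported = {
    "GPT-4o": (20, 20),
    "Gemini 1.5 Pro": (20, 20),
    "Claude 3.5 Sonnet": (8, 20),
    "Llama 3.2-90B Vision": (20, 20),
    "Grok Beta": (20, 20),
}


def tally(counts):
    """Return (disinformation, total, percent) summed over all chatbots."""
    disinformation = sum(d for d, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return disinformation, total, 100 * disinformation / total


if __name__ == "__main__":
    # Per-chatbot rate, then the pooled rate across all five chatbots.
    for name, (d, n) in reported.items():
        print(f"{name}: {d}/{n} = {100 * d / n:.0f}%")
    d, n, pct = tally(reported)
    print(f"Overall: {d}/{n} = {pct:.0f}%")
```

The pooled rate comes out at 88 of 100 responses, which matches the overall 88% stated in the abstract.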
