Outliers
Computer Science
Artificial Intelligence
Statistical Models
Anomaly Detection
Natural Language Processing
Machine Learning
Data Mining
Medicine
Authors
Andrea Taloni, Giulia Coco, Marco Pellegrini, Matthias Wjst, Niccolò Salgari, Giovanna Carnovale-Scalzo, Vincenzo Scorcia, Massimo Busin, Giuseppe Giannaccare
Identifier
DOI: 10.1001/jamaophthalmol.2025.0834
Abstract
Importance: Recently, it was demonstrated that the large language model Generative Pre-trained Transformer 4 (GPT-4; OpenAI) can fabricate synthetic medical datasets designed to support false scientific evidence.

Objective: To uncover statistical patterns that may suggest fabrication in datasets produced by large language models, and to improve these synthetic datasets by attempting to remove detectable marks of nonauthenticity, thereby probing the limits of generative artificial intelligence.

Design, Setting, and Participants: In this quality improvement study, synthetic datasets were produced for 3 fictional clinical studies designed to compare the outcomes of 2 alternative treatments for specific ocular diseases. Synthetic datasets were produced using the default GPT-4o model and a custom GPT. Data fabrication was conducted in November 2024.

Exposure: Prompts were submitted to GPT-4o to produce 12 "unrefined" datasets, which underwent forensic examination. Based on the outcomes of this analysis, the custom GPT Synthetic Data Creator was built with detailed instructions to generate 12 "refined" datasets designed to evade authenticity checks. Forensic analysis was then repeated on these enhanced datasets.

Main Outcomes and Measures: Forensic analysis was performed to identify statistical anomalies in demographic data, distribution uniformity, and repetitive patterns of last digits, as well as linear correlations, distribution shape, and outliers of study variables. Datasets were also qualitatively assessed for the presence of unrealistic clinical records.

Results: Forensic analysis identified 103 fabrication marks among 304 tests (33.9%) in unrefined datasets. Notable flaws included mismatches between patient names and gender (n = 12), baseline visits occurring during weekends (n = 12), age calculation errors (n = 9), lack of uniformity (n = 4), and repetitive numerical patterns in last digits (n = 7). Very weak correlations (r < 0.1) were observed between study variables (n = 12). In addition, variables showed a suspicious distribution shape (n = 6). Compared with unrefined datasets, refined ones showed a 29.3-percentage-point reduction (95% CI, 23.5%-35.1%) in signs of fabrication (14 of 304 statistical tests performed [4.6%]). Four refined datasets passed forensic analysis as authentic; however, suspicious distribution shapes or other issues were found in the others.

Conclusions and Relevance: Sufficiently sophisticated custom GPTs can perform complex statistical tasks and may be abused to fabricate synthetic datasets that can pass forensic analysis as authentic.
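One of the forensic checks described in the abstract, testing whether the last digits of reported values are uniformly distributed, can be sketched with a simple chi-square statistic. This is a hypothetical illustration of the general technique, not the authors' actual analysis code; the function names, the example data, and the choice of a plain chi-square test are assumptions for demonstration.

```python
from collections import Counter

def last_digit_counts(values):
    """Count occurrences of each last digit (0-9) in integer-valued data."""
    counts = Counter(abs(int(v)) % 10 for v in values)
    return [counts.get(d, 0) for d in range(10)]

def chi_square_uniform(counts):
    """Chi-square statistic against a uniform distribution over the 10 digits."""
    n = sum(counts)
    expected = n / 10
    return sum((c - expected) ** 2 / expected for c in counts)

# Fabricated data often over-uses "round" last digits such as 0 and 5
# (hypothetical example values, not from the study):
suspicious = [150, 245, 330, 415, 500, 625, 710, 805, 930, 1045] * 10
stat = chi_square_uniform(last_digit_counts(suspicious))
# With 9 degrees of freedom, the 5% critical value is about 16.92;
# a statistic far above it flags a repetitive last-digit pattern.
print(stat > 16.92)  # → True
```

In practice such a check would be one test among many, applied per variable and combined with the demographic-consistency and distribution-shape checks the study describes.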