Fallacy
Cognitive bias
Hindsight bias
Confirmation bias
Cognition
Cognitive psychology
Medicine
Psychology
MEDLINE
Bias (statistics)
Debiasing
Motivated reasoning
Response bias
Intelligence analysis
Social psychology
Verbal reasoning
Cognitive load
Optimism bias
Human intelligence
Implicit bias
Applied psychology
Gender bias
Medical literature
Information processing
Behavioral science
Psychology of reasoning
Artificial intelligence
Authors
Jonathan Wang, Donald A Redelmeier
Identifier
DOI:10.1136/bmjqs-2025-019299
Abstract
Background: Artificial intelligence large language models (LLMs) are increasingly used to inform clinical decisions but sometimes exhibit human-like cognitive biases when facing nuanced medical choices.

Methods: We tested whether new chain-of-thought reasoning LLMs might mitigate cognitive biases observed in physicians. We presented medical scenarios (n=10) to models released by DeepSeek, OpenAI and Google. Each scenario was presented in two versions that differed according to a specific bias (eg, surgery framed in survival vs mortality statistics). Responses were categorised and the extent of bias was measured by the absolute discrepancy between responses to different versions of the same scenario. The extent of intransigence (also termed dogma or inflexibility) was measured by Shannon entropy. The extent of deviance in each scenario was measured by comparing the average model response to the average practicing physician response (n=2507).

Results: DeepSeek-R1 mitigated 6 out of 10 cognitive biases observed in practicing physicians by generating intransigent all-or-none responses. The four biases that persisted were post hoc fallacy (34% vs 0%, p<0.001), decoy effects (44% vs 5%, p<0.001), Occam’s razor fallacy (100% vs 0%, p<0.001) and hindsight bias (56% vs 0%, p<0.001). In every scenario, the average model response deviated substantially from the average response of practicing physicians (p<0.001 for all). Similar patterns of persistent specific biases, intransigent responses and substantial deviance from practicing physicians were also apparent in OpenAI and Google models.

Conclusion: Some biases persist in chain-of-thought reasoning LLMs, and models tend to produce intransigent recommendations. These findings highlight the role of clinicians to think broadly, respect diversity and remain vigilant when interpreting chain-of-thought reasoning artificial intelligence LLMs in nuanced medical decisions for patients.
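The two metrics named in the Methods can be illustrated with a short sketch. This is not the authors' analysis code; the function names, the two-option response format, and the example counts are illustrative assumptions. Intransigence is quantified by Shannon entropy over the categorised responses (an all-or-none response pattern yields zero entropy), and framing bias by the absolute discrepancy between response rates to the two versions of a scenario:

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a categorical response distribution.

    counts: number of responses in each category (e.g. surgery vs radiation).
    Zero entropy indicates an intransigent, all-or-none response pattern.
    """
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def bias_discrepancy(rate_version_a, rate_version_b):
    """Absolute discrepancy between response rates to two framings
    of the same scenario (e.g. survival vs mortality statistics)."""
    return abs(rate_version_a - rate_version_b)

# Hypothetical example with 100 runs per framing:
# all 100 runs pick the same option -> intransigent, zero entropy
print(shannon_entropy([100, 0]))      # 0.0 bits
# an even 50/50 split -> maximal entropy for two options
print(shannon_entropy([50, 50]))      # 1.0 bits
# framing shifts the choice rate from 100% to 50% -> large bias
print(bias_discrepancy(1.00, 0.50))   # 0.5
```

Under this reading, a model can simultaneously show low measured bias (small discrepancy between framings) and high intransigence (near-zero entropy), which is the pattern the Results attribute to DeepSeek-R1.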