Medicine
Demographic
Dermatology
Diversity (politics)
Demography
Anthropology
Sociology
Authors
Lucie Joerg,Margaret Kabakova,Jennifer Y. Wang,Evan Austin,Marc Cohen,Alana Kurtti,Jared Jagdeo
Abstract
Background: Generative AI models are increasingly used in dermatology, yet biases in training datasets may reduce diagnostic accuracy and perpetuate ethnic health disparities.
Objectives: To evaluate two key AI outputs: (1) skin tone representation and (2) diagnostic accuracy of generated dermatologic conditions.
Methods: Using the standard prompt 'Generate a photo of a person with [skin condition],' this cross-sectional study investigated skin tone diversity and accuracy of four leading AI models (Adobe Firefly, ChatGPT-4o, Midjourney and Stable Diffusion) across the 20 most common skin conditions. All images (n = 4000) were evaluated for skin tone representation from June to July 2024. Two independent raters used the Fitzpatrick scale to assess skin tone diversity, which was compared to U.S. Census demographics using the χ² test. Two blinded dermatology residents evaluated a randomized 200-image subset for diagnostic accuracy. An inter-rater kappa statistic was calculated to assess rater agreement.
Results: Across all generated images, 89.8% depicted light skin, and 10.2% depicted dark skin. Adobe Firefly demonstrated the highest alignment with U.S. demographic data, with a non-significant chi-square result (38.1% dark skin, χ²(1) = 0.320, p = 0.572), indicating no meaningful difference between its generated skin tone diversity and census demographics. ChatGPT-4o, Midjourney and Stable Diffusion significantly underrepresented dark skin with Fitzpatrick scores of >IV (6.0%, 3.9% and 8.7% dark skin, respectively; all p < 0.001). Across all platforms, only 15% of images were identifiable by raters as the intended condition. Adobe Firefly had the lowest accuracy (0.94%), while ChatGPT-4o, Midjourney and Stable Diffusion demonstrated higher but still suboptimal accuracy (22%, 12.2% and 22.5%, respectively).
Conclusions: The study highlights substantial deficiencies in the diversity and accuracy of AI-generated dermatological images. AI programs may exacerbate cognitive bias and health inequity, suggesting the need for ethical AI guidelines and diverse datasets to improve disease diagnosis and dermatologic care.
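
The two statistics named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names, the example counts, the reference proportions and the rater labels are all hypothetical, since the abstract reports neither per-model counts nor the census proportions used. The only figure taken from the abstract is the reported statistic χ²(1) = 0.320, whose p-value the sketch recomputes.

```python
import math
from collections import Counter


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For df = 1 the statistic is a squared standard normal, so
    P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


def chi2_goodness_of_fit_2cat(observed, expected_props):
    """Pearson chi-square goodness-of-fit test for two categories (df = 1).

    observed: counts, e.g. [n_light, n_dark]
    expected_props: reference proportions that sum to 1
    """
    if len(observed) != 2 or len(expected_props) != 2:
        raise ValueError("this sketch handles exactly two categories")
    total = sum(observed)
    stat = sum(
        (o - total * p) ** 2 / (total * p)
        for o, p in zip(observed, expected_props)
    )
    return stat, p_value_df1(stat)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Reported in the abstract: chi2(1) = 0.320 with p = 0.572.
print(round(p_value_df1(0.320), 3))

# Hypothetical counts and reference proportions, for illustration only.
stat, p = chi2_goodness_of_fit_2cat([620, 380], [0.6, 0.4])
print(stat, p)

# Hypothetical labels from two raters, for illustration only.
print(cohens_kappa(["light", "light", "dark", "dark"],
                   ["light", "light", "dark", "light"]))
```

The first call recovers the p-value reported for Adobe Firefly from its χ² statistic alone, which is a quick consistency check on the reported pair. In practice a statistics library would usually be preferred over hand-rolled functions.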