Assessing Bone Age: A Paradigm for the Next Generation of Artificial Intelligence in Radiology

Keywords: Medicine, Radiology, Artificial Intelligence, Medical Physics, Computer Science
Author: David A. Rubin
Source: Radiology [Radiological Society of North America], Vol. 301 (3): 700-701. Cited by: 5
DOI: 10.1148/radiol.2021211339

Abstract

See also the article by Eng et al in this issue.

From the Department of Radiology, NYU Grossman School of Medicine, 160 E 34th St, New York, NY 10016; All Pro Orthopedic Imaging Consultants, St Louis, Mo; and Radsource, Brentwood, Tenn. Address correspondence to the author (e-mail: [email protected]). Published online September 28, 2021.

Dr Rubin is an adjunct professor of radiology at the New York University Grossman School of Medicine, president of All Pro Orthopedic Imaging Consultants, and a practicing musculoskeletal radiologist for Radsource. He is a fellow of the American College of Radiology and an associate editor of Radiology.

The diagnosis, monitoring, and treatment planning for several musculoskeletal conditions, including various endocrinopathies, abnormal stature, scoliosis, and limb length discrepancies, rely on an accurate assessment of skeletal maturity. Skeletal maturity is typically estimated with a hand and wrist radiograph. A common method uses an atlas developed by Greulich and Pyle (1). The atlas is based on serial examinations performed in 1000 healthy boys and girls in the Cleveland, Ohio, area from 1931 through 1942, a "big data" experiment performed in the precomputer age. The participants were White, born in the United States, and mostly of Northern European descent and high socioeconomic status. Plates in the atlas show the most representative image from 100 radiographs of children at the age and sex of the reference standard.
A radiologist makes multiple subjective judgments comparing individual bones in a patient with those depicted in the reference standard. He or she then assigns a bone (skeletal) age, which is defined as the chronologic age at which the children on whom the standards were based would attain the same degree of skeletal maturity (1).

Bone age determination seems like an ideal application for artificial intelligence (AI). It is based on one standardized posteroanterior radiograph. There is only one "diagnosis" (ie, the estimated bone age), unlike other applications where, for example, an algorithm used to detect pneumothorax could not fully evaluate a chest radiograph for nodules, infiltrates, or heart failure. The task is considered tedious and time consuming by many radiologists and requires experience to be an expert reader. Reliability and reproducibility are paramount, especially because sequential examinations are often performed in clinical practice. These reasons likely explain the multiple AI products already developed to estimate bone age (2) and the results of a 2017 Radiological Society of North America challenge that garnered 105 entries (3).

In this issue of Radiology, Eng et al (4) conducted a multi-institutional randomized investigation of an AI technique trained to predict bone age based on a ground truth established by a four-expert panel, applying a robust statistical analysis. Two unique and laudable features of the study stand out. First, the multicenter design involved 93 radiologists, simulating a real-world scenario instead of the artificial one that testing in a single academic center might have produced. Second, the authors compared the accuracy attained by the radiologists with access to an AI-generated bone age with that of the same radiologists working unaided. Radiologists were shown the AI results but then could accept or override them.
This approach, rather than testing the accuracy of AI alone against that of the radiologist alone, mimics how most practices would apply this technology. In addition, it addresses the fear of some radiologists that they may be replaced instead of enhanced by AI in the future.

The results showed that at five of the six included centers, the AI-aided method improved performance: the mean absolute difference from the ground truth fell from 6.0 to 5.4 months. The proportion of examinations assigned bone ages more than 12 months different from the ground truth also decreased in the AI-assisted scenario, from 13.0% to 9.3%, and average interpretation time was 40 seconds faster.

Additionally, the interpretations provided by radiologists working together with the AI input were more accurate than those assigned by the AI alone, which had a mean absolute difference of 6.2 months. The authors offer several explanations. Essentially, most radiologists were more likely to overrule an inaccurate AI-assigned bone age than to incorrectly change a correct one. Furthermore, AI and radiologist performance were complementary, with certain cases more accurately assessed by the human and others by the machine.

The study did identify one center as an outlier, where the AI-aided interpretations were less accurate. The authors found that while the radiologists acting alone at this center outperformed their peers at the other locations, they were also more likely to modify an initially highly accurate estimate (at most 3 months different from the ground truth) provided by the AI. Radiology practices embracing AI software must understand that individual behavior can potentially negate the benefits of AI.
Remember: your results may vary.

While the improvements in accuracy and interpretation times were statistically significant, their actual magnitude was small: a mean improvement relative to the ground truth of 0.6 months (18 days) is unlikely to be clinically relevant and may not justify the implementation of an AI tool for many practices. This is especially true in practices with low volumes of requested bone age examinations, where a 40-second savings a few times a day may not be meaningful.

The current study also discusses automation bias, the tendency of some radiologists to overly trust the AI even when it presents inaccurate estimates, which may result in time savings at the cost of decreased accuracy. The effect is somewhat analogous to that seen when other (non-AI) data are known before image interpretation. For example, observers first told the chronologic age of a patient are more likely to assign a bone age within 2 standard deviations of that age than when the chronologic age is withheld (5). Again, users incorporating AI into their current or future practices need to be aware of this potential unconscious bias.

So, what are the logical next steps for developing AI radiology applications? Continuing advances in computer architecture and programming techniques will incrementally improve performance and speed, although one can argue that the current accuracy of many AI algorithms (or ensembles of algorithms) already exceeds what is needed clinically. Does an error rate of 5.4 months for assigning bone age affect clinical care when the standard deviation for healthy children older than 3.5 years (1) already exceeds this amount?
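To make the reported accuracy metrics concrete, the two quantities the study uses, the mean absolute difference from the ground truth and the fraction of examinations in error by more than 12 months, can be sketched in a few lines of Python. This is a minimal illustration with invented numbers, not data or code from the study.

```python
def mean_absolute_difference(estimates, ground_truth):
    """Mean absolute difference (in months) between estimates and ground truth."""
    return sum(abs(e - g) for e, g in zip(estimates, ground_truth)) / len(estimates)

def large_error_fraction(estimates, ground_truth, threshold=12):
    """Fraction of examinations whose estimate differs from the ground truth
    by more than `threshold` months (the study used 12 months)."""
    n_large = sum(1 for e, g in zip(estimates, ground_truth) if abs(e - g) > threshold)
    return n_large / len(estimates)

# Hypothetical bone ages (in months) for five examinations.
truth    = [96, 120, 84, 150, 132]
unaided  = [90, 134, 88, 148, 120]   # radiologist alone
ai_aided = [94, 126, 86, 149, 128]   # radiologist with AI input

print(mean_absolute_difference(unaided, truth))   # 7.6
print(mean_absolute_difference(ai_aided, truth))  # 3.0
print(large_error_fraction(unaided, truth))       # 0.2
```

In this toy example the AI-aided readings reduce both the mean error and the share of large outliers, the same pattern, at a smaller scale, that Eng et al report.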
The ability of AI-assisted techniques to identify meaningful changes in sequential examinations still needs to be proven, but given the current accuracy, sensitivity to change will very likely at least equal that available with human interpretations.

I believe it is time to think about eliminating the human-based ground truth for future applications. While expert consensus was a necessary initial step in evaluating new algorithms, it is possible that some AI already outperforms radiologists, but the current study design (using a human-based reference standard) makes that impossible to show. In essence, we are not training algorithms to find the most correct answer but rather to best predict what the radiologist-based diagnosis would be. Recently, Pan et al (6) showed that AI could be trained and validated using radiographs obtained from a diverse pediatric trauma population, with each patient's chronologic age used as the ground truth.

One major critique of using the Greulich and Pyle atlas as a reference standard is that it may not be equally applicable to children of different ethnic and racial backgrounds (7,8). Another is that changes in nutrition, physical fitness, and overall health may mean that normal ranges from the 1930s no longer adequately apply to current populations. Why not leverage AI to develop new standards, using collections of radiographs obtained from otherwise healthy children imaged for nonendocrine indications? The goal would be to predict each child's actual chronologic age on the day of radiography, not the bone age assigned by a cohort of expert radiologists.
With current computing power, it would be possible to process thousands of images, divided into groups by ethnicity, geography, or other factors, to establish multiple new norms in a small fraction of the 12 years needed to create the original data used for the Greulich and Pyle atlas.

Large digital databases provide an opportunity to develop AI that moves beyond predicting how radiologists would interpret an image. For example, a recent study (9) investigated how AI could be trained not only to predict the Kellgren-Lawrence grade of knee osteoarthritis assigned by radiologists but also to prognosticate the risk of future joint replacement based on a knee radiograph. Only by shedding the limitations imposed by training with a human-based ground truth can researchers develop applications that enable clinically relevant forecasts currently beyond the abilities of non–AI-aided radiologists.

Disclosures of Conflicts of Interest: D.A.R. is on the ImageBiopsy Lab medical advisory board and is an associate editor of Radiology.

References

1. Greulich WW, Pyle SL. Radiographic atlas of skeletal development of the hand and wrist. 2nd ed. Stanford, Calif: Stanford University Press, 1959.
2. Dallora AL, Anderberg P, Kvist O, Mendes E, Diaz Ruiz S, Sanmartin Berglund J. Bone age assessment with various machine learning techniques: a systematic literature review and meta-analysis. PLoS One 2019;14(7):e0220242.
3. Halabi SS, Prevedello LM, Kalpathy-Cramer J, et al. The RSNA Pediatric Bone Age Machine Learning Challenge. Radiology 2019;290(2):498–503.
4. Eng DK, Khandwala NB, Long J, et al. Artificial intelligence algorithm improves radiologist performance in skeletal age assessment: a prospective multicenter randomized controlled trial. Radiology 2021. https://doi.org/10.1148/radiol.2021204021. Published online September 28, 2021.
5. Berst MJ, Dolan L, Bogdanowicz MM, Stevens MA, Chow S, Brandser EA. Effect of knowledge of chronologic age on the variability of pediatric bone age determined using the Greulich and Pyle standards. AJR Am J Roentgenol 2001;176(2):507–510.
6. Pan I, Baird GL, Mutasa S, et al. Rethinking Greulich and Pyle: a deep learning approach to pediatric bone age assessment using pediatric trauma hand radiographs. Radiol Artif Intell 2020;2(4):e190198.
7. Zhang A, Sayre JW, Vachon L, Liu BJ, Huang HK. Racial differences in growth patterns of children assessed on the basis of bone age. Radiology 2009;250(1):228–235.
8. Alshamrani K, Messina F, Offiah AC. Is the Greulich and Pyle atlas applicable to all ethnicities? A systematic review and meta-analysis. Eur Radiol 2019;29(6):2910–2923.
9. Leung K, Zhang B, Tan J, et al. Prediction of total knee replacement and diagnosis of osteoarthritis by using deep learning on knee radiographs: data from the Osteoarthritis Initiative. Radiology 2020;296(3):584–593.

Article History: Received May 25, 2021; revision requested June 14, 2021; revision received June 15, 2021; accepted June 17, 2021; published online September 28, 2021; published in print December 2021 (Vol. 301, No. 3).