Expert-level diagnosis of nasal polyps using deep learning on whole-slide imaging

Medicine, Nasal polyps, Artificial intelligence, Radiology, Computer science, Pathology

Authors

Qingwu Wu, Jianning Chen, Huiyi Deng, Yong Ren, Yueqi Sun, Weihao Wang, Lianxiong Yuan, Haiyu Hong, Rui Zheng, Weifeng Kong, Xuekun Huang, Guifang Huang, Lunji Wang, Yana Zhang, Lanqing Han, Qintai Yang

Source

Journal: The Journal of Allergy and Clinical Immunology [Elsevier BV]
Date: 2019-12-09; Volume (issue): 145 (2): 698-701.e6; Times cited: 30

Identifier

DOI: 10.1016/j.jaci.2019.12.002

Abstract

Chronic rhinosinusitis (CRS) is defined as chronic inflammation of the nose and paranasal sinuses. CRS is estimated to affect more than 100 million patients worldwide, and it involves high management costs and poor quality of life in affected patients.[1: Fokkens WJ, Lund VJ, Mullol J, Bachert C, Alobid I, Baroody F, et al. European position paper on rhinosinusitis and nasal polyps 2012. Rhinol Suppl. 2012;23:1-298] The presence of eosinophils in nasal polyps is linked to higher postoperative visual analog pain scores, impaired quality of life, and a high recurrence rate.[2: Ikeda K, Shiozawa A, Ono N, Kusunoki T, Hirotsu M, Homma H, et al. Subclassification of chronic rhinosinusitis with nasal polyp based on eosinophil and neutrophil. Laryngoscope. 2013;123:E1-E9] A better understanding of the ratio of eosinophils (RE) to infiltrating inflammatory cells in tissue is needed to improve diagnostic and treatment strategies for affected patients.[3: Snidvongs K, Lam M, Sacks R, Earls P, Kalish L, Phillips PS, et al. Structured histopathology profiling of chronic rhinosinusitis in routine practice. Int Forum Allergy Rhinol. 2012;2:376-385] Thus far, there are no uniform standards or rules for diagnosing eosinophilic CRS with nasal polyps (eCRSwNP), and several problems remain in practice. Some researchers recommend a threshold of more than 15, or of more than 100, eosinophils per high-power field (hpf).[2][4: Wen W, Liu W, Zhang L, Bai J, Fan Y, Xia W, et al. Increased neutrophilia in nasal polyps reduces the response to oral corticosteroid therapy. J Allergy Clin Immunol. 2012;129:1522-1528] Most researchers support assessing the RE in several random hpfs and diagnosing eCRSwNP when the RE is greater than 10%.[5: Mahdavinia M, Suh LA, Carter RG, Stevens WW, Norton JE, Kato A, et al. Increased noneosinophilic nasal polyps in chronic rhinosinusitis in US second-generation Asians suggest genetic regulation of eosinophilia. J Allergy Clin Immunol. 2015;135:576-579][6: Cao PP, Li HB, Wang BF, Wang SB, You XJ, Cui YH, et al. Distinct immunopathologic characteristics of various types of chronic rhinosinusitis in adult Chinese. J Allergy Clin Immunol. 2009;124:478-484] In the traditional method (REslide-tm), the pathologist assesses the RE to infiltrating inflammatory cells (which include eosinophils, neutrophils, lymphocytes, plasma cells, etc) in 10 random hpfs of the tissue.[6] However, the RE differs markedly between hpfs. Preliminary studies have shown sampling error between estimates based on 10 random hpfs and the overall eosinophil count in the whole sample. We therefore took the RE of the whole slide, obtained by whole-slide imaging (WSI), as the criterion standard (REslide-actual) for assessing eCRSwNP, because it is free of sampling error. However, manual assessment of the whole slide is difficult in practice because it is both time-consuming and subjective. Artificial intelligence (AI), especially deep learning, has made great progress and performs similarly to, or even better than, humans in visual perception and speech recognition.
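To make the RE rule concrete, here is a minimal Python sketch, assuming each field (hpf or patch) is summarized by counts of the four infiltrating cell types named in the Methods (eosinophils, neutrophils, lymphocytes, plasma cells). The `FieldCounts` class and the function names are hypothetical, not part of the AICEP. The sketch computes the RE of one field, averages per-field RE values (a mean of ratios, as REslide is defined), and applies the greater-than-10% cutoff.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FieldCounts:
    """Hypothetical record of the cells counted in one field (hpf or patch)."""
    eosinophils: int
    neutrophils: int
    lymphocytes: int
    plasma_cells: int

    def ratio_of_eosinophils(self) -> float:
        """RE = eosinophils / all infiltrating inflammatory cells in the field."""
        total = self.eosinophils + self.neutrophils + self.lymphocytes + self.plasma_cells
        if total == 0:
            raise ValueError("no infiltrating inflammatory cells counted in this field")
        return self.eosinophils / total


def slide_re(fields: Sequence[FieldCounts]) -> float:
    """Average of the per-field RE values (a mean of ratios, not a pooled ratio)."""
    if not fields:
        raise ValueError("at least one field is required")
    return sum(f.ratio_of_eosinophils() for f in fields) / len(fields)


def is_eosinophilic(re_value: float, cutoff: float = 0.10) -> bool:
    """eCRSwNP when the RE is strictly greater than the 10% cutoff."""
    return re_value > cutoff


# Example with made-up counts for two fields.
fields = [FieldCounts(3, 5, 20, 2), FieldCounts(0, 4, 10, 6)]
print(round(slide_re(fields), 3), is_eosinophilic(slide_re(fields)))  # 0.05 False
```

Averaging ratios weights every field equally regardless of how many cells it contains; a pooled ratio (total eosinophils over total inflammatory cells) would be a different estimator.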
Therefore, we aimed to establish an AI evaluation platform, the AI CRS evaluation platform (AICEP), which yields a predicted whole-slide RE (REslide-predict), to diagnose eCRSwNP rapidly and accurately by means of deep learning and WSI. A total of 195 nasal polyp specimens were obtained from 3 affiliated hospitals of Sun Yat-sen University (179 from the Third Affiliated Hospital, 9 from the First Affiliated Hospital, and 7 from the Fifth Affiliated Hospital). After WSI scanning, we automatically extracted 26,589 patches from the lamina propria of the mucosa and labeled the actual RE in each patch (REpatch-actual; see the Methods section in this article's Online Repository at www.jacionline.org). The patches were divided into a training data set, an internal validation data set, and an independent external test data set (see Fig E1 in this article's Online Repository at www.jacionline.org). Within the AICEP, we compared 3 common architectures (ResNet50, Xception, and Inception V3) trained by transfer learning, assessing their performance in the classification and regression of patches extracted from the whole-slide images (Fig 1). Training ran for up to 100 epochs (iterations through the entire training data set), and the retrained weights were saved once the mean absolute error (see Fig E2, A in this article's Online Repository at www.jacionline.org) and the mean square error loss (Fig E2, B) stopped improving. First, we performed qualitative classification of both the internal validation and external test data sets with the ResNet50, Xception, and Inception V3 models. Whole-slide results were classified as eosinophilic when REslide exceeded 10% (see the Methods section in this article's Online Repository at www.jacionline.org). The sensitivities for the internal and external data sets were 97.0% and 93.5% for the ResNet50 model, 90.1% and 84.2% for the Xception model, and 93.9% and 90.3% for the Inception V3 model, respectively.
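The stopping rule described above (keep the weights from the best epoch within a 100-epoch budget) can be sketched in plain Python. This is an illustration under assumptions: `run_epoch` is a hypothetical callback that trains for one epoch and returns the validation mean absolute error, and the `patience` parameter is our addition, since the study does not report one. In Keras the same idea is usually expressed with the `ModelCheckpoint` (`save_best_only=True`) and `EarlyStopping` callbacks.

```python
from typing import Callable, Optional, Tuple


def train_with_checkpointing(
    run_epoch: Callable[[int], float],
    max_epochs: int = 100,
    patience: Optional[int] = None,
    min_delta: float = 0.0,
) -> Tuple[int, float, int]:
    """Track the epoch with the lowest validation MAE, as a checkpoint would.

    run_epoch(epoch) is a hypothetical callback: train one epoch, return validation MAE.
    patience is an assumption (not reported in the study); None runs all max_epochs.
    Returns (best_epoch, best_mae, epochs_run), with 1-based epoch numbers.
    """
    best_epoch, best_mae = 0, float("inf")
    stale_epochs = 0
    epochs_run = 0
    for epoch in range(1, max_epochs + 1):
        val_mae = run_epoch(epoch)
        epochs_run = epoch
        if val_mae < best_mae - min_delta:
            best_epoch, best_mae = epoch, val_mae
            stale_epochs = 0
            # A real pipeline would save the model weights here.
        else:
            stale_epochs += 1
            if patience is not None and stale_epochs >= patience:
                break
    return best_epoch, best_mae, epochs_run


# Toy validation-MAE curve (invented numbers, for illustration only).
curve = [0.20, 0.10, 0.08, 0.09, 0.085, 0.081, 0.30]
print(train_with_checkpointing(lambda e: curve[min(e, len(curve)) - 1], patience=3))
# (3, 0.08, 6)
```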
The corresponding specificities were 86.0% and 84.6%, 88.2% and 88.4%, and 88.2% and 86.4%, respectively. Performance was generally better on the internal validation data set than on the external test data set (see Fig E3 in this article's Online Repository at www.jacionline.org). Inception V3 achieved the highest areas under the receiver operating characteristic curve (AUCs), 0.974 for the internal validation data set and 0.957 for the external test data set, and was therefore selected as the best model (Fig 2, A and B). Second, visualization of the convolutional neural network highlighted the regions containing eosinophils, indicating that the model had learned from the characteristics of eosinophils (Fig 2, C and D). In addition, in quantitative analysis with the AICEP, the mean absolute error between REpatch-actual and the predicted RE (REpatch-predict) was 4.3% in the internal validation data set and 5.8% in the external test data set. Both the consistency and the agreement intraclass correlation coefficients (ICCs) between REpatch-predict and REpatch-actual exceeded 0.95 in both data sets, indicating high concordance of the AICEP analysis (see Table E1 in this article's Online Repository at www.jacionline.org). When REslide-predict from the AICEP and the pathologist simulation were each compared with REslide-actual in the 12 patients of the internal validation data set, the AICEP diagnosed all 12 patients correctly, whereas the traditional method diagnosed only 10 correctly and misdiagnosed 2 (patients 4 and 5; Fig 2, E). Similarly, in the 16 patients of the external test data set, the AICEP correctly diagnosed all 16, whereas the traditional method could have misdiagnosed 4 (patients 6, 7, 8, and 10; Fig 2, F).
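The patch-level error and concordance figures above can be reproduced in outline with the sketch below. It computes the mean absolute error and two single-measure ICCs from a two-way model, treating the actual and predicted RE as 2 "raters": consistency, often written ICC(C,1), and absolute agreement, ICC(A,1). The study does not state which ICC model or software it used, so these forms are an assumption, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def icc_consistency_agreement(
    actual: Sequence[float], predicted: Sequence[float]
) -> Tuple[float, float]:
    """Single-measure two-way ICCs treating actual and predicted as k = 2 raters.

    Returns (ICC(C,1) consistency, ICC(A,1) absolute agreement).
    """
    n, k = len(actual), 2
    if n < 2 or len(predicted) != n:
        raise ValueError("need at least 2 paired observations")
    rows = list(zip(actual, predicted))
    grand = sum(a + p for a, p in rows) / (n * k)
    row_means = [(a + p) / k for a, p in rows]
    col_means = [sum(actual) / n, sum(predicted) / n]

    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    return consistency, agreement


# A constant bias of 0.1 keeps consistency at 1 but lowers absolute agreement.
actual = [0.0, 0.1, 0.2, 0.3]
predicted = [0.1, 0.2, 0.3, 0.4]
print(mean_absolute_error(actual, predicted))
print(icc_consistency_agreement(actual, predicted))
```

The example shows why both ICCs are reported: a systematic offset leaves consistency untouched but is penalized by the agreement form.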
Finally, we compared the time required for diagnosis by the AICEP and by pathologists. The AICEP (5.4 ± 0.87 minutes) took less time than REslide-tm (12.7 ± 2.78 minutes) and REslide-actual (148.6 ± 34.36 minutes; P < .0001; see Table E2 in this article's Online Repository at www.jacionline.org). In this study, we advocate whole-slide assessment instead of REslide-tm. Although whole-slide assessment is more accurate, it is extremely time-consuming when performed manually. Moreover, in China, medical resources in the central and western regions lag well behind those in the eastern coastal areas, and pathologists are in short supply, especially in some primary hospitals. The AICEP can help address this problem, because it classifies the pathologic type of nasal polyps from WSI more efficiently. AI-facilitated diagnosis can reduce doctors' workload and help provide high-quality medical care to patients in need.[7: Luo H, Xu G, Li C, He L, Luo L, Wang Z, et al. Real-time artificial intelligence for detection of upper gastrointestinal cancer by endoscopy: a multicentre, case-control, diagnostic study. Lancet Oncol. 2019;20:1645-1654][8: Lin H, Li R, Liu Z, Chen J, Yang Y, Chen H, et al. Diagnostic efficacy and therapeutic decision-making capacity of an artificial intelligence platform for childhood cataracts in eye clinics: a multicentre randomized controlled trial. EClinicalMedicine. 2019;9:52-59] Pathologic diagnosis depends heavily on the intuition and experience of pathologists, and heavy workloads can reduce their efficiency and increase the chance of errors. Our results showed that REslide-tm can lead to misdiagnosis, particularly when the proportion of tissue eosinophils is close to 10%.
The AICEP avoided this problem and diagnosed all patients in this study correctly. Although AI has already shown great potential for assisting doctors in diagnosis and decision making, it still has limitations. For instance, the real-world diagnostic accuracy of AI has been lower than that reported in a previous study conducted with screening data sets.[9: Long E, Lin H, Liu Z, Wu X, Wang L, Jiang J, et al. An artificial intelligence platform for the multihospital collaborative management of congenital cataracts. Nat Biomed Eng. 2017;1] Our study showed a similar result: the AICEP performed better on the internal validation data set than on the external test data set. The internal validation and training data sets were prepared with the same slicing, staining, and WSI scanning procedures, whereas these procedures may differ for the external test data set. It is therefore important to optimize the AICEP with data from multiple centers. Overall, to our knowledge, the AICEP is the first application of deep learning combined with WSI to the diagnosis of nasal polyps. It evaluates the pathologic characteristics of nasal polyps faster and more accurately than the traditional method. We expect that the AICEP could be used widely, particularly in primary hospitals, and potentially worldwide through a cloud platform.

The authors would like to thank Chunkui Shao (professor, Department of Pathology, The Third Affiliated Hospital of Sun Yat-sen University) and his colleagues for their help.

This study was approved by the research ethics committee of the Institute of Basic Research in Clinical Medicine, Third Affiliated Hospital of Sun Yat-sen University ([2019]02-157-01). The research was registered at the Chinese Clinical Trial Registry (http://www.chictr.org.cn/index.aspx) with the number ChiCTR1900021601.
Biopsy specimens from patients with CRS with nasal polyps (n = 1465) were obtained from the Department of Otolaryngology of the Third Affiliated Hospital of Sun Yat-sen University (SYSU) in China between January 2008 and December 2018. After screening for staining, size, and specimen quality, 179 patients were included in the analysis. These patients were randomly divided into 2 groups: 167 patients in the training data set and 12 patients in the internal validation data set. All slides were scanned with an automatic digital slide scanner (Pannoramic 250 FLASH, 3DHISTECH Ltd, Budapest, Hungary), yielding 179 digital whole-slide images. The lamina propria of the mucosa, excluding large glands, was outlined with the Automated Slide Analysis Platform (Radboud University Medical Center, The Netherlands) to define regions of interest. Patches within the regions of interest were extracted automatically at ×400 (hpf) magnification by using OpenSlide (version 3.4.1, University of Pittsburgh, Pittsburgh, Pa). The training data set comprised 167 whole-slide images containing 23,048 patches, and the internal validation data set comprised 12 whole-slide images containing 1,577 patches (Fig E1). For the external test data set, 16 patients (16 whole-slide images) with nasal polyps were randomly selected from the First Affiliated Hospital of SYSU (n = 9) and the Fifth Affiliated Hospital of SYSU (n = 7) between January 2017 and December 2018. Each hospital performed its own hematoxylin and eosin staining and WSI scanning. A total of 1,964 patches were obtained from these slides with the same method. All 26,589 patches were independently described and labeled by a committee comprising 2 competent pathologists with more than 10 years of experience and an expert pathologist with more than 30 years of experience, who was consulted in case of disagreement.
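Patch extraction from an outlined region of interest can be sketched as a grid of non-overlapping tiles whose centers fall inside the lamina propria outline and outside any excluded gland outlines. This is a hypothetical stdlib-only illustration, not the study's pipeline: polygon coordinates are assumed to be in level-0 pixels, and the patch size is a free parameter because the study reports only the ×400 magnification. With openslide-python, each returned origin could then be read with `OpenSlide.read_region((x, y), 0, (size, size))`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: toggle on each polygon edge crossed by a ray to the right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def patch_origins(
    roi: Sequence[Point],
    patch_size: int,
    excluded: Sequence[Sequence[Point]] = (),
) -> List[Tuple[int, int]]:
    """Top-left corners of non-overlapping square patches whose centers lie in the ROI
    and outside every excluded outline (for example, large glands)."""
    xs = [p[0] for p in roi]
    ys = [p[1] for p in roi]
    x_min, x_max = int(min(xs)), int(max(xs))
    y_min, y_max = int(min(ys)), int(max(ys))
    origins = []
    for y in range(y_min, y_max - patch_size + 1, patch_size):
        for x in range(x_min, x_max - patch_size + 1, patch_size):
            cx, cy = x + patch_size / 2, y + patch_size / 2
            if not point_in_polygon(cx, cy, roi):
                continue
            if any(point_in_polygon(cx, cy, hole) for hole in excluded):
                continue
            origins.append((x, y))
    return origins


# Toy example: a 1000 x 1000 px ROI with one excluded 500 x 500 px gland outline.
roi = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]
gland = [(0, 0), (500, 0), (500, 500), (0, 500)]
print(len(patch_origins(roi, 250)), len(patch_origins(roi, 250, [gland])))  # 16 12
```

Testing only the patch center is a simplification; a stricter rule could require a minimum fraction of each patch to lie inside the outline.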
The 2 competent pathologists identified and counted the eosinophils, lymphocytes, neutrophils, and plasma cells in each patch. The number of infiltrating inflammatory cells was taken as the sum of these counts, and the ratio of eosinophils was REpatch-actual = n1/t, where n1 is the number of eosinophils and t is the total number of eosinophils, lymphocytes, neutrophils, and plasma cells in the patch. When the 2 pathologists' estimates of REpatch-actual differed by less than 5%, their average was used; if the difference was greater than 5%, the patch was rechecked by the expert pathologist and the value was corrected as necessary. The average REpatch-actual over all patches of a whole-slide image was designated REslide-actual. As previously reported,[E1: Cao PP, Li HB, Wang BF, Wang SB, You XJ, Cui YH, et al. Distinct immunopathologic characteristics of various types of chronic rhinosinusitis in adult Chinese. J Allergy Clin Immunol. 2009;124:478-484] patients with CRS with nasal polyps were classified as eosinophilic when tissue eosinophils exceeded 10% of the total infiltrating inflammatory cells; otherwise, they were classified as non-eCRSwNP. In this study, the AICEP was used to compare 3 commonly used architectures (ResNet50, Xception, and Inception V3) trained by transfer learning, assessing their performance in the classification and regression of patches extracted from the whole-slide images. Each model was initialized with weights pretrained on the ImageNet data set, and its top layer was removed.
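The labeling rule above can be written as a small function. In this sketch, RE values are fractions (0.05 corresponds to 5%), the 5% rule is read as an absolute difference in percentage points, and a difference of exactly 5% is sent to the expert; the study does not specify either detail. `expert_review` is a hypothetical callback standing in for the expert pathologist's recheck.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical callback: the expert pathologist's corrected RE for a disputed patch.
ExpertReview = Callable[[float, float], float]


def adjudicate_patch_re(
    re_a: float, re_b: float, expert_review: ExpertReview, max_diff: float = 0.05
) -> float:
    """Average two readings that differ by less than 5 points; otherwise ask the expert."""
    if abs(re_a - re_b) < max_diff:
        return (re_a + re_b) / 2
    return expert_review(re_a, re_b)


def slide_re_actual(
    readings: Sequence[Tuple[float, float]], expert_review: ExpertReview, max_diff: float = 0.05
) -> float:
    """REslide-actual: mean of the adjudicated REpatch-actual values of one slide."""
    if not readings:
        raise ValueError("a slide needs at least one patch")
    values = [adjudicate_patch_re(a, b, expert_review, max_diff) for a, b in readings]
    return sum(values) / len(values)


# Toy readings; the stand-in "expert" simply returns the higher reading.
readings = [(0.10, 0.12), (0.10, 0.20)]
print(round(slide_re_actual(readings, lambda a, b: max(a, b)), 3))  # 0.155
```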
Next, to classify patches as having REpatch values above or below the cutoff value (10%), a fully connected layer with 2 neurons was added; each neuron had its own weights and an activation function, so that the layer mapped inputs to outputs nonlinearly. To predict exact REpatch values by regression, we chose the model with the greatest area under the curve and added a fully connected layer containing only 1 neuron. Importantly, no activation function was used in this layer, so that the range of output values was not constrained. Training ran for up to 100 epochs (iterations through the entire training data set), and the retrained weights were saved once the mean absolute error (Fig E2, A) and the mean square error loss (Fig E2, B) stopped improving. Finally, the parameters of all layers of the quantitative regression architecture were fine-tuned on the input images and their corresponding labels (Fig 1). We trained and evaluated the models with the Keras library (version 2.2) on a TensorFlow (version 1.8) backend in Python (version 3.6), together with libraries such as NumPy, Matplotlib, and scikit-learn. Computation was performed on 1 Tesla V100 graphics processing unit (GPU) with 32 GB of memory in an NVIDIA DGX-1 server, which had 8 Tesla V100 GPUs, 512 GB of DDR4 memory, and 7 TB of solid-state drive storage. With ResNet50, Xception, and Inception V3, the AICEP provided an effective approach for qualitative classification of the internal validation and external test data sets. As previously mentioned, whole-slide results were classified as eosinophilic when REslide exceeded 10%.
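The two output heads differ only in their final layer. The pure-Python sketch below shows the computation each head performs on a pooled feature vector: a 2-neuron fully connected layer followed by an activation for classification (softmax is assumed here; the study does not name the function), and a 1-neuron layer with no activation for regression, whose output is therefore unbounded. In Keras these would correspond to `Dense(2, activation="softmax")` and `Dense(1)` placed on a base model created with `include_top=False`; the weights below are toy values, not trained parameters.

```python
import math
from typing import List, Sequence


def dense(
    features: Sequence[float], weights: Sequence[Sequence[float]], biases: Sequence[float]
) -> List[float]:
    """Fully connected layer: one weighted sum plus bias per output neuron (no activation)."""
    return [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, biases)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_head(features, weights, biases) -> List[float]:
    """2-neuron layer + softmax (assumed): probabilities for (RE < 10%, RE >= 10%)."""
    return softmax(dense(features, weights, biases))


def regression_head(features, weights, bias) -> float:
    """1-neuron layer with no activation: an unbounded real-valued REpatch estimate."""
    return dense(features, [weights], [bias])[0]


# Toy pooled features and weights (not trained values).
features = [1.0, 2.0]
probs = classification_head(features, [[0.5, -0.5], [-0.5, 0.5]], [0.0, 0.0])
print([round(p, 4) for p in probs])                # [0.2689, 0.7311]
print(regression_head(features, [1.0, 1.0], 0.0))  # 3.0
```

The regression output of 3.0 illustrates the point made in the text: without an activation, nothing forces the prediction into a fixed range.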
The sensitivity (true-positive rate) and specificity (true-negative rate, that is, 1 minus the false-positive rate) of each of the 3 models were calculated from their confusion matrices, as were the areas under the receiver operating characteristic curve (AUCs). The model with the highest AUC was selected for subsequent quantitative analyses. To verify that the model had been trained on the characteristics of eosinophils, we visualized it with gradient-weighted class activation mapping (Grad-CAM). All patches in the internal validation and external test data sets were input into the AICEP model to obtain REpatch-predict, and the mean absolute error between REpatch-predict and REpatch-actual was calculated. Concordance between REpatch-predict and REpatch-actual was evaluated with the intraclass correlation coefficient. For the internal validation and external test data sets separately, REslide-predict was compared with REslide-actual, and their concordance was likewise evaluated with the intraclass correlation coefficient. To simulate the traditional method, we used a bootstrap method to randomly select 10 REpatch values from each whole-slide image and calculated their average; this process was repeated 50 times for each whole-slide image, and the results were compared with REslide-actual to evaluate the diagnostic performance of the traditional method against that of the AICEP. The times required to obtain REslide-predict, REslide-tm, and REslide-actual were recorded. Receiver operating characteristic curves were used to evaluate the diagnostic performance of the AICEP for eCRSwNP.
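The bootstrap simulation of the traditional method can be sketched as follows, assuming each whole-slide image is represented by its list of REpatch values. Each repeat draws 10 values with replacement (the usual bootstrap convention; a pathologist picking 10 distinct fields would correspond to `random.Random.sample` instead), averages them to give a simulated REslide-tm, and checks whether the resulting >10% call matches the call from REslide-actual. The function name and its return values are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def simulate_traditional_method(
    patch_re: Sequence[float],
    n_fields: int = 10,
    n_repeats: int = 50,
    cutoff: float = 0.10,
    seed: Optional[int] = None,
) -> Tuple[List[float], float]:
    """Bootstrap the 10-random-hpf method on one slide's REpatch values.

    Returns the simulated REslide-tm values and the fraction of repeats whose
    eosinophilic call (> cutoff) matches the call from REslide-actual.
    """
    if not patch_re:
        raise ValueError("the slide has no patches")
    rng = random.Random(seed)
    re_actual = sum(patch_re) / len(patch_re)       # REslide-actual: mean over all patches
    actual_call = re_actual > cutoff
    simulated = []
    for _ in range(n_repeats):
        fields = rng.choices(patch_re, k=n_fields)  # sampling with replacement
        simulated.append(sum(fields) / n_fields)    # one simulated REslide-tm
    matches = sum((value > cutoff) == actual_call for value in simulated)
    return simulated, matches / n_repeats


# Toy slide whose patch REs straddle the cutoff (invented values).
slide = [0.02, 0.05, 0.08, 0.12, 0.15, 0.30, 0.04, 0.09, 0.11, 0.06]
values, agreement = simulate_traditional_method(slide, seed=0)
print(len(values), 0.0 <= agreement <= 1.0)  # 50 True
```

Slides whose REslide-actual lies near 10% are where the agreement fraction drops, which matches the paper's observation that misdiagnoses cluster around the cutoff.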
All tests were 2-sided, and a P value less than .05 was considered statistically significant.

Fig E2. Performance in the training and internal validation data sets. Mean absolute error (A) and mean square error loss (B) are plotted against the training epoch over the 100 epochs of training of the quantitative regression architecture. The validation mean absolute error and loss showed good performance, with little overfitting owing to the diversity of the training data set.

Fig E3. Confusion matrices of the models' classification of patches with an RE of 10% or greater versus patches with an RE less than 10%. A-C, Internal validation data set for the ResNet50, Xception, and Inception V3 models, respectively. D-F, Independent external test data set for the ResNet50, Xception, and Inception V3 models, respectively.

Table E1. Consistency assessment of the AICEP in the internal validation and external test data sets according to REpatch-actual and REslide-actual
Level | Internal validation: ICC consistency (95% CI) | Internal validation: ICC agreement (95% CI) | External test: ICC consistency (95% CI) | External test: ICC agreement (95% CI)
REpatch | 0.981 (0.979-0.983) | 0.981 (0.979-0.982) | 0.977 (0.975-0.979) | 0.976 (0.970-0.980)
REslide | 0.999 (0.997-1.000) | 0.999 (0.998-1.000) | 0.995 (0.985-0.998) | 0.993 (0.973-0.998)
ICC, Intraclass correlation coefficient.

Table E2. Comparison of time consumption between the AICEP and pathologists
Method | Mean time ± SD (min) | 95% CI (min)
REslide-predict | 5.4 ± 0.87 | 5.28-5.52
REslide-tm | 12.7 ± 2.78 | 12.31-13.09
REslide-actual | 148.6 ± 34.36 | 143.78-153.42
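For reference, the metrics behind the confusion matrices in Fig E3 and the reported AUCs can be computed as in this sketch. Specificity is the true-negative rate, TN / (TN + FP). The AUC here uses the rank interpretation (the probability that a randomly chosen positive patch scores higher than a randomly chosen negative one, with ties counted as one half), which equals the area under the empirical ROC curve; for large data sets, scikit-learn's `roc_auc_score` computes the same quantity more efficiently. Labels are assumed to be 1 for an RE of 10% or greater and 0 otherwise, and the toy data and the 0.45 threshold are arbitrary.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) for binary labels (1 = RE >= 10%, 0 = RE < 10%)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp, fn, fp, tn


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate = 1 - false-positive rate
    return sensitivity, specificity


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.3, 0.6, 0.5, 0.2, 0.3, 0.1]
y_pred = [int(s >= 0.45) for s in scores]       # arbitrary decision threshold
print(sensitivity_specificity(y_true, y_pred))  # (0.6666666666666666, 0.75)
print(roc_auc(y_true, scores))                  # 0.875
```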