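Information extraction
Computer science
Text messaging
Natural language processing
Artificial intelligence
Multiple sclerosis
Lesion
Medicine
Information retrieval
Pattern recognition (psychology)
Pathology
World Wide Web
Psychiatry
Authors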
Qiang Fang,Richard Choo,Yuping Duan,Yuxia Duan,Hongming Chen,Yun Gao,Yunyan Zhang,Zhiqun Mao
Abstract
Purpose: To investigate how bidirectional encoder representations from transformers (BERT)-based models help extract treatment response information from free-text radiology reports. Materials and methods: This study involved 400 brain MRI reports from 115 participants with multiple sclerosis. New MRI lesion activity used to assess treatment responsiveness, namely new or enlarging T2 (newT2) and enhancing T1 (enhanceT1) lesions, was identified using named entity recognition with BERT. Two other associated entities were also identified: the remaining brain MRI lesions (regT2) and lesion location. Report sentences containing any of the 4 entities were labeled for model development, 2568 in total. Four recognized BERT models were investigated, each integrated with a conditional random field for lesion versus location classification and trained using variable sample sizes (500–2000 sentences). Regularity was then applied for lesion subtyping. Model evaluation used a flexible F1 score, among other metrics. Results: Clinical-BERT performed the best. It achieved the best testing flexible F1 score of 0.721 in lesion and location classification, 0.741 in lesion-only classification, and 0.771 in regT2 subtyping. With growing sample sizes, only Clinical-BERT performed increasingly better; it also had the best area under the curve of 0.741 in lesion classification when trained using 2000 sentences. PubMed-BERT achieved the best testing flexible F1 score of 0.857 in location-only classification, and 0.846 and 0.657 in subtyping newT2 and enhanceT1, respectively. Conclusion: Based on a small sample size, our methods, especially Clinical-BERT, demonstrate the potential for extracting critical treatment-related information from free-text radiology reports.
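The abstract does not define its flexible F1 score. As a rough illustration only, the sketch below assumes one common relaxed variant for entity recognition, in which a predicted entity counts as correct when it overlaps a gold entity of the same type instead of requiring exact boundaries. The function names, the span format, and the overlap rule are assumptions made for this example and are not taken from the study.

```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, label).
# The labels used below mirror the entity names in the abstract.
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True when two spans share a label and at least one token."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def relaxed_f1(gold: List[Span], pred: List[Span]) -> float:
    """Overlap-based F1 (an assumed stand-in for a 'flexible' F1).

    Precision: share of predicted spans that overlap a gold span.
    Recall: share of gold spans that overlap a predicted span.
    """
    matched_pred = sum(1 for p in pred if any(overlaps(p, g) for g in gold))
    matched_gold = sum(1 for g in gold if any(overlaps(g, p) for p in pred))
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence-level annotations, invented for illustration.
gold_spans = [(0, 3, "newT2"), (5, 7, "location")]
pred_spans = [(1, 3, "newT2"), (5, 7, "regT2"), (8, 9, "location")]
score = relaxed_f1(gold_spans, pred_spans)
```

In this toy case only the partially overlapping newT2 prediction is credited, so the relaxed score rewards near-miss boundaries while still penalizing wrong labels and spurious spans.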