Natural language processing
Identification (biology)
Artificial intelligence
Computer science
Biology
Botany
Authors
Ahmad Mortadi,Waleed Nazih,Mohamed I. Eldesouki,Yasser Hifny
Source
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Date: 2025-03-16
Abstract
Medical discharge summaries are vital healthcare documents that often contain Personally Identifiable Information (PII), raising concerns about privacy and regulatory compliance. This paper proposes an intelligent data de-identification approach to address this challenge. The approach applies Natural Language Processing (NLP), specifically Named Entity Recognition (NER), in a hybrid design that integrates Machine Learning (ML) models, Regular Expression (REGEX)-based recognizers, and extensive lists of names and addresses. The method aims to balance extracting valuable insights from data against safeguarding sensitive information. Evaluation against benchmarks demonstrates significant improvements in de-identification performance, particularly on discharge summaries. We present findings from evaluating our system on synthesized discharge summaries, the OntoNotes dataset, and the CoNLL-2003 dataset, demonstrating its effectiveness in anonymizing diverse medical text sources.
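
To make the hybrid idea in the abstract concrete, the following is a minimal sketch of how regex-based recognizers and a name list might be combined to mask PII in a sentence. It is not the authors' system: the patterns, the `NAME_LIST` entries, the placeholder tags, and the function names `find_spans` and `deidentify` are all hypothetical, and the ML-based NER component described in the abstract is omitted entirely.

```python
import re

# Hypothetical regex recognizers, for illustration only.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

# Hypothetical name list standing in for the "extensive lists" in the abstract.
NAME_LIST = {"John Smith", "Mary Jones"}


def find_spans(text):
    """Collect (start, end, label) spans from the regex and list recognizers."""
    spans = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    for name in NAME_LIST:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            spans.append((m.start(), m.end(), "NAME"))
    return spans


def deidentify(text):
    """Replace each detected span with a placeholder tag.

    Spans are processed left to right; when two spans overlap,
    the one that starts first (and is longest) wins.
    """
    spans = sorted(find_spans(text), key=lambda s: (s[0], -(s[1] - s[0])))
    out, cursor = [], 0
    for start, end, label in spans:
        if start < cursor:
            continue  # overlaps a span that was already replaced
        out.append(text[cursor:start])
        out.append(f"<{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


if __name__ == "__main__":
    note = "John Smith was discharged on 2024-01-05. Call 555-123-4567."
    print(deidentify(note))
```

In a fuller pipeline, spans predicted by an ML NER model would presumably be merged into the same span list before replacement, so that the overlap handling applies uniformly to all recognizers.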