Real-world data medical knowledge graph: construction and applications

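Computer science; Cluster analysis; Probabilistic logic; Graph; Data mining; Embedding; Clustering coefficient; Ranking (information retrieval); Medical diagnosis; Information retrieval; Machine learning; Artificial intelligence; Theoretical computer science; Medicine; Pathology
Authors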
Linfeng Li,Peng Wang,Jun Yan,Yao Wang,Simin Li,Jinpeng Jiang,Zhe Sun,Buzhou Tang,Tsung‐Hui Chang,Shenghui Wang,Yuting Liu
Source
Journal: Artificial Intelligence in Medicine [Elsevier BV]
Volume: 103, article 101817. Citations: 196
Identifier
DOI: 10.1016/j.artmed.2020.101817
Abstract

Medical knowledge graphs (KGs) are attracting attention from both academia and the healthcare industry because of their power in intelligent healthcare applications. In this paper, we introduce a systematic approach to building a medical KG from electronic medical records (EMRs), evaluated through both technical experiments and end-to-end application examples. The original data set contains 16,217,270 de-identified clinical visit records of 3,767,198 patients. The KG construction procedure comprises 8 steps: data preparation, entity recognition, entity normalization, relation extraction, property calculation, graph cleaning, related-entity ranking, and graph embedding. We propose a novel quadruplet structure to represent medical knowledge instead of the classical KG triplet. We also propose a novel related-entity ranking function that considers probability, specificity and reliability (PSR). In addition, the probabilistic translation on hyperplanes (PrTransH) algorithm is used to learn graph embeddings for the generated KG. A medical KG with 9 entity types, including disease and symptom, was established; it contains 22,508 entities and 579,094 quadruplets. Compared with the term frequency-inverse document frequency (TF/IDF) method, the normalized discounted cumulative gain (NDCG@10) increased from 0.799 to 0.906 with the proposed ranking function. Embedding representations for all entities and relations were learned and shown to be effective through disease clustering. The established systematic procedure can efficiently construct a high-quality medical KG from large-scale EMRs. The proposed ranking function PSR achieves the best performance under all relations, and the disease clustering result validates the efficacy of the learned embedding vectors as semantic representations of entities. Moreover, the resulting KG has found many successful applications owing to its statistics-based quadruplets.
where N_co^min is a minimum co-occurrence number and R is the basic reliability value. The reliability value measures how reliable the relationship between S_i and O_ij is. The rationale is that the higher the co-occurrence number N_co(S_i, O_ij), the more reliable the relationship. However, the reliability values of two relationships should not differ greatly when both of their co-occurrence numbers are very large. In our study, we set N_co^min = 10 and R = 1 after some experiments. For instance, if the co-occurrence numbers of three relationships are 1, 100 and 10000, their reliability values are 1, 2.96 and 5 respectively.
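The reliability formula itself is not reproduced in this excerpt. As a minimal sketch, the Python function below uses one logarithmic form that happens to reproduce the three worked values (1, 2.96 and 5) with N_co^min = 10 and R = 1. The expression log10(max(N_co - N_co^min, 0) + 1) + R is an assumption inferred from those numbers, not the authors' confirmed definition, and the names `reliability`, `n_co`, `n_co_min` and `base` are hypothetical.

```python
import math


def reliability(n_co: int, n_co_min: int = 10, base: float = 1.0) -> float:
    """Assumed reliability form: log10(max(n_co - n_co_min, 0) + 1) + base.

    This is inferred from the worked example in the text, not taken
    from the paper's equation.
    """
    # Co-occurrence counts at or below the minimum contribute nothing extra.
    excess = max(n_co - n_co_min, 0)
    # The logarithm dampens differences between very large counts.
    return math.log10(excess + 1) + base


if __name__ == "__main__":
    for n in (1, 100, 10000):
        print(n, round(reliability(n), 2))
    # prints: 1 1.0, then 100 2.96, then 10000 5.0
```

Under this assumed form, any count at or below N_co^min receives the basic value R, and a hundredfold increase in co-occurrence (100 to 10000) raises the score by only about 2, which matches the stated intent that very large counts should not produce very different reliability values.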