Improved pathogenicity prediction for rare human missense variants

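Missense mutation, Pathogenicity, Inference, Computer science, Artificial intelligence, Machine learning, Annotation, Exploit, Limit (mathematics), Feature (linguistics), Personalized medicine, Computational biology, Data mining, Genetics, Biology, Mutation, Mathematics, Gene, Microbiology, Mathematical analysis, Linguistics, Philosophy, Computer security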

Authors

Yingzhou Wu, Hanqing Liu, Roujia Li, Song Sun, Jochen Weile, Frederick P. Roth

Source

Journal: American Journal of Human Genetics [Elsevier BV]
Date: 2021-09-21 Volume (Issue): 108 (10): 1891-1906 Cited by: 105

Links

cell.com nih.gov nih.gov doi.org

Identifiers

DOI: 10.1016/j.ajhg.2021.08.012

Abstract

The success of personalized genomic medicine depends on our ability to assess the pathogenicity of rare human variants, including the important class of missense variation. There are many challenges in training accurate computational systems, e.g., in finding the balance between quantity, quality, and bias in the variant sets used as training examples and avoiding predictive features that can accentuate the effects of bias. Here, we describe VARITY, which judiciously exploits a larger reservoir of training examples with uncertain accuracy and representativity. To limit circularity and bias, VARITY excludes features informed by variant annotation and protein identity. To provide a rationale for each prediction, we quantified the contribution of features and feature combinations to the pathogenicity inference of each variant. VARITY outperformed all previous computational methods evaluated, identifying at least 10% more pathogenic variants at thresholds achieving high (90% precision) stringency.
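
The abstract's headline comparison is the fraction of pathogenic variants recovered at a score threshold that still achieves 90% precision. As a rough illustration of that kind of metric, the sketch below computes recall at a fixed precision from predicted scores and binary labels. It is not the authors' evaluation code: the function name `recall_at_precision`, the tie handling, and the toy data are all assumptions made for this example.

```python
def recall_at_precision(scores, labels, min_precision=0.9):
    """Highest recall over all score thresholds with precision >= min_precision.

    scores: predicted pathogenicity scores (higher = more likely pathogenic).
    labels: 1 for pathogenic, 0 for benign (hypothetical reference labels).
    Returns 0.0 if no threshold reaches the requested precision.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one pathogenic (positive) example")

    # Rank variants from highest to lowest score; each distinct score
    # is treated as a candidate threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    best_recall = 0.0
    true_pos = 0
    i = 0
    while i < len(ranked):
        # Variants with tied scores are called together at one threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            true_pos += ranked[j][1]
            j += 1
        precision = true_pos / j          # j variants are called pathogenic
        if precision >= min_precision:
            best_recall = max(best_recall, true_pos / total_pos)
        i = j
    return best_recall


# Toy data, invented purely for illustration.
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
toy_labels = [1, 1, 1, 0, 1, 0, 1, 0]
print(recall_at_precision(toy_scores, toy_labels, min_precision=0.9))
```

On the toy data, the top three calls are all pathogenic, so three of the five pathogenic variants are recovered before precision drops below 90%. Comparing two predictors by this number at the same precision level is one way to read a claim of the form "more pathogenic variants identified at 90% precision".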
