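Sequence (biology)
Pairwise comparison
Context (archaeology)
Prior probability
Latent variable
Probabilistic logic
Computer science
Latent variable model
Bayes' theorem
Generative model
Generative grammar
Bayesian probability
Variation (astronomy)
Computational biology
Artificial intelligence
Biology
Machine learning
Genetics
Physics
Paleontology
Astrophysics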
Authors
Adam J. Riesselman, John Ingraham, Debora S. Marks
Source
Journal: Nature Methods
[Nature Portfolio]
Date: 2018-09-19
Volume (issue): 15 (10): 816-822
Citations: 568
Identifier
DOI: 10.1038/s41592-018-0138-4
Abstract
The functions of proteins and RNAs are defined by the collective interactions of many residues, and yet most statistical models of biological sequences consider sites nearly independently. Recent approaches have demonstrated benefits of including interactions to capture pairwise covariation, but leave higher-order dependencies out of reach. Here we show how it is possible to capture higher-order, context-dependent constraints in biological sequences via latent variable models with nonlinear dependencies. We found that DeepSequence (https://github.com/debbiemarkslab/DeepSequence), a probabilistic model for sequence families, predicted the effects of mutations across a variety of deep mutational scanning experiments substantially better than existing methods based on the same evolutionary data. The model, learned in an unsupervised manner solely on the basis of sequence information, is grounded with biologically motivated priors, reveals the latent organization of sequence families, and can be used to explore new parts of sequence space. DeepSequence is an unsupervised deep latent-variable model that predicts the effects of mutations on the basis of evolutionary sequence information.