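Fitness landscape
Sequence (biology)
Sequence space
Representation (politics)
Protein design
Gaussian process
Computer science
Protein engineering
Fitness function
Bayesian probability
Directed evolution
Protein sequencing
Artificial intelligence
Machine learning
Thermostability
Gaussian distribution
Computational biology
Mathematics
Biology
Protein structure
Genetics
Peptide sequence
Genetic algorithm
Chemistry
Gene
Pure mathematics
Politics
Computational chemistry
Demography
Mutant
Political science
Biochemistry
Sociology
Enzyme
Banach space
Law
Population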
Authors
Philip A. Romero, Andreas Krause, Frances H. Arnold
Identifier
DOI:10.1073/pnas.1215251110
Abstract
Knowing how protein sequence maps to function (the “fitness landscape”) is critical for understanding protein evolution as well as for engineering proteins with new and useful properties. We demonstrate that the protein fitness landscape can be inferred from experimental data, using Gaussian processes, a Bayesian learning technique. Gaussian process landscapes can model various protein sequence properties, including functional status, thermostability, enzyme activity, and ligand binding affinity. Trained on experimental data, these models achieve unrivaled quantitative accuracy. Furthermore, the explicit representation of model uncertainty allows for efficient searches through the vast space of possible sequences. We develop and test two protein sequence design algorithms motivated by Bayesian decision theory. The first one identifies small sets of sequences that are informative about the landscape; the second one identifies optimized sequences by iteratively improving the Gaussian process model in regions of the landscape that are predicted to be optimized. We demonstrate the ability of Gaussian processes to guide the search through protein sequence space by designing, constructing, and testing chimeric cytochrome P450s. These algorithms allowed us to engineer active P450 enzymes that are more thermostable than any previously made by chimeragenesis, rational design, or directed evolution.
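The idea of fitting a Gaussian process to measured sequences and using its uncertainty to choose the next sequence to test can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the one-hot encoding, the squared-exponential kernel, the noise level, the upper-confidence-bound selection rule (one common choice; the abstract does not name the rule), and the toy thermostability values are all hypothetical.

```python
from itertools import product

import numpy as np


def one_hot(seqs, alphabet):
    """Encode equal-length sequences as flattened one-hot vectors."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    X = np.zeros((len(seqs), len(seqs[0]) * len(alphabet)))
    for n, s in enumerate(seqs):
        for pos, ch in enumerate(s):
            X[n, pos * len(alphabet) + index[ch]] = 1.0
    return X


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel (an assumed stand-in kernel)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_posterior(X_train, y_train, X_test, noise=1e-2, length_scale=1.0):
    """Posterior mean and variance of a zero-mean GP at X_test."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale)
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - (v**2).sum(0)  # prior variance k(x, x) is 1 for this kernel
    return mean, np.maximum(var, 0.0)


def ucb_pick(mean, var, beta=2.0):
    """Index of the candidate with the highest mean + beta * std."""
    return int(np.argmax(mean + beta * np.sqrt(var)))


# Hypothetical toy problem: chimeras of two parents "A" and "B" over 3 blocks.
alphabet = "AB"
candidates = ["".join(p) for p in product(alphabet, repeat=3)]
X_all = one_hot(candidates, alphabet)

train_idx = [candidates.index(s) for s in ("AAA", "BBB", "ABA")]
y_raw = np.array([50.0, 55.0, 52.0])  # made-up thermostability values
y_mean = y_raw.mean()

# Center the targets so the zero-mean GP prior is reasonable.
mean, var = gp_posterior(X_all[train_idx], y_raw - y_mean, X_all)
mean = mean + y_mean

next_idx = ucb_pick(mean, var)
next_seq = candidates[next_idx]  # sequence proposed for the next experiment
```

The `beta` parameter controls the trade-off in this sketch: a larger value favors sequences where the model is uncertain, while `beta=0` reduces the rule to picking the highest predicted value.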