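Directed evolution
Sequence space
Sequence (biology)
Directed molecular evolution
Composition (language)
Series (stratigraphy)
Protein engineering
Function (biology)
Protein sequencing
Chemical space
Computer science
Enzyme
Training (meteorology)
Computational biology
Artificial intelligence
Biology
Bioinformatics
Genetics
Mathematics
Peptide sequence
Biochemistry
Gene
Geography
Mutant
Linguistics
Paleontology
Meteorology
Philosophy
Banach space
Pure mathematics
Drug discovery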
Authors
Yutaka Saitô, Misaki Oikawa, Takumi Sato, Hikaru Nakazawa, Tomoyuki Ito, Tomoshi Kameda, Koji Tsuda, Mitsuo Umetsu
Identifier
DOI:10.1101/2021.08.13.456323
Abstract
Machine learning (ML) is becoming an attractive tool in mutagenesis-based protein engineering because of its ability to design a variant library containing proteins with a desired function. However, it remains unclear how ML guides directed evolution in sequence space depending on the composition of training data. Here, we present an ML-guided directed evolution study of an enzyme to investigate the effects of a known “highly positive” variant (i.e., a variant known to have high enzyme activity) in training data. We performed two separate series of ML-guided directed evolution of Sortase A, with and without a known highly positive variant called 5M in the training data. In each series, two rounds of ML were conducted: variants predicted by the first round were experimentally evaluated and used as additional training data for the second-round prediction. The improvements in enzyme activity were comparable between the two series, both achieving enzyme activity 2.2–2.5 times higher than 5M. Intriguingly, the sequences of the improved variants were largely different between the two series, indicating that ML guided the directed evolution to distinct regions of sequence space depending on the presence or absence of the highly positive variant in the training data. This suggests that the sequence diversity of improved variants can be expanded not only by conventional ML using the whole training data, but also by ML using a subset of the training data, even when it lacks highly positive variants. In summary, this study demonstrates the importance of regulating the composition of training data in ML-guided directed evolution.