Boosting (machine learning)
Computer science
Protein engineering
Machine learning
Artificial intelligence
Data mining
Biology
Biochemistry
Enzyme
Authors
Hoi Yee Chu, John H.C. Fong, Dawn G L Thean, Peng Zhou, Francis Fung, Yuanhua Huang, Alan S.L. Wong
Source
Journal: Cell Systems
[Elsevier]
Date: 2024-02-01
Volume (issue): pages: 15 (2): 193-203.e6
Times cited: 1
Identifiers
DOI:10.1016/j.cels.2024.01.002
Abstract
A strategy that obtains the greatest number of best-performing variants with the least experimental effort across the vast combinatorial mutational landscape would have enormous utility in boosting resource producibility for protein engineering. Toward this goal, we present a simple and effective machine learning-based strategy that outperforms other state-of-the-art methods. Our strategy integrates zero-shot prediction and multi-round sampling to direct active learning, experimentally testing only a few predicted top variants in each round. We find that four rounds of low-N pick-and-validate sampling of 12 variants for machine learning yielded the best accuracy, up to 92.6%, in selecting the true top 1% of variants in combinatorial mutant libraries, whereas two rounds of 24 variants can also be used. We demonstrate our strategy by successfully discovering high-performance protein variants from diverse families, including CRISPR-based genome editors, supporting its generalizable application to protein engineering tasks. A record of this paper's transparent peer review process is included in the supplemental information.
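The abstract describes the loop only at a high level: a zero-shot score seeds the first pick, a few top-predicted variants are measured, a model is retrained on all measurements, and the cycle repeats. The following is a minimal sketch of that general pattern, not the authors' implementation. Everything in it is an assumption for illustration: a hypothetical 4-site toy library, a synthetic fitness landscape standing in for wet-lab measurements, a noisy copy of the true fitness standing in for a zero-shot predictor, closed-form ridge regression as the learning model, and `top1_hit` as an assumed proxy rather than the paper's accuracy definition.

```python
import itertools
import numpy as np

# Hypothetical toy combinatorial library: 4 mutated sites, 6 allowed residues each.
N_SITES, N_AA = 4, 6
rng = np.random.default_rng(0)

def one_hot(variants):
    """Encode each variant (tuple of residue indices) as a flat one-hot vector."""
    X = np.zeros((len(variants), N_SITES * N_AA))
    for i, v in enumerate(variants):
        for site, aa in enumerate(v):
            X[i, site * N_AA + aa] = 1.0
    return X

variants = list(itertools.product(range(N_AA), repeat=N_SITES))
X = one_hot(variants)

# Assumed synthetic "true" fitness: additive effects plus random pairwise epistasis.
additive = rng.normal(size=X.shape[1])
pair = rng.normal(scale=0.5, size=(X.shape[1], X.shape[1]))
true_fitness = X @ additive + np.einsum("ij,jk,ik->i", X, pair, X)

# Assumed stand-in for a zero-shot predictor: true fitness plus heavy noise.
zero_shot = true_fitness + rng.normal(scale=true_fitness.std(), size=len(variants))

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept; returns a predictor."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda Xn: (Xn - x_mean) @ w + y_mean

def pick_top(scores, measured, k):
    """Return indices of the k highest-scoring variants not yet measured."""
    order = np.argsort(-scores)
    return [int(i) for i in order if int(i) not in measured][:k]

def run_campaign(n_rounds=4, batch=12, alpha=1.0):
    """Multi-round pick-and-validate loop seeded by the zero-shot score."""
    features = np.column_stack([X, zero_shot])  # one-hot plus zero-shot prior as a feature
    measured = set()
    scores = zero_shot  # round 1 ranks purely by the zero-shot prior
    for _ in range(n_rounds):
        measured.update(pick_top(scores, measured, batch))
        idx = np.array(sorted(measured))
        # "Validate": look up the synthetic fitness in place of a wet-lab assay.
        model = fit_ridge(features[idx], true_fitness[idx], alpha)
        scores = model(features)
    return measured, scores

def top1_hit(measured):
    """Assumed metric: is the best measured variant inside the true top 1%?"""
    k_top = max(1, int(0.01 * len(variants)))
    true_top = set(np.argsort(-true_fitness)[:k_top].tolist())
    best = max(measured, key=lambda i: true_fitness[i])
    return best in true_top

if __name__ == "__main__":
    for rounds, batch in [(4, 12), (2, 24)]:
        measured, _ = run_campaign(rounds, batch)
        print(rounds, "rounds x", batch, "variants -> best in true top 1%:", top1_hit(measured))
```

Both settings named in the abstract spend the same budget of 48 measured variants (4 × 12 and 2 × 24). They differ only in how often the model is retrained before the next pick. The result printed for this toy landscape says nothing about the paper's reported 92.6% figure.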