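Protease
Bioinformatics
Computational biology
Computer science
Protein engineering
Artificial intelligence
Machine learning
Context (archaeology)
Sequence (biology)
Directed evolution
Biology
Genetics
Biochemistry
Mutant
Enzyme
Gene
Paleontology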
Authors
L. F. Huber,Tim Kucera,Simon Höllerer,Karsten Borgwardt,Sven Panke,Markus Jeschek
Identifier
DOI:10.1038/s41467-025-60622-7
Abstract
Protein engineering has recently seen tremendous transformation due to machine learning (ML) tools that predict structure from sequence at unprecedented precision. Predicting catalytic activity, however, remains challenging, restricting our capabilities to design protein sequences with desired catalytic function in silico. This predicament is mainly rooted in a lack of experimental methods capable of recording sequence-activity data in quantities sufficient for data-intensive ML techniques, and in the inefficiency of searches in the enormous sequence spaces inherent to proteins. Herein, we address both limitations in the context of engineering proteases with tailored substrate specificity. We introduce a DNA recorder for deep specificity profiling of proteases in Escherichia coli, which we demonstrate by testing 29,716 candidate proteases against up to 134 substrates in parallel. The resulting sequence-activity data on approximately 600,000 protease-substrate pairs not only reveal key sequence determinants governing protease specificity but also allow us to build a data-efficient deep learning model that accurately predicts protease sequences with desired on- and off-target activities. Moreover, we present epistasis-aware training set design as a generalizable strategy to streamline searches within enormous sequence spaces, which strongly increases model accuracy at a given experimental effort and is thus likely to have implications for protein engineering far beyond proteases.