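Bioinformatics
Computer science
Training set
Machine learning
Artificial intelligence
Enzyme
Transformer
Experimental data
Deep learning
Computational biology
Chemistry
Biochemistry
Biology
Mathematics
Physics
Voltage
Statistics
Gene
Quantum mechanics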
Authors
Alexander Kroll, Sahasra Ranjan, Martin K. M. Engqvist, Martin J. Lercher
Identifier
DOI:10.1038/s41467-023-38347-2
Abstract
For most proteins annotated as enzymes, it is unknown which primary and/or secondary reactions they catalyze. Experimental characterizations of potential substrates are time-consuming and costly. Machine learning predictions could provide an efficient alternative, but are hampered by a lack of information regarding enzyme non-substrates, as available training data comprises mainly positive examples. Here, we present ESP, a general machine-learning model for the prediction of enzyme-substrate pairs with an accuracy of over 91% on independent and diverse test data. ESP can be applied successfully across widely different enzymes and a broad range of metabolites included in the training data, outperforming models designed for individual, well-studied enzyme families. ESP represents enzymes through a modified transformer model, and is trained on data augmented with randomly sampled small molecules assigned as non-substrates. By facilitating easy in silico testing of potential substrates, the ESP web server may support both basic and applied science.
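The augmentation idea mentioned in the abstract (adding randomly sampled small molecules as assumed non-substrates to a dataset that contains mainly positive examples) can be illustrated with a minimal Python sketch. This is not the ESP pipeline: the function name `augment_with_negatives`, the parameter `n_neg_per_pos`, and the toy enzyme and molecule names are all hypothetical, and the sketch assumes that any enzyme-molecule pair absent from the positive set may be labeled as a non-substrate.

```python
import random


def augment_with_negatives(positives, molecules, n_neg_per_pos=3, seed=0):
    """Return (enzyme, molecule, label) triples: 1 = known substrate,
    0 = randomly sampled molecule assumed to be a non-substrate.

    Hypothetical helper for illustration only; not taken from ESP.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    taken = set(positives)             # pairs that must not be reused
    pool = sorted(set(molecules))      # deterministic candidate order
    dataset = [(enz, mol, 1) for enz, mol in positives]

    for enz, _ in positives:
        # Candidates: molecules not already paired with this enzyme,
        # neither as a known substrate nor as an earlier sampled negative.
        candidates = [m for m in pool if (enz, m) not in taken]
        k = min(n_neg_per_pos, len(candidates))
        for mol in rng.sample(candidates, k):
            taken.add((enz, mol))
            dataset.append((enz, mol, 0))
    return dataset


# Toy data (made up for this sketch).
positives = [("enzA", "glucose"), ("enzB", "pyruvate")]
molecules = ["glucose", "pyruvate", "ATP", "citrate", "lactate"]
data = augment_with_negatives(positives, molecules, n_neg_per_pos=2)
```

Here each positive pair contributes up to `n_neg_per_pos` negatives for the same enzyme, so the positive-to-negative ratio is controlled by a single parameter. Some sampled molecules may in fact be true substrates that were never tested, which is an inherent limitation of this kind of assumed labeling.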