Solubility
Computer science
Escherichia coli
Escherichia coli proteins
Chemistry
Computational biology
Biochemistry
Biology
Organic chemistry
Gene
Authors
Vineet Thumuluri, Hannah-Marie Martiny, José Juan Almagro Armenteros, Jesper Salomon, Henrik Nielsen, Alexander Rosenberg Johansen
Source
Journal: Bioinformatics
[Oxford University Press]
Date: 2021-11-23
Volume (issue), pages: 38 (4): 941-946
Citations: 72
Identifiers
DOI:10.1093/bioinformatics/btab801
Abstract
Motivation: Solubility and expression levels of proteins can be a limiting factor for large-scale studies and industrial production. By determining the solubility and expression directly from the protein sequence, the success rate of wet-lab experiments can be increased.

Results: In this study, we focus on predicting the solubility and usability for purification of proteins expressed in Escherichia coli directly from the sequence. Our model NetSolP is based on deep learning protein language models called transformers and we show that it achieves state-of-the-art performance and improves extrapolation across datasets. As we find current methods are built on biased datasets, we curate existing datasets by using strict sequence-identity partitioning and ensure that there is minimal bias in the sequences.

Availability and implementation: The predictor and data are available at https://services.healthtech.dtu.dk/service.php?NetSolP and the open-sourced code is available at https://github.com/tvinet/NetSolP-1.0.

Supplementary information: Supplementary data are available at Bioinformatics online.