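Proteome
Database search engine
Shotgun proteomics
Sequence database
Computer science
Computational biology
Annotation
Proteomics
Protein sequencing
Identification (biology)
Proteogenomics
Database
Bioinformatics
Data mining
Biology
Peptide sequence
Genomics
Information retrieval
Search engine
Artificial intelligence
Genome
Genetics
Gene
Plant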
Authors
Sung Kyu Robin Park,Titus Jung,Peter Thuy-Boun,Ana Y. Wang,John R. Yates,Dennis W. Wolan
Identifier
DOI:10.1021/acs.jproteome.8b00722
Abstract
We designed a metaproteomic analysis method (ComPIL) to accommodate the ever-increasing number of sequences against which experimental shotgun proteomics spectra could be accurately and rapidly queried. Our objective was to create these large databases for the analysis of complex metasamples with unknown composition, including those derived from human, animal, and environmental microbiomes. The amount of high-throughput sequencing data has substantially increased since our original database was assembled in 2014. Here, we present a rebuild of the ComPIL libraries composed of updated publicly disseminated sequence data as well as a modified version of the search engine ProLuCID-ComPIL optimized for querying experimental spectra. ComPIL 2.0 consists of 113 million protein records and roughly 4.8 billion unique tryptic peptide sequences and is 2.3 times the size of our original version. We searched a data set collected from a healthy human gut microbiome proteomic sample against both versions; ComPIL 2.0 yielded substantially more unique identified peptides and proteins than the first ComPIL version. The high confidence of protein identification and accuracy demonstrated by the use of ComPIL 2.0 may encourage the method's application for large-scale proteomic annotation of complex protein systems.