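Annotation
Pipeline (software)
Genome
Database
Computer science
Sequence database
Gene annotation
Data mining
Computational biology
Biology
Artificial intelligence
Gene
Genetics
Programming language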
Authors
Dai Chunxiao, Yuanyuan Qu, Weize Wu, Shuzhen Li, Zhuo Chen, Shengyang Lian, Jiawei Jing
Source
Journal: Water Research
[Elsevier BV]
Date: 2023-03-02
Volume: 235, article 119814
Cited by: 18
Identifiers
DOI:10.1016/j.watres.2023.119814
Abstract
Quorum sensing (QS) has attracted great attention because of its important role in bacterial interactions and its relevance to water management. With the development of high-throughput sequencing technology, a dedicated database for annotating QS-related sequences is urgently needed. Here, Hidden Markov Model (HMM) profiles for 38 types of QS-related proteins were built from a total of 4,024 collected seed sequences. Based on both homolog searches and keyword confirmation against the non-redundant database, we established a QS-related protein (QSP) database that includes 809,721 protein sequences and 186,133 nucleotide sequences and is available for download at: https://github.com/chunxiao-dcx/QSP. The entries were classified into 38 types and 315 subtypes across 91 bacterial phyla. Furthermore, an automatic annotation pipeline named QSAP was developed for rapid annotation, classification, and abundance quantification of QSP-like sequences from sequencing data. The pipeline offers two homolog alignment strategies, Diamond (Blastp) or HMMER (Hmmscan), as well as a data-cleansing function that retains either a subset or the union of the hits. The pipeline was tested on 14 metagenomic samples from various water environments, including activated sludge, deep-sea sediments, estuary water, and reservoir water. QSAP is freely available for academic use in the code repository at: https://github.com/chunxiao-dcx/QSAP. Together, the database and pipeline provide a useful tool for QS-related sequence annotation across a wide range of projects and will increase our understanding of QS communication in aquatic environments.
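The abstract does not describe how QSAP combines its two alignment strategies or computes abundance. As a rough illustration only, the Python sketch below parses DIAMOND tabular output (`--outfmt 6`, default 12 columns) and HMMER `hmmscan --tblout` output and keeps the best hit per query. It then merges the two sets of calls as either a "subset" or a "union" and reports each QS type's share of the annotated sequences. Several details are assumptions for illustration, not QSAP's actual code or options: the ID-to-type table, the profile names (LuxI, LuxR, LuxS), the E-value cutoff, the tie-breaking rule, and the reading of "subset" as "queries both tools assign to the same type".

```python
from collections import Counter

# Hypothetical lookup from QSP database entry IDs to QS protein types; a real
# run would take this mapping from the database's own annotation files.
SUBJECT_TO_TYPE = {"qsp_0001": "LuxI", "qsp_0002": "LuxR", "qsp_0003": "LuxS"}


def diamond_best_hits(tsv_text, max_evalue=1e-5):
    """Best-scoring QS type per query from BLAST-style tabular output
    (DIAMOND --outfmt 6 default columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore)."""
    best = {}
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        # Drop weak hits and subjects with no known QS type.
        if evalue > max_evalue or subject not in SUBJECT_TO_TYPE:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (SUBJECT_TO_TYPE[subject], bitscore)
    return {q: qs_type for q, (qs_type, _) in best.items()}


def hmmscan_best_hits(tblout_text, max_evalue=1e-5):
    """Best-scoring profile per query from hmmscan --tblout output.
    Column 1 is the profile (target), column 3 the sequence (query),
    columns 5 and 6 the full-sequence E-value and score. Assumes each
    profile is named after its QS type (hypothetical naming)."""
    best = {}
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()
        profile, query = cols[0], cols[2]
        evalue, score = float(cols[4]), float(cols[5])
        if evalue > max_evalue:
            continue
        if query not in best or score > best[query][1]:
            best[query] = (profile, score)
    return {q: qs_type for q, (qs_type, _) in best.items()}


def merge_calls(diamond_calls, hmm_calls, mode="subset"):
    """Combine per-query type calls from the two strategies (assumed semantics).
    'subset': keep only queries that both tools assign to the same type.
    'union': keep queries hit by either tool; on a type conflict the DIAMOND
    call wins, an arbitrary choice made for this sketch."""
    if mode == "subset":
        return {q: t for q, t in diamond_calls.items() if hmm_calls.get(q) == t}
    if mode == "union":
        # Later keys override earlier ones, so DIAMOND calls take precedence.
        return {**hmm_calls, **diamond_calls}
    raise ValueError(f"unknown mode: {mode!r}")


def type_abundance(calls):
    """Relative abundance of each QS type as its fraction of annotated sequences."""
    counts = Counter(calls.values())
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}
```

Under these assumptions, `mode="subset"` keeps only calls that both searches support, which favors precision, while `mode="union"` keeps every call from either search, which favors sensitivity.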