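Suffix tree
Suffix
Genome
Computer science
Tree (set theory)
Compressed suffix array
Generalized suffix tree
Parallel algorithm
Human genome
Sequence (biology)
Theoretical computer science
Data structure
Parallel computing
Biology
Mathematics
Genetics
Gene
Programming language
Philosophy
Mathematical analysis
Linguistics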
Authors
Matteo Comin, Montse Farreras
Identifier
DOI:10.1089/cmb.2012.0256
Abstract
The construction of suffix trees for very long sequences is essential for many applications, and it plays a central role in the bioinformatics domain. With the advent of modern sequencing technologies, biological sequence databases have grown dramatically. The methodologies required to analyze these data have also become more complex every day, requiring fast queries to multiple genomes. In this article, we present parallel continuous flow (PCF), a parallel suffix tree construction method that is suitable for very long genomes. We tested our method on the suffix tree construction of the entire human genome, about 3 GB. We showed that PCF can scale gracefully as the size of the input genome grows. Our method can work with an efficiency of 90% with 36 processors and 55% with 172 processors. We can index the human genome in 7 minutes using 172 processes.
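To put the reported efficiency figures in perspective, the following is a minimal sketch of the arithmetic, assuming the textbook definition of parallel efficiency, E = T_1 / (p * T_p), with a single-processor run as the baseline. The abstract does not state which baseline the authors used, so the derived speedups (roughly 32x on 36 processors and 95x on 172) and the implied single-processor time are illustrative only. The function names are hypothetical and not part of PCF.

```python
def speedup_from_efficiency(efficiency: float, processors: int) -> float:
    """Speedup S = E * p, under the assumed definition E = T_1 / (p * T_p)."""
    return efficiency * processors


def implied_serial_minutes(parallel_minutes: float, efficiency: float,
                           processors: int) -> float:
    """Single-processor time T_1 = T_p * S implied by the same assumption."""
    return parallel_minutes * speedup_from_efficiency(efficiency, processors)


if __name__ == "__main__":
    # (efficiency, processor count) pairs quoted in the abstract.
    for eff, p in [(0.90, 36), (0.55, 172)]:
        s = speedup_from_efficiency(eff, p)
        print(f"{p} processors at {eff:.0%} efficiency: speedup of about {s:.1f}x")

    # The abstract reports 7 minutes with 172 processes; under the assumed
    # baseline this implies the following single-processor time.
    t1 = implied_serial_minutes(7, 0.55, 172)
    print(f"Implied single-processor time: about {t1:.0f} minutes")
```

If the authors measured efficiency against a multi-processor baseline instead, the implied single-processor time would differ, so the last figure should not be read as a reported result.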