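De Bruijn graph
De Bruijn sequence
Compressed suffix array
Genome
Generalized suffix tree
Computer science
Suffix
Suffix array
Genomics
Time complexity
Enumeration
Graph
Phylogenomics
Reference genome
Traverse
Data structure
Suffix tree
Theoretical computer science
Algorithm
Computational biology
Biology
Combinatorics
Phylogenetic tree
Genetics
Mathematics
Gene
Linguistics
Geodesy
Programming language
Philosophy
Clade
Geography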
Authors
Shoshana Marcus,Hayan Lee,Michael C. Schatz
Source
Journal:Bioinformatics
[Oxford University Press]
Date:2014-11-13
Volume (issue):30 (24): 3476-3483
Citations:131
Identifier
DOI:10.1093/bioinformatics/btu756
Abstract
We explore deep topological relationships between suffix trees and compressed de Bruijn graphs and introduce an algorithm, splitMEM, that directly constructs the compressed de Bruijn graph in time and space linear to the total number of genomes for a given maximum genome size. We introduce suffix skips to traverse several suffix links simultaneously and use them to efficiently decompose maximal exact matches into graph nodes. We demonstrate the utility of splitMEM by analyzing the nine-strain pan-genome of Bacillus anthracis and up to 62 strains of Escherichia coli, revealing their core-genome properties.
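To make the object the abstract refers to concrete, here is a minimal sketch of a compressed de Bruijn graph built the naive way: collect all k-mers, link consecutive ones, then merge non-branching chains into single nodes. This is not splitMEM, which derives the graph from a suffix tree using suffix skips. The function name `compressed_de_bruijn` and its return shape are assumptions made for illustration, and reverse complements are ignored.

```python
def compressed_de_bruijn(sequences, k):
    """Naive compressed de Bruijn graph of several sequences.

    Returns (nodes, edges): nodes is a set of strings, each a merged
    non-branching chain of k-mers; edges is a set of (node, node) pairs.
    """
    kmers, succ, pred = set(), {}, {}
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
        # Consecutive k-mers in a sequence are joined by a directed edge.
        for i in range(len(seq) - k):
            a, b = seq[i:i + k], seq[i + 1:i + k + 1]
            succ.setdefault(a, set()).add(b)
            pred.setdefault(b, set()).add(a)

    def chained(a, b):
        # Edge a -> b is mergeable if it is a's only out-edge and b's only in-edge.
        return a != b and len(succ.get(a, ())) == 1 and len(pred.get(b, ())) == 1

    def starts_chain(v):
        ps = pred.get(v, ())
        return len(ps) != 1 or not chained(next(iter(ps)), v)

    owner, internal, nodes = {}, set(), set()

    def walk(start):
        # Extend a chain forward from start for as long as edges are mergeable.
        path, cur = [start], start
        while len(succ.get(cur, ())) == 1:
            nxt = next(iter(succ[cur]))
            if nxt == start or not chained(cur, nxt):
                break
            internal.add((cur, nxt))
            path.append(nxt)
            cur = nxt
        label = path[0] + "".join(p[-1] for p in path[1:])
        nodes.add(label)
        for p in path:
            owner[p] = label

    for v in sorted(kmers):
        if starts_chain(v):
            walk(v)
    # Whatever is left lies on an isolated non-branching cycle.
    for v in sorted(kmers):
        if v not in owner:
            walk(v)

    edges = {(owner[a], owner[b])
             for a, bs in succ.items() for b in bs
             if (a, b) not in internal}
    return nodes, edges
```

With the toy inputs `ATCGGA` and `TTCGGC` and `k = 3`, the shared 3-mers `TCG` and `CGG` collapse into one node `TCGG`, flanked by the strain-specific nodes `ATC`, `TTC`, `GGA` and `GGC`. This approach stores every k-mer explicitly, which is the cost the suffix-tree construction described in the abstract is designed to avoid.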