Genome
Computer science
Computational biology
Chromosome
Sequence (biology)
Genetics
Biology
Algorithm
Gene
Authors
Shanshan Gao,Quang Tran,Vinhthuy Phan
Source
Journal: EPiC Series in Computing
Date: 2019-03-18
Volume/issue: 60: 65-55
Citations: 1
Abstract
Sequencing depth, which refers to the expected coverage of nucleotides by reads, is computed on the assumption that reads are synthesized uniformly across chromosomes. In reality, read coverage across genomes is not uniform. Although a coverage of 10x, for example, means that a nucleotide is covered 10 times on average, in certain parts of a genome nucleotides are covered much more or much less often. One factor that influences coverage is the ability of a read aligner to align reads to genomes. If a part of a genome is complex, e.g. because it contains many repeats, aligners might have trouble aligning reads to that region, resulting in low coverage. We introduce a systematic approach to predict the effective coverage of genomes by short-read aligners. The effective coverage of a chromosome is defined as the actual number of bases covered by reads. We show that this quantity is highly correlated with the repeat complexity of genomes. Specifically, we show that the more repeats a genome has, the less it is covered by short reads. We demonstrate this strong correlation with five popular short-read aligners in three species: Homo sapiens, Zea mays, and Glycine max. Additionally, we show that, compared to other measures of sequence complexity, repeat complexity is the most appropriate. This work makes it possible to predict the effective coverage of genomes at a given sequencing depth.
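To make the distinction between the two quantities concrete, the following is a minimal sketch rather than the authors' method. It assumes that sequencing depth is estimated with the usual uniform-sampling formula (number of reads times read length, divided by genome length) and that effective coverage can be read as the count of distinct positions touched by at least one aligned read. The function names and the half-open interval representation of alignments are hypothetical choices made for illustration.

```python
from typing import Iterable, Tuple


def nominal_depth(num_reads: int, read_length: int, genome_length: int) -> float:
    """Expected depth under the uniform-sampling assumption: N * L / G."""
    return num_reads * read_length / genome_length


def covered_bases(alignments: Iterable[Tuple[int, int]], genome_length: int) -> int:
    """Count positions covered by at least one aligned read.

    Assumption: each alignment is a 0-based, half-open (start, end) interval
    on a single chromosome of length genome_length.
    """
    covered = 0
    current_end = 0  # rightmost position already counted
    for start, end in sorted(alignments):
        start = max(start, current_end)  # skip bases already counted
        end = min(end, genome_length)    # clip to the chromosome
        if end > start:
            covered += end - start
            current_end = end
    return covered


# Toy example: three aligned reads on a 1,000-base chromosome.
reads = [(0, 100), (50, 150), (300, 400)]
depth = nominal_depth(num_reads=3, read_length=100, genome_length=1000)
bases = covered_bases(reads, genome_length=1000)
print(depth, bases, bases / 1000)
```

In this toy input the overlapping reads contribute each shared base only once, which is why the number of covered bases can fall well below what the nominal depth would suggest.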