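Biology
Raw data
DNA sequencing
Computational biology
Computer science
Set (abstract data type)
Data set
Data quality
Genomics
Throughput
Data mining
Genome
Genetics
Artificial intelligence
Gene
Operations management
Economics
Metric (unit)
Telecommunications
Programming language
Wireless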
Source
Journal: Heredity
[Springer Nature]
Date: 2016-10-19
Volume (issue): 118 (2): 111-124
Citations: 98
Abstract
Sequencing has revolutionized biology by permitting the analysis of genomic variation at an unprecedented resolution. High-throughput sequencing is fast and inexpensive, making it accessible for a wide range of research topics. However, the produced data contain subtle but complex types of errors, biases and uncertainties that impose several statistical and computational challenges to the reliable detection of variants. To tap the full potential of high-throughput sequencing, a thorough understanding of the data produced as well as the available methodologies is required. Here, I review several commonly used methods for generating and processing next-generation resequencing data, discuss the influence of errors and biases together with their resulting implications for downstream analyses and provide general guidelines and recommendations for producing high-quality single-nucleotide polymorphism data sets from raw reads by highlighting several sophisticated reference-based methods representing the current state of the art.