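Computer science
Quality (philosophy)
Data science
Data quality
Scope (computer science)
Data mining
Engineering
Operations management
Epistemology
Philosophy
Metric (unit)
Programming language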
Authors
Maximilian Sprang,Matteo Krüger,Miguel A. Andrade-Navarro,Jean-Fred Fontaine
Source
Journal: Life Science Alliance
Date: 2021-08-30
Volume (issue): 4 (11): e202101113
Times cited: 1
Identifiers
DOI:10.26508/lsa.202101113
Abstract
More and more next-generation sequencing (NGS) data are made available every day. However, the quality of these data is not always guaranteed. Available quality control tools require profound knowledge to correctly interpret the multiplicity of quality features. Moreover, it is usually difficult to know whether quality features are relevant in all experimental conditions. Therefore, the NGS community would benefit greatly from condition-specific, data-driven guidelines derived from many publicly available experiments, which reflect routinely generated NGS data. In this work, we have characterized well-known quality guidelines and related features in large datasets and concluded that they are too limited for assessing the quality of a given NGS file accurately. Therefore, we present new data-driven guidelines derived from the statistical analysis of many public datasets using quality features calculated by common bioinformatics tools. Thanks to this approach, we confirm the high relevance of genome mapping statistics to assess the quality of the data, and we demonstrate the limited scope of some quality features that are not relevant in all conditions. Our guidelines are available at https://cbdm.uni-mainz.de/ngs-guidelines .
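To illustrate what "condition-specific, data-driven guidelines" can mean in practice, here is a minimal sketch, not the authors' method: it assumes a single quality feature (for example, the percentage of uniquely mapped reads, one kind of genome mapping statistic) has already been computed for many reference files, groups those values by experimental condition, and takes a lower percentile per condition as a threshold. The function names `percentile`, `derive_guidelines` and `flag`, the nearest-rank percentile rule, and all numbers below are hypothetical choices for illustration only, not data or parameters from the paper.

```python
import math
from collections import defaultdict


def percentile(values, q):
    """Nearest-rank percentile (q in [0, 100]) of a non-empty list of numbers."""
    ordered = sorted(values)
    # Nearest-rank method: rank = ceil(q/100 * n), clamped so it is at least 1.
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def derive_guidelines(records, q=10):
    """Build a per-condition threshold from (condition, feature_value) pairs.

    Hypothetical rule: the threshold is the q-th percentile of the reference
    values observed for that condition.
    """
    by_condition = defaultdict(list)
    for condition, value in records:
        by_condition[condition].append(value)
    return {cond: percentile(vals, q) for cond, vals in by_condition.items()}


def flag(guidelines, condition, value):
    """Return 'low' if value falls below the condition's threshold, else 'ok'."""
    return "low" if value < guidelines[condition] else "ok"


# Toy reference values (hypothetical % uniquely mapped reads, not real data).
reference = [
    ("RNA-seq", 92.0), ("RNA-seq", 88.5), ("RNA-seq", 75.0), ("RNA-seq", 95.1),
    ("ChIP-seq", 70.2), ("ChIP-seq", 81.0), ("ChIP-seq", 65.4),
]
guidelines = derive_guidelines(reference, q=50)
print(guidelines)                             # {'RNA-seq': 88.5, 'ChIP-seq': 70.2}
print(flag(guidelines, "RNA-seq", 80.0))      # low
print(flag(guidelines, "ChIP-seq", 80.0))     # ok
```

The same value (80.0) is flagged in one condition but accepted in the other, which is the point of deriving thresholds per condition rather than applying one global cutoff. A real guideline would likely use several features and a more careful statistical model than a single percentile.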