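Imputation (statistics)
Genome-wide association study
Context (archaeology)
Data quality
Genetic association
Sample size determination
Population
Biology
Quality assurance
Data mining
Data science
Computer science
Genetics
Statistics
Missing data
Medicine
Mathematics
Engineering
Gene
Single-nucleotide polymorphism
Machine learning
Paleontology
Metric (unit)
Pathology
Genotype
External quality assessment
Environmental health
Operations management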
Authors
Van Q. Truong, Jakob A. Woerner, Tess Cherlin, Yuki Bradford, Anastasia Lucas, Chelsea C. Okeh, Manu Shivakumar, Daniel Hui, Rachit Kumar, Milton Pividori, Susan Jones, Abigail C. Bossa, Stephen D. Turner, Marylyn D. Ritchie, Shefali S. Verma
Abstract
Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of many complex diseases. Regardless of the context, the practical utility of this information ultimately depends on the quality of the data used for statistical analyses. Quality control (QC) procedures for GWAS are constantly evolving. Here, we enumerate some of the challenges in QC of genotyped GWAS data and describe approaches to genotype imputation of a sample dataset, along with post-imputation quality assurance, that minimize potential bias and error in GWAS results. We discuss common issues associated with QC of GWAS data (genotyped and imputed), including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We provide detailed guidelines along with a sample dataset to suggest current best practices and discuss areas of ongoing and future research. © 2022 Wiley Periodicals LLC.
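Marker quality is one of the QC issues listed in the abstract. As a minimal sketch, the per-SNP metrics that GWAS marker QC commonly filters on (call rate, minor allele frequency, and a Hardy-Weinberg equilibrium test) can be computed as shown below. This assumes genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for a missing call. The thresholds (call rate 0.98, MAF 0.01, HWE p 1e-6) are illustrative defaults, not values taken from this protocol. The HWE test here is the 1-degree-of-freedom chi-square approximation, not the exact test that PLINK's `--hwe` filter uses. The function names and SNP IDs are hypothetical.

```python
import math
from typing import Dict, List, Optional

# Assumption: genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
Genotype = Optional[int]


def snp_qc_stats(calls: List[Genotype]) -> Dict[str, float]:
    """Return call rate, minor allele frequency and an HWE chi-square p-value for one SNP."""
    observed = [g for g in calls if g is not None]
    call_rate = len(observed) / len(calls) if calls else 0.0
    n = len(observed)
    if n == 0:
        return {"call_rate": call_rate, "maf": 0.0, "hwe_p": 1.0}

    n_hom_ref = observed.count(0)
    n_het = observed.count(1)
    n_hom_alt = observed.count(2)

    # Allele frequencies estimated from the observed (non-missing) genotypes
    alt_freq = (n_het + 2 * n_hom_alt) / (2 * n)
    maf = min(alt_freq, 1.0 - alt_freq)

    # Expected genotype counts under Hardy-Weinberg equilibrium: p^2, 2pq, q^2
    p, q = 1.0 - alt_freq, alt_freq
    expected = [n * p * p, n * 2 * p * q, n * q * q]
    obs_counts = [n_hom_ref, n_het, n_hom_alt]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_counts, expected) if e > 0)

    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    return {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p}


def passes_marker_qc(
    calls: List[Genotype],
    min_call_rate: float = 0.98,  # illustrative threshold (assumption)
    min_maf: float = 0.01,        # illustrative threshold (assumption)
    min_hwe_p: float = 1e-6,      # illustrative threshold (assumption)
) -> bool:
    """Keep a SNP only if it clears all three marker-quality filters."""
    s = snp_qc_stats(calls)
    return (
        s["call_rate"] >= min_call_rate
        and s["maf"] >= min_maf
        and s["hwe_p"] >= min_hwe_p
    )


if __name__ == "__main__":
    # Hypothetical SNPs for illustration only
    snps = {
        "rs_example_1": [0, 1, 2, 1, 0, 1, 1, 0, 2, 1],           # well called, in HWE
        "rs_example_2": [0, 0, 0, 0, None, None, 0, 0, 0, 0],     # 80% call rate, monomorphic
        "rs_example_3": [1] * 100,                                # all heterozygous: HWE violation
    }
    for snp_id, calls in snps.items():
        print(snp_id, passes_marker_qc(calls), snp_qc_stats(calls))
```

In practice these filters are applied with dedicated tools on PLINK or VCF files rather than in pure Python. The sketch only makes explicit what each metric measures and why a SNP with excess heterozygosity (as in `rs_example_3`) is flagged as a likely genotyping error.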