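Imputation (statistics)
Pooling
Genotyping
Computer science
Data mining
Genome-wide association study
Computational biology
Statistics
Genotype
Biology
Single-nucleotide polymorphism
Missing data
Genetics
Machine learning
Artificial intelligence
Mathematics
Gene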
Authors
Camille Clouard, Kristiina Ausmees, Carl Nettelblad
Identifier
DOI:10.1186/s12859-022-04974-7
Abstract
Despite continuing technological advances, the cost of large-scale genotyping of a large number of samples can be prohibitive. The purpose of this study is to design a cost-saving strategy for SNP genotyping. We suggest using pooling, a group testing technique, to reduce the number of SNP arrays needed. We believe that this will be of the greatest importance for non-model organisms, where resources such as cost-efficient large-scale chips and high-quality reference genomes are more limited, for example in wildlife monitoring and in plant and animal breeding, but the approach is in essence species-agnostic. The proposed approach consists of grouping and mixing individual DNA samples into pools before testing these pools on bead-chips, such that the number of pools is less than the number of individual samples. We present a statistical estimation algorithm, based on the pooling outcomes, for inferring, marker by marker, the most likely genotype of every sample in each pool. Finally, we input these estimated genotypes into existing imputation algorithms. We compare the imputation performance on pooled data of the Beagle algorithm and of a local likelihood-aware phasing algorithm, closely modeled on MaCH, that we implemented.
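To make the pooling idea concrete, here is a minimal sketch of how pooled readouts can already pin down many genotypes before any statistical estimation or imputation. It is not the authors' algorithm. It assumes a hypothetical square block layout in which each row and each column of samples forms one pool, and an idealized readout in which a pool reports 0 or 2 only when all its samples are homozygous for the same allele and 1 otherwise. The names `read_pool`, `pool_outcomes`, `decode`, and `MISSING` are illustrative only.

```python
MISSING = -1  # genotype that the pool outcomes alone cannot determine


def read_pool(genotypes):
    """Idealized pool readout (assumption): genotypes are coded 0/1/2 as the
    count of alternate alleles; the pool reads 0 or 2 only if every sample
    is homozygous for the same allele, otherwise 1 (both alleles seen)."""
    if all(g == 0 for g in genotypes):
        return 0
    if all(g == 2 for g in genotypes):
        return 2
    return 1


def pool_outcomes(block):
    """Pool every row and every column of a square block of samples."""
    rows = [read_pool(row) for row in block]
    cols = [read_pool(col) for col in zip(*block)]
    return rows, cols


def decode(rows, cols):
    """Deterministic decoding: a sample inherits the genotype of any
    homozygous pool it belongs to; otherwise it stays MISSING."""
    decoded = [[MISSING] * len(cols) for _ in rows]
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            if r in (0, 2):
                decoded[i][j] = r
            elif c in (0, 2):
                decoded[i][j] = c
    return decoded


# Hypothetical example: 16 samples tested with 8 pools at one marker,
# with a single heterozygous carrier at row 1, column 2.
block = [[0, 0, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
rows, cols = pool_outcomes(block)
decoded = decode(rows, cols)
n_missing = sum(g == MISSING for row in decoded for g in row)
```

In this toy case only the carrier's genotype is left undetermined. Entries left as `MISSING` are the ones that a statistical estimation step and a downstream imputation tool would then have to resolve.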