Reference genome
Biology
Genome
Genetics
Genomics
Comparative genomics
Population
Computational biology
Evolutionary biology
Gene
Demography
Sociology
Authors
Sheikh Bilal Ahmad, Ying Su, Yani Hao, Tayyaba Razzaq, R. Arshad, Yi Zhang, Yingchun Zhang, Xingyi Wang, Guizhou Huang, Xiangnian Su, Ting Hou, Chaochao Li, Xuanwen Yang, C Li, Zhenzhou Chu, Q. Wang, Yu Zhang, Zhongxin Jin, Qi Xu, Xiaodong Xu
Abstract
Most genomic studies start by mapping sequencing data to a reference genome. The quality of the reference genome assembly, its genetic relatedness to the studied population, and the mapping method employed directly affect variant calling accuracy and subsequent genomic analyses, and can introduce reference bias that leads to erroneous conclusions. However, the impacts of reference bias have received limited attention. This study compared population genomic analyses using four different reference genomes of mango (Mangifera indica): the two haploid assemblies of a haplotype-resolved telomere-to-telomere (T2T) genome assembly, a pangenome, and an older version of the reference genome available on NCBI. The choice of reference genome strongly affected mapping efficiency and led to notable differences in the genetic variants called, particularly structural variations (SVs). Phylogenetic analysis was more sensitive to the reference genome than genetic differentiation was. The results of population genomic analyses of artificial selection during domestication and of SV hotspot regions varied across reference genomes. Notably, the gene enrichment analyses showed significant differences in the top enriched biological processes depending on the reference genome used. Overall, the mango pangenome outperformed the other reference genomes across various metrics, followed by the T2T reference genomes, as they captured greater diversity and effectively reduced reference bias. Our findings highlight the role of the mango pangenome in reducing reference bias and suggest that reference genome selection is one of the most important factors in population genomic studies.