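Genome-wide association study
Proportion (ratio)
Computer science
Machine learning
Artificial intelligence
Computational biology
Gene
Biology
Genetics
Geography
Single-nucleotide polymorphism
Cartography
Genotype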
Authors
Jingtian Wang, Huiling Chen, Guoping Shu, Miaomiao Zhao, Anlong Zheng, Xingzhi Chang, Guiqi Li, Yibo Wang, Yuan‐Ming Zhang
Identifier
DOI:10.1016/j.xplc.2025.101385
Abstract
Genetic dissection and breeding by design for polygenic traits remain challenging. Meeting these challenges requires identifying as many genes as possible, including the key genes. To this end, a framework combining genome-wide scanning with machine learning was developed and integrated with advanced computational techniques, yielding a novel algorithm, Fast3VmrMLM, for mining more genes and key genes underlying polygenic traits in the era of big data and artificial intelligence. The algorithm was also extended to identify haplotype (Fast3VmrMLM-Hap) and molecular (Fast3VmrMLM-mQTL) variants. In simulation studies, Fast3VmrMLM outperformed existing methods in detecting dominant, small, and rare variants, and took 3.30 and 5.43 hours (20 threads) to analyze the 18K rice and UK Biobank-scale datasets, respectively. For 14 traits in the 18K rice dataset, Fast3VmrMLM identified 211 known and 384 candidate genes, more than FarmCPU (100 known genes). Fast3VmrMLM also identified 26 known and 24 candidate genes for 7 yield-related traits in a maize NC II design, and Fast3VmrMLM-mQTL identified two known soybean genes around structural variants. The new two-step framework outperformed genome-wide scanning alone. In breeding by design, a genetic network constructed by machine learning from all known and candidate genes in this study identified 21 key genes for rice yield-related traits, while all the associated markers yielded high prediction accuracies in rice (0.7443) and maize (0.8492) and excellent hybrid combinations. A new breeding-by-design strategy based on the identified key genes was also proposed. This study provides an excellent method for gene mining and breeding by design.
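
The two-step idea in the abstract (a genome-wide scan followed by a machine-learning step) can be illustrated with a toy sketch. The code below is not Fast3VmrMLM and does not reproduce its model. It assumes a plain marginal-correlation scan as step one and a coordinate-descent LASSO as a stand-in for the machine-learning step two, run on simulated genotypes. All function names, parameter values, and the simulated data are hypothetical.

```python
import random

random.seed(1)

def center(v):
    """Subtract the mean from a list of numbers."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scan(genotypes, phenotype, keep):
    """Step 1 (assumed): rank markers by squared correlation with the
    phenotype and keep the `keep` highest-ranked marker indices."""
    y = center(phenotype)
    yy = dot(y, y)
    scores = []
    for j, g in enumerate(genotypes):
        x = center(g)
        xx = dot(x, x)
        r2 = 0.0 if xx == 0 or yy == 0 else dot(x, y) ** 2 / (xx * yy)
        scores.append((r2, j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:keep]]

def lasso(genotypes, phenotype, candidates, lam, sweeps=200):
    """Step 2 (assumed stand-in): coordinate-descent LASSO on the candidate
    markers, minimising (1/2n)*||y - Xb||^2 + lam*||b||_1."""
    y = center(phenotype)
    n = len(y)
    X = {j: center(genotypes[j]) for j in candidates}
    beta = {j: 0.0 for j in candidates}
    resid = y[:]
    for _ in range(sweeps):
        for j in candidates:
            x = X[j]
            xx = dot(x, x)
            if xx == 0:
                continue
            # Correlation of marker j with the residual, adding back its own fit.
            rho = dot(x, resid) + beta[j] * xx
            # Soft-thresholding update.
            new = max(abs(rho) - lam * n, 0.0) / xx
            if rho < 0:
                new = -new
            if new != beta[j]:
                diff = new - beta[j]
                resid = [r - diff * xi for r, xi in zip(resid, x)]
                beta[j] = new
    return {j: b for j, b in beta.items() if b != 0.0}

# Simulated toy data: 300 markers coded 0/1/2 for 200 individuals,
# with two hypothetical causal markers (indices 10 and 150).
n, m = 200, 300
genotypes = [[(random.random() < 0.5) + (random.random() < 0.5)
              for _ in range(n)] for _ in range(m)]
effects = {10: 1.0, 150: -0.8}
phenotype = [sum(b * genotypes[j][i] for j, b in effects.items())
             + random.gauss(0, 0.5) for i in range(n)]

candidates = scan(genotypes, phenotype, keep=20)
selected = lasso(genotypes, phenotype, candidates, lam=0.1)
print(sorted(selected.items()))
```

In this sketch the scan only narrows the marker set. The multi-marker step then fits the retained markers jointly, which is one plausible reading of why a two-step procedure might behave differently from scanning alone.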