Authors
Frank M. You, Chunfang Zheng, John Joseph Zagariah Daniel, Pingchuan Li, Bunyamin Taran, Sylvie Cloutier
Abstract
Genomic selection (GS) is a core strategy in modern breeding programs, yet the rapid expansion of statistical, machine-learning (ML), and deep-learning (DL) models has made systematic evaluation and practical deployment increasingly challenging. To address these issues, we developed MultiGS, a unified and user-friendly framework that integrates linear, ML, DL, hybrid, and ensemble GS models within a standardized and computationally efficient workflow. MultiGS is implemented through two complementary pipelines: MultiGS-R, a Java/R pipeline implementing 12 statistical and ML models, and MultiGS-P, a Python pipeline integrating 17 models, including five linear models, three ML approaches, and nine recently developed DL architectures implemented within the framework. We benchmarked MultiGS using wheat, maize, and flax datasets representing contrasting prediction scenarios. Wheat and maize were evaluated using random training-test splits within the same population, conditions well suited to assessing model capacity and scalability. Under these scenarios, several DL, hybrid, and ensemble models achieved prediction accuracies comparable to those of RR-BLUP and consistently exceeded those of GBLUP. In contrast, the flax dataset represented a true across-population prediction scenario with limited training set size and strong population structure. In this challenging context, classical linear models provided stable baselines, while a subset of DL architectures, particularly graph-based models and BLUP-integrated hybrids, demonstrated comparatively improved generalization across populations. Comparisons with previously published DL tools showed that MultiGS models achieved comparable or improved prediction accuracies at lower computational cost, enabling routine retraining and large-scale evaluation.
Overall, MultiGS informs scenario-specific model selection and provides a practical platform for deploying genomic prediction under realistic breeding conditions. The software is freely available on GitHub (https://github.com/AAFC-ORDC-Crop-Bioinfomatics/MultiGS).