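Workflow
Computer science
Genome-wide association study
Context (archaeology)
Modular design
Scalability
Data science
Variety (cybernetics)
Computational biology
Data mining
Database
Single-nucleotide polymorphism
Biology
Genetics
Artificial intelligence
Programming language
Gene
Genotype
Paleontology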
Authors
Adnan Kivanc Corut, Jason G. Wallace
Identifier
DOI: 10.1101/2023.07.10.548365
Abstract
Genome-wide association studies (GWAS) have been widely used to identify genetic variation associated with complex traits. Despite their success and popularity, traditional GWAS approaches come with a variety of limitations. For this reason, newer GWAS methods have been developed, including the use of pan-genomes instead of a single reference genome and the use of markers beyond single-nucleotide polymorphisms, such as structural variants and k-mers. The k-mer-based GWAS approach in particular has gained attention from researchers in recent years. However, these new methodologies can be complicated and challenging to implement. Here we present kGWASflow, a modular, user-friendly, and scalable workflow for performing GWAS using k-mers. We adapted the existing kmersGWAS method into an easier and more accessible workflow using management tools such as Snakemake and Conda, eliminating the challenges caused by missing dependencies and version conflicts. kGWASflow increases the reproducibility of the kmersGWAS method by automating each step with Snakemake and by using containerization tools such as Docker. The workflow includes supplementary components such as quality control, read trimming, and generation of summary statistics. kGWASflow also offers post-GWAS analysis options to identify the genomic location and context of trait-associated k-mers. kGWASflow can be applied to any organism and requires minimal programming skills. kGWASflow is freely available on GitHub (https://github.com/akcorut/kGWASflow) and Bioconda (https://anaconda.org/bioconda/kgwasflow).
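To make the idea of k-mers as GWAS markers concrete, here is a minimal, illustrative Python sketch. It is not kGWASflow's or kmersGWAS's code. It assumes reads are available as in-memory strings, collapses each k-mer with its reverse complement into a canonical form, builds a k-mer by sample presence/absence table, drops k-mers present in too few or too many samples, and ranks the rest by a plain Pearson correlation with the phenotype. All function names (canonical, kmer_set, presence_matrix, rank_kmers) are hypothetical. A real k-mer GWAS works on counts from dedicated k-mer counters over large read sets and must account for population structure and relatedness, which this toy ranking deliberately omits.

```python
"""Toy k-mer presence/absence association sketch (illustrative only)."""
from statistics import mean, pstdev

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_set(reads: list[str], k: int) -> set[str]:
    """Collect canonical k-mers from reads, skipping windows with non-ACGT bases."""
    found = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            window = read[i:i + k]
            if set(window) <= set("ACGT"):
                found.add(canonical(window))
    return found


def presence_matrix(samples: dict[str, list[str]], k: int, min_count: int = 1):
    """Build a k-mer -> 0/1 row (one entry per sample, sorted by name).

    Keeps only k-mers present in at least `min_count` samples and absent
    from at least `min_count` samples (a simple frequency filter, assumed here).
    """
    names = sorted(samples)
    sets = {name: kmer_set(samples[name], k) for name in names}
    all_kmers = sorted(set().union(*sets.values()))
    n = len(names)
    matrix = {}
    for km in all_kmers:
        row = [1 if km in sets[name] else 0 for name in names]
        if min_count <= sum(row) <= n - min_count:
            matrix[km] = row
    return names, matrix


def correlation(x: list[int], y: list[float]) -> float:
    """Pearson correlation between a 0/1 presence vector and phenotype values."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def rank_kmers(samples, phenotypes, k, min_count=1):
    """Rank k-mers by |correlation| with the phenotype (no structure correction)."""
    names, matrix = presence_matrix(samples, k, min_count)
    y = [phenotypes[name] for name in names]
    scores = {km: correlation(row, y) for km, row in matrix.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    reads = {"s1": ["AAAAC"], "s2": ["AAAAG"], "s3": ["CCCCG"], "s4": ["CCCCT"]}
    pheno = {"s1": 1.0, "s2": 1.0, "s3": 0.0, "s4": 0.0}
    for kmer, r in rank_kmers(reads, pheno, k=4)[:2]:
        print(kmer, round(r, 3))
```

In the toy data, the k-mer AAAA occurs only in the two high-phenotype samples, so it ranks at the top; the canonical-form step matters because a read can come from either DNA strand.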