Associate Professor, University of Georgia, Athens, Georgia
Body of Abstract: Despite their success, genome-wide association studies (GWAS) come with a variety of limitations. Newer GWAS methods have been developed, including the use of markers beyond single-nucleotide polymorphisms, such as pan-genomes, structural variation, and k-mers. These new methodologies can be complicated and challenging to implement. We therefore aimed to develop a modular, user-friendly, and scalable workflow to perform GWAS using k-mers.
We adapted the method of Voichek et al. (2020) into a more accessible workflow using the workflow manager Snakemake and the package manager Conda, which eliminates the challenges caused by missing dependencies and version conflicts. Several additional components were also added to the workflow, including trimming, diagnostic analyses, and results summaries. Post-GWAS analysis options include mapping k-mers to a reference genome, finding the source reads of k-mers, and assembling those source reads and mapping them to a reference genome.
To demonstrate our workflow, we reproduced the results from a simple E. coli antibiotic resistance analysis of 241 strains (Earle et al. 2016) and a more complex analysis across 261 inbred maize lines (Flint-Garcia et al. 2005) for the phenotypes kernel color, leaf angle, seed oil, cob color, and flowering time.
The modular and customizable kGWASflow greatly increases the accessibility of k-mer-based GWAS while allowing for expansion and improvement in the future. Future expansions could include other k-mer-based GWAS methods, such as those using k-mer occurrence counts. Another possible application of kGWASflow is finding sex-determining regions in a genome with the kmersGWAS method.
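The difference between k-mer presence/absence and k-mer occurrence counts can be illustrated with a minimal Python sketch. It is not part of kGWASflow or the kmersGWAS library: the function names (`canonical`, `kmer_counts`, `kmer_table`) and the toy reads are hypothetical, and a real analysis would use a dedicated k-mer counter on full sequencing data. The sketch assumes canonical k-mers, meaning that a k-mer and its reverse complement are treated as the same marker.

```python
from collections import Counter

# Translation table for complementing DNA bases (illustrative helper).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_counts(reads, k):
    """Count canonical k-mers across all reads of one sample."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def kmer_table(samples, k, use_counts=False):
    """Build a k-mer by sample table.

    With use_counts=False each cell is 1 (present) or 0 (absent);
    with use_counts=True each cell is the occurrence count.
    """
    per_sample = {name: kmer_counts(reads, k) for name, reads in samples.items()}
    names = sorted(per_sample)
    all_kmers = sorted(set().union(*per_sample.values()))
    table = {}
    for kmer in all_kmers:
        row = [per_sample[name][kmer] for name in names]
        table[kmer] = row if use_counts else [int(c > 0) for c in row]
    return names, table


# Toy input: two hypothetical samples with one short read each.
toy_samples = {"a": ["ACGTA"], "b": ["ACG"]}
names, presence = kmer_table(toy_samples, k=3)
_, counts = kmer_table(toy_samples, k=3, use_counts=True)
```

In the presence/absence table every sample carries only a 0 or 1 per k-mer, whereas the count table keeps copy-number information that a count-based association method could exploit.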