HudsonAlpha Institute for Biotechnology Stone Mountain, Georgia
Body of Abstract: Conserved noncoding sequences (CNS) are short (median 50 bp) stretches of non-protein-coding DNA. They are conserved at levels much higher than expected under neutral evolution, which implies that selection acts to maintain them. The main hypothesized function of CNS is to encode arrays of transcription factor binding sites that regulate gene expression. Here, I present a preliminary characterization of CNS across Rosaceae using 10 high-quality long-read genome assemblies. Rosaceae includes dozens of high-value crops, such as apple, pear, cherry, strawberry, and blackberry, so findings can translate across several horticultural communities. I generated whole-genome alignments of these 10 assemblies, which were selected to represent crop species and to span different whole-genome duplication histories. By comparing observed rates of sequence conservation against the neutral expectation, I annotated dozens of megabases of previously unannotated genomic regions as CNS. Associating CNS with protein-coding genes revealed enrichment for specific functions as well as effects on gene regulation. Finally, I investigated how CNS variation intersects with haplotypic variation, polyploidy, and, potentially, domestication.