Assistant Professor, Western Carolina University, Sylva, North Carolina
Body of Abstract: High-quality genome reference sequences have been developed for Gossypium species. A striking similarity among the sequenced species is that, in each, roughly 15% of annotated genes are currently designated “protein of unknown function”. Sequence and structure analysis tools and databases now give students the opportunity to perform an initial characterization of functional features in these uncharacterized proteins. Additional insight can be gained by targeting multiple members of a protein family and comparing across whole families to identify family-specific features as well as unique diversifications within subfamilies. As part of a CURE course, each student studied subfamily members of an uncharacterized protein family from the cotton genome, and findings were compared across families. Family and subfamily groups were determined from PhyloGenes PANTHER trees; sequence analyses predicted subfamily domain architectures and motifs; AlphaFold structure analyses compared surface and binding-site features; and transcriptomics data were mined for expression profiles. Although members of a protein family shared sequence patterns, there were distinct differences between subfamilies. Beyond diversified motifs, additional domains were found in subfamilies of several families, such as the RER and EPRL families. Structure analyses showed slight variations in conserved surfaces and helped resolve inconsistent transmembrane predictions. Some differences between subfamilies suggest the need for further division of domain designations.