Graduate Research Associate, PhD Candidate, University of Arizona, Tucson, Arizona
Body of Abstract: Quantifying transcripts is crucial for understanding how plants dynamically respond to stress, which is essential for future crop improvement efforts. RNA-seq is a genomic approach for quantifying messenger RNA transcripts. However, the time required to collect, process, and analyze RNA-seq samples hinders the large-scale application of this method in field trials. Various sensor technologies are being deployed on remote and proximal sensing platforms to study plant phenotypes in field trials. One such technology is hyperspectral sensing, and hyperspectral data have been used to predict important agronomic traits in several crop species. Despite this potential, no study has attempted to predict gene expression directly from hyperspectral data. Predictive models of gene expression based on hyperspectral data could alleviate the bottleneck in analyzing gene expression in large field trials and serve as a valuable selection index for breeding lines. In this study, we present a novel hyperspectral approach for predicting gene expression in cotton. We developed partial least squares regression models to predict the expression of individual transcripts, with one model per transcript, for 16,853 models per training phase. To assess the effect of irrigation treatment on prediction accuracy, we conducted three training phases: one using well-watered data only, one using water-limited data only, and one using the combined data from both treatments. A total of 50,559 models were trained across the three phases, with the combined models performing best. The top 30% of the combined models accurately predicted the expression of genes involved in a variety of biological processes and pathways, including cell cycle and growth, morphogenesis and differentiation, signaling pathways, metabolic processes, and biosynthesis of secondary metabolites.
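The modeling design described above (one partial least squares regression model per transcript, repeated for each of three training phases) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the abstract does not name the PLSR algorithm or software, so the sketch uses a small NumPy implementation of single-response PLS (PLS1, NIPALS variant). The data are synthetic, and the array sizes, the number of components, the treatment labels "WW" and "WL", and the function names `fit_pls1` and `predict_pls1` are all hypothetical.

```python
import numpy as np

def fit_pls1(X, y, n_components):
    """Fit single-response PLS regression (PLS1) with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # centered data, deflated in the loop
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                          # weight: covariance direction with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                       # no remaining covariance to extract
            break
        w /= norm
        t = Xr @ w                             # scores for this component
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        c = yr @ t / tt                        # y loading
        Xr = Xr - np.outer(t, p)               # deflate X
        yr = yr - c * t                        # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:                                  # degenerate case: predict the mean
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)     # coefficients in original band space
    return coef, x_mean, y_mean

def predict_pls1(model, X):
    """Predict expression from spectra with a model returned by fit_pls1."""
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

# Synthetic stand-in data (hypothetical sizes; the study modeled 16,853 transcripts).
rng = np.random.default_rng(0)
n_plots, n_bands, n_transcripts = 40, 30, 5
treatment = np.array(["WW"] * 20 + ["WL"] * 20)   # well-watered / water-limited
spectra = rng.normal(size=(n_plots, n_bands))
expression = (spectra @ rng.normal(size=(n_bands, n_transcripts))
              + 0.1 * rng.normal(size=(n_plots, n_transcripts)))

# Three training phases: each single irrigation treatment, then both combined.
phases = {
    "well_watered": treatment == "WW",
    "water_limited": treatment == "WL",
    "combined": np.ones(n_plots, dtype=bool),
}

# One model per (phase, transcript) pair.
models = {
    (phase, j): fit_pls1(spectra[mask], expression[mask, j], n_components=3)
    for phase, mask in phases.items()
    for j in range(n_transcripts)
}
print(len(models))  # 15 models: 3 phases x 5 transcripts
```

The same bookkeeping accounts for the totals reported above: 3 phases × 16,853 transcripts = 50,559 models. For real data, an established library implementation such as scikit-learn's `PLSRegression`, with the number of components chosen by cross-validation, would likely be preferable to this hand-rolled version.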