Manipulating large-scale Arabidopsis microarray expression data: identifying dominant expression patterns and biological process enrichment.
ABSTRACT: A series of large-scale Arabidopsis thaliana microarray experiments profiling genome-wide expression across developmental stages, cell types, and environmental conditions has produced tremendous amounts of gene expression data. This gene expression is the output of complex transcriptional regulatory networks and provides a starting point for identifying the dominant transcriptional regulatory modules acting within the plant. Highly co-expressed groups of genes are likely to be regulated by similar transcription factors. Finding these co-expressed groups can therefore reduce the dimensionality of complex expression data to a set of dominant transcriptional regulatory modules. Determining the biological significance of these patterns is an informatics challenge that has required the development of new methods. Using these new methods, we can begin to understand the biological information contained within large-scale expression data sets.
- Source available from: Philip N Benfey
ABSTRACT: Because proteins are the major functional components of cells, knowledge of their cellular localization is crucial to gaining an understanding of the biology of multicellular organisms. We have generated a protein expression map of the Arabidopsis root providing the identity and cell type-specific localization of nearly 2,000 proteins. Grouping proteins into functional categories revealed unique cellular functions and identified cell type-specific biomarkers. Cellular colocalization provided support for numerous protein-protein interactions. With a binary comparison, we found that RNA and protein expression profiles are weakly correlated. We then performed peak integration at cell type-specific resolution and found an improved correlation with transcriptome data using continuous values. We performed GeLC-MS/MS (in-gel tryptic digestion followed by liquid chromatography-tandem mass spectrometry) proteomic experiments on mutants with ectopic and no root hairs, providing complementary proteomic data. Finally, among our root hair-specific proteins we identified two unique regulators of root hair development.
Proceedings of the National Academy of Sciences 03/2012; 109(18):6811-8. · 9.74 Impact Factor
- ABSTRACT: Plants need to continuously adjust their transcriptome in response to various stresses that lead to inhibition of photosynthesis and the deprivation of cellular energy. This adjustment is triggered in part by a coordinated re-programming of the energy-associated transcriptome to slow down photosynthesis and activate other energy-promoting gene networks. Therefore, understanding the stress-related transcriptional networks of genes belonging to energy-associated pathways is of major importance for engineering stress tolerance. In a bioinformatics approach developed by our group, termed 'gene coordination', we previously divided genes encoding enzymes and transcription factors in Arabidopsis thaliana into three clusters, displaying altered coordinated transcriptional behaviors in response to multiple biotic and abiotic stresses (Plant Cell, 23, 2011, 1264). Enrichment analysis further indicated that genes controlling energy-associated metabolism operate as a compound network in response to stress. In the present paper, we describe in detail the network association of genes belonging to six central energy-associated pathways in each of the three clusters described in our previous paper. Our results expose extensive stress-associated intra- and inter-pathway interactions between genes from these pathways, indicating that genes encoding proteins involved in energy-associated metabolism are expressed in a highly coordinated manner. We also provide examples showing that this approach can be further utilized to elucidate candidate genes for stress tolerance and functions of isozymes.
The Plant Journal 01/2012; 70(6):954-66. · 6.58 Impact Factor
- ABSTRACT: Significance: Seeds are complex structures composed of the embryo, endosperm, and seed coat. Despite their importance for food, fiber, and fuel, the cellular processes that characterize different regions of the seed are not known. We profiled gene activity genome-wide in every organ, tissue, and cell type of Arabidopsis seeds from fertilization through maturity. The resulting mRNA datasets provide unique insights into the cellular processes that occur in understudied seed regions, revealing unexpected overlaps in the functional identities of seed regions and enabling predictions of gene regulatory networks. This dataset is an essential resource for studies of seed biology.
Proceedings of the National Academy of Sciences 01/2013; · 9.74 Impact Factor