ABSTRACT: Alfalfa (Medicago sativa L.) is the primary forage legume crop species in the United States and plays essential economic and ecological roles in agricultural systems across the country. Modern alfalfa is the result of hybridization between tetraploid M. sativa ssp. sativa and M. sativa ssp. falcata. Due to its large and complex genome, there are few genomic resources available for alfalfa improvement.
A de novo transcriptome assembly from two alfalfa subspecies, M. sativa ssp. sativa (B47) and M. sativa ssp. falcata (F56), was developed using Illumina RNA-seq technology. Transcripts from roots, nitrogen-fixing root nodules, leaves, flowers, elongating stem internodes, and post-elongation stem internodes were assembled into the Medicago sativa Gene Index 1.2 (MSGI 1.2), representing 112,626 unique transcript sequences. Nodule-specific transcripts and transcripts involved in cell wall biosynthesis were identified. Statistical analyses identified 20,447 transcripts differentially expressed between the two subspecies. Pair-wise comparisons of each tissue combination identified 58,932 sequences differentially expressed in B47 and 69,143 sequences differentially expressed in F56. Comparing transcript abundance in floral tissues of B47 and F56 identified expression differences in sequences involved in anthocyanin and carotenoid synthesis, which determine flower pigmentation. Single nucleotide polymorphisms (SNPs) unique to each M. sativa subspecies (110,241) were identified.
The Medicago sativa Gene Index 1.2 increases the expressed sequence data available for alfalfa by ninefold and can be expanded as additional experiments are performed. The MSGI 1.2 transcriptome sequences, annotations, expression profiles, and SNPs were assembled into the Alfalfa Gene Index and Expression Database (AGED) at http://plantgrn.noble.org/AGED/, a publicly available genomic resource for alfalfa improvement and legume research.
ABSTRACT: Background
Common bean (Phaseolus vulgaris) is grown throughout the world and comprises roughly 50% of the grain legumes consumed worldwide. Despite this, genetic resources for common bean have been lacking. Next-generation sequencing has facilitated our investigation of the gene expression profiles associated with biologically important traits in common bean. An increased understanding of gene expression in common bean will improve our understanding of gene expression patterns in other legume species.
Combining recently developed genomic resources for Phaseolus vulgaris, including predicted gene calls, with RNA-Seq technology, we measured the gene expression patterns from 24 samples collected from seven tissues at developmentally important stages and from three nitrogen treatments. Gene expression patterns throughout the plant were analyzed to better understand changes due to nodulation, seed development, and nitrogen utilization. We have identified 11,010 genes differentially expressed with a fold change ≥ 2 and a P-value < 0.05 between different tissues at the same time point, 15,752 genes differentially expressed within a tissue due to changes in development, and 2,315 genes expressed only in a single tissue. These analyses identified 2,970 genes with expression patterns that appear to be directly dependent on the source of available nitrogen. Finally, we have assembled these data in a publicly available database, The Phaseolus vulgaris Gene Expression Atlas (Pv GEA), http://plantgrn.noble.org/PvGEA/. Using the website, researchers can query gene expression profiles of their gene of interest, search for genes expressed in different tissues, or download the dataset in tabular form.
These data provide the basis for a gene expression atlas, which will facilitate functional genomic studies in common bean. Analysis of this dataset has identified genes important in regulating seed composition and has increased our understanding of nodulation and of the impact of the nitrogen source on assimilation and distribution throughout the plant.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-866) contains supplementary material, which is available to authorized users.
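The thresholds quoted in the abstract above (fold change ≥ 2 and P-value < 0.05) can be read as a simple filter over per-gene results. The following is a minimal Python sketch of such a filter, not the authors' pipeline. The record layout, gene identifiers, expression values and p-values are all hypothetical, the pseudocount is an added assumption to avoid division by zero, and "fold change ≥ 2" is assumed to apply in either direction.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene_id: str    # hypothetical identifier
    expr_a: float   # normalized expression in tissue A (made-up value)
    expr_b: float   # normalized expression in tissue B (made-up value)
    p_value: float  # p-value from some upstream statistical test


def fold_change(expr_a: float, expr_b: float, pseudocount: float = 1.0) -> float:
    """Symmetric fold change (always >= 1); the pseudocount is an assumption."""
    a, b = expr_a + pseudocount, expr_b + pseudocount
    return max(a, b) / min(a, b)


def is_differentially_expressed(r: GeneResult, min_fc: float = 2.0,
                                max_p: float = 0.05) -> bool:
    """Apply both thresholds: large enough change and small enough p-value."""
    return fold_change(r.expr_a, r.expr_b) >= min_fc and r.p_value < max_p


results = [
    GeneResult("gene_1", 99.0, 9.0, 0.001),   # 10-fold change, significant
    GeneResult("gene_2", 19.0, 14.0, 0.001),  # change too small
    GeneResult("gene_3", 4.0, 49.0, 0.20),    # large change, not significant
    GeneResult("gene_4", 9.0, 39.0, 0.01),    # 4-fold change, significant
]
de_genes = [r.gene_id for r in results if is_differentially_expressed(r)]
```

Only genes that pass both criteria are kept, so a large but statistically unsupported change (gene_3) is excluded just as a significant but small one (gene_2) is.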
ABSTRACT: Analysis of genome-scale gene networks (GNs) using large-scale gene expression data provides unprecedented opportunities to uncover gene interactions and regulatory networks involved in various biological processes and developmental programs, accelerating the discovery of new knowledge about biological processes, pathways and systems. The widely used context likelihood of relatedness (CLR) method, which scores the similarity of gene pairs using mutual information (MI), is one of the accurate methods currently available for inferring GNs. However, the MI-based reverse engineering method can achieve satisfactory performance only when the sample size exceeds one hundred. This in turn limits its application to GN construction from expression data sets with small sample sizes. We developed a high-performance web server, DeGNServer, to reverse engineer and decipher genome-scale networks. It extends the CLR method by integrating different correlation methods that are suitable for analyzing data sets ranging from moderate to large scale, such as expression profiles with tens to hundreds of microarray hybridizations, and implements all analysis algorithms using parallel computing techniques to infer gene-gene associations at extraordinary speed. In addition, we integrated the SNBuilder and GeNa algorithms for subnetwork extraction and functional module discovery. DeGNServer is publicly and freely available online.
BioMed Research International 11/2013; 2013:856325. DOI:10.1155/2013/856325 · 2.71 Impact Factor
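For readers unfamiliar with CLR, the idea of scoring each gene pair against the background of both genes can be sketched as follows. This is a minimal illustration of the commonly described form of CLR (a z-score per gene, clipped at zero, combined for the pair), not DeGNServer's implementation. The input matrix may hold mutual information or any other pairwise similarity, and the toy values, the exclusion of the diagonal and the use of the population standard deviation are assumptions made here.

```python
import math
from statistics import mean, pstdev


def clr_scores(sim):
    """Turn a symmetric similarity matrix into CLR-style pair scores."""
    n = len(sim)
    # Background for each gene: its similarities to all *other* genes.
    mu, sigma = [], []
    for i in range(n):
        row = [sim[i][j] for j in range(n) if j != i]
        mu.append(mean(row))
        sigma.append(pstdev(row))

    def z(i, j):
        # How unusual is sim[i][j] relative to gene i's background?
        if sigma[i] == 0:
            return 0.0
        return max(0.0, (sim[i][j] - mu[i]) / sigma[i])

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = math.sqrt(z(i, j) ** 2 + z(j, i) ** 2)
            scores[i][j] = scores[j][i] = s
    return scores


# Toy similarity matrix (made-up values): genes 0 and 1 are strongly related,
# genes 2 and 3 are weakly related, everything else is background.
sim = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.2],
    [0.1, 0.1, 0.2, 0.0],
]
scores = clr_scores(sim)
```

Because each value is judged relative to its own row, the weak pair (2, 3) scores as highly as the strong pair (0, 1) in this toy matrix, while pairs at background level score zero.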
ABSTRACT: The accurate construction and interpretation of gene association networks (GANs) is challenging, but crucial, to the understanding of gene function, interaction and cellular behavior at the genome level. Most current state-of-the-art computational methods for genome-wide GAN reconstruction require high-performance computational resources. However, even high-performance computing cannot fully address the complexity involved in constructing GANs from very large-scale expression profile datasets, especially for organisms with medium to large genomes, such as most plant species. Here, we present a new approach, GPLEXUS (http://plantgrn.noble.org/GPLEXUS/), which integrates a series of novel algorithms in a parallel-computing environment to construct and analyze genome-wide GANs. GPLEXUS adopts an ultra-fast estimation of pairwise mutual information that is similar in accuracy and sensitivity to the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) method and runs ∼1000 times faster. GPLEXUS integrates the Markov Clustering Algorithm to effectively identify functional subnetworks. Furthermore, GPLEXUS includes a novel 'condition-removing' method to identify the major experimental conditions in which each subnetwork operates from very large-scale gene expression datasets across several experimental conditions, which allows users to annotate the various subnetworks with experiment-specific conditions. We demonstrate GPLEXUS's capabilities by constructing global GANs and analyzing subnetworks related to defense against biotic and abiotic stress, cell cycle growth and division in Arabidopsis thaliana.
Nucleic Acids Research 10/2013; 42(5). DOI:10.1093/nar/gkt983 · 9.11 Impact Factor
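The Markov clustering step mentioned in the abstract above alternates two matrix operations, expansion and inflation, until the matrix stops changing. The following pure-Python sketch shows that loop on a toy network. It is an illustration under stated assumptions rather than GPLEXUS code: the inflation value of 2, the added self-loops, the convergence tolerance and the cluster read-out rule are common textbook choices picked here, and the network is made up.

```python
def _normalize_columns(m):
    """Scale each column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adj, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected graph given as a non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops, then make the matrix column-stochastic.
    m = [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)                       # expansion: random walk
        inflated = _normalize_columns(                 # inflation: sharpen
            [[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Read-out: each row with non-negligible mass defines one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Toy network (made up): two separate triangles, genes 0-2 and genes 3-5.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = markov_cluster(two_triangles)
```

A dense list-of-lists matrix is used only to keep the sketch short. Genome-scale networks would need sparse matrices and pruning of small entries.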
ABSTRACT: Legumes play a vital role in maintaining the nitrogen cycle of the biosphere. They conduct symbiotic nitrogen fixation through endosymbiotic relationships with bacteria in root nodules. However, this and other characteristics of legumes, including mycorrhization, compound leaf development and profuse secondary metabolism, are absent in the typical model plant Arabidopsis thaliana. We present LegumeIP (http://plantgrn.noble.org/LegumeIP/), an integrative database for comparative genomics and transcriptomics of model legumes, for studying gene function and genome evolution in legumes. LegumeIP compiles gene and gene family information, syntenic and phylogenetic context and tissue-specific transcriptomic profiles. The database holds the genomic sequences of three model legumes, Medicago truncatula, Glycine max and Lotus japonicus, plus two reference plant species, A. thaliana and Populus trichocarpa, with annotations based on UniProt, InterProScan, Gene Ontology and the Kyoto Encyclopedia of Genes and Genomes databases. LegumeIP also contains large-scale microarray and RNA-Seq-based gene expression data. Our new database is capable of systematic synteny analysis across M. truncatula, G. max, L. japonicus and A. thaliana, as well as construction and phylogenetic analysis of gene families across the five hosted species. Finally, LegumeIP provides comprehensive search and visualization tools that enable flexible queries based on gene annotation, gene family, synteny and relative gene expression.
ABSTRACT: We present GRNCASE, a high-performance web server for gene regulatory network (GRN) construction and for functional module discovery and analysis. GRNCASE can quickly and accurately construct global gene regulatory networks from large-scale gene expression data (for example, gene expression profiles from hybridization experiments with thousands of microarrays, each of which can contain tens of thousands of probe sets) owing to its parallel implementation of 1) a fast B-spline-based mutual information calculation algorithm and 2) effective reduction of false positive predictions through data processing inequality (DPI) analysis. The GRNCASE server also integrates the Markov clustering algorithm for the discovery and analysis of functional modules in the constructed GRNs.
GRNCASE is publicly and freely available at http://plantgrn.noble.org/GRNCASE/.
International Plant and Animal Genome Conference XXI 2013;
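The data processing inequality step mentioned in the GRNCASE abstract can be illustrated with a small sketch. In the form popularized by ARACNE, every fully connected gene triplet is examined and its weakest edge is treated as a likely indirect interaction and removed. The code below is a simplified illustration of that idea, not the GRNCASE implementation. The gene names and mutual information values are made up, no tolerance parameter is modeled, and ties are broken arbitrarily.

```python
from itertools import combinations


def dpi_prune(mi, threshold=0.0):
    """Return the edges that survive DPI pruning.

    mi maps frozenset({gene_a, gene_b}) to a mutual information value.
    """
    genes = sorted({g for pair in mi for g in pair})
    edges = {e for e, v in mi.items() if v > threshold}
    to_remove = set()
    # Triplets are checked against the original edge set, so the result
    # does not depend on the order in which triplets are visited.
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(e in edges for e in triangle):
            weakest = min(triangle, key=lambda e: mi[e])
            to_remove.add(weakest)
    return edges - to_remove


# Made-up values: A-C is the weakest edge of the triangle A, B, C.
mi = {
    frozenset(("A", "B")): 0.8,
    frozenset(("B", "C")): 0.7,
    frozenset(("A", "C")): 0.3,
    frozenset(("C", "D")): 0.5,
}
kept = dpi_prune(mi)
```

Here the A-C edge is dropped because it is explained by the stronger path through B, while C-D is kept because it is not part of any triangle.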
ABSTRACT: Background. Plant hormone (also known as phytohormone) signaling and regulatory networks play critical roles in almost all aspects of plant growth, development, and senescence, and in plant responses to different environmental challenges. Systematic knowledge of global hormone signaling and regulatory networks is therefore fundamentally important to understanding plant biology. Significant progress in hormone research has been made during the past decade. Yet to date, little is known about hormonal cross-talk, the regulatory mechanisms between different plant hormones, and the interactions between downstream regulatory genes. Key components also remain to be discovered to fill the gaps in known pathways.
Results. We constructed comprehensive global hormonal signaling and regulatory networks for the model plant Arabidopsis thaliana that integrate hormone biosynthesis, metabolism cascades, signal transduction and gene co-expression networks. Currently, HRGRN hosts protein-protein interactions, non-coding RNAs (ncRNAs) and their regulatory targets, transcription factors and their target genes, and gene-gene interactions from co-expression-based network analysis, which were manually curated and integrated. Furthermore, we developed powerful “Graph Search”-based pathway and sub-network analysis and visualization tools for global hormonal signaling and network analysis and for the discovery of novel and essential gene relationships. In addition, we developed interactive, graph-based network analysis and visualization tools with which researchers can investigate genes, pathways, and subnetworks, including hormonal cross-talk.
Conclusions. We developed HRGRN, a graph search-empowered integrative database of Arabidopsis hormone signaling and gene regulation, which is publicly and freely available at http://plantgrn.noble.org/hrgrn/.
International Plant and Animal Genome Conference XXII 2014;
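The abstract above does not describe the search algorithm behind the "Graph Search" tools, so the following is only a generic illustration of what a graph search over a regulatory network can look like: a breadth-first search that returns one shortest directed path between two nodes. The node names and edges are hypothetical placeholders, not HRGRN data.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return one shortest directed path from source to target, or None."""
    if source == target:
        return [source]
    parents = {source: None}   # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt in parents:
                continue
            parents[nxt] = node
            if nxt == target:
                # Walk back through the parents to rebuild the path.
                path = [nxt]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


# Hypothetical toy network: hormone -> receptor -> transcription factor -> gene.
network = {
    "hormone_A": ["receptor_1"],
    "receptor_1": ["tf_1", "tf_2"],
    "tf_1": ["gene_x"],
    "tf_2": [],
    "hormone_B": ["tf_2"],
}
path_a = shortest_path(network, "hormone_A", "gene_x")
path_b = shortest_path(network, "hormone_B", "gene_x")
```

A shared intermediate such as tf_2, reachable from both hormones, is the kind of node a cross-talk query would surface, even though no path leads from hormone_B to gene_x in this toy network.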
ABSTRACT: The accurate reconstruction and interpretation of genome-scale Gene Association Networks (GANs) from large-scale gene expression data is challenging but crucial to the understanding of gene function, interaction, and cellular behavior at the genome level. Yet reverse-engineering genome-scale GANs from large-scale gene expression data, and the subsequent discovery of functional modules, remains a very computationally intensive task; it generally requires both effective statistical approaches and efficient algorithm implementations through parallel computing. To date, it stands as one of the most pressing open problems in computational systems biology.
To facilitate genome-scale GAN reconstruction, we developed two new web servers: 1) GPLEXUS (http://plantgrn.noble.org/GPLEXUS/) and 2) DeGNServer (http://plantgrn.noble.org/DeGNServer/). Both integrate a series of novel algorithms in a parallel-computing environment to construct and analyze genome-wide GANs from very large-scale expression profile datasets.
GPLEXUS was specially designed to decipher genome-scale networks from large numbers of expression profiles for organisms with medium to large genomes, such as most plant species. GPLEXUS adopts an ultra-fast estimation of pairwise mutual information that is similar in accuracy and sensitivity to the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) method and runs approximately one thousand times faster. GPLEXUS integrates the Markov Clustering Algorithm (MCL) to effectively identify functional sub-networks. Furthermore, GPLEXUS includes a novel ‘condition-removing’ method to identify the major experimental conditions in which each sub-network operates.
DeGNServer extends the widely used Context Likelihood of Relatedness (CLR) framework by integrating six proven correlation/association analysis methods (Spearman rank correlation, Pearson correlation, mutual information, maximal information coefficient, Kendall rank correlation and the Theil-Sen estimator) and implements the algorithms in a parallel and distributed environment to enable genome-scale gene-gene association inference from small to medium-scale gene expression profiles at extraordinary speed while maintaining high prediction accuracy. Furthermore, we integrated the SNBuilder and GeNa algorithms for sub-network extraction and functional module discovery.
International Plant and Animal Genome Conference XXII 2014;
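Of the association measures listed for DeGNServer, the Theil-Sen estimator is perhaps the least familiar. It estimates the slope between two variables as the median of the slopes of all pairs of points, which makes it robust to outliers. The sketch below shows the textbook definition only and makes no claim about how DeGNServer computes or uses it. The expression values are made up.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(x, y):
    """Median of the slopes of all point pairs with distinct x values."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    return median(slopes)


# Made-up expression values for two genes across five samples; the last
# sample is an outlier that would distort an ordinary least-squares fit.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 100.0]
slope = theil_sen_slope(gene_a, gene_b)
```

Six of the ten pairwise slopes equal 2, so the median stays at 2 despite the outlier.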
ABSTRACT: LegumeIP hosts large-scale integrative genomics and transcriptomics data and bioinformatics resources for studying gene function and gene and genome evolution in legumes. The data resources include genomic sequences, gene models, protein sequences, EST sequences, and InterProScan annotations of three legume species (Medicago truncatula, Lotus japonicus, Glycine max) and two outgroup species (Arabidopsis thaliana, Populus trichocarpa). Recently, we updated the database to a new version, LegumeIP 2.0, incorporating the following new features and functions:
* Whole-genome gene regulatory network construction for Medicago truncatula, Glycine max, Lotus japonicus and Arabidopsis thaliana;
* Functional module analysis of the constructed networks and systematic transcription factor prediction for the three legume species;
* An integrated functional homology comparison of the three legume species through sequence-based and network-based approaches;
* Updated Medicago truncatula gene models to version 4.0;
* Additional genomics data from various legume genome sequencing projects.
LegumeIP 2.0 is freely and publicly available at http://plantgrn.noble.org/LegumeIP/.
International Plant and Animal Genome Conference XXI 2013;