ABSTRACT: The biological networks controlling plant signal transduction, metabolism, and gene regulation comprise not only tens of thousands of genes, compounds, proteins and RNAs but also the complicated interactions and coordination among them. These networks play critical roles in many fundamental mechanisms, such as plant growth, development and environmental response. Although much is known about these complex interactions, the knowledge and data are currently scattered throughout the published literature, publicly available high-throughput datasets, and third-party databases. Many "unknown" yet important interactions among genes need to be mined and established through extensive computational analysis. However, exploring these complex biological interactions at the network level from existing heterogeneous resources remains challenging and time-consuming for biologists.
Here, we introduce HRGRN, a graph search-empowered integrative database of Arabidopsis signal transduction, metabolism, and gene regulatory networks. HRGRN utilizes Neo4j, a highly scalable graph database management system, to host large-scale biological interactions among genes, proteins, compounds and small RNAs that were either validated experimentally or predicted computationally. The associated biological pathway information is also marked on the interactions involved in each pathway to facilitate the investigation of cross-talk between pathways.
Furthermore, HRGRN integrates a series of graph path search algorithms to discover novel relationships among genes, compounds, RNAs and even pathways from heterogeneous biological interaction data that could be missed by traditional SQL database search methods. Users can also build sub-networks based on known interactions. The outcomes are visualized with rich text, figures and interactive network graphs on web pages.
The HRGRN database is freely available at http://plantgrn.noble.org/hrgrn/.
Full-text · Article · Dec 2015 · Plant and Cell Physiology
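The graph path search idea described in the HRGRN abstract can be illustrated with a minimal sketch. This is not HRGRN's implementation or its Neo4j queries; it is a plain breadth-first search over a toy in-memory graph, and every node name, edge type and function name below is a hypothetical chosen for illustration.

```python
from collections import deque

# Hypothetical toy interactions; node names and edge types are illustrative only.
EDGES = [
    ("geneA", "proteinB", "protein-protein interaction"),
    ("proteinB", "compoundC", "metabolic reaction"),
    ("compoundC", "geneD", "regulation"),
    ("geneA", "miRNA_E", "small RNA targeting"),
]


def build_graph(edges):
    """Build an undirected adjacency map: node -> list of (neighbor, edge_type)."""
    graph = {}
    for src, dst, kind in edges:
        graph.setdefault(src, []).append((dst, kind))
        graph.setdefault(dst, []).append((src, kind))
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; return one shortest path as a node list, or None."""
    if start not in graph or goal not in graph:
        return None
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor, _kind in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


GRAPH = build_graph(EDGES)
PATH = shortest_path(GRAPH, "geneA", "geneD")
print(PATH)
```

Here the two genes share no direct edge, yet the search links them through a protein and a compound. A multi-hop relationship of this kind is awkward to express as a fixed number of table joins.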
ABSTRACT: The LegumeIP 2.0 database hosts large-scale genomics and transcriptomics data and provides integrative bioinformatics tools for the study of gene function and evolution in legumes. Our recent updates in LegumeIP 2.0 include gene and protein sequences, gene models and annotations, syntenic regions, protein families and phylogenetic trees for six legume species, Medicago truncatula, Glycine max (soybean), Lotus japonicus, Phaseolus vulgaris (common bean), Cicer arietinum (chickpea) and Cajanus cajan (pigeon pea), and two outgroup reference species, Arabidopsis thaliana and Populus trichocarpa. Moreover, LegumeIP 2.0 features the following new data resources and bioinformatics tools: (i) an integrative gene expression atlas for four model legumes that includes 550 array hybridizations from M. truncatula, 962 gene expression profiles of G. max, 276 array hybridizations from L. japonicus and 56 RNA-Seq-based gene expression profiles for C. arietinum. These datasets were manually curated and hierarchically organized based on Experimental Ontology and Plant Ontology so that users can browse, search, and retrieve data for their selected experiments. (ii) New functions/analytical tools to query, mine and visualize large-scale gene sequences, annotations and transcriptome profiles. Users may select a subset of expression experiments and visualize and compare expression profiles for multiple genes. The LegumeIP 2.0 database is freely available to the public at http://plantgrn.noble.org/LegumeIP/.
Full-text · Article · Nov 2015 · Nucleic Acids Research
ABSTRACT: Alfalfa (Medicago sativa L.) is the primary forage legume crop species in the United States and plays essential economic and ecological roles in agricultural systems across the country. Modern alfalfa is the result of hybridization between tetraploid M. sativa ssp. sativa and M. sativa ssp. falcata. Due to its large and complex genome, there are few genomic resources available for alfalfa improvement.
A de novo transcriptome assembly from two alfalfa subspecies, M. sativa ssp. sativa (B47) and M. sativa ssp. falcata (F56), was developed using Illumina RNA-seq technology. Transcripts from roots, nitrogen-fixing root nodules, leaves, flowers, elongating stem internodes, and post-elongation stem internodes were assembled into the Medicago sativa Gene Index 1.2 (MSGI 1.2) representing 112,626 unique transcript sequences. Nodule-specific transcripts and transcripts involved in cell wall biosynthesis were identified. Statistical analyses identified 20,447 transcripts differentially expressed between the two subspecies. Pair-wise comparisons of each tissue combination identified 58,932 sequences differentially expressed in B47 and 69,143 sequences differentially expressed in F56. Comparing transcript abundance in floral tissues of B47 and F56 identified expression differences in sequences involved in anthocyanin and carotenoid synthesis, which determine flower pigmentation. Single nucleotide polymorphisms (SNPs) unique to each M. sativa subspecies (110,241) were identified.
The Medicago sativa Gene Index 1.2 increases the expressed sequence data available for alfalfa by ninefold and can be expanded as additional experiments are performed. The MSGI 1.2 transcriptome sequences, annotations, expression profiles, and SNPs were assembled into the Alfalfa Gene Index and Expression Database (AGED) at http://plantgrn.noble.org/AGED/, a publicly available genomic resource for alfalfa improvement and legume research.
ABSTRACT: Background
Common bean (Phaseolus vulgaris) is grown throughout the world and comprises roughly 50% of the grain legumes consumed worldwide. Despite this, genetic resources for common bean have been lacking. Next-generation sequencing has facilitated our investigation of the gene expression profiles associated with biologically important traits in common bean. An increased understanding of gene expression in common bean will improve our understanding of gene expression patterns in other legume species.
Combining recently developed genomic resources for Phaseolus vulgaris, including predicted gene calls, with RNA-Seq technology, we measured the gene expression patterns from 24 samples collected from seven tissues at developmentally important stages and from three nitrogen treatments. Gene expression patterns throughout the plant were analyzed to better understand changes due to nodulation, seed development, and nitrogen utilization. We identified 11,010 genes differentially expressed with a fold change ≥ 2 and a P-value < 0.05 between different tissues at the same time point, 15,752 genes differentially expressed within a tissue due to changes in development, and 2,315 genes expressed only in a single tissue. These analyses identified 2,970 genes with expression patterns that appear to be directly dependent on the source of available nitrogen. Finally, we assembled these data in a publicly available database, the Phaseolus vulgaris Gene Expression Atlas (Pv GEA), http://plantgrn.noble.org/PvGEA/. Using the website, researchers can query gene expression profiles of their gene of interest, search for genes expressed in different tissues, or download the dataset in tabular form.
These data provide the basis for a gene expression atlas, which will facilitate functional genomic studies in common bean. Analysis of this dataset has identified genes important in regulating seed composition and has increased our understanding of nodulation and impact of the nitrogen source on assimilation and distribution throughout the plant.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-866) contains supplementary material, which is available to authorized users.
ABSTRACT: Analysis of genome-scale gene networks (GNs) using large-scale gene expression data provides unprecedented opportunities to uncover gene interactions and regulatory networks involved in various biological processes and developmental programs, leading to accelerated discovery of novel knowledge of various biological processes, pathways and systems. The widely used context likelihood of relatedness (CLR) method, which scores the similarity of gene pairs using mutual information (MI), is one of the more accurate methods currently available for inferring GNs. However, the MI-based reverse-engineering method can achieve satisfactory performance only when the sample size exceeds one hundred. This in turn limits its application to GN construction from expression data sets with small sample sizes. We developed a high-performance web server, DeGNServer, to reverse-engineer and decipher genome-scale networks. It extends the CLR method by integrating different correlation methods that are suitable for analyzing data sets ranging from moderate to large scale, such as expression profiles with tens to hundreds of microarray hybridizations, and implements all analysis algorithms using parallel computing techniques to infer gene-gene associations at extraordinary speed. In addition, we integrated the SNBuilder and GeNa algorithms for subnetwork extraction and functional module discovery. DeGNServer is publicly and freely available online.
Full-text · Article · Nov 2013 · BioMed Research International
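The CLR idea mentioned in the DeGNServer abstract can be sketched as follows. This is a minimal illustration, not DeGNServer's code: it assumes absolute Pearson correlation as the pairwise similarity, a background distribution for each gene that excludes the gene itself, and the commonly cited scoring rule that combines the two clipped z-scores of a pair. The profile values and all names are hypothetical.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def clr_scores(profiles):
    """Score each gene pair relative to both genes' background similarity distributions."""
    genes = sorted(profiles)
    # Pairwise similarity; assumption: |Pearson r| stands in for mutual information.
    sim = {
        g: {h: abs(pearson(profiles[g], profiles[h])) for h in genes if h != g}
        for g in genes
    }
    # Per-gene background mean and standard deviation of its similarities.
    stats = {g: (mean(sim[g].values()), pstdev(sim[g].values())) for g in genes}

    def z(g, h):
        mu, sd = stats[g]
        return max(0.0, (sim[g][h] - mu) / sd) if sd > 0 else 0.0

    scores = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            scores[(g, h)] = math.sqrt(z(g, h) ** 2 + z(h, g) ** 2)
    return scores


# Hypothetical expression profiles across five samples.
PROFILES = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10.5],
    "g3": [5, 1, 4, 2, 3],
    "g4": [3, 3, 1, 5, 2],
}
SCORES = clr_scores(PROFILES)
```

Because each similarity is judged against the background of both genes, a pair scores highly only when its similarity stands out for both partners, not merely when the raw correlation is large.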
ABSTRACT: The accurate construction and interpretation of gene association networks (GANs) is challenging, but crucial, to the understanding of gene function, interaction and cellular behavior at the genome level. Most current state-of-the-art computational methods for genome-wide GAN reconstruction require high-performance computational resources. However, even high-performance computing cannot fully address the complexity involved in constructing GANs from very large-scale expression profile datasets, especially for organisms with medium to large genomes, such as most plant species. Here, we present a new approach, GPLEXUS (http://plantgrn.noble.org/GPLEXUS/), which integrates a series of novel algorithms in a parallel-computing environment to construct and analyze genome-wide GANs. GPLEXUS adopts an ultra-fast estimation for pairwise mutual information computing that is similar in accuracy and sensitivity to the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) method and runs ∼1000 times faster. GPLEXUS integrates the Markov Clustering Algorithm to effectively identify functional subnetworks. Furthermore, GPLEXUS includes a novel 'condition-removing' method to identify the major experimental conditions in which each subnetwork operates from very large-scale gene expression datasets across several experimental conditions, which allows users to annotate the various subnetworks with experiment-specific conditions. We demonstrate GPLEXUS's capabilities by constructing global GANs and analyzing subnetworks related to defense against biotic and abiotic stress, cell cycle growth and division in Arabidopsis thaliana.
Full-text · Article · Oct 2013 · Nucleic Acids Research
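The Markov clustering step named in the GPLEXUS abstract can be illustrated with a small sketch. It is a plain-Python toy version of the general expansion/inflation scheme, not the GPLEXUS implementation. The self-loops, the inflation value, the fixed iteration count and the 1e-6 threshold are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Rescale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] if sums[c] else 0.0 for c in range(n)] for r in range(n)]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, iterations=50):
    """Alternate expansion and inflation, then read clusters from the nonzero rows."""
    n = len(adjacency)
    # Assumption: add self-loops before normalizing, a common MCL convention.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)  # expansion: flow spreads along longer walks
        m = normalize_columns([[v ** inflation for v in row] for row in m])  # inflation
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Hypothetical network: two disconnected triangles (nodes 0-2 and 3-5).
TWO_TRIANGLES = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
CLUSTERS = markov_cluster(TWO_TRIANGLES)
```

On this toy input each triangle comes out as its own cluster. For genome-scale networks, a sparse-matrix implementation with a convergence check would be needed in place of the dense loops used here.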
ABSTRACT: Legumes play a vital role in maintaining the nitrogen cycle of the biosphere. They conduct symbiotic nitrogen fixation through endosymbiotic relationships with bacteria in root nodules. However, this and other characteristics of legumes, including mycorrhization, compound leaf development and profuse secondary metabolism, are absent in the typical model plant Arabidopsis thaliana. We present LegumeIP (http://plantgrn.noble.org/LegumeIP/), an integrative database for comparative genomics and transcriptomics of model legumes, for studying gene function and genome evolution in legumes. LegumeIP compiles gene and gene family information, syntenic and phylogenetic context and tissue-specific transcriptomic profiles. The database holds the genomic sequences of three model legumes, Medicago truncatula, Glycine max and Lotus japonicus, plus two reference plant species, A. thaliana and Populus trichocarpa, with annotations based on UniProt, InterProScan, Gene Ontology and the Kyoto Encyclopedia of Genes and Genomes databases. LegumeIP also contains large-scale microarray and RNA-Seq-based gene expression data. Our new database is capable of systematic synteny analysis across M. truncatula, G. max, L. japonicus and A. thaliana, as well as construction and phylogenetic analysis of gene families across the five hosted species. Finally, LegumeIP provides comprehensive search and visualization tools that enable flexible queries based on gene annotation, gene family, synteny and relative gene expression.
Full-text · Article · Nov 2011 · Nucleic Acids Research
ABSTRACT: We present GRNCASE, a high-performance web server for gene regulatory network (GRN) construction and for functional module discovery and analysis. GRNCASE can quickly and accurately construct global gene regulatory networks from large-scale gene expression data (for example, gene expression profiles from hybridization experiments with thousands of microarrays, each of which can contain tens of thousands of probe sets), owing to its parallel implementation of 1) a fast B-spline-based mutual information calculation algorithm and 2) effective reduction of false positive predictions through data processing inequality (DPI) analysis. The GRNCASE server also integrates the Markov clustering algorithm for functional module discovery and analysis in the constructed GRNs.
GRNCASE is publicly and freely available at http://plantgrn.noble.org/GRNCASE/.
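The DPI filtering step mentioned in the GRNCASE abstract can be sketched as below. This follows the general idea popularized by ARACNE: in every fully connected gene triplet, the weakest edge is treated as an indirect interaction and dropped. It is not the GRNCASE implementation. The mutual information values, the `tolerance` parameter and the function name are hypothetical.

```python
from itertools import combinations


def apply_dpi(mi, tolerance=0.0):
    """Drop the weakest edge of every fully connected triplet.

    mi maps frozenset({gene_a, gene_b}) -> mutual information estimate.
    Edges are marked against the original network and removed together at the end.
    """
    nodes = sorted({gene for pair in mi for gene in pair})
    to_remove = set()
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if not all(edge in mi for edge in edges):
            continue  # DPI only applies to closed triangles
        edges.sort(key=lambda edge: mi[edge])
        weakest, second = edges[0], edges[1]
        # Assumption: a relative tolerance protects near-ties from removal.
        if mi[weakest] < mi[second] * (1.0 - tolerance):
            to_remove.add(weakest)
    return {edge: value for edge, value in mi.items() if edge not in to_remove}


# Hypothetical mutual information values for a four-gene toy network.
MI = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.3,
    frozenset(("C", "D")): 0.5,
}
PRUNED = apply_dpi(MI)
```

In the toy network the A-C edge is removed because it is the weakest side of the A-B-C triangle, while C-D survives because it belongs to no triangle.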
ABSTRACT: Background. Plant hormone (also known as phytohormone) signaling and regulatory networks play critical roles in almost all aspects of plant growth, development, senescence, and responses to different environmental challenges. Systematic knowledge of global hormone signaling and regulatory networks is therefore fundamentally important to understanding plant biology. Significant progress in hormone research has been made during the past decade. Yet to date, little is known about hormonal cross-talk, the regulatory mechanisms between different plant hormones, and the interactions between downstream regulatory genes. Moreover, key components that would fill the gaps in these pathways have yet to be discovered.
Results. We constructed comprehensive global hormonal signaling and regulatory networks for the model plant Arabidopsis thaliana that integrate hormone biosynthesis, metabolism cascades, signal transduction and gene co-expression networks. Currently, HRGRN hosts protein-protein interactions, non-coding RNAs (ncRNAs) and their regulatory targets, transcription factors and their target genes, and gene-gene interactions from co-expression-based network analysis, all of which were manually curated and integrated. Furthermore, we developed powerful “Graph Search”-based pathway and sub-network analysis and visualization tools for global hormonal signaling and network analysis and for the discovery of novel and essential gene relationships. In addition, we developed interactive, graph-based network visualization and analysis tools with which researchers can investigate genes, pathways, and subnetworks, including hormonal cross-talk.
Conclusions. We developed HRGRN, a graph search-empowered integrative database of Arabidopsis hormone signaling and gene regulation, which is publicly and freely available at http://plantgrn.noble.org/hrgrn/.
ABSTRACT: The accurate reconstruction and interpretation of genome-scale Gene Association Networks (GANs) from large-scale gene expression data is challenging but crucial to the understanding of gene function, interaction, and cellular behavior at the genome level. Yet reverse-engineering genome-scale GANs from large-scale gene expression data, and the subsequent discovery of functional modules, remains a very computationally intensive task; it generally requires both effective statistical approaches and efficient algorithm implementations through parallel computing. To date, it therefore stands as one of the most pressing open problems in computational systems biology.
To facilitate genome-scale GAN reconstruction, we developed two new web servers: 1) GPLEXUS (http://plantgrn.noble.org/GPLEXUS/) and 2) DeGNServer (http://plantgrn.noble.org/DeGNServer/), both of which integrate a series of novel algorithms in a parallel-computing environment to construct and analyze genome-wide GANs from very large-scale expression profile datasets.
GPLEXUS was specially designed to decipher genome-scale networks from a large number of expression profiles for organisms with medium to large genomes, such as most plant species. GPLEXUS adopts an ultra-fast estimation for pairwise mutual information computing which is similar in accuracy and sensitivity to the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) method and runs approximately one thousand times faster. GPLEXUS integrates the Markov Clustering Algorithm (MCL) to effectively identify functional sub-networks. Furthermore, GPLEXUS includes a novel ‘condition-removing’ method to identify the major experimental conditions in which each sub-network operates.
DeGNServer extends the widely used Context Likelihood of Relatedness (CLR) framework by integrating six proven correlation/association analysis methods (Spearman rank correlation, Pearson correlation, mutual information, maximal information coefficient, Kendall rank correlation and the Theil-Sen estimator) and implements the algorithms in a parallel and distributed environment to enable genome-scale gene-gene association inference from small to medium-scale gene expression profiles at extraordinary speed while maintaining high prediction accuracy. Furthermore, we integrated the SNBuilder and GeNa algorithms for sub-network extraction and functional module discovery.
ABSTRACT: LegumeIP hosts large-scale integrative genomics and transcriptomics data and bioinformatics resources to study gene function and gene and genome evolution in legumes. The data resources include genomic sequences, gene models, protein sequences, EST sequences, and InterProScan annotations of three legume species (Medicago truncatula, Lotus japonicus, Glycine max) and two outgroup species (Arabidopsis thaliana, Populus trichocarpa). Recently, we updated our database from LegumeIP to a new version, LegumeIP 2.0, incorporating the following new features/functions:
* Whole-genome gene regulatory network construction for Medicago truncatula, Glycine max, Lotus japonicus and Arabidopsis thaliana;
* Functional module analysis of the constructed networks and systematic transcription factor prediction for the three legume species;
* An integrated functional homology comparison of the three legume species through sequence-based and network-based approaches;
* Updated Medicago truncatula gene models to version 4.0;
* Additional genomics data from various legume genome sequencing projects.
LegumeIP 2.0 is freely and publicly available at http://plantgrn.noble.org/LegumeIP/.