Reverse engineering module networks by PSO-RNN hybrid modeling

Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC, USA.
BMC Genomics (Impact Factor: 4.04). 02/2009; 10 Suppl 1(Suppl 1):S15. DOI: 10.1186/1471-2164-10-S1-S15
Source: PubMed

ABSTRACT Inferring a gene regulatory network (GRN) from high-throughput biological data is often an under-determined problem and is challenging for the following reasons: (1) thousands of genes are involved in one living cell; (2) complex dynamic and nonlinear relationships exist among genes; (3) the data contain a substantial amount of noise; and (4) the sample size is typically very small compared to the number of genes. We hypothesize that we can enhance our understanding of gene interactions in important biological processes (e.g., differentiation, cell cycle, and development) and improve the inference accuracy of a GRN by (1) incorporating prior biological knowledge into the inference scheme, (2) integrating multiple biological data sources, and (3) decomposing the inference problem into smaller network modules.
This study presents a novel GRN inference method that integrates gene expression data and gene functional category information. The inference is based on a module network model that consists of two parts: module selection and network inference. The former determines the optimal modules through fuzzy c-means (FCM) clustering combined with gene functional category information, while the latter uses a hybrid of particle swarm optimization and recurrent neural network (PSO-RNN) methods to infer the underlying network between modules. Our method is tested on real data from two studies: the development of the rat central nervous system (CNS) and the yeast cell cycle process. The results are evaluated by comparing them with previously published results and gene ontology annotation information.
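As a rough illustration of the module selection step, the following is a minimal sketch of plain fuzzy c-means on expression profiles only. It does not include the functional category information that the method above combines with FCM, and the function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, and the toy `profiles` are assumptions made for this example, not values from the study.

```python
import math
import random


def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (centers, memberships).

    data: list of equal-length expression profiles (one per gene).
    m:    fuzzifier (> 1); 2.0 is a common default, assumed here.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the profiles weighted by membership**m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * data[i][d] for i in range(n)) / total for d in range(dim)]
            )
        # Membership update from the relative distances to every center.
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(data[i], c), 1e-12) for c in centers]
            new_row = []
            for k in range(n_clusters):
                denom = sum(
                    (dist[k] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(n_clusters)
                )
                new_row.append(1.0 / denom)
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u


# Hypothetical toy profiles: two genes with low and two with high expression.
profiles = [[0.0, 0.1, 0.2], [0.1, 0.2, 0.3], [5.0, 5.1, 4.9], [5.2, 5.0, 5.1]]
centers, memberships = fuzzy_c_means(profiles, n_clusters=2)
```

Each row of `memberships` gives one gene's degree of membership in every module, so a gene may belong partly to several modules; the cluster centers can then serve as module-level expression profiles for the network inference step.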
Reverse engineering GRNs from time-course gene expression data is a major challenge in systems biology because of the limited number of time points. Our experiments demonstrate that the proposed method can address this challenge by: (1) preprocessing gene expression data (e.g., normalization and missing value imputation) to reduce noise; (2) clustering genes based on gene expression data and gene functional category information to identify biologically meaningful modules, thereby reducing the dimensionality of the data; and (3) modeling GRNs between the modules with the PSO-RNN method to capture their nonlinear and dynamic relationships. The method is shown to yield biologically meaningful modules and networks among the modules.
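To make the third step more concrete, here is a minimal sketch of fitting a small recurrent network to a time course with a generic particle swarm optimizer. It is an illustration under stated assumptions, not the study's implementation: the update rule x(t+1) = sigmoid(W x(t) + b), the PSO coefficients, the parameter bound, and the two-module toy trajectory are all hypothetical choices made for this example.

```python
import math
import random

N = 2  # number of modules (network nodes) in this toy example


def simulate(params, x0, steps):
    """Roll the assumed RNN x(t+1) = sigmoid(W x(t) + b) forward from x0."""
    w = [params[i * N:(i + 1) * N] for i in range(N)]  # row i: inputs to node i
    b = params[N * N:N * N + N]                        # one bias per node
    traj = [list(x0)]
    for _ in range(steps):
        x = traj[-1]
        traj.append([
            1.0 / (1.0 + math.exp(-(sum(w[i][j] * x[j] for j in range(N)) + b[i])))
            for i in range(N)
        ])
    return traj


# Hypothetical target time course, generated from known parameters.
true_params = [2.0, -3.0, 3.0, 1.0, -0.5, -1.0]
target = simulate(true_params, [0.2, 0.8], steps=10)


def fitness(params):
    """Mean squared error between simulated and target trajectories."""
    pred = simulate(params, target[0], steps=len(target) - 1)
    sq = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(sq) / len(sq)


def pso(fit, dim, n_particles=20, n_iter=200, bound=5.0, seed=0):
    """Generic global-best PSO minimizing fit over the box [-bound, bound]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fit(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    inertia, c1, c2 = 0.7, 1.5, 1.5  # assumed, commonly used coefficients
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # Clamp positions so the weights stay in a bounded range.
                pos[i][d] = max(-bound, min(bound, pos[i][d] + vel[i][d]))
            f = fit(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f


best_params, best_err = pso(fitness, dim=N * N + N)
```

The first `N * N` entries of `best_params` form the fitted weight matrix, whose signs and magnitudes could be read as activating or inhibiting influences between modules. Because PSO needs no gradients, the same loop applies to other network update rules by swapping out `simulate`.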



Available from: Yuji Zhang, Sep 01, 2015
  • Source
    • "Basically, microarray data contain 1-10% missing values that could affect up to 95% of genes [10]. The occurrence of missing values in microarray data disadvantageously influences downstream analyses, such as discovery of differentially expressed genes [11,12], construction of gene regulatory networks [13,14], supervised classification of clinical samples [15], gene cluster analysis [10,16], and biomarker detection. "
    ABSTRACT: Microarray data are usually peppered with missing values for various reasons. However, most downstream analyses of microarray data require complete datasets. Therefore, accurate algorithms for missing value estimation are needed to improve the performance of microarray data analyses. Although many algorithms have been developed, there is much debate about the selection of the optimal algorithm. Studies comparing the performance of different algorithms are still not comprehensive, especially in the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used. In this paper, we performed a comprehensive comparison by using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, the effects of different types of microarray datasets on the performance of each imputation algorithm were evaluated. Second, we discussed whether datasets from different species have a different impact on the performance of different algorithms. To assess the performance of each algorithm fairly, all evaluations were performed using three types of measures. Our results indicate that the performance of an imputation algorithm mainly depends on the type of dataset and not on the species from which the samples come. In addition to the statistical measure, two other measures with biological meanings are useful to reflect the impact of missing value imputation on downstream data analyses. Our study suggests that local-least-squares-based methods are good choices for handling missing values in most microarray datasets. In this work, we carried out a comprehensive comparison of the algorithms for microarray missing value imputation. Based on such a comprehensive comparison, researchers could easily choose the optimal algorithm for their datasets. Moreover, new imputation algorithms could be compared with the existing algorithms using this comparison strategy as a standard protocol. In addition, to help researchers deal with missing values easily, we built a web-based, easy-to-use imputation tool, MissVIA, which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA will determine the optimal algorithm for the users' data through a series of simulations, and the imputed results can then be downloaded for downstream data analyses.
    BMC Systems Biology 12/2013; 7 Suppl 6(Suppl 6):S12. DOI:10.1186/1752-0509-7-S6-S12 · 2.85 Impact Factor
  • Source
    • "They are the smallest basic functional and evolutionarily conserved units in biological networks. Our hypothesis is that NMs of a network are the significant sub-patterns that represent the backbone of the network, which serves as the focused portion out of thousands of nodes (e.g., drugs, diseases, and genes,) [14,15]. These NMs could also form large aggregated modules that perform specific functions by forming associations in overlapping NMs. "
    ABSTRACT: Background: A huge number of associations among different biological entities (e.g., disease, drug, and gene) are scattered across millions of biomedical articles. Systematic analysis of such heterogeneous data can infer novel associations among different biological entities in the context of personalized medicine and translational research. Recently, network-based computational approaches have gained popularity in investigating such heterogeneous data, proposing novel therapeutic targets and deciphering disease mechanisms. However, little effort has been devoted to investigating associations among drugs, diseases, and genes in an integrative manner. Results: We propose a novel network-based computational framework to identify statistically over-expressed subnetwork patterns, called network motifs, in an integrated disease-drug-gene network extracted from Semantic MEDLINE. The framework consists of two steps. The first step is to construct an association network by extracting pair-wise associations between diseases, drugs and genes in Semantic MEDLINE using a domain pattern driven strategy. A Resource Description Framework (RDF)-linked data approach is used to re-organize the data to increase the flexibility of data integration, the interoperability within domain ontologies, and the efficiency of data storage. Unique associations among drugs, diseases, and genes are extracted for downstream network-based analysis. The second step is to apply a network-based approach to mine the local network structure of this heterogeneous network. Significant network motifs are then identified as the backbone of the network. A simplified network based on those significant motifs is then constructed to facilitate discovery. We implemented our computational framework and identified five network motifs, each of which corresponds to specific biological meanings. Three case studies demonstrate that novel associations are derived from the network topology analysis of reconstructed networks of significant network motifs, further validated by expert knowledge and functional enrichment analyses. Conclusions: We have developed a novel network-based computational approach to investigate the heterogeneous drug-gene-disease network extracted from Semantic MEDLINE. We demonstrate the power of this approach by prioritizing candidate disease genes, inferring potential disease relationships, and proposing novel drug targets, within the context of the entire knowledge. The results indicate that such an approach will facilitate the formulation of novel research hypotheses, which is critical for translational medicine research and personalized medicine.
    Journal of Biomedical Semantics 01/2013; In revision. DOI:10.1186/2041-1480-5-33 · 2.62 Impact Factor
  • Source
    • "Swarm intelligence has also been used for inference of GRN from data. Particle swarm optimization (PSO) is used for the reconstruction of gene networks modeled by recurrent neural networks (RNN) in [18] and [19]. Also, a hybrid of differential evolution and PSO (DEPSO) for training RNN is investigated in [20]. "
    ABSTRACT: A swarm intelligence technique called the bees algorithm is formulated to build synthetic networks of the budding yeast cell cycle. The resulting networks contain the original fixed points of the budding yeast cell-cycle network plus additional fixed points that reduce the basin size of the fixed point associated with the G1 phase of the cell cycle, with the purpose of promoting cell proliferation for biotechnological applications. One thousand synthetic networks were found using the bees algorithm, of which 84.5% had a basin size for the G1 fixed point less than or equal to 10, whereas the original model has a basin size of 1764 for that fixed point. One of the synthetic networks was analyzed by a biologist, who concluded that the resulting model was quite consistent from a biological point of view, supporting the proposed method as a tool for biologists to construct synthetic networks with desired characteristics.
    Machine Learning and Applications (ICMLA), 2012 11th International Conference on; 12/2012