Effects of cDNA microarray time-series data size on gene regulatory network inference accuracy
DOI: 10.1145/1854776.1854842 Conference: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology, BCB 2010, Niagara Falls, NY, USA, August 2-4, 2010
A number of models and algorithms have been proposed for gene regulatory network (GRN) inference; however, none of them addresses the effect of the size of the time-series microarray expression data, measured as the number of time points. In this paper, we study this problem by analyzing the behavior of two algorithms based on information-theoretic models. These algorithms were run on data sets of different sizes generated by synthetic network generation tools. Experiments show that the performance of these algorithms reaches a saturation point after a specific data size, thus giving the biologist an idea of what data size will give the best inference accuracy. Also, the fact that the accuracy saturates after a specific number of time points (the saturation point being different for different algorithms) suggests that generating time-series data for a large number of time points will not necessarily improve the inference accuracy beyond a certain point. Investigating this saturation, we found that the information-theoretic quantity mutual information tends to zero as the number of time points increases, although the entropy in the network rises to unity. This suggests that mutual information (MI) might not be the best metric to use for GRN inference algorithms. To modify the MI metric, we introduce a new method of computing time lags between any pair of genes and present the time-lagged mutual information (TLMI) metric for reverse engineering of GRNs.
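The abstract does not spell out how the time lag or the TLMI metric is computed, so the following is only a minimal sketch of the general idea: estimate mutual information between a regulator at time t and a target at time t + lag, and pick the lag that scores highest. It assumes expression values have already been discretized, and the function names (`mutual_information`, `lagged_mutual_information`, `best_lag`) are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in bits) of two equal-length discrete sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    if n == 0:
        return 0.0
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


def lagged_mutual_information(x, y, lag):
    """Mutual information between x[t] and y[t + lag] for a non-negative lag."""
    if lag < 0 or lag >= len(x):
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    if lag == 0:
        return mutual_information(x, y)
    return mutual_information(x[:-lag], y[lag:])


def best_lag(x, y, max_lag):
    """Return (lag, score) for the lag in 0..max_lag with the highest lagged MI.

    Assumption: choosing the lag by maximizing MI is an illustrative rule only;
    the paper's own lag computation may differ.
    """
    scores = {lag: lagged_mutual_information(x, y, lag) for lag in range(max_lag + 1)}
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Toy data: the target copies the regulator with a delay of two time points.
regulator = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
target = [0, 0] + regulator[:-2]
lag, score = best_lag(regulator, target, max_lag=3)
```

Note that each additional lag shortens the overlapping window by one sample, so with short time series the estimates at large lags rest on very few points.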
Available from: Kurt Gust
- "In the proposed approaches, unlike the relevance network class of algorithms and our previous works, in which network inference is based on just pair-wise MI computations, we attempt to infer complex structures. To scale up to genome level inference, we restrict the mutual information computations to infer three node structures (a gene and its two potential regulators)."
ABSTRACT: Inferring the genetic network architecture in cells is of great importance to biologists as it can lead to the understanding of cell signaling and metabolic dynamics underlying cellular processes, onset of diseases, and potential discoveries in drug development. The focus today has shifted to genome-scale inference approaches using information-theoretic metrics such as mutual information over the gene expression data. In this paper, we propose two classes of inference algorithms using scoring schemes on complex interactions which are primarily based on information-theoretic metrics. The central idea is to go beyond pair-wise interactions and utilize more complex structures between any node (gene or transcription factor) and its possible multiple regulators (only transcription factors). While this increases the network inference complexity over pair-wise interaction based approaches, it achieves much higher accuracy. We restricted the complex interactions considered in this paper to 3-node structures (any node and its two regulators) to keep our schemes scalable to genome-scale inference and yet achieve higher accuracy than other state-of-the-art approaches. Detailed performance analyses based on benchmark precision and recall metrics over the known Escherichia coli transcriptional regulatory network indicate that the accuracy of the proposed algorithms (sCoIn, aCoIn and its variants) is consistently higher in comparison to popular algorithms such as context likelihood of relatedness (CLR), relevance networks (RN) and GEneNetwork Inference with Ensemble of trees (GENIE3).
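To illustrate why a 3-node structure can carry information that pair-wise scores miss, here is a small sketch that compares pair-wise mutual information with the mutual information between a target and the joint state of two candidate regulators. This is not the sCoIn or aCoIn scoring scheme, whose details are not given in the abstract; the function names and the XOR toy data are assumptions chosen for illustration, and expression values are assumed to be discretized.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in bits) of two equal-length discrete sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    if n == 0:
        return 0.0
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def three_node_score(target, regulator_a, regulator_b):
    """MI between the target and the joint state of two candidate regulators.

    Hypothetical score for illustration: the pair of regulator values at each
    sample is treated as a single discrete symbol.
    """
    joint_regulators = list(zip(regulator_a, regulator_b))
    return mutual_information(target, joint_regulators)


# Toy data: the target is the XOR of its two regulators, so each regulator
# alone looks uninformative while the pair determines the target completely.
regulator_a = [0, 0, 1, 1]
regulator_b = [0, 1, 0, 1]
target = [a ^ b for a, b in zip(regulator_a, regulator_b)]

pairwise_a = mutual_information(target, regulator_a)
pairwise_b = mutual_information(target, regulator_b)
joint = three_node_score(target, regulator_a, regulator_b)
```

In this toy case both pair-wise scores are zero while the joint score is one bit. The cost is that the number of joint regulator states grows with each added regulator, which is consistent with the abstract's choice to stop at two regulators per node for genome-scale data.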
ABSTRACT: Structural analysis of well-studied transcriptional regulatory networks indicates that these complex networks are made up of a small set of recurring patterns called motifs. While information-theoretic approaches have been immensely popular, these approaches infer regulatory networks by aggregating pair-wise interactions. In this paper, we propose novel structure-based information-theoretic approaches to infer transcriptional regulatory networks from microarray expression data. The core idea is to go beyond pair-wise interactions and consider more complex structures as found in motifs. While this increases the network inference complexity over pair-wise interaction based approaches, it achieves much higher accuracy and yet is scalable to genome-level inference. Detailed performance analyses based on benchmark precision and recall metrics on the known Escherichia coli transcriptional regulatory network indicate that the accuracy of the proposed algorithms is consistently higher in comparison to popular algorithms such as context likelihood of relatedness (CLR), relevance networks (RN) and GEneNetwork Inference with Ensemble of trees (GENIE3). In the proposed approaches the size of structures was limited to three-node cases (any node and its two parents). Analysis on a smaller network showed that the performance of the algorithm improved when more complex structures were considered for inference, although inferring such higher-level structures may be computationally challenging at the genome scale.