Figure - available via license: Creative Commons Attribution 2.0 Generic
Workflow scheme for the analysis of the filtered networks. We obtain publicly available expression sets from breast cancer data (CGAP SAGE libraries); these are mapped to TRANSPATH signaling molecules and KEGG metabolic activities. Applying strict and 1-extended filtering yields 4 networks per disease condition.
Reconstruction of protein-protein interaction or metabolic networks based on expression data often involves in silico predictions, while on the other hand, there are unspecific networks of in vivo interactions derived from knowledge bases. We analyze networks designed to come as close as possible to data measured in vivo, both with respect to the se...
... Cutoff optimization as presented in this paper is a very flexible and generalizable inference strategy. Most other methods that account for prior knowledge integrate the biological reference directly into a specific network inference or regression framework, for example by penalizing or enhancing specific edges according to the biological reference. In contrast, our approach uses prior knowledge as an external reference system to optimize the purely data-driven association matrix. ...
Correlation networks are commonly used to statistically extract biological interactions between omics markers. Network edge selection is typically based on the significance of the underlying correlation coefficients. A statistical cutoff, however, is not guaranteed to capture biological reality, and depends heavily on dataset properties such as sample size. Here we propose an alternative approach to the problem of network reconstruction. Specifically, we developed a cutoff selection algorithm that maximizes the agreement with a given ground truth. We first evaluate the approach on IgG glycomics data, for which the biochemical pathway is known and well-characterized. The optimal network outperforms networks obtained with statistical cutoffs and is robust with respect to sample size. Importantly, we show that even in the case of incomplete or incorrect prior knowledge, the optimal network is close to the true optimum. We then demonstrate the generalizability of the approach on an untargeted metabolomics and a transcriptomics dataset from The Cancer Genome Atlas (TCGA). For the transcriptomics case, we demonstrate that the optimized network is superior to statistical networks in systematically retrieving interactions that were not included in the biological reference used for the optimization. Overall, this paper shows that using prior information for correlation network inference is superior to using regular statistical cutoffs, even if the prior information is incomplete or partially inaccurate.
Correlation networks are frequently used to statistically extract biological interactions between omics markers. Network edge selection is typically based on the statistical significance of the correlation coefficients. This procedure, however, is not guaranteed to capture biological mechanisms. We here propose an alternative approach for network reconstruction: a cutoff selection algorithm that maximizes the overlap of the inferred network with available prior knowledge. We first evaluate the approach on IgG glycomics data, for which the biochemical pathway is known and well-characterized. Importantly, even in the case of incomplete or incorrect prior knowledge, the optimal network is close to the true optimum. We then demonstrate the generalizability of the approach with applications to untargeted metabolomics and transcriptomics data. For the transcriptomics case, we demonstrate that the optimized network is superior to statistical networks in systematically retrieving interactions that were not included in the biological reference used for optimization.
Correlation network inference is typically based on the significance of the correlation coefficients, but this procedure is not guaranteed to capture biological mechanisms. Here, the authors develop a cutoff selection algorithm that maximizes the overlap between inferred networks and prior knowledge.
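The idea of choosing a correlation cutoff by its overlap with prior knowledge can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy association matrix, the reference edge set, the candidate cutoffs, and the use of an F1 score as the overlap measure are all assumptions made for illustration, and the published method may quantify overlap differently.

```python
from itertools import combinations


def edges_above(assoc, cutoff):
    """Undirected edges (i, j) with i < j whose |association| is >= cutoff."""
    n = len(assoc)
    return {(i, j) for i, j in combinations(range(n), 2)
            if abs(assoc[i][j]) >= cutoff}


def f1_overlap(inferred, reference):
    """F1 score between two edge sets (an assumed overlap measure)."""
    tp = len(inferred & reference)  # edges present in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(inferred)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


def optimal_cutoff(assoc, reference, cutoffs):
    """Scan candidate cutoffs; return the first one with the highest overlap."""
    best_cutoff, best_score = None, -1.0
    for c in cutoffs:
        score = f1_overlap(edges_above(assoc, c), reference)
        if score > best_score:
            best_cutoff, best_score = c, score
    return best_cutoff, best_score


# Toy absolute-correlation matrix for four markers (made-up values).
assoc = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.7, 0.3],
    [0.2, 0.7, 1.0, 0.4],
    [0.1, 0.3, 0.4, 1.0],
]
# Hypothetical prior-knowledge edges, written as (i, j) with i < j.
reference = {(0, 1), (1, 2)}
cutoffs = [0.1, 0.3, 0.5, 0.8]

best_cutoff, best_score = optimal_cutoff(assoc, reference, cutoffs)
print(best_cutoff, best_score)
```

In this toy case the lowest cutoff admits every edge and the highest drops a reference edge, so the scan settles on an intermediate value. The prior knowledge acts only as an external scoring reference here; the association matrix itself stays purely data-driven.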
Even though vast amounts of genome-wide gene expression data have become available in plants, it remains a challenge to effectively mine this information for the discovery of genes and gene networks, for instance those that control agronomically important traits. These networks reflect potential interactions among genes and, therefore, can lead to a systematic understanding of the molecular mechanisms underlying targeted biological processes. We discuss methods to analyze gene networks using gene expression data, specifically focusing on four common statistical approaches used to reconstruct networks: correlation, feature selection in supervised learning, probabilistic graphical models, and meta-prediction. In addition, we discuss the effective use of these methods for acquiring an in-depth understanding of biological systems in plants.