Analysis of gene expression data on metabolic networks



Anna-Lena Kranz (1), Marcus Oswald (3), Thorsten Bonato (3), Hanna Seitz (3), Gerhard Reinelt (3),
Heiko Runz (4), Johannes Zschocke (4), Roland Eils (1,2) and Rainer König (1,2)
(1) Department of Bioinformatics and Functional Genomics, Institute for Pharmacy and Molecular Biotechnology, University of Heidelberg, 69120 Heidelberg, Germany.
(2) Theoretical Bioinformatics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany.
(3) Institute of Computer Science, University of Heidelberg, 69120 Heidelberg, Germany.
(4) Institute for Human Genetics, University of Heidelberg, 69120 Heidelberg, Germany.
When analysing gene expression data, it is often not enough to examine single genes; groups of
genes need to be evaluated instead [1]. Because the modular structure of complex networks plays a
critical role in their functionality, metabolic networks provide a more structured basis for the
analysis of gene expression data. Determining significant expression patterns of topologically
associated genes enables the identification of functionally relevant central components in the
network with respect to the conditions of interest.
We developed a novel technique. Sub-graphs of the metabolic network are identified with
clustering procedures on the network [2, 3], which group the reactions into parts together with
their major connections. Our group has applied several clustering heuristics to metabolic
networks, namely simulated annealing [4], a greedy heuristic and a consecutive ones heuristic
[5, 6]. The overall modularity of the clustering is determined as a quality control and
optimised [7].
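The modularity measure of Newman and Girvan [7] can be illustrated with a minimal sketch. It assumes an undirected, unweighted reaction graph given as an edge list and a fixed assignment of nodes to clusters, and computes Q as the sum over clusters of (fraction of edges inside the cluster) minus (fraction of edge ends attached to the cluster) squared. The function name `modularity`, the node labels R1 to R6 and the toy graph are hypothetical and only serve the illustration; this is not the implementation used in our group.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Newman-Girvan modularity Q = sum_c (e_cc - a_c^2), undirected and unweighted."""
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    within = defaultdict(int)  # edges with both ends inside cluster c
    degree = defaultdict(int)  # number of edge ends attached to cluster c
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            within[cu] += 1
    return sum(within[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph (assumption): two triangles of reactions joined by a single edge.
edges = [("R1", "R2"), ("R2", "R3"), ("R1", "R3"),
         ("R4", "R5"), ("R5", "R6"), ("R4", "R6"),
         ("R3", "R4")]
two_clusters = {"R1": 0, "R2": 0, "R3": 0, "R4": 1, "R5": 1, "R6": 1}
one_cluster = {node: 0 for node in two_clusters}

q_two = modularity(edges, two_clusters)  # split along the bridging edge
q_one = modularity(edges, one_cluster)   # everything in a single cluster
print(round(q_two, 3), round(q_one, 3))
```

In this toy case the split into the two triangles scores higher than the trivial single cluster, which is the kind of comparison a clustering heuristic can use when optimising modularity.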
Once the clusters are identified, the gene expression data are mapped onto the corresponding
enzymatic reactions of the metabolic network. Expression patterns can then be extracted for
each cluster using a combinatorial approach developed in our group, in which values are
calculated for every possible expression pattern of the genes within a cluster. These values
show essential differences between samples of different conditions and identify regions whose
pattern varies between states.
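The idea of scoring every possible expression pattern within a cluster can be sketched as follows. This abstract does not specify how the values are calculated, so everything in the sketch is an assumption: expression values are binarised into up (1) or down (0) against a threshold, all 2^k patterns over the k genes of a cluster are enumerated, and the value of a pattern is the difference in its relative frequency between two conditions. The function names, gene labels and numbers are hypothetical.

```python
from itertools import product


def pattern_counts(samples, genes, threshold=0.0):
    """Count how many samples show each possible up/down pattern over `genes`."""
    counts = {p: 0 for p in product((0, 1), repeat=len(genes))}
    for sample in samples:
        observed = tuple(int(sample[g] > threshold) for g in genes)
        counts[observed] += 1
    return counts


def pattern_differences(cond_a, cond_b, genes, threshold=0.0):
    """Per pattern: relative frequency in condition A minus that in condition B."""
    ca = pattern_counts(cond_a, genes, threshold)
    cb = pattern_counts(cond_b, genes, threshold)
    return {p: ca[p] / len(cond_a) - cb[p] / len(cond_b) for p in ca}


# Hypothetical cluster with two genes and two samples per condition
# (values stand in for log-ratios; they are not measured data).
genes = ["geneA", "geneB"]
cond_a = [{"geneA": 1.2, "geneB": -0.5}, {"geneA": 0.8, "geneB": -0.1}]
cond_b = [{"geneA": -0.7, "geneB": 0.9}, {"geneA": 0.4, "geneB": 0.6}]

diffs = pattern_differences(cond_a, cond_b, genes)
for pattern, value in sorted(diffs.items(), key=lambda kv: -abs(kv[1])):
    print(pattern, value)
```

Patterns with a large absolute difference separate the two conditions most clearly in this toy setting. Because the number of patterns grows as 2^k, such an exhaustive enumeration is only practical for small clusters.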
As a case study, our approach is applied to gene expression data of HeLa cells under different
concentrations of cholesterol. This has high clinical relevance, as impaired sterol biosynthesis
can cause severe human diseases (see Figure 1) [8]. Our results promise new insights into sterol
biosynthesis when applied to the human metabolic network. With our method it is possible not
only to detect broken enzymes but also to discover crucial imbalances of affected pathways in
the cell.
Figure 1: Section of the cholesterol biosynthesis pathway showing the interruptions that cause the genetic diseases lathosterolosis,
Conradi-Hünermann syndrome and Smith-Lemli-Opitz syndrome [8].
1. Rapaport, F., et al., Classification of microarray data using gene networks. BMC Bioinformatics, 2007. 8: p. 35.
2. König, R., et al., Discovering functional gene expression patterns in the metabolic network of Escherichia coli with wavelets
transforms. BMC Bioinformatics, 2006. 7: p. 119.
3. König, R. and R. Eils, Gene expression analysis on biochemical networks using the Potts spin model. Bioinformatics, 2004. 20(10):
p. 1500-5.
4. Guimera, R. and L.A. Nunes Amaral, Functional cartography of complex metabolic networks. Nature, 2005. 433(7028): p. 895-900.
5. Christof, T., M. Oswald, and G. Reinelt, Consecutive Ones and a Betweenness Problem in Computational Biology, in Proceedings
of the 6th International IPCO Conference on Integer Programming and Combinatorial Optimization. 1998, Springer-Verlag.
6. Oswald, M. and G. Reinelt, Polyhedral Aspects of the Consecutive Ones Problem, in Proceedings of the 6th Annual International
Conference on Computing and Combinatorics. 2000, Springer-Verlag.
7. Newman, M.E. and M. Girvan, Finding and evaluating community structure in networks. Phys Rev E Stat Nonlin Soft Matter Phys,
2004. 69(2 Pt 2): p. 026113.
8. Haas, D., et al., Abnormal sterol metabolism in holoprosencephaly: Studies in cultured lymphoblasts. J Med Genet, 2007.