Phylogenetic distances are encoded in networks of interacting pathways.
ABSTRACT Although metabolic reactions are unquestionably shaped by evolutionary processes, the degree to which the overall structure and complexity of their interconnections are linked to the phylogeny of species has not been evaluated in depth. Here, we apply an original metabolome representation, termed Network of Interacting Pathways or NIP, with a combination of graph theoretical and machine learning strategies, to address this question. NIPs compress the information of the metabolic network exhibited by a species into much smaller networks of overlapping metabolic pathways, where nodes are pathways and links are the metabolites they exchange.
Our analysis shows that a small set of descriptors of the structure and complexity of the NIPs, combined into regression models, reproduces reference phylogenetic distances derived from 16S rRNA sequences very accurately (10-fold cross-validation correlation coefficient higher than 0.9). Our method also scored better than previous work on metabolism-based phylogenetic reconstructions, as assessed by branch distances score, topological similarity and second cousins score. Thus, our metabolome representation as a network of overlapping metabolic pathways captures sufficient information about the underlying evolutionary events leading to the formation of metabolic networks and species phylogeny. Notably, precise knowledge of all of the reactions in these pathways is not required for these reconstructions. These observations underscore the potential of abstract, modular representations of metabolic reactions as tools for studying the evolution of species.
Supplementary data are available at Bioinformatics online.
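The NIP idea described above (pathways as nodes, exchanged metabolites as links) can be illustrated with a minimal sketch. It assumes that two pathways are linked whenever they share at least one metabolite, and it computes a few generic graph descriptors. The toy pathway contents, the function names `build_nip` and `descriptors`, and the choice of descriptors are all hypothetical and are not taken from the article.

```python
from itertools import combinations


def build_nip(pathways):
    """Link every pair of pathways that shares at least one metabolite.

    `pathways` maps a pathway name to the set of metabolites it involves.
    Returns a dict mapping (pathway_a, pathway_b) to the shared metabolites.
    The linking rule is an assumption made for this sketch.
    """
    edges = {}
    for p, q in combinations(sorted(pathways), 2):
        shared = pathways[p] & pathways[q]
        if shared:
            edges[(p, q)] = shared
    return edges


def descriptors(pathways, edges):
    """Compute simple, generic structural descriptors of the pathway network."""
    n = len(pathways)
    m = len(edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    mean_degree = 2 * m / n if n else 0.0
    return {"nodes": n, "edges": m, "density": density, "mean_degree": mean_degree}


# Hypothetical toy data, for illustration only.
toy_pathways = {
    "glycolysis": {"glucose", "pyruvate", "ATP"},
    "tca_cycle": {"pyruvate", "citrate", "ATP"},
    "pentose_phosphate": {"glucose", "ribose-5-phosphate"},
    "urea_cycle": {"ornithine", "urea"},
}

nip = build_nip(toy_pathways)
stats = descriptors(toy_pathways, nip)
print(nip)
print(stats)
```

In a setting like the one the abstract describes, descriptor vectors of this kind would be computed per species and fed to a regression model; which descriptors the authors actually used is not stated here.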
Article: Construction of phylogenetic trees by kernel-based comparative analysis of metabolic networks.
ABSTRACT: To infer the tree of life requires knowledge of the common characteristics of each species descended from a common ancestor as the measuring criteria and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides information about the genetic relationships between different organisms. In contrast, comparative analysis of metabolic pathways in different organisms can yield insights into their functional relationships under different physiological conditions. However, evaluating the similarities or differences between metabolic networks is a computationally challenging problem, and systematic methods of doing this are desirable. Here we introduce a graph-kernel method for computing the similarity between metabolic networks in polynomial time, and use it to profile metabolic pathways and to construct phylogenetic trees. To compare the structures of metabolic networks in organisms, we adopted the exponential graph kernel, which is a kernel-based approach with a labeled graph that includes a label matrix and an adjacency matrix. To construct the phylogenetic trees, we used an unweighted pair-group method with arithmetic mean, i.e., a hierarchical clustering algorithm. We applied the kernel-based network profiling method in a comparative analysis of nine carbohydrate metabolic networks from 81 biological species encompassing Archaea, Eukaryota, and Eubacteria. The resulting phylogenetic hierarchies generally support the tripartite scheme of three domains rather than the two domains of prokaryotes and eukaryotes. By combining the kernel machines with metabolic information, the method infers the context of biosphere development that covers physiological events required for adaptation by genetic reconstruction. 
The results show that one may obtain a global view of the tree of life by comparing the metabolic pathway structures using meta-level information rather than sequence information. This method may yield further information about biological evolution, such as the history of horizontal transfer of each gene, by studying the detailed structure of the phylogenetic tree constructed by the kernel-based method.
BMC Bioinformatics 02/2006; 7:284. · 2.75 Impact Factor
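The abstract above names the unweighted pair-group method with arithmetic mean (UPGMA) as its tree-building step. The following is a minimal sketch of UPGMA on a small pairwise distance table. The labels and distances are hypothetical, and the step that would turn kernel similarities into distances is not specified in the abstract, so it is omitted here.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree as nested tuples.

    `names` is a list of leaf labels; `dist` maps (a, b) pairs to distances.
    At each step the two closest clusters are merged, and the distance from
    the merged cluster to every other cluster is the size-weighted mean of
    the two old distances (equivalent to the mean over all leaf pairs).
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical distances between four taxa, for illustration only.
toy_dist = {
    ("A", "B"): 2.0,
    ("A", "C"): 6.0,
    ("A", "D"): 10.0,
    ("B", "C"): 6.0,
    ("B", "D"): 10.0,
    ("C", "D"): 10.0,
}

tree = upgma(["A", "B", "C", "D"], toy_dist)
print(tree)
```

With these toy distances, A and B join first, then C, then D. Branch heights are not tracked in this sketch.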
ABSTRACT: The program MODELTEST uses log likelihood scores to establish the model of DNA evolution that best fits the data. AVAILABILITY: The MODELTEST package, including the source code and some documentation, is available at http://bioag.byu.edu/zoology/crandall_lab/modeltest.html.
Bioinformatics 02/1998; 14(9):817-8. · 5.47 Impact Factor
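One common way to compare candidate substitution models from their log likelihood scores is the Akaike information criterion, AIC = 2k - 2 ln L, where k is the number of free parameters. The sketch below illustrates that general idea only; it is not a description of MODELTEST's internals. The log likelihood values are hypothetical, and the parameter counts cover substitution-model parameters only.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off
    between fit (log likelihood) and model complexity (free parameters)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical log likelihood scores for three nested substitution models.
# Parameter counts: JC69 has 0 free substitution parameters, HKY85 has 4
# (3 base frequencies + 1 transition/transversion ratio), GTR has 8.
candidates = {
    "JC69": (-2310.5, 0),
    "HKY85": (-2251.2, 4),
    "GTR": (-2249.9, 8),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print("best model by AIC:", best)
```

Here the small gain in likelihood from HKY85 to GTR does not offset the four extra parameters, so the simpler model is preferred.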
ABSTRACT: Comparative genomic studies have shown that horizontal gene transfer (HGT) is widespread among organisms. However, its effect on the phylogenetic relationships of organisms, especially at the system level of different cellular functions, is still not well understood. In this work, we have constructed phylogenetic trees based on the enzyme, reaction, and gene contents of metabolic networks reconstructed from annotated genome information of 82 sequenced organisms. Results from different phylogenetic distance definitions and based on three different functional subsystems (i.e., metabolism, cellular processes, information storage and processing) were compared. Results based on the three different functional subsystems give different pictures of the phylogenetic relationships of organisms, reflecting the different extents of HGT in the different functional systems. In general, horizontal transfer is prevalent in genes for metabolism, but less so in genes for information processing. Nevertheless, the major results of metabolic network-based phylogenetic trees are in good agreement with the tree based on 16S rRNA and with genome trees, confirming the three-domain classification and the close relationship between eukaryotes and archaea at the level of metabolic networks. These results strongly support the hypothesis that although HGT is widely distributed, it is nevertheless constrained by certain pre-existing metabolic organization principle(s) during evolution. Further research is needed to identify the organization principles and constraints of metabolic networks on HGT, which have a large impact on understanding the evolution of life and on purposefully manipulating cellular metabolism.
Molecular Phylogenetics and Evolution 05/2004; 31(1):204-13. · 3.61 Impact Factor
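A content-based phylogenetic distance of the kind mentioned above can be sketched with the Jaccard distance between enzyme sets. The abstract does not say which distance definitions were used, so the Jaccard distance is an assumption chosen for illustration. The organism labels and their enzyme (EC number) contents are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; 0 for identical sets, 1 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical enzyme contents (EC numbers) for three made-up organisms.
enzyme_content = {
    "org_a": {"2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13"},
    "org_b": {"2.7.1.1", "5.3.1.9", "2.7.1.11", "1.1.1.27"},
    "org_c": {"4.1.2.13"},
}

distances = {
    (x, y): jaccard_distance(enzyme_content[x], enzyme_content[y])
    for x, y in combinations(sorted(enzyme_content), 2)
}
for pair, value in distances.items():
    print(pair, round(value, 3))
```

A distance table like this could then be passed to any distance-based tree-building method. The same function applies unchanged to reaction or gene sets.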