Article

A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases

Bioinformatics Research Group, SRI International, 333 Ravenswood Ave, Menlo Park, CA 94025, USA.
BMC Bioinformatics (Impact Factor: 2.67). 07/2004; 5:76. DOI: 10.1186/1471-2105-5-76
Source: PubMed

ABSTRACT The PathoLogic program constructs Pathway/Genome databases by using a genome's annotation to predict the set of metabolic pathways present in an organism. PathoLogic determines the set of reactions composing those pathways from the enzymes annotated in the organism's genome. Most annotation efforts fail to assign function to 40-60% of sequences. In addition, large numbers of sequences may have non-specific annotations (e.g., thiolase family protein). Pathway holes occur when a genome appears to lack the enzymes needed to catalyze reactions in a pathway. If a protein has not been assigned a specific function during the annotation process, any reaction catalyzed by that protein will appear as a missing enzyme or pathway hole in a Pathway/Genome database.
We have developed a method that efficiently combines homology and pathway-based evidence to identify candidates for filling pathway holes in Pathway/Genome databases. Our program not only identifies potential candidate sequences for pathway holes, but also combines data from multiple, heterogeneous sources to assess the likelihood that a candidate has the required function. Our algorithm emulates the manual sequence annotation process: to determine the posterior belief that a candidate has the required function, it considers evidence from homology searches, from genomic context (i.e., is the gene part of an operon?), and from functional context (e.g., are there functionally related genes nearby in the genome?). The method can be applied across an entire metabolic pathway network and is generally applicable to any pathway database. The program uses a set of sequences encoding the required activity in other genomes to identify candidate proteins in the genome of interest, and then evaluates each candidate by using a simple Bayes classifier to determine the probability that the candidate has the desired function. We achieved 71% precision at a probability threshold of 0.9 during cross-validation using known reactions in computationally predicted pathway databases. After applying our method to 513 pathway holes in 333 pathways from three Pathway/Genome databases, we increased the number of complete pathways by 42%. We made putative assignments to 46% of the holes, including annotation of 17 sequences of previously unknown function.
Our pathway hole filler can be used not only to increase the utility of Pathway/Genome databases to both experimental and computational researchers, but also to improve predictions of protein function.
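
The abstract describes scoring each candidate with a simple Bayes classifier that combines homology and context evidence into a posterior probability. The following is a minimal sketch of that idea, not the authors' implementation: the feature names, the likelihood values, and the prior are all hypothetical and chosen only to make the example runnable. It assumes binary evidence features that are conditionally independent given the class.

```python
from math import prod

# Hypothetical, illustrative likelihoods: P(feature is True | class).
# These numbers are NOT taken from the paper.
LIKELIHOODS = {
    "strong_homology_hit": {"has_function": 0.90, "lacks_function": 0.20},
    "in_operon_with_pathway_gene": {"has_function": 0.40, "lacks_function": 0.05},
    "adjacent_to_pathway_gene": {"has_function": 0.50, "lacks_function": 0.10},
}


def posterior_has_function(evidence, prior=0.1):
    """Return P(candidate has the required function | evidence).

    evidence maps a feature name to True/False. The prior is an assumed
    value, and features are treated as conditionally independent.
    """

    def joint(cls, class_prior):
        # Multiply the class prior by the likelihood of each observation.
        terms = []
        for name, observed in evidence.items():
            p_true = LIKELIHOODS[name][cls]
            terms.append(p_true if observed else 1.0 - p_true)
        return class_prior * prod(terms)

    p_has = joint("has_function", prior)
    p_lacks = joint("lacks_function", 1.0 - prior)
    return p_has / (p_has + p_lacks)


# A candidate supported by all three kinds of evidence.
candidate = {
    "strong_homology_hit": True,
    "in_operon_with_pathway_gene": True,
    "adjacent_to_pathway_gene": True,
}
score = posterior_has_function(candidate)
accepted = score >= 0.9  # 0.9 is the probability threshold quoted in the abstract
```

With these made-up numbers, a candidate backed by all three evidence types clears the 0.9 threshold, while one with no supporting evidence scores far below the prior. A real classifier would estimate the likelihoods from known enzyme assignments rather than hard-coding them.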

    • "On the other hand MEP and Pathway Tools hole filler represent an alternative approach that tackles the gap filling issue identifying missing genes rather than missing reactions, and these tools achieve this goal using expression data and species homology, respectively. As such, this will eventually lead to the expansion of the reconstructed model to include more genes and enzymes and possibly rewire the connectivity of the network [47] [48]."
    ABSTRACT: The increased demand and consumption of fossil fuels have raised interest in finding renewable energy sources throughout the globe. Much focus has been placed on optimizing microorganisms and primarily microalgae, to efficiently produce compounds that can substitute for fossil fuels. However, the path to achieving economic feasibility is likely to require strain optimization through using available tools and technologies in the fields of systems and synthetic biology. Such approaches invoke a deep understanding of the metabolic networks of the organisms and their genomic and proteomic profiles. The advent of next generation sequencing and other high throughput methods has led to a major increase in availability of biological data. Integration of such disparate data can help define the emergent metabolic system properties, which is of crucial importance in addressing biofuel production optimization. Herein, we review major computational tools and approaches developed and used in order to potentially identify target genes, pathways, and reactions of particular interest to biofuel production in algae. As the use of these tools and approaches has not been fully implemented in algal biofuel research, the aim of this review is to highlight the potential utility of these resources toward their future implementation in algal research.
    BioMed Research International 09/2014; 2014:1-12. DOI:10.1155/2014/649453 · 2.71 Impact Factor
    • "More recent comparative research has proceeded by focusing on alignment techniques that can identify similar parts between pathways, providing further insight for drug target identification [4,5], meaningful reconstruction of phylogenetic trees [6,7], and identification of enzymes clusters and missing enzymes [8,9]. Here too approaches in the literature vary: some consider multiple pathways and identify their frequent or conserved subgraphs [10,11]; others also build their alignments [12-21]. "
    ABSTRACT: Background: Comparing the metabolic pathways of different species is useful for understanding metabolic functions and can help in studying diseases and engineering drugs. Several comparison techniques for metabolic pathways have been introduced in the literature as a first attempt in this direction. The approaches are based on some simplified representation of metabolic pathways and on a related definition of a similarity score (or distance measure) between two pathways. More recent comparative research focuses on alignment techniques that can identify similar parts between pathways. Results: We propose a methodology for the pairwise comparison and alignment of metabolic pathways that aims at providing the largest conserved substructure of the pathways under consideration. The proposed methodology has been implemented in a tool called MP-Align, which has been used to perform several validation tests. The results showed that our similarity score makes it possible to discriminate between different domains and to reconstruct a meaningful phylogeny from metabolic data. The results further demonstrate that our alignment algorithm correctly identifies subpathways sharing a common biological function. Conclusion: The results of the validation tests performed with MP-Align are encouraging. A comparison with another proposal in the literature showed that our alignment algorithm is particularly well-suited to finding the largest conserved subpathway of the pathways under examination.
    BMC Systems Biology 05/2014; 8(1):58. DOI:10.1186/1752-0509-8-58 · 2.85 Impact Factor
    • "Non-biochemically based pathway prediction systems use statistical inference methods to generate reactions between compounds [57]. These systems include machine learning methods [58], the Bayesian method [59], comparative genomics [60] and metabolic network alignment [61]. These methods are very useful to identify missing links in the network [57,62]. "
    ABSTRACT: Bioinformatics and biodegradation are two primary scientific fields in applied microbiology and biotechnology. The present review describes development of various bioinformatics tools that may be applied in the field of biodegradation. Several databases, including the University of Minnesota Biocatalysis/Biodegradation database (UM-BBD), a database of biodegradative oxygenases (OxDBase), Biodegradation Network-Molecular Biology Database (Bionemo), MetaCyc, and BioCyc, have been developed to enable access to information related to biochemistry and genetics of microbial degradation. In addition, several bioinformatics tools for predicting toxicity and biodegradation of chemicals have been developed. Furthermore, the whole genomes of several potential degrading bacteria have been sequenced and annotated using bioinformatics tools.
    Biological Procedures Online 04/2014; 16:8. DOI:10.1186/1480-9222-16-8 · 1.30 Impact Factor