The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases.
ABSTRACT The MetaCyc database (http://metacyc.org/) provides a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains more than 1800 pathways derived from more than 30,000 publications, and is the largest curated collection of metabolic pathways currently available. Most reactions in MetaCyc pathways are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes and literature citations. BioCyc (http://biocyc.org/) is a collection of more than 1700 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference database, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs contain additional features, including predicted operons, transport systems and pathway-hole fillers. The BioCyc website and Pathway Tools software offer many tools for querying and analysis of PGDBs, including Omics Viewers and comparative analysis. New developments include a zoomable web interface for diagrams; flux-balance analysis model generation from PGDBs; web services; and a new tool called Web Groups.
Ron Caspi1, Tomer Altman1, Kate Dreher2, Carol A. Fulcher1, Pallavi Subhraveti1,
Ingrid M. Keseler1, Anamika Kothari1, Markus Krummenacker1, Mario Latendresse1,
Lukas A. Mueller3, Quang Ong1, Suzanne Paley1, Anuradha Pujar3,
Alexander G. Shearer1, Michael Travers1, Deepika Weerasinghe1, Peifen Zhang2 and
Peter D. Karp1,*
1SRI International, 333 Ravenswood, Menlo Park, CA 94025, 2Department of Plant Biology, Carnegie Institution,
260 Panama Street, Stanford, CA 94305 and 3Boyce Thompson Institute for Plant Research, Tower Road,
Ithaca, NY 14853, USA
Received September 29, 2011; Revised October 19, 2011; Accepted October 21, 2011
MetaCyc (http://metacyc.org/) is a highly curated, non-redundant reference database of small-molecule metabolism. It contains metabolic pathway and enzyme data experimentally demonstrated in the scientific literature (1). Because MetaCyc contains only experimentally determined pathways and enzymes, and due to its tight integration of data and references, MetaCyc is a uniquely valuable resource in fields including genome analysis, metabolism, and metabolic engineering. The metabolic pathways and enzymes in MetaCyc are derived from organisms representing all domains of life.
In conjunction with its role as a general reference on metabolism, MetaCyc is used as a reference database for the PathoLogic component of the Pathway Tools software (2) to computationally predict the metabolic network of any organism having a sequenced and annotated genome (3). In this automated process, a predicted metabolic network is created in the form of a Pathway/Genome Database (PGDB). In addition to the creation of PGDBs, the editing capabilities of Pathway Tools enable scientists to improve and update these computationally generated PGDBs by manual curation.
MetaCyc has been used by SRI to create more than 1700 PGDBs (as of October 2011), which are available through the BioCyc website.
*To whom correspondence should be addressed. Tel: +1 650 859 4358; Fax: +1 650 859 3735; Email: email@example.com
Nucleic Acids Research, 2012, Vol. 40, Database issue. Published online 18 November 2011
© The Author(s) 2011. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Interested scientists may adopt and curate any of these
PGDBs through the BioCyc website (http://biocyc.org/
In addition, MetaCyc is used by other scientists to
create additional PGDBs, many of which are available
to the general public through the scientists’ own
websites. Together with BioCyc, these PGDBs form the
MetaCyc family of databases (4).
More than 100 groups have used Pathway Tools and
MetaCyc to create PGDBs for their organisms of interest,
Saccharomyces cerevisiae (5), Arabidopsis thaliana (6),
Oryza sativa (7), Mus musculus (8), Bos taurus (9),
Medicago truncatula (10), Populus trichocarpa (11),
Dictyostelium discoideum (12), Leishmania major (13),
species (15), bioenergy-related organisms (BeoCyc) and
many pathogenic bacteria (16) (see http://biocyc.org/otherpgdbs.shtml
for a more complete list). A few
examples of organisms that were studied within the last
year using Pathway Tools include Bacillus acidocaldarius,
B. circulans, B. filicolonicus,
licheniformis and B. stearothermophilus (17), Clostridium
difficile (18), C. thermocellum (19), Corynebacterium
erythropolis (25), R. opacus PD630 (26), S. cerevisiae and
Pichia pastoris (27), Serratia symbiotica (28), Shewanella
species (29), Vibrio vulnificus
axonopodis (31) and X. citri (32). Pathway Tools can
also generate PGDBs from metagenomic data sets (33).
A web server included in Pathway Tools enables the
publishing of PGDBs through either the Internet or an
internal network. The Navigator component of Pathway
Tools allows the browsing and analysis of PGDBs either
locally or over the Internet. A detailed description of
Pathway Tools can be found in (34).
PGDBs generated by Pathway Tools and MetaCyc are
an excellent platform for the integration of genome infor-
mation with many other types of data regarding metabol-
ism, regulation and genetics. They provide powerful tools
for analyzing omics data sets from experiments related to
gene transcription, metabolomics, proteomics, ChIP-chip
analysis and so on. During the past 2 years, we again sig-
nificantly expanded the data content of MetaCyc and
BioCyc. We also added supporting enhancements to the
Pathway Tools software and BioCyc website, as described
in the following sections.
Expansion of MetaCyc
All pathways in MetaCyc are curated from the experimen-
tal literature. Since the last Nucleic Acids Research publi-
cation (2 years ago) (1), we added 413 new base pathways
(pathways comprised of reactions only, where no portion
of the pathway is designated as a subpathway) and 40
superpathways (pathways composed of at least one base
pathway plus additional reactions or pathways), and
updated 107 existing pathways, for a total of 560 new
and revised pathways. The total number of base
pathways grew by 28%, from 1399 (version 13.5)
to 1790 (version 15.5) (the total increase is less than
413 pathways because some existing pathways were
deleted from the database during this period) while the
total number of superpathways grew by 17%, from 235
(version 13.5) to 275 (version 15.5).
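The growth figures quoted above can be rechecked directly from the version counts. The short Python sketch below does only that arithmetic; the helper name `percent_growth` is ours and is not part of any MetaCyc tooling.

```python
def percent_growth(old: int, new: int) -> float:
    """Return the percentage increase from `old` to `new`."""
    return (new - old) / old * 100.0


# Counts taken from the text (version 13.5 -> version 15.5).
base_growth = percent_growth(1399, 1790)    # base pathways
super_growth = percent_growth(235, 275)     # superpathways

# Net gain in base pathways; smaller than the 413 additions because
# some existing pathways were deleted during the same period.
net_new_base = 1790 - 1399

# New base pathways + new superpathways + updated pathways.
new_and_revised = 413 + 40 + 107

print(round(base_growth), round(super_growth), net_new_base, new_and_revised)
```

The net gain of base pathways is lower than the number added, which is consistent with the deletions mentioned in the text.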
Along with the increase in pathway number, the number
of enzymes, reactions, chemical compounds and citations
in the database grew by 30, 19, 13 and 49%, respectively;
the number of referenced organisms increased by 23%
(currently at 2216).
New pathway classes defined in MetaCyc
The pathways in MetaCyc are classified by an ontology
developed at SRI that is constantly updated to reflect
curation needs. Recently, we added two new top-level
classes to that ontology: Activation/Inactivation/Interconversion and Metabolic Clusters.
The Activation/Inactivation/Interconversion class was added to describe certain pathways that did not fit well into any other class and, as its name implies, includes three subclasses: Activation, Inactivation and Interconversion. In contrast to a standard ‘biosynthesis’ pathway, in which a biologically active compound is synthesized from simpler precursors, activation pathways involve relatively minor chemical modifications to existing compounds that result in a substantial increase in their biological activity. An example activation pathway is sulfate activation for sulfonation.
Similarly, inactivation pathways involve relatively minor chemical modifications to existing biologically active compounds that result in a substantial decrease in their biological activity. This is in contrast to standard degradation pathways, in which a compound is broken down into a set of simple metabolites. An example inactivation pathway is gibberellin inactivation II (methylation).
Interconversion pathways describe the bidirectional conversion of a bio-molecule to a different form, where the forward and backward conversions often prompt significant changes in the biological activity of the compound, resulting in its activation and deactivation, respectively. For an example, see medicarpin conjugates
The Metabolic Clusters class was added to classify
metabolic diagrams that do not describe the classical
notion of a pathway. In pathways, all reactions are con-
nected to one another, whereas metabolic clusters
comprise a collection of non-connected but related reac-
tions that together describe a common phenomenon. For
example, see tRNA methylation (yeast), which describes a
collection of tRNA methyltransferase-catalyzed reactions
Ontology distribution of MetaCyc pathways
The six top-level categories (or classes) of the MetaCyc pathway ontology are Biosynthesis, Degradation/Utilization/Assimilation, Generation of Precursor Metabolites and Energy, Detoxification, Activation/Inactivation/Interconversion and Metabolic Clusters.
In version 15.5, the largest top-level class is Biosyn-
thesis, with 1143 base pathways. Its main subclasses are
Secondary Metabolites Biosynthesis (447); Cofactors,
Prosthetic Groups, and Electron Carriers Biosynthesis
(186); Amino Acids Biosynthesis (110); and Fatty Acids
and Lipids Biosynthesis (124).
The second-largest top-level class is Degradation/Utilization/Assimilation.
Within this group, the largest subclasses are Aromatic
Degradation (117), Inorganic Nutrients Metabolism (94),
Carbohydrates Degradation (84).
The class Generation of Precursor Metabolites and Energy contains 158 base pathways.
Metabolism (15), Methanogenesis (13) and Electron
The other three top-level classes are much smaller. The
Detoxification class doubled in size and now contains 32
base pathways, and the new Activation/Inactivation/
Interconversion and Metabolic Clusters classes contain
22 and 19 pathways, respectively.
During the previous 2 years, the number of metazoan
pathways in MetaCyc increased by 42%, from 174 to 247
pathways. Plant pathways increased by 22% to 784, and
archaeal pathways increased by 17% to 126. The number
of pathways classified as bacterial actually decreased by
12%, as a result of a more accurate taxonomic classifica-
tion of pathways.
Table 1 lists the species with the largest number of experimentally elucidated pathways in MetaCyc
(meaning that there is experimental evidence for the
occurrence of these pathways in the organism), while
Table 2 describes the distribution of pathways in
MetaCyc based on the taxonomic classification of
associated species. The list of pathways added to
MetaCyc since the last NAR publication is too long to
specify here. For a complete report, see the MetaCyc
Curation of bioenergy pathways
Bioenergy is a rapidly growing area of research that
focuses primarily on biomass conversion and biofuels production.
To address the needs of the bioenergy research community, we have been curating
bioenergy-related pathways and enzymes in MetaCyc,
starting with version 15.1 (released June 2011). Fields
that receive attention are hydrogen production, cellulosic
biomass biosynthesis and degradation, and algal oil pro-
duction. So far we have created seven different hydrogen
biosynthesis pathways, provided upgraded structures and
commentary to many of the cellulosic biomass compo-
nents, such as cellulose, hemicelluloses, xylan, arabinan,
arabinogalactan, arabinoxylan, glucuronoxylan, gluco-
rhamnogalacturonan, and curated pathways for the bio-
synthesis and degradation of several of these polymers by
different organisms. For an example, see cellulose degrad-
ation I (cellulosome).
Table 1. List of species with 18 or more experimentally elucidated pathways represented in MetaCyc (meaning
that there is experimental evidence for the occurrence of these pathways in the organism)
The species are grouped by taxonomic domain and are ordered within each domain based on the number of
pathways (number following species name) to which the given species was assigned. Some pathways may be
labeled with a higher-level taxon, such as genus, if all the species within that genus are thought to have the
given pathway. However, such higher-level taxa are not included in this table.
Curation of engineered pathways
Since its inception, MetaCyc had included only natural pathways that occur in unmodified organisms. However, over the years users indicated to us that it would be useful to include engineered pathways in the database. Version 15.5 of MetaCyc (released October 2011) is the first to include such engineered pathways.
To avoid confusion, engineered pathways are clearly indicated by the title ‘MetaCyc Engineered Pathway’ next to the pathway name. A text line above the summary section states: ‘This is an engineered pathway. It does not occur naturally in any known organism, and has been constructed in a living cell by metabolic engineering.’
In addition, the organisms that contributed enzymes to the pathway are listed under the description ‘The enzymes catalyzing the steps of this pathway have been assembled from the following organisms’. Engineered pathways are excluded by our PathoLogic software when predicting the presence of pathways in organism-specific PGDBs.
For an example of an engineered pathway, see pyruvate fermentation to hexanol.
Chimeric and conspecific pathways
Users of MetaCyc are familiar with the concept of
superpathways, which are constructed in PGDBs by
combining multiple elements (at least one base pathway
or superpathway, along with additional pathways or reac-
tions) to show relationships between them and depict
a larger portion of the metabolic network within a single
diagram. Although most MetaCyc superpathways consist
of pathways known to occur in the same organism, we
sometimes find it useful to construct superpathways
from pathways that are known to occur in different organisms. Such a
superpathway can provide an overview of a metabolic
field. For example, combining all the known pathways
for aerobic degradation of aromatic compounds into a
single diagram provides a useful overview of this topic
[see superpathway of aromatic compound degradation
To distinguish such pathways from those that occur in
their entirety in a single organism, we defined the terms
‘conspecific pathways’ and ‘chimeric pathways’.
While a conspecific pathway comprises a set of reactions
that are expected to be found within each organism that
has the pathway, a chimeric pathway comprises reactions
from multiple organisms, and most commonly does not
occur in its entirety in a single organism. Only sections of
chimeric pathways are likely to occur in their entirety in
single organisms. The two types of pathways are treated
differently by the PathoLogic program during the creation
of new PGDBs. When PathoLogic predicts a conspecific
pathway to occur in another organism, the pathway will
be transferred to that organism in its entirety. In the near
future we will enhance PathoLogic so that when it predicts
a chimeric pathway to occur in an organism-specific
PGDB, it will remove extraneous reactions from the
pathway to produce a conspecific version of the pathway.
Conspecific pathways can be either base pathways or
superpathways, while chimeric pathways are always superpathways.
To alert the user to the fact that a pathway is chimeric
the following note appears above the summary section:
‘This is a chimeric pathway, comprising reactions from
multiple organisms, and typically will not occur in its
entirety in a single organism. The taxa listed here are
likely to catalyze only subsets of the reactions depicted
in this pathway.’ In addition, the pathway’s title states
‘MetaCyc Chimeric Pathway’.
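The different treatment of conspecific and chimeric pathways described above can be illustrated with a minimal Python sketch. It is not the PathoLogic implementation: the `Pathway` class, the `transfer_pathway` function, the reaction identifiers and the idea of representing an organism as a set of reaction identifiers are all assumptions made for illustration. A conspecific pathway is transferred in its entirety, while a chimeric pathway is pruned to the reactions the organism is predicted to have, as planned for a future PathoLogic release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pathway:
    """Hypothetical, simplified pathway record."""
    name: str
    reactions: tuple[str, ...]   # reaction identifiers (made up here)
    chimeric: bool = False


def transfer_pathway(pathway: Pathway, organism_reactions: set[str]) -> Pathway:
    """Return the version of `pathway` to store in an organism-specific PGDB.

    Conspecific pathways are transferred unchanged. Chimeric pathways are
    pruned to the reactions found in the organism, which yields a
    conspecific version of the pathway.
    """
    if not pathway.chimeric:
        return pathway
    kept = tuple(r for r in pathway.reactions if r in organism_reactions)
    return Pathway(pathway.name, kept, chimeric=False)


# Illustrative data only.
overview = Pathway("aromatic degradation overview",
                   ("RXN-A", "RXN-B", "RXN-C", "RXN-D"), chimeric=True)
conspecific = Pathway("example conspecific pathway", ("RXN-A", "RXN-E"))
organism = {"RXN-A", "RXN-B", "RXN-X"}

pruned = transfer_pathway(overview, organism)
unchanged = transfer_pathway(conspecific, organism)
print(pruned.reactions, unchanged.reactions)
```

Returning a new record rather than mutating the reference pathway keeps the MetaCyc entry intact, which matters because the same reference pathway is reused for every organism.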
Kinetic data in PGDBs
We have recently more than doubled the number of types
of enzyme kinetic data that can be captured in Pathway
Tools PGDBs. When available, the following types of data
are now collected in newly curated MetaCyc enzymes:
optimal pH, Ki values for inhibitors and Km values for
Interactions with other databases
IUBMB. MetaCyc is regularly updated with data from
the Nomenclature Committee of the International Union
of Biochemistry and Molecular Biology (NC-IUBMB),
which includes new and modified EC entries. The last sup-
plement incorporated is supplement 17, and the data was
retrieved from the ExplorEnz database (35). In addition,
starting with release 15.0, the EC entries at ExplorEnz are
linked to MetaCyc reaction pages and vice versa.
NCBI taxonomy. The full NCBI Taxonomy database (36) is integrated into Pathway Tools, enabling specification of taxa using NCBI Taxonomy, and allowing taxonomic querying of MetaCyc pathways and enzymes. We continue to update the taxonomy entries with each major release of
Table 2. The distribution of pathways in MetaCyc based on the taxonomic classification of associated species
784 Euryarchaeota 125
For example, the statement ‘Tenericutes 19’ means that there is experimental evidence for the occurrence of at least 19 MetaCyc pathways in members of this taxonomic group. Major taxonomic groups are grouped by domain and are ordered within each domain based on the number of pathways (number following taxon name) associated with the taxon. A pathway may be associated with multiple organisms.
Gene ontology. The mapping between MetaCyc reactions
and Gene Ontology (GO) process and function terms (37)
is being continuously maintained by the GO Editorial
Office at the EBI. An updated file is at http://www.
Links to other databases. During the last 2 years we have
added extensive links from MetaCyc to PubChem and to
KEGG. In version 15.5 of MetaCyc there are 4014 reac-
tions that contain links to KEGG reactions. MetaCyc
compounds contain 4449 links to KEGG compounds,
8814 links to PubChem compounds and 3800 links to
EXPANSION OF BIOCYC
The BioCyc databases are organized into three tiers.
• Tier 1 PGDBs have received at least 1 year of manual curation. While some Tier 1 PGDBs (e.g. MetaCyc and EcoCyc) have received decades of manual curation and are updated continuously, others are less well curated and are still in need of significant curation.
• Tier 2 PGDBs have received moderate amounts of review (<1 year), and may or may not be updated on an ongoing basis.
• Tier 3 PGDBs were created computationally, and received no subsequent manual review or updating.
During the past 2 years, the number of BioCyc PGDBs
increased from 508 (version 13.1) to 1129 (version 15.1).
Version 15.5, to be released in October 2011, will include
>1700 PGDBs. The PGDBs AraCyc (A. thaliana col,
curated by PMN) and YeastCyc (S. cerevisiae, curated by
SGD) have been promoted from Tier 2 to Tier 1 status,
and the PGDB HumanCyc (Homo sapiens, curated by SRI)
will be upgraded to Tier 1 starting with release 15.5,
bringing the total of Tier 1 PGDBs to five (along with
EcoCyc and MetaCyc). As of version 15.1, Tier 2
includes 32 PGDBs, and Tier 3 includes 1093 PGDBs.
Some Tier 2 PGDBs were provided by groups outside
SRI. Database authors are identified on the database
summary page (Tools → Reports → Summary Statistics).
SOFTWARE AND WEBSITE ENHANCEMENTS
The following paragraphs describe significant enhance-
ments to Pathway Tools and to the BioCyc website
during the past 2 years.
Web groups—sharing and analysis of object groups
Starting in July 2011, BioCyc includes a new feature called
Web Groups, which extends the web-based interface to
allow end users to create, share and compute with collec-
tions of Pathway Tools objects (Figures 1–3). Web
Groups are a step in the direction of making Pathway
Tools a platform for collaborative computing and know-
A Web Group is a spreadsheet-like structure that can
contain both Pathway Tools objects and other values such
as numbers or strings. Like a spreadsheet, it is organized
by rows and columns. The typical group contains a set of
Pathway Tools objects in the first column (e.g. a set of
genes generated by a search). The other columns contain
properties of the object (e.g. the chromosome position of
each gene), or the result of a transformation (e.g. the re-
actions catalyzed by the gene products, or the correspond-
ing genes from a different organism). The system provides
35 built-in transformations, each of which applies to a
specific type of object. Example transformations include:
transform a group of genes into the group of pathways
containing that gene, or into the group of all genes that
regulate the expression of those genes; transform a group
of pathways into a group of all metabolites that are sub-
strates within the pathway. The transformations can be
applied to columns other than the first, creating a
Web Groups can be created from search result sets, by
importing data from external spreadsheets or text files,
and by adding objects individually from either their web
pages or from the group itself. They can be exported to
spreadsheets, and group columns of the appropriate types
can be exported to the cellular overview. Web Groups can
be shared publicly, or with selected other users.
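To make the spreadsheet analogy concrete, the following Python sketch models a group as rows and named columns, with transformations that derive a new column from an existing one. It is a toy model only: the `WebGroup` class, its methods and the small lookup tables are hypothetical and do not reflect the BioCyc implementation or its data.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WebGroup:
    """Toy spreadsheet-like group: named columns, one list per row."""
    columns: list[str]
    rows: list[list[object]] = field(default_factory=list)

    def add_column(self, name: str, source: str,
                   transform: Callable[[object], object]) -> None:
        """Append a column computed from the values in column `source`."""
        idx = self.columns.index(source)
        self.columns.append(name)
        for row in self.rows:
            row.append(transform(row[idx]))

    def delete_rows(self, predicate: Callable[[list[object]], bool]) -> None:
        """Remove every row for which `predicate` is true."""
        self.rows = [row for row in self.rows if not predicate(row)]


# Illustrative lookup tables standing in for PGDB queries.
SYNONYMS = {"trpA": [], "trpB": [], "ribB": ["htrP"]}
PRODUCTS = {"trpA": "tryptophan synthase alpha subunit",
            "trpB": "tryptophan synthase beta subunit"}

# A search for 'trp' also matched ribB through its synonym 'htrP'.
group = WebGroup(columns=["gene"], rows=[["trpA"], ["trpB"], ["ribB"]])
group.add_column("synonyms", "gene", lambda gene: SYNONYMS[gene])
group.delete_rows(lambda row: row[0] == "ribB")
group.add_column("product", "gene", lambda gene: PRODUCTS[gene])
print(group.columns)
print(group.rows)
```

Because each transformation names its source column, it can be applied to any column of the appropriate type, not only the first one.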
The Web Groups interface also allows users to apply an
enrichment/depletion analysis to the contents of a group
(Figure 3). Enrichment/depletion analysis enables users to
evaluate over- or under-representation of certain qualities
or traits within an object group—for example, determining
which genes out of a specified gene group are involved in
one or more Gene Ontology categories. To enable this type
of analysis, Pathway Tools includes a statistical analysis
engine that can be applied to the content of groups.
Performing enrichment analysis on a group results in
creation of a new group that contains the analysis results.
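The text does not specify which statistical test the analysis engine uses. As one common choice for this kind of over-representation question, the sketch below computes a one-sided hypergeometric p-value (equivalent to a one-sided Fisher exact test); the function name and parameters are assumptions for illustration and are not Pathway Tools identifiers.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       group: int, hits: int) -> float:
    """Probability of seeing at least `hits` annotated objects in a group.

    population: number of objects in the background (e.g. all genes)
    annotated:  background objects carrying the category of interest
    group:      size of the object group being tested
    hits:       group members carrying the category
    """
    upper = min(annotated, group)
    total = comb(population, group)
    tail = sum(comb(annotated, k) * comb(population - annotated, group - k)
               for k in range(hits, upper + 1))
    return tail / total


# Example: 10 of 100 genes carry a category; a group of 5 genes contains 3.
p_enriched = enrichment_p_value(population=100, annotated=10, group=5, hits=3)
print(p_enriched)
```

A depletion test would sum the lower tail instead (from 0 up to `hits`), and in practice the p-values would be corrected for the number of categories tested.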
Example use cases for Groups:
(1) Users are interested in genes of the trp operon. They
perform a search for genes containing the string ‘trp’,
and turn the results into a group. Some of the gene
names do not seem to contain that string, so the users
add a column for the gene synonyms to see why they
matched. After doing that, the users can see that some
do not belong (e.g. the ribB gene matched because of
the synonym ‘htrP’), so they delete that row from the
group table. They then use a transformation from
genes to their products, adding a column with the
gene products; a second transformation adds a
column containing the reactions that the products
catalyze (Figure 1). Next they use additional trans-
formations to obtain the substrates involved in those
reactions, to create a new group from those substrates,
and to add the molecular structures (Figure 2).
(2) The users have obtained an essential gene list from
experimental investigations. They can define a Web
Group containing those essential genes, and use
group operations to highlight the genes on the
cellular overview to view its metabolic pathway dis-
tribution, or use enrichment analysis to determine
over-represented GO categories.
(3) The users have obtained a set of metabolites of
interest from a metabolomics experiment. They can
perform an enrichment analysis to determine over-
represented metabolic pathways in that group.
New web cellular overview
We have re-engineered the web-based metabolic map
diagrams available via Pathway Tools (38). As for the
desktop version of Pathway Tools, the new web versions
of these diagrams are organism specific, capturing the
unique metabolic pathway complement of each organism,
and are created by automatic layout algorithms (Figure 4).
The diagrams are zoomable and queryable; users can
search for metabolic entities (e.g. metabolites, enzymes
and pathways) by various criteria such as by name and
by EC number. Search results are highlighted on the
diagram to indicate their locations. An omics viewer
mode allows the diagram to be painted with large-scale
data sets such as gene-expression, metabolomics and
reaction flux data. Such displays can be animated (for
data sets containing multiple time points), and are still
zoomable. Omics data can be painted programmatically
using web services (38), and bookmarks can be generated
to save highlighting patterns for later use. Extensive
tooltips are provided to identify metabolites, reactions
and pathways within the diagram on mouse rollover.
Generation of flux-balance models from PGDBs
Pathway Tools now has the ability to generate genome-
scale flux-balance analysis (FBA) models from PGDBs.
Our goals for this effort were to accelerate FBA model
development, and to streamline the interpretation of
modeling results. We achieved those goals in several ways.
In our approach, the PGDB is both a database and an
executable model. Therefore, the user can query, browse
and edit the metabolic model within the PGDB using the
many interactive features of Pathway Tools (such as
reaction and pathway editors). The user programmatically
generates from the PGDB the set of linear equations that
comprise the FBA model, and Pathway Tools invokes the
SCIP (40) linear solver to solve those equations, and then
obtains the results via the SCIP API.
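The linear equations of an FBA model are the steady-state mass balances, one per metabolite, over the reaction fluxes. The Python sketch below builds that system for a hypothetical three-reaction network and checks whether a candidate flux vector satisfies it. It stops short of calling a solver: in a real FBA run a solver such as SCIP would additionally maximize an objective (typically the biomass flux) subject to these equations and to flux bounds. All reaction and metabolite names are invented.

```python
# Toy network (hypothetical): nutrient uptake -> A, A -> B, B -> biomass.
# Each reaction maps metabolite -> stoichiometric coefficient
# (negative for substrates, positive for products).
REACTIONS = {
    "uptake_A": {"A": 1},
    "A_to_B": {"A": -1, "B": 1},
    "biomass": {"B": -1},
}


def stoichiometric_matrix(reactions):
    """Return (metabolites, reaction names, matrix S), rows = metabolites."""
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    names = list(reactions)
    matrix = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, matrix


def is_steady_state(matrix, fluxes, tol=1e-9):
    """True if S . v = 0, i.e. every metabolite is produced as fast as consumed."""
    return all(abs(sum(c * v for c, v in zip(row, fluxes))) <= tol
               for row in matrix)


metabolites, names, S = stoichiometric_matrix(REACTIONS)
print(metabolites, names, S)
print(is_steady_state(S, [2.0, 2.0, 2.0]))   # balanced fluxes
print(is_steady_state(S, [2.0, 1.0, 1.0]))   # A accumulates
```

Deriving the matrix from the reaction records each time, rather than storing it separately, mirrors the idea that the database itself is the model: editing a reaction changes the equations on the next run.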
Figure 1. An object group was created from the results of a search of the EcoCyc PGDB for genes containing the text string ‘trp’. After deleting a
few rows of the table, two more columns were added by several transformations performed on the gene group, including the transformation
‘Products of gene’ and the transformation ‘Reaction of gene’.
Since the FBA modeling is tightly integrated with
Pathway Tools, the user does not need to directly invoke
the linear solver, nor inspect its output files; Pathway
Tools can paint the resulting fluxes onto the Cellular
Overview for visual analysis. In addition, Pathway Tools
guides the user in producing a complete functional model
that produces all metabolites in the biomass equation.
We have developed special capabilities within Pathway
Tools for accelerating the development of FBA models
using a multiple-gap-filling approach. Using past tech-
niques, FBA models typically had development times on
the order of 1 year because metabolic network models are
always incomplete at the start of the model development
process, and it is very time consuming to determine how to
extend the model to become functional. Using the new
Pathway Tools functionality, we were able to build FBA
models for the EcoCyc and HumanCyc PGDBs in
~1 month each. Pathway Tools uses a meta-optimization
approach to simultaneously suggest a minimal number of
alternative types of model modifications to optimize the
number of metabolites in the biomass equation that the
FBA model is able to produce. The software suggests new
reactions to add to the model from MetaCyc, proposes
reactions within the model whose directions should be
reversed, and suggests additional nutrients and secreted
compounds that can be added to the model. Furthermore,
in contrast to other existing tools, when metabolites
cannot be produced by the model, Pathway Tools
identifies those compounds, allowing the user to focus
model debugging efforts on specific metabolites.
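The idea of reporting which biomass metabolites cannot be produced, and of testing whether a candidate reaction fills the gap, can be sketched as follows. This is a simple reachability (network expansion) approximation under the assumption that all reactions are irreversible; it is not the meta-optimization approach used by Pathway Tools, and all reaction and metabolite names are hypothetical.

```python
def producible(reactions, nutrients):
    """Return the set of metabolites reachable from the nutrients.

    reactions: list of (substrates, products) pairs of sets,
    each treated as irreversible for simplicity.
    """
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            # A reaction can fire once all of its substrates are available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def unproducible_biomass(reactions, nutrients, biomass):
    """List the biomass metabolites that the network cannot reach."""
    return sorted(set(biomass) - producible(reactions, nutrients))


model = [
    ({"glc"}, {"pyr"}),
    ({"pyr", "nh3"}, {"ala"}),
    ({"x"}, {"trp"}),          # blocked: nothing in the model produces "x"
]
nutrients = {"glc", "nh3"}
biomass = ["ala", "trp"]

print(unproducible_biomass(model, nutrients, biomass))   # ['trp']

# Try a candidate gap-filling reaction (e.g. one drawn from a reference set).
candidate = ({"pyr"}, {"x"})
print(unproducible_biomass(model + [candidate], nutrients, biomass))  # []
```

Listing the unreachable compounds by name mirrors the debugging aid described above: the modeler can concentrate on "trp" rather than on the whole network.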
The Pathway Tools FBA module also supports evalu-
ation of single and multiple gene and reaction knock-outs;
genes or reactions whose removal prevents production of
any biomass component are judged to be essential. The
FBA module is available only in the desktop mode of
Pathway Tools, and is not accessible via Pathway Tools
Dead-end metabolite finder
The ability to identify dead-end metabolites is a valuable
method for identifying errors and incompleteness in a
metabolic network, for FBA modeling and other applica-
tions. Dead-end metabolites are compounds that are only
Figure 2. An object group created by several transformations performed on the group shown in Figure 1. The first column contains all substrates
that are included in the ‘Reaction’ column of that table, and the second column shows the structures of these compounds. These columns were
generated using the transformations ‘Substrates of reaction’ and ‘Structures of compound’.
produced by, or only consumed by, the metabolic network
of an organism. Although such situations sometimes
reflect the correct biology, they usually indicate errors in
the metabolic model. A tool for identifying dead-end
metabolites is available in both web (Tools → Dead End
Metabolites) and desktop modes.
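The definition above (compounds that are only produced, or only consumed, by the network) can be expressed in a few lines. The sketch below is a minimal illustration with a hypothetical data layout; it does not account for transport reactions or cellular compartments, so in this toy version nutrients and secreted products are also reported.

```python
def dead_end_metabolites(reactions):
    """Return metabolites that are only produced or only consumed.

    reactions: iterable of (substrates, products, reversible) tuples.
    A reversible reaction can both produce and consume all participants.
    """
    produced, consumed = set(), set()
    for substrates, products, reversible in reactions:
        consumed |= set(substrates)
        produced |= set(products)
        if reversible:
            consumed |= set(products)
            produced |= set(substrates)
    # Symmetric difference: in exactly one of the two sets.
    return sorted(produced ^ consumed)


network = [
    ({"A"}, {"B"}, False),
    ({"B"}, {"C"}, False),
    ({"B"}, {"D"}, True),
]
print(dead_end_metabolites(network))  # ['A', 'C']
```

Here D is not flagged because the reversible reaction can both make and use it, whereas A is never produced and C is never consumed.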
Metabolic choke points are metabolites that are either
produced by only a single reaction in the metabolic
network, or are consumed by only one reaction in the
network, and were found to be enriched for anti-microbial
drug targets (41). A tool for identifying metabolic choke
points is available in both web (Tools → Chokepoint
Reactions) and desktop modes.
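Following the definition given above, a minimal sketch of choke-point detection counts the producing and consuming reactions of every metabolite and reports those with exactly one of either, together with the responsible reaction. The reaction identifiers and data layout are hypothetical, and reactions are treated as irreversible for simplicity.

```python
from collections import defaultdict


def chokepoints(reactions):
    """Map each choke-point metabolite to the reaction(s) that uniquely
    produce or uniquely consume it.

    reactions: dict of reaction id -> (substrates, products).
    """
    producers, consumers = defaultdict(set), defaultdict(set)
    for rid, (substrates, products) in reactions.items():
        for m in substrates:
            consumers[m].add(rid)
        for m in products:
            producers[m].add(rid)

    result = {}
    for m in set(producers) | set(consumers):
        unique = set()
        if len(producers.get(m, ())) == 1:
            unique |= producers[m]
        if len(consumers.get(m, ())) == 1:
            unique |= consumers[m]
        if unique:
            result[m] = unique
    return result


toy = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"A"}, {"B"}),
    "r3": ({"B"}, {"C"}),
}
print(chokepoints(toy))
```

In this toy network r3 is the only consumer of B and the only producer of C, so both metabolites point to r3, while A (consumed by two reactions) is not a choke point.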
Web services allow programs to query structured data
from websites, and invoke web computations. Starting
with version 14.5 (Fall 2010) Pathway Tools based
websites provide a number of web services (see http://
• Retrieving XML-structured information about individual
genes, pathways, reactions, metabolites and so on,
• Performing targeted queries that return XML results,
such as retrieving all of the genes or metabolites within
a metabolic pathway,
• Executing queries in the BioVelo (42) language against
• Highlighting sets of objects in the Cellular Overview
• Displaying omics data on the Cellular Overview, on a
table of pathways, and on individual pathways.
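As a rough illustration of how a client program might consume an XML result such as "all of the genes within a metabolic pathway", the sketch below parses a small XML document with the Python standard library. The element names, attribute names and identifiers in the sample are hypothetical placeholders and do not reproduce the actual schema returned by the web services; no network request is made.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body; the real service defines its own schema.
SAMPLE = """<result>
  <Pathway id="PWY-X">
    <Gene id="G1"><common-name>trpA</common-name></Gene>
    <Gene id="G2"><common-name>trpB</common-name></Gene>
  </Pathway>
</result>"""


def genes_in_pathway(xml_text):
    """Return (gene id, common name) pairs for every Gene element."""
    root = ET.fromstring(xml_text)
    return [(g.get("id"), g.findtext("common-name")) for g in root.iter("Gene")]


for gene_id, name in genes_in_pathway(SAMPLE):
    print(gene_id, name)
```

In practice the XML text would be obtained from the service over HTTP, and the parsing code would be adapted to the documented element names.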
BioCyc ortholog data
BioCyc makes extensive use of ortholog data. Examples
for ortholog use in BioCyc include local alignment of a
chromosome region in a multi-genome browser, an option
to show the ortholog of a gene or a protein in another
organism by selecting the command ‘Gene (Protein) →
Show This Gene (Protein) in Another Database’, and an
editor that allows propagation of annotations from one
PGDB to another, across multiple genes, based on
orthology. Starting with version 15.0, BioCyc ortholog
information is computed in house by running NCBI
BLAST pair-wise searches between all proteomes of all
PGDBs. We consider orthologs as genes that are likely
to be counterparts of one another in two different organ-
isms because they are the most closely related in this pair
of organisms, and we define two proteins as orthologs if
they are the bi-directional best BLAST hits of one another.
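The bi-directional best hit criterion can be sketched as follows, assuming that pairwise BLAST results have already been reduced to a score per (query, subject) pair. The input layout, the use of a single score to rank hits and the protein identifiers are assumptions made for illustration; ties are resolved in favor of the first hit encountered.

```python
def best_hits(scores):
    """Return a dict mapping each query to its highest-scoring subject.

    scores: dict of (query, subject) -> score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores between proteome A and proteome B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b1"): 120}
b_vs_a = {("b1", "a1"): 190, ("b2", "a2"): 80}

print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Protein a2 is excluded because its best hit b1 prefers a1, which is the property that distinguishes this criterion from a one-directional best-hit search.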
Combined gene/protein/RNA pages
generated separate information pages for genes and their
products. However, we merged these two pages into a
single page because it was confusing to users to remember
which information was contained in which page, and some
users never realized that both types of pages existed. Thus,
a single page now provides information about genes and
their protein or RNA products.
New monoisotopic mass data and search
To facilitate analysis of metabolomics data in BioCyc, we
augmented the compound search form on our website to
allow searching for a list of monoisotopic molecular
weight values, of the type produced by high-resolution
mass spectrometry (starting with release 14.5). The
search can be accessed from the menu item Search->
Compounds (Figure 5) and allows changing the tolerance
Figure 3. Enrichment analysis of Web Groups objects. A group of
Escherichia coli genes was analyzed for enrichment of the genes in
pathways. The resulting table includes a list of pathways, the P-value
for each pathway and the subgroup of genes from the original group
that participate in each pathway. The table has been modified by
removing some rows that represented pathway classes and super-
pathways, leaving only base pathways.
in ppm increments. The search results are presented in a
table that allows easy linking to compound pages, to
simplify the identification of plausible candidates for
each weight value.
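The matching step behind such a search can be sketched as a comparison of each query mass against a compound table within a tolerance expressed in parts per million. In this sketch the tolerance window is computed relative to the query mass, which is an assumption, and the compound masses are approximate illustrative values rather than data retrieved from a PGDB.

```python
def match_masses(query_masses, compounds, tolerance_ppm=5.0):
    """Return (query mass, compound name, compound mass) for every
    compound whose mass lies within the ppm tolerance of a query.

    compounds: dict of compound name -> monoisotopic mass (Da).
    """
    hits = []
    for query in query_masses:
        window = query * tolerance_ppm / 1e6   # absolute tolerance in Da
        for name, mass in sorted(compounds.items()):
            if abs(mass - query) <= window:
                hits.append((query, name, mass))
    return hits


# Approximate monoisotopic masses, for illustration only.
compounds = {
    "glucose": 180.06339,
    "L-alanine": 89.04768,
    "pyruvic acid": 88.01604,
}

for hit in match_masses([180.0630, 89.0500], compounds, tolerance_ppm=5.0):
    print(hit)
```

At 5 ppm only the first query matches (glucose); widening the tolerance to 30 ppm would also match the second query to L-alanine, which shows how the tolerance setting trades specificity for recall.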
Organism selection by taxonomy
One of the challenges in designing the BioCyc website was
to enable easy selection of a PGDB of interest from the
large number of available databases. Previously the
only selection mechanism was based on the name of
the desired organism. Starting with version 15.5, it is
possible to select a PGDB from BioCyc by browsing the
organism taxonomy (Figure 6). In addition, the new
selector window contains an option to display the
Organism Summary page upon PGDB selection. This
page provides background information about the PGDB
Figure 5. (A) Searching HumanCyc for several monoisotopic molecular weights, with a specified tolerance of 5 ppm. This type of search is useful for
analysis of compounds identified by mass spectroscopy, enabling researchers to find candidate compounds known to exist in the organism, and to
learn about their roles in the metabolic network. (B) The result of the search is a table that includes matching compounds, their monoisotopic mass,
the query mass they match and their chemical formula. The compound name is a hyperlink to the compound’s page, enabling users to quickly learn
about the reactions and pathways in which the compound participates in this organism.
Figure 4. The new Web Cellular Omics Viewer. This figure, showing a Cellular Omics Viewer for the bacterium E. coli, depicts the overlay of a gene
expression data set (39). The level of transcription is indicated by the color of the reactions that are catalyzed by the enzymes encoded by the specific
genes. The legend for mapping colors to data values is not shown in the figure. By hovering the mouse cursor over a compound or a reaction, the
user can create popup windows that provide information and enable navigation to the relevant compound page or to a pathway display.
such as an author list, the source for the sequence, the
number and type of replicons that were used for creating
the PGDB, the taxonomic lineage of the organism and
relevant publications, as well as some statistics about the
content of the database.
Ports to 64-bit Windows and 64-bit Macintosh platforms
We have ported Pathway Tools to the 64-bit Windows and
Macintosh platforms. Henceforth, 32-bit versions of
Pathway Tools will not be available for those platforms.
We have made many improvements to the Pathway Tools
pathway layout algorithms to improve the aesthetics of
pathway layouts. We have changed the color scales used
in the omics viewers to improve them from a human
factors perspective. We added a signaling pathway editor
to Pathway Tools. We have made many performance im-
provements to the web mode of Pathway Tools.
How to learn more about MetaCyc and BioCyc
The BioCyc.org and MetaCyc.org websites provide
several informational resources, including an online
BioCyc guided tour (http://biocyc.org/samples.shtml), a
guide to the BioCyc database collection (http://biocyc.
a guide for MetaCyc, a guide for EcoCyc (http://biocyc.org/ecocyc/EcoCycUserGuide.shtml),
a Pathway/Genome Database Concepts
and many webinar videos that combine narration with
online demonstration of different topics
(http://biocyc.org/webinar.shtml). We routinely host workshops and
tutorials (on site and at conferences) that provide training
and in-depth discussion of our software for beginning and
advanced users. To stay informed about recent changes
and enhancements to our software, join the BioCyc
mailing list at http://biocyc.org/subscribe.shtml. A list of
our publications is available online (http://biocyc.org/
A variety of additional enhancements are planned. We are
currently working on adding reaction atom mappings to
MetaCyc and other PGDBs. Plans include the addition of
many more genomes to BioCyc, including those from the
Human Microbiome Project, and the addition of more
types of data to BioCyc PGDBs, such as predicted GO
terms and protein localizations.
The MetaCyc and BioCyc databases are freely and openly
available to all. See http://biocyc.org/download.shtml for
download information. New versions of the downloadable
data files and of the BioCyc and MetaCyc websites are
released four times per year.
This article was prepared as an account of work sponsored
by an agency of the US Government. Neither the US
Government nor any agency thereof, nor any of their em-
ployees, makes any warranty, express or implied, or
assumes any legal liability or responsibility for the
accuracy, completeness, or usefulness of any information,
apparatus, product, or process disclosed, or represents
that its use would not infringe on privately owned
rights. Reference herein to any specific commercial
product, process or service by trade name, trademark,
manufacturer or otherwise does not necessarily constitute
or imply its endorsement, recommendation or favoring by
the US government or any agency thereof. The views and
opinions of authors expressed herein do not necessarily
state or reflect those of the US Government or any
National Institute of General Medical Sciences of the
GM077678, GM088849 and GM075742); Department of
DE-SC0004878); National Science Foundation (MetaCyc
curation performed by the Plant Metabolic Network,
grants IOS-1026003 and DBI-0640769). Funding for
open access charge: A grant from the National Institute
of General Medical Sciences of the National Institutes of
Conflict of interest statement. None declared.
Figure 6. The new database selector lets the user select a PGDB either
by typing a name of an organism or by browsing the organism
taxonomy. If the ‘Go to Organism Summary page for selected
database’ box at the bottom of the selector window is checked, the
software will display that page upon selection, providing background
information and statistics about that database.
1. Caspi,R., Altman,T., Dale,J.M., Dreher,K., Fulcher,C.A.,
Gilham,F., Kaipa,P., Karthikeyan,A.S., Kothari,A.,
Krummenacker,M. et al. (2010) The MetaCyc database
of metabolic pathways and enzymes and the BioCyc collection of
pathway/genome databases. Nucleic Acids Res., 38, D473–D479.
2. Karp,P.D., Paley,S.M., Krummenacker,M., Latendresse,M.,
Dale,J.M., Lee,T.J., Kaipa,P., Gilham,F., Spaulding,A.,
Popescu,L. et al. (2010) Pathway Tools version 13.0: integrated
software for pathway/genome informatics and systems biology.
Brief Bioinform., 11, 40–79.
3. Dale,J.M., Popescu,L. and Karp,P.D. (2010) Machine learning
methods for metabolic pathway prediction. BMC Bioinformatics,
4. Karp,P.D. and Caspi,R. (2011) A survey of metabolic databases
emphasizing the MetaCyc family. Arch. Toxicol., 85, 1015–1033.
5. Christie,K.R., Weng,S., Balakrishnan,R., Costanzo,M.C.,
Dolinski,K., Dwight,S.S., Engel,S.R., Feierbach,B., Fisk,D.G.,
Hirschman,J.E. et al. (2004) Saccharomyces Genome Database
(SGD) provides tools to identify and analyze sequences from
Saccharomyces cerevisiae and related sequences from other
organisms. Nucleic Acids Res., 32, D311–D314.
6. Mueller,L.A., Zhang,P. and Rhee,S.Y. (2003) AraCyc: a
biochemical pathway database for Arabidopsis. Plant Physiol.,
7. Liang,C., Jaiswal,P., Hebbard,C., Avraham,S., Buckler,E.S.,
Casstevens,T., Hurwitz,B., McCouch,S., Ni,J., Pujar,A. et al.
(2008) Gramene: a growing plant comparative genomics resource.
Nucleic Acids Res., 36, D947–D953.
8. Evsikov,A.V., Dolan,M.E., Genrich,M.P., Patek,E. and Bult,C.J.
(2009) MouseCyc: a curated biochemical pathways database for
the laboratory mouse. Genome Biol., 10, R84.
9. Seo,S. and Lewin,H.A. (2009) Reconstruction of metabolic
pathways for the cattle genome. BMC Syst. Biol., 3, 33.
10. Urbanczyk-Wochniak,E. and Sumner,L.W. (2007) MedicCyc:
a biochemical pathway database for Medicago truncatula.
Bioinformatics, 23, 1418–1423.
11. Zhang,P., Dreher,K., Karthikeyan,A., Chi,A., Pujar,A., Caspi,R.,
Karp,P., Kirkup,V., Latendresse,M., Lee,C. et al. (2010) Creation
of a genome-wide metabolic pathway database for Populus
trichocarpa using a new approach for reconstruction and curation
of metabolic pathways for plants. Plant Physiol., 153, 1479–1491.
12. Fey,P., Gaudet,P., Curk,T., Zupan,B., Just,E.M., Basu,S.,
Merchant,S.N., Bushmanova,Y.A., Shaulsky,G., Kibbe,W.A.
et al. (2009) dictyBase - a Dictyostelium bioinformatics resource
update. Nucleic Acids Res., 37, D515–D519.
13. Doyle,M.A., MacRae,J.I., De Souza,D.P., Saunders,E.C.,
McConville,M.J. and Likic,V.A. (2009) LeishCyc: a biochemical
pathways database for Leishmania major. BMC Syst. Biol., 3, 57.
14. May,P., Christian,J.O., Kempa,S. and Walther,D. (2009)
ChlamyCyc: an integrative systems biology database and
web-portal for Chlamydomonas reinhardtii. BMC Genomics,
15. Bombarely,A., Menda,N., Tecle,I.Y., Buels,R.M., Strickler,S.,
Fischer-York,T., Pujar,A., Leto,J., Gosselin,J. and Mueller,L.A.
(2011) The Sol Genomics Network (solgenomics.net): growing
tomatoes using Perl. Nucleic Acids Res., 39, D1149–D1155.
16. Snyder,E.E., Kampanya,N., Lu,J., Nordberg,E.K., Karur,H.R.,
Shukla,M., Soneja,J., Tian,Y., Xue,T., Yoo,H. et al. (2007)
PATRIC: the VBI PathoSystems Resource Integration Center.
Nucleic Acids Res., 35, D401–D406.
17. Cibis,E., Ryznar-Luty,A., Krzywonos,M., Lutoslawski,K. and
Miskiewicz,T. (2011) Betaine removal during thermo- and
mesophilic aerobic batch biodegradation of beet molasses vinasse:
influence of temperature and pH on the progress and efficiency
of the process. J. Environ. Manage., 92, 1733–1739.
18. Scaria,J., Janvilisri,T., Fubini,S., Gleed,R.D., McDonough,S.P.
and Chang,Y.F. (2011) Clostridium difficile transcriptome analysis
using pig ligated loop model reveals modulation of pathways not
modulated in vitro. J. Infect. Dis., 203, 1613–1620.
19. Brown,S.D., Guss,A.M., Karpinets,T.V., Parks,J.M., Smolin,N.,
Yang,S., Land,M.L., Klingeman,D.M., Bhandiwad,A.,
Rodriguez,M. Jr et al. (2011) Mutant alcohol dehydrogenase
leads to improved ethanol tolerance in Clostridium thermocellum.
Proc. Natl Acad. Sci. U.S.A., 108, 13752–13757.
20. Ruiz,J.C., D’Afonseca,V., Silva,A., Ali,A., Pinto,A.C.,
Santos,A.R., Rocha,A.A., Lopes,D.O., Dorella,F.A.,
Pacheco,L.G. et al. (2011) Evidence for reductive genome
evolution and lateral acquisition of virulence functions in two
Corynebacterium pseudotuberculosis strains. PLoS One, 6, e18551.
21. Giannone,R.J., Huber,H., Karpinets,T., Heimerl,T., Kuper,U.,
Rachel,R., Keller,M., Hettich,R.L. and Podar,M. (2011)
Proteomic characterization of cellular and molecular processes
that enable the Nanoarchaeum equitans-Ignicoccus hospitalis
relationship. PLoS One, 6, e22942.
22. Banerjee,R., Vats,P., Dahale,S., Kasibhatla,S.M. and Joshi,R.
(2011) Comparative genomics of cell envelope components in
mycobacteria. PLoS One, 6, e19280.
23. Lamichhane,G., Freundlich,J.S., Ekins,S., Wickramaratne,N.,
Nolan,S.T. and Bishai,W.R. (2011) Essential metabolites of
Mycobacterium tuberculosis and their mimics. MBio, 2,
24. Landeta,C., Davalos,A., Cevallos,M.A., Geiger,O., Brom,S. and
Romero,D. (2011) Plasmids with a chromosome-like role in
rhizobia. J. Bacteriol., 193, 1317–1326.
25. Aggarwal,S., Karimi,I.A. and Lee,D.Y. (2011) Flux-based analysis
of sulfur metabolism in desulfurizing strains of Rhodococcus
erythropolis. FEMS Microbiol. Lett., 315, 115–121.
26. Holder,J.W., Ulrich,J.C., DeBono,A.C., Godfrey,P.A.,
Desjardins,C.A., Zucker,J., Zeng,Q., Leach,A.L.B., Ghiviriga,I.,
Dancel,C. et al. (2011) Comparative and functional genomics of
Rhodococcus opacus PD630 for biofuels development. PLoS
Genet., 7, e1002219.
27. Baumann,K., Dato,L., Graf,A.B., Frascotti,G., Dragosits,M.,
Porro,D., Mattanovich,D., Ferrer,P. and Branduardi,P. (2011)
The impact of oxygen on the transcriptome of recombinant S.
cerevisiae and P. pastoris - a comparative analysis. BMC
Genomics, 12, 218.
28. Burke,G.R. and Moran,N.A. (2011) Massive genomic decay in
Serratia symbiotica, a recently evolved symbiont of aphids.
Genome Biol. Evol., 3, 195–208.
29. Rodrigues,J.L., Serres,M.H. and Tiedje,J.M. (2011) Large-scale
comparative phenotypic and genomic analyses reveal ecological
preferences of Shewanella species and identify metabolic pathways
conserved at the genus level. Appl. Environ. Microbiol., 77,
30. Kim,H.U., Kim,S.Y., Jeong,H., Kim,T.Y., Kim,J.J., Choy,H.E.,
Yi,K.Y., Rhee,J.H. and Lee,S.Y. (2011) Integrative genome-scale
metabolic analysis of Vibrio vulnificus for drug targeting and
discovery. Mol. Syst. Biol., 7, 460.
31. Li,J. and Wang,N. (2011) Genome-wide mutagenesis of
Xanthomonas axonopodis pv. citri reveals novel genetic
determinants and regulation mechanisms of biofilm formation.
PLoS One, 6, e21804.
32. Li,J. and Wang,N. (2011) The wxacO gene of Xanthomonas citri
ssp. citri encodes a protein with a role in lipopolysaccharide
biosynthesis, biofilm formation, stress tolerance and virulence.
Mol. Plant Pathol., 12, 381–396.
33. Jaenicke,S., Ander,C., Bekel,T., Bisdorf,R., Droge,M.,
Gartemann,K.H., Junemann,S., Kaiser,O., Krause,L., Tille,F.
et al. (2011) Comparative and joint analysis of two metagenomic
datasets from a biogas fermenter obtained by 454-pyrosequencing.
PLoS One, 6, e14519.
34. Karp,P.D., Paley,S. and Romero,P. (2002) The Pathway Tools
software. Bioinformatics, 18(Suppl. 1), S225–S232.
35. McDonald,A.G., Boyce,S. and Tipton,K.F. (2009) ExplorEnz:
the primary source of the IUBMB enzyme list. Nucleic Acids
Res., 37, D593–D597.
36. Sayers,E.W., Barrett,T., Benson,D.A., Bryant,S.H., Canese,K.,
Chetvernin,V., Church,D.M., DiCuccio,M., Edgar,R., Federhen,S.
et al. (2009) Database resources of the National Center for
Biotechnology Information. Nucleic Acids Res., 37, D5–D15.
37. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H.,
Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T.
et al. (2000) Gene ontology: tool for the unification of biology.
The Gene Ontology Consortium. Nat. Genet., 25, 25–29.
38. Latendresse,M. and Karp,P.D. (2011) Web-based metabolic
network visualization with a zooming user interface. BMC
Bioinformatics, 12, 176.
39. Tao,H., Bausch,C., Richmond,C., Blattner,F.R. and Conway,T.
(1999) Functional genomics: expression analysis of Escherichia coli
growing on minimal and rich media. J. Bacteriol., 181,
40. Achterberg,T. (2009) SCIP: solving constraint integer programs.
Math. Program. Comput., 1, 1–41.
41. Yeh,I., Hanekamp,T., Tsoka,S., Karp,P.D. and Altman,R.B.
(2004) Computational analysis of Plasmodium falciparum
metabolism: organizing genomic information to facilitate drug
discovery. Genome Res., 14, 917–924.
42. Latendresse,M. and Karp,P.D. (2010) An advanced
web query interface for biological databases. Database, 2010,