J. McFadden et al. (eds.), Systems Biology of Tuberculosis,
DOI 10.1007/978-1-4614-4966-9_2, © Springer Science+Business Media, LLC 2013
Chapter 2
Software Platform for Metabolic Network Reconstruction of Mycobacterium tuberculosis
Samik Ghosh, Yukiko Matsuoka, Yoshiyuki Asai, Hiroaki Kitano,
Anshu Bhardwaj, Vinod Scaria, Rohit Vashisht, Anup Shah,
Anupam Kumar Mondal, Priti Vishnoi, Kumari Sonal,
Akanksha Jain, Priyanka Priyadarshini, Kausik Bhattacharyya,
Vikas Kumar, Anurag Passi, Pratibha Sharma, and Samir Brahmachari
S. Ghosh
The Systems Biology Institute, Tokyo 108-0071, Japan
Y. Matsuoka
The Systems Biology Institute, Tokyo 108-0071, Japan
ERATO Kawaoka Infection-Induced Host Response Project,
Japan Science and Technology Agency, Tokyo 108-8639, Japan
Y. Asai
Okinawa Institute of Science and Technology, Okinawa 904-0495, Japan
H. Kitano (*)
The Systems Biology Institute, Tokyo 108-0071, Japan
Okinawa Institute of Science and Technology, Okinawa 904-0495, Japan
Sony Computer Science Laboratories, Inc., Tokyo 141-0022, Japan
e-mail: kitano@sbi.jp
A. Bhardwaj, R. Vashisht, A. Jain, P. Priyadarshini, A. Passi
Council of Scientific and Industrial Research,
2 Rafi Marg, Delhi 110001, India
V. Scaria, A. Shah, A. K. Mondal, P. Vishnoi, K. Sonal, K. Bhattacharyya,
V. Kumar, P. Sharma
CSIR-Institute of Genomics and Integrative Biology,
Mall Road, Delhi 110007, India
e-mail: skb@igib.res.in
S. Brahmachari (*)
Council of Scientific and Industrial Research, 2 Rafi Marg, Delhi 110001, India
CSIR-Institute of Genomics and Integrative Biology, Mall Road, Delhi 110007, India
Samik Ghosh and Anshu Bhardwaj are joint first authors.
Tuberculosis (TB) is one of the major infectious diseases still prevailing on this planet. The emergence of drug-resistant strains and problems with the current treatment regimen warrant the need for new drugs for TB. At the same time, economic factors play a significant role, as most patients are in the lowest income bracket of society. This implies that new drugs have to be developed in an innovative manner that allows their delivery at low cost. Drug discovery is, in general, an expensive and capital-intensive process. A new type of big science is emerging that involves the knowledge integration of small sciences as well as coordinated community-based participation. Social dynamics play a critical role in making such projects successful, because open collaboration involves participants with diverse motivations and interests. Thus, proper "social engineering" will play a greater role in scientific project planning and management in the future. Open Source Drug Discovery (OSDD), initiated by the Council of Scientific and Industrial Research (CSIR) of India, is one such project, aiming at the development of drugs for TB. Because drug discovery is a competitive space, bringing in openness and collaboration through an e-community-based approach is a challenging task. This article describes the international collaboration among OSDD, the Systems Biology Institute (SBI: Japan), and the Okinawa Institute of Science and Technology (OIST: Japan) for the reconstruction of a comprehensive and high-precision map of the metabolic network of Mycobacterium tuberculosis (mTB) through a virtual collaborative space. The fact that OSDD involved a large number of non-experts guided by experts further sets it apart from other existing ways of addressing scientific problems of this scale.
1 Issues in Drug Discovery for Tuberculosis
Tuberculosis (TB) is still a major killer in developing countries: 9.4 million new cases were reported globally in 2009, a significant percentage of which were multidrug-resistant TB and some extensively drug-resistant TB [1]. Yet only a handful of drug discovery projects exist, because what patients can afford does not match the investment needed to develop such drugs. Unless cost-effective drug development can be achieved, those who suffer from neglected diseases will have no hope of new treatments and fast recovery. Developing technologies that significantly mitigate these problems is therefore socially significant.
However, we face the reality that, for extensively drug-resistant strains of mTB, an effective cure is generally not available. Even if a drug is developed, the cost of drug discovery may be too high to make it widely available to the poorest segment of the patient population. The cost of developing a drug is high not only because of the high failure rate and the long duration from the discovery stage to approval, but also because the R&D cost of each product is high (an estimated average of $454 million per product [2]).
Considerable effort is needed to significantly improve drug discovery productivity and to deliver drugs at affordable cost. Critical to R&D productivity is a better-established, in-depth understanding of the complexity of biological systems, together with means to predict the potential outcome of candidate compounds when used in cells, model animals, and patients [3, 4]. A proper introduction of the systems biology approach to drug discovery is expected to rectify the situation by providing a better system-level understanding of the biology behind diseases and, ultimately, by enabling the use of precision computational models of cells, organs, and patients.
The introduction of precision modeling may improve the productivity of combinatorial drug design, including the re-purposing of existing drugs. An apparent problem with combinatorial drugs is the huge search space intrinsic to the combinatorial nature of the approach. Brute-force approaches will be too expensive for combinations of more than three components. Enabling the systematic computational identification of possible combinations at an early stage is therefore critical for improving productivity. This is particularly important for addressing urgent medical needs, where potential opportunities and early cases have been reported [5–8]. The National Cancer Institute is carrying out a large-scale systematic study on combinatorial drugs, and initial results seem promising [9]. Many CNS-related diseases and drug-resistant infectious diseases might also require multiple points of intervention [10]. The improved efficiency of a combinatorial drug discovery pipeline, possibly enabled by a precision modeling approach, provides exciting opportunities.
An interesting opportunity in combinatorial drugs is that it may be possible to discover novel combinations of existing drugs for new indications. For example, one study demonstrated that the combined use of chlorpromazine (an antipsychotic agent) and pentamidine (an antiparasitic agent) can be as effective as, or more effective than, paclitaxel for a specific cancer [6]. It is interesting to note that the cost of paclitaxel in Japan is about 44,000 Yen (about 450 USD) for 100 mg, whereas the cost of chlorpromazine is about 9 Yen and that of pentamidine about 2,800 Yen. The combined price is less than 3,000 Yen, which is less than one-tenth of the price of paclitaxel. The point is not whether this specific combination proves effective in the end; the point is that there will be a number of such combinations that can create drugs at significantly lower cost. This opens up tremendous opportunities not only for industrialized countries but also for developing countries, where the cost of drugs is a critical issue. It should be noted that patents on many drugs expire sooner or later, which implies that the options for low-cost combinations will continue to increase in the future [11].
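The price comparison above is simple arithmetic, and the short Python sketch below reproduces it from the quoted figures. It assumes, as the text does, that the listed prices can be compared directly; the text gives the paclitaxel price per 100 mg but states no quantities or dosing for the other two drugs, so this is not a per-treatment cost comparison.

```python
# Approximate prices in Japan as quoted in the text, in Yen.
PACLITAXEL_YEN = 44_000      # per 100 mg
CHLORPROMAZINE_YEN = 9       # quantity not stated in the text
PENTAMIDINE_YEN = 2_800      # quantity not stated in the text


def combination_cost_ratio(parts, reference):
    """Return (total cost of the combination, fraction of the reference cost)."""
    total = sum(parts)
    return total, total / reference


if __name__ == "__main__":
    total, ratio = combination_cost_ratio(
        [CHLORPROMAZINE_YEN, PENTAMIDINE_YEN], PACLITAXEL_YEN)
    print(f"combined: {total:,} Yen ({ratio:.1%} of paclitaxel)")
```

It prints a combined cost of 2,809 Yen, about 6.4% of the quoted paclitaxel price.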
Furthermore, such an approach may open up a new opportunity to reuse compounds that failed in clinical and preclinical trials as components of combinatorial drugs. Compounds that were not as effective as expected as mono-therapy drugs may be re-purposed in the context of a combination. With new criteria for the combinatorial drug context, the entire compound library may have to be revisited.
Although combinatorial drugs are a promising approach, the issue is how to discover effective combinations with practical efficiency. This is potentially a combinatorial explosion, and without an innovative scheme it would result only in a low-efficiency trial-and-error process. Random screening is too inefficient for this approach and may not capture interesting synergistic combinations. An exhaustive screen has been tried for two-component combinations, but it was limited to combinations among 1,200 candidate compounds and is difficult to scale to combinations of multiple components [6]. Such an undertaking would most likely be far beyond the capability of a single pharmaceutical company or publicly funded project. Thus, it is essential that computational approaches able to predict candidate combinations for further study be established at a practical level.
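To make the scale of this combinatorial explosion concrete, the Python sketch below counts unordered k-compound combinations drawn from a library of 1,200 compounds, the library size of the screen cited above. It is a deliberately simplified count under stated assumptions: doses, ratios, and ordering are ignored, so real screens would be larger still.

```python
from math import comb

# Library size of the exhaustive two-component screen cited in the text [6].
N_COMPOUNDS = 1200


def combination_counts(n: int, max_k: int) -> dict:
    """Number of unordered k-compound combinations from n compounds, k = 2..max_k.

    Doses, ratios, and ordering are ignored, so real screens would be larger.
    """
    return {k: comb(n, k) for k in range(2, max_k + 1)}


if __name__ == "__main__":
    for k, count in combination_counts(N_COMPOUNDS, 4).items():
        print(f"{k}-compound combinations: {count:,}")
```

The count grows from 719,400 pairs to 287,280,400 triples and 85,968,659,700 quadruples, which is why brute force becomes impractical beyond two or three components.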
2 Knowledge Integration: A Challenge in Precision Modeling
The challenge of developing precision biological models is that it requires the integration of knowledge and data at all levels, from genomics and proteomics to imaging and physiology. While various data from high-throughput experiments provide genome-wide characteristics, understanding detailed mechanisms has to depend on individual "small sciences." It is infeasible to obtain such knowledge through a single large-scale project, for both financial and sociological reasons. Each researcher is interested in a specific aspect of biology, using the organisms that they think are most suitable for the study. Because biological systems are diverse, the choices are diverse, and researchers make them for good reasons. Even with large-scale funding, it is not practical to force sufficiently large numbers of researchers to put their own systems aside and work on a new species and new biological problems.
At the same time, using existing resources such as pathway databases for modeling is not a solution either. Although these pathway databases are developed mainly through manual curation of publications, this does not mean they are comprehensive or accurate. Since pathway databases have to cover a broad range of pathways, each pathway is curated and represented with limited coverage, depth, and accuracy. The current "Gold Standard" is manually curated models carefully built from the literature and various data resources by a small group of people who spend months on the same pathway, to the extent that they acquire in-depth insights into it [12]. This is what we call "deep curation," as exemplified by a series of comprehensive molecular interaction maps [13, 14]. However, the deep curation of large-scale network maps from the literature is extremely labor-intensive and stressful work. It is also very difficult to maintain the motivation to continuously update maps and models to keep up with new discoveries over many years. Automated literature mining has been extensively investigated, but it is nowhere near the stage where it can replace human curators. At the same time, quality control depends on the individual groups, and this centralization can slow the updating and fixing of errors. How we solve this problem will affect the productivity and practicality of computational approaches to drug discovery for a wider range of targets.
3 Needs for Virtual Big Science
Scientific projects with large funding to achieve a defined mission are often called Big Science. Successful big science projects need clearly defined goals, possible means to achieve them, and strong social justification to support such endeavors through public funding. At the same time, what type of project can be supported widely beyond the scientific community depends on the social needs of the time. While most biology has been, and continues to be, small science, the Human Genome Project and a series of large-scale genome projects can be considered big science in biology. A typical characteristic of big science is large-scale engineering in support of a specific scientific discovery. Big science is not a new phenomenon. The legendary Manhattan Project, and more recently the Large Hadron Collider and the Human Genome Project, connected experts by bringing them together physically. These projects are essentially equipment-driven data acquisition projects, and such projects will continue to provide new findings as measurement equipment improves. There is a desire to obtain a comprehensive understanding of specific cellular systems and biological processes using high-throughput measurements, so that a comprehensive picture of biological systems can be observed from a specific aspect. The emergence of systems biology as mainstream biology is accelerating this tendency, as it often requires the measurement and analysis of various large-scale and multi-faceted data. At the same time, the reality is that new knowledge critical for in-depth and precise understanding is often derived from small science. This means that a new type of big science is needed that consolidates data and knowledge not only from large-scale projects but also from discoveries in small science. Thus, it is inevitable that a "virtual big science" will form, connecting large numbers of researchers around the globe to attain large-scale knowledge integration in an emergent manner. The implication is that such an initiative needs widely acceptable objectives, leadership, and proper sociological design to make it sustainable.
Some of the more recent genome annotation jamborees have also followed a similar approach. However, the growth of the Internet brought a paradigm shift in implementing large collaborative projects. Galaxy Zoo (http://www.galaxyzoo.org/) is one of the pioneers in using the Internet, launching the first and largest-ever citizen-science experiment, in which non-experts classify galaxies. This collaborative approach has contributed greatly to the outcome of the project. Translating this concept to IPR-intensive areas to solve challenging scientific problems seems to be a promising path towards speedy and affordable solutions. The OSDD project of the Council of Scientific and Industrial Research (CSIR), India, has taken a similar approach to solving challenges in the drug discovery process [15]. Similar experiments, involving a larger e-community, are being carried out on challenging problems such as protein folding (http://www.ncbi.nlm.nih.gov/pubmed/20686574). However, drug discovery is a very competitive space, which makes it even more challenging to open it to global participation through virtual communities. The following discussion elaborates on the process and the framework designed to achieve these goals.
4 Open Source Drug Discovery for Tuberculosis
Drug discovery for neglected diseases can be a successful emergent collaborative project. It serves a good cause, is socially appealing, needs collective effort, and gives participants a sense of pride in their contributions. Towards this end, the OSDD project was initiated by CSIR of the Indian government [16]. OSDD targets drug discovery for tuberculosis through open collaboration, recognizing that confidentiality and IP rights slow down the drug discovery process and make it extremely expensive, owing to the large-scale failure of discovery efforts on the path from hit to lead to preclinical and clinical trials. The OSDD project was conceived and designed to make drug discovery cost-effective through distributed co-creation in an open source mode. As one of its many projects, OSDD launched the Connect to Decode project (http://c2d.osdd.net) as an open call to comprehensively re-annotate the genome of M. tuberculosis to facilitate its systems-level understanding. A large number of students and researchers, from across India as well as overseas, registered for the project and contributed on a voluntary basis. Over 830 researchers and students participated in five components of the project. More than 400 students curated the literature (>10,000 publications) for the Mtb genome across five components, namely, Interactome/Pathway Annotation (IPW), TB Gene Ontology (TBGO), Glycomics, Structural/fold annotation, and Immunome.
One of these components, IPW, was designed to achieve two goals: first, to create a protein–protein functional interaction network, and second, to reconstruct the comprehensive and detailed metabolic map of Mtb discussed here. Owing to the nature of the project, the Indian students involved were highly passionate and motivated to contribute to the common goal. The desire to learn and excel in a resource-limited setting further fueled their motivation. Unlike past efforts at collective pathway reconstruction and curation, such as the yeast metabolic map, any volunteer researcher or student could join the effort. Thus, this project has a distinctive open-endedness in both the quantity and quality of participation. This is a novel model in which motivation, as opposed to incentive, was the key, and it sets an example of a novel social engineering model for involving large communities in solving challenging problems. Nevertheless, the best contributors from all five components were shortlisted based on their contributions and were awarded a netbook (India 800 foundation). Thus, the OSDD project has become an emotional enterprise rather than a professional enterprise.
This project satisfies some of the criteria for successful emergent projects, such as a clear and appealing goal, motivated participants, financial support, and willingness to address a global issue. The distributed and collective genome re-annotation was indeed a social experiment. This experiment showed that large-scale distributed reconstruction of biological networks is possible with a proper software platform, a well-defined workflow, and project management, when the objectives of the project are designed to motivate potential participants (Fig. 2.1).
5 OSDD Mtb Metabolome Challenge
The OSDD Mtb Metabolome Challenge involved manually mining the literature on Mtb research, specifically on the metabolic reactions involved, towards understanding the function of enzymatic proteins at a global scale. While similar efforts have been reported in the past, the OSDD Mtb Metabolome Challenge engendered a unique open source, community-based collaborative platform involving researchers and student volunteers from across India and worldwide, who worked to mine the knowledge buried in experimental studies and to systematically unlock relevant information for each enzyme and metabolite. The process involved well-defined protocols and work packages for knowledge acquisition, representation, and peer review by students under the active guidance of experts in the field. The overall roadmap of the challenge is shown schematically in Fig. 2.2.
While the collection of individual metabolic reactions and enzymes constitutes the knowledge aggregation phase of map reconstruction, their integration in a global context is the key to understanding network dynamics at a systems level. To this end, the Metabolome Challenge project, in collaboration with The Systems Biology Institute, Japan, and the Okinawa Institute of Science and Technology (OIST), Japan (http://www.oist.jp), employed a systems biology computational platform to reconstruct a large-scale, standards-compliant metabolic map.
The integrated platform, as outlined in Fig. 2.3, involved two key computational tools. The first, CellDesigner (developed by Kitano's group with JST, NEDO, and MEXT funding), is a graphical molecular network editing and analysis software suite that complies with the Systems Biology Markup Language (SBML) and Systems Biology Graphical Notation (SBGN) standards. CellDesigner has been successfully used to create large-scale molecular interaction maps based on literature curation (http://www.celldesigner.org/models.html).
The need to extend the computational tools to an online, community collaboration paradigm motivated the use of the second tool, Payao, developed by Kitano's group and OIST. Payao is a web-based biological pathway sharing and tagging service (http://www.payaologue.org). It aims to provide a Google Maps (http://www.maps.google.com) equivalent for biological pathways, wherein researchers can share large-scale, curated, and annotated network maps built with software such as CellDesigner and publish them to the community online. Both tools employ systems biology standards for knowledge representation and exchange, namely, SBML (http://sbml.org), a set of standards developed to facilitate the effective and efficient sharing of models defined as sets of biochemical reactions, and SBGN (http://sbgn.org), a common graphical representation standard in the life sciences.
Fig. 2.1 A scene from the Connect2Decode onsite session in New Delhi, India. Over 200 participants get together for a week for the final assembly of the metabolic map. Each student is given a laptop and hands-on training in assembling the metabolic reactions into pathways using CellDesigner (http://c2d.osdd.net; http://osdd-c2d.blogspot.com/). The detailed workflow is shown in Figure 2.1
Fig. 2.2 The Mtb Metabolome Challenge (http://c2d.osdd.net)
Fig. 2.3 Integrated map curation platform deployed for the Mtb Metabolome Challenge
Based on the integrated curation platform, a systematic workflow was employed, in close global partnership between the India and Japan teams, to collect, mine, construct, annotate, and publish the metabolic map, as outlined in Fig. 2.4.
A project-wide distributed reconstruction was tested using the CellDesigner network editing software [18], which was provided by the Systems Biology Institute, Tokyo, Japan, as part of an agreement with CSIR.
Participants used Google Documents to collect and curate data on the metabolic reactions of Mtb, following standard operating protocols and manuals, in a data format amenable to programmatic processing. These data were then converted into SBML files using the Connect to Decode plug-in developed by SBI. Separate SBML files were drawn for each pathway and then merged in CellDesigner to generate the complete map; the interactions were thus merged and re-laid out in CellDesigner. Multiple iterations of pathway construction and integration took place. After a few months of distributed curation sessions, everyone from the Indian side and the Tokyo side got together in Delhi for a week for the final assembly of the entire network. The final draft network covers the TB metabolic network with around 1,394 genes in 13 meta-pathways. The data curated at each level are shared with the scientific community through the OSDD portal.
Fig. 2.4 Distributed pathway reconstruction workflow. Republished with permission from Nature Publishing Group: Kitano, H., Ghosh, S., and Matsuoka, Y., Social engineering for virtual "big science" in systems biology, Nature Chemical Biology, 7 (6) 323–326, 2011 [17]
6 Software Platform for Open Collaborative Network Reconstruction
One of the main challenges in biomedical research is the vast quantity of data and scattered pieces of knowledge that all have to be integrated to make sense and be useful. It is not possible for a human to extract useful knowledge or integrate it coherently without systematic aid from computational tools. Thus, computational tools are critically important in systems biology.
Software platforms have transformed industries such as aviation, movies and entertainment, and electronics by drastically improving productivity and by offering new capabilities. The biological sciences are no different. In particular, the success of systems biology and its application areas, such as systems drug design, depends on sophisticated data handling, modeling, integrated computational analysis, and knowledge integration.
A cornerstone of open collaborative biomedical research is the development of sophisticated computational tools and services. In particular, instead of stand-alone and disparate components, software needs to be integrated into an end-to-end platform architecture that leverages different databases, experimental data, and knowledge generated at multiple scales of research. As outlined in Fig. 2.5, a systems biology platform needs to build on community-driven standards and pathway and network modeling, together with a community collaborative platform that empowers social engineering in virtual big science projects.
Fig. 2.5 Software platforms for systems biology
While several reviews have focused on standardization and software platforms to drive large-scale systems biology research, in this article we delve into community collaborative platforms, particularly in the context of open innovation for drug discovery in neglected areas such as tuberculosis.
As explained earlier, creating an extensive model of gene-regulatory and biochemical networks with the latest data is a painstaking task. Curation is essential to create an accurate model. Yet as science and technology advance rapidly, curated models soon become out of date and need to be revised constantly. Many pathways and networks are now available online via pathway databases such as Reactome, BioModels.net, and Panther Pathways, and many pathway editors are available [12]. What is needed is a framework with tracking and update mechanisms that allow modelers and researchers in the community to contribute to the collaborative model building and curation process.
WikiPathways [19] is an effort to build such a collaborative platform in the Wiki style. While the Wiki system has its strengths in collaborative editing and version tracking, it does not provide access control or explicit community tagging mechanisms. In a community-driven model enrichment environment, it is effective to grant differentiated privileges to special interest group (SIG) members for curation activities: commenting on existing tags, adding tags to models, annotating individual components inside a model, and validating the annotations. In view of the complexity of biological pathways and the expertise of biologists in different areas, a community platform for biology requires an exquisite balance between federated resource sharing and quality control of information by a SIG of experts in the particular pathway or process. An access-control privilege system allows the community to share and disseminate knowledge, while enabling a dedicated SIG to maintain high-quality, curated information.
Payao, developed jointly by The Systems Biology Institute, Tokyo, Japan, and OIST, Japan, is a community-driven molecular pathway curation framework. The system is named after a fish aggregating device, an artificial floating raft around which fish congregate, popular in the Okinawa/Philippine area. Payao aims to become a biological knowledge aggregating system that enables a community to work on the same models simultaneously, insert tags as pop-up balloons on parts of the model, exchange comments, record discussions, and eventually update the models accurately and concurrently.
Payao serves the enrichment phase of curation. It is a web-based platform providing an interface for adding tags and comments to the components of a model (such as Species, Reactions, and specified areas), as well as community management functionality. Information on users and tag data is stored in a relational database on the server.
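The text states only that user and tag data live in a relational database; the Python sketch below shows one plausible shape for such storage, using SQLite so it runs self-contained. The schema, table and column names, and sample data are hypothetical, not Payao's actual design, and the text does not say which database engine Payao uses.

```python
import sqlite3

# Hypothetical schema; the text states only that user and tag data are kept
# in a relational database on the server.
SCHEMA = """
CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE models (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                     owner_id INTEGER NOT NULL REFERENCES users(id));
CREATE TABLE tags (
    id          INTEGER PRIMARY KEY,
    model_id    INTEGER NOT NULL REFERENCES models(id),
    author_id   INTEGER NOT NULL REFERENCES users(id),
    target_kind TEXT NOT NULL CHECK (target_kind IN ('species', 'reaction', 'area')),
    target_ref  TEXT NOT NULL,  -- SBML id, or an encoded region for an area
    content     TEXT NOT NULL
);
CREATE TABLE comments (
    id        INTEGER PRIMARY KEY,
    tag_id    INTEGER NOT NULL REFERENCES tags(id),
    author_id INTEGER NOT NULL REFERENCES users(id),
    body      TEXT NOT NULL
);
"""


def demo_store():
    """Build an in-memory store holding one tagged reaction with one comment."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                     [(1, "curator_a"), (2, "curator_b")])
    conn.execute("INSERT INTO models (id, name, owner_id) VALUES (1, 'mtb_draft_map', 1)")
    conn.execute(
        "INSERT INTO tags (id, model_id, author_id, target_kind, target_ref, content) "
        "VALUES (1, 1, 2, 'reaction', 'R_PGI', 'needs Mtb-specific evidence')")
    conn.execute("INSERT INTO comments (tag_id, author_id, body) VALUES (1, 1, 'Agreed')")
    return conn


def tags_for(conn, model_id, target_ref):
    """Return (tag content, comment count) pairs for one model component."""
    return conn.execute(
        """SELECT t.content, COUNT(c.id)
           FROM tags AS t LEFT JOIN comments AS c ON c.tag_id = t.id
           WHERE t.model_id = ? AND t.target_ref = ?
           GROUP BY t.id
           ORDER BY t.id""",
        (model_id, target_ref),
    ).fetchall()


if __name__ == "__main__":
    print(tags_for(demo_store(), 1, "R_PGI"))
```

Keeping tags in their own table, keyed by model and component, lets discussion accumulate without modifying the SBML model itself until the curators agree on an update.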
Payao adopts community standards: it accepts models in SBML [20] format and displays them in the SBGN [21]-compliant CellDesigner [18] graphical notation. Because Payao accepts pathway models stored in SBML format and uses CellDesigner APIs (Application Programming Interfaces) for visualization, the most suitable SBML editor for Payao is CellDesigner. In SBML format, models can capture detailed descriptions of biochemical processes, not only protein–protein interactions. Adopting the SBML format enables the models to be easily used as the basis for computational data analysis or simulation of dynamic behaviors. The Payao platform enriches the model curation process by providing a host of features for user management, tagging, and model updates.
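As an illustration of why SBML models lend themselves to computational analysis, the Python sketch below reads a toy SBML Level 2 document with the standard library and builds its stoichiometric matrix (rows are species, columns are reactions), a common starting point for constraint-based analysis of metabolic networks. The toy model and function names are hypothetical; real models, including those produced with CellDesigner, carry layout and annotation data that this sketch ignores.

```python
import xml.etree.ElementTree as ET

NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}

# A hypothetical two-step pathway A -> 2 B -> C in SBML Level 2 Version 4.
TOY_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy">
    <listOfCompartments><compartment id="c"/></listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="c"/>
      <species id="B" compartment="c"/>
      <species id="C" compartment="c"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R1" reversible="false">
        <listOfReactants><speciesReference species="A"/></listOfReactants>
        <listOfProducts><speciesReference species="B" stoichiometry="2"/></listOfProducts>
      </reaction>
      <reaction id="R2" reversible="false">
        <listOfReactants><speciesReference species="B" stoichiometry="2"/></listOfReactants>
        <listOfProducts><speciesReference species="C"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def stoichiometric_matrix(sbml_text):
    """Return (species ids, reaction ids, matrix) with rows = species, cols = reactions."""
    model = ET.fromstring(sbml_text).find("sbml:model", NS)
    species = [s.get("id") for s in model.iterfind("sbml:listOfSpecies/sbml:species", NS)]
    reactions = list(model.iterfind("sbml:listOfReactions/sbml:reaction", NS))
    row = {sid: i for i, sid in enumerate(species)}
    matrix = [[0.0] * len(reactions) for _ in species]
    for j, reaction in enumerate(reactions):
        for list_tag, sign in (("listOfReactants", -1.0), ("listOfProducts", 1.0)):
            for ref in reaction.iterfind(f"sbml:{list_tag}/sbml:speciesReference", NS):
                # In SBML Level 2 the stoichiometry attribute defaults to 1.
                coeff = float(ref.get("stoichiometry", "1"))
                matrix[row[ref.get("species")]][j] += sign * coeff
    return species, [r.get("id") for r in reactions], matrix


def net_production(matrix, fluxes):
    """Net rate of change of each species, S·v, for a given flux vector v."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in matrix]


if __name__ == "__main__":
    species, reactions, S = stoichiometric_matrix(TOY_SBML)
    print(species, reactions, S)
    print(net_production(S, [1.0, 1.0]))
```

For the flux vector (1, 1), the net production of the intermediate B is zero, so it is at steady state, while A is consumed and C is produced.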
Forming a community is an important step in curation. Different expert groups can contribute a variety of information to the model. Because the web-based Payao is accessible from any physical location, it enables experts across the world to communicate in a collaborative curation effort.
A community is formed around a pathway model. It is the model owner who sets access control over the registered model. In the Payao system, access controls can be set by specifying privileges for individuals as well as for user categories, such as guest, login user, and model user (users invited to access the model by the model owner). This enables a user to stage the curation process: initiate curation within a small group (e.g., a SIG), then switch the access control of the model to allow public viewing.
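A minimal Python sketch of the access model described above follows. It assumes a simple scheme in which a user's effective privileges are the union of the grants to every category they belong to plus any individual grant, and the owner holds all privileges. The privilege names mirror the curation activities listed earlier (commenting, tagging, annotating, validating); the class and function names and the union rule itself are assumptions, not Payao's documented behavior.

```python
from dataclasses import dataclass, field
from enum import Enum, Flag
from typing import Optional


class Privilege(Flag):
    """Hypothetical privileges, mirroring the curation activities listed above."""
    NONE = 0
    VIEW = 1
    COMMENT = 2    # comment on existing tags
    TAG = 4        # add tags to the model
    ANNOTATE = 8   # annotate individual components
    VALIDATE = 16  # validate annotations
    ALL = VIEW | COMMENT | TAG | ANNOTATE | VALIDATE


class Category(Enum):
    GUEST = "guest"            # anyone, including anonymous visitors
    LOGIN_USER = "login user"  # any logged-in user
    MODEL_USER = "model user"  # invited to this model by its owner


@dataclass
class ModelAcl:
    owner: str
    invited: set = field(default_factory=set)
    category_grants: dict = field(default_factory=dict)  # Category -> Privilege
    user_grants: dict = field(default_factory=dict)      # user name -> Privilege

    def categories_of(self, user: Optional[str]) -> set:
        """Categories a user belongs to for this model (None = anonymous)."""
        cats = {Category.GUEST}
        if user is not None:
            cats.add(Category.LOGIN_USER)
            if user in self.invited:
                cats.add(Category.MODEL_USER)
        return cats

    def privileges(self, user: Optional[str]) -> Privilege:
        """Union of category grants and individual grants; owners get everything."""
        if user is not None and user == self.owner:
            return Privilege.ALL
        granted = Privilege.NONE
        for cat in self.categories_of(user):
            granted |= self.category_grants.get(cat, Privilege.NONE)
        if user is not None:
            granted |= self.user_grants.get(user, Privilege.NONE)
        return granted

    def allows(self, user: Optional[str], wanted: Privilege) -> bool:
        return wanted in self.privileges(user)


def sig_stage_acl(public: bool = False) -> ModelAcl:
    """Staged curation: a SIG curates first, then the owner opens public viewing."""
    acl = ModelAcl(owner="sig_lead", invited={"curator_a"})
    acl.category_grants[Category.MODEL_USER] = (
        Privilege.VIEW | Privilege.COMMENT | Privilege.TAG | Privilege.ANNOTATE)
    if public:
        acl.category_grants[Category.GUEST] = Privilege.VIEW
    return acl


if __name__ == "__main__":
    print(sig_stage_acl().privileges("curator_a"))
    print(sig_stage_acl(public=True).allows(None, Privilege.VIEW))
```

Here `sig_stage_acl()` models the first stage, in which only invited SIG members can view and tag, and `sig_stage_acl(public=True)` adds view-only access for guests while tagging stays restricted to the SIG.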
Tagging on the visually represented pathways is a characteristic feature of Payao, which makes it easy for curators to grasp the nature of the pathway while discussing specific components. As in Google Maps, tags are displayed as bubbles attached to items (Species, Reactions, or any specified area), which can be clicked to expand and display the information in the tag. Tags can be specific keywords, links, PubMed IDs, or free text.
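The four kinds of tag content can be told apart with simple rules; the Python sketch below is one hypothetical way to do so, for example to render PubMed IDs and URLs as clickable links in a tag bubble. The regular expression, the kind labels, and the `Tag` fields are assumptions for illustration only; the text does not say how Payao stores or distinguishes tag types.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical rule: "PMID: 12345", "pmid:12345", or a bare 1-8 digit number.
PMID_PATTERN = re.compile(r"^(?:PMID:\s*)?(\d{1,8})$", re.IGNORECASE)


@dataclass(frozen=True)
class Tag:
    target: str  # SBML id of a species or reaction, or a label for an area
    kind: str    # one of "pubmed", "link", "keyword", "text"
    value: str


def make_tag(target: str, content: str) -> Tag:
    """Classify raw tag content into one of the four kinds named in the text."""
    content = content.strip()
    match = PMID_PATTERN.match(content)
    if match:
        return Tag(target, "pubmed", match.group(1))
    parsed = urlparse(content)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return Tag(target, "link", content)
    if content and not any(ch.isspace() for ch in content):
        return Tag(target, "keyword", content.lower())
    return Tag(target, "text", content)


def pubmed_url(tag: Tag) -> str:
    """Link target for a PubMed tag, e.g. for a clickable bubble."""
    if tag.kind != "pubmed":
        raise ValueError("not a PubMed tag")
    return f"https://pubmed.ncbi.nlm.nih.gov/{tag.value}/"


if __name__ == "__main__":
    for raw in ["PMID: 20686574", "http://c2d.osdd.net", "Glycolysis",
                "needs Mtb-specific evidence"]:
        print(make_tag("R_PGI", raw))
```

The rules are heuristic: a short bare number is always read as a PubMed ID, so a deployed system would more likely let the curator pick the tag kind explicitly.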
The Mtb metabolic map, curated using Google Docs and merged using CellDesigner, was converted into SBML and is being made available through the Payao platform for wider community access and tagging.
7 Sustainability of Large-Scale Interaction Map Development
The challenge remains how to design and build a platform that enables continuous updating of the system with new data, followed by quality checks, towards better annotation and data analysis.
We should look carefully at the reasons why projects like Wikipedia and Linux took
off and keep flying. In the case of Linux, there was a hacker culture that supported
open source and the sharing of knowledge, as embodied by Richard Stallman's Free
Software Foundation, in which contributing to the community at large was a point of
pride for hackers. At the same time, there was a practical need to develop open-source
operating systems as an alternative to closed commercial systems such as Microsoft
Windows. Among the several efforts toward open-source operating systems, such as
FreeBSD, Linux survived mainly because it hit the right moment and had more
applications and publications than other initiatives. Thus, if it had not been Linux,
FreeBSD or another initiative could have filled this space. Wikipedia essentially
inherited a similar culture. A widely shared and exciting goal, together with a sense
of participation, has been the key factor driving these community-based initiatives.
Whether these motivations are sustainable over time remains to be seen, but they
were indeed effective in getting things started and bringing them to maturity.
The biological community, too, is embracing the Web 2.0 concept, as it is increasingly
recognized that biological problems are too complex to be solved and translated into
public good by any single individual or organization. A formalized recognition system,
which may include micro-impact factors or micro-attribution for contributions, may be
a very effective way of encouraging more participation, as done in Sysborg2.0, the
OSDD collaborative platform [12]. Such an index or credit should also become part of
the merit system, as is the case with the citation index, a widely used measure of
scientific contribution. For micro-attribution to be accepted in the merit system, it
has to acquire universality as a currency of scientific evaluation. Simply asking for
contributions and assigning micro-attributions assumes that people are motivated by
individual benefits. That assumption is generally true, but individual benefit alone is
too weak a motivation to make a project successful. In the most successful projects,
people are driven by vision, passion, and dedication aligned with their individual
aspirations for the future. In OSDD there is a serious attempt to address these issues
by giving due credit to significant contributions through Sysborg2.0, including by
listing contributors as authors in publications.
We have followed this approach in various publications that originated from
community-driven projects [13, 22]. However, community building raises critical
issues, including reaching a critical mass of active members to ensure sustainability
and establishing community-driven feedback loops. The OSDD Community currently
stands at more than 5,000 registered users from more than 130 countries, indicating
the development of a self-sustaining group. At any given time, more than 10 % of the
participants are actively involved.
8 Can We Scale-Up?
A question related to the sustainability of open collaboration is the issue of scale. The same
skepticism about motivation and culture may apply. However, if the project frames its
mission in a socially appealing and attractive way, broader participation may be
expected. Given the social significance of finding an effective cure for drug-resistant
TB, the project may attract people who are willing to contribute even without personal
or professional benefit, simply because they take pride in being part of it.
Such collaborations are now possible owing to the development of various standards
and of software that complies with them. Standards such as SBML [20], SBGN [21],
and BioPAX [22], and tools and platforms such as TBrowse [23] and Sysborg2.0 [15],
ensure a certain level of interoperability. However, technology alone cannot make
things work, particularly when projects must involve large numbers of interested
parties with varying motivations, career aspirations, and opinions. The OSDD
Community is a large group of researchers and students with heterogeneous expertise
and interests. The OSDD portal provides
a common platform for facilitating interactions among the members, which enables
identification of complementary skills and interests for fruitful collaborations and
quick outcomes. The successful implementation of the Connect to Decode Phase I
project has led to the launch of its second phase (http://c2d.osdd.net), in which
models for predicting anti-tuberculosis activity have already been published [24].
Thus, broader social considerations can be a major "key to success" when launching
increasingly complex projects [25]. Social engineering will be recognized as an
indispensable part of research activity in the coming years for large-scale and complex
big science, because it is people who do science, not technology or machines.
Acknowledgments The Indian team is fully supported by CSIR/OSDD, India. The Japanese
team is, in part, supported by funding from the HD-Physiology Project of the Japan Society for the
Promotion of Science (JSPS) to the Okinawa Institute of Science and Technology (OIST), and the
International Strategic Collaborative Research Program (BBSRC-JST) of the Japan Science and
Technology Agency (JST), the Exploratory Research for Advanced Technology (ERATO)
programme of JST to the Systems Biology Institute (SBI).
References
1. Donald PR, van Helden PD (2009) The global burden of tuberculosis—combating drug
resistance in difficult times. N Engl J Med 360:2393–2395
2. PricewaterhouseCoopers (2007b) Pharma 2020: Virtual R&D—which path will you take?
3. FDA (2004) Challenge and opportunity on the critical path to new medical products,
http://www.fda.gov/ScienceResearch/SpecialTopics/CriticalPathInitiative/CriticalPathOpportunitiesReports/ucm077262.htm
4. PricewaterhouseCoopers (2007a) Pharma 2020: the vision—which path will you take?
5. Apsel B, Blair JA, Gonzalez B, Nazif TM, Feldman ME, Aizenstein B, Hoffman R, Williams
RL, Shokat KM, Knight ZA (2008) Targeted polypharmacology: discovery of dual inhibitors
of tyrosine and phosphoinositide kinases. Nat Chem Biol 4:691–699
6. Borisy AA, Elliott PJ, Hurst NW, Lee MS, Lehar J, Price ER, Serbedzija G, Zimmermann GR,
Foley MA, Stockwell BR et al (2003) Systematic discovery of multicomponent therapeutics.
Proc Natl Acad Sci USA 100:7977–7982
7. Hopkins AL (2008) Network pharmacology: the next paradigm in drug discovery. Nat Chem
Biol 4:682–690
8. Kitano H (2007) A robustness-based approach to systems-oriented drug design. Nat Rev Drug
Discov 6:202–210
9. Kummar S, Chen HX, Wright J, Holbeck S, Millin MD, Tomaszewski J, Zweibel J, Collins J,
Doroshow JH (2010) Utilizing targeted cancer therapeutic agents in combination: novel
approaches and urgent requirements. Nat Rev Drug Discov 9:843–856
10. Lim J, Hao T, Shaw C, Patel AJ, Szabo G, Rual JF, Fisk CJ, Li N, Smolyar A, Hill DE et al
(2006) A protein-protein interaction network for human inherited ataxias and disorders of
Purkinje cell degeneration. Cell 125:801–814
11. Taneja B, Yadav J, Chakraborty TK, Brahmachari SK (2009) An Indian effort towards
affordable drugs: “generic to designer drugs”. Biotechnol J 4:348–360
12. Bauer-Mehren A, Furlong LI, Sanz F (2009) Pathway databases and tools for their exploitation:
benefits, current limitations and challenges. Mol Syst Biol 5:290
13. Caron E, Ghosh S, Matsuoka Y, Ashton-Beaucage D, Therrien M, Lemieux S, Perreault C,
Roux PP, Kitano H (2010) A comprehensive map of the mTOR signaling network. Mol Syst
Biol 6:453
14. Oda K, Kitano H (2006) A comprehensive map of the toll-like receptor signaling network. Mol
Syst Biol 2:2006.0015
15. Bhardwaj A, Scaria V, Raghava GP, Lynn AM, Chandra N, Banerjee S, Raghunandanan MV,
Pandey V, Taneja B, Yadav J et al (2011) Open source drug discovery—a new paradigm of
collaborative research in tuberculosis drug development. Tuberculosis (Edinb) 91:479–486
16. Singh S (2008) India takes an open source approach to drug discovery. Cell 133:201–203
17. Kitano H, Ghosh S, Matsuoka Y (2011) Social engineering for virtual ‘big science’ in systems
biology. Nat Chem Biol 7:323–326
18. Funahashi A, Matsuoka Y, Jouraku A, Morohashi M, Kikuchi N, Kitano H (2008) CellDesigner
3.5: a versatile modeling tool for biochemical networks. Proc IEEE 96:1254–1265
19. Pico AR, Kelder T, van Iersel MP, Hanspers K, Conklin BR et al (2008) WikiPathways:
pathway editing for the people. PLoS Biol 6(7):e184. doi:10.1371/journal.pbio.0060184
20. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray
D, Cornish-Bowden A et al (2003) The systems biology markup language (SBML): a medium
for representation and exchange of biochemical network models. Bioinformatics 19:524–531
21. Le Novère N, Hucka M, Mi H, Moodie S, Schreiber F, Sorokin A, Demir E, Wegner K, Aladjem
MI, Wimalaratne SM et al (2009) The Systems Biology Graphical Notation. Nat Biotechnol
27:735–741
22. Demir E, Cary MP, Paley S, Fukuda K, Lemer C, Vastrik I, Wu G, D’Eustachio P, Schaefer C,
Luciano J et al (2010) The BioPAX community standard for pathway data sharing. Nat
Biotechnol 28:935–942
23. Bhardwaj A, Bhartiya D, Kumar N, Scaria V (2009) TBrowse: an integrative genomics map of
Mycobacterium tuberculosis. Tuberculosis (Edinb) 89:386–387
24. Periwal V, Rajappan JK, Jaleel AU, Scaria V (2011) Predictive models for anti-tubercular
molecules using machine learning on high-throughput biological screening datasets. BMC Res
Notes 4:504
25. Hill C (2007) The post-scientific society. Issues in Science and Technology, Fall:78–84