BioPAX – A community standard for pathway data sharing

Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, New York, USA.
Nature Biotechnology (Impact Factor: 41.51). 09/2010; 28(9):935-42. DOI: 10.1038/nbt1210-1308c
Source: PubMed


Biological Pathway Exchange (BioPAX) is a standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data. The rapid growth of the volume of pathway data has spurred the development of databases and computational tools to aid interpretation; however, use of these data is hampered by the current fragmentation of pathway information across many databases with incompatible formats. BioPAX, which was created through a community process, solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery.
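To make the abstract concrete, the following is a minimal sketch of what a BioPAX record can look like and how it might be read. It assumes BioPAX Level 3 serialized as RDF/XML, and uses a hand-written, deliberately simplified fragment (one reaction, with ATP and ADP omitted). The function name `summarize` and the identifiers `glucose`, `g6p`, `hexokinase`, `rxn1` and `cat1` are hypothetical and chosen only for illustration. Only the Python standard library is used; a real application would more likely rely on a dedicated BioPAX library.

```python
import xml.etree.ElementTree as ET

# Namespaces used by RDF/XML and by BioPAX Level 3.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://www.biopax.org/release/biopax-level3.owl#"

# Hand-written, simplified fragment: hexokinase catalyzes glucose -> glucose 6-phosphate.
BIOPAX_FRAGMENT = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:SmallMolecule rdf:ID="glucose">
    <bp:displayName>glucose</bp:displayName>
  </bp:SmallMolecule>
  <bp:SmallMolecule rdf:ID="g6p">
    <bp:displayName>glucose 6-phosphate</bp:displayName>
  </bp:SmallMolecule>
  <bp:Protein rdf:ID="hexokinase">
    <bp:displayName>hexokinase</bp:displayName>
  </bp:Protein>
  <bp:BiochemicalReaction rdf:ID="rxn1">
    <bp:left rdf:resource="#glucose"/>
    <bp:right rdf:resource="#g6p"/>
  </bp:BiochemicalReaction>
  <bp:Catalysis rdf:ID="cat1">
    <bp:controller rdf:resource="#hexokinase"/>
    <bp:controlled rdf:resource="#rxn1"/>
  </bp:Catalysis>
</rdf:RDF>
"""


def summarize(fragment):
    """Return {reaction_ref: {"left": [...], "right": [...], "catalysts": [...]}}."""
    root = ET.fromstring(fragment)

    # Map "#id" references to human-readable display names.
    names = {}
    for element in root:
        element_id = element.get(f"{{{RDF}}}ID")
        name = element.findtext(f"{{{BP}}}displayName")
        if element_id and name:
            names["#" + element_id] = name

    def refs(element, prop):
        # Collect the rdf:resource targets of one BioPAX property.
        return [c.get(f"{{{RDF}}}resource") for c in element.findall(f"{{{BP}}}{prop}")]

    reactions = {}
    for rxn in root.findall(f"{{{BP}}}BiochemicalReaction"):
        ref = "#" + rxn.get(f"{{{RDF}}}ID")
        reactions[ref] = {
            "left": [names.get(r, r) for r in refs(rxn, "left")],
            "right": [names.get(r, r) for r in refs(rxn, "right")],
            "catalysts": [],
        }

    # Attach each Catalysis controller to the reaction it controls.
    for cat in root.findall(f"{{{BP}}}Catalysis"):
        for target in refs(cat, "controlled"):
            if target in reactions:
                reactions[target]["catalysts"].extend(
                    names.get(r, r) for r in refs(cat, "controller")
                )
    return reactions


if __name__ == "__main__":
    for ref, info in summarize(BIOPAX_FRAGMENT).items():
        print(ref, info)
```

Because every entity and interaction is a typed, identified resource, the same few lines of traversal would apply to records exported by any database that follows the standard, which is the interoperability point the abstract makes.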

    • "Definition 13 in SM) of another. For example, for an enzyme E and a biochemical reaction BR (RDF protein and RDF interaction respectively in RIIG terminology, see Definitions 9–12 in SM), the 'CONTROLLER' RDF object property of the BioPAX 2 ontology [62] "
    ABSTRACT: Objectives: We developed Resource Description Framework (RDF)-induced InfluGrams (RIIG), an informatics formalism to uncover complex relationships among biomarker proteins and biological pathways using biomedical knowledge bases. We demonstrate an application of RIIG in morphoproteomics, a theranostic technique aimed at comprehensive analysis of protein circuitries to design effective therapeutic strategies in a personalized medicine setting. Methods: RIIG uses an RDF "mashup" knowledge base that integrates publicly available pathway and protein data with ontologies. To mine for RDF-induced Influence Links, RIIG introduces the notions of RDF relevancy and RDF collider, which mimic conditional independence and the "explaining away" mechanism in probabilistic systems. Using these notions and constraint-based structure learning algorithms, the formalism generates morphoproteomic diagrams, which we call InfluGrams, for further analysis by experts. Results: RIIG was able to recover up to 90% of predefined influence links in a simulated environment using synthetic data and outperformed a naïve Monte Carlo sampling of random links. In clinical cases of Acute Lymphoblastic Leukemia (ALL) and Mesenchymal Chondrosarcoma, a significant level of concordance between the RIIG-generated and expert-built morphoproteomic diagrams was observed. In a clinical case of Squamous Cell Carcinoma, RIIG allowed selection of alternative therapeutic targets, the validity of which was supported by a systematic literature review. We have also illustrated the ability of RIIG to discover novel influence links in the general case of ALL. Conclusions: Applications of the RIIG formalism demonstrated its potential to uncover patient-specific complex relationships among biological entities to find effective drug targets in a personalized medicine setting. We conclude that RIIG provides an effective means not only to streamline morphoproteomic studies, but also to bridge curated biomedical knowledge and causal reasoning with clinical data in general.
    Journal of Biomedical Informatics 08/2014; 52. DOI:10.1016/j.jbi.2014.08.003 · 2.19 Impact Factor
    • "Attempts have been made to resolve these key issues through the development of numerous data standards (e.g. SBML [11], CellML [12], PSI-MI [13], BioPAX [14], GO [15] and SBO [16]), the implementation of centralized and federated databases (e.g. cPath [17], PathCase [18] and Pathway Commons [19]) and the proposal of design methodologies for software and databases (e.g. "
    ABSTRACT: Background: Advances in high-throughput technologies have enabled extensive generation of multi-level omics data. These data are crucial for systems biology research, though they are complex, heterogeneous, highly dynamic, incomplete and distributed among public databases. This leads to difficulties in data accessibility and often results in errors when data are merged and integrated from varied resources. Therefore, integration and management of systems biological data remain very challenging. Methods: To overcome this, we designed and developed a dedicated database system that addresses the vital issues in data management and thereby facilitates data integration, modeling and analysis in systems biology within a single database. In addition, a yeast data repository was implemented as an integrated database environment operated by the database system. Two applications were implemented to demonstrate the extensibility and utilization of the system. Both illustrate how the user can access the database via the web query function and implemented scripts. These scripts are specific to two sample cases: 1) detecting the pheromone pathway in protein interaction networks; and 2) finding metabolic reactions regulated by Snf1 kinase. Results and conclusion: In this study we present the design of a database system that offers an extensible environment to efficiently capture the majority of biological entities and relations encountered in systems biology. Critical functions and control processes were designed and implemented to ensure consistent, efficient, secure and reliable transactions. The two sample cases on the yeast integrated data clearly demonstrate the value of a single database environment for systems biology research.
    Source Code for Biology and Medicine 07/2014; 9(1):17. DOI:10.1186/1751-0473-9-17
    • "A common problem in these databases is the lack of standardization of information representation, preventing an easy exchange of information between databases and software tools. Progress has been made to overcome this limitation through the introduction of several language standards, among others, SBML (Hucka et al., 2003), BioPAX (Demir et al., 2010), SBGN (Le Novère et al., 2009), and SBOL (Galdzicki et al., 2011). "
    ABSTRACT: The development and application of biotechnology-based strategies has had a great socio-economic impact and is likely to play a crucial role in the foundation of more sustainable and efficient industrial processes. Within biotechnology, metabolic engineering aims at the directed improvement of cellular properties, often with the goal of synthesizing a target chemical compound. The use of computer-aided design (CAD) tools, along with continuously emerging advanced genetic engineering techniques, has allowed metabolic engineering to broaden and streamline the process of heterologous compound production. In this work, we review the CAD tools available for metabolic engineering with an emphasis on retrosynthesis methodologies. Recent advances in genetic engineering strategies for pathway implementation and optimization are also reviewed, as well as a range of bioanalytical tools to validate in silico predictions. A case study applying retrosynthesis is presented as an experimental verification of the output from Retropath, the first complete automated computational pipeline applicable to metabolic engineering. Applying this CAD pipeline, together with genetic reassembly and optimization of culture conditions, led to improved production of the plant flavonoid pinocembrin. Coupling CAD tools with advanced genetic engineering strategies and bioprocess optimization is crucial for enhanced product yields and will be of great value for the development of non-natural products through sustainable biotechnological processes.
    Journal of Biotechnology 04/2014; 192. DOI:10.1016/j.jbiotec.2014.03.029 · 2.87 Impact Factor