A brief overview on the BioPAX and SBML standards for formal presentation of complex biological knowledge

Source: arXiv


A brief, informal overview of the BioPAX and SBML standards for the formal
presentation of complex biological knowledge.



Available from: Leo Lahti, Sep 30, 2015
    ABSTRACT: … in a few weeks by a single graduate student with access to DNA samples and associated phenotypes, an Internet connection to the public genome databases, a thermal cycler and a DNA-sequencing machine. With the recent publication of a draft sequence of the mouse genome, identification of the mutations underlying a vast number of interesting mouse phenotypes has similarly been greatly simplified. Comparison of the human and mouse sequences shows that the proportion of the … In contemplating a vision for the future of genomics research, it is appropriate to consider the remarkable path that has brought us here. The rollfold (Figure 1) shows a timeline of landmark accomplishments in genetics and genomics, beginning with Gregor Mendel's discovery of the laws of heredity, and their rediscovery in the early days of the twentieth century. Recognition of DNA as the hereditary material, determination of its structure, elucidation of the genetic code, development of recombinant DNA technologies, and establishment of increasingly automatable methods for DNA sequencing set the stage for the Human Genome Project (HGP) to begin in 1990 (see also www.nature.com/nature/DNA50). Thanks to the vision of the original planners, and the creativity and determination of a legion of talented scientists who decided to make this project their overarching focus, all of the initial objectives of the HGP have now
    Nature 05/2003; 422(6934):835-47. DOI:10.1038/nature01626
    ABSTRACT: We present a system, QPACA (Quantitative Pathway Analysis in Cancer) for analysis of biological data in the context of pathways. QPACA supports data visualization and both fine- and coarse-grained specifications, but, more importantly, addresses the problems of pathway recognition and pathway augmentation. Given a set of genes hypothesized to be part of a pathway or a coordinated process, QPACA is able to reliably distinguish true pathways from non-pathways using microarray expression data. Relying on the observation that only some of the experiments within a dataset are relevant to a specific biochemical pathway, QPACA automates selection of this subset using an optimization procedure. We present data on all human and yeast pathways found in the KEGG pathway database. In 117 out of 191 cases (61%), QPACA was able to correctly identify these positive cases as bona fide pathways with p-values measured using rigorous permutation analysis. Success in recognizing pathways was dependent on pathway size, with the largest quartile of pathways yielding 83% success. In cross-validation tests of pathway membership prediction, QPACA was able to yield enrichments for predicted pathway genes over random genes at rates of 2-fold or better the majority of the time, with rates of 10-fold or better 10-20% of the time. The software is available for academic research use free of charge by email request. Data used in the paper may be downloaded from
    Bioinformatics 02/2006; 22(2):233-41. DOI:10.1093/bioinformatics/bti764
    ABSTRACT: A major goal of proteomics is the complete description of the protein interaction network underlying cell physiology. A large number of small scale and, more recently, large-scale experiments have contributed to expanding our understanding of the nature of the interaction network. However, the necessary data integration across experiments is currently hampered by the fragmentation of publicly available protein interaction data, which exists in different formats in databases, on authors' websites or sometimes only in print publications. Here, we propose a community standard data model for the representation and exchange of protein interaction data. This data model has been jointly developed by members of the Proteomics Standards Initiative (PSI), a work group of the Human Proteome Organization (HUPO), and is supported by major protein interaction data providers, in particular the Biomolecular Interaction Network Database (BIND), Cellzome (Heidelberg, Germany), the Database of Interacting Proteins (DIP), Dana Farber Cancer Institute (Boston, MA, USA), the Human Protein Reference Database (HPRD), Hybrigenics (Paris, France), the European Bioinformatics Institute's (EMBL-EBI, Hinxton, UK) IntAct, the Molecular Interactions (MINT, Rome, Italy) database, the Protein-Protein Interaction Database (PPID, Edinburgh, UK) and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, EMBL, Heidelberg, Germany).
    Nature Biotechnology 03/2004; 22(2):177-83. DOI:10.1038/nbt926