Article

Chem2Bio2RDF: A Linked Open Data Portal for Chemical Biology

12/2010;
Source: arXiv

ABSTRACT

The Chem2Bio2RDF portal is a Linked Open Data (LOD) portal for systems chemical biology that aims to facilitate drug discovery. It converts around 25 different datasets on genes, compounds, drugs, pathways, side effects, diseases, and MEDLINE/PubMed documents into RDF triples and links them to other LOD bubbles, such as Bio2RDF, LODD, and DBpedia. The portal is based on D2R Server and provides a SPARQL endpoint, and it adds a few unique features: an RDF faceted browser, a user-friendly SPARQL query generator, a MEDLINE/PubMed cross-validation service, and a Cytoscape visualization plugin. Three use cases demonstrate the functionality and usability of the portal.

Comment: 8 pages, 10 figures

Available from: Ying Ding
  • Source
    • "In the biomedical field, a representative resource is Bio2RDF [16], a system for integrated access to a large number of biomedical databases through Semantic Web technologies, i.e., RDF for data representation and SPARQL (SPARQL Protocol and RDF Query Language) [17] for queries. [18] and [19] present the Chem2Bio2RDF portal, a Linked Open Data (LOD) portal for systems chemical biology that facilitates drug discovery. It converts about 25 different datasets on genes, compounds, drugs, pathways, side effects, and diseases into RDF triples and links them to other LOD bubbles, such as Bio2RDF, LODD and DBpedia."
    ABSTRACT: Gene annotation is a process that encompasses multiple approaches to the analysis of nucleic acid or protein sequences in order to assign structural and functional characteristics to gene models. When thousands of gene models are described in an organism's genome, the construction and visualization of gene networks pose novel challenges for understanding complex expression patterns and generating new knowledge in genomics research. To take advantage of the text data accumulated after conventional gene sequence analysis, this work applied semantics in combination with visualization tools to build transcriptome networks from a set of coffee gene annotations. A set of selected coffee transcriptome sequences, chosen by the quality of the sequence comparison reported by the Basic Local Alignment Search Tool (BLAST) and InterProScan, was filtered by coverage, identity, length of the query, and e-values. Meanwhile, term descriptors for molecular biology and biochemistry were obtained from the WordNet dictionary in order to construct a Resource Description Framework (RDF) representation, using Ruby scripts and Methontology to find associations between concepts. Relationships between sequence annotations and semantic concepts were graphically represented through a total of 6845 oriented vectors, which were reduced to 745 non-redundant associations. A large gene network connecting transcripts by way of relational concepts was created, in which detailed connections remain to be validated for biological significance based on current biochemical and genetics frameworks. Besides reusing text information to generate gene connections and for data mining purposes, this tool makes it possible to visualize complex and abundant transcriptome data and prompts the formulation of new hypotheses in metabolic pathway analysis.
    Preview · Article · Jan 2012
  • Source
    • "... data point visualization tool, named PubChemBrowse, to display chemical structures with complex properties such as gene and disease relationships established through querying Chem2Bio2RDF system [1] for drug discovery."
    ABSTRACT: Visualization of large-scale, high-dimensional data is highly valuable for data analysis, facilitating scientific discovery in many fields. We present PubChemBrowse, a customized visualization tool for cheminformatics research. It provides a novel 3D data point browser that displays complex properties of massive data on commodity clients. As in Geographic Information System browsers for Earth and Environment data, chemical compounds with similar properties are nearby in the browser. PubChemBrowse is built around in-house high-performance parallel multidimensional scaling and generative topographic mapping services and supports fast interaction with an external property database. These properties can be overlaid on the 3D mapped compound space or queried for individual points. We prototype the integration with the Chem2Bio2RDF system, using its SPARQL endpoint to access over 20 publicly accessible bioinformatics databases. We describe our design and implementation of the integrated PubChemBrowse application and outline its use in drug discovery. The same core technologies are generally applicable to developing high-performance scientific data browsing systems for other applications. Copyright © 2011 John Wiley & Sons, Ltd.
    Full-text · Article · Dec 2011 · Concurrency and Computation Practice and Experience
  • Source
    • "ECMLS '10, Chicago, Illinois, USA. ... data point visualization tool, named PubChemBrowse, to display chemical structures with complex properties such as gene and disease relationships established through querying Chem2Bio2RDF system [1] for drug discovery."
    ABSTRACT: Visualization of large-scale, high-dimensional data is highly valuable for scientific discovery in many fields. We present PubChemBrowse, a customized visualization tool for cheminformatics research. It provides a novel 3D data point browser that displays complex properties of massive data on commodity clients. As in GIS browsers for Earth and Environment data, chemical compounds with similar properties are nearby in the browser. PubChemBrowse is built around in-house high-performance parallel MDS (Multi-Dimensional Scaling) and GTM (Generative Topographic Mapping) services and supports fast interaction with an external property database. These properties can be overlaid on the 3D mapped compound space or queried for individual points. We prototype its use with the Chem2Bio2RDF system, using the SPARQL query language to access over 20 publicly accessible bioinformatics databases. We describe our design and implementation of the integrated PubChemBrowse application and outline its use in drug discovery. The same core technologies can be used to develop similar high-dimensional browsers in other scientific areas.
    Full-text · Conference Paper · Jun 2010