Life sciences on the Semantic Web: The Neurocommons and beyond
Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands. Briefings in Bioinformatics (Impact Factor: 9.62). 04/2009; 10(2):193-204. DOI: 10.1093/bib/bbp004
Translational research, the effort to couple the results of basic research to clinical applications, depends on the ability to effectively answer questions using information that spans multiple disciplines. The Semantic Web, with its emphasis on combining information using standard representation languages, access to that information via standard web protocols, and technologies to leverage computation, such as in the form of inference and distributable query, offers a social and technological basis for assembling, integrating and making available biomedical knowledge at Web scale. In this article, we discuss the use of Semantic Web technology for assembling and querying biomedical knowledge from multiple sources and disciplines. We present the Neurocommons prototype knowledge base, a demonstration intended to show the feasibility and benefits of using these technologies. The prototype knowledge base can be used to experiment with and assess the scalability of current tools and methods for creating such a resource, and to elicit issues that will need to be addressed in order to expand its scope and use. We demonstrate the utility of the knowledge base by reviewing a few example queries that provide answers to precise questions relevant to the understanding of disease. All components of the knowledge base are freely available at http://neurocommons.org/, enabling readers to reconstruct the knowledge base and experiment with this new technology.
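As a rough illustration of the integration idea described in this abstract, the sketch below merges statements from two sources into one set of subject-predicate-object triples and answers a question that spans both. It is a toy in-memory model, not the Neurocommons implementation; every identifier (the `ex:` names and the functions `match` and `genes_in_pathway`) is hypothetical.

```python
# Toy triple store: each statement is a (subject, predicate, object) tuple.
# All identifiers below are made up for illustration only.

gene_source = {
    ("ex:geneA", "ex:encodes", "ex:proteinA"),
    ("ex:geneB", "ex:encodes", "ex:proteinB"),
}
pathway_source = {
    ("ex:proteinA", "ex:participatesIn", "ex:signalingPathway1"),
    ("ex:proteinB", "ex:participatesIn", "ex:metabolicPathway2"),
}

# Because both sources use the same identifiers for proteins,
# integrating them is a plain set union.
graph = gene_source | pathway_source


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in graph
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


def genes_in_pathway(graph, pathway):
    """Join two patterns: gene -encodes-> protein -participatesIn-> pathway."""
    genes = set()
    for protein, _, _ in match(graph, p="ex:participatesIn", o=pathway):
        for gene, _, _ in match(graph, p="ex:encodes", o=protein):
            genes.add(gene)
    return sorted(genes)


print(genes_in_pathway(graph, "ex:signalingPathway1"))
```

A real deployment would express the same join as a query against an RDF store rather than as nested Python loops; the sketch only shows why shared identifiers make cross-source questions straightforward.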
Available from: Volker Tresp
- "Knowledge graphs are also used in several specialized domains. For instance, Bio2RDF, Neurocommons, and LinkedLifeData are knowledge graphs that integrate multiple sources of biomedical information. These have been used for question answering and decision support in the life sciences. "
ABSTRACT: Relational machine learning studies methods for the statistical analysis of
relational, or graph-structured, data. In this paper, we provide a review of
how such statistical models can be "trained" on large knowledge graphs, and
then used to predict new facts about the world (which is equivalent to
predicting new edges in the graph). In particular, we discuss two different
kinds of statistical relational models, both of which can scale to massive
datasets. The first is based on tensor factorization methods and related latent
variable models. The second is based on mining observable patterns in the
graph. We also show how to combine these latent and observable models to get
improved modeling power at decreased computational cost. Finally, we discuss
how such statistical models of graphs can be combined with text-based
information extraction methods for automatically constructing knowledge graphs
from the Web. In particular, we discuss Google's Knowledge Vault project.
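
To make "predicting new edges" with a latent-variable model concrete, here is a minimal sketch of bilinear scoring, where a candidate fact (s, r, o) is scored as e_s^T W_r e_o. This form is used by RESCAL-style tensor factorization models; choosing it here is an assumption, since the abstract does not name a specific model. The embeddings and relation matrix are set by hand rather than learned, and all names (`ex:` identifiers, `score`, `rank_candidates`) are hypothetical.

```python
# Bilinear scoring sketch: score(s, r, o) = e_s^T W_r e_o.
# Vectors and the relation matrix are hand-set for illustration,
# not learned from data.

entity_vec = {
    "ex:drugX": [1.0, 0.0],
    "ex:drugY": [0.9, 0.1],
    "ex:diseaseZ": [0.0, 1.0],
}
relation_mat = {
    "ex:treats": [[0.0, 1.0],
                  [0.0, 0.0]],
}


def score(s, r, o):
    """Plausibility score of the candidate edge (s, r, o)."""
    e_s, w, e_o = entity_vec[s], relation_mat[r], entity_vec[o]
    # Compute W_r e_o first, then take the dot product with e_s.
    w_eo = [sum(w[i][j] * e_o[j] for j in range(len(e_o))) for i in range(len(w))]
    return sum(e_s[i] * w_eo[i] for i in range(len(e_s)))


def rank_candidates(r, o, candidates):
    """Order candidate subjects by descending score for relation r and object o."""
    return sorted(candidates, key=lambda s: score(s, r, o), reverse=True)


print(rank_candidates("ex:treats", "ex:diseaseZ", ["ex:drugY", "ex:drugX"]))
```

In a trained model the vectors and matrices would be fitted so that known edges score higher than absent ones; unseen edges with high scores are then the predicted new facts.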
Available from: PubMed Central
- "CoCoMac and BAMS have both built their knowledge bases as database schema models, which limit the ability to create open and flexible linked data models (Ruttenberg et al., 2009). NeuroLex and the ConnectomeWiki use a semantic data model. "
ABSTRACT: The ability to transmit, organize, and query information digitally has brought with it the challenge of how to best use this power to facilitate scientific inquiry. Today, few information systems are able to provide detailed answers to complex questions about neuroscience that account for multiple spatial scales, and which cross the boundaries of diverse parts of the nervous system such as molecules, cellular parts, cells, circuits, systems and tissues. As a result, investigators still primarily seek answers to their questions in an increasingly densely populated collection of articles in the literature, each of which must be digested individually. If it were easier to search a knowledge base that was structured to answer neuroscience questions, such a system would enable questions to be answered in seconds that would otherwise require hours of literature review. In this article, we describe NeuroLex.org, a wiki-based website and knowledge management system. Its goal is to bring neurobiological knowledge into a framework that allows neuroscientists to review the concepts of neuroscience, with an emphasis on multiscale descriptions of the parts of nervous systems, aggregate their understanding with that of other scientists, link them to data sources and descriptions of important concepts in neuroscience, and expose parts that are still controversial or missing. To date, the site is tracking ~25,000 unique neuroanatomical parts and concepts in neurobiology spanning experimental techniques, behavioral paradigms, anatomical nomenclature, genes, proteins and molecules. Here we show how the structuring of information about these anatomical parts in the nervous system can be reused to answer multiple neuroscience questions, such as displaying all known GABAergic neurons aggregated in NeuroLex or displaying all brain regions that are known within NeuroLex to send axons into the cerebellar cortex.
Available from: Michel Dumontier
- "Although there are several efforts for provisioning life science linked data such as Neurocommons, LinkedLifeData, W3C HCLS, Chem2Bio2RDF and BioLOD, Bio2RDF is unique in several ways. First, Bio2RDF attempts to capture the intended meaning serialized by the original data providers in both content and structure. "
ABSTRACT: A key activity for life scientists in this post "-omics" age involves searching for and integrating biological data from a multitude of independent databases. However, our ability to find relevant data is hampered by non-standard web and database interfaces backed by an enormous variety of data formats. This heterogeneity presents an overwhelming barrier to the discovery and reuse of resources which have been developed at great public expense. To address this issue, the open-source Bio2RDF project promotes a simple convention to integrate diverse biological data using Semantic Web technologies. However, querying Bio2RDF remains difficult due to the lack of uniformity in the representation of Bio2RDF datasets.
We describe an update to Bio2RDF that includes tighter integration across 19 new and updated RDF datasets. All available open-source scripts were first consolidated to a single GitHub repository and then redeveloped using a common API that generates normalized IRIs using a centralized dataset registry. We then mapped dataset specific types and relations to the Semanticscience Integrated Ontology (SIO) and demonstrate simplified federated queries across multiple Bio2RDF endpoints.
This coordinated release marks an important milestone for the Bio2RDF open source linked data framework. Principally, it improves the quality of linked data in the Bio2RDF network and makes it easier to access or recreate the linked data locally. We hope to continue improving the Bio2RDF network of linked data by identifying priority databases and increasing the vocabulary coverage to additional dataset vocabularies beyond SIO.