Article
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

The increase in volume and complexity of biological data has led to increased requirements to reuse that data. Consistent and accurate metadata is essential for this task, creating new challenges in semantic data annotation and in the construction of terminologies and ontologies used for annotation. The BioSharing community are developing standards and terminologies for annotation, which have been adopted across bioinformatics, but the real challenge is to make these standards accessible to laboratory scientists. Widespread adoption requires the provision of tools to assist scientists whilst reducing the complexities of working with semantics. This paper describes unobtrusive ‘stealthy’ methods for collecting standards-compliant, semantically annotated data and for contributing to ontologies used for those annotations. Spreadsheets are ubiquitous in laboratory data management. Our spreadsheet‐based RightField tool enables scientists to structure information and select ontology terms for annotation within spreadsheets, producing high-quality, consistent data without changing common working practices. Furthermore, our Populous spreadsheet tool proves effective for gathering domain knowledge in the form of Web Ontology Language (OWL) ontologies. Such a corpus of structured and semantically enriched knowledge can be extracted in Resource Description Framework (RDF), providing further means for searching across the content and contributing to Open Linked Data (http://linkeddata.org/). Copyright © 2012 John Wiley & Sons, Ltd.

... [2][3][4][5][6][7][8][9][10][11][12][13][14][15] Shared representations for data and metadata, grounded in well-defined ontology terms, can help reduce confusion when sharing materials between researchers. [16] The Synthetic Biology Open Language (SBOL) [6] has been developed to address this challenge. SBOL provides a standardized format for the electronic exchange of information on the structural and functional aspects of biological designs, supporting the use of engineering principles such as abstraction, modularity, and standardization for synthetic biology. ...
Preprint
Full-text available
Standards support synthetic biology research by enabling the exchange of component information. However, using formal representations, such as the Synthetic Biology Open Language (SBOL), typically requires either a thorough understanding of these standards or a suite of tools developed in concurrence with the ontologies. Since these tools may be a barrier for use by many practitioners, the Excel-SBOL Converter was developed to allow easier use of SBOL and integration into existing workflows. The converter consists of two Python libraries: one that converts Excel templates to SBOL, and another that converts SBOL to an Excel workbook. Both libraries can be used either directly or via a SynBioHub plugin. We illustrate the operation of the Excel-SBOL Converter with two case studies: uploading experimental data with the study’s metadata linked to the measurements and downloading the Cello part repository.
Article
Scientific discovery is increasingly driven by the collection, analysis, and comprehension of digital data. Collaborations between domain scientists and computer scientists can accelerate both the investigation and application processes. The Microsoft eScience Workshop is a recognized venue for showcasing such collaborations and serves as a forum for exchanging both domain and computational research. This editorial provides an overview of the papers that resulted from selected research collaborations presented at the 2010 Microsoft eScience Workshop. Copyright © 2012 John Wiley & Sons, Ltd.
Article
Full-text available
We propose three urgent actions to advance this key field. First, authors, journals and curators should immediately begin to work together to facilitate the exchange of data between journal publications and databases. Second, in the next five years, curators, researchers and ...
Conference Paper
Full-text available
In this paper a novel approach is presented for generating RDF graphs of arbitrary complexity from various spreadsheet layouts. Currently, none of the available spreadsheet-to-RDF wrappers supports cross tables and tables where data is not aligned in rows. Similar to RDF123, XLWrap is based on template graphs where fragments of triples can be mapped to specific cells of a spreadsheet. Additionally, it features a full expression algebra based on the syntax of OpenOffice Calc and various shift operations, which can be used to repeat similar mappings in order to wrap cross tables including multiple sheets and spreadsheet files. The set of available expression functions includes most of the native functions of OpenOffice Calc and can be easily extended by users of XLWrap. Additionally, XLWrap is able to execute SPARQL queries, and since it is possible to define multiple virtual class extents in a mapping specification, it can be used to integrate information from multiple spreadsheets. XLWrap supports a special identity concept that allows anonymous resources (blank nodes), which may originate from different spreadsheets, to be linked in the target graph.
Article
Full-text available
Background: Ontologies are being developed for the life sciences to standardise the way we describe and interpret the wealth of data currently being generated. As more ontology-based applications begin to emerge, tools are required that enable domain experts to contribute their knowledge to the growing pool of ontologies. There are many barriers that prevent domain experts engaging in the ontology development process, and novel tools are needed to break down these barriers to engage a wider community of scientists. Results: We present Populous, a tool for gathering content with which to construct an ontology. Domain experts need to add content that is often repetitive in its form, but without having to tackle the underlying ontological representation. Populous presents users with a table-based form in which columns are constrained to take values from particular ontologies. Populated tables are mapped to patterns that can then be used to automatically generate the ontology's content. These forms can be exported as spreadsheets, providing an interface that is much more familiar to many biologists. Conclusions: Populous's contribution is in the knowledge gathering stage of ontology development; it separates knowledge gathering from the conceptualisation and axiomatisation, as well as separating the user from the standard ontology authoring environments. Populous is by no means a replacement for standard ontology editing tools, but instead provides a useful platform for engaging a wider community of scientists in the mass production of ontology content.
Article
Full-text available
To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open 'data commoning' culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared 'Investigation-Study-Assay' framework to support that vision.
Conference Paper
Full-text available
We describe a mapping language for converting data contained in spreadsheets into the Web Ontology Language (OWL). The developed language, called M2, overcomes shortcomings with existing mapping techniques, including their restriction to well-formed spreadsheets reminiscent of a single relational database table and verbose syntax for expressing mapping rules when transforming spreadsheet contents into OWL. The M2 language provides expressive, yet concise mechanisms to create both individual and class axioms when generating OWL ontologies. We additionally present an implementation of the mapping approach, Mapping Master, which is available as a plug-in for the Protégé ontology editor.
Conference Paper
Full-text available
We describe RDF123, a highly flexible open-source tool for translating spreadsheet data to RDF. Existing spreadsheet-to-rdf tools typically map only to star-shaped RDF graphs, i.e. each spreadsheet row is an instance, with each column representing a property. RDF123, on the other hand, allows users to define mappings to arbitrary graphs, thus allowing much richer spreadsheet semantics to be expressed. Further, each row in the spreadsheet can be mapped with a fairly different RDF scheme. Two interfaces are available. The first is a graphical application that allows users to create their mapping in an intuitive manner. The second is a Web service that takes as input a URL to a Google spreadsheet or CSV file and an RDF123 map, and provides RDF as output.
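The "star-shaped" mapping that the abstract above contrasts RDF123 with (each row becomes one resource, each column one property) can be illustrated with a minimal sketch. This is not RDF123's implementation or API; the function name `rows_to_ntriples`, the `http://example.org/` base URI, and the sample data are all assumptions made for illustration, and every cell is emitted as a plain string literal.

```python
import csv
import io


def rows_to_ntriples(csv_text, base="http://example.org/"):
    """Map each CSV row to one subject and each column to one property.

    Hypothetical helper: the base URI and naming scheme are illustrative only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for index, row in enumerate(reader, start=1):
        subject = f"<{base}row/{index}>"
        for column, value in row.items():
            if value is None:  # row shorter than the header; skip missing cells
                continue
            predicate = f"<{base}prop/{column.strip().replace(' ', '_')}>"
            # Escape backslashes and double quotes for an N-Triples string literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


sample = "gene,organism\nBRCA1,Homo sapiens\n"
for line in rows_to_ntriples(sample):
    print(line)
```

Because every row yields exactly one subject with one outgoing edge per column, the resulting graph is a set of disconnected stars; mapping tools such as those described here exist precisely to go beyond that shape.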
Conference Paper
Full-text available
In this poster, we present a set of nine scenarios, identified in the NeOn Methodology, for building ontology networks.
Conference Paper
Full-text available
We describe the design and use of the Ontology Pre-Processor Language (OPPL) as a means of embedding the use of Knowledge Patterns in OWL ontologies. We illustrate the specification of patterns in OPPL and discuss the advantages of its adoption by Ontology Engineers with respect to ontology generation, transformation, and maintainability. The consequence of the declarative specification of patterns will be their unambiguous description inside an ontology in OWL. Thus, OPPL enables an ontology engineer to work at the level of the pattern, rather than of the raw OWL axioms. Moreover, patterns can be analysed rigorously, so that the repercussions of their reuse can be better understood by ontology engineers and tools implementers. Thus the delivery of patterns with OPPL can provide a means of addressing the opacity and sustainability of OWL ontologies.
Article
Full-text available
When developing the Ontology of Biomedical Investigations (OBI), the process of adding classes with similar patterns of logical definition is time consuming, error prone, and requires an editor to have some expertise in OWL. Moreover, the process is poorly suited for a large number of domain experts who have limited experience with ontology development, and this can hinder contributions. We have developed a procedure to ease this task and allow such domain experts to add terms to the ontology in a way that both effectively includes complex logical definitions, yet requires minimal manual intervention by the OBI developers. The procedure is based on editing a Quick Term Template in a spreadsheet format that is subsequently converted into an OWL file. This procedure promises to be a robust and scalable approach for ontology enrichment as evidenced by encouraging results obtained when evaluated with an early version of the MappingMaster Protege plugin.
Article
Full-text available
Chronic renal disease is a global health problem. The identification of suitable biomarkers could facilitate early detection and diagnosis and allow better understanding of the underlying pathology. One of the challenges in meeting this goal is the necessary integration of experimental results from multiple biological levels for further analysis by data mining. Data integration in the life sciences is still a struggle, and many groups are looking to the benefits promised by the Semantic Web for data integration. We present a Semantic Web approach to developing a knowledge base that integrates data from high-throughput experiments on kidney and urine. A specialised KUP ontology is used to tie the various layers together, whilst background knowledge from external databases is incorporated by conversion into RDF. Using SPARQL as a query mechanism, we are able to query for proteins expressed in urine and place these back into the context of genes expressed in regions of the kidney. The KUPKB gives KUP biologists the means to ask queries across many resources in order to aggregate knowledge that is necessary for answering biological questions. The Semantic Web technologies we use, together with the background knowledge from the domain's ontologies, allow both rapid conversion and integration of this knowledge base. The KUPKB is still relatively small, but questions remain about scalability, maintenance and availability of the knowledge itself. The KUPKB may be accessed via http://www.e-lico.eu/kupkb.
Article
Full-text available
Motivation: In the Life Sciences, guidelines, checklists and ontologies describing what metadata is required for the interpretation and reuse of experimental data are emerging. Data producers, however, may have little experience in the use of such standards and require tools to support this form of data annotation. Results: RightField is an open source application that provides a mechanism for embedding ontology annotation support for Life Science data in Excel spreadsheets. Individual cells, columns or rows can be restricted to particular ranges of allowed classes or instances from chosen ontologies. The RightField-enabled spreadsheet presents selected ontology terms to the users as a simple drop-down list, enabling scientists to consistently annotate their data. The result is 'semantic annotation by stealth', with an annotation process that is less error-prone, more efficient, and more consistent with community standards. Availability and implementation: RightField is open source under a BSD license and freely available from http://www.rightfield.org.uk
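The idea of restricting cells to an allowed set of ontology terms can be sketched as a simple check over tabular rows. This is only an illustration of the concept, not RightField's code or file format; `ALLOWED_TERMS`, `find_invalid_cells`, and the term labels are hypothetical names chosen for the example.

```python
# Hypothetical allowed term labels per column; a real tool would draw these
# from a chosen ontology rather than a hard-coded set.
ALLOWED_TERMS = {
    "organism": {"Homo sapiens", "Mus musculus"},
}


def find_invalid_cells(rows, allowed=ALLOWED_TERMS):
    """Return (row_number, column, value) for each cell outside its allowed set."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for column, terms in allowed.items():
            value = row.get(column)
            if value not in terms:
                problems.append((row_number, column, value))
    return problems


rows = [{"organism": "Homo sapiens"}, {"organism": "human"}]
print(find_invalid_cells(rows))
```

A drop-down list prevents such mismatches at entry time; a check like this one only reports them afterwards, which is why constraining input in the spreadsheet itself tends to be less error-prone.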
Article
Full-text available
The 1990s and the first years of this new millennium have witnessed the growing interest of many practitioners in methodologies that support the creation of single or isolated ontologies. All these approaches have represented a step forward, since they have transformed the art of constructing single ontologies into an engineering activity. With the goal of speeding up the ontology development process, ontology practitioners are starting to reuse and re-engineer, as much as possible, knowledge resources (such as ontologies, thesauri, lexicons, and classification schemas) that have already reached some degree of consensus. In this paper, we present the set of nine scenarios identified in the NeOn Methodology framework. Additionally, we present how such scenarios have been followed in different use cases.
Article
Full-text available
We present Populous, a tool for gathering content with which to populate an ontology. Domain experts need to add content that is often repetitive in its form, but without having to tackle the underlying ontological representation. Populous presents users with a table-based form in which columns are constrained to take values from particular ontologies; the user can select a concept from an ontology via its meaningful label to give a value for a given entity attribute. Populated tables are mapped to patterns that can then be used to automatically generate the ontology's content. Populous's contribution is in the knowledge gathering stage of ontology development. It separates knowledge gathering from the conceptualisation and also separates the user from the standard ontology authoring environments. As a result, Populous can allow knowledge to be gathered in a straightforward manner that can then be used to do mass production of ontology content. Comment: in Adrian Paschke, Albert Burger, Andrea Splendiani, M. Scott Marshall, Paolo Romano: Proceedings of the 3rd International Workshop on Semantic Web Applications and Tools for the Life Sciences, Berlin, Germany, December 8-10, 2010
Article
Full-text available
Microarray analysis has become a widely used tool for the generation of gene expression data on a genomic scale. Although many significant results have been derived from microarray studies, one limitation has been the lack of standards for presenting and exchanging such data. Here we present a proposal, the Minimum Information About a Microarray Experiment (MIAME), that describes the minimum information required to ensure that microarray data can be easily interpreted and that results derived from its analysis can be independently verified. The ultimate goal of this work is to establish a standard for recording and reporting microarray-based gene expression data, which will in turn facilitate the establishment of databases and public repositories and enable the development of data analysis tools. With respect to MIAME, we concentrate on defining the content and structure of the necessary information rather than the technical format for capturing it.
Article
Full-text available
The first open source software suite for experimentalists and curators that (i) assists in the annotation and local management of experimental metadata from high-throughput studies employing one or a combination of omics and other technologies; (ii) empowers users to uptake community-defined checklists and ontologies; and (iii) facilitates submission to international public repositories. Availability and Implementation: Software, documentation, case studies and implementations at http://www.isa-tools.org Contact: isatools@googlegroups.com
Article
Full-text available
Background Experimental descriptions are typically stored as free text without using standardized terminology, creating challenges in comparison, reproduction and analysis. These difficulties impose limitations on data exchange and information retrieval. Results The Ontology for Biomedical Investigations (OBI), developed as a global, cross-community effort, provides a resource that represents biomedical investigations in an explicit and integrative framework. Here we detail three real-world applications of OBI, provide detailed modeling information and explain how to use OBI. Conclusion We demonstrate how OBI can be applied to different biomedical investigations to both facilitate interpretation of the experimental process and increase the computational processing and integration within the Semantic Web. The logical definitions of the entities involved allow computers to unambiguously understand and integrate different biological experimental processes and their relevant components. Availability OBI is available at http://purl.obolibrary.org/obo/obi/2009-11-02/obi.owl
Article
Full-text available
The Ontology Lookup Service (OLS; http://www.ebi.ac.uk/ols) has been providing several means to query, browse and navigate biomedical ontologies and controlled vocabularies since it first went into production 4 years ago, and usage statistics indicate that it has become a heavily accessed service with millions of hits monthly. The volume of data available for querying has increased 7-fold since its inception. OLS functionality has been integrated into several high-usage databases and data entry tools. Improvements in the data model and loaders, as well as interface enhancements have made the OLS easier to use and capture more annotations from the source data. In addition, newly released software packages now provide easy means to fully integrate OLS functionality in external applications.
Article
Full-text available
The developers of the Ontology of Biomedical Investigations (OBI) primarily use Protégé for editing. However, adding many classes with similar patterns of logical definition is time consuming, error prone, and requires the editor to have some expertise in OWL. Therefore, the process is poorly suited for a large number of domain experts who have limited experience with Protégé and ontology development. We have developed a procedure to ease this task and allow such domain experts to add terms to the ontology in a way that both effectively includes complex logical definitions and requires minimal manual intervention by OBI developers. The procedure is based on editing a Quick Term Template in a spreadsheet format which is subsequently converted into an OWL file. This procedure promises to be a robust and scalable approach for ontology enrichment.
Article
Full-text available
Biomedical ontologies provide essential domain knowledge to drive data integration, information retrieval, data annotation, natural-language processing and decision support. BioPortal (http://bioportal.bioontology.org) is an open repository of biomedical ontologies that provides access via Web services and Web browsers to ontologies developed in OWL, RDF, OBO format and Protégé frames. BioPortal functionality includes the ability to browse, search and visualize ontologies. The Web interface also facilitates community-based participation in the evaluation and evolution of ontology content by providing features to add notes to ontology terms, mappings between terms and ontology reviews based on criteria such as usability, domain coverage, quality of content, and documentation and support. BioPortal also enables integrated search of biomedical data resources such as the Gene Expression Omnibus (GEO), ClinicalTrials.gov, and ArrayExpress, through the annotation and indexing of these resources with ontologies in BioPortal. Thus, BioPortal not only provides investigators, clinicians, and developers ‘one-stop shopping’ to programmatically access biomedical ontologies, but also provides support to integrate data from a variety of biomedical resources.
Article
Full-text available
As research in the biological sciences continues to advance at a rapid pace, it is increasingly important that the data be captured, standardized, organized and made accessible to the scientific community. This is the job of a biocurator. Here we describe the process of biocuration from our perspective as FlyBase curators.
Article
Full-text available
The MAGE-TAB format for microarray data representation and exchange has been proposed by the microarray community to replace the more complex MAGE-ML format. We present a suite of tools to support MAGE-TAB generation and validation, conversion between existing formats for data exchange, visualization of the experiment designs encoded by MAGE-TAB documents and the mining of such documents for semantic content. Availability: Software is available from http://tab2mage.sourceforge.net/ Contact: tfrayner@gmail.com
Article
Full-text available
To thrive, the field that links biologists and their data urgently needs structure, recognition and support.
Article
Full-text available
Chris F Taylor, Dawn Field, Susanna-Assunta Sansone, Jan Aerts, Rolf Apweiler, Michael Ashburner, Pierre-Alain Binz, Molly Bogue, Tim Booth, Alvis Brazma, Ryan R Brinkman, Catherine A Ball, Eric W Deutsch, Oliver Fiehn, Jennifer Fostel, Peter Ghazal, Frank Gibson, Adam Michael Clark, Graeme Grimes, John M Hancock, Nigel W Hardy, Henning Hermjakob, Randall K Julian Jr, Tanya Gray, Carsten Kettner, Christopher Kinsinger, Eugene Kolker, Martin Kuiper, Matthew Kane, Jim Leebens-Mack, Suzanna E Lewis, Phillip Lord, Ann-Marie Mallon, Nicolas Le Novère, Hiroshi Masuya, Ruth McNally, Alexander Mehrle, Norman Morrison, Nishanth Marthandan, John Quackenbush, James M Reecy, Donald G Robertson, Philippe Rocca-Serra, Sandra Orchard, Heiko Rosenfelder, Javier Santoyo-Lopez, Richard H Scheuermann, Daniel Schober, Henry Rodriguez, Jason Snape, Christian J Stoeckert Jr, Keith Tipton, Peter Sterk, Andreas Untergasser, Barry Smith, Jo Vandesompele, Stefan Wiemann. (2008). Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nature Biotechnology 26(8), 889-896.
Article
Full-text available
Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.
Article
Full-text available
ArrayExpress is a new public database of microarray gene expression data at the EBI, which is a generic gene expression database designed to hold data from all microarray platforms. ArrayExpress uses the annotation standard Minimum Information About a Microarray Experiment (MIAME) and the associated XML data exchange format Microarray Gene Expression Markup Language (MAGE-ML), and it is designed to store well-annotated data in a structured way. The ArrayExpress infrastructure consists of the database itself, data submissions in MAGE-ML format or via an online submission tool MIAMExpress, an online database query interface, and the Expression Profiler online analysis tool. ArrayExpress accepts three types of submission (arrays, experiments and protocols), each of which is assigned an accession number. Help on data submission and annotation is provided by the curation team. The database can be queried on parameters such as author, laboratory, organism, experiment or array types. With an increasing number of organisations adopting the MAGE-ML standard, the volume of submissions to ArrayExpress is increasing rapidly. The database can be accessed at http://www.ebi.ac.uk/arrayexpress.
Article
Full-text available
Proteomics is rapidly evolving into a high-throughput technology, in which substantial and systematic studies are conducted on samples from a wide range of physiological, developmental, or pathological conditions. Reference maps from 2D gels are widely circulated. However, there is, as yet, no formally accepted standard representation to support the sharing of proteomics data, and little systematic dissemination of comprehensive proteomic data sets. This paper describes the design, implementation and use of a Proteome Experimental Data Repository (PEDRo), which makes comprehensive proteomics data sets available for browsing, searching and downloading. It also serves to extend the debate on the level of detail at which proteomics data should be captured, the sorts of facilities that should be provided by proteome data management systems, and the techniques by which such facilities can be made available. The PEDRo database provides access to a collection of comprehensive descriptions of experimental data sets in proteomics. Not only are these data sets interesting in and of themselves, they also provide a useful early validation of the PEDRo data model, which has served as a starting point for the ongoing standardisation activity through the Proteome Standards Initiative of the Human Proteome Organisation.
Article
Full-text available
ArrayExpress is a public repository for microarray data that supports the MIAME (Minimum Information About a Microarray Experiment) requirements and stores well-annotated raw and normalized data. As of November 2004, ArrayExpress contains data from ∼12 000 hybridizations covering 35 species. Data can be submitted online or directly from local databases or LIMS in a standard format, and password-protected access to prepublication data is provided for reviewers and authors. The data can be retrieved by accession number or queried by various parameters such as species, author and array platform. A facility to query experiments by gene and sample properties is provided for a growing subset of curated data that is loaded into the ArrayExpress data warehouse. Data can be visualized and analysed using Expression Profiler, the integrated data analysis tool. ArrayExpress is available at http://www.ebi.ac.uk/arrayexpress.
Article
Full-text available
We describe an ontology for cell types that covers the prokaryotic, fungal, animal and plant worlds. It includes over 680 cell types. These cell types are classified under several generic categories and are organized as a directed acyclic graph. The ontology is available in the formats adopted by the Open Biological Ontologies umbrella and is designed to be used in the context of model organism genome and other biological databases. The ontology is freely available at http://obo.sourceforge.net/ and can be viewed using standard ontology visualization tools such as OBO-Edit and COBrA.
Article
Full-text available
The generation of large amounts of microarray data and the need to share these data bring challenges for both data management and annotation and highlight the need for standards. MIAME specifies the minimum information needed to describe a microarray experiment, and the Microarray Gene Expression Object Model (MAGE-OM) and resulting MAGE-ML provide a mechanism to standardize data representation for data exchange; however, a common terminology for data annotation is needed to support these standards. Here we describe the MGED Ontology (MO) developed by the Ontology Working Group of the Microarray Gene Expression Data (MGED) Society. The MO provides terms for annotating all aspects of a microarray experiment from the design of the experiment and array layout, through to the preparation of the biological sample and the protocols used to hybridize the RNA and analyze the data. The MO was developed to provide terms for annotating experiments in line with the MIAME guidelines, i.e. to provide the semantics to describe a microarray experiment according to the concepts specified in MIAME. The MO does not attempt to incorporate terms from existing ontologies, e.g. those that deal with anatomical parts or developmental stages, but provides a framework to reference terms in other ontologies and therefore facilitates the use of ontologies in microarray data annotation. The MGED Ontology version 1.2.0 is available as a file in both DAML and OWL formats at http://mged.sourceforge.net/ontologies/index.php. Release notes and annotation examples are provided. The MO is also provided via the NCICB's Enterprise Vocabulary System (http://nciterms.nci.nih.gov/NCIBrowser/Dictionary.do). Contact: Stoeckrt@pcbi.upenn.edu. Supplementary data are available at Bioinformatics online.
Article
Full-text available
In recent years, as a knowledge-based discipline, bioinformatics has been made more computationally amenable. After its beginnings as a technology advocated by computer scientists to overcome problems of heterogeneity, ontology has been taken up by biologists themselves as a means to consistently annotate features from genotype to phenotype. In medical informatics, artifacts called ontologies have been used for a longer period of time to produce controlled lexicons for coding schemes. In this article, we review the current position in ontologies and how they have become institutionalized within biomedicine. As the field has matured, the much older philosophical aspects of ontology have come into play. With this and the institutionalization of ontology has come greater formality. We review this trend and what benefits it might bring to ontologies and their use within biomedicine.
Article
Sharing of microarray data within the research community has been greatly facilitated by the development of the disclosure and communication standards MIAME and MAGE-ML by the MGED Society. However, the complexity of the MAGE-ML format has made its use impractical for laboratories lacking dedicated bioinformatics support. We propose a simple tab-delimited, spreadsheet-based format, MAGE-TAB, which will become a part of the MAGE microarray data standard and can be used for annotating and communicating microarray data in a MIAME compliant fashion. MAGE-TAB will enable laboratories without bioinformatics experience or support to manage, exchange and submit well-annotated microarray data in a standard format using a spreadsheet. The MAGE-TAB format is self-contained, and does not require an understanding of MAGE-ML or XML.
Article
This article summarizes the motivation for, and the proceedings of, the first ISA-TAB workshop held December 6-8, 2007, at the EBI, Cambridge, UK. This exploratory workshop, organized by members of the Microarray Gene Expression Data (MGED) Society's Reporting Structure for Biological Investigations (RSBI) working group, brought together a group of developers of a range of collaborative systems to discuss the use of a common format to address the pressing need of reporting and communicating data and metadata from biological, biomedical, and environmental studies employing combinations of genomics, transcriptomics, proteomics, and metabolomics technologies along with more conventional methodologies. The expertise of the participants comprised database development, data management, and hands-on experience in the development of data communication standards. The workshop's outcomes are set to help formalize the proposed Investigation, Study, Assay (ISA)-TAB tab-delimited format for representing and communicating experimental metadata. This article is part of the special issue of OMICS on the activities of the Genomics Standards Consortium (GSC).
Conference Paper
The interpretation and integration of experimental data depends on consistent metadata and uniform annotation. However, there are many barriers to the acquisition of this rich semantic metadata, not least the overhead and complexity of its collection by scientists. We present RightField, a lightweight spreadsheet-based annotation tool that lowers the barrier to manual metadata acquisition, and a data integration application for extracting and querying RDF data from these enriched spreadsheets. By hiding the complexities of semantic annotation, we can improve the collection of rich metadata, at source, by scientists. We illustrate the approach with results from the SysMO program, showing that RightField supports the whole workflow of semantic data collection, submission and RDF querying in Systems Biology. The RightField tool is freely available from http://www.rightfield.org.uk, and the code is open source under the BSD License.
Article
We propose a simple spreadsheet-based format, Microarray Gene Expression Tabular (MAGE-TAB), that will become a part of the MAGE microarray data standard and can be used for annotating and communicating microarray data in a MIAME compliant fashion. MAGE-TAB will enable laboratories without bioinformatics experience or support to submit data in a standard format using their favorite spreadsheet application. The MAGE-TAB format is self-contained, and does not require understanding of MAGE-ML.
Conference Paper
Modularity is a key requirement for large ontologies in order to achieve re-use, maintainability, and evolution. Mechanisms for 'normalisation' to achieve analogous aims are standard for databases. However, no similar notion of normalisation has yet emerged for ontologies. This paper proposes initial criteria for a two-step normalisation of ontologies implemented using OWL or related DL based formalisms. For the first - "ontological normalisation" - we accept Welty and Guarino's analysis. For the second - "implementation normalisation" - we propose an approach based on decomposing ("untangling") the ontology into independent disjoint skeleton taxonomies restricted to be simple trees, which can then be recombined using definitions and axioms to represent the relationships between them explicitly.
Article
We present the OWL API, a high level Application Programming Interface (API) for working with OWL ontologies. The OWL API is closely aligned with the OWL 2 structural specification. It supports parsing and rendering in the syntaxes defined in the W3C specification (Functional Syntax, RDF/XML, OWL/XML and the Manchester OWL Syntax); manipulation of ontological structures; and the use of reasoning engines. The reference implementation of the OWL API, written in Java, includes validators for the various OWL 2 profiles - OWL 2 QL, OWL 2 EL and OWL 2 RL. The OWL API has widespread usage in a variety of tools and applications.
Conference Paper
Biological knowledge has been, to date, coded by biologists in axiomatically lean bio-ontologies. To facilitate axiomatic enrichment, complex semantics can be encapsulated as Ontology Design Patterns (ODPs). These can be applied across an ontology to make the domain knowledge explicit and therefore available for computational inference. The same ODP is often required in many different parts of the same ontology and the manual construction of often complex ODP semantics is loaded with the possibility of slips, inconsistencies and other errors. To address this issue we present the Ontology PreProcessor Language (OPPL), an axiom-based language for selecting and transforming portions of OWL ontologies, offering a means for applying ODPs. Example ODPs for the common need to represent “modifiers” of independent entities are presented and one of them is used as a demonstration of how to use OPPL to apply it.
Article
Systems biology research is typically performed by multidisciplinary groups of scientists, often in large consortia and in distributed locations. The data generated in these projects tend to be heterogeneous and often involve high-throughput “omics” analyses. Models are developed iteratively from data generated in the projects and from the literature. Consequently, there is a growing requirement for exchanging experimental data, mathematical models, and scientific protocols between consortium members and a necessity to record and share the outcomes of experiments and the links between data and models. The overall output of a research consortium is also a valuable commodity in its own right. The research and associated data and models should eventually be available to the whole community for reuse and future analysis.
The Proteomics Identifications database (PRIDE, http://www.ebi.ac.uk/pride) is one of the main repositories designed to store, disseminate, and analyze mass spectrometry-based proteomics datasets. In this unit, an overview of the PRIDE system is given, including its key satellite tools: the Ontology Lookup Service (OLS), the Protein Identifier Cross-Referencing Service (PICR), and Database on Demand (DoD). Also described in detail are procedures for submitting data to PRIDE, and accessing data stored in PRIDE using the BioMart interface. Finally, to demonstrate the potential of PRIDE as a source for data mining, an example protocol is provided to showcase the powerful cross-domain query capabilities available through a combination of BioMarts.
Article
Motivation: MicroRNAs (miRNAs) are involved in an abundant class of post-transcriptional regulation activated through binding to the 3′ -untranslated region (UTR) of mRNAs. The current wealth of mammalian miRNA genes results mostly ...
Müller W, Krebs O, Rojas I, Wolstencroft K, Owen S, Alexejevs S, Goble C, Snoep J. SysMO-DB: just enough exchange for systems biology data and models. Microsoft Research eScience Workshop, Pittsburgh, USA, 2009.
Egana M, Rector A, Stevens R, Antezana A. Applying ontology design patterns in bio-ontologies. Knowledge Engineering: Practice and Patterns, Proceedings 2008; 5268:7-16.