The MGED Ontology: a resource for semantics-based description of microarray experiments.
ABSTRACT The generation of large amounts of microarray data and the need to share these data bring challenges for both data management and annotation and highlight the need for standards. MIAME specifies the minimum information needed to describe a microarray experiment, and the Microarray Gene Expression Object Model (MAGE-OM) and resulting MAGE-ML provide a mechanism to standardize data representation for data exchange; however, a common terminology for data annotation is needed to support these standards.
Here we describe the MGED Ontology (MO) developed by the Ontology Working Group of the Microarray Gene Expression Data (MGED) Society. The MO provides terms for annotating all aspects of a microarray experiment from the design of the experiment and array layout, through to the preparation of the biological sample and the protocols used to hybridize the RNA and analyze the data. The MO was developed to provide terms for annotating experiments in line with the MIAME guidelines, i.e. to provide the semantics to describe a microarray experiment according to the concepts specified in MIAME. The MO does not attempt to incorporate terms from existing ontologies, e.g. those that deal with anatomical parts or developmental stages terms, but provides a framework to reference terms in other ontologies and therefore facilitates the use of ontologies in microarray data annotation.
The MGED Ontology version 1.2.0 is available as a file in both DAML and OWL formats at http://mged.sourceforge.net/ontologies/index.php. Release notes and annotation examples are provided. The MO is also provided via the NCICB's Enterprise Vocabulary System (http://nciterms.nci.nih.gov/NCIBrowser/Dictionary.do).
Supplementary data are available at Bioinformatics online.
The MGED Ontology; A resource for semantics-based description of
Patricia L. Whetzel*1, Helen Parkinson2, Helen C. Causton3, Liju Fan4, Jennifer
Fragoso6, Laurence Game3, Mervi Heiskanen6, Norman Morrison7, Philippe Rocca-
Susanna-Assunta Sansone2, Chris Taylor2, Joseph White8, Christian J. Stoeckert,
1. Center for Bioinformatics and Department of Genetics, University of Pennsylvania
School of Medicine, USA
2. European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton,
Cambridge, CB10 1SD, UK
3. MRC Clinical Sciences Centre, Faculty of Medicine, Imperial College,
Hammersmith Hospital Campus, DuCane Road, London W12 0NN, UK
4. Ontology Workshop LLC, P.O. Box 182, Columbia, MD 21045-9998, USA
5. NIEHS PO Box 12233 MD F1-05, 111 Alexander Drive
Research Triangle Park, NC, 27709-2233, USA
6. NCICB, NCI Center for Bioinformatics, 6116 Executive Blvd, Rockville, MD 20852,
7. Department of Computer Science, Kilburn Building University of Manchester,
Oxford Road, Manchester, M13 9PL, UK
8. Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA
Associate Editor: Alvis Brazma
* Corresponding authors
© The Author (2006). Published by Oxford University Press. All rights reserved. For Permissions, please email: firstname.lastname@example.org
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access
version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press
are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety
but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact email@example.com
Bioinformatics Advance Access published January 19, 2006
Microarray experiments are both complex and high-throughput, so data storage,
management, exchange and annotation present challenges for biologists and
bioinformaticians. There are a variety of academic and commercial database
systems available (Gardiner-Garden, 2001) for laboratories and institutions as well as
community resources such as ArrayExpress (Parkinson, et al., 2005), the Gene
Expression Omnibus (GEO) (Barrett et al., 2005) and the Center for Information
Biology Gene Expression Database (CiBEX) (Ikeo, et al., 2003) that provide access
to public microarray data. The development and use of the Microarray Gene
Expression Object Model (MAGE-OM), and the related XML format (MAGE-ML)
(Spellman, et al., 2002) have provided a common syntactic format for data exchange
and a structure that can capture data described according to the MIAME (Minimum
Information About a Microarray Experiment) guidelines (Brazma, et al., 2001).
However, neither MIAME nor the MAGE-OM provides explicit terminology to
annotate this complex domain. We are therefore faced with the problem of
consistently describing methodology, experimental design, sequences and biological
samples across diverse resources.
The MO was developed to provide the semantics required to support the MAGE-OM
and as a resource for the development of tools for microarray data acquisition and
query (See Figure 1). The MO is primarily an ontology used to annotate microarray
experiments; however, it contains concepts that are universal to other types of
functional genomics experiments, such as protocol and experiment design, and can
thus also be used for annotation of some of the data in these domains. The major
component of the ontology involves biological descriptors relating to samples or their
processing; it is not an ontology of molecular, cellular or organismal biology, such as
the Gene Ontology (Gene Ontology Consortium, 2001).
The MGED Ontology content and structure
The MO is a semantic resource that includes terminology for all aspects of
microarray experiments. It was developed by the microarray community and is a
species neutral ontology that focuses on the commonalities among experiments
rather than the differences between them. In building the MO, we evaluated which
ontological resources were needed to describe microarray experiments and
developed use cases based on queries of experimental meta-data. Many of the
authors manage and/or develop microarray databases and the annotation provided
by users of these resources was used as a source of concepts for the ontology in the
preliminary card sorting exercise. These contributed to the biological content of the
MO. Concepts were mapped between contributors and defined, and properties and
synonyms were created. The MO was initially released in DAML+OIL format and
later in OWL. This set of classes is meant to fulfil the needs of users for annotating
biological samples, experiments and sample processing during a microarray
Users of the MAGE-OM (and the related exchange format MAGE-ML) have
contributed to the MO; and in part the MO was developed to support the annotation
of data in MAGE-ML format (Figure 1). The need to support MAGE has had a
significant impact on the top-level structure of the MO, while the requirements of the
data-generating community have largely determined the content. The impact this has
had on the MO is explored below. Although the MO was primarily developed for use
by the microarray gene expression community, the ontology, like the MAGE-OM, can
also be used to describe experiments generated on other functional genomics
platforms such as array-centric comparative genome hybridization (CGH), chromatin
immunoprecipitation on a chip (location analysis) or proteomics experiments and is
currently being used for these purposes.
Structure of the MGED Ontology
The MGED ontology consists of two parts: a stable core ontology and an extended
ontology. MO version 1.2 contains 229 classes, 110 properties and 658 instances
(individuals). The core ontology includes a minimal semantic set that is stable for use
in production software and contains all necessary MAGE classes to map the MO
content to the MAGE-OM, while the extended ontology permits further development.
This bipartite model is also used in the mmCIF vocabulary as part of the Protein Data
Bank (Berman, et al., 2000) and permits evolution of content while ensuring that the
basic structure needed for related applications is maintained. Although subclasses
are used to organise instances, the MGED Core Ontology (MCO) is not highly nested,
so that it can readily be presented in web-based applications. MCO classes that are
referenced in multiple MAGE-OM packages, such as DataType and Scale, are direct
subclasses of the MCO. The MCO also contains classes to track terms that have
been deprecated and the reason for deprecation.
There are four types of classes used in the MO:
Instantiated MO classes are those that refer to parts of the microarray
experiment and contain terms that are common to many experiments.
They can be described in terms of properties, contained instances and
subclasses (and their properties and values). For example SurfaceType is
instantiated within the MO. (Figure 2)
Abstract classes used to provide organization and structure to the MO.
For example, the abstract ExperimentDesignType class provides
organization to several instantiated subclasses for types of experiments
addressing the effects of compounds (PerturbationalDesign class) or
addressing the differences between strains (BiologicalProperty class), and
instances that describe a particular type of experiment, e.g.
time_series_design, are provided.
Abstract classes used to represent MAGE classes that have an ontology
entry association to allow developers to identify which MO terms to use.
For example the PhysicalArrayDesign class is a MAGE class represented
in the MO as it has an ontology entry association called SurfaceType.
Abstract classes that are subclasses of OntologyEntry which are
instantiated from some other identified resource. For example Organism,
MGED Core Ontology (MCO)
The MCO hierarchy reflects the structure of the packages in the MAGE-OM and
represents a set of IS-A relationships in the sense that all the classes are a kind of
descriptor for microarray experiments. The top-level classes mimic the MAGE-OM
structure and were provided for software developers using MAGE-OM and requiring
MO to annotate their MAGE-ML. The lower level classes contain the experimental
details used by annotators of microarray experiments and are usually presented in
the context of some annotation or query application. The top-level MCO class names
therefore are the same as the packages in the MAGE-OM and the MCO instantiated
classes are named after the association to the MAGE-OM OntologyEntry class. The
MCO does not duplicate the entirety of MAGE-OM, but includes only those classes in
MAGE-OM that have an association to the OntologyEntry class. Therefore,
navigating from MAGE-OM to the MO requires no concept mapping. This decision
was taken after discussion with the developers of MAGE-OM and with the input of
the MGED advisory board. The alternative, to build a stand-alone ontology and map
it to MAGE-OM later, was not practical as there was considerable demand for the MO
from those using the MAGE-OM. A MAGE-OM view is therefore explicit within the
MO. The MCO uses organising subclasses so that similar types of terms are grouped
together within a class; these obey the is-a hierarchy. For example, the class
ExperimentDesignType contains five subclasses: PerturbationalDesign,
MethodologicalDesign, BiologicalProperty, EpidemiologicalDesign,
BioMolecularAnnotation. The additional subclasses separate terms such as
compound_treatment_design from replicate_design and reduce the list from 52
terms for all classes of ExperimentDesignType to a maximum of 16 terms within the
MO Classes, properties and attributes
Experimental or sample descriptors in the MO fall into one of three categories: the
types of information (classes) that need to be captured, their properties (attributes)
and the actual values (instances) used. All classes, properties, and instances in MO
are defined in natural language. Synonyms, exact and non-exact, are included in the
definition for the term as OilEd, the software used for the initial development of the
MO, has limited synonym handling at the instance level (Bechhofer et al., 2001).
For example, in a hypothetical study in which mice were injected with a drug,
categories or classes for ‘Organism’ are provided in the MCO, to indicate that mice
were used, for ‘Compound’ to indicate which substance, drug or chemical was used,
and for ‘Treatment’ to indicate how the compound was administered to the mice.
Classes are also provided for Age, Sex, Strain and other characteristics relating to
the mice. The classes from the MCO can be instantiated or abstract as described in
the previous section.
Abstract classes (type iii) having instances external to the MO are all subclasses of
the OntologyEntry class and inherit properties including a reference to a database
and a URI. The database entry association specifies the type of semantic resource
e.g., organism database, compound database, and the URI provides the web
address of the resource. This information identifies the term as being external to the
MO and the class that it instantiates as internal to the MO.
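As a minimal sketch of this distinction, the following Python fragment models an annotation as a class name, a value and an optional reference to an external resource. The names DatabaseReference and is_external are hypothetical and are not part of the MO or the MAGE-OM, and the NCBI Taxonomy address is given only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatabaseReference:
    """Pointer to the external resource that supplies a term (hypothetical type)."""
    database: str  # kind of semantic resource, e.g. an organism database
    uri: str       # web address of that resource


@dataclass(frozen=True)
class OntologyEntry:
    """One annotation: an MO class name plus the value chosen for it."""
    category: str                                  # MO class, e.g. "Organism"
    value: str                                     # the term itself
    reference: Optional[DatabaseReference] = None  # set only for external terms

    def is_external(self) -> bool:
        # A term counts as external to the MO when it carries a database reference.
        return self.reference is not None


# External term: the class (Organism) is in the MO, the value comes from elsewhere.
organism = OntologyEntry(
    category="Organism",
    value="Mus musculus",
    reference=DatabaseReference(
        database="NCBI Taxonomy",
        uri="https://www.ncbi.nlm.nih.gov/taxonomy",  # illustrative address
    ),
)

# Internal term: both the class and the instance are defined within the MO.
design = OntologyEntry(
    category="PerturbationalDesign",
    value="compound_treatment_design",
)
```

In this sketch the presence of a reference is the only thing that separates an external term from an MO instance, which mirrors the role of the database and URI properties described above.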
Classes of this type, such as Compound, cannot easily be provided in an itemised list
within the MCO as the number of terms needed is large and such terms are present
in external resources. Many of these classes are the focus of efforts by other groups
to generate ontologies or various types of controlled vocabularies. The MO therefore
provides pointers to relevant efforts, for example, in the case of ‘Compound’, to
ChemIDplus (Tomasulo, 2002), available from the National Library of Medicine,
which includes 350,000 chemical records that can be searched by CAS Registry
Other examples of this type of abstract class include ‘organism’, for which the
taxonomy is available from the National Center for Biotechnology Information
(Wheeler, et al., 2005) and ‘disease’. For some classes multiple non-orthogonal
choices are available, such as GALEN (Rogers, et al., 2001), ICD-9, and the nascent
Disease Ontology (http://diseaseontology.sourceforge.net/). It is clear that in some
cases there are competing efforts, e.g. there are several mammalian anatomy
ontologies. The MGED Ontology does not attempt to provide mappings between
synonymous terms in different ontologies, or preferentially recommend one over the
other; instead, it provides source information for these terms, which in turn can be
On occasion, an external ontology emerges which supersedes part of the MO. The
Sequence Ontology (SO) (Eilbeck, 2005) is used for semantics relating to sequence
features and describes properties of the sequences represented on the array (exon,
gene etc). The SO was found to be non-orthogonal with instances from the MO class
BioSequenceType. A mapping was therefore performed between the MGED ontology
terms and the SO terms. As the SO has matured the corresponding MO terms have
been deprecated in favour of using the SO directly.
Where there are incomplete term lists, the MO can be used to extend these; for example,
instances of light units were absent from the list of terms provided by the MAGE-OM
and were therefore included in the MO. The MO is extensible while the MAGE-OM is
not and it is likely that future versions of the MAGE-OM will devolve all semantic
content to a supporting ontology.
Using and accessing the MO
The MO is primarily used in three ways:
i) Embedded within an application to annotate or query microarray data,
e.g., by biologists who may have little knowledge of the MO structure
ii) Directly for annotating microarray data, e.g., by an annotator
iii) For producing an application that uses the MO, e.g., by a software
This diversity among uses and user groups is similar to that of the Gene Ontology
which is used in many applications including direct use by annotators who select
appropriate terms for a given gene product. Access to the MO is provided in line
with the needs of each of these user groups.
i) MO files are available in their native OWL format with release notes for
developers who typically parse the OWL file and use it locally to build an
application seen by biologists
ii) Via web browser access of the NCI Metathesaurus which allows the tree
structure to be visualised and navigated
iii) Via a web page where a URL identifies each Class, Property, or
instance in the ontology e.g.
In anticipation of providing MO terms through web services, the MO is registered with
Use of the MO for data annotation
Use of the MO is best demonstrated by considering an example in which the
ontology is used to describe part of a microarray experiment. The information
obtained from the biologist is free text:
‘A murine embryo fibroblast cell line (Swiss 3T3-L1) was plated out. Two
plates were treated with 10 nM insulin, two with 100 nM insulin and the other
two were left untreated. The cells were harvested after 4 hours incubation.’
This description can be annotated using terms from the MO (Figure 3).
The experiment is a kind of PerturbationalDesign, and instances from this class,
dose_response_design and compound_treatment_design, further describe how the
experiment was conducted. The cell line and cell type are described using the MO
terms ‘CellLine’ and ‘CellType’ respectively; however, the MO does not include
instances that specify particular cell lines or cell types so other, domain specific,
ontologies need to be referenced. Here the MO is used to refer to the terms
‘Fibroblast’ and ‘3T3-L1 Cells’ from the NCI Metathesaurus. Further examples of
how the MO can be used to annotate experiments can be found at
Systematically annotated and published experiments can also be downloaded, along
with the MAGE-ML used for data transfer from public repositories such as
ArrayExpress. One example of a published experiment that has been annotated
using the MO and exported as MAGE-ML can be accessed at
ption= (Kemp et al., 2003).
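The free-text description of the insulin experiment above can be sketched as structured data. This is an illustration only: the dictionary layout, the plate names and the make_plates helper are hypothetical, and the annotation shown in Figure 3 may differ in detail.

```python
# Experiment-level annotation: MO class -> instances chosen from that class.
experiment_design = {
    "PerturbationalDesign": ["dose_response_design", "compound_treatment_design"],
}

# Sample-level terms that the MO defers to an external source.
external_terms = {
    "CellType": {"value": "Fibroblast", "source": "NCI Metathesaurus"},
    "CellLine": {"value": "3T3-L1 Cells", "source": "NCI Metathesaurus"},
}


def make_plates(doses_nm, replicates=2, hours=4):
    """Build one record per plate (hypothetical helper for this sketch)."""
    plates = []
    for dose in doses_nm:
        for rep in range(1, replicates + 1):
            record = {
                "plate": f"insulin_{dose}nM_rep{rep}",
                # Untreated plates carry no compound.
                "Compound": "insulin" if dose > 0 else None,
                "dose_nM": dose,
                "incubation_h": hours,
            }
            # Every plate shares the same cell type and cell line.
            for mo_class, term in external_terms.items():
                record[mo_class] = term["value"]
            plates.append(record)
    return plates


# Two untreated plates, two at 10 nM and two at 100 nM, all harvested after 4 hours.
plates = make_plates([0, 10, 100])
```

Keeping the design terms at the experiment level and the cell descriptors at the plate level reflects the split between MO instances and externally sourced terms described in the text.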
Encoding the MO in MAGE-ML
MO concepts are typically expressed as MAGE-ML when annotated microarray data is
exchanged. The MAGE-OM recognizes that semantics are required and provides a
mechanism to provide semantic content via the MAGE-OM OntologyEntry. The MAGE-ML
format was not built to express complex concepts parsimoniously and relationship types
cannot currently be expressed in MAGE due to limitations in the MAGE-OM. As a
consequence, the MAGE-ML structure becomes complex when represented in MAGE (even
though the ontology is not deeply nested) and leads to XML bloat and the need for a rule-
based system for application-processing semantics. This has been implemented by
ArrayExpress and is used to process complex MAGE-ML coding to a simpler state for local
queries. The XML bloat inherent in the representation of any ontology in MAGE-ML will not be
addressed completely until the next version of MAGE becomes available, so annotation
examples and pseudo code have been generated to assist developers to use the MO in the
context of the MAGE-OM. These examples are provided to promote consistent use of the
MO. An ontology helper module for the MAGEstk (Spellman, et al., 2002) for both Java and
Perl is also under development to support coding of the MO in MAGE-ML (code available
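To illustrate why the encoding becomes verbose, the sketch below serializes a single externally referenced term as a nested OntologyEntry element using the Python standard library. The element and attribute names are assumed to follow MAGE-ML and should be checked against the MAGE-ML DTD; the ontology_entry_xml helper, the accession and the database identifier are placeholders introduced for this example.

```python
import xml.etree.ElementTree as ET


def ontology_entry_xml(category, value, accession=None, uri=None, database_id=None):
    """Build an OntologyEntry element (element names assumed, not verified)."""
    entry = ET.Element("OntologyEntry", {"category": category, "value": value})
    if accession is not None:
        # The reference to the source of the term is nested inside the entry.
        ref = ET.SubElement(entry, "OntologyReference_assn")
        db_entry = ET.SubElement(ref, "DatabaseEntry", {"accession": accession})
        if uri is not None:
            db_entry.set("URI", uri)
        if database_id is not None:
            db_ref = ET.SubElement(db_entry, "Database_assnref")
            ET.SubElement(db_ref, "Database_ref", {"identifier": database_id})
    return entry


entry = ontology_entry_xml(
    category="CellType",
    value="Fibroblast",
    accession="PLACEHOLDER_ACCESSION",   # placeholder, not a real accession
    database_id="PLACEHOLDER_DATABASE",  # placeholder identifier
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Even in this simplified form one category and value pair expands to five nested elements, which gives a sense of the XML bloat discussed above.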
Use of the MO in applications for data annotation
The MO has been implemented in web-based microarray annotation applications
(Table 1) such as MIAMExpress (Parkinson, et al., 2005), Tox-MIAMExpress
(Mattes, et al., 2004), RAD Study Annotator (Manduchi, et al., 2004) and MiMiR
(Navarange et al. 2005). These applications provide forms for annotating the
components of a microarray experiment specified by MIAME and the MO terms are
typically presented in menus from which terms may be selected as part of a web
interface. Different strategies have been chosen for managing the MO. RAD
stores a local copy of the MO in its database, maxdLoad2 presents a simplified abstraction of
the MO graph while utilizing the full set of terms if desired, and MIAMExpress
abstracts instantiated classes for local use. Tox-MIAMExpress abstracts those MO
classes relevant to the description of chemical treatments and toxicological endpoints
(e.g., Compound, Histology, Observation for macroscopic records, Test for clinical
chemistry assays). Once the data is submitted to a public repository such as
ArrayExpress, ontology-driven annotation will provide users with a powerful means to
query microarray experiments. The MO has also been made available directly via the
NCICB’s Enterprise Vocabulary System (Covitz et al. 2003) and is used by NCICB
applications such as caArray.
Revising and Extending the MO
The initial motivation for development of the MO was provided by the microarray data
community who presented a real and immediate need for terms for data description
and support for the MAGE-OM. Although much of the terminology needed by the
community was provided in the early releases, technology is evolving rapidly and
examples of novel requirements for data annotation arise continually. This however
can conflict with the need to maintain the stable core structure. The MO can therefore
be extended in the following two ways:
i) By adding new Classes and/or instances to the MGED Extended
ii) By addition of new instances to existing classes according to development
The MEO provides a framework for adding new classes that are not currently part of
the MCO. This ensures that the wider community can identify new terms for data
annotation within the MO and see the relationships among them, promotes
systematic use of terminology, and allows areas for further development to be readily
identified for future releases. The MEO also contains classes from previous versions
that represent knowledge we want to maintain, but which do not fit into the current
version of the MO.
When a term required for annotating an experiment is not available in the MO, users
may add their own terms and definitions using one of the applications implementing
the MO. User-defined terms are curated by the MO developers via the MO tracker and
are added to the MO provided they are i) not domain or species specific and ii)
orthogonal to (do not overlap with) existing concepts. The MO web site also provides
release notes for each version of the MO that represent approved changes to the MO
such as corrections, or new instances. MO development and maintenance activities
such as proposals for new terms or modifications to definitions are discussed via the
MO tracker and curated by the MGED Ontology working group (Figure 4).
The MO supports MAGE-OM v1 and v1.1 and provides descriptors for microarray
experiments for use by biologists and software developers. The MO is in active use
by both of these communities of users; however, the ontology is also evolving in line
with their needs. Areas for future development include the addition of terms for
describing normalization and data transformation, and the review of existing term
usage in resources using the MO.
Changes are also being made to leverage the improved representational power
provided by OWL (the ontology was migrated from DAML+OIL to OWL
representation for this reason). Changes include the use of synonyms in definitions of
terms, the display of class trees (see
http://mged.sourceforge.net/ontologies/MGEDontology.php for a summary of
changes made) and use of Annotation properties for annotating MAGE classes.
The MO is provided as a Resource Description Framework (RDF)-based file in either
the DAML or OWL formats. This format enables direct programmatic queries in the
form of web services that use software libraries which parse the RDF graph from
XML (e.g., http://www.redland.opensource.ac.uk/ ). We envision searching for MO
terms via web services at central registries such as BioMOBY
(http://www.biomoby.org/ ) and through annotation forms provided as part of
microarray data management applications. Thus, anyone requiring a term from the
most recent version of MO would be able to use the web service from their
application to view the available data for classes, properties and instances and the
relationships between them.
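As a sketch of such a programmatic query, the fragment below parses a small RDF/XML document with the Python standard library and lists the instances of one class. The snippet is a hand-written stand-in for the real MO file, the namespace http://example.org/mo-sketch# is a placeholder, and only the typed-node form of RDF/XML is handled; a full RDF library would be needed to query the released OWL file reliably.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
MO = "http://example.org/mo-sketch#"  # placeholder namespace, not the real MO one

# Hand-written stand-in for an OWL file: two classes and two instances.
OWL_SNIPPET = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}" xmlns:mo="{MO}">
  <owl:Class rdf:about="{MO}PerturbationalDesign"/>
  <owl:Class rdf:about="{MO}BiologicalProperty"/>
  <mo:PerturbationalDesign rdf:about="{MO}dose_response_design"/>
  <mo:PerturbationalDesign rdf:about="{MO}compound_treatment_design"/>
</rdf:RDF>
""".strip()

ABOUT = f"{{{RDF}}}about"  # fully qualified name of the rdf:about attribute


def local_name(uri):
    """Return the fragment after '#', i.e. the term name."""
    return uri.split("#", 1)[1]


def class_names(root):
    """List the names of all owl:Class declarations directly under the root."""
    return sorted(local_name(el.get(ABOUT)) for el in root.findall(f"{{{OWL}}}Class"))


def instances_of(root, class_name):
    """List instances written as typed nodes, e.g. <mo:ClassName rdf:about=...>."""
    return sorted(local_name(el.get(ABOUT)) for el in root.findall(f"{{{MO}}}{class_name}"))


root = ET.fromstring(OWL_SNIPPET)
print(class_names(root))
print(instances_of(root, "PerturbationalDesign"))
```

A web service of the kind envisioned here could wrap functions like these, so that an annotation form always retrieves terms from the most recent release.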
The MO has been implemented in annotation tools such as MIAMExpress, the RAD
Study Annotator, SMD, MiMiR and others (Table 1). The groups managing and
populating these resources collectively generate large amounts of data that present a
rich source of information annotated with a common terminology. The use of
common annotation among laboratories and experiments is expected to enhance the
utility of all the data and to facilitate queries and data mining. Thousands of
experiments have been annotated using the MO to date.
The MO was originally developed to support the annotation of microarray
experiments; however, many of the MO classes describing biomaterials, protocols,
and experimental design are independent of the technology used and apply to
other functional genomics technologies (such as mass spectrometry, in situ
hybridization etc.). It is hoped that initiatives to provide standards in these other
domains will leverage the terms and relationships contained in the MO. Work towards
the development of a Functional Genomics Ontology (FuGO) has already begun as
part of a collaboration between the MGED Ontology Working Group, the MGED
Reporting Structure for Biological Investigations (RSBI,
http://www.mged.org/Workgroups/rsbi/rsbi.html ), the HUPO Proteomics Standards
Initiative (http://psidev.sourceforge.net/) and the Metabolomics Society
(http://www.metabolomicssociety.org/mstandards.html, Lindon et al., 2005) working
groups. The resulting ontology will provide a consistent mechanism for annotating
functional genomics experiments that encompass different technological and
biological domains and assist in comparison of data across modalities. In the same
way that the MO was developed in parallel with the MAGE-OM, FuGO will be
developed in parallel with a Functional Genomics Object Model (FuGE,
http://fuge.sourceforge.net/). Two problems encountered with the MO will inform such
developments: the difficulty of representing complex semantics in an XML format, and
the need to permit evolution of the ontology. In particular, modelling a complex
domain and developing an ontology simultaneously resulted in a product that is
MAGE-OM centric and therefore of limited use with other object models. We hope to
avoid this in future by providing mappings to relevant object models rather than
encoding these in the ontology. With this in mind we are
currently reviewing the MO, with a view to participating in the development of FuGO.
While FuGO is being developed the MO will continue to be maintained and extended
for use in microarray-specific applications.
We would like to thank the members of the Ontology Working Group that have
contributed to the MGED Ontology especially Catherine Ball, Paul Spellman, John
Matese and Angel Pizarro. We would also like to thank Robert Stevens for his help
and guidance, and the reviewers who provided a number of constructive comments
that significantly improved this manuscript. This work was supported in part by NIH
grant 1P41HG003619-01 and NIH-NIEHS contract 273-02-C-0027.
Ball, C.A., Awad, I.A., Demeter, J., Gollub, J., Hebert, J.M., Hernandez-Boussard, T.,
et al., (2005) The Stanford Microarray Database accommodates additional
microarray platforms and data formats. Nucleic Acids Res. 33:D580-D582.
Barrett, T., Suzek, T.O., Troup, D.B., Wilhite, S.E., Ngau, W.C., Ledoux, P., et al.,
(2005) NCBI GEO: mining millions of expression profiles--database and tools.
Nucleic Acids Res. 33:D562-D566.
Bechhofer S, Horrocks I, Goble C, Stevens R. 2001. OilEd: a Reason-able Ontology
Editor for the Semantic Web. Proceedings of KI2001 2174:396-408
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H.,
Shindyalov, I.N. and Bourne, P.E. (2000) The Protein Data Bank, Nucleic Acids Res,
Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert,
C., Aach, J., Ansorge, W., Ball, C.A., Causton, H.C., Gaasterland, T., Glenisson, P.,
Holstege, F.C., Kim, I.F., Markowitz, V., Matese, J.C., Parkinson, H., Robinson, A.,
Sarkans, U., Schulze-Kremer, S., Stewart, J., Taylor, R., Vilo, J. and Vingron, M.
(2001) Minimum information about a microarray experiment (MIAME)-toward
standards for microarray data, Nat Genet, 29, 365-371.
Covitz PA, Hartel F, Schaefer C, De Coronado S, Fragoso G, Sahni H, Gustafson S,
Buetow KH. (2003) caCORE: a common infrastructure for cancer informatics.
Bioinformatics. 19, 2404-2412.
Eilbeck, K. (2005) The Sequence Ontology. Comparative and Functional Genomics,
Gardiner-Garden, M. and Littlejohn, T.G. (2001) A comparison of microarray databases,
Briefings in Bioinformatics, 2, 143-158.
Gene Ontology Consortium. (2001) Creating the gene ontology resource: design and
implementation. Genome Res. 11, 1425-1433
Ikeo, K., Ishi-i, J., Tamura, T., Gojobori, T. and Tateno, Y. (2003) CIBEX: center for
information biology gene expression database, C R Biol, 326, 1079-1082.
Kemp, T.J., Causton, H.C. and Clerk, A. (2003) Changes in gene expression induced
by H2O2 in cardiac myocytes: H2O2 promotes potent and sustained upregulation of
p21CIP1/Waf1, Biochem. Biophys. Res. Comm., 307, 416-421.
Lindon, J.C., Nicholson J.K., Holmes E., Keun H.C., Craig A., Pearce J.T., Bruce
S.J., Hardy N., Sansone S.A., Antti H., Jonsson P., Daykin C., Navarange M., Beger
R.D., Verheij E.R., Amberg A., Baunsgaard D., Cantor G.H., Lehman-McKeeman L.,
Earll M., Wold S., Johansson E., Haselden J.N., Kramer K., Thomas C., Lindberg J.,
Schuppe-Koistinen I., Wilson I.D., Reily M.D., Robertson D.G., Senn H., Krotzky A.,
Kochhar S., Powell J., van der Ouderaa F., Plumb R., Schaefer H., Spraul M.;
Standard Metabolic Reporting Structures working group. (2005) Summary
recommendations for standardization and reporting of metabolic analyses. Nat
Biotechnol 23, 833-838.
Manduchi, E., Grant, G.R., He, H., Liu, J., Mailman, M.D., Pizarro, A.D., Whetzel,
P.L. and Stoeckert, C.J., Jr. (2004) RAD and the RAD Study-Annotator: an approach
to collection, organization and exchange of all relevant information for high-
throughput gene expression studies, Bioinformatics, 20, 452-459.
Mattes, W.B., Pettit, S.D., Sansone, S.A., Bushel, P.R., Waters, M.D. (2004).
Database development in toxicogenomics: issues and efforts. Environ Health
Navarange, M., Game, L., Fowler, D., Wadekar, V., Banks, H., Cooley, N., Rahman,
F., Hinshelwood, J., Broderick, P. and H.C. Causton. (2005) MiMiR: A
comprehensive solution for storage, annotation and exchange of microarray data.
BMC Bioinformatics 6, 268-277.
Parkinson, H., Sarkans, U., Shojatalab, M., Abeygunawardena, N., Contrino, S.,
Coulson, R., Farne, A., Lara, G.G., Holloway, E., Kapushesky, M., Lilja, P.,
Mukherjee, G., Oezcimen, A., Rayner, T., Rocca-Serra, P., Sharma, A., Sansone, S.
and Brazma, A. (2005) ArrayExpress--a public repository for microarray gene
expression data at the EBI, Nucleic Acids Res, 33 Database Issue, D553-555.
Rogers, J., Roberts, A., Solomon, D., van der Haring, E., Wroe, C., Zanstra, P. and
Rector, A. (2001) GALEN ten years on: tasks and supporting tools, Medinfo, 10, 256-
Spellman, P.T., Miller, M., Stewart, J., Troup, C., Sarkans, U., Chervitz, S., Bernhart,
D., Sherlock, G., Ball, C., Lepage, M., Swiatek, M., Marks, W.L., Goncalves, J.,
Markel, S., Iordan, D., Shojatalab, M., Pizarro, A., White, J., Hubley, R., Deutsch, E.,
Senger, M., Aronow, B.J., Robinson, A., Bassett, D., Stoeckert, C.J., Jr. and Brazma,
A. (2002) Design and implementation of microarray gene expression markup
language (MAGE-ML), Genome Biol, 3, RESEARCH0046.
Tomasulo, P. (2002) ChemIDplus-super source for chemical and drug information,
Med Ref Serv Q, 21, 53-59.
Wheeler, D.L., Barrett, T., Benson, D.A., Bryant, S.H., Canese, K., Church, D.M.,
DiCuccio, M., Edgar, R., Federhen, S., Helmberg, W., Kenton, D.L., Khovayko, O.,
Lipman, D.J., Madden, T.L., Maglott, D.R., Ostell, J., Pontius, J.U., Pruitt, K.D.,
Schuler, G.D., Schriml, L.M., Sequeira, E., Sherry, S.T., Sirotkin, K., Starchenko, G.,
Suzek, T.O., Tatusov, R., Tatusova, T.A., Wagner, L. and Yaschenko, E. (2005)
Database resources of the National Center for Biotechnology
Information, Nucleic Acids Res, 33 Database Issue, D39-45.
Table 1: Microarray resources that use the MGED Ontology
Figure 1. Illustration of the MO usage in annotation and data transfer with MAGE-ML.
Local applications (Table 1) provide terms from the MO organized by MO Classes.
These are generally stored in local relational databases from which MAGE-ML can
be generated. Data in the MAGE-ML can be transferred between a number of
applications and databases, including microarray data repositories in the public
domain such as ArrayExpress and GEO.
Figure 2. Class hierarchy of the MGED Ontology and relationship to the MAGE-OM.
In this example, the MAGE-OM specifies a “surfaceType” association to
OntologyEntry from PhysicalArrayDesign. Terms (polylysine, aminosilane,
unknown_surface_type) for surface type can be found in the MO in the class
“SurfaceType” which is located in the ArrayDesignPackage class. The relationship of
SurfaceType to PhysicalArrayDesign is captured in MO:
(PhysicalDesignType has_type SurfaceType).
Figure 3. Panel a) shows an expanded view of the MO and the terms that are
relevant for describing the design of an experiment in which cells were treated with
one of two concentrations of insulin. Panels b) and c) illustrate how this information is
represented in MiMiR (Navarange et al., 2005), one of the applications used for data
annotation and management that incorporates the MO. Terms selected from the
MGED Ontology have the prefix ‘MO:’ and those from the NCI Metathesaurus have
the prefix ‘NCI:’.
Figure 4. Views of the MGED Ontology. Panel a) shows the HTML version of the MGED
Ontology, which is available at
http://mged.sourceforge.net/ontologies/MGEDontology.php along with links to files,
notes, and other views. Panel b) The MO tracker at Sourceforge is used to
The MO was initially constructed in the DAML+OIL ontology language
(http://www.daml.org) using the OilEd ontology editor (Bechhofer et al., 2001). However,
DAML+OIL has now been replaced by the Web Ontology Language (OWL) as the
standard language for ontologies. The MGED Ontology is now available in the
DAML+OIL, HTML and OWL formats
(http://mged.sourceforge.net/ontologies/MGEDontology.php). The export
‘OWL/RDF file’ function of OilEd was used to generate the MO OWL file. Changes
that had to be made prior to export included prepending an underscore to any term
starting with a digit (e.g., “32P” became “_32P”), replacing “/” with “per” and
replacing URIs with an XML datatype (string). After export, manual changes included
global replacements of xmlns:ns0 with xmlns and of MGEDOntology.daml with
MGEDOntology.owl, and deletion of ns0:. Future development of the MO will be in
Protégé-OWL. The current DAML+OIL version 1.2 of the MO is the last version that
will be made available in this format.
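The renaming and clean-up steps described above can be sketched as two small Python functions. This is an illustration only, not the procedure actually used, which was carried out in OilEd and by manual global replacement. The function names sanitize_term and post_export_cleanup are hypothetical, the “/” rule is applied literally because the text does not say whether separators were added around “per”, and the replacement of URIs with an XML string datatype is not covered.

```python
def sanitize_term(term):
    """Apply the pre-export renaming rules to a single term name.

    Assumption: "/" is replaced by the bare string "per", exactly as stated,
    with no added separators.
    """
    term = term.replace("/", "per")
    if term[:1].isdigit():
        # Names that start with a digit get a leading underscore.
        term = "_" + term
    return term


def post_export_cleanup(owl_text):
    """Apply the three global replacements made after export."""
    owl_text = owl_text.replace("xmlns:ns0", "xmlns")
    owl_text = owl_text.replace("MGEDOntology.daml", "MGEDOntology.owl")
    # Remove the remaining ns0: prefixes from element and attribute names.
    return owl_text.replace("ns0:", "")


print(sanitize_term("32P"))
```

Applying sanitize_term to “32P” gives “_32P”, the example quoted in the text; terms that contain neither a leading digit nor “/” are returned unchanged.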
<Database_assnref> <Database_ref identifier="MO"/>
documentation: Descriptions pertaining to the array.
constraints: restriction has_type has-class SurfaceType