Deborah L. McGuinness. “Ontologies Come of Age”. To appear in Dieter Fensel, Jim Hendler, Henry Lieberman, and Wolfgang
Wahlster, editors. The Semantic Web: Why, What, and How, MIT Press, 2001.
Ontologies Come of Age
Deborah L. McGuinness
Associate Director and Senior Research Scientist
Knowledge Systems Laboratory
Stanford University
Stanford, CA 94305
dlm@ksl.stanford.edu
Abstract
Ontologies have moved beyond the domains of library science, philosophy, and knowledge representation.
They are now the concerns of marketing departments, CEOs, and mainstream business. Research analyst
companies such as Forrester Research report on the critical roles of ontologies in support of browsing and
search for e-commerce and in support of interoperability for facilitation of knowledge management and
configuration. One now sees ontologies used as central controlled vocabularies that are integrated into
catalogues, databases, web publications, knowledge management applications, etc. Large ontologies are
essential components in many online applications including search (such as Yahoo and Lycos), e-
commerce (such as Amazon and eBay), configuration (such as Dell and PC-Order), etc. One also sees
ontologies that have long life spans, sometimes in multiple projects (such as UMLS, SIC codes, etc.). Such
diverse usage generates many implications for ontology environments.
In this paper, we will discuss ontologies and requirements in their current instantiations on the web today.
We will describe some desirable properties of ontologies. We will also discuss how both simple and
complex ontologies are being and may be used to support varied applications. We will conclude with a
discussion of emerging trends in ontologies and their environments and briefly mention our evolving
ontology evolution environment.
Introduction: The web’s growing needs
We may be poised for the next major evolution of online environments. In the early days of the web,
HTML pages were generated by hand. The pages contained information about how to present information
on a page. Early adopters took to the web quickly since it provided a convenient method for information
sharing. Arguably, the emergence of tools for machine generation and management of web pages allowed
the web to really take off. Tool platforms allowed non-technical people to generate and publish web pages
quickly and easily. The resulting pages typically included content and display information and targeted
human readers (rather than targeting programs or automatic readers).
The web continues to grow at an astounding rate with web pages ubiquitously integrated into many aspects
of business and personal life. However, web pages still preserve much of their character of being aimed at
human consumption. Thus, applications such as search still require humans to review results pages in
order to find the right answer to their queries. While search engine advances such as Google [Google
2000] improve the situation, most people agree that finding the exact information one is seeking on the
web today is not as easy as one would hope. One reason for this is that answers to search queries
typically are a rank-ordered list of pages that may contain the answer to the query. The answers rarely are
just the portion of the page that the search engine “thought” contained the answer to the query.
Additionally, web pages typically do not contain markup information about the contents of the page. If
pages were marked up with information concerning what information or services could be obtained (and
how that information or service could be obtained), then a page could be used more effectively by
programs to return the portion of the page (or the answer from a service) that contains a specific answer to
a question. Once web pages are aimed at machine or program consumption, instead of human
consumption, the next generation of the web can be realized. The proliferation of markup languages aimed
at marking up content and services instead of just presentation information can be viewed as support for
this position. Markup languages such as XML [XML 2000], RDF [Lassila 1998, Lassila-Swick 1999], RDFS
[Brickley-Guha 2000], DAML [Hendler-McGuinness 2000], etc., are becoming more accepted as users and
application developers see the need for more understanding of what is available from web pages.
The view presented in this paper is consistent with the vision being put forward by Tim Berners-Lee of the
W3C consortium. In a widely cited presentation [Berners-Lee 2000] at the XML 2000 conference, Berners-Lee
presented his vision of the semantic web as being machine processable. We support this view as well.
We believe that the next web evolution requires machines to understand the content of pages – both what
can be obtained from pages and what that information means. Markup languages allow specification of
this information. Berners-Lee offered an architecture diagram¹ in his presentation that provides a nice
foundation. We include it here in Figure 1.
1 http://www.w3.org/2000/Talks/1206-xml2k-tbl/slide10-0.html
Figure 1: Berners-Lee’s Architecture
He shows the markup languages at the base (just above Unicode) for use in term specification (or in web
speak, “resource” definition). The next layer, and the one we will consider here, is the ontology layer. In this
layer, we can define terms and their relationships to other terms. The next layer is the logic layer. In this
layer, we can deduce information, thereby allowing us to deduce implications of the term definitions and
relationships. In the rest of this paper, we will discuss the ontology and logic layers, what they have come
to mean on the web, and how one might generate ontologies and use them in applications.
Ontologies
The term ontology has been in use for many years. Merriam-Webster, for example, dates ontology to circa
1721 and provides two definitions: (1) a branch of metaphysics concerned with the nature and relations of
being, and (2) a particular theory about the nature of being or the kinds of existents. These definitions
provide an abstract philosophical notion of ontology. Mathematical or formal ontologies have also been
written about for many years. Smith [Smith 1998] points out that at least as early as 1900, the notion of a
formal ontology was distinguished from formal logic by the philosopher Husserl. While ontologies
(even formal ontologies) have had a long history, they remained largely the topic of academic interest
among philosophers, linguists, librarians, and knowledge representation researchers until somewhat
recently.
Ontologies have been gaining interest and acceptance in computational audiences (in addition to
philosophical audiences). Guarino [Guarino 1998] provides a nice collection of fields that embrace
ontologies, including knowledge engineering, knowledge representation, qualitative modeling, language
engineering, database design, information retrieval and extraction, and knowledge management and
organization. That collection, put together in early 1998, did not include nearly the web emphasis that is
seen today. We would also include areas of library science [Dublin Core 1999], ontology-enhanced search
(e.g., eCyc (http://www.e-Cyc.com/) and FindUR [McGuinness 1998]), e-commerce (possibly the largest
one; e.g., Amazon.com, Yahoo Shopping, etc.), and configuration.
In this paper, we will be restricting our sense of ontologies to those we see emerging on the web. Today’s
use of ontology on the web has a different slant from the previous philosophical notions. One widely cited
definition of an ontology is Gruber’s [Gruber 1993] “A specification of a conceptualization”. We will use this
notion and expand upon it in our use of the term.
People (and computational agents) typically have some notion or conceptualization of the meaning of
terms. Software programs sometimes provide a specification of the inputs and outputs of a program, which
could be used as a specification of the program. Similarly ontologies can be used to provide a concrete
specification of term names and term meanings. Within the line of thought where an ontology is a
specification of the conceptualization of a term, there are still a number of potential interpretations. Web
ontologies may be viewed as a spectrum of detail in their specification. One might visualize a simple
(linear) spectrum of definitions, as shown in Figure 2² below.
Figure 2: An Ontology Spectrum
2 This spectrum arose out of a conversation in preparation for an ontology panel at AAAI ’99. The panelists
(Lehman, McGuinness, Uschold, and Welty), chosen because of their years of experience with ontologies,
found that they encountered many forms of specifications that different people termed ontologies.
McGuinness refined the picture to the one included here.
One of the simplest notions of a possible ontology may be a controlled vocabulary – i.e., a finite list of
terms. Catalogs are an example of this category. Catalogs can provide an unambiguous interpretation of
terms – for example, every use of a term, say car – will denote exactly the same identifier – say 25.
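The catalog notion can be sketched as a simple term-to-identifier map. In this minimal sketch, the "car denotes 25" pairing follows the paper's example; the other entries are hypothetical:

```python
# A controlled vocabulary as a catalog: every use of a term denotes exactly
# one identifier. "car" -> 25 follows the paper's example; the other
# entries are hypothetical illustrations.
catalog = {"car": 25, "truck": 26, "dress": 27}

def denote(term):
    """Look up the single, unambiguous identifier for a term."""
    return catalog[term]

print(denote("car"))  # 25
```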
Another potential ontology specification is a glossary (a list of terms and meanings). The meanings are
specified typically as natural language statements. This provides a kind of semantics or meaning since
humans can read the natural language statements and interpret them. Typically, however, these
interpretations are not unambiguous, so such specifications are inadequate for computer agents and do
not meet the criterion of being machine processable.
Thesauri provide some additional semantics in their relations between terms. They provide information
such as synonym relationships. In many cases their relationships may be interpreted unambiguously by
agents. Typically thesauri do not provide an explicit hierarchy (although with narrower and broader term
specifications, one could deduce a simple hierarchy).
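The deduction of a simple hierarchy from broader/narrower-term relations can be sketched as follows. The thesaurus entries here are hypothetical:

```python
# Deduce an implicit hierarchy from a thesaurus's broader-term relations.
# The entries below are hypothetical illustrations.
broader_term = {
    "rock concert": "concert",
    "concert": "event",
    "dress": "apparel",
}

def ancestors(term):
    """Walk the broader-term chain to recover the implicit hierarchy path."""
    chain = []
    while term in broader_term:
        term = broader_term[term]
        chain.append(term)
    return chain

print(ancestors("rock concert"))  # ['concert', 'event']
```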
Early web specifications of term hierarchies, such as Yahoo’s, provide a basic notion of generalization and
specialization. Yahoo, for example, provides a small number of top-level categories such as apparel and
the category dresses as a kind of (women’s) apparel. A small number of people consider the previous
categories (of catalogues, glossaries, and thesauri) to be ontologies, but many prefer to have an explicit
hierarchy included before something is considered an ontology. Yahoo, for example, does provide an
explicit hierarchy. Its hierarchy is not a strict subclass or “isa” [Brachman 1983] hierarchy, however. This
point was distinguished on the spectrum slide since it seems to capture many of the naturally occurring
taxonomies on the web. In these organization schemes, it is typically the case that an instance of a more
specific class is also an instance of the more general class but that is not enforced 100% of the time. For
example, the general category apparel includes a subcategory women (which should more accurately be
titled women’s apparel) which then includes subcategories accessories and dresses. While it is the case
that every instance of a dress is an instance of apparel (and probably an instance of women’s dress), it is
not the case that a dress is a woman and it is also not the case that a fragrance (an instance of a women’s
accessory) is an instance of apparel. This mixing of categories such as accessories in web classification
schemes is not unique to Yahoo – it appears in many web classification schemes³. Without true subclass
(or true “isa”) relationships, we will see that certain kinds of deductive uses of ontologies become
problematic.
The next point on the figure includes strict subclass hierarchies. In these systems if A is a superclass of B,
then if an object is an instance of B it necessarily follows that the object is an instance of A. For example, if
“Dress” is a subclass of “Apparel” and “MyFavoriteDress” is an instance of “Dress”, then it follows that
“MyFavoriteDress” is an instance of “Apparel”. Strict subclass hierarchies are necessary for exploitation of
inheritance. The next point on the ontology spectrum includes formal instance relationships. Some
classification schemes only include class names while others include ground individual content. This point
includes instances as well.
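The transitivity of instance membership in a strict subclass hierarchy can be sketched directly; the class and instance names follow the paper's example, with a hypothetical ConsumerProduct root added:

```python
# A strict subclass hierarchy with instance relationships: if "Dress" is a
# subclass of "Apparel" and "MyFavoriteDress" is an instance of "Dress",
# then "MyFavoriteDress" is necessarily an instance of "Apparel".
# "ConsumerProduct" is a hypothetical root class.
subclass_of = {"Dress": "Apparel", "Apparel": "ConsumerProduct"}
instance_of = {"MyFavoriteDress": "Dress"}

def is_instance(individual, cls):
    """True if the individual's class is cls or a transitive subclass of it."""
    c = instance_of.get(individual)
    while c is not None:
        if c == cls:
            return True
        c = subclass_of.get(c)
    return False

print(is_instance("MyFavoriteDress", "Apparel"))  # True
```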
3 Some prominent hierarchies such as Yahoo have renamed their classes to broad disjunctive categories
such as “Apparel, Accessories, and Shoes” presumably in order to provide for more strict subclass
relationships. Disjunctive categories, however, make inheritance of class-specific properties more
problematic.
The next point includes frames⁴. Here classes include property information. For example, the “Apparel”
class may include properties of “price” and “isMadeFrom”. My specific dress may have a price of $100 and
may be made from cotton. Properties become more useful when they are specified at a general class level
and then inherited consistently by subclasses and instances. In a consumer hierarchy, a general category
like consumer product might have a “price” property associated with it. Possibly apparel would be the
general category to which the property “isMadeFrom” is associated. This would mean that the domain of
“isMadeFrom” is apparel. All subclasses of these categories would inherit these properties.
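The inheritance of frame properties down the class tree can be sketched as below. The classes and properties follow the paper's consumer-hierarchy example:

```python
# Frame-style property inheritance: "price" is attached at the general
# ConsumerProduct level and "isMadeFrom" at Apparel; subclasses such as
# Dress inherit both. Names follow the paper's example.
subclass_of = {"Apparel": "ConsumerProduct", "Dress": "Apparel"}
own_properties = {"ConsumerProduct": ["price"], "Apparel": ["isMadeFrom"]}

def properties_of(cls):
    """Collect properties declared on cls and inherited from superclasses."""
    props = []
    while cls is not None:
        props.extend(own_properties.get(cls, []))
        cls = subclass_of.get(cls)
    return props

print(properties_of("Dress"))  # ['isMadeFrom', 'price']
```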
A more expressive point in the ontology spectrum includes value restrictions. Here we may place
restrictions on what can fill a property. For example, a “price” property might be restricted to have a filler
that is a number (or a number in a certain range) and “isMadeFrom” may be restricted to have fillers that are a
kind of material. Here we can see a possible problem with a classification scheme that does not support
strict “isa” or subclass relationships. For example, if “Fragrance” were a subclass of “Apparel”, it would
inherit the property “isMadeFrom” and the value restriction of material that was stated.
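The problem a non-strict hierarchy causes for value restrictions can be sketched as follows, assuming (as the paper does for illustration) that "Fragrance" has been placed under "Apparel":

```python
# Value restrictions travel down the hierarchy: "isMadeFrom" is declared on
# Apparel, so any subclass inherits its material restriction. If Fragrance
# is (incorrectly) a subclass of Apparel, it inherits the restriction too.
subclass_of = {"Dress": "Apparel", "Fragrance": "Apparel"}  # second link is the bad one
restriction_domain = {"isMadeFrom": "Apparel"}  # restriction declared on Apparel

def inherits(prop, cls):
    """Does cls inherit prop via the (possibly non-strict) hierarchy?"""
    while cls is not None:
        if restriction_domain.get(prop) == cls:
            return True
        cls = subclass_of.get(cls)
    return False

print(inherits("isMadeFrom", "Dress"))      # True -- sensible
print(inherits("isMadeFrom", "Fragrance"))  # True -- nonsensical, but follows
```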
As ontologies need to express more information, their expressive requirements grow. For example, we
may want to fill in the value of one property based on a mathematical equation using values from other
properties. Some languages allow ontologists to state arbitrary logical statements. Very expressive
ontology languages such as those of Ontolingua [Farquhar et al 1997] or CycL allow ontologists to specify
first order logic constraints between terms and more detailed relationships such as disjoint classes, disjoint
coverings, inverse relationships, part-whole relationships, etc.
In this paper, we will require the following properties to hold in order to consider something an ontology.
Specifications meeting these properties will be referred to as simple ontologies.
- Finite controlled (extensible) vocabulary
- Unambiguous interpretation of classes and term relationships
- Strict hierarchical subclass relationships between classes
We consider the following properties typical but not mandatory:
- Property specification on a per-class basis
- Individual inclusion in the ontology
- Value restriction specification on a per-class basis
Finally, the following properties may be desirable but not mandatory nor typical:
- Specification of disjoint classes
- Specification of arbitrary logical relationships between terms
- Distinguished relationships such as inverse and part-whole
4 Frames were introduced by Minsky [Minsky 1975] and have been widely adopted, see for example [Fikes
& Kehler 1985, Karp 1992, and Chaudhri-et-al 1998].
The line in our chart is drawn such that everything to the right of it will be called an ontology and meet at
least the first three conditions stated above. Additionally, everything to the right of it can be used as a
basis for inference.
Simple Ontologies and Their Uses
We will now consider ontologies and their impact on applications. We break this section into two parts:
uses of simple ontologies and uses for more sophisticated ontologies. We do this because we
acknowledge that building the more complicated ontologies may be cost prohibitive for certain applications.
Simple ontologies, however, are not as costly to build and, potentially more importantly, many are available.
Simple ontologies are available in many forms – many exist as freeware on the web today and also many
exist as internal information organization structures within companies, universities, etc. Some collaborative
efforts, such as DMOZ (www.dmoz.com), are generating large simple ontologies. DMOZ, for
example, leverages over 35,000 volunteer editors and, at publication time, had over 360,000 classes in its
taxonomy. Additionally, some more sophisticated ontologies are available today. For example, the Unified
Medical Language System (UMLS - http://www.nlm.nih.gov/research/umls/ and [Humphreys & Lindberg 1993])
developed by the National Library of Medicine is a large, sophisticated ontology of medical terminology.
Some companies such as Cycorp (www.cyc.com) are making available portions of large, detailed
ontologies. We will further address the issue of ontology acquisition and maintenance later, but for now,
we just wanted to make the point that many simple and some sophisticated ontologies are easily available
today.
Now let’s consider some of the ways that simple ontologies may be used in practice.
First, they provide a controlled vocabulary. This by itself can provide great leverage since users, authors,
and databases can all use terms from the same vocabulary. In addition, programs can generate interfaces
that encourage usage of the controlled terms. The result is that people use the same set of terms. Of
course, some of the terms may still be used with different senses, but common term usage is a start for
interoperability.
Second, a simple taxonomy may be used for site organization and navigation support. Many web sites
today expose, on the left-hand side of a page, the top levels of a generalization hierarchy of terms. The
categories are typically hyperlinked (“hot”), and a user may click on one to expand its subcategories.
Third, taxonomies may be used to support expectation setting. It is an important user interface feature that
users be able to form realistic expectations of a site. By exploring even the top-level categories of
the hierarchy, they can quickly determine whether the site has content (and/or services) of interest to them.
Fourth, taxonomies may be used as “umbrella” structures from which to extend content. Some freely
available ontologies are attempting to provide the high-level taxonomic organization from which many
efforts may inherit terms. The UNSPSC (Universal Standard Products and Services Classification,
www.unspsc.org) is one such categorization scheme. It was jointly developed by the United Nations
Development Programme and Dun & Bradstreet and was aimed at providing the infrastructure for
interoperability of terms in the domains of products and services. It provides a classification scheme (with
associated numbers). For example, Category 50 – Food, Beverage, and Tobacco Products has a subclass
family 5010 called “Fruits and vegetables and nuts and seeds”⁵, which in turn contains a subclass 501015
called Vegetables, which in turn has a subclass commodity 50101538 called fresh vegetables. A number
of e-commerce applications today are looking for such umbrella organization structures and in fact a
number have chosen to be compliant with the UNSPSC. Most applications will need to extend these
ontologies, but if applications need to communicate between a number of content providers, it is convenient
to use a shared upper level ontology.
Fifth, taxonomies may provide browsing support. Content on a site may be tagged with terms from the
taxonomy. This may be done manually in the style of Yahoo or it may be done automatically (possibly
using a clustering approach). Once a page (or service) is meta-tagged with a term chosen from a
controlled vocabulary, then search engines may exploit the tagging and provide enhanced search
capabilities.
Sixth, taxonomies may be used to provide search support. A query expansion method may be used in
order to expand a user query with terms from more specific categories in the hierarchy. We exploited this
approach in our work on FindUR [McGuinness 1998] and found that under certain conditions (such as short
document length and limited content areas), query expansion can radically improve search.
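Query expansion over a taxonomy can be sketched as follows, in the spirit of the FindUR approach. The taxonomy content here is hypothetical:

```python
# Query expansion: expand a query term with its transitively more-specific
# terms from the taxonomy. The taxonomy below is a hypothetical sketch.
subclasses = {
    "concert": ["rock concert", "symphony", "jazz concert"],
    "rock concert": [],
}

def expand(term):
    """Return the term plus all transitively more-specific terms."""
    terms = [term]
    for narrower in subclasses.get(term, []):
        terms.extend(expand(narrower))
    return terms

print(expand("concert"))
# ['concert', 'rock concert', 'symphony', 'jazz concert']
```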
Seventh, taxonomies may be used to provide sense disambiguation support. If the same term appears in
multiple places in a taxonomy, an application may move to a more general level in the taxonomy in order to
find the sense of the word. For example, if an ontology contains the information that Jordan is an instance
of a Basketball-player and also an instance of a country, an application may ask a user
searching for Jordan whether she is interested in basketball players or countries. Sense disambiguation using
ontologies may be seen in the work of eCyc along with Hotbot and Lycos.
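The Jordan example can be sketched as a lookup for multiple taxonomy parents; the data below mirrors the paper's example:

```python
# Sense disambiguation: "Jordan" appears under two taxonomy parents, so an
# application can ask the user which sense is intended.
instance_of = {"Jordan": ["Basketball-player", "Country"]}

def senses(term):
    """Return the candidate senses (parent classes) of a term."""
    return instance_of.get(term, [])

if len(senses("Jordan")) > 1:
    print("Did you mean:", " or ".join(senses("Jordan")))
```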
Structured Ontologies and Their Uses
Up to this point, we have focused on simple taxonomies for usage in applications. Once ontologies begin
to have more structure however, they can provide more power in applications. Once the ontologies have
more structure than simple generalization links, property information can be used in many forms.
First, they can be used for simple kinds of consistency checking. If ontologies contain information about
properties and value restrictions on the properties, then type checking can be done within applications. For
example, if a class called “Goods” has a property called “price” that has a value restriction of number, then
something that is known to be of type “Goods” that has its “price” property filled in with a value that is not a
number can be caught as an error. This just exploits simple value restrictions that are types. A value
restriction might include a range, for example, a number between 10 and 100. Then if the “price” is 10,000,
it is out of the range and can be determined to be an error.
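The type and range checks just described can be sketched directly; the 10-to-100 range and the 10,000 error case follow the paper's example:

```python
# Consistency checking from value restrictions: "price" on "Goods" must be
# a number, and (in the range variant) a number between 10 and 100.
def check_price(value):
    """Return 'ok' or a description of the restriction violation."""
    if not isinstance(value, (int, float)):
        return "error: price is not a number"
    if not (10 <= value <= 100):
        return "error: price out of range"
    return "ok"

print(check_price("ten"))   # error: price is not a number
print(check_price(10_000))  # error: price out of range
print(check_price(50))      # ok
```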
Second, ontologies may be used to provide completion. An application may obtain a small amount of
information from a user, such as the fact that she is looking for a high-resolution screen on a pc, and then
have the ontology expand the exact pixel range that is to be expected. This can be done simply by defining
what the term “HighResolutionPc” is with respect to a particular pixel range on two roles –
“verticalResolution” and “horizontalResolution”. Similarly, information may interact. For example, a
5 Note, if one is using the common logical meanings of connectives, this class should really be named
“Fruits or vegetables or nuts or seeds”.
medical system may obtain information from an ontology that if a patient is stated to be a man, then the
gender of the patient is “male” and that information may be used to determine that a question concerning
whether or not the patient is pregnant should not be asked since there could be information in the system
that things whose gender is male are disjoint from things that are pregnant.
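The completion idea can be sketched as merging ontology-derived defaults into user-supplied facts. The "HighResolutionPc" term follows the paper's example; the pixel thresholds are hypothetical:

```python
# Completion: stating that something is a "HighResolutionPc" lets the
# ontology fill in the expected ranges for its two resolution roles.
# The (minimum, maximum) pixel values below are hypothetical; None means
# unbounded.
definitions = {
    "HighResolutionPc": {"verticalResolution": (1024, None),
                         "horizontalResolution": (1280, None)},
}

def complete(cls, stated):
    """Merge ontology-derived role values into what the user has stated."""
    filled = dict(definitions.get(cls, {}))
    filled.update(stated)  # user-stated values take precedence
    return filled

print(complete("HighResolutionPc", {}))
```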
Third, ontologies may be able to provide interoperability support. In the simple case of considering
controlled vocabularies, there is enhanced interoperability support since different users/applications are
using the same set of terms. In simple taxonomies, we can recognize when one application is using a term
that is more general or more specific than another term and thereby better facilitate interoperability. In more
expressive ontologies, we may have a complete operational definition for how one term relates to another
term and thus, we can use equality axioms or mappings to express one term precisely in terms of another
and thereby support more “intelligent” interoperability. For example, an ontology may include a definition
that a “StanfordEmployee” is equal to a “Person” whose “employer” property is filled with the individual
“Stanford University”. This definition may be used to expand the term “StanfordEmployee” in an application
that does not understand either “StanfordEmployee” or “Employee” but does understand the terms
“Person”, “employer”, and “Stanford”.
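The term-expansion idea can be sketched as rewriting a defined term into primitives the receiving application understands; the "StanfordEmployee" definition follows the paper's example:

```python
# Interoperability via definitions: "StanfordEmployee" is defined as a
# "Person" whose "employer" property is filled with "Stanford University",
# so the term can be expanded for an application that only understands
# the primitive terms.
definitions = {
    "StanfordEmployee": ("Person", {"employer": "Stanford University"}),
}

def expand_term(term):
    """Expand a defined term into (base class, property constraints)."""
    return definitions.get(term, (term, {}))

base, constraints = expand_term("StanfordEmployee")
print(base, constraints)  # Person {'employer': 'Stanford University'}
```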
Fourth, ontologies may be used to support validation and verification testing of data (and schemas). If
an ontology contains class descriptions, such as “StanfordEmployee”, these definitions may be used as
queries to databases to discover what kind of coverage currently exists in datasets. For example, if one
was going to expose the class “StanfordEmployee” on an interface to some application, it would be useful
to know first if the dataset contained any instances of “Person” whose “employer” property was filled with
the value “Stanford”. Additionally, if in a simple data model, we stated that a “Person” had at most one
“employer”, then we could use that information to check to see if any current information on “Person”s in
the dataset contained more than one “employer” value. Similarly, checks could be done to see if there
were currently “Person”s in the dataset that were known to be “Employee”s yet did not have a value for the
“employer” property (thereby showing that the dataset is not complete). Chimaera [McGuinness-et-al
2000] is an example ontology evolution environment that provides a set of diagnostic tests for checking
ontologies for both problems in the ontology definitions as well as problems with the instance data. It looks
for provable inconsistencies as well as conditions that “typically” reflect situations where an ontology or the
data may need to be fixed.
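The coverage and cardinality checks just described can be sketched over a toy dataset, in the spirit of such diagnostics; the records below are hypothetical:

```python
# Data validation against the ontology: check coverage (any instances of
# "Person" employed by Stanford?), the at-most-one "employer" cardinality
# constraint, and completeness (employees missing an employer value).
# The person records are hypothetical.
people = [
    {"name": "Alice", "employer": ["Stanford University"]},
    {"name": "Bob", "employer": ["Stanford University", "Acme"]},  # cardinality violation
    {"name": "Carol", "employer": []},  # incomplete if Carol is an Employee
]

coverage = [p["name"] for p in people if "Stanford University" in p["employer"]]
violations = [p["name"] for p in people if len(p["employer"]) > 1]
incomplete = [p["name"] for p in people if not p["employer"]]

print(coverage, violations, incomplete)
# ['Alice', 'Bob'] ['Bob'] ['Carol']
```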
Fifth, ontologies containing markup information may encode entire test suites. An ontology may contain
a number of definitions of terms, some instance definitions, and then include a term definition that is
considered to be a query – find all terms that meet the following conditions. Markup information could be
encoded with this query to include what the answer should be, thus providing enough information to
encode regression testing data. We provide one such example ontology in
http://ksl.stanford.edu/projects/DAML/chimaera-jtp-cardinality-test1.daml. The ontology contains a
regression test suite for checking cardinality inferences (such as persons having two employers yet being
stated to have at most one employer) in a Stanford theorem prover, JTP
(http://www.ksl.Stanford.EDU/software/jtp/).
Sixth, ontologies can provide the foundation for configuration support. Class terms may be defined so
that they contain descriptions of what kinds of parts may be in a system. Additionally, interactions between
properties can be defined so that filling in a value for one property can cause another value to be filled in
for another slot. For example, one may generate an ontology of information about home theatre products
as is done in a small configurator example using a simple description logic-based system [McGuinness-et-
al 1995]. Terms such as television, amplifier, tuner, etc., are defined. Additionally, information connecting
the terms together is included. A class of HighQualityTelevisions is defined so that users may choose from
this class and the configurator will automatically fill in limited sets of manufacturers to choose from,
minimum diagonal values, minimum price ranges, etc. Also, information is encoded that propagates
restrictions from one component to another. For example, some of the components in this system were
meant to be sold in pairs. If one buys one particular kind of speaker (which is only sold in pairs, thus two
speakers are added to the parts list), then restrictions on particular speaker stands appear in the
configuration specification. There are many such configuration examples using ontologies, some of which
are described in a special configuration issue of Artificial Intelligence for Engineering Design, Analysis, and
Manufacturing Journal [Darr-et-al 1998].
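The paired-speaker propagation can be sketched as follows; the product names and the stand requirement are hypothetical stand-ins for the home theatre example:

```python
# Configuration propagation: choosing a speaker that is only sold in pairs
# adds two units to the parts list and opens a slot for matching stands.
# Product names and the requirement link are hypothetical.
sold_in_pairs = {"SurroundSpeaker"}
requires = {"SurroundSpeaker": "SpeakerStand"}

def add_component(parts, component):
    """Add a component, propagating quantity and requirement restrictions."""
    qty = 2 if component in sold_in_pairs else 1
    parts[component] = parts.get(component, 0) + qty
    if component in requires:
        parts.setdefault(requires[component], 0)  # open a slot to be filled
    return parts

print(add_component({}, "SurroundSpeaker"))
# {'SurroundSpeaker': 2, 'SpeakerStand': 0}
```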
Seventh, ontologies can support structured, comparative, and customized search. For example, if one
is looking for televisions, a class description for television may be obtained from an ontology, its properties
may be obtained (such as diagonal, price, manufacturer, etc.), and then a comparative presentation may be
made of televisions by presenting the values of each of the properties. Those properties can also be used
to provide a form for users to fill in so that they may provide a detailed set of specifications about the items
they are looking to find. This also provides the foundation for providing a number of different search
interfaces – the simple text box along with search interfaces exposing important properties of products.
More sophisticated ontologies may be generated that mark which properties are most useful to present in
comparative analyses so that users may have concise descriptions of the products instead of comparisons
in complete detail. Thus, ontologies with markup information may also be used to prune comparative
searches.
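The idea of driving a comparative presentation from class properties, with markup to prune it, might be sketched as follows. The television class, its instances, and the "compare_on" markup are hypothetical assumptions chosen for illustration.

```python
# A sketch of ontology-driven comparative search: the class definition
# supplies the properties, and the presentation is generated from them.
TELEVISION_CLASS = {
    "properties": ["manufacturer", "diagonal", "price"],
    # Markup indicating which properties are most useful in comparisons,
    # so presentations can be pruned to a concise subset.
    "compare_on": ["diagonal", "price"],
}

INSTANCES = [
    {"manufacturer": "BrandA", "diagonal": 32, "price": 499},
    {"manufacturer": "BrandB", "diagonal": 40, "price": 799},
]

def comparison_table(cls, instances, concise=False):
    """Build comparison rows keyed by the class's (optionally pruned) properties."""
    props = cls["compare_on"] if concise else cls["properties"]
    return [[item[p] for p in props] for item in instances]

full = comparison_table(TELEVISION_CLASS, INSTANCES)
concise = comparison_table(TELEVISION_CLASS, INSTANCES, concise=True)
```

The same property list could equally populate a fill-in search form exposing the important properties of the product.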
Eighth, ontologies may be used to exploit generalization/specialization information. If a search
application finds that a user’s query generates too many answers, one may dissect the query to see if any
terms in it appear in an ontology, and if so, then the search application may suggest specializing that term.
For example, if one did a search for concerts in the San Francisco Bay area and got too many answers, a
search engine might look up concert in an ontology and discover that there are subclasses of concert (and
it may also discover that there are specific concert locations in the Bay area).
The search engine could then choose to present the user with the option of looking for a particular kind of
concert (say, a rock concert). Further, the search engine could proactively run queries in the background
while waiting for user input, or cache information from popular queries. It could then present a list of
subclasses of concert along with the approximate number of results the user would get by specializing the
query in each of these ways. These are just some of the ways
in which ontologies may be used to refine search queries. We could also look at the ontology to provide
alternative values (by looking at siblings in the ontology) for terms specified in the search query.
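The concert example might be sketched as follows. The toy taxonomy and the stand-in result counts are assumptions for illustration; a real engine would query its index (and might cache counts for popular queries).

```python
# A sketch of query refinement via subclass links in an ontology.
SUBCLASSES = {
    "concert": ["rock concert", "jazz concert", "classical concert"],
}

def count_results(query):
    # Stand-in for a search backend; these counts are illustrative.
    fake_counts = {"concert": 12000, "rock concert": 900,
                   "jazz concert": 300, "classical concert": 150}
    return fake_counts.get(query, 0)

def suggest_specializations(query, too_many=1000):
    """If a term is over-broad, offer its subclasses with result counts."""
    if count_results(query) <= too_many or query not in SUBCLASSES:
        return []
    return [(sub, count_results(sub)) for sub in SUBCLASSES[query]]

suggestions = suggest_specializations("concert")
```

A sibling lookup in the same taxonomy would support the complementary step of offering alternative values for a term.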
We have not claimed to present an exhaustive list of the ways in which ontologies may be used in
applications. The above lists are just illustrative of some ways that ontologies have been used to support
intelligent applications.
Ontology Acquisition
Now that at least some readers may be convinced that ontologies are useful components in applications,
we will look at some sources of ontologies. First, as we mentioned previously, many ontologies exist in the
public domain. It may be possible to start with an existing industry standard and use that as the ontology
starting point. Most likely, application developers will need to modify and/or extend ontologies that are
available but were developed for other uses. Still, one methodology for obtaining ontologies is to begin
with an industry standard ontology and then modify or extend it.
Another methodology is to semi-automatically generate a starting point for an ontology. Many taxonomic
structures exist on the web or in the table of contents of documents. One might crawl certain sites to obtain
a starting taxonomic structure and then analyze, modify, and extend that.
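One way to seed such a starting structure might be sketched as follows: turning an indented table of contents into parent/child links that a human can then analyze, modify, and extend. The input format is an assumption for illustration.

```python
# A sketch of semi-automatic taxonomy seeding from an indented
# table of contents. The sample input is hypothetical.
TOC = """\
Electronics
  Televisions
    HighQualityTelevisions
  Speakers
"""

def toc_to_taxonomy(text, indent=2):
    """Return (child, parent) edges inferred from indentation depth."""
    edges, stack = [], []  # stack holds the current term at each depth
    for line in text.splitlines():
        term = line.strip()
        if not term:
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        stack[depth:] = [term]  # truncate deeper levels, push this term
        if depth > 0:
            edges.append((term, stack[depth - 1]))
    return edges

edges = toc_to_taxonomy(TOC)
```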
One question is where to look for existing ontologies or sources of information to be crawled. Many
controlled vocabularies are being made available today. Sometimes standards organizations, such as
NIST (the National Institute of Standards and Technology - http://www.nist.gov/), support efforts in
producing controlled vocabularies and ontologies. Some consortiums are forming to generate ontologies.
See, for example, RosettaNet (http://www.rosettanet.org) in the areas of information technology,
electronic components, and semiconductor manufacturing. They are creating industry-wide
open e-business standards and providing a language for business processes. Sometimes trade
organizations provide class hierarchies on their sites that can also be used as a standard structured
controlled vocabulary. There are also broad sources of class structures. Essentially every e-commerce
site today encodes at least a taxonomic organization of terms. Sites like Amazon, in organizing their book
and music information, provide a very broad organization of information.
Another emerging trend is the use of markup languages. Some pages are being annotated using markup
languages such as XML, RDF, and DAML. The annotated pages may use markup terms from controlled
vocabularies. Libraries of ontologies potentially of use for markup are also emerging. For example, the
DAML program maintains a library of DAML ontologies at
http://www.daml.org/ontologies/.
Much of this section has introduced the idea of obtaining either a simple or complex ontology as a starting
point and then analyzing, modifying, and maintaining it over time. In the next section, we will address the
issue of implications and needs from ontology-based applications.
Ontology-related Implications and Needs
When starting an ontology-based application, the two major concerns will be language and environment.
Language: When considering ontology-related applications, inevitably the issue of ontology language will
arise. An ontology must be encoded in some language. If one is using a simple ontology, few issues arise.
However, if one is considering a more complex ontology, the expressive power of the representation and
reasoning language needs to be considered. As with any problem where a language is being chosen, it
must be epistemologically adequate -- the language must be able to express the concepts in the domain.
For example, if one wants to do range checking in an e-commerce application, then it would be unwise to
choose just a simple language that only contains subclass and instance relationships and does not include
property specification with value restrictions. There are a number of candidate ontology languages – in fact
there are so many that some research efforts arose in the last decade in order to produce standard
specification languages (such as the KRSS effort – the Knowledge Representation System Specification
effort [Patel-Schneider-Swartout 1992]), interchange formats (such as KIF – the Knowledge Interchange
Format, which is now a proposed ANSI standard [KIF]), and common application programming interface
standards (such as OKBC – Open Knowledge Base Connectivity [Chaudhri-et-al 1998]).
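The range-checking point might be sketched as follows: a subclass-only vocabulary cannot express that a property value must lie in a range, whereas a language with property value restrictions can, and an application can then check instances against those restrictions. The class definition below is an illustrative assumption.

```python
# A minimal sketch of property value restrictions for range checking
# in an e-commerce setting. The bounds chosen are hypothetical.
HIGH_QUALITY_TV = {
    "subclass_of": "Television",
    "restrictions": {"diagonal": {"min": 50}, "price": {"min": 1000}},
}

def violates_restrictions(cls, instance):
    """Return the names of properties whose values fall below their minimum."""
    bad = []
    for prop, bounds in cls["restrictions"].items():
        value = instance.get(prop)
        if value is None or value < bounds["min"]:
            bad.append(prop)
    return bad

errors = violates_restrictions(HIGH_QUALITY_TV, {"diagonal": 32, "price": 1500})
```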
One does not just want to consider representational constructs in a language; one also wants to consider
the reasoning that may be supported in the language. Some fields, such as description logics
(www.dl.kr.org), make this a central focus in language design. They look for tradeoffs that maintain the
expressive power needed by applications and also consider what it takes to provide inference engines that
can provide deductions based on the constructs represented in the language. For example, if a language
supports the notion of stating that two classes are disjoint, then it should be possible to build a reasoning
engine that enforces that constraint. Thus, an inference engine should be able to warn a user if she is
creating an instance or subclass of two disjoint classes.
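The disjointness check might be sketched as follows; the class names and the single stated disjointness axiom are illustrative assumptions, and a real inference engine would of course also traverse the subclass hierarchy.

```python
# A sketch of the disjointness warning described above: given stated
# disjointness axioms, flag any pair that a single individual belongs to.
DISJOINT = {frozenset({"Television", "Amplifier"})}

def disjointness_violations(classes):
    """Return the disjoint pairs among the classes one individual belongs to."""
    members = set(classes)
    return [pair for pair in DISJOINT if pair <= members]

ok = disjointness_violations(["Television", "HighQualityTelevision"])
bad = disjointness_violations(["Television", "Amplifier"])  # should warn
```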
Also, a language should be usable with existing platforms and should be something that non-experts can
use to do their conceptual modeling. The web is clearly the most important platform with which to be
compatible today, thus any language choice should be able to leverage the web. Additionally, frame-based
systems have had a long history of being thought of as conceptually easy to use, thus a frame paradigm
may be worth considering.
Language efforts seen today attempt to take the best of the research on expressive power along with
reasoning power and provide representationally powerful languages that have known reasoning properties.
The DARPA Agent Markup Language program, for example, attempted to take the emerging web
languages of today such as XML and RDF and create a language that is web compatible but draws on the
20-year history of description logics in choosing language constructs along with reasoning paradigms. The
resulting language – DAML+OIL – attempts to merge the best of existing web languages, description logics,
and frame reasoning systems. OIL [Bechhofer-et-al 2000] attempts to provide a layered approach to
language design.
Environment: Another consideration is how to analyze, modify, and maintain an ontology over time. If the
ontology is to be maintained by subject matter experts (and not by knowledge experts), most likely some
ontology tools will be needed. There are a number of simple ontology tools available commercially. Some
information retrieval companies such as Verity have provided simple editors for generating and browsing
simple generalization hierarchies. Verity, for example, has provided a “topic editor” for years that supports
users in generating taxonomies and utilizing them in search queries. Research efforts have
existed for many years in producing ontology toolkits. Stanford University’s previously mentioned tools of
Ontolingua [Farquhar-et-al 1997] and Chimaera [McGuinness-et-al 2000] are just two examples; others
abound, including OilEd (http://img.cs.man.ac.uk/oil/) from Manchester University and Protégé
[Protégé 2000] from Stanford Medical Informatics. Application developers may choose
commercial vendors as their toolkit provider, sophisticated research applications as the base, or
somewhere in between. Some companies with extensive ontology needs such as VerticalNet
(http://www.verticalnet.com/) have developed or are developing their own ontology tools in order to build ontologies
that meet the needs of a sophisticated commercial ontologist. Their tools were built after analyzing existing
research prototypes and were then designed to meet the commercial standards required in diverse,
collaborative, e-commerce applications of today.
When choosing to use or build an ontology environment, there are a number of issues that should be
considered including the following:
Collaboration and distributed workforce support. Some ontology environments allow users to share
a session – i.e., see each other’s work environments. This can be particularly useful for debugging.
Ontolingua, for example, supports this notion. Additionally, when workers are distributed in
location, it becomes important to have an environment that allows access from multiple places.
This is becoming much more typical today with server/client architectures. Finally, collaboration
may require concurrency control, locking, and a kind of versioning and permission system.
Platform interconnectivity. As applications become embedded in more complex platforms, it
becomes important for environments to be able to read and write compatible formats, be able to be
integrated with multiple hardware/software environments, etc. Java-based applications provide a
convenient approach to this problem but other systems that support multiple input and output
formats, understand common standards, and provide translation and mapping services may help.
Scale. Many ontology applications today may need to scale a few orders of magnitude larger than
past applications. It is important to look at scaling in terms of size of ontologies as well as numbers
of simultaneous users.
Versioning. As applications become long-lived and also are deployed in different environments
possibly internationally, it becomes important to be able to support many versions of ontologies. In
typical software engineering environments, there are source code control systems and versioning.
Security. Some applications will have needs for differing access to portions of the ontology. Thus,
it is important to have an environment that can expose portions of the ontology based on a security
model. The security model may need to support both read and write access.
Analysis. Environments are expected to support acquisition, evolution, and maintenance of
ontologies. Thus, it would be common to expect ontologies to have periods when they are
incomplete and incorrect. Analysis support that can focus the user’s attention in areas that are
likely to need modification can be quite useful. The Chimaera ontology environment, for example,
supports a number of diagnostic tests aimed at helping users identify provably incorrect ontologies
as well as possible problems.
Lifecycle issues. As ontologies become larger and longer lived, it would be expected that
application developers might be maintaining ontologies over many years. Additionally, they may be
constantly merging new ontologies into their system as their applications interconnect with more
diverse systems. Thus, it becomes important to consider support for ontology evolution issues
such as merging terms, breaking apart terms, multiple name spaces, source code control systems,
truth maintenance systems, regression testing systems, etc.
Ease of use. Even if an environment has everything an application developer may need, if it is
difficult for the user to decide how to use parts of the environment, they may not get used. Thus
training materials, tutorials, conceptual modeling support, graphical browsing tools, etc. all may be
important. We have written separately on some of the issues involved in making description logic-based
systems usable for mainstream use [McGuinness-Patel-Schneider 1998, Brachman-et-al 1999].
Diverse user support. Some environments are made for power users, some for naïve users, and
some have settings that allow users to customize environments as appropriate to the type of user.
It is important to determine if the environment can support all of the types of users anticipated.
Presentation Style. Possibly closely related to user type is presentation style. Some users need to
see extensive detail, some need pruned information, and some need abstractions. Presentation of
information may be textual, graphical, or other. While no one environment needs to support all
presentation styles, it is important that the environment is at least extensible enough to have new
presentation methods added when needed.
Extensibility. It will be impossible to anticipate all of the needs an application will have. Thus, it is
important to use an environment that can adapt along with the needs of the users and the projects.
Conclusions
In this paper, we have noted the emergence of ontologies from academic obscurity into mainstream
business and practice on the web. We have introduced the term ontology along with a spectrum of
properties that ontologies may exhibit. We have provided criteria necessary, prototypical, and desirable for
simple and complex ontologies. We have also identified ways that ontologies (both simple and complex)
are being and may be used to provide value in many types of applications. We have addressed the issue
of acquiring ontologies and then maintaining and evolving them. Finally, we have identified a number of
ontology-related issues that arise from this emergence, focusing on ontology language and environment –
issues that are gaining importance as ontologies grow in centrality across diverse applications.
References
[Bechhofer et al. 2000] Sean Bechhofer, Jeen Broekstra, Stefan Decker, Michael Erdmann, Dieter Fensel,
Carole Goble, Frank van Harmelen, Ian Horrocks, Michel Klein, Deborah
McGuinness, Enrico Motta, Peter Patel-Schneider, Steffen Staab, and Rudi Studer,
“An informal description of Standard Oil and Instance OIL”, available on-line as
http://www.ontoknowledge.org/oil/downl/oil-whitepaper.pdf
[Berners-Lee 2000] Tim Berners-Lee, "Semantic Web on XML”, Keynote presentation for XML 2000.
Slides available at: http://www.w3.org/2000/Talks/1206-xml2k-tbl/slide1-0.html.
Reporting available at: http://www.xml.com/pub/a/2000/12/xml2000/timbl.html
[Berners-Lee 1999] Tim Berners-Lee, "Weaving the Web”, Harper, San Francisco, 1999.
http://www.harpercollins.com/hc/bookpage/index.asp?isbn=0062515861
[Berners-Lee et al 1998] Tim Berners-Lee, Roy Fielding, and Larry Masinter, "Uniform Resource Identifiers
(URI): Generic Syntax", Internet Draft Standard RFC 2396, August 1998; available
on-line as http://www.isi.edu/in-notes/rfc2396.txt.
[Brachman 1983] Ronald J. Brachman. What ISA Is and Isn't: An Analysis of Taxonomic Links
in Semantic Networks. IEEE Computer, 16(10): 30-36, 1983.
[Brachman et al. 1999] Ronald J. Brachman, Alex Borgida, Deborah L. McGuinness, and Peter F. Patel-
Schneider. "Reducing" CLASSIC to Practice: Knowledge Representation
Theory Meets Reality. In Artificial Intelligence 114(1-2) pages 203-237,
October, 1999.
[Brickley & Guha 2000] Dan Brickley & R.V.Guha, "Resource Description Framework (RDF) Schema
Specification 1.0", W3C Candidate Recommendation 27 March 2000, World Wide
Web Consortium, Cambridge (MA); available on-line as http://www.w3.org/TR/rdf-
schema/.
[Broekstra et al. 2001] J. Broekstra, M. Klein, S. Decker, D. Fensel, F. van Harmelen, and I. Horrocks.
“Enabling knowledge representation on the Web by Extending RDF Schema”,
Proceedings of the International Conference on the World Wide Web (WWW10),
May 2001.
[Chaudhri et al. 1998] Vinay Chaudhri, Adam Farquhar, Richard Fikes, Peter Karp, and James Rice;
“OKBC: A Programmatic Foundation for Knowledge Base Interoperability”, AAAI
1998.
[Darr et al. 1998] Tim Darr, Mark Fox, and Deborah L. McGuinness, editors. Special Configuration
Issue of the Artificial Intelligence for Engineering Design, Analysis, and
Manufacturing Journal 1998.
[Dublin Core 1999] "Dublin Core Metadata Element Set, Version 1.1: Reference Description", Dublin
Core Metadata Initiative, 1999; available on-line as
http://purl.org/dc/documents/rec-dces-19990702.htm
[Farquhar et al 1997] Adam Farquhar, Richard Fikes, and James Rice; “The Ontolingua Server: a Tool
for Collaborative Ontology Construction”, Intl. Journal of Human-Computer Studies
46, 1997
[Fikes & Kehler 1985] Richard Fikes & Tom Kehler, "The Role of Frame-Based Representation in
Reasoning", CACM 28(9): 904-920 (1985).
[Fikes & McGuinness 2001] Richard Fikes & Deborah L. McGuinness, "An Axiomatic Semantics for RDF, RDF
Schema, and DAML+OIL", KSL Technical Report KSL-01-01, Stanford University,
2001; available on-line as http://www.ksl.stanford.edu/people/dlm/daml-
semantics/abstract-axiomatic-semantics.html.
[Google, 2000] Connie Guglielmo and Charles Babcock, “Gaga over Google” , Interactive Week,
Nov 6, 2000. http://www.zdnet.com/intweek/stories/news/0,4164,2651081,00.html,
and http://www.google.com/about.html
[Guarino 1998] Nicola Guarino, “Formal Ontology and Information Systems''. In the Proceedings of
Formal Ontology in Information Systems, June 1998. Also in Frontiers in Artificial
Intelligence and Applications, IOS-Press, Washington, DC, 1998.
[Gruber 1993] Tom R. Gruber, “A translation approach to portable ontologies”. Knowledge
Acquisition, 5(2):199-220, 1993.
[Hendler & McGuinness 2000] James Hendler and Deborah McGuinness. ``The DARPA Agent Markup
Language''. In IEEE Intelligent Systems Trends and Controversies,
November/December 2000. Available from
http://www.ksl.stanford.edu/people/dlm/papers/ieee-daml01-abstract.html .
[Humphreys & Lindberg 1993] B. L. Humphreys and D. A. B. Lindberg. “The UMLS project: making the
conceptual connection between users and the information they need”. Bulletin of
the Medical Library Association 81(2): 170, 1993.
[Husserl 1900] Edmund Husserl, Logische Untersuchungen, First edition Halle: Niemeyer,
1900/01.
[KIF] Knowledge Interchange Format, Language Description, draft proposed national
standard. NCITS.T2/98-004. http://logic.stanford.edu/kif/kif.html.
[Karp 1992] Peter D. Karp, "The design space of frame knowledge representation systems",
Technical Report 520, SRI International AI Center; available on line as
ftp://www.ai.sri.com/pub/papers/karp-freview.ps.Z
[Lassila 1998] Ora Lassila, "Web Metadata: A Matter of Semantics", IEEE Internet Computing
2(4): 30-37 (1998).
[Lassila & Swick 1999] Ora Lassila & Ralph Swick, "Resource Description Framework (RDF) Model and
Syntax Specification", W3C Recommendation 22 February 1999, World Wide Web
Consortium, Cambridge (MA); available on-line as http://www.w3.org/TR/REC-rdf-
syntax/.
[McGuinness-et-al 1995] Deborah L. McGuinness, Lori Alperin Resnick, and Charles Isbell. ``Description
Logic in Practice: A CLASSIC Application.'' In Proceedings of the 14th
International Joint Conference on Artificial Intelligence, Montreal, Canada, August,
1995.
[McGuinness 1998] Deborah L. McGuinness ``Ontological Issues for Knowledge-Enhanced Search''. In
the Proceedings of Formal Ontology in Information Systems, June 1998. Also in
Frontiers in Artificial Intelligence and Applications, IOS-Press, Washington, DC,
1998.
[McGuinness-et-al 2000] Deborah L. McGuinness, Richard Fikes, James Rice, and Steve Wilder. An
Environment for Merging and Testing Large Ontologies. In the Proceedings of the
Seventh International Conference on Principles of Knowledge Representation and
Reasoning (KR2000), Breckenridge, Colorado, USA. April 12-15, 2000.
[McGuinness-Patel-Schneider 1998] Deborah L. McGuinness and Peter F. Patel-Schneider. ``Usability Issues in
Knowledge Representation Systems''. In Proceedings of the Fifteenth National
Conference on Artificial Intelligence, Madison, Wisconsin, July, 1998. This is an
updated version of ``Usability Issues in Description Logic Systems'' published in
Proceedings of International Workshop on Description Logics, Gif sur Yvette,
(Paris), France, September, 1997.
[Minsky 1975] Marvin Minsky, "A Framework for Representing Knowledge", in Patrick Henry
Winston (ed.), The Psychology of Computer Vision, McGraw-Hill, New York, 1975.
[Patel-Schneider-Swartout 1992] Peter F. Patel-Schneider and Bill Swartout; "Description-Logic Knowledge
Representation System Specification"; KRSS Group of the ARPA Knowledge
Sharing Effort. http://www-db.research.bell-labs.com/user/pfps/papers/krss-spec.ps
[Protégé 2000] The Protégé Project. http://protege.stanford.edu
[Smith 1998] Barry Smith, “Basic Concepts of Formal Ontologies”, in N. Guarino (Ed.) Formal
Ontology in Information Systems, IOS Press, 1998.
[Woods 1975] William A. Woods, "What's in a Link: Foundations for Semantic Networks", in
D.G.Bobrow & A.M.Collins (eds.), Representation and Understanding: Studies in
Cognitive Science, 35-82, Academic Press, New York, 1975.
[XML 2000] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, and Eve Maler, editors.
Extensible Markup Language (XML) 1.0 (Second Edition). W3C Recommendation
6 October 2000. http://www.w3.org/TR/2000/REC-xml-20001006
... It is important to note that all of the terms on the left hand side have been called "ontology" by at least some authors, which is part of the source for confusion about the word. Models based on the various points along the ontology spectrum have different uses (McGuinness, 2003). In the simplest case, a group of users can agree to use a controlled vocabulary for their domain. ...
... This expanded information could also be used for refining search, for instance. Ontologies can also facilitate interoperability, by aligning different terms that might be used in different applications (McGuinness, 2003). Now we are in a position to see why the ontologies on the most formal end of the spectrum are often taken as the default interpretation in the context of the semantic web, providing the conceptual underpinning for " ... making the semantics of metadata machine interpretable" (Staab & Stuber, 2004). ...
... The ontology specification level doesn't involve instances, that is because the ontology has to provide a context that makes possible to express the JSAP-LD instance: the instances are expressed inside the JSAP-LD and supported by the ontology semantic. Moreover, although the first released version of JSAP Ontology is enough expressive for the given purpose, it could be considered as a simple ontology which is quite convenient given the largely endorsed idea that a less complex applicationdomain ontology provides more re-usability -making it easy to be extend for more specific applications -and dynamism [19]. The semantic introduced with the JSAP Ontology is completely suitable for generating a JSON-LD context to be included in a JSAP-LD file through the entities and the properties, semantically and syntactically defined inside the ontology. ...
Conference Paper
This paper presents the JSON SPARQL Application Profile for Linked Data which is a JSON-LD compliant file that can be used to describe applications powered by the SPARQL Event Processing Architecture. Thanks to a publish-subscribe broker built on top of a generic SPARQL endpoint, the architecture allows the development of distributed and context-aware applications based on interoperable microservices. The SPARQL Application Profile includes all the information needed to describe an application, like the SPARQL queries and updates used by each microservice as well as the parameters needed to connect to a broker instance. The paper presents the ontology that has been designed to represent the SPARQL Application Profile from which the JSON-LD context has been eventually extracted.
... Many organizations and disciplines have a tradition of curating lists of terms to serve various roles, particularly in metadata, column headings, and for some values in datasets. These are often called code-lists or glossaries, and if there is a process to manage them, 'controlled-vocabularies'. Vocabularies may also be structured as hierarchies, thesauri, taxonomies, through to axiomatized ontologies [1]. Other sets of terms and codes that are used in data include units of measure, lists of materials, taxa, substances, and reference systems like geologic and dynastic time-scales (which are composed of ordered named intervals). ...
Article
Full-text available
We present ten simple rules that support converting a legacy vocabulary—a list of terms available in a print-based glossary or in a table not accessible using web standards—into a FAIR vocabulary. Various pathways may be followed to publish the FAIR vocabulary, but we emphasise particularly the goal of providing a globally unique resolvable identifier for each term or concept. A standard representation of the concept should be returned when the individual web identifier is resolved, using SKOS or OWL serialised in an RDF-based representation for machine-interchange and in a web-page for human consumption. Guidelines for vocabulary and term metadata are provided, as well as development and maintenance considerations. The rules are arranged as a stepwise recipe for creating a FAIR vocabulary based on the legacy vocabulary. By following these rules you can achieve the outcome of converting a legacy vocabulary into a standalone FAIR vocabulary, which can be used for unambiguous data annotation. In turn, this increases data interoperability and enables data integration.
Article
Full-text available
The complexity of cancer research stems from leaning on several biomedical disciplines for relevant sources of data, many of which are complex in their own right. A holistic view of cancer—which is critical for precision medicine approaches—hinges on integrating a variety of heterogeneous data sources under a cohesive knowledge model, a role which biomedical ontologies can fill. This study reviews the application of ontologies and knowledge graphs in cancer research. In total, our review encompasses 141 published works, which we categorized under 14 hierarchical categories according to their usage of ontologies and knowledge graphs. We also review the most commonly used ontologies and newly developed ones. Our review highlights the growing traction of ontologies in biomedical research in general, and cancer research in particular. Ontologies enable data accessibility, interoperability and integration, support data analysis, facilitate data interpretation and data mining, and more recently, with the emergence of the knowledge graph paradigm, support the application of Artificial Intelligence methods to unlock new knowledge from a holistic view of the available large volumes of heterogeneous data.
Article
Full-text available
Knowledge graph (KG) is a topic of great interests to geoscientists as it can be deployed throughout the data life cycle in data-intensive geoscience studies. Nevertheless, comparing with the large amounts of publications on machine learning applications in geosciences, summaries and reviews of geoscience KGs are still limited. The aim of this paper is to present a comprehensive review of KG construction and implementation in geosciences. It consists of four major parts: 1) concepts relevant to KG and approaches for KG construction, 2) KG application in data collection, curation, and service, 3) KG application in data analysis, and 4) challenges and trends of geoscience KG creation and application in the near future. For each of the first three parts, a list of concepts, exemplar studies, and best practices are summarized. Those summaries are synthesized together in the challenge and trend analyses. As artificial intelligence and data science are thriving in geosciences, we hope this review of geoscience KGs can be of value to practitioners in data-intensive geoscience studies.
Article
Full-text available
Taxonomies have developed over the years to reach the current status of corporate taxonomies. A number of issues that triggered the prominence of taxonomies include; information overload, information literacy, organizational terminology and “destructuring” of organizations.This research contributes to the ongoing advocacy for the use of corporate taxonomies for effective knowledge organization and information retrieval in the digital community due to the explosion of various kinds of information on the internet, changes in user information seeking behaviour and needs, and the inability of search engines to precisely recall relevant information.
Chapter
Ontologies are widely used nowadays for many different purposes and in many different contexts, like industry and research, and in domains ranging from geosciences, biology, chemistry or medicine. When used for research, ontologies should be treated as other research artefacts, such as data, software, methods, etc.; following the same principles used to make them findable, accessible, interoperable and reusable (FAIR) to others. However, in comparison to the number of guides, indicators and recommendations available for making research data FAIR, not much attention has been paid so far on how to publish ontologies following the FAIR principles. This position paper reviews the technical and social needs required to define a roadmap for generating and publishing FAIR ontologies on the Web. We analyze four initiatives for ontology publication, aligning them in a common framework for comparison. The paper concludes by opening a discussion about existing, ongoing and required initiatives and instruments to facilitate FAIR ontology sharing on the Web.
Chapter
In classical formal logic, negation can be applied only to formulas, not to terms and predicates. In (frame-based) knowledge representation, an ontology contains descriptions of individuals, concepts, and slots, that is, statements about individuals, concepts, and slots. Negation can be applied to slots, concepts, and statements, so logical implication must be considered for all possible combinations of individuals, concepts, slots, and statements. In this regard, logical implication at the ontological level differs from that at the logical level. This paper attempts to give such logical implications between individuals, concepts, slots, statements, and their negations.
Chapter
Open Science not only means the openness of various resources involved in a scientific study but also the connections among those resources that demonstrate the origin, or provenance, of a scientific finding or derived dataset. In this chapter, the authors used the PROV Ontology, a community standard for representing and exchanging machine-readable provenance information in the Semantic Web, and extended it for capturing provenance in the IPython Notebook, a software platform that enables transparent workflows. The developed work was used in conjunction with scientists' workflows in the Ecosystem Assessment Program of the U.S. NOAA Northeast Fisheries Science Center. This work provides a pathway towards formal, well-annotated provenance in an electronic notebook. Not only will the use of such technologies and standards facilitate the verifiability and reproducibility of ecosystem assessments; it will also provide solid support for Open Science at the interface of science and ecosystem management for sustainable marine ecosystems.
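As an illustration of the PROV-O pattern described above (the entity, activity, and agent names here are invented for the sketch, not taken from the NOAA workflows), provenance can be recorded as simple subject-predicate-object statements and a derived dataset traced back to its origin:

```python
# Toy provenance record using core PROV-O terms.
# Entity and activity names are hypothetical, for illustration only.
prov = [
    ("assessment_report", "prov:wasDerivedFrom", "cleaned_data"),
    ("cleaned_data", "prov:wasDerivedFrom", "raw_survey_data"),
    ("cleaned_data", "prov:wasGeneratedBy", "cleaning_notebook_run"),
    ("cleaning_notebook_run", "prov:wasAssociatedWith", "analyst"),
]

def lineage(entity, prov):
    """Follow prov:wasDerivedFrom links back to the original source."""
    chain = [entity]
    while True:
        sources = [o for (s, p, o) in prov
                   if s == chain[-1] and p == "prov:wasDerivedFrom"]
        if not sources:
            return chain
        chain.append(sources[0])

print(lineage("assessment_report", prov))
# ['assessment_report', 'cleaned_data', 'raw_survey_data']
```

In the actual chapter this role is played by PROV-O RDF statements captured automatically from the notebook; the list-of-tuples store above is only a minimal stand-in to show the derivation-chain idea.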
Chapter
Ian McHarg’s “The Theory of Creative Fitting” describes the collaborative integration of data into knowledge products for use in ecological architecture developments. In McHarg’s system, the provenance of data layers and trust in their creators are vital to the creation of knowledge, ideas recently described as being “urgent and relevant.” This chapter begins with these ideas and explores provenance propagation and data publication. Provenance propagation has been a topic under discussion by the World Wide Web Consortium, while data publication has been a focus of the global marine science community. From these projects, the issue of uniquely identifying individuals in a distributed, collaborative network has emerged. Possible implications of this problem are discussed, as are technical solutions to the problem. The principles described within this chapter are illustrated by examples of uncertainty propagation in chained Web Processing Service networks and data publication activities at the British Oceanographic Data Centre.
Conference Paper
Large-scale ontologies are becoming an essential component of many applications including standard search (such as Yahoo and Lycos), e-commerce (such as Amazon and eBay), configuration (such as Dell and PC-Order), and government intelligence (such as DARPA's High Performance Knowledge Base (HPKB) program). The ontologies are becoming so large that it is not uncommon for distributed teams of people with broad ranges of training to be in charge of the ontology development, design, and maintenance. Standard ontologies (such as UNSPSC) are emerging as well which need to be integrated into large application ontologies, sometimes by people who do not have much training in knowledge representation. This process has generated needs for tools that support broad ranges of users in (1) merging of ontological terms from varied sources, (2) diagnosis of coverage and correctness of ontologies, and (3) maintaining ontologies over time. In this paper, we present a new merging and diagnostic ontology environment called Chimaera, which was developed to address these issues in the context of HPKB. We also report on some initial tests of its effectiveness in merging tasks.
Article
The more common interpretations of is-a links are cataloged, and some differences between systems that on the surface appear very similar are pointed out. A rational reconstruction of the is-a relation is developed, and suggestions are made regarding how the next generation of knowledge-representation languages should be structured. In particular, the analysis indicates that meanings might be a lot clearer if is-a were broken down into its semantic subcomponents and those subcomponents then used as the primitives of a representation system.
Article
Recently, widespread interest has emerged in using ontologies on the Web. RDF Schema (RDFS) is a basic tool that enables users to define vocabulary, structure, and constraints for expressing metadata about Web resources. However, it includes no provisions for formal semantics, and its expressivity is not sufficient for full-fledged ontological modeling and reasoning. In this paper, we show how RDFS can be extended with a more expressive knowledge representation language, which in turn enriches it with the required additional expressivity and with the semantics of that language. We do this by describing the ontology language Ontology Inference Layer (OIL) as an extension of RDFS. An important advantage of our approach is that it ensures maximal sharing of metadata on the Web: even a partial interpretation of an OIL ontology by less semantically aware processors will yield a correct partial interpretation of the metadata.
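The limited but useful inference RDFS alone provides, and which OIL then extends with richer constructs, can be sketched in a few lines. This is a toy illustration with invented class names, not the OIL semantics themselves: it closes a triple set under the two core RDFS entailment rules (subClassOf transitivity and type propagation).

```python
# Toy triple store illustrating RDFS-style inference:
# rdfs:subClassOf is transitive, and rdf:type propagates up the hierarchy.
# Class and individual names are invented for illustration.
triples = {
    ("Dog", "rdfs:subClassOf", "Mammal"),
    ("Mammal", "rdfs:subClassOf", "Animal"),
    ("fido", "rdf:type", "Dog"),
}

def infer(triples):
    """Compute the closure under two core RDFS entailment rules."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for (a, p1, b) in closure:
            for (c, p2, d) in closure:
                # Rule 1: subClassOf is transitive.
                if p1 == p2 == "rdfs:subClassOf" and b == c:
                    new.add((a, "rdfs:subClassOf", d))
                # Rule 2: rdf:type propagates along subClassOf.
                if p1 == "rdf:type" and p2 == "rdfs:subClassOf" and b == c:
                    new.add((a, "rdf:type", d))
        if not new <= closure:
            closure |= new
            changed = True
    return closure

facts = infer(triples)
print(("fido", "rdf:type", "Animal") in facts)  # True
```

What RDFS cannot say, and what OIL adds on top of this, are constructs such as class negation, disjointness, and property cardinality, which require the formal semantics the paper describes.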
Article
A Uniform Resource Identifier (URI) is a compact string of characters for identifying an abstract or physical resource. This document defines the generic syntax of URI, including both absolute and relative forms, and guidelines for their use; it revises and replaces the generic definitions in RFC 1738 and RFC 1808.
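The generic syntax the document defines, including both absolute URIs and the resolution of relative forms against a base, can be exercised with Python's standard `urllib.parse` module (a convenient illustration of the rules, not part of the specification itself):

```python
from urllib.parse import urlsplit, urljoin

# Split an absolute URI into its generic-syntax components.
parts = urlsplit("http://example.org/a/b/c?q=1#frag")
print(parts.scheme)    # http
print(parts.netloc)    # example.org
print(parts.path)      # /a/b/c
print(parts.query)     # q=1
print(parts.fragment)  # frag

# Resolve a relative reference against an absolute base URI,
# following the relative-form rules the specification defines.
resolved = urljoin("http://example.org/a/b/c", "../g")
print(resolved)        # http://example.org/a/g
```

The `../g` example shows why the relative-form rules matter: the last path segment of the base is removed before the dot-segments are applied, so the reference resolves relative to `/a/b/`, not `/a/b/c`.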
Article
Reusable ontologies are becoming increasingly important for tasks such as information integration, knowledge-level interoperation, and knowledge-base development. We have developed a set of tools and services to support the process of achieving consensus on common shared ontologies by geographically distributed groups. These tools make use of the World Wide Web to enable wide access and provide users with the ability to publish, browse, create, and edit ontologies stored on an ontology server. Users can quickly assemble a new ontology from a library of modules. We discuss how our system was constructed, how it exploits existing protocols and browsing tools, and our experience supporting hundreds of users. We describe applications using our tools to achieve consensus on ontologies and to integrate information. The Ontolingua Server may be accessed through the URL http://ontolingua.stanford.edu/