Towards an ontology based large repository for managing heterogeneous knowledge resources

Nizar Ghoula and Gilles Falquet
ICLE, Centre Universitaire d’Informatique, University of Geneva, Switzerland
{Nizar.Ghoula,Gilles.Falquet}@unige.ch
Abstract. Knowledge based applications require linguistic, terminolog-
ical and ontological resources. These applications are used to fulfill a set
of tasks such as semantic indexing, knowledge extraction from text, in-
formation retrieval, etc. Using these resources and combining them for
the same application is a tedious task with different levels of complex-
ity. This requires their representation in a common language, extract-
ing the required knowledge and designing effective large scale storage
structures offering operators for resources management. For instance,
ontology repositories were created to address these issues by collecting
heterogeneous ontologies. They generally offer a more effective indexing
of these resources than general search engines by generating alignments
and annotations to ensure their interoperability. However, these reposi-
tories treat a single category of resources and do not provide operations
for reusing them. The aim of this research is to build a large repository
of knowledge resources. This repository is a collection of heterogeneous
resources represented in different languages and offers a set of operations
to generate new resources based on the existing ones.
Key words: Resources repository, Operations, Ontology of resources, Knowl-
edge representation
1 Introduction
Knowledge extraction and representation is a widely explored research problem.
Most of the proposed solutions to this problem are based on the usage of aux-
iliary knowledge resources [1]. This knowledge currently exists in resources of
different types such as terminologies, glossaries, ontologies, multilingual dictio-
naries or aligned text corpora. These resources are represented using various
formalisms and languages such as predicate logic, description logic, semantic
networks and conceptual graphs, etc. When an application requires external
resources, its designer often has to perform painstaking research and
preprocessing in order to collect and build resources adequate to the
application’s needs. Resolving this problem relies on first finding the right
resources, then extracting the required knowledge and finally representing it in
a common formalism. It is therefore important to have repositories offering access
to more diverse resources in different formalisms. Moreover, the right knowledge
resource for an application must be constructed and adapted to the application.
This adaptation may involve operations such as selecting a part of a resource,
composing it with another one, translating it to another language or representing
it in a different formalism [2] [3] [4].
In this paper, we present a model and a taxonomy of abstract operations for
managing and extracting knowledge from resources. We consider the possibility
of combining these operators to perform complex processes such as semantic
enrichment or generating a new resource by merging some other resources.
2 Methodology
A central point of our approach is to build a repository of knowledge resources.
This repository should offer the possibility to store and integrate heterogeneous
knowledge resources and organize their usage in a common context. It should also
offer operators for managing and combining these resources. For this we have
proposed a three-step methodology:
- propose a method and a formalism for representing heterogeneous
  terminological, linguistic and ontological knowledge resources;
- define the major representation languages by means of the repository’s
  concepts (Resource, Entity, Relation, etc.);
- define a set of operations performed on these resources to generate new
  resources based on some criteria;
- propose multiple implementations per operator depending on the resource
  type and the representation language;
- implement a resources repository to study and resolve the scalability problems
  that arise, by evaluating the usability of such a system.
Our approach is not focused on a particular domain; it aims to represent
different resources from diverse domains and manipulate them using different
operations. We distinguish two categories of resources. The first category consists
of autonomous resources such as ontologies, corpora or terminologies. These
resources are widely used in multiple knowledge management applications. The
second category consists of enrichment resources such as annotations or
alignments. They link two or more autonomous resources and result from the
application of a process to autonomous resources.
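The two categories can be pictured with a minimal data-model sketch. The class and field names below (`Resource`, `AutonomousResource`, `EnrichmentResource`, `links`, `produced_by`) are illustrative assumptions and are not taken from the repository’s ontology.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Common base for everything stored in the repository (hypothetical)."""
    name: str


@dataclass
class AutonomousResource(Resource):
    """Stands on its own: an ontology, a corpus, a terminology, ..."""
    kind: str = "ontology"


@dataclass
class EnrichmentResource(Resource):
    """Links two or more autonomous resources and records the process
    that produced it (an alignment, an annotation, ...)."""
    kind: str = "alignment"
    links: List[AutonomousResource] = field(default_factory=list)
    produced_by: str = ""

    def __post_init__(self) -> None:
        # An enrichment resource is only meaningful between resources.
        if len(self.links) < 2:
            raise ValueError("an enrichment resource links two or more resources")


wordnet = AutonomousResource("WordNet", kind="lexical ontology")
corpus = AutonomousResource("news-corpus", kind="corpus")
senses = EnrichmentResource(
    "news-senses",
    kind="annotation",
    links=[corpus, wordnet],
    produced_by="word sense disambiguation",
)
```

Keeping the producing process on the enrichment resource is one possible way to record where such a resource comes from.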
3 State of the art
For managing heterogeneous resources in large knowledge repositories we need
to resolve the problem of resources representation and storage at first and then
address the problem of defining and implementing resources management opera-
tors (collected from existing approaches and classified by type such as alignment
operators, annotation services, translation mechanisms, etc.).
3.1 Knowledge resources repositories
Some large repositories have been created to offer a more effective indexing of
knowledge resources than common search engines. For example, Swoogle¹ indexes
more than 10 000 ontologies; the DAML repository² provides search based
on ontology components (classes, properties, ...) or metadata (URI, funding
source, ...); BioPortal³ has similar searching and browsing tools [5] and offers
the possibility to annotate and align different ontologies. Many other portals [6]
[7] offer access to linguistic or ontological resources. However, each of these
portals is dedicated to a specific category of resources (Swoogle is focused on
ontologies, while ACL⁴, CLARIN⁵ and META-NET⁶ are focused on corpora and
linguistic resources).
A repository containing heterogeneous types of knowledge resources is needed.
Hence, multiple languages for representing these resources are required. For this
purpose, it is necessary to develop a set of knowledge resources operators that
can import, export and process these resources while keeping a trace of their
origin (the provenance of the resources, for example externally imported or gen-
erated from the combination of multiple ones).
3.2 Resources representation models
There are many models for knowledge representation, but they usually focus
on one or two aspects only: ontological, terminological, lexical, textual, docu-
mentary, etc. It is more difficult to find models representing various aspects of
knowledge or resources of different kinds. For the integration of heterogeneous
resources, [8] have proposed a model of terminologies and ontologies. This model
remains faithful to the representation of each resource model without using common
abstract entities. For example, instead of considering a term or a concept as an
abstract entity, these classes have different representations depending on the
resource, which creates redundancy in the instances. A model of the multilingual
aspect in ontologies has been proposed by [9]; it is developed as an association
between a meta-model of ontologies and a linguistic model. Another model has
been developed to unify and centralize the management of linguistic resources in a
multilingual environment within a platform called Intuition [10]. This model is
characterized by its exploration of the structure of linguistic forms. Applying it
makes it possible to represent ontological entities and identify lexical units by
taking into account the syntactic and semantic multilingual relations. However,
this model cannot represent pure linguistic resources. [11] proposed a Linguistic
Meta-Model (LMM) allowing a semiotic-cognitive representation of knowledge and
linguistic resources. It represents individuals and facts in an open domain
perspective.
¹ http://swoogle.umbc.edu
² http://www.daml.org/ontologies
³ http://bioportal.bioontology.org
⁴ http://www.aclweb.org
⁵ http://www.clarin.eu/external/
⁶ http://www.meta-net.eu
In our case, we need to preserve the original form of all resources and treat
them within their original context and representation language. This is why we
propose a meta-model treating a resource as an entity in the repository. Each
resource can have different derivations, which are also resources represented in
different languages.
3.3 Resources re-engineering
In the context of mapping linguistic and ontological resources, [12] have
proposed an approach to integrate and merge Wikipedia and WordNet to enrich
an ontology (YAGO⁷). The ontology is extracted from these two resources by
adding new facts⁸ extracted from Wikipedia as individuals, and classes derived
from the conceptual categories in Wikipedia and from each "synset" of WordNet.
This approach shows that combining multiple resources makes it possible to build
or extend existing resources. Another methodology [13] focuses on a pattern based
approach for re-engineering non-ontological resources into ontologies. This type
of approach is well suited as a component or a framework to add to the repository.
It offers a comparative study of re-engineering methods for non-ontological
resources. By means of this framework we can design a decision support algorithm
for choosing the best reuse method based on the type of the resource, since all
reuse methods are supposed to be implemented by means of services or operators
in the repository.
4 A meta-model for integrating heterogeneous resources
There exist many different (and incompatible) ways to express knowledge
in resources (from formal logic to semi-formal or natural languages). Moreover,
the same resource may be involved in processes that can only handle specific
representation formalisms. For instance, an ontology alignment algorithm might
be implemented for OWL ontologies, while another algorithm might be about
resources in a WordNet-like model. It can be the same for other processes like
automated text annotation, multilingual text alignment, word sense disambigua-
tion, etc.
We have proposed a MOF-based model⁹ to unify the representation of
heterogeneous resources in a common formalism [14]. This model makes it possible
to describe the metadata of any kind of knowledge resource and then associate
different representations (derivations) of the resource’s content in many languages
(formalisms), which are themselves represented in the repository by means of a
common terminology (namespace of the repository). The implementation of this
model includes an ontology, called TOK Onto¹⁰.
⁷ Yet Another Great Ontology
⁸ relative to all existing data in a knowledge base
⁹ MOF is an acronym for Meta-Object Facility: http://www.omg.org/mof/
Depending on the user’s needs, a resource in the repository can be represented
differently using multiple languages. Each language uses a subset of the resource’s
entities and links them in a different way compared to another language. For
example, a class hierarchy representation links the concepts of an ontology using
the subClassOf relation, which leads to one derivation of this resource,
whereas a semantic network representation of the same resource leads to the use
of another set of relations. Table 1 shows some examples of languages that have
been described in the current version of the repository.
Table 1. Examples of resource content models (languages) and their principal
components

Model                 Components
--------------------  ----------------------------------------------------------
Concept hierarchy     Concept, ISA Relation, ...
WordNet Like          Concept, Term, Lexical Form, Hypernym Relation,
                      Meronym Relation, Term Form Relation, ...
Graph ontology        Class, Taxonomic Relation, Relation, Relation Label, etc.
Translation memory    Text Segment, Language, Translation Relation,
                      Language Relation
Ontology Alignment    Concept, Correspondence Relation, ...

For example, to represent an ontology we can focus on the hierarchy of classes
if we need it in a task of classification. We can also represent the same ontology
by focusing on axioms and complex expressions using logics if we need it for a
reasoning task.
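As a rough illustration of one resource carrying several derivations, the sketch below stores each derivation as a set of triples keyed by the name of its language. The types `Language` and `Resource`, the method `add_derivation` and the reduced relation vocabularies are assumptions made for this example; they are not the TOK ontology itself.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]  # (source entity, relation type, target entity)


@dataclass(frozen=True)
class Language:
    """A content model, reduced here to the relation types it allows."""
    name: str
    relation_types: FrozenSet[str]


@dataclass
class Resource:
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    derivations: Dict[str, Set[Triple]] = field(default_factory=dict)

    def add_derivation(self, language: Language, triples: Set[Triple]) -> None:
        # Reject triples that use a relation the language does not define.
        unknown = {rel for (_, rel, _) in triples} - language.relation_types
        if unknown:
            raise ValueError(f"{language.name} has no relation type(s) {sorted(unknown)}")
        self.derivations[language.name] = set(triples)


CONCEPT_HIERARCHY = Language("Concept hierarchy", frozenset({"ISA"}))
GRAPH_ONTOLOGY = Language("Graph ontology", frozenset({"Taxonomic", "Relation"}))

vehicles = Resource("vehicles", metadata={"origin": "imported"})
vehicles.add_derivation(CONCEPT_HIERARCHY, {("Car", "ISA", "Vehicle")})
vehicles.add_derivation(
    GRAPH_ONTOLOGY,
    {("Car", "Taxonomic", "Vehicle"), ("Car", "Relation", "Driver")},
)
```

A classification task would read the "Concept hierarchy" derivation, while a task that needs the non-taxonomic links would read the "Graph ontology" one.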
5 Taxonomy of operations on knowledge resources
The aim of a resources repository is not only to collect heterogeneous knowledge
resources but especially to offer instruments for reusing them. In order to
formalize the definition of processes over these resources, we have defined a set of
generic primitive operations. We then represented an abstract class of
operators in the repository’s ontology in order to manage multiple implementations
for each operator and to represent restrictions on each implementation. We
define a process as a sequence of operators applied to resources’ derivations.
By means of process descriptions we constructed a process dictionary that stores
each instance of a process and applies it each time there is an
evolution in the involved resources. Therefore, we must develop subsequent
meta-operators. The definition of these operators depends on the treatment of
the resources.
¹⁰ http://cui.unige.ch/isi/onto/tok/OWL Doc/
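The notion of a process as a sequence of operators, stored so that it can be re-applied when a resource evolves, can be sketched as follows. This is only an illustration under simplifying assumptions: a derivation is reduced to a set of triples, an operator to a plain function, and the names `run_process`, `ProcessDictionary`, `register` and `replay` are invented for the example.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Derivation = Set[Triple]
Operator = Callable[[Derivation], Derivation]


def run_process(operators: List[Operator], derivation: Derivation) -> Derivation:
    """Apply the operators one after the other."""
    for operator in operators:
        derivation = operator(derivation)
    return derivation


class ProcessDictionary:
    """Stores processes so they can be replayed when a source resource changes."""

    def __init__(self) -> None:
        self._processes: Dict[str, Tuple[str, List[Operator]]] = {}

    def register(self, name: str, source: str, operators: List[Operator]) -> None:
        self._processes[name] = (source, list(operators))

    def replay(self, source: str, new_derivation: Derivation) -> Dict[str, Derivation]:
        # Re-run every process that was registered on the changed resource.
        return {
            name: run_process(operators, new_derivation)
            for name, (src, operators) in self._processes.items()
            if src == source
        }


def keep_isa(derivation: Derivation) -> Derivation:
    return {t for t in derivation if t[1] == "isa"}


def lowercase(derivation: Derivation) -> Derivation:
    return {(s.lower(), r, t.lower()) for (s, r, t) in derivation}


processes = ProcessDictionary()
processes.register("lowercase-hierarchy", "vehicles", [keep_isa, lowercase])

# The "vehicles" resource evolves: the stored process is applied again.
updated = {
    ("Car", "isa", "Vehicle"),
    ("Car", "has-part", "Wheel"),
    ("Bike", "isa", "Vehicle"),
}
results = processes.replay("vehicles", updated)
```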
5.1 Representation operators
These are the basic construction operators for representations. The abstraction
and reification operations create the resources in the repository and map them
to their original derivation (the representation of the resource in
its original language). Language mapping operations create new derivations in
other languages.
Importation or abstraction. We denote by $i_{RL}$ the import operation that
produces an instance of a resource $R$ in the resources repository and creates the
content of the resource in its original language $L$. This operation can be followed
by a derivation, which produces a derivation of the resource in a representation
language.
Exportation or reification. We denote by $e_{RL}$ the export operation that
transforms a derivation of a resource $R$ expressed in a language $L$, together
with its metadata, into an external file in a formalism related to the derivation’s
language. Reification is generally used at the end of a process (sequence of
operations) to produce the new resource. Consequently, this operator can have as
many instances as there are possible combinations of the representation languages
implemented in the repository (for example OWL, UML, DL, Graphs, etc.) with the
possible required formats (txt, xml, rdf, ttl, n3, etc.).
Derivation. This abstract operator is used to create new representations of
a resource in different languages (already represented in the repository). For
instance, a UML class diagram could be derived into a class diagram
representation, then mapped to a WordNet-like lexical ontology model (by dropping
all the associations except part-of and subclass). Since a derivation may "forget"
information, in general $\mu_{L_2 L_1}$ is not the inverse of $\mu_{L_1 L_2}$. It is
not always necessary to preserve the entire contents of a resource when deriving a
new representation of its content (this can be compared to generating a view in the
relational approach). In particular, if the representation language is less
expressive than the original language, some knowledge will necessarily be lost.
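The loss of information can be made concrete with a small sketch. It assumes, for illustration only, that a derivation is a set of (source, relation, target) triples and that deriving into a language simply keeps the relations this language knows; the function `derive` and the relation names are hypothetical.

```python
def derive(triples, kept_relations):
    """Derive a representation by keeping only the relations of the target language."""
    return {(s, r, t) for (s, r, t) in triples if r in kept_relations}


UML_RELATIONS = {"part-of", "subclass", "drives"}
WORDNET_LIKE_RELATIONS = {"part-of", "subclass"}

uml_diagram = {
    ("Wheel", "part-of", "Car"),
    ("Car", "subclass", "Vehicle"),
    ("Driver", "drives", "Car"),
}

# UML -> WordNet-like: the "drives" association is dropped.
lexical = derive(uml_diagram, WORDNET_LIKE_RELATIONS)

# WordNet-like -> UML: nothing can bring the dropped association back.
round_trip = derive(lexical, UML_RELATIONS)
```

Here `round_trip` equals `lexical` rather than `uml_diagram`, which is the sense in which the second mapping is not the inverse of the first.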
5.2 Enrichment operators
The enrichment operations generate new alignments or annotations on existing
resources. They are generally based on sophisticated algorithms (more precisely
heuristics) and use auxiliary resources like lexical ontologies.
Alignment. Alignment makes it possible to express explicitly the correspondences
between resources [15]. An alignment method consists of defining a distance
between the entities of the resources and calculating the best match between them
by minimizing the distance measure or maximizing the similarity measure [16].
An alignment operator takes as input two resources $R_i$ and $R_j$ represented in
a language $L_1$ and a set of auxiliary resources represented in other languages
$L_2, \ldots$ to produce an alignment resource represented in a language $L_{al}$.
The signature of this operator is:
$Op_{Align} : L_1, L_1, [L_2, \ldots] \to (L_1, L_{al})$
$L_{al}$ is a language that includes the alignment relations used to represent the
correspondences ($\sqsubseteq$, $\equiv$, etc.), and $Op_{Align}$ is the operator
used for the alignment.
A typical example of the need for simplified languages is the ontology
alignment task. Most current alignment algorithms can align ontologies
represented in the OWL language, but they do not take advantage of all the semantics
expressed in such ontologies [17]. They are based on the textual labels attached
to each class and on the structure of the ontology. The structure that is used
is generally a graph representing the class hierarchy and a set of properties
relating two classes, e.g. when there is an axiom of the form
$\mathit{Class}_1 \sqsubseteq \mathit{property}\ \text{only/some}\ \mathit{Class}_2$.
In this case, it is much more appropriate to represent an OWL ontology
by its graph instead of the full description logic model. This adapts the
resources to alignment algorithms that are able to align any type of ontology
expressed as a labelled graph.
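A minimal sketch of a label-based alignment operator is given below. It is not one of the algorithms cited above: it assumes that each resource is reduced to a mapping from entity identifiers to labels, uses the string similarity of Python’s standard `difflib.SequenceMatcher` as the similarity measure, and picks the best match greedily per entity. The function names, the threshold value and the relation name "equivalent" are assumptions.

```python
from difflib import SequenceMatcher


def similarity(label_1, label_2):
    """String similarity in [0, 1]; 1.0 means identical labels (case-insensitive)."""
    return SequenceMatcher(None, label_1.lower(), label_2.lower()).ratio()


def align(labels_1, labels_2, threshold=0.8):
    """Return correspondences (entity_1, relation, entity_2, score).

    For each entity of the first resource, keep the most similar entity of the
    second resource, provided the similarity reaches the threshold.
    """
    correspondences = []
    for entity_1, label_1 in labels_1.items():
        best_entity, best_score = None, 0.0
        for entity_2, label_2 in labels_2.items():
            score = similarity(label_1, label_2)
            if score > best_score:
                best_entity, best_score = entity_2, score
        if best_entity is not None and best_score >= threshold:
            correspondences.append((entity_1, "equivalent", best_entity, best_score))
    return correspondences


ontology_1 = {"o1:Car": "Car", "o1:Person": "Person"}
ontology_2 = {"o2:car": "car", "o2:Human": "Human being"}
alignment = align(ontology_1, ontology_2)
```

A structural matcher would also exploit the graph around each class; this sketch only covers the label part and does not enforce a globally optimal one-to-one matching.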
Annotation. The annotation operator is used to describe elements of a resource
$R_1$ in terms of a resource $R_2$. This description consists of adding a set of
relationships between entities of these resources according to an annotation
language. The signature of this operator is:
$Op_{Ann} : L_1, L_2 \to L_1, L_2, L_{ann}$
where $L_1$ is the language of the resource’s derivation to annotate and $L_2, \ldots$
are the languages of the resources’ derivations that serve as reference in the
annotation. $L_{ann}$ is the annotation language. For example, word sense
disambiguation is a kind of annotation operation. Starting from a natural language
text and a reference lexical ontology (and possibly other resources), it produces a
set of correspondences between the text words and their meanings (the concepts
of the ontology).
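The word sense disambiguation example can be sketched very roughly as follows. The heuristic (choosing the sense whose cue words overlap most with the words of the text) is a deliberately naive stand-in for a real disambiguation algorithm, and the lexicon structure, concept names and function name are assumptions made for the illustration.

```python
def annotate(text, lexicon):
    """Link words of a text to concepts of a lexical resource.

    `lexicon` maps a word to its candidate concepts, each with a set of cue
    words. Returns annotation triples (token position, word, concept).
    """
    tokens = text.lower().split()
    context = set(tokens)
    annotations = []
    for position, word in enumerate(tokens):
        senses = lexicon.get(word)
        if not senses:
            continue  # the word is not covered by the reference resource
        # Naive choice: the sense sharing the most cue words with the text.
        concept = max(senses, key=lambda c: len(context & senses[c]))
        annotations.append((position, word, concept))
    return annotations


lexicon = {
    "bank": {
        "FinancialInstitution": {"money", "loan", "account"},
        "RiverSide": {"river", "water", "shore"},
    }
}
annotations = annotate("the bank approved the loan", lexicon)
```

The resulting triples play the role of the correspondences between text words and concepts; in the repository they would be stored as a separate annotation resource.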
5.3 Selection and combination operations
These operations are intended to produce new resources’ derivations by selecting
and combining entities of one or more resources.
Selection. This type of operation selects entities from a resource’s derivation
to generate a new resource’s derivation in the same language. This filtering is
specified by a boolean function applied to each entity. The computation of the
filtering function for a resource entity may depend on other entities from the
same resource or on other entities associated with it by means of annotations or
alignments. In addition, the selection may generate a natural alignment between
entities of the original and new resource’s derivations: each selected entity is
associated with its original entity.
The signature of a selection operation is of the form
$Op_{Sel} : L_1 \to L_1$
where $L_1$ is the language of the input resource and of the resulting selection.
For instance, in a description logic ontology, this operator can select
individuals in the ABox (Assertional Box), leaving the TBox (Terminological Box)
untouched (as in a database selection), or it can select a subset of the TBox
and hence drop the ABox entities that depend on unselected TBox concepts or
roles (as in a database projection).
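A selection driven by a boolean function, together with the natural alignment it generates, might look like the sketch below. The representation of a derivation as a dictionary from individuals to concepts, the `sel:` prefix for the new entities and the function name are assumptions chosen to keep the example short.

```python
def select(derivation, keep, prefix="sel:"):
    """Select entities with a boolean function.

    Returns the new derivation and the natural alignment that associates each
    selected entity with its original entity.
    """
    selected = {}
    natural_alignment = set()
    for entity, concept in derivation.items():
        # The filter may inspect the whole derivation, not only the entity.
        if keep(entity, derivation):
            selected[prefix + entity] = concept
            natural_alignment.add((prefix + entity, entity))
    return selected, natural_alignment


# A toy ABox: individual -> concept it is an instance of.
abox = {"alice": "Person", "bob": "Person", "rex": "Dog"}
people, origin = select(abox, lambda entity, d: d[entity] == "Person")
```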
Composition. Composition operations may be applied to alignments and
annotations. Composition is an operator that generates a new derivation of the
composed resources in the same language.
The composition of two alignment resources (from $S_1$ to $S_2$ and from $S_2$ to
$S_3$) results in a new alignment resource from $S_1$ to $S_3$. The semantics
(relation type) of the resulting alignment depends on the relation types of the
given alignments. If $A_1$ and $A_2$ have the same relation type $R$ and $R$ is
transitive, then $A_1 \circ A_2$ has type $R$.
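The transitive case can be sketched as follows, assuming that an alignment is a set of (source, relation, target) correspondences. Only the case stated above (same relation type, transitive relation) is composed; other combinations are skipped here because their resulting relation type is not specified. The relation names and the set of relations treated as transitive are assumptions.

```python
TRANSITIVE_RELATIONS = frozenset({"equivalent", "subsumed-by"})


def compose(alignment_1, alignment_2, transitive=TRANSITIVE_RELATIONS):
    """Compose an alignment S1 -> S2 with an alignment S2 -> S3."""
    composed = set()
    for (s1, relation_1, s2) in alignment_1:
        for (t2, relation_2, s3) in alignment_2:
            same_pivot = s2 == t2
            same_transitive_relation = relation_1 == relation_2 and relation_1 in transitive
            if same_pivot and same_transitive_relation:
                composed.add((s1, relation_1, s3))
    return composed


a1 = {("s1:Car", "equivalent", "s2:Automobile"), ("s1:Bike", "related", "s2:Cycle")}
a2 = {("s2:Automobile", "equivalent", "s3:Auto"), ("s2:Cycle", "related", "s3:Velo")}
a3 = compose(a1, a2)
```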
Merge. The idea of the merge operation is to build a new resource by taking
all the entities of two given resources [18] [3]. Depending on the representation
language, the operation can take different forms. For example, applying the merge
operator to two ontologies in the language DL (description logic) reduces to
performing the union of their vocabularies and axioms:
- (merge) disjoint union of the vocabularies and axioms plus equivalence and
  subsumption axioms corresponding to the given alignment;
- (replace) if a named concept $C$ of an ontology $O_1$ is aligned (equivalence)
  with a named concept $D$ of an ontology $O_2$, then the operator drops every
  axiom that defines $C$ ($C \equiv \ldots$ and $C \sqsubseteq \ldots$), keeps the
  axioms that define $D$ and adds the axiom $C \equiv D$. This is a way to replace
  the definitions given in $O_1$ by those in $O_2$ (used, for instance, when $O_2$
  is considered more reliable than $O_1$).
The signature of the merge operator has the form:
$Op_{Merge} : L_1, L_1, [L_{al}] \to (L_1) [L_{al}]$
This operator takes as parameters a list of resources represented in the same
language and uses auxiliary resources such as alignments between them. Merging
two alignments or annotations can occur only if they are about a common
resource. First, for each resource $R_i$ to merge, we must consolidate and merge
all correspondences whose source is $R_i$ and that are represented in the same
alignment language $L_{al}$. An alignment resource with multiple inputs and outputs
is then constructed and represented in the language $L_{al}$. Together, the set of
resources to merge and the constructed alignment provide the required ingredients
for the merge.
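The two forms of merge described for description logic ontologies can be sketched as set operations. The encoding is an assumption made for the example: an axiom is a tuple (kind, left, right) with kind "equiv" or "sub", entity names carry an ontology prefix so that the vocabularies are disjoint, and the function names are hypothetical.

```python
def merge(axioms_1, axioms_2, alignment):
    """(merge) union of the axioms plus the axioms given by the alignment."""
    return set(axioms_1) | set(axioms_2) | set(alignment)


def merge_replace(axioms_1, axioms_2, alignment):
    """(replace) for each equivalence C = D in the alignment, drop the axioms of
    the first ontology that define C, keep those defining D, and add C = D."""
    replaced = {left for (kind, left, _) in alignment if kind == "equiv"}
    kept = {axiom for axiom in axioms_1 if axiom[1] not in replaced}
    return kept | set(axioms_2) | set(alignment)


o1 = {("sub", "o1:Car", "o1:Vehicle"), ("sub", "o1:Bike", "o1:Vehicle")}
o2 = {("sub", "o2:Automobile", "o2:Machine")}
alignment = {("equiv", "o1:Car", "o2:Automobile")}

merged = merge(o1, o2, alignment)
replaced = merge_replace(o1, o2, alignment)
```

In `replaced`, the definition of `o1:Car` now comes only from `o2:Automobile` through the added equivalence, while `merged` keeps both definitions side by side.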
6 Conclusion and Further work
Our main objective is to build a large repository for integrating heterogeneous re-
sources represented in different languages. We have identified three major steps
for implementing this repository. First we have defined an upper level model
for representing knowledge resources and dealing with different representation
languages. Then we have defined a set of abstract operators having multiple
implementations in order to combine the content of the repository and generate
new resources from existing ones. We will focus on defining examples and a set of
use cases in order to validate this approach and finally address the scalability
issues. To ensure the usage of the repository by means of knowledge representation
and resources management operators, we are currently focusing on the following
issues: (1) defining a model for each processing task that uses resources, where
these task models should result from a reflection on a set of use cases; (2) defining
and implementing a set of heuristics for the automatic detection of entity mappings,
in order to construct alignments between resources during the execution of any task.
For the third part of this research we will focus on the experimentation
and the implementation of the repository. The implementation of a prototype
is intended to validate the research results and define software requirements by
studying the available technologies and APIs that can be used. For instance, we
should address the following issues:
- an evaluation and study of RDF storage approaches must be carried out to select
  the best storage API for storing knowledge resources, with a particular focus
  on scalability issues;
- for the sake of generality, we should investigate the possibilities for providing
  resources management operators as web services;
- define the interface that should be used for the repository’s portal, as well as
  the criteria of accessibility and user profiles.
References
1. Hendler, J., Golbeck, J.: Metcalfe’s law, web 2.0, and the semantic web. Web
Semant. 6(February 2008) 14–20
2. D’Aquin, M., Schlicht, A., Stuckenschmidt, H., Sabou, M.: Criteria and evaluation
for ontology modularization techniques, Berlin, Heidelberg, Springer-Verlag (2009)
67–89
3. Pinto, H.S., Martins, J.P.: A methodology for ontology integration. In: K-
CAP’01:Proceedings of the 1st international conference on Knowledge capture,
New York, NY, USA, ACM (2001) 131–138
4. Sabou, M., Lopez, V., Motta, E.: Ontology selection: Ontology evaluation on the
real semantic web. In: Workshop: Evaluation of Ontologies for the Web (EON
2006), 15th International World Wide Web Conference, Edinburgh (2006)
5. Noy, N.F., Shah, N., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Montegut, M.,
Rubin, D.L., Youn, C., Musen, M.A.: Bioportal: A web repository for biomedical
ontologies and data resources. In: International Semantic Web Conference (Posters
& Demos). (2008)
6. Sabou, M., Dzbor, M., Baldassarre, C., Angeletou, S., Motta, E.: Watson: A
gateway for the semantic web. In: Poster session of the European Semantic Web
Conference, ESWC. (2007)
7. Kiryakov, A., Ognyanov, D., Manov, D.: Owlim - a pragmatic semantic repository
for owl. In: WISE Workshops. (2005) 182–192
8. Vandenbussche, P.Y., Charlet, J.: Méta-modèle général de description de ressources
terminologiques et ontologiques. In Gandon, F.L., ed.: Actes d’IC, PUG (2009)
193–204
9. Montiel-Ponsoda, E., Aguado de Cea, G., Gómez-Pérez, A., Peters, W.: Modelling
multilinguality in ontologies. In: Coling 2008: Companion volume: Posters,
Manchester, UK, Coling 2008 Organizing Committee (August 2008) 67–70
10. Cailliau, F.: Un modèle pour unifier la gestion de ressources linguistiques en
contexte multilingue. In Mertens, P., ed.: Verbum ex machina: actes de la 13e
Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2006),
Leuven, Presses univ. de Louvain (2006) 454–461
11. Picca, D., Gliozzo, A.M., Gangemi, A.: Lmm: an owl-dl metamodel to represent
heterogeneous lexical knowledge. In: LREC, European Language Resources Asso-
ciation (2008)
12. Suchanek, F., Kasneci, G., Weikum, G.: YAGO: A core of semantic knowledge
- unifying WordNet and Wikipedia. In Williamson, C.L., Zurko, M.E.,
Patel-Schneider, P.F., Shenoy, P.J., eds.: 16th International World Wide Web
Conference (WWW 2007), Banff, Canada, ACM (2007) 697–706
13. García-Silva, A., Gómez-Pérez, A., Suárez-Figueroa, M.C., Villazón-Terrazas, B.:
A pattern based approach for re-engineering non-ontological resources into
ontologies. In: Proceedings of the 3rd Asian Semantic Web Conference on The Semantic
Web. ASWC ’08, Berlin, Heidelberg, Springer-Verlag (2008) 167–181
14. Ghoula, N., Falquet, G., Guyot, J.: Tok: A meta-model and ontology for hetero-
geneous terminological, linguistic and ontological knowledge resources. In Huang,
J.X., King, I., Raghavan, V.V., Rueger, S., eds.: Web Intelligence, IEEE (2010)
297–301
15. Kalfoglou, Y., Schorlemmer, M.: Ontology mapping: the state of the art. Knowl.
Eng. Rev. 18(1) (2003) 1–31
16. Euzenat, J., Meilicke, C., Stuckenschmidt, H., Shvaiko, P., Trojahn, C.: Journal
on data semantics xv. Springer-Verlag, Berlin, Heidelberg (2011) 158–192
17. Shvaiko, P., Euzenat, J.: Ontology matching: State of the art and future challenges.
IEEE Transactions on Knowledge and Data Engineering 99(PrePrints) (2011)
18. Noy, N.F., Musen, M.A.: Anchor-PROMPT: Using Non-Local Context for Se-
mantic Matching. In: Workshop on Ontologies and Information Sharing at the
Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-2001),
Seattle, WA (2001)
... We created different categories of knowledge engineering operators and we defined new operators. We also created a library of knowledge engineering operators based on the existing operators from the literature [Ghoula et al. 2011, Ajmi et al. 2012, Ghoula & Falquet 2012. ...
... Other operations such as the edition, updates and versioning are supposed to be built-in components within the repository and do not require multiple implementations. These operators are also part of knowledge resources combination operators but they are not represented as a category in our taxonomy [Ghoula & Falquet 2012]. ...
... We proposed an approach to represent knowledge engineering operators and proposed a taxonomy of resources combination and combination operators [Ghoula et al. 2011, Ajmi et al. 2012, Ghoula & Falquet 2012]. These operators have different signatures and can be represented using the operator's model. ...
Research
Full-text available
Multiple tasks related to documents, such as indexing, retrieving, annotation, or translation are based on linguistic, terminological and ontological knowledge existing in resources of different types represented using various formalisms. Building bridges between these resources and using them together is a complex task. Solving this problem relies on finding the right resources before extracting the required data. Ontology repositories have been created to help in this task by collecting ontologies and offering effective indexing of these resources. However, these repositories treat a single category of resources and do not provide operations for generating new resources. To meet these needs in terms of knowledge engineering, our contributions are (1) an ontology for representing heterogeneous resources and knowledge combination operators; (2) an approach based on the principles of semantic web to ensure the representation, storage and alignment of heterogeneous resources and (3) the development of an ontology-based repository for combining alignment resources.
Article
Full-text available
Researchers in the ontology-design field have developed the content for ontologies in many domain areas. Recently, ontologies have become increasingly common on the World- Wide Web where they provide semantics for annotations in Web pages. This distributed nature of ontology development has led to a large number of ontologies covering overlapping domains, which researchers now need to merge or align to one another. The processes of ontology alignment and merging are usually handled manually and often constitute a large and tedious portion of the sharing process. We have developed and implemented Anchor-PROMPT—an algorithm that finds semantically similar terms automatically. Anchor-PROMPT takes as input a set of anchors—pairs of related terms defined by the user or automatically identified by lexical matching. Anchor- PROMPT treats an ontology as a graph with classes as nodes and slots as links. The algorithm analyzes the paths in the subgraph limited by the anchors and determines which classes frequently appear in similar positions on similar paths. These classes are likely to represent semantically similar concepts. Our experiments show that when we use Anchor-PROMPT with ontologies developed independently by different groups of researchers, 75% of its results are correct.
Conference Paper
Full-text available
Biomedical ontologies provide essential domain knowledge to drive data integration, infor- mation retrieval, data annotation, natural-language processing, and decision support. The Na- tional Center for Biomedical Ontology is developing BioPortal, a Web-based system that serves as a repository for biomedical ontologies. BioPortal denes relationships among those ontolo- gies and between the ontologies and online data resources such as PubMed, ClinicalTrials.gov, and the Gene Expression Omnibus (GEO). BioPortal supports not only the technical require- ments for access to biomedical ontologies either via Web browsers or via Web services, but also community-based participation in the evaluation and evolution of ontology content. BioPortal enables ontology users to learn what biomedical ontologies exist, what a particular ontology might be good for, and how individual ontologies relate to one another. BioPortal is available online at http://bioportal.bioontology.org.
Conference Paper
OWLIM is a high-performance Storage and Inference Layer (SAIL) for Sesame, which performs OWL DLP reasoning, based on forward-chaining of entailment rules. The reasoning and query evaluation are performed in memory, while at the same time OWLIM provides reliable persistence, based on N-Triples files. This paper presents OWLIM, together with an evaluation of its scalability over a synthetic but realistic dataset encoded with respect to the PROTON ontology. The experiment demonstrates that OWLIM can scale to millions of statements even on commodity desktop hardware. On an almost-entry-level server, OWLIM can manage a knowledge base of 10 million explicit statements, which are extended to about 19 million after forward chaining. The upload and storage speed is about 3,000 statements/sec. at the maximal size of the repository, but it starts at more than 18,000 (for a small repository) and slows down smoothly. As can be expected for such an inference strategy, delete operations are expensive, taking as much as a few minutes. At the same time, a variety of queries can be evaluated within milliseconds. The experiment shows that such reasoners can be efficient for very big knowledge bases, in scenarios when delete operations should not be handled in real-time.
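To make the forward-chaining strategy mentioned in this abstract concrete, here is a minimal sketch of materializing inferred statements until a fixpoint is reached. It is not OWLIM's rule set or API: the two rules (transitivity of `subClassOf` and propagation of `type` along `subClassOf`), the function name `forward_chain`, and the sample triples are assumptions chosen for illustration.

```python
def forward_chain(explicit):
    """Return the explicit triples plus everything derivable from two
    assumed rules, applied repeatedly until nothing new is inferred."""
    closure = set(explicit)
    while True:
        new = set()
        for s, p, o in closure:
            if p not in ("subClassOf", "type"):
                continue
            # (s p o) and (o subClassOf o2) entail (s p o2).
            for s2, p2, o2 in closure:
                if p2 == "subClassOf" and s2 == o:
                    new.add((s, p, o2))
        new -= closure
        if not new:
            return closure
        closure |= new


# Hypothetical explicit statements.
explicit = {
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("rex", "type", "Dog"),
}
closure = forward_chain(explicit)
inferred = closure - explicit
```

Here three explicit statements grow to six after materialization. The sketch also hints at why deletion is costly under this strategy: removing one explicit statement may invalidate inferred ones, so the closure generally has to be recomputed or carefully retracted.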
Conference Paper
Documents are rich resources containing knowledge describing a specific domain. Processing them is therefore a common task, one that relies on terminological and ontological resources. Various types of ontologies, thesauri, and many other resources are commonly used in the process of knowledge extraction. The modeling and reuse of these resources is intended to support knowledge management. In this paper, we propose a methodology and a model for ontological and terminological resource management. Our aim is to build a resource repository that offers operations for loading, storing, indexing, translating, generating and matching different resources. In this contribution we propose an ontology as a model of these resources and we explain how we can represent, annotate and load new resources into our repository.
Conference Paper
an impending need for institutions worldwide with valuable linguistic resources in different natural languages. Since most ontologies are developed in one language, obtaining multilingual ontologies implies localizing or adapting them to a specific language and cultural community. As adapting the ontology conceptualization demands considerable effort, we propose to modify the ontology terminological layer, and provide a model called Linguistic Information Repository (LIR) that, associated with the ontology meta-model, allows localization of the terminological layer.
Article
After years of research on ontology matching, it is reasonable to consider several questions: is the field of ontology matching still making progress? Is this progress significant enough to pursue further research? If so, what are the particularly promising directions? To answer these questions, we review the state of the art of ontology matching and analyze the results of recent ontology matching evaluations. These results show a measurable improvement in the field, albeit at a slowing pace. We conjecture that significant improvements can be obtained only by addressing important challenges for ontology matching. We present such challenges with insights on how to approach them, thereby aiming to direct research into the most promising tracks and to facilitate the progress of the field.
Article
The power of the Web is enhanced through the network effect produced as resources link to each other with the value determined by Metcalfe's law. In Web 2.0 applications, much of that effect is delivered through social linkages realized via social networks online. Unfortunately, the associated semantics for Web 2.0 applications, delivered through tagging, is generally minimally hierarchical and sparsely linked. The Semantic Web suffers from the opposite problem. Semantic information, delivered through ontologies of varying amounts of expressivity, is linked to other terms (within or between resources) creating a link space in the semantic realm. However, the use of the Semantic Web has yet to fully realize the social schemes that provide the network of users. In this article, we discuss putting these together, with linked semantics coupled to linked social networks, to deliver a much greater effect.
Conference Paper
Ontology mapping is seen as a solution provider in today's landscape of ontology research. As the number of ontologies that are made publicly available and accessible on the Web increases steadily, so does the need for applications to use them. A single ontology is no longer enough to support the tasks envisaged by a distributed environment like the Semantic Web. Multiple ontologies need to be accessed from several applications. Mapping could provide a common layer from which several ontologies could be accessed and hence could exchange information in semantically sound manners. Developing such mappings has been the focus of a variety of works originating from diverse communities over a number of years. In this article we comprehensively review and present these works. We also provide insights on the pragmatics of ontology mapping and elaborate on a theoretical approach for defining ontology mapping.
Conference Paper
We present YAGO, a light-weight and extensible ontology with high coverage and quality. YAGO builds on entities and relations and currently contains roughly 900,000 entities and 5,000,000 facts. This includes the Is-A hierarchy as well as non-taxonomic relations between entities (such as hasWonPrize). The facts have been automatically extracted from the unification of Wikipedia and WordNet, using a carefully designed combination of rule-based and heuristic methods described in this paper. The resulting knowledge base is a major step beyond WordNet: in quality by adding knowledge about individuals like persons, organizations, products, etc. with their semantic relationships, and in quantity by increasing the number of facts by more than an order of magnitude. Our empirical evaluation of fact correctness shows an accuracy of about 95%. YAGO is based on a logically clean model, which is decidable, extensible, and compatible with RDFS. Finally, we show how YAGO can be further extended by state-of-the-art information