SIMPLE-OWL: a Generative Lexicon Ontology
for NLP and the Semantic Web
Antonio Toral and Monica Monachini
{antonio.toral, monica.monachini}@{ilc.cnr.it}
Istituto di Linguistica Computazionale
Consiglio Nazionale delle Ricerche
Via G. Moruzzi 1 - 56124 Pisa, Italy
Abstract. This research deals with the modelling of an ontology based on the Generative Lexicon, to be used in the Semantic Web and in semantic Natural Language Processing tasks. The ontology is imported from an existing computational lexical resource and converted to the W3C standard Web Ontology Language (OWL). This conversion presents some challenges, such as the multidimensionality of the original ontology, which are addressed in the current paper. The result of this research is an OWL-compliant, semantically rich and linguistically based ontology, which is thus useful for the automatic processing of text within the Semantic Web paradigm.
1 Introduction
The Semantic Web is an evolving extension of the World Wide Web in which content can be expressed not only in natural language, as is done today, but also in a formalised way, so that appropriate software can process it automatically. In this new paradigm, ontologies will play a central role.
The Web Ontology Language (OWL) is a W3C recommendation and a major technology for the Semantic Web. It is defined by [1] as “a semantic markup language for publishing and sharing ontologies on the World Wide Web”. OWL allows applications to process the content of information instead of just presenting it to the user [2].
Ontologies are also recognised as an important component of Natural Language Processing (NLP) systems that deal with the semantic level of language. In fact, most, if not all, of the semantic lexical resources in the area (e.g. WordNet [3], CYC [4], SIMPLE [5]) have an ontology as a core module. Moreover, there is ongoing research on applying ontologies to semantic NLP, e.g. [6].
The fact that OWL is the ontology language of the Semantic Web, and that it provides a formal semantic representation as well as reasoning capabilities, has encouraged the NLP community to convert existing resources to this language. Work in this area includes, for example, the conversion of WordNet [7] and MeSH [8], as well as the proposal of a general method for converting thesauri [9].
The current paper deals with the modelling into OWL of a linguistic ontology imported from a computational lexicon. According to [10], linguistic ontologies offer immense potential for gathering information from the Web. Moreover, [11] motivates the choice of a linguistically based ontology for Natural Language Processing. However, this kind of ontology presents some issues and challenges [12] that need to be taken into account.
The rest of the paper is organised as follows. Section 2 introduces the computational lexicon used in this research. Next, Section 3 covers the modelling of the ontology in OWL. Subsequently, Section 4 discusses the results of the modelling through an example. Finally, Section 5 presents some conclusions and lines of future work.
2 The PAROLE-SIMPLE-CLIPS computational lexicon
SIMPLE [5] is a large-scale project, sponsored by the European Union, devoted to the development of wide-coverage, multipurpose and harmonised computational semantic lexica for twelve European languages (Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish and Swedish). A language-independent ontology of semantic types and a set of templates were designed and developed in order to guarantee uniformity and consistency among the monolingual dictionaries. Within this project, 10,000 word meanings were annotated for each language.
SIMPLE can be considered a follow-up of a previous European project, PAROLE [13], as it adds a semantic layer to a subset of the morphological and syntactic layers developed by the latter. SIMPLE thus provides multi-layered lexica, in which information is encoded at different descriptive levels (morphological, syntactic and semantic). Although the information encoded at each level is mutually independent, the layers are connected by one-to-one, one-to-many or many-to-one links (e.g. a syntactic unit is linked to one or more semantic units depending on the number of meanings that the syntactic entry conveys).
CLIPS is an Italian national project which enlarged and refined the Italian PAROLE-SIMPLE lexicon [14]. The core data encoded within SIMPLE was extended in CLIPS with a new set of lexical units selected from the PAROLE corpus according to frequency-based criteria. The resulting lexical resource contains 387,267 phonetic units, 53,044 morphological units, 37,406 syntactic units and 28,346 semantic units.
From a theoretical point of view, the linguistic background of PAROLE-SIMPLE-CLIPS (PSC) is the Generative Lexicon (GL) theory [15]. In the GL, a word sense is viewed as a complex bundle of orthogonal dimensions that express the multidimensionality of word meaning. The most important component for representing the lexical semantics of a word sense is the qualia structure, which consists of four qualia roles:
– Formal role. Makes it possible to identify an entity.
– Constitutive role. Expresses the constitution of an entity.
– Agentive role. Provides information about the origin of an entity.
– Telic role. Specifies the function of an entity.
Each qualia role can be considered an independent element, or dimension, of the vocabulary for semantic description. The qualia structure makes it possible to express different, orthogonal aspects of word sense, whereas a one-dimensional inheritance can only capture standard hyperonymic relations. Within SIMPLE, the qualia structure was extended by assigning subtypes to each of the qualia roles (e.g. “Usedfor” and “Usedby” are subtypes of the telic role).
The elements involved in the semantic description of PSC that are con-
sidered for the current modelling (see section 3) are semantic types, re-
lations, features, templates and predicates. Each of these is briefly de-
scribed in the following subsections.
2.1 Semantic types
The semantic types are the nodes that make up the ontology. A pe-
culiar trait of the adopted ontology is the fact that it consists of both
simple types, which identify only a one-dimensional aspect of meaning
expressed by hyperonymic relations, and unified types, which express
multidimensional aspects of meaning by combining subtyping relations
and orthogonal semantic dimensions.
The ontology consists of 153 language-independent semantic types. The
top types are mappable to the ontology of EuroWordNet [16]. The design
of the ontology is highly influenced by the GL model. In fact, the top
nodes are the semantic type “Entity” and three other types named after
the agentive, constitutive and telic qualia roles (“Agentive”, “Constitu-
tive” and “Telic”). These three nodes are designed to include semantic
units definable only in terms of qualia dimensions. The direct subtypes
of the node “Entity” are the semantic types “Concrete_Entity”, “Property”, “Abstract_Entity”, “Representation” and “Event”. Figure 1 shows these nodes.
Fig. 1. Top nodes of the PSC ontology
2.2 Relations and features
Relations and features are the elements of PSC used to assign attributes to lexical units (also called semantic units or word senses). While relations link two semantic units (e.g. “Usedfor” links “bisturi (scalpel)” to “incidere (engrave)”), features link a semantic unit to a value within a closed range (e.g. “PLUS_EDIBLE” links “panino (sandwich)” to the value “yes”). Relations and features can be defined as prototypical within templates (see subsection 2.3); in this case they act as type-defining for the semantic units included in these templates.
2.3 Templates
Templates act as blueprints for any given type in the ontology and provide the well-formedness conditions and constraints for lexical items belonging to that type. The template helps and guides the encoding of information referring to the ontology. The template structure is built like a schema that works as an interface between the lexicon and the ontology: it imposes constraints on the membership of a given semantic unit in a semantic type. A constraint can take one of the following values:
– Yes. The information is mandatory, i.e. every semantic unit that belongs to the semantic type should instantiate this property.
– RecYes. The information is mandatory and the cardinality can be higher than one, i.e. a semantic unit can be linked to more than one element via this property.
– No. The information is optional.
– RecNo. The information is optional and the cardinality can be higher than one.
For example, Table 1 shows the constraints present in the template corresponding to the semantic type “Artifact_Food” of the PSC ontology.
Table 1. Template for the semantic type Artifact_Food
Item                 Type      Constraint value
Createdby            relation  RecYes
Madeof               relation  RecNo
Objectoftheactivity  relation  RecYes
PLUS_EDIBLE          feature   Yes
2.4 Predicates
Predicates are assigned to the predicative semantic units (verbs, deverbal nouns, etc.) of the lexicon. A predicate is made up of a set of arguments, each of which is linked to a semantic role and to a selectional restriction. A selectional restriction can be a semantic type, a semantic unit or a notion, which is a cluster of restrictions combining features and semantic types. The following examples show the predicates for the word senses “guidare (drive)” and “ronzare (whirr)”:
– The predicate for “guidare” contains two arguments. The first argument has the semantic role “Agent” and a restriction which is the notion “ArgHumanHumanGroup”. The second argument has the semantic role “Patient” and a restriction which is the semantic type “Vehicle”.
– The predicate for “ronzare” is made up of only one argument. This has the semantic role “Agent” and a restriction which is the semantic unit “insetto (insect)”.
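As an illustration of this structure, the two predicates above can be written down as plain Python data classes. This is a minimal sketch: the class and field names (`SemanticType`, `SemanticUnit`, `Notion`, `Argument`, `Predicate`) are hypothetical and do not reproduce the actual PSC encoding, and a notion is reduced to its name only.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical in-memory model of a PSC predicate; all names are illustrative.


@dataclass(frozen=True)
class SemanticType:
    name: str


@dataclass(frozen=True)
class SemanticUnit:
    lemma: str
    gloss: str = ""


@dataclass(frozen=True)
class Notion:
    # In PSC a notion clusters features and semantic types; here only its name is kept.
    name: str


# A selectional restriction is one of the three kinds listed in subsection 2.4.
Restriction = Union[SemanticType, SemanticUnit, Notion]


@dataclass(frozen=True)
class Argument:
    semantic_role: str
    restriction: Restriction


@dataclass(frozen=True)
class Predicate:
    word_sense: str
    arguments: Tuple[Argument, ...]


guidare = Predicate("guidare (drive)", (
    Argument("Agent", Notion("ArgHumanHumanGroup")),
    Argument("Patient", SemanticType("Vehicle")),
))

ronzare = Predicate("ronzare (whirr)", (
    Argument("Agent", SemanticUnit("insetto", "insect")),
))

for pred in (guidare, ronzare):
    for arg in pred.arguments:
        # Prints the word sense, the role and the kind of restriction attached to it.
        print(pred.word_sense, arg.semantic_role, type(arg.restriction).__name__)
```

Keeping the three restriction kinds as distinct types makes explicit which of them must later be mapped back to semantic types (see subsection 3.6).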
3 Modelling
This section describes the modelling of the ontology of PSC into OWL.
The different relevant aspects of the OWL ontology are treated in the
following subsections.
3.1 Ontology classes
Semantic types, as mentioned above, are the nodes that constitute the ontology. Therefore, they are modelled in OWL as classes. A one-dimensional taxonomy (i.e. the GL formal dimension) is created following the structure of the original PSC ontology. The incorporation of multidimensionality into the OWL ontology is discussed in subsection 3.7. Finally, all siblings across the class taxonomy are made disjoint.
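The disjointness step can be sketched as follows, assuming the formal taxonomy is available as a mapping from each class to its direct subclasses. Each unordered pair of siblings yields one `DisjointWith` axiom, printed here in a Manchester-syntax-like form; the function name and the output format are illustrative assumptions, not the authors' converter.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def disjoint_sibling_pairs(children: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return every unordered pair of classes that share the same direct superclass."""
    pairs: List[Tuple[str, str]] = []
    for parent in sorted(children):
        # Sorting makes the output deterministic; each pair is emitted exactly once.
        pairs.extend(combinations(sorted(children[parent]), 2))
    return pairs


# Direct subtypes of "Entity" as listed in subsection 2.1.
taxonomy = {
    "Entity": ["Concrete_Entity", "Property", "Abstract_Entity",
               "Representation", "Event"],
}

for a, b in disjoint_sibling_pairs(taxonomy):
    print(f"Class: {a}  DisjointWith: {b}")
```

For the five direct subtypes of “Entity” this produces n(n-1)/2 = 10 pairwise axioms. In OWL 2 the same constraint could instead be stated with one `DisjointClasses` axiom per group of siblings.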
3.2 Object properties
Relations are modelled as object properties. The only exception is the formal qualia relation “Isa”, which is modelled with the “rdfs:subClassOf” relation. As in the case of the semantic types, a taxonomy has been built for relations. The top nodes are the different relation types present in the PSC model, i.e. four types for the corresponding qualia roles (agentive, constitutive, formal and telic) and others for non-qualia relations (antonymy, derivational, metaphor, metonymy, polysemy and synonymy). For non-qualia relations, both domain and range are set to the top node of the ontology, while for qualia relations both are set to the ontology class “Entity” and the class that corresponds to the specific qualia type (“Agentive”, “Constitutive”, “Formal” or “Telic”).
3.3 Datatype properties
Features are modelled as datatype properties. Unlike relations, features form a flat list, i.e. there are no hyperonymy/hyponymy relationships among them. Template information is used to establish the domain, which is defined as the union of the classes for which the feature is defined (e.g. “Food”, “Vegetable”, etc. for the feature “PLUS_EDIBLE”). The range is set to boolean, as so far only features of this kind have been imported.
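The domain computation amounts to a scan over the templates: every semantic type whose template mentions a feature contributes to that feature's domain, and the domain is the union of those types. In the sketch below only the Artifact_Food template is taken from Table 1; the Food and Vegetable entries and the `feature_domains` helper are illustrative assumptions.

```python
from typing import Dict, List, Set

# Templates: semantic type -> {item: constraint value}. Artifact_Food follows
# Table 1; the Food and Vegetable entries are illustrative assumptions.
templates: Dict[str, Dict[str, str]] = {
    "Artifact_Food": {"Createdby": "RecYes", "Madeof": "RecNo",
                      "Objectoftheactivity": "RecYes", "PLUS_EDIBLE": "Yes"},
    "Food": {"PLUS_EDIBLE": "Yes"},
    "Vegetable": {"PLUS_EDIBLE": "Yes"},
}
features: Set[str] = {"PLUS_EDIBLE"}


def feature_domains(templates: Dict[str, Dict[str, str]],
                    features: Set[str]) -> Dict[str, List[str]]:
    """Map each feature to the sorted list of types whose template mentions it."""
    domains: Dict[str, Set[str]] = {}
    for sem_type, items in templates.items():
        for item in items:
            if item in features:  # relations appearing in the template are skipped
                domains.setdefault(item, set()).add(sem_type)
    return {f: sorted(types) for f, types in domains.items()}


for feature, types in feature_domains(templates, features).items():
    # Domain = union of the classes; range = boolean (subsection 3.3).
    print(f"DataProperty: {feature}")
    print(f"    Domain: {' or '.join(types)}")
    print("    Range: xsd:boolean")
```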
3.4 Cardinality restrictions
The application of template constraints to semantic types, as represented in the templates (see Table 1), is modelled with cardinality restrictions. Each constraint value corresponds to a different cardinality restriction, as shown in Table 2.
Table 2. Mapping template constraints to cardinality restrictions
Template constraint value   OWL cardinality restriction
Yes                         min 1, max 1
RecYes                      min 1
No                          min 0, max 1
RecNo                       min 0
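Table 2 translates directly into a lookup table. The sketch below assumes that a template is given as a mapping from item names to constraint values, and emits the restrictions for the Artifact_Food template of Table 1 as Manchester-syntax-like strings; the helper name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Table 2: template constraint value -> (minimum, maximum); None means unbounded.
CARDINALITY: Dict[str, Tuple[int, Optional[int]]] = {
    "Yes": (1, 1),
    "RecYes": (1, None),
    "No": (0, 1),
    "RecNo": (0, None),
}


def cardinality_restrictions(template: Dict[str, str]) -> List[str]:
    """Translate one template (item -> constraint value) into restriction strings."""
    restrictions: List[str] = []
    for prop, value in template.items():
        low, high = CARDINALITY[value]
        restrictions.append(f"{prop} min {low}")
        if high is not None:
            restrictions.append(f"{prop} max {high}")
    return restrictions


# The Artifact_Food template from Table 1.
artifact_food = {
    "Createdby": "RecYes",
    "Madeof": "RecNo",
    "Objectoftheactivity": "RecYes",
    "PLUS_EDIBLE": "Yes",
}

for r in cardinality_restrictions(artifact_food):
    print(f"Artifact_Food SubClassOf: {r}")
```

Note that a `min 0` restriction is satisfied by every individual, so for “No” and “RecNo” it documents the template rather than constraining the class; only the `max 1` of “No” has a logical effect.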
3.5 Quantifier restrictions
Within an ontology, quantifier restrictions make it possible to establish, for a restriction applied over a property to a source class, the target class(es) of this restriction. For example, in the University of Manchester's pizza ontology¹, a restriction over the property “hasTopping” is applied to the source class “Pizza” and the target class of this restriction is “MozzarellaTopping”.
There are two types of quantifiers: existential (∃) and universal (∀). An existential restriction describes the set of individuals that, for a given property, have at least one relationship with individuals that are members of the target class. A universal restriction, on the other hand, describes the set of individuals that, for a given property, only have relationships with individuals that are members of the target class.
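In standard description-logic notation, with an interpretation ·^I over the domain Δ^I, the two quantifiers correspond to the following sets (the textbook definition, not something specific to PSC):

```latex
\begin{align*}
(\exists R.C)^{\mathcal{I}} &= \{\, x \in \Delta^{\mathcal{I}} \mid \exists y : (x, y) \in R^{\mathcal{I}} \wedge y \in C^{\mathcal{I}} \,\} \\
(\forall R.C)^{\mathcal{I}} &= \{\, x \in \Delta^{\mathcal{I}} \mid \forall y : (x, y) \in R^{\mathcal{I}} \rightarrow y \in C^{\mathcal{I}} \,\}
\end{align*}
```

An individual with no R-successor belongs to ∀R.C vacuously, so a universal restriction is compatible with an optional constraint, whereas membership in ∃R.C requires at least one R-successor.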
Although the PSC ontology does not contain the semantic types that are the targets of a given restriction, this information can be extracted indirectly from the lexicon and, after some generalisation, used to enrich the ontology.
For a given constraint over a relation that belongs to a template, we extract all the occurrences of the relation in the semantic units that belong to the template's semantic type. Each occurrence is made up of a source semantic unit, which belongs to the current semantic type, and a target semantic unit, i.e. it links two semantic units. For example, the semantic unit “bisturi (scalpel)”, which belongs to the semantic type “Instrument”, is linked by the relation “Usedby” to the semantic unit “chirurgo (surgeon)”, which belongs to the semantic type “Profession”. For each of these occurrences, we extract the semantic type to which the target semantic unit of the relation belongs, obtaining a list of target semantic types. These are then generalised as follows: if the list contains a semantic type and one of its ancestors, the descendant semantic type is deleted from the list. For example, there are 47 semantic units in the semantic type “Food” that instantiate the telic relation “Objectoftheactivity”, from which we obtain the target class “Relational_Act”.
¹ See http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial.pdf
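The extraction and generalisation steps can be sketched as follows. The link triples, the unit-to-type table and the type hierarchy are a toy fragment invented for illustration (only the “bisturi”/“chirurgo” link comes from the text), and `target_types` and `generalise` are hypothetical helper names. As described above, the rule only drops a type whose ancestor is already in the list; it does not climb to a common ancestor of two siblings.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Link = Tuple[str, str, str]  # (source unit, relation, target unit)


def target_types(links: Iterable[Link], relation: str, sem_type: str,
                 unit_type: Dict[str, str]) -> List[str]:
    """Types of the targets of `relation` whose source unit belongs to `sem_type`."""
    return [unit_type[tgt] for src, rel, tgt in links
            if rel == relation and unit_type[src] == sem_type]


def ancestors(t: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """All ancestors of type t, following child -> parent links up to a root."""
    result: Set[str] = set()
    p = parent.get(t)
    while p is not None:
        result.add(p)
        p = parent.get(p)
    return result


def generalise(types: Iterable[str], parent: Dict[str, Optional[str]]) -> List[str]:
    """Delete every type that has one of its ancestors in the same list."""
    found = set(types)
    return sorted(t for t in found if not (ancestors(t, parent) & found))


# Toy data: hypothetical units, types and hierarchy (except bisturi/chirurgo).
unit_type = {"bisturi": "Instrument", "chirurgo": "Profession",
             "u1": "Food", "u2": "Food", "a1": "Act_X", "a2": "Relational_Act"}
parent: Dict[str, Optional[str]] = {
    "Act_X": "Relational_Act", "Relational_Act": None,
    "Instrument": None, "Profession": None, "Food": None}
links = [("bisturi", "Usedby", "chirurgo"),
         ("u1", "Objectoftheactivity", "a1"),
         ("u2", "Objectoftheactivity", "a2")]

raw = target_types(links, "Objectoftheactivity", "Food", unit_type)
print(raw, "->", generalise(raw, parent))
```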
Regarding the quantifier type, we add a universal restriction for every constraint, while existential restrictions are only added for constraints of type “Yes” or “RecYes”, as an existential restriction implies a minimum cardinality greater than zero. Following the previous example, both an existential and a universal quantifier restriction would be added for the relation “Objectoftheactivity”, as its constraint value is “RecYes”.
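A minimal sketch of this rule, producing Manchester-syntax strings (`only` for ∀, `some` for ∃). How several generalised target types are combined is not stated in the text; the sketch assumes a single union filler, which is one possible choice, and the function name is hypothetical.

```python
from typing import List

# Constraint values whose minimum cardinality is 1 (Table 2).
MANDATORY = {"Yes", "RecYes"}


def quantifier_restrictions(prop: str, constraint: str, targets: List[str]) -> List[str]:
    """Universal restriction for every constraint; existential only if mandatory."""
    filler = " or ".join(targets)
    if len(targets) > 1:
        filler = f"({filler})"  # assumption: several generalised targets form a union
    restrictions = [f"{prop} only {filler}"]
    if constraint in MANDATORY:
        restrictions.append(f"{prop} some {filler}")
    return restrictions


print(quantifier_restrictions("Objectoftheactivity", "RecYes", ["Relational_Act"]))
```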
3.6 Predicates
Predicates are modelled in OWL with functional object properties. We have created a property for each of the 15 semantic roles defined in PSC (agent, patient, kinship, beneficiary, etc.). Moreover, when restrictions are expressed by notions or semantic units (see subsection 2.4), these need to be mapped back to the corresponding semantic types. Each semantic unit is replaced by its semantic type (e.g. the semantic unit “insetto” is assigned the semantic type “Animal”). As for notions, we have established an equivalent class for each of them (e.g. the equivalent class for the notion “ArgHumanHumanGroup” is “Human_Group OR Human”).
Although semantic predicates are not included in PSC at the ontology level, they are defined in the lexicon. The challenge then consists in establishing generic predicates for the ontology nodes of a predicative nature (the “Event” semantic type and its subclasses) by generalising them from the predicates of the semantic units that belong to these semantic types. Concretely, given a semantic type and a set of predicates (those of the corresponding semantic units), we generalise the selectional restrictions that belong to each of the different predicative semantic roles to one or more semantic types.
A clear parallel can be drawn between this issue and the one introduced above in subsection 3.5, as here too the targets of relationships have to be generalised to semantic types. The difference, however, is that the previous case consisted in finding the corresponding semantic types for a set of semantic units, whereas in this case not only semantic units but also notions need to be translated into semantic types (a selectional restriction can be a semantic type, a semantic unit or a notion). Afterwards, as in subsection 3.5, a quantifier restriction is introduced over each predicative semantic role relation. The target of the restriction is the semantic type(s) resulting from the generalisation of the gathered set of semantic types.
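Putting the two steps together, the sketch below maps each selectional restriction to semantic types (a type maps to itself, a unit to its type, a notion to the members of its equivalent union class), groups the results by semantic role across all predicates of a semantic type, and applies the same ancestor-based generalisation as in subsection 3.5. The tuple encoding, the helper names and the empty hierarchy are assumptions; pooling “guidare” and “ronzare” is purely illustrative, since the two senses need not share a semantic type.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (kind, name) with kind in {"type", "unit", "notion"}: a hypothetical encoding.
Restriction = Tuple[str, str]
Predicate = List[Tuple[str, Restriction]]  # list of (semantic role, restriction)


def resolve(r: Restriction, unit_type: Dict[str, str],
            notion_types: Dict[str, List[str]]) -> List[str]:
    """Map one selectional restriction to the semantic type(s) it denotes."""
    kind, name = r
    if kind == "type":
        return [name]
    if kind == "unit":
        return [unit_type[name]]
    if kind == "notion":
        return list(notion_types[name])
    raise ValueError(f"unknown restriction kind: {kind}")


def generalise(types: List[str], parent: Dict[str, Optional[str]]) -> List[str]:
    """Drop any type that has an ancestor also present (rule of subsection 3.5)."""
    found = set(types)

    def has_ancestor_in_found(t: str) -> bool:
        p = parent.get(t)
        while p is not None:
            if p in found:
                return True
            p = parent.get(p)
        return False

    return sorted(t for t in found if not has_ancestor_in_found(t))


def role_restrictions(predicates: List[Predicate], unit_type: Dict[str, str],
                      notion_types: Dict[str, List[str]],
                      parent: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Generalised target types for each semantic role of a set of predicates."""
    by_role: Dict[str, List[str]] = defaultdict(list)
    for pred in predicates:
        for role, restriction in pred:
            by_role[role].extend(resolve(restriction, unit_type, notion_types))
    return {role: generalise(types, parent) for role, types in by_role.items()}


predicates = [
    [("Agent", ("notion", "ArgHumanHumanGroup")), ("Patient", ("type", "Vehicle"))],
    [("Agent", ("unit", "insetto"))],
]
unit_type = {"insetto": "Animal"}
notion_types = {"ArgHumanHumanGroup": ["Human_Group", "Human"]}

print(role_restrictions(predicates, unit_type, notion_types, parent={}))
```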
3.7 Multidimensionality
As stated in Section 2, an important feature of the PSC model is its ability to capture the different dimensions of word meaning, according to the GL qualia roles (formal, constitutive, agentive and telic). In fact, besides unidimensional nodes, the PSC ontology also contains multidimensional nodes (made up of a formal dimension plus one or more additional qualia dimensions). Therefore, it should be possible to represent this multidimensionality in the resulting OWL ontology. This is modelled in two ways:
– Relations of a qualia type make it possible to identify multidimensional nodes when applied as restrictions to them, i.e. if an ontology class has a mandatory restriction over a qualia object property, then it can be identified as having this qualia role as an additional defining dimension. For example, the class “Artifact_Food” has the additional agentive dimension, as an agentive qualia relation (“Createdby”) is defined as a mandatory restriction for this class.
– We have created an additional OWL class for each of the qualia types, with a necessary and sufficient minimum cardinality restriction of value 1 over the corresponding top qualia property. For example, the telic class (“TelicType”) would have the necessary and sufficient cardinality restriction “hasTelic minimum 1”; thus, every class that has any telic relation as a mandatory necessary condition would be subsumed by it and automatically added as its subclass through inference. This exploits the reasoning capabilities of OWL, allowing us to obtain the taxonomy of the nodes that have any of the qualia dimensions by computing the inferred classification of the OWL ontology.
Figure 2 provides an example of the representation of unified types through inference. “AgentiveType”, “ConstitutiveType” and “TelicType” are the additional classes. As can be seen in the figure, the classes “Building” and “Artifact” are inferred to have additional agentive and telic dimensions, whereas “Food” and “Drink” have an additional telic dimension.
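The entailment the reasoner exploits here is simple: if P is a sub-property of hasTelic, any individual with at least one P-value also has at least one hasTelic-value, so a class with a mandatory restriction on P (asserted or inherited) falls under “hasTelic min 1” and is classified below “TelicType”. The sketch below reproduces only this single inference in plain Python, as a stand-in for a DL reasoner such as FaCT++ or Pellet. The top property names hasAgentive and hasConstitutive are assumed by analogy with “hasTelic”, and the mandatory relation listed for “Food” is an assumption.

```python
from typing import Dict, List, Optional, Set

# Sub-property -> super-property. "hasTelic" appears in the text; the other
# top-property names are assumed by analogy.
PROP_PARENT: Dict[str, Optional[str]] = {
    "Createdby": "hasAgentive",
    "Madeof": "hasConstitutive",
    "Objectoftheactivity": "hasTelic",
    "Usedfor": "hasTelic",
}
QUALIA_CLASS = {"hasAgentive": "AgentiveType",
                "hasConstitutive": "ConstitutiveType",
                "hasTelic": "TelicType"}


def top_property(p: str) -> str:
    """Follow sub-property links up to the top of the property hierarchy."""
    while PROP_PARENT.get(p) is not None:
        p = PROP_PARENT[p]
    return p


def inferred_qualia(cls: str, mandatory: Dict[str, List[str]],
                    class_parent: Dict[str, Optional[str]]) -> Set[str]:
    """Qualia classes that subsume `cls`, given its (inherited) mandatory properties."""
    result: Set[str] = set()
    c: Optional[str] = cls
    while c is not None:  # restrictions are inherited from every superclass
        for p in mandatory.get(c, []):
            top = top_property(p)
            if top in QUALIA_CLASS:
                result.add(QUALIA_CLASS[top])
        c = class_parent.get(c)
    return result


# Properties with a "Yes"/"RecYes" constraint; Artifact_Food follows Table 1
# (Madeof is RecNo, hence absent), the Food entry is an illustrative assumption.
mandatory = {"Artifact_Food": ["Createdby", "Objectoftheactivity"],
             "Food": ["Objectoftheactivity"]}
class_parent: Dict[str, Optional[str]] = {"Artifact_Food": "Food"}

for c in ("Food", "Artifact_Food"):
    print(c, sorted(inferred_qualia(c, mandatory, class_parent)))
```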
Fig. 2. Unified types in the OWL ontology
4 Discussion
This section examines the resulting OWL ontology by introducing an example with the information encoded for one of its classes. In order to visualise the created OWL ontology and check its consistency, we have used the Protégé ontology editor with its OWL plugin² [17], together with two OWL reasoners: FaCT++³ and Pellet⁴ [18].
The top nodes of the resulting ontology are depicted in Figure 3. As can be seen, they consist of the top nodes of the original ontology (see Figure 1) plus additional nodes for the qualia types, which are used to infer the taxonomies for the additional GL dimensions.
Fig. 3. Top nodes of the OWL ontology
Figure 4 shows the information encoded (asserted conditions) in the class “Artifact_Food” of the output ontology. The figure presents two different areas: the upper one includes the necessary conditions specific to the class, whereas the lower one contains the inherited conditions, i.e. those that the current class takes from its superclasses. This figure allows us to compare the encoded constraints with the information present in the corresponding template of the original PSC ontology (see Table 1). For each relation, the resulting ontology shows the corresponding cardinality and quantifier restrictions, the latter including target classes extracted from the lexicon. As for the only feature present in the original template, “PLUS_EDIBLE”, the corresponding minimum and maximum cardinality restrictions are shown in the inherited part of the figure, as the direct superclass (“Food”) also introduces these constraints, and thus there is no need to repeat the same information explicitly for the class “Artifact_Food”.
Fig. 4. Asserted conditions for the class “Artifact_Food” in the resulting OWL ontology
² http://protege.stanford.edu/overview/protege-owl.html
³ http://owl.man.ac.uk/factplusplus/
⁴ http://pellet.owldl.com/
5 Conclusions
This paper has studied the transformation of a lexico-semantic resource based on the GL theory into the Semantic Web ontology language. The paper has described the modelling choices made to convert the different elements of the original resource into OWL. The approach followed has proved successful in formalising the GL ontology in the standard OWL language. The conversion allows the ontology to be processed and checked by standard reasoners. This can be useful for building semantic applications as well as for enhancing the quality of the resource by validating it (through reasoning we can look for inconsistencies or conflicts).
It should be noted that, through the transformations studied for quantifier restrictions and predicates, we obtain a language-independent enriched ontology from language-dependent (Italian) lexico-semantic information.
The result of the current research is a semantically rich ontology with reasoning capabilities, interfaced to a lexicon. Its possible uses are twofold. First, as an OWL ontology, it could be used in Semantic Web applications. Second, because of its semantic richness, it is a valuable resource for semantic Natural Language Processing tasks.
As for future work, this ontology is a key element of broader forthcoming research aimed at guiding automatic lexico-semantic Text Mining and Knowledge Acquisition procedures, which in turn aim to gather knowledge to enrich the PSC computational lexicon.
References
1. Dean, M., Schreiber, G.: OWL web ontology language reference. W3C recommendation, W3C (2004)
2. McGuinness, D.L., van Harmelen, F.: OWL web ontology language overview (2004)
3. Fellbaum, C.: WordNet: An Electronic Lexical Database (Language, Speech, and Communication). The MIT Press (1998)
4. Lenat, D. In: From 2001 to 2001: Common sense and the mind of HAL. MIT Press, Cambridge, MA (1998) 193–208
5. Lenci, A., Bel, N., Busa, F., Calzolari, N., Gola, E., Monachini, M., Ogonowski, A., Peters, I., Peters, W., Ruimy, N., Villegas, M., Zampolli, A.: SIMPLE: A general framework for the development of multilingual lexicons. International Journal of Lexicography 13(4) (2000) 249–263
6. Pease, A., Niles, I., Li, J.: The Suggested Upper Merged Ontology: A large ontology for the semantic web and its applications. In: Working Notes of the AAAI-2002 Workshop on Ontologies and the Semantic Web, Edmonton, Canada (2002)
7. van Assem, M., Gangemi, A., Schreiber, G.: Conversion of WordNet to a standard RDF/OWL representation. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy (2006)
8. Soualmia, L.F., Golbreich, C., Darmoni, S.J.: Representing the MeSH in OWL: Towards a semi-automatic migration. In: KR-MED. (2004) 81–87
9. van Assem, M., Menken, M.R., Schreiber, G., Wielemaker, J., Wielinga, B.: A method for converting thesauri to RDF/OWL. In: Proceedings of the Third International Semantic Web Conference (ISWC’04). Number 3298 in Lecture Notes in Computer Science, Hiroshima, Japan (2004) 17–31
10. Guarino, N., Masolo, C., Vetere, G.: OntoSeek: Content-based access to the Web (1999)
11. Dahlgren, K.: A linguistic ontology. In: International Workshop on Formal Ontology. (1994) 165–173
12. Hirst, G.: Ontology and the lexicon. In Staab, S., Studer, R., eds.: Handbook on Ontologies. International Handbooks on Information Systems. Springer (2004) 209–230
13. Ruimy, N., Corazzari, O., Gola, E., Spanu, A., Calzolari, N., Zampolli, A.: The European LE-PAROLE project: The Italian syntactic lexicon. In: Proceedings of the First International Conference on Language Resources and Evaluation (LREC’98), Granada, Spain (1998)
14. Ruimy, N., Monachini, M., Distante, R., Guazzini, E., Molino, S., Ulivieri, M., Calzolari, N., Zampolli, A.: CLIPS, a multi-level Italian computational lexicon: A glimpse to data. In: Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02), Las Palmas de Gran Canaria, Spain (2002)
15. Pustejovsky, J.: The generative lexicon. Computational Linguistics 17(4) (1991) 409–441
16. Vossen, P.: Introduction to EuroWordNet. Computers and the Humanities 32 (1998) 73–89
17. Knublauch, H., Fergerson, R.W., Noy, N.F., Musen, M.A.: The Protégé OWL plugin: An open development environment for Semantic Web applications. In: Proceedings of the Third International Semantic Web Conference. (2004)
18. Sirin, E., Parsia, B.: Pellet: An OWL DL reasoner. In Haarslev, V., Möller, R., eds.: Description Logics. Volume 104 of CEUR Workshop Proceedings, CEUR-WS.org (2004)