SIMPLE-OWL: a Generative Lexicon Ontology
for NLP and the Semantic Web
Antonio Toral and Monica Monachini
{antonio.toral, monica.monachini}@{ilc.cnr.it}
Istituto di Linguistica Computazionale
Consiglio Nazionale delle Ricerche
Via G. Moruzzi 1 - 56124 Pisa, Italy
Abstract. This research deals with the modelling of a Generative
Lexicon based ontology to be used in the Semantic Web and in Natural
Language Processing semantic tasks. This ontology is imported from an
existing computational Lexical Resource and is converted to the W3C
standard Web Ontology Language. This presents some challenges, for
example the multidimensionality of the original ontology, which are
covered in the current paper. The result of this research is an OWL
compliant, semantically rich and linguistically based ontology, which
is thus useful for the automatic processing of text within the
Semantic Web paradigm.
1 Introduction
The Semantic Web is an evolving extension of the World Wide Web
in which content can be expressed not only in natural language, as is
done nowadays, but also in a formalised way, so that appropriate
software will be able to process the content automatically. In this
new paradigm ontologies will play a central role.
The Web Ontology Language (OWL) is a W3C recommendation and
a major technology for the Semantic Web. It is defined by [1] as “a
semantic markup language for publishing and sharing ontologies on the
World Wide Web”. OWL allows applications to process the content of
information instead of just presenting it to the user [2].
On the other hand, ontologies are recognised as an important component
in Natural Language Processing (NLP) systems that seek to deal with the
semantic level of language. In fact, most, if not all, of the semantic
lexical resources within the area (e.g. WordNet [3], CYC [4], SIMPLE
[5]) have in common the presence of an ontology as a core module.
Besides, there is research in progress on applying ontologies to
semantic NLP, e.g. [6].
The fact that OWL is the ontology language for the Semantic Web and
that it provides a formal semantic representation as well as reasoning
capabilities has encouraged the NLP community to convert existing
resources to this language. Work in this area includes, for example,
the conversion of WordNet [7] and MeSH [8] and, moreover, the proposal
of a general method for converting thesauri [9].
The current paper deals with the modelling of a linguistic ontology
imported from a computational lexicon into OWL. According to [10],
linguistic ontologies offer an immense potential for gathering
information from the Web. Moreover, [11] motivates the choice of a
linguistic-based ontology for Natural Language Processing. However,
this kind of ontology presents some issues and challenges [12] that
need to be taken into account.
The rest of the paper is organised as follows. Section 2 introduces the
computational lexicon used in this research. Next, section 3 covers the
modelling of the ontology in the OWL language. Subsequently, section 4
discusses the results of the modelling through an example. Last, in
section 5 we present some conclusions and future lines of research.
2 The PAROLE-SIMPLE-CLIPS computational
lexicon
SIMPLE [5] is a large-scale project sponsored by the European Union
devoted to the development of wide-coverage, multipurpose and
harmonised computational semantic lexica for twelve European languages
(Catalan, Danish, Dutch, English, Finnish, French, German, Greek,
Italian, Portuguese, Spanish and Swedish). A language-independent
ontology of semantic types and a set of templates were designed and
developed in order to guarantee uniformity and consistency among the
monolingual dictionaries. In the framework of this project, 10,000 word
meanings were annotated for each language.
SIMPLE should be considered as a follow-up to a previous European
project, PAROLE [13], as it adds a semantic layer to a subset of the
morphological and syntactic layers that were developed by the latter.
SIMPLE thus provides multi-layered lexica, as the information is
encoded at different descriptive levels (morphological, syntactic and
semantic). Although the information included for these levels is
mutually independent, the layers are connected by one-to-one,
one-to-many or many-to-one links (e.g. a syntactic unit is linked with
one or more semantic units depending on the number of meanings that
the syntactic entry conveys).
CLIPS is an Italian national project which enlarged and refined the
Italian PAROLE-SIMPLE lexicon [14]. The core data encoded within
SIMPLE was extended in CLIPS with a new set of lexical units selected
from the PAROLE corpus according to frequency-based criteria. The
resulting lexical resource contains 387,267 phonetic units, 53,044
morphological units, 37,406 syntactic units and 28,346 semantic units.
From a theoretical point of view, the linguistic background of
PAROLE-SIMPLE-CLIPS (PSC) is based on the Generative Lexicon (GL)
theory [15]. In the GL, a word sense is viewed as a complex bundle of
orthogonal dimensions that express the multidimensionality of word
meaning. The most important component for representing the lexical
semantics of a word sense is the qualia structure, which consists of
four qualia roles:
- Formal role. Makes it possible to identify an entity.
- Constitutive role. Expresses the constitution of an entity.
- Agentive role. Provides information about the origin of an entity.
- Telic role. Specifies the function of an entity.
Each qualia role can be considered as an independent element or
dimension of the vocabulary for semantic description. The qualia
structure makes it possible to express different or orthogonal aspects
of word sense, whereas a one-dimensional inheritance can only capture
standard hyperonymic relations. Within SIMPLE, the qualia structure
was extended by assigning subtypes to each of the qualia roles (e.g.
“Usedfor” and “Usedby” are subtypes of the telic role).
The elements involved in the semantic description of PSC that are
considered for the current modelling (see section 3) are semantic
types, relations, features, templates and predicates. Each of these is
briefly described in the following subsections.
2.1 Semantic types
The semantic types are the nodes that make up the ontology. A
peculiar trait of the adopted ontology is the fact that it consists of
both simple types, which identify only a one-dimensional aspect of
meaning expressed by hyperonymic relations, and unified types, which
express multidimensional aspects of meaning by combining subtyping
relations and orthogonal semantic dimensions.
The ontology consists of 153 language-independent semantic types. The
top types are mappable to the ontology of EuroWordNet [16]. The design
of the ontology is highly influenced by the GL model. In fact, the top
nodes are the semantic type “Entity” and three other types named after
the agentive, constitutive and telic qualia roles (“Agentive”,
“Constitutive” and “Telic”). These three nodes are designed to include
semantic units definable only in terms of qualia dimensions. The direct
subtypes of the node “Entity” are the semantic types “Concrete Entity”,
“Property”, “Abstract Entity”, “Representation” and “Event”. Figure 1
shows these nodes.
Fig. 1. Top nodes of the PSC ontology
2.2 Relations and features
Relations and features are the elements of PSC that allow attributes
to be assigned to lexicon units (also called semantic units or word
senses). While relations are used to link two semantic units (e.g.
“Usedfor” links “bisturi (scalpel)” to “incidere (engrave)”), features
link a semantic unit to a value within a closed range (e.g. “PLUS
EDIBLE” links “panino (sandwich)” to the value “yes”). Relations and
features can be defined as prototypical within templates (see
subsection 2.3); in this case they act as type-defining for the
semantic units included in these templates.
2.3 Templates
Templates act as blueprints for any given type in the ontology and
provide the conditions of well-formedness and constraints for lexical
items belonging to that type. The template is a help and a guide for
the encoding of information referring to the ontology. The template
structure is built like a schema that works as an interface between the
lexicon and the ontology: it imposes constraints on the membership of a
given semantic unit in a semantic type. A constraint can take one of
the following values:
- Yes. The information is mandatory, i.e. every semantic unit that
  belongs to the semantic type should initialise this property.
- RecYes. The information is mandatory and the cardinality can be
  higher than one, i.e. a semantic unit can be linked to more than one
  element via this property.
- No. The information is optional.
- RecNo. The information is optional and the cardinality can be higher
  than one.
For example, Table 1 shows the constraints present in the template
corresponding to the semantic type “Artifact Food” of the PSC ontology.
Table 1. Template for the semantic type “Artifact Food”
Item                 Type      Constraint value
Createdby            relation  RecYes
Madeof               relation  RecNo
Objectoftheactivity  relation  RecYes
PLUS EDIBLE          feature   Yes
2.4 Predicates
Predicates are assigned to the predicative semantic units (verbs,
deverbal nouns, etc.) of the lexicon. A predicate is made up of a set
of arguments, each of which is linked to a semantic role and to a
selectional restriction. A selectional restriction can be a semantic
type, a semantic unit or a notion, which is a cluster of restrictions
combining features and semantic types. The following examples show the
predicates for the word senses “guidare (drive)” and “ronzare (whirr)”:
- The predicate for “guidare” contains two arguments. The first
  argument has the semantic role “Agent” and has a restriction which
  is a notion called “ArgHumanHumanGroup”. The second argument has the
  semantic role “Patient” and has a restriction which is the semantic
  type “Vehicle”.
- The predicate for “ronzare” is made up of only one argument. It has
  the semantic role “Agent” and a restriction which is the semantic
  unit “insetto (insect)”.
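The structure of a predicate can be made concrete with a small data
model. The following Python sketch is only an illustration: the class
and field names (Restriction, Argument, Predicate, kind, value) are
assumptions introduced here and are not part of PSC. It encodes the two
example predicates exactly as they are described above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Restriction:
    # kind is one of "semantic_type", "semantic_unit" or "notion"
    # (hypothetical labels chosen for this sketch)
    kind: str
    value: str


@dataclass(frozen=True)
class Argument:
    role: str                 # semantic role, e.g. "Agent"
    restriction: Restriction  # selectional restriction of the argument


@dataclass(frozen=True)
class Predicate:
    sense: str                       # word sense the predicate belongs to
    arguments: Tuple[Argument, ...]  # one entry per argument


# "guidare (drive)": an Agent restricted by a notion and a Patient
# restricted by a semantic type.
guidare = Predicate(
    "guidare",
    (
        Argument("Agent", Restriction("notion", "ArgHumanHumanGroup")),
        Argument("Patient", Restriction("semantic_type", "Vehicle")),
    ),
)

# "ronzare (whirr)": a single Agent restricted by a semantic unit.
ronzare = Predicate(
    "ronzare",
    (Argument("Agent", Restriction("semantic_unit", "insetto")),),
)
```

Keeping the kind of restriction explicit is a design choice of this
sketch; it makes it easy to tell which restrictions are already
semantic types and which are semantic units or notions.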
3 Modelling
This section describes the modelling of the ontology of PSC into OWL.
The different relevant aspects of the OWL ontology are treated in the
following subsections.
3.1 Ontology classes
Semantic types, as mentioned above, are the nodes that constitute the
ontology. Therefore, they are modelled in OWL as classes. A
one-dimensional taxonomy (i.e. the GL formal dimension) is created
following the structure of the original PSC ontology. The
incorporation of multidimensionality into the OWL ontology is
considered in 3.7. Finally, all siblings across the class taxonomy are
made disjoint.
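As an illustration of the last step, the following Python sketch
derives the pairs of sibling classes that would be declared disjoint,
using the direct subtypes of “Entity” listed in subsection 2.1. The
function name and the child-to-parent dictionary are assumptions of
this sketch, not the procedure actually used for the conversion.

```python
from itertools import combinations

# Child -> parent links for the direct subtypes of "Entity"
# (class names taken from subsection 2.1).
PARENT_OF = {
    "Concrete Entity": "Entity",
    "Property": "Entity",
    "Abstract Entity": "Entity",
    "Representation": "Entity",
    "Event": "Entity",
}


def sibling_disjointness(parent_of):
    """Return every unordered pair of classes that share a parent."""
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    pairs = []
    for parent in sorted(children):
        # Sorting makes the output deterministic.
        pairs.extend(combinations(sorted(children[parent]), 2))
    return pairs


AXIOMS = sibling_disjointness(PARENT_OF)
```

Each resulting pair would correspond to one disjointness statement
between two sibling classes in the OWL ontology.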
3.2 Object properties
Relations are modelled as object properties. The only exception is the
formal qualia relation “Isa”, which is modelled with the
“rdfs:subClassOf” relation used by OWL. As in the case of the semantic
types, a taxonomy has been built for relations. The top nodes are the
different relation types present in the PSC model, i.e. four types for
the corresponding qualia roles (agentive, constitutive, formal and
telic) and others for non-qualia relations (antonymy, derivational,
metaphor, metonymy, polysemy and synonymy). Domain and range are both
set to the top node of the ontology for non-qualia relations, while for
qualia relations both are set to the ontology class “Entity” and the
class that corresponds to the specific qualia type (“Agentive”,
“Constitutive”, “Formal” or “Telic”).
3.3 Datatype properties
Features are modelled as datatype properties. Unlike relations,
features form a flat taxonomy, i.e. there are no hyperonymy/hyponymy
relationships among them. Template information is used to establish the
domain, which is defined as the union of the classes for which the
feature is defined (e.g. “Food”, “Vegetable”, etc. for the feature
“PLUS EDIBLE”). The range is set to boolean, as so far only features of
this kind have been imported.
3.4 Cardinality restrictions
The application of template constraints to semantic types, as
represented in the templates (see table 1), is modelled with
cardinality restrictions. Each constraint value corresponds to a
different cardinality restriction, as shown in table 2.
Table 2. Mapping template constraints to cardinality restrictions
Template constraint value  OWL cardinality value
Yes                        min 1, max 1
RecYes                     min 1
No                         min 0, max 1
RecNo                      min 0
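The mapping in table 2 can be written down as a simple lookup. The
Python sketch below applies it to the template of table 1. It is an
illustration only: the names Cardinality and restrictions_for are
assumptions, the output strings are informal, and the decision to skip
“min 0” (which does not constrain anything) is a choice made for this
sketch rather than something stated for the actual conversion.

```python
from typing import NamedTuple, Optional


class Cardinality(NamedTuple):
    minimum: int
    maximum: Optional[int]  # None means that no maximum is imposed


# Table 2: template constraint value -> cardinality restriction.
CONSTRAINT_TO_CARDINALITY = {
    "Yes": Cardinality(1, 1),
    "RecYes": Cardinality(1, None),
    "No": Cardinality(0, 1),
    "RecNo": Cardinality(0, None),
}

# Table 1: template for the semantic type "Artifact Food".
ARTIFACT_FOOD_TEMPLATE = {
    "Createdby": "RecYes",
    "Madeof": "RecNo",
    "Objectoftheactivity": "RecYes",
    "PLUS EDIBLE": "Yes",
}


def restrictions_for(template):
    """List the non-trivial cardinality restrictions of a template."""
    result = []
    for prop, constraint in template.items():
        card = CONSTRAINT_TO_CARDINALITY[constraint]
        if card.minimum > 0:  # "min 0" is vacuous and is skipped here
            result.append(f"{prop} min {card.minimum}")
        if card.maximum is not None:
            result.append(f"{prop} max {card.maximum}")
    return result


ARTIFACT_FOOD_RESTRICTIONS = restrictions_for(ARTIFACT_FOOD_TEMPLATE)
```

For the “Artifact Food” template this yields a minimum cardinality of 1
for “Createdby” and “Objectoftheactivity”, both a minimum and a maximum
of 1 for “PLUS EDIBLE”, and nothing for the optional, unbounded
“Madeof”.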
3.5 Quantifier restrictions
Within an ontology, quantifier restrictions make it possible to
establish, for a restriction applied over a property to a source class,
the target class or classes of this restriction. For example, in the
University of Manchester’s pizza ontology (see footnote 1), a
restriction over the property “hasTopping” is applied to the source
class “Pizza” and the target class of this restriction is
“MozzarellaTopping”. There are two different types of quantifiers:
existential (∃) and universal (∀). An existential restriction describes
the set of individuals that, for a given property, have at least one
relationship with individuals that are members of the target class. On
the other hand, a universal restriction describes the set of
individuals that, for a given property, only have relationships with
individuals that are members of the target class.
Although the PSC ontology does not contain the semantic types that are
the target of a given restriction, this information can be extracted
indirectly from the lexicon and, after some generalisation, be used to
enrich the ontology.
Footnote 1: see http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial.pdf
For a given constraint over a relation that belongs to a template, we
extract all the occurrences of the relation in the semantic units that
belong to the template’s semantic type. Each occurrence is made up of a
source semantic unit that belongs to the current semantic type and a
target semantic unit, i.e. it links two semantic units. For example,
the semantic unit “bisturi (scalpel)”, which belongs to the semantic
type “Instrument”, is linked by the relation “Usedby” to the semantic
unit “chirurgo (surgeon)”, which belongs to the semantic type
“Profession”. For each of these occurrences, we extract the semantic
type to which the target semantic unit of the relation belongs. We thus
obtain a list of target semantic types. Afterwards, these are
generalised in the following way: if the list contains a semantic type
and one of its ancestors, then the descendant semantic type is deleted
from the list. For example, there are 47 semantic units in the semantic
type “Food” that instantiate the telic relation “Objectoftheactivity”,
out of which we obtain the target class “Relational Act”.
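The generalisation step can be sketched in a few lines of Python. The
function names and the small hierarchy used here are hypothetical (the
placement of “Profession” under “Human” is invented for the example and
is not taken from the PSC ontology); only the rule itself, deleting a
type when one of its ancestors is also in the list, comes from the
description above.

```python
def ancestors(sem_type, parent_of):
    """Return all ancestors of a type in a single-parent hierarchy."""
    found = set()
    while sem_type in parent_of:
        sem_type = parent_of[sem_type]
        found.add(sem_type)
    return found


def generalise(target_types, parent_of):
    """Drop every type that has one of its ancestors in the same list."""
    targets = set(target_types)
    return {t for t in targets if not (ancestors(t, parent_of) & targets)}


# Hypothetical fragment of a type hierarchy, for illustration only.
EXAMPLE_PARENT_OF = {
    "Profession": "Human",
    "Human": "Concrete Entity",
    "Vehicle": "Concrete Entity",
}

GENERALISED = generalise(["Profession", "Human", "Vehicle"], EXAMPLE_PARENT_OF)
```

In this example “Profession” is removed because its ancestor “Human” is
also a target, while “Human” and “Vehicle” are kept because their common
ancestor does not occur in the list.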
Regarding the quantifier type, we add a universal restriction to all
the constraints, while existential restrictions are only applied to
those constraints of type “Yes” or “RecYes”, as an existential
restriction implies a minimum cardinality greater than zero. Continuing
with the previous example, both an existential and a universal
quantifier restriction would be added for the relation
“Objectoftheactivity”, as its constraint value is “RecYes”.
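The rule for choosing quantifiers can be summarised by the following
Python sketch. The function name is an assumption, and the output is an
informal, Manchester-style rendering (“only” for universal, “some” for
existential) rather than valid OWL syntax. Using the union of all target
classes as the filler of both restrictions is also an assumption of this
sketch.

```python
def quantifier_restrictions(prop, constraint, targets):
    """Return informal restriction strings for one template constraint."""
    filler = " or ".join(sorted(targets))
    # A universal restriction is added for every constraint value.
    axioms = [f"{prop} only ({filler})"]
    # An existential restriction is added only for mandatory constraints.
    if constraint in ("Yes", "RecYes"):
        axioms.append(f"{prop} some ({filler})")
    return axioms


MANDATORY = quantifier_restrictions(
    "Objectoftheactivity", "RecYes", {"Relational Act"}
)
# "X" is a placeholder target class used only for illustration.
OPTIONAL = quantifier_restrictions("Madeof", "RecNo", {"X"})
```

With the example from the text, “Objectoftheactivity” (RecYes) receives
both a universal and an existential restriction, whereas an optional
relation such as “Madeof” (RecNo) receives only the universal one.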
3.6 Predicates
Predicates are modelled in OWL with functional object properties. We
have created a property for each of the 15 semantic roles defined in
PSC (agent, patient, kinship, beneficiary, etc.). Besides, when
restrictions are expressed by notions and semantic units (see
subsection 2.4), these need to be brought back to the corresponding
semantic types. A semantic unit is replaced by its corresponding
semantic type (e.g. the semantic unit “insetto” is assigned the
semantic type “Animal”). Regarding notions, we have established an
equivalent class for each of them (e.g. the equivalent class for the
notion “ArgHumanHumanGroup” is “Human Group OR Human”).
Although semantic predicates are not included in PSC at the ontology
level, they are defined in the lexicon. The challenge then consists in
establishing generic predicates for the nodes of the ontology of a
predicative nature (the “Event” semantic type and its subclasses) by
generalising them from the predicates present for the semantic units
that belong to these semantic types. Concretely, given a semantic type
and a set of predicates (those of the corresponding semantic units), we
generalise the selectional restrictions that belong to each of the
different predicative semantic roles to one or more semantic types.
A clear parallelism can be established between this issue and that
introduced above in 3.5, as here too we have to generalise the target
of relationships to semantic types. The difference, however, is that
the previous case consisted in finding the corresponding semantic types
for a set of semantic units, whereas in this case not only semantic
units but also notions need to be translated into semantic types (a
selectional restriction can be a semantic type, a semantic unit or a
notion). Afterwards, as in 3.5, a quantifier restriction is introduced
over each predicative semantic role relation. The target of the
restriction is the semantic type or types resulting from the
generalisation of the gathered set of semantic types.
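A minimal Python sketch of this procedure is given below, under several
assumptions. The function names are invented for the illustration; the
hierarchy placing “Animal” and “Human” under “Living Entity” and the
third predicate are hypothetical; and grouping the predicates of
“guidare” and “ronzare” under one semantic type is done only to keep the
example small. The translation of “insetto” to “Animal” and of
“ArgHumanHumanGroup” to “Human Group” or “Human” follows the text.

```python
def ancestors(sem_type, parent_of):
    """Return all ancestors of a type in a single-parent hierarchy."""
    found = set()
    while sem_type in parent_of:
        sem_type = parent_of[sem_type]
        found.add(sem_type)
    return found


def to_types(kind, value, unit_type, notion_types):
    """Translate one selectional restriction into semantic types."""
    if kind == "type":
        return {value}
    if kind == "unit":
        return {unit_type[value]}
    if kind == "notion":
        return set(notion_types[value])
    raise ValueError(f"unknown restriction kind: {kind}")


def generalise_roles(predicates, unit_type, notion_types, parent_of):
    """Gather the target types per semantic role, then drop descendants."""
    gathered = {}
    for predicate in predicates:
        for role, kind, value in predicate:
            types = to_types(kind, value, unit_type, notion_types)
            gathered.setdefault(role, set()).update(types)
    return {
        role: {t for t in types if not (ancestors(t, parent_of) & types)}
        for role, types in gathered.items()
    }


UNIT_TYPE = {"insetto": "Animal"}
NOTION_TYPES = {"ArgHumanHumanGroup": {"Human Group", "Human"}}
# Hypothetical hierarchy fragment, for illustration only.
PARENT_OF = {"Animal": "Living Entity", "Human": "Living Entity"}

PREDICATES = [
    [("Agent", "notion", "ArgHumanHumanGroup"), ("Patient", "type", "Vehicle")],
    [("Agent", "unit", "insetto")],
    [("Agent", "type", "Living Entity")],  # hypothetical third predicate
]

ROLE_TARGETS = generalise_roles(PREDICATES, UNIT_TYPE, NOTION_TYPES, PARENT_OF)
```

Here the “Agent” role ends up with the targets “Human Group” and
“Living Entity” (“Human” and “Animal” are absorbed by their ancestor),
and the “Patient” role keeps the single target “Vehicle”.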
3.7 Multidimensionality
As previously stated in section 2, an important feature of the PSC
model is its ability to capture the different dimensions of word
meaning, according to the GL qualia roles (formal, constitutive,
agentive and telic). In fact, the PSC ontology, besides containing
unidimensional nodes, also contains nodes which are multidimensional
(they are made up of a formal dimension but also of one or more
additional qualia dimensions). Therefore, it should be possible to
represent this multidimensionality in the resulting OWL ontology. This
is modelled in two ways:
- Relations of a qualia type make it possible to identify
  multidimensional nodes when applied as restrictions to them, i.e. if
  an ontology class has a mandatory restriction over a qualia object
  property, then we can identify it as having this qualia role as an
  additional defining dimension. For example, the class “Artifact
  Food” has the additional agentive dimension, as an agentive qualia
  relation (“Createdby”) is defined as a mandatory restriction for
  this class.
- We have created an additional OWL class for each of the qualia types
  with a necessary and sufficient minimum cardinality restriction with
  value 1 over the corresponding top qualia property. For example, the
  telic class (“TelicType”) would have the necessary and sufficient
  cardinality restriction “hasTelic minimum 1” and thus all classes
  that have any telic relation as a mandatory (necessary) constraint
  would be subsumed and automatically added as subclasses through
  inference. This exploits the reasoning capabilities of OWL, allowing
  us to obtain the taxonomy of nodes that have any of the qualia
  dimensions by calculating the inferred classified taxonomy of the
  OWL ontology.
Figure 2 provides an example of the representation of unified types
through inference. “AgentiveType”, “ConstitutiveType” and “TelicType”
are the additional classes. As can be seen in the figure, the classes
“Building” and “Artifact” are inferred to have additional agentive and
telic dimensions, whereas “Food” and “Drink” contain an additional
telic dimension.
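The effect of this classification can be imitated outside a reasoner.
The Python sketch below does not perform OWL inference; it only mimics
the expected outcome under stated assumptions. The property names
“hasAgentive” and “hasConstitutive” are formed by analogy with
“hasTelic”, and the mandatory restriction of “Food” over
“Objectoftheactivity” is assumed for the example.

```python
# Top qualia property -> additional class defined over it.
QUALIA_CLASS = {
    "hasAgentive": "AgentiveType",
    "hasConstitutive": "ConstitutiveType",
    "hasTelic": "TelicType",
}

# Relation -> top property of its qualia taxonomy.
TOP_PROPERTY = {
    "Createdby": "hasAgentive",
    "Objectoftheactivity": "hasTelic",
}

# Class -> relations with a mandatory (minimum 1) restriction (assumed).
MANDATORY = {
    "Food": ["Objectoftheactivity"],
    "Artifact Food": ["Createdby", "Objectoftheactivity"],
}

PARENT_OF = {"Artifact Food": "Food"}


def inferred_dimensions(cls, mandatory, parent_of, top_property):
    """Collect the qualia classes a class would be classified under."""
    dimensions = set()
    current = cls
    while current is not None:
        # Restrictions of superclasses are inherited by the class.
        for prop in mandatory.get(current, ()):
            top = top_property.get(prop)
            if top in QUALIA_CLASS:
                dimensions.add(QUALIA_CLASS[top])
        current = parent_of.get(current)
    return dimensions


FOOD_DIMS = inferred_dimensions("Food", MANDATORY, PARENT_OF, TOP_PROPERTY)
ARTIFACT_FOOD_DIMS = inferred_dimensions(
    "Artifact Food", MANDATORY, PARENT_OF, TOP_PROPERTY
)
```

Under these assumptions “Food” is placed under “TelicType”, in line with
Figure 2, and “Artifact Food” additionally falls under “AgentiveType”
because of its mandatory “Createdby” restriction.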
Fig. 2. Unified types in the OWL ontology
4 Discussion
This section examines the resulting OWL ontology by introducing an
example with the information encoded for a class of this ontology. In
order to visualise and check the consistency of the created OWL
ontology we have used the Protégé ontology editor with its OWL plugin
(see footnote 2) [17] together with two OWL reasoners: FaCT++ (see
footnote 3) and Pellet (see footnote 4) [18].
The top nodes of the resulting ontology are depicted in figure 3. As
can be seen, it is made up of the top nodes of the original ontology
(see figure 1) plus additional nodes for the qualia types, used to
infer the taxonomies for the additional GL dimensions.
Fig. 3. Top nodes of the OWL ontology
Figure 4 shows the information encoded (asserted conditions) in the
class “Artifact Food” of the output ontology. The figure presents two
different areas: the upper one includes the necessary conditions, those
specific to the class, whereas in the lower part we find the inherited
conditions, those that the current class takes from its superclasses by
inheritance. This picture allows us to compare the encoded constraints
with the information present in the corresponding template of the
original PSC ontology (see table 1). For each relation we can see in
the resulting ontology the corresponding cardinality and quantifier
restrictions, the latter including target classes extracted from the
lexicon. Regarding the only feature present in the original template,
“PLUS EDIBLE”, the corresponding minimum and maximum cardinality
restrictions are shown in the inherited part of the figure, as the
direct superclass (“Food”) also introduces these constraints and thus
there is no need to explicitly repeat the same information for the
class “Artifact Food”.
Fig. 4. Asserted conditions for the class “Artifact Food” in the resulting OWL ontology
Footnote 2: http://protege.stanford.edu/overview/protege-owl.html
Footnote 3: http://owl.man.ac.uk/factplusplus/
Footnote 4: http://pellet.owldl.com/
5 Conclusions
This paper has studied the transformation of a lexico-semantic resource
based on the GL theory into the Semantic Web ontology language. The
paper has described the modelling choices made to convert different
elements of the original resource into OWL. The approach followed has
proved successful in formalising the GL ontology in the standard OWL
language. The conversion allows the ontology to be processed and
checked by standard reasoners. This can be useful for building semantic
applications as well as for enhancing the quality of the resource by
validating it (through reasoning we can look for inconsistencies or
conflicts).
It should be noted that, through the transformation studied regarding
quantifier restrictions and predicates, we obtain a
language-independent enriched ontology from language-dependent
(Italian) lexico-semantic information.
The result of the current research is a semantically rich ontology with
reasoning capabilities interfaced to a lexicon. Its possible uses are
twofold. First, as it is an OWL ontology, it could be used in Semantic
Web applications. Second, because of its semantic richness, it is a
valuable resource for semantic Natural Language Processing tasks.
As for future work, this ontology is a key element of a broader
forthcoming line of research aimed at guiding automatic lexico-semantic
Text Mining and Knowledge Acquisition procedures which, in turn, have
the goal of gathering knowledge to enrich the PSC computational
lexicon.
References
1. Dean, M., Schreiber, G.: OWL Web Ontology Language reference.
W3C recommendation, W3C (2004)
2. McGuinness, D.L., van Harmelen, F.: OWL Web Ontology Language
overview (2004)
3. Fellbaum, C.: WordNet: An Electronic Lexical Database (Language,
Speech, and Communication). The MIT Press (1998)
4. Lenat, D. In: From 2001 to 2001: Common sense and the mind of
HAL. MIT Press, Cambridge, MA (1998) 193–208
5. Lenci, A., Bel, N., Busa, F., Calzolari, N., Gola, E., Monachini, M.,
Ogonowski, A., Peters, I., Peters, W., Ruimy, N., Villegas, M.,
Zampolli, A.: SIMPLE: A general framework for the development of
multilingual lexicons. International Journal of Lexicography 13(4)
(2000) 249–263
6. Pease, A., Niles, I., Li, J.: The Suggested Upper Merged Ontology: A
large ontology for the semantic web and its applications. In: Working
Notes of the AAAI-2002 Workshop on Ontologies and the Semantic
Web, Edmonton, Canada (2002)
7. van Assem, M., Gangemi, A., Schreiber, G.: Conversion of WordNet
to a standard RDF/OWL representation. In: Proceedings of the
Fifth International Conference on Language Resources and Evaluation
(LREC’06), Genoa, Italy (2006)
8. Soualmia, L.F., Golbreich, C., Darmoni, S.J.: Representing the MeSH
in OWL: Towards a semi-automatic migration. In: KR-MED. (2004)
81–87
9. van Assem, M., Menken, M.R., Schreiber, G., Wielemaker, J.,
Wielinga, B.: A method for converting thesauri to RDF/OWL. In:
Proceedings of the Third International Semantic Web Conference
(ISWC’04). Number 3298 in Lecture Notes in Computer Science,
Hiroshima, Japan (2004) 17–31
10. Guarino, N., Masolo, C., Vetere, G.: OntoSeek: Content-based access
to the web (1999)
11. Dahlgren, K.: A linguistic ontology. In: International Workshop on
Formal Ontology. (1994) 165–173
12. Hirst, G.: Ontology and the lexicon. In Staab, S., Studer, R., eds.:
Handbook on Ontologies. International Handbooks on Information
Systems. Springer (2004) 209–230
13. Ruimy, N., Corazzari, O., Gola, E., Spanu, A., Calzolari, N.,
Zampolli, A.: The European LE-PAROLE project: The Italian syntactic
lexicon. In: Proceedings of the First International Conference on
Language Resources and Evaluation (LREC’98), Granada, Spain (1998)
14. Ruimy, N., Monachini, M., Distante, R., Guazzini, E., Molino, S.,
Ulivieri, M., Calzolari, N., Zampolli, A.: CLIPS, a multi-level Italian
computational lexicon: A glimpse to data. In: Proceedings of the
Third International Conference on Language Resources and Evaluation
(LREC’02), Las Palmas de Gran Canaria, Spain (2002)
15. Pustejovsky, J.: The generative lexicon. Computational Linguistics
17(4) (1991) 409–441
16. Vossen, P.: Introduction to EuroWordNet. Computers and the
Humanities 32 (1998) 73–89
17. Knublauch, H., Fergerson, R.W., Noy, N.F., Musen, M.A.: The
Protégé OWL plugin: An open development environment for semantic web
applications. In: Proceedings of the Third International Semantic
Web Conference. (2004)
18. Sirin, E., Parsia, B.: Pellet: An OWL DL reasoner. In Haarslev, V.,
Möller, R., eds.: Description Logics. Volume 104 of CEUR Workshop
Proceedings., CEUR-WS.org (2004)
... We decided to test this hypothesis by studying the conversion of the SIMPLE ontology 4 into the formal language OWL [22]. The SIMPLE ontology is not defined in a formal language, but contains cardinal semantic constraints regarding relations and features encoded in a systematic way. ...
Chapter
This contribution, aims at highlighting the strong interconnection between lexicons, terminologies and ontologies and especially the fundamental role that ontologies and lexica mutually play. Our view is that lexical resources are evolving in nature, from ontologically based lexicons we are going towards lexically based ontologies. We explore different instantiations of the current trend of using formal ontologies as a core module of computational lexicons, presenting the advantages especially in multilingual and terminological contexts. We present work showing that the lexical knowledge already present in non formal computational lexicons can be exploited to derive or enrich a formal ontology without much manual effort. In the terminology domain, we describe the construction of a resource for biology, directly linked to a parallel domain-ontology, that combines characteristics of both lexicons and terminologies, so that is can allow for intelligent access to content. Finally, we describe our experience in two projects in which formal ontologies play a central role in the context of multilingual computational lexicons, where the ontology is what acts as the glue among the different monolingual lexicons and what provides cross-lingual reasoning capabilities.
... The increasing importance of formal ontologies in LRs together with the availability of high quality and broad coverage, but non formal, computational lexicons developed during the last decades lead us to the question: can the lexical knowledge contained in these LRs be exploited to derive formal knowledge? We decided to test this hypothesis by studying the conversion of the SIM-PLE ontology 1 into the formal language OWL [14]. The SIMPLE ontology is not defined in a formal language, but contains cardinal semantic constraints regarding relations and features encoded in a systematic way. ...
Article
Full-text available
This paper deals with the relations between ontologies and lexicons. We study the role of these two components and their evolution during the last years in the fleld of Computational Linguistics. Subse- quently, we survey the current lines of research at ILC-CNR which tackle this topic. They involve (I) the reuse of already existing Lexical Resources to derive formal ontologies, (II) the conversion and combination of ter- minologies into rich and formal Lexical Resources and (III) the use of formal ontologies as the backbone of multilingual Lexical Resources.
... The following subsections briefly describe the elements of PSC that have been considered for the procedures treated in section 3. and how they are formalised in OWL. For a comprehensive explanation of this formalisation, refer to (Toral and Monachini, 2007), in which we analyse in detail this issue. ...
Article
Full-text available
This paper describes the automatic transformation of a Generative Lexicon (GL) based Ontology into OWL, the Semantic Web ontology language. Furthermore, the OWL ontology is automatically enriched by means of a bottom-up procedure that extracts additional semantic information (relationships, features, predicates and quantifier restrictions) from the lexicon. The contribution of this research is two-fold. On one hand, we introduce a methodology for the formalisation of GL ontologies. On the other, we have developed automatic procedures that bring out a formalised, reasoning-capable, and semantically rich ontology, thus suitable for Natural Language Processing semantic tasks.
Article
Full-text available
As the interest of the Semantic Web and computational linguistics communities in linguistic linked data (LLD) keeps increasing and the number of contributions that dwell on LLD rapidly grows, scholars (and linguists in particular) interested in the development of LLD resources sometimes find it difficult to determine which mechanism is suitable for their needs and which challenges have already been addressed. This review seeks to present the state of the art on the models, ontologies and their extensions to represent language resources as LLD by focusing on the nature of the linguistic content they aim to encode. Four basic groups of models are distinguished in this work: models to represent the main elements of lexical resources (group 1), vocabularies developed as extensions to models in group 1 and ontologies that provide more granularity on specific levels of linguistic analysis (group 2), catalogues of linguistic data categories (group 3) and other models such as corpora models or service-oriented ones (group 4). Contributions encompassed in these four groups are described, highlighting their reuse by the community and the modelling challenges that are still to be faced.
Article
This paper describes the publication and linking of (parts of) PAROLE SIMPLE CLIPS (PSC), a large scale Italian lexicon, to the Semantic Web and the Linked Data cloud using the lemon model. The main challenge of the conversion is discussed, namely the reconciliation between the PSC semantic structure which contains richly encoded semantic information, following the qualia structure of the Generative Lexicon theory and the lemon view of lexical sense as a reified pairing of a lexical item and a concept in an ontology. The result is two datasets: one consists of a list of lemon lexical entries with their lexical properties, relations and senses; the other consists of a list of OWL individuals representing the referents for the lexical senses. These OWL individuals are linked to each other by a set of semantic relations and mapped onto the SIMPLE OWL ontology of higher level semantic types.
Conference Paper
At the end of 2009, a review of a number of available Web services implementing linguistic processing chains (CLARIN deliverable D5R-3a, 2009) was prepared as part of Common Language Resources and Technology Infrastructure (CLARIN) Working Group 5.6 (LRT integration) activities. Based on the showcases contributed by WG members, a summary of features of both chained and individual Web services was compiled, preparing the ground for comparisons between selected linguistic properties of registered frameworks. The article aims at presenting preliminary generalizations regarding functionalities, communication standards and representation of linguistic resources being adopted as Web services, which were initially put forward in the CLARIN paper. The major features of the tools are summarized to provide a starting point for discussion of interchange formats and tagsets, standards of encoding of linguistic resources and linguistic data categories. Apart from concentrating on representation of linguistic annotation, very preliminary conclusions concern technical, formal and semantic interoperability of language resources.
Article
In this paper we discuss the development and application of a large formal ontology to the semantic web. The Suggested Upper Merged Ontology (SUMO) (Niles & Pease, 2001) (SUMO, 2002) is a "starter document" in the IEEE Standard Upper Ontology effort. This upper ontology is extremely broad in scope and can serve as a semantic foundation for search, interoperation, and communication on the semantic web.
Article
CLIPS is a multi-layered Italian computational lexicon based on the PAROLE-SIMPLE model. In this paper we briefly recall the main characteristics of the model and devote our attention to issues emerging from the encoding of large quantities of data, especially in relation to those types of syntactic and semantic information specific to our lexicon and that reflect innovative features of the underlying model. At the syntactic level, we show how alternating structures may be encoded in a linguistically more elegant way by using framesets. We illustrate the connection between syntactic and semantic information, and show how the SIMPLE Italian lexicon approach to predicate selection has been refined in CLIPS. At the semantic level, we illustrate the richness of information types encoded in a word sense description and the way such a wealth of data can be exploited. We stress in particular the expressive power of the Extended Qualia Structure, while also mentioning some of its problematic aspects. We show that queries on qualia relations make it possible to retrieve lexical collocates and to extract domain-specific information and semantic networks, and that they help interpret modifying PPs in complex nominals. Finally, we show that features, which cut across the type hierarchy, have a stronger expressive power than semantic types in identifying selectional preferences.
Article
This paper presents an overview of the work in progress at the W3C to produce a standard conversion of WordNet to the RDF/OWL representation language in use in the Semantic Web community. Such a standard representation is useful to provide application developers a high-quality resource and to promote interoperability. Important requirements in this conversion process are that it should be complete and should stay close to WordNet's conceptual model. The paper explains the steps taken to produce the conversion and details design decisions such as the composition of the class hierarchy and properties, the addition of suitable OWL semantics and the chosen format of the URIs. Additional topics include a strategy to incorporate OWL and RDFS semantics in one schema such that both RDF(S) infrastructure and OWL infrastructure can interpret the information correctly, problems encountered in understanding the Prolog source files and the description of the two versions that are provided (Basic and Full) to accommodate different usages of WordNet.
Conference Paper
This paper describes a method for converting existing thesauri and related resources from their native format to RDF(S) and OWL. The method identifies four steps in the conversion process. In each step, decisions have to be taken with respect to the syntax or semantics of the resulting representation. Each step is supported through a number of guidelines. The method is illustrated through conversions of two large thesauri: MeSH and WordNet.
Conference Paper
We introduce the OWL Plugin, a Semantic Web extension of the Protégé ontology development platform. The OWL Plugin can be used to edit ontologies in the Web Ontology Language (OWL), to access description logic reasoners, and to acquire instances for semantic markup. In many of these features, the OWL Plugin has created and facilitated new practices for building Semantic Web contents, often driven by the needs of and feedback from our users. Furthermore, Protégé's flexible open-source platform means that it is easy to integrate custom-tailored components to build real-world applications. This document describes the architecture of the OWL Plugin, walks through its most important features, and discusses some of our design decisions.
Book
With a preface by George Miller. WordNet, an electronic lexical database, is considered to be the most important resource available to researchers in computational linguistics, text analysis, and many related areas. Its design is inspired by current psycholinguistic and computational theories of human lexical memory. English nouns, verbs, adjectives, and adverbs are organized into synonym sets, each representing one underlying lexicalized concept. Different relations link the synonym sets. The purpose of this volume is twofold. First, it discusses the design of WordNet and the theoretical motivations behind it. Second, it provides a survey of representative applications, including word sense identification, information retrieval, selectional preferences of verbs, and lexical chains. Contributors: Reem Al-Halimi, Robert C. Berwick, J. F. M. Burg, Martin Chodorow, Christiane Fellbaum, Joachim Grabowski, Sanda Harabagiu, Marti A. Hearst, Graeme Hirst, Douglas A. Jones, Rick Kazman, Karen T. Kohl, Shari Landes, Claudia Leacock, George A. Miller, Katherine J. Miller, Dan Moldovan, Naoyuki Nomura, Uta Priss, Philip Resnik, David St-Onge, Randee Tengi, Reind P. van de Riet, Ellen Voorhees.
Article
This paper gives a global introduction to the aims and objectives of the EuroWordNet project, and it provides a general framework for the other papers in this volume. EuroWordNet is an EC project that develops a multilingual database with wordnets in several European languages, structured along the same lines as the Princeton WordNet. Each wordnet represents an autonomous structure of language-specific lexicalizations, which are interconnected via an Inter-Lingual-Index. The wordnets are built at different sites from existing resources, starting from a shared level of basic concepts and extended top-down. The results will be publicly available and will be tested in cross-language information retrieval applications.