All content in this area was uploaded by Roberto Saia on Jun 06, 2016
Introducing a Weighted Ontology to Improve the
Graph-based Semantic Similarity Measures
Roberto Saia, Ludovico Boratto, Salvatore Carta
Dip.to di Matematica e Informatica, Università di Cagliari
Via Ospedale 72 - 09124 Cagliari, Italy
Email: {roberto.saia, ludovico.boratto, salvatore}@unica.it
Abstract—The semantic similarity measures are designed to
compare terms that belong to the same ontology. Many of these are
based on a graph structure, such as the well-known lexical database
for the English language, named WordNet, which groups the words
into sets of synonyms called synsets. Each synset represents a unique
vertex of the WordNet semantic graph, through which it is possible
to obtain information about the relations between the different synsets.
The literature shows several ways to determine the similarity between
words or sentences through WordNet (e.g., by measuring the distance
among the words, or by counting the number of edges between
the corresponding synsets), but almost none of them takes into
account the specific characteristics of the dataset in use. In some contexts
this strategy could lead to poor results, because it considers only
the relationships between vertices of the WordNet semantic graph,
without giving them a different weight based on the frequency of the synsets
within the considered dataset. In other words, common synsets and
rare synsets are valued equally. This could create problems in some
applications, such as those of recommender systems, where WordNet
is exploited to evaluate the semantic similarity between the textual
descriptions of the items positively evaluated by the users, and the
descriptions of the other ones not evaluated yet. In this context, we
need to identify the user preferences as accurately as possible; if we do
not take into account the synset frequency, we risk not recommending
certain items to the users, since the semantic similarity generated
by the most common synsets present in the descriptions of other
items could prevail. This work addresses this problem by introducing
a novel criterion for evaluating the similarity between words (and
sentences) that exploits the WordNet semantic graph, adding to it the
weight information of the synsets. The effectiveness of the proposed
strategy is verified in the recommender systems context, where the
recommendations are generated on the basis of the semantic similarity
between the items stored in the user profiles, and the items not
evaluated yet.
Index Terms—Semantic Graph, Semantic Analysis, Ontology,
Graph Theory, Metrics
I. INTRODUCTION
The use of semantic similarity measures [1] has
spread over the past decades. This is related to the
advent of the so-called Semantic Web [2], as well as, more
generally, to the need to interpret users' preferences in
a non-schematic mode, in order to understand the concepts
connected with a text, instead of using the single terms
detached from the concepts that they express.
When a metric is used to determine the
level of semantic similarity between concepts, it is assumed
that this takes place within a specific ontology [3], [4], related
to the terms used in the operating environment. The level
of similarity between two or more terms is usually determined
by measuring their distance within an ontology. The main
objective of these semantic operations is to provide a standard
(and unsupervised) approach to evaluating information.
This evaluation is crucial in many environments, such
as the commercial ones that provide forms of personalization
and have to interpret the preferences of the users [5], or the
medical applications that have to analyze the medical reports
automatically [6].
Many approaches map the terms of an ontology by exploiting
a graph structure, such as WordNet¹, the widespread resource
considered in this work, which is a semantic graph where each
vertex represents a distinct set of synonyms called a synset (i.e.,
a set of words that denote the same concept). The WordNet
graph is a Directed Acyclic Graph (DAG), where each vertex
v is an integer that identifies a synset, and each directed edge
that connects v with w denotes that w is a hypernym of v.
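The graph structure just described can be illustrated with a minimal Python sketch. The synset identifiers, labels, and helper names (`LABELS`, `HYPERNYMS`, `hypernym_closure`, `is_acyclic`) are hypothetical and chosen only for illustration; they are not real WordNet identifiers, and the code does not use any WordNet library.

```python
from collections import deque

# Hypothetical toy data: integer synset ids and labels (not real WordNet ids).
LABELS = {0: "entity", 1: "animal", 2: "dog", 3: "cat", 4: "vehicle"}

# Directed edges v -> w, meaning "w is a hypernym of v".
HYPERNYMS = {0: [], 1: [0], 2: [1], 3: [1], 4: [0]}


def hypernym_closure(v, graph):
    """Return every synset reachable from v by following hypernym edges."""
    seen = set()
    queue = deque(graph[v])
    while queue:
        w = queue.popleft()
        if w not in seen:
            seen.add(w)
            queue.extend(graph[w])
    return seen


def is_acyclic(graph):
    """Check the DAG property with Kahn's algorithm (topological sort)."""
    indegree = {v: 0 for v in graph}
    for targets in graph.values():
        for w in targets:
            indegree[w] += 1
    ready = deque(v for v, d in indegree.items() if d == 0)
    visited = 0
    while ready:
        v = ready.popleft()
        visited += 1
        for w in graph[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    # Every vertex is visited only if the graph contains no cycle.
    return visited == len(graph)
```

In this toy graph, the closure of "dog" contains "animal" and "entity", which mirrors how following hypernym edges moves from specific to more general concepts.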
The literature proposes several approaches able to evaluate
the semantic similarity among concepts, e.g., Jiang and Conrath [7],
Leacock and Chodorow [8], Lin [9], Resnik [10], and
Wu and Palmer [11]. Some of them exploit graph structures
such as WordNet, and are based on the measure of the
shortest path length between vertices (synsets). A limitation of
these approaches is that they consider only the relationships
between vertices of the WordNet semantic graph, without
giving them a different weight based on the frequency of the synsets
within the considered dataset (i.e., common synsets and rare
synsets are valued equally). This could create problems in
some contexts where it is important to take into account
the synset frequency, such as recommender systems,
where the semantic similarity generated by the items with
the most common synsets in their descriptions could prevent the
recommendation of other relevant items with rare synsets.
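The limitation can be made concrete with a short Python sketch under stated assumptions. The score 1 / (1 + d), where d is the shortest path length, is a simple edge-counting form chosen here for illustration and is not one of the cited measures reproduced exactly. The `inverse_frequency` helper uses an assumed IDF-style formula, log(N / df), only to show what a frequency-based weight could look like; it is not the weighting defined in this paper. All identifiers and data are hypothetical.

```python
import math
from collections import Counter, deque

# Hypothetical hypernym edges v -> w ("w is a hypernym of v"), integer synset ids.
HYPERNYMS = {0: [], 1: [0], 2: [1], 3: [1], 4: [0]}


def shortest_path_length(a, b, graph):
    """Count edges on the shortest path between a and b, ignoring direction."""
    neighbours = {v: set(ws) for v, ws in graph.items()}
    for v, ws in graph.items():
        for w in ws:
            neighbours[w].add(v)
    dist = {a: 0}
    queue = deque([a])
    while queue:
        v = queue.popleft()
        if v == b:
            return dist[v]
        for w in neighbours[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None  # the two synsets are not connected


def path_similarity(a, b, graph):
    """Edge-counting similarity (assumed form): 1 / (1 + shortest path length)."""
    d = shortest_path_length(a, b, graph)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def inverse_frequency(descriptions):
    """Assumed IDF-style weight: log(N / number of descriptions with the synset)."""
    n = len(descriptions)
    doc_freq = Counter(s for d in descriptions for s in set(d))
    return {s: math.log(n / df) for s, df in doc_freq.items()}
```

Note that `path_similarity` depends only on the graph: it returns the same value for a pair of synsets whether they occur in every item description or in a single one. A frequency-based weight such as the assumed one above distinguishes the two cases, assigning a synset that appears in every description a weight of zero.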
In this work, we present a strategy aimed at evaluating the
semantic similarity between words or sentences, which introduces
a novel way to define and use the ontology of synsets used
to build the WordNet semantic graph. The proposed approach,
instead of a DAG, uses a Weighted Graph (WG) [12],
in order to introduce the weight of the synsets on the edges,
which is calculated through an inverse frequency criterion.
The new WordNet weighted graph makes it possible to
characterize the operating context, attributing more importance
to some terms, and less to others, during the computation
of the semantic similarity. There are many contexts where
¹ http://wordnet.princeton.edu/