Introducing a Weighted Ontology to Improve the
Graph-based Semantic Similarity Measures
Roberto Saia, Ludovico Boratto, Salvatore Carta
Dip.to di Matematica e Informatica, Università di Cagliari
Via Ospedale 72 - 09124 Cagliari, Italy
Email: {roberto.saia, ludovico.boratto, salvatore}@unica.it
Abstract: Semantic similarity measures are designed to compare terms that belong to the same ontology. Many of them are based on a graph structure, such as the well-known lexical database for the English language named WordNet, which groups words into sets of synonyms called synsets. Each synset represents a unique vertex of the WordNet semantic graph, through which it is possible to obtain information about the relations between the different synsets. The literature offers several ways to determine the similarity between words or sentences through WordNet (e.g., by measuring the distance among the words, or by counting the number of edges between the corresponding synsets), but almost none of them takes into account the peculiar aspects of the dataset in use. In some contexts this strategy can lead to poor results, because it considers only the relationships between vertexes of the WordNet semantic graph, without assigning them different weights based on the frequency of the synsets within the considered dataset. In other words, common synsets and rare synsets are valued equally. This can create problems in some applications, such as recommender systems, where WordNet is exploited to evaluate the semantic similarity between the textual descriptions of the items positively evaluated by the users and the descriptions of the items not evaluated yet. In this context, we need to identify the user preferences as accurately as possible: if we do not take the synset frequency into account, we risk not recommending certain items to the users, since the semantic similarity generated by the most common synsets present in the descriptions of other items could prevail. This work faces this problem by introducing a novel criterion for evaluating the similarity between words (and sentences) that exploits the WordNet semantic graph, adding to it the weight information of the synsets. The effectiveness of the proposed strategy is verified in the recommender systems context, where the recommendations are generated on the basis of the semantic similarity between the items stored in the user profiles and the items not evaluated yet.
Index Terms: Semantic Graph, Semantic Analysis, Ontology, Graph Theory, Metrics
I. INTRODUCTION
The use of semantic similarity measures [1] has spread over the past decades. This growth is related to the advent of the so-called Semantic Web [2] and, more generally, to the need to interpret user preferences in a non-schematic way, i.e., to understand the concepts conveyed by a text instead of relying on its single terms, detached from the concepts that they express.
When a metric is used to determine the level of semantic similarity between concepts, it is assumed that this takes place within a specific ontology [3], [4], related to the terms used in the operating environment. The similarity between two or more terms is usually computed by measuring their distance within the ontology. The main objective of these semantic operations is to provide a standard (and unsupervised) approach to the evaluation of information. This evaluation is crucial in many environments, such as commercial ones that provide forms of personalization and have to interpret the preferences of the users [5], or medical applications that have to analyze medical reports automatically [6].
Many approaches map the terms of an ontology onto a graph structure, such as WordNet1, the widespread approach considered in this work. WordNet is a semantic graph where each vertex represents a distinct set of synonyms, called a synset (i.e., a set of words that denote the same concept). The WordNet graph is a Directed Acyclic Graph (DAG), where each vertex v is an integer that identifies a synset, and each directed edge that connects v with w denotes that w is a hypernym of v.
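As a concrete illustration (not the paper's code), a tiny hypernym DAG of this kind can be modeled directly: each vertex is a synset identifier, and each directed edge points from a synset to one of its hypernyms. The fragment below is a hypothetical hand-built example, not real WordNet data.

```python
# Toy fragment of a hypernym DAG in the style of WordNet:
# each key is a synset id, each value lists its hypernyms
# (edges point from the more specific to the more general concept).
HYPERNYMS = {
    "dog.n.01": ["canine.n.02"],
    "canine.n.02": ["carnivore.n.01"],
    "cat.n.01": ["feline.n.01"],
    "feline.n.01": ["carnivore.n.01"],
    "carnivore.n.01": ["animal.n.01"],
    "animal.n.01": [],  # root of this fragment
}

def ancestors(synset):
    """Return the set of all hypernyms reachable from a synset."""
    seen = set()
    stack = [synset]
    while stack:
        s = stack.pop()
        for h in HYPERNYMS.get(s, []):
            if h not in seen:
                seen.add(h)
                stack.append(h)
    return seen

print(ancestors("dog.n.01"))
# {'canine.n.02', 'carnivore.n.01', 'animal.n.01'} (in some order)
```

Because the graph is acyclic, every synset reaches the root of its hierarchy by repeatedly following hypernym edges, which is the property the path-based measures discussed below rely on.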
The literature proposes several approaches to evaluate the semantic similarity among concepts, e.g., Jiang and Conrath [7], Leacock and Chodorow [8], Lin [9], Resnik [10], and Wu and Palmer [11]. Some of them exploit graph structures such as WordNet, and are based on the measurement of the shortest path length between vertexes (synsets). A limit of these approaches is that they consider only the relationships between vertexes of the WordNet semantic graph, without assigning them different weights based on the frequency of the synsets within the considered dataset (i.e., common synsets and rare synsets are valued equally). This can create problems in contexts where it is important to take the synset frequency into account, such as recommender systems, where the semantic similarity generated by the items with the most common synsets in their descriptions could prevent the recommendation of other relevant items with rare synsets.
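To make the limitation concrete, the following is a minimal sketch of a purely edge-counting measure in the spirit of the approaches above, computed on a hypothetical hypernym fragment (synset names and the 1/(1+d) scoring are illustrative assumptions, not any specific published formula):

```python
from collections import deque

# Hypothetical hypernym edges (specific -> general), as in WordNet.
HYPERNYMS = {
    "dog.n.01": ["canine.n.02"],
    "canine.n.02": ["carnivore.n.01"],
    "cat.n.01": ["feline.n.01"],
    "feline.n.01": ["carnivore.n.01"],
    "carnivore.n.01": [],
}

def path_length(a, b):
    """Shortest number of edges between two synsets, ignoring direction."""
    graph = {}
    for v, hs in HYPERNYMS.items():
        for h in hs:
            graph.setdefault(v, set()).add(h)
            graph.setdefault(h, set()).add(v)
    dist = {a: 0}
    queue = deque([a])
    while queue:
        v = queue.popleft()
        if v == b:
            return dist[v]
        for w in graph.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None  # no path between the two synsets

def path_similarity(a, b):
    """Edge-counting similarity: 1 / (1 + shortest path length)."""
    d = path_length(a, b)
    return None if d is None else 1.0 / (1.0 + d)

print(path_similarity("dog.n.01", "cat.n.01"))  # 0.2 (path of 4 edges)
```

Note that every edge contributes equally to the score: the measure is blind to whether a synset on the path is ubiquitous or rare in the dataset, which is exactly the problem the weighted approach addresses.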
In this work, we present a strategy aimed at evaluating the semantic similarity between words or sentences, which introduces a novel way to define and use the ontology of synsets used to build the WordNet semantic graph. The proposed approach, instead of a DAG, uses a Weighted Graph (WG) [12], in order to introduce the weight of the synsets on the edges, which is calculated through an inverse frequency criterion.
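The exact weighting formula is not given at this point in the paper; an inverse-frequency weighting in this spirit can be sketched as follows, where the IDF-style formula, the corpus size, and the synset counts are all illustrative assumptions:

```python
import math

# Hypothetical occurrence counts of synsets in a dataset
# of N_DOCS item descriptions (illustrative values only).
N_DOCS = 100
DOC_FREQ = {"movie.n.01": 90, "zombie.n.01": 3}

def inverse_frequency_weight(synset):
    """IDF-style weight: rare synsets get a high weight, common ones a low one."""
    df = DOC_FREQ.get(synset, 1)
    return math.log(N_DOCS / df)

w_common = inverse_frequency_weight("movie.n.01")   # ~0.11
w_rare = inverse_frequency_weight("zombie.n.01")    # ~3.51
assert w_rare > w_common  # rare synsets count more in the weighted graph
```

Under any weighting of this kind, a path through rare, dataset-specific synsets contributes more to the similarity score than a path through synsets that appear in nearly every item description.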
The new WordNet weighted graph makes it possible to characterize the operative context, attributing more importance to some terms and less to others during the computation of the semantic similarity. There are many contexts where
1http://wordnet.princeton.edu/