Approaches for Word Sense Disambiguation
A Survey
Pranjal Protim Borah, Gitimoni Talukdar, Arup Baruah
Abstract - Word sense disambiguation is a technique in the field of natural language processing whose main task is to find the correct sense in which a word occurs in a particular context. It is of vital help to applications such as question answering, machine translation, text summarization, text classification and information retrieval. This has resulted in great interest in machine learning based approaches that classify word senses automatically. The main motivation behind word sense disambiguation is to let users make full use of the available technologies, because the ambiguities present in any language make information technology harder to use: a word in human language can be interpreted in more than one way depending on the context in which it occurs. In this paper we put forward a survey of the supervised, unsupervised and knowledge based approaches and algorithms available for word sense disambiguation (WSD).
Index Terms - Machine readable dictionary, Machine translation, Natural language processing, WordNet, Word sense disambiguation.
I. INTRODUCTION
Word sense disambiguation is the task of computationally detecting the meaning of words in context and of differentiating among the senses of a word. Solving a WSD task is as hard as the most difficult problems in artificial intelligence, so WSD is often regarded as an AI-complete problem [1]. Word sense disambiguation depends heavily on knowledge sources such as corpora of texts (unlabeled or annotated with word senses), machine readable dictionaries and semantic networks, because, given a sentence, WSD uses one or more knowledge sources to attach the most exact senses to the words in the context. The task can be formulated as assigning the appropriate sense to all or some words in a text T, where T is a sequence of words (w_0, w_1, ..., w_(n-1)), by finding a mapping M from words to senses such that M(k) ⊆ Senses_J(w_k), where M(k) is the subset of senses of w_k that are appropriate in the text T and Senses_J(w_k) is the set of senses listed in dictionary J for the word w_k.
Manuscript received February 01, 2014.
Pranjal Protim Borah, Computer Science and Engineering, Assam Don Bosco University, Guwahati, India (e-mail: pranjalborah777@gamil.com).
Gitimoni Talukdar, Computer Science and Engineering, Assam Don Bosco University, Guwahati, India (e-mail: talukdargitimoni@gamil.com).
Arup Baruah, Assistant Professor in Computer Science and Engineering, Assam Don Bosco University, Guwahati, India (e-mail: arup.baruah@gamil.com).
The mapping M can assign more than one sense to a word w_k in T, but eventually the most appropriate sense is selected. Thus WSD is a classification task in which the word senses are the classes and the classification method assigns each occurrence of the word to one or more of these classes based on the context and on external knowledge sources. The conceptual model of a word sense disambiguation system is given below in Fig. 1.
Fig. 1 Conceptual Model for Word Sense Disambiguation
The rest of the paper is divided into five sections. Section II gives a brief discussion of knowledge based approaches. Section III highlights supervised disambiguation approaches, followed by unsupervised disambiguation approaches in Section IV. Section V elaborates some evaluation measures for assessing WSD systems, and Section VI concludes the paper.
II. KNOWLEDGE BASED APPROACHES
The idea behind the knowledge based approach is to make extensive use of knowledge sources to decide upon the senses of words in a particular context. Although supervised approaches were found to be more accurate than knowledge based approaches, the latter have the advantage of much wider coverage. Collocations, thesauri, dictionaries etc. are the most commonly used resources in this approach. Knowledge based approaches started in limited domains in 1979 and 1980 [2]. Some of the knowledge based approaches are discussed as follows:
A. Overlap Based Approach
The overlap based approach requires a machine readable dictionary (MRD). It determines the features of the different senses of the
ambiguous words along with the features of the words in their context. The word sense having the maximum overlap is selected as the appropriate sense in the context. The algorithms commonly used in the overlap based approach are:
1) WSD using conceptual density: Conceptual density measures how closely the concept represented by a word is related to the concepts of the words in its context. Conceptual density is inversely related to conceptual distance, and the conceptual distance is determined from WordNet.
2) Lesk's algorithm: The Lesk algorithm used in the overlap based approach can be stated as follows: if W is the word to be disambiguated, let C be the set of words in its surrounding context, S the set of senses of W, and B the bag of words derived from the glosses, synonyms, hyponyms, glosses of hyponyms, example sentences, hypernyms, glosses of hypernyms, meronyms, glosses of meronyms, and the example sentences of hypernyms and meronyms; then use the intersection similarity rule to measure the overlap and output as the most probable sense the one having the maximum overlap [3]. A simplified sketch of this overlap computation is given after this list.
3) Walker's approach: In Walker's algorithm each word is assigned to one or more subject categories in a thesaurus, and different subjects are assigned to the different senses of the word.
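The following is a minimal sketch, under toy assumptions, of the simplified Lesk overlap computation described in item 2 above. The sense inventory, tokenizer and stop-word list are illustrative stand-ins; a real system would build the bag B from WordNet glosses and related words.

```python
# Minimal sketch of a simplified Lesk-style overlap computation (illustrative only).
# The gloss inventory below is a toy example; a real system would build the bag of
# words B from WordNet glosses, synonyms, hyponyms, hypernyms, meronyms, etc.

STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or", "is", "for"}

def tokenize(text):
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def lesk(context_sentence, sense_bags):
    """Return the sense whose bag of words overlaps most with the context."""
    context = set(tokenize(context_sentence))
    best_sense, best_overlap = None, -1
    for sense, bag in sense_bags.items():
        overlap = len(context & set(bag))          # intersection similarity
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Toy sense inventory for the ambiguous word "bank".
senses = {
    "bank#finance": tokenize("financial institution that accepts deposits and lends money"),
    "bank#river":   tokenize("sloping land beside a body of water such as a river"),
}
print(lesk("he sat on the bank of the river and watched the water", senses))
```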
B. Selectional Preferences
The selectional preferences approach imposes restrictions on the possible meanings of a word that may occur in a given context: a word sense imposes constraints on the semantic types of the words with which it usually combines grammatically, and the senses that violate these constraints are omitted. The measure of semantic association is given by the count of the number of instances (W1, W2, Y) in the corpus, i.e. the pair of words W1 and W2 occurring in the grammatical relation Y. The semantic appropriateness of a word with respect to another word can then be estimated as the conditional probability of the word W1 given the word W2 and the relation Y [4]:
P(W1 | W2, Y) = count(W1, W2, Y) / count(W2, Y)
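As a rough illustration of the counting scheme behind this estimate, the sketch below tallies (W1, W2, Y) triples from a hand-made list of dependency relations and computes the conditional probability; the triples and relation names are invented assumptions, not data from the cited work.

```python
from collections import Counter

# Toy (W1, W2, Y) triples: word, head word, grammatical relation.
# In practice these would be extracted from a dependency-parsed corpus.
triples = [
    ("wine", "drink", "object"),
    ("water", "drink", "object"),
    ("wine", "drink", "object"),
    ("car", "drive", "object"),
]

pair_counts = Counter(triples)                           # count(W1, W2, Y)
head_counts = Counter((w2, y) for _, w2, y in triples)   # count(W2, Y)

def selectional_preference(w1, w2, y):
    """P(W1 | W2, Y) = count(W1, W2, Y) / count(W2, Y)."""
    denom = head_counts[(w2, y)]
    return pair_counts[(w1, w2, y)] / denom if denom else 0.0

print(selectional_preference("wine", "drink", "object"))   # 2/3
```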
III. SUPERVISED DISAMBIGUATION
A number of supervised approaches applied to the problem of WSD reflect the dramatic shift from manually crafted systems to automated machine learning approaches. A word expert, called the classifier, is used to assign the appropriate sense to each instance of the word concerned. The classifier learns from a training set consisting of examples in which the target word has been manually annotated with its sense. Some of the common supervised approaches are:
A. Decision Trees
A decision tree divides the training data recursively and represents the classification rules in a tree structure. The internal nodes represent tests on the features, each branch shows how a decision is made, and the leaf nodes give the outcome or prediction, so the tree is often regarded as a prediction tool. Popular algorithms for learning decision trees are ID3 and C4.5. In a comparison with other machine learning algorithms it was found that several supervised approaches performed better than the decision tree obtained with the C4.5 algorithm [5]. If the training data is small, a decision tree suffers from unreliable predictions, and it also suffers from data sparseness when there are features with a large number of values.
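As an illustration of how such a classifier might be trained, the sketch below fits a scikit-learn decision tree on bag-of-words context features; the tiny training set and the feature choice are assumptions for demonstration only, not the setup used in [5].

```python
# Minimal sketch: a decision-tree word sense classifier on bag-of-words context
# features, using scikit-learn (assumed to be installed). The training sentences
# and sense labels below are a toy example, not a real sense-annotated corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

contexts = [
    "deposited the cheque at the bank before noon",
    "the bank approved the loan application",
    "fished from the grassy bank of the river",
    "the river overflowed its bank after the rain",
]
senses = ["bank#finance", "bank#finance", "bank#river", "bank#river"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(contexts)

clf = DecisionTreeClassifier(random_state=0)   # ID3/C4.5-style learners work similarly
clf.fit(X, senses)

test = vectorizer.transform(["she opened an account at the bank to get a loan"])
print(clf.predict(test))
```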
B. Neural Networks
Neural networks process information based on the computational model of the connectionist approach. The input consists of the input features together with the target output. The training dataset is divided into non-overlapping sets based on the desired responses. When the network encounters new input pairs, the weights are adjusted so that the output unit giving the target output has the larger activation. The network can have both positive and negative weights, corresponding to correct and wrong sense choices. Neural networks are trained until the error between the computed and the target output is minimal; learning in neural networks is essentially the updating of weights.
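A hedged sketch of the same idea with a small feed-forward network is given below; it uses a scikit-learn multi-layer perceptron in place of the early connectionist models described above, and the data and parameters are illustrative assumptions.

```python
# Minimal sketch: training a small feed-forward network to pick a word sense
# from bag-of-words context features (toy data, illustrative parameters).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

contexts = [
    "interest rates at the bank rose sharply",
    "the bank charged a fee on the account",
    "they walked along the muddy bank of the stream",
    "erosion wore away the bank of the river",
]
senses = ["bank#finance", "bank#finance", "bank#river", "bank#river"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(contexts)

# Weights are updated iteratively until the training error stops improving.
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X, senses)

print(net.predict(vectorizer.transform(["the bank raised its fee on the account"])))
```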
C. Decision Lists
A decision list contains an ordered set of if-then-else rules for assigning a category to the test data. Rules of the form (F, S, Score), where F is a feature value and S a word sense, are obtained from the training data and arranged in the list in descending order of score. For a word w represented as a feature vector, the winning sense is the one whose feature matches the input vector with the maximum score in the decision list, as sketched below.
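Below is a minimal sketch of that lookup; the rules and their scores are invented for illustration (in practice scores such as log-likelihood ratios are learned from the training data).

```python
# Minimal sketch of a decision-list lookup: rules are (feature, sense, score)
# triples sorted by score; the first rule whose feature matches the context wins.
# The rule scores below are invented for illustration.

rules = [
    ("river",   "bank#river",   7.2),
    ("loan",    "bank#finance", 6.8),
    ("account", "bank#finance", 5.1),
    ("fishing", "bank#river",   4.3),
]
rules.sort(key=lambda r: r[2], reverse=True)   # descending order of score

def classify(context_words, rules, default_sense="bank#finance"):
    for feature, sense, _score in rules:
        if feature in context_words:
            return sense
    return default_sense

print(classify({"he", "took", "a", "loan", "from", "the", "bank"}, rules))
```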
D. Naive Bayes
The Naive Bayes classifier is based on Bayes' theorem and on the assumption that every feature is class-conditionally independent of every other feature. The conditional probability of each sense S_i of the word given the features in the context is calculated, and the sense maximizing P(S_i) multiplied by the product of the feature probabilities P(f_j | S_i) is taken as the final decision.
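A minimal from-scratch sketch of this calculation, with add-one smoothing and an invented toy training set, is given below.

```python
# Minimal sketch of a Naive Bayes sense classifier with add-one smoothing.
# Training contexts and sense labels are a toy example.
from collections import Counter
from math import log

train = [
    ("the bank approved the loan".split(),    "bank#finance"),
    ("interest paid by the bank".split(),     "bank#finance"),
    ("the bank of the river flooded".split(), "bank#river"),
    ("grass grew on the river bank".split(),  "bank#river"),
]

sense_counts = Counter(s for _, s in train)
word_counts = {s: Counter() for s in sense_counts}
for words, s in train:
    word_counts[s].update(words)
vocab = {w for words, _ in train for w in words}

def classify(context):
    best_sense, best_score = None, float("-inf")
    for s, n_s in sense_counts.items():
        score = log(n_s / len(train))                      # log P(S_i)
        total = sum(word_counts[s].values())
        for w in context:                                  # + sum of log P(f_j | S_i)
            score += log((word_counts[s][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_sense, best_score = s, score
    return best_sense

print(classify("he repaid the loan to the bank".split()))
```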
IV. UNSUPERVISED DISAMBIGUATION
Unlike the supervised approach, the unsupervised approach does not need prior sense information from large scale resources for disambiguation. It is based on the fact that words having similar senses occur with similar surrounding words. Word senses are derived by forming clusters of word occurrences, and the task is to classify each new occurrence into one of the derived clusters; instead of assigning sense labels, this approach detects clusters.
A. Context Clustering
In this approach a context vector is maintained for each word occurrence in the corpora. Clusters are formed from these vectors, and each such cluster corresponds to a sense of the word to be tested. The initial approach was to have a vector space with
words as dimensions, in which a vector consists of every possible sense of the words. The similarity between two words m and n is given geometrically by the cosine between the respective vectors m and n. Grouping all the vectors gives rise to a co-occurrence matrix. This matrix may suffer from the high dimensionality problem, which can be overcome by latent semantic analysis through singular value decomposition: dimensions corresponding to words with the same sense are merged. Various clustering algorithms are available for sense discrimination. One such algorithm, mentioned in [6], is an agglomerative clustering method: at the start each cluster has exactly one member, one per instance, and the algorithm then keeps merging the most similar pair of clusters into one cluster until a stopping threshold is reached. Another approach, mentioned in [7], is called context group discrimination. According to this approach, if a word is ambiguous, each occurrence of it is grouped into some sense cluster on the basis of the similarity of the contexts of these occurrences. Contextual similarity is measured by the cosine between the two vectors corresponding to the two words whose similarity is to be determined, and the expectation maximization algorithm is used to perform the clustering. A small sketch of clustering context vectors is given below.
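The sketch below clusters bag-of-words context vectors of an ambiguous word into two groups; k-means is used here purely for brevity, whereas the cited works use agglomerative or EM-based clustering, and the contexts are invented.

```python
# Minimal sketch: clustering bag-of-words context vectors of an ambiguous word
# into sense clusters (toy contexts; KMeans stands in for the clustering step).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

contexts = [
    "the bank approved the loan and opened an account",
    "interest rates at the bank rose again",
    "they fished from the bank of the river",
    "the river bank was covered with grass",
]

X = CountVectorizer().fit_transform(contexts)      # one context vector per occurrence
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(clusters)    # each cluster id stands for one induced sense
```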
B. Co-occurrence Graphs
In recent times some success has been observed with graph based approaches to unsupervised disambiguation. In a co-occurrence graph the set of vertices V consists of the words occurring in the text, and the set of edges E connects words that co-occur in the same context. One such approach, mentioned in [8], is called HyperLex. In this approach the nodes are the words of the text in a paragraph, an edge of the graph signifies that the two words occur in the same paragraph, and each edge carries a weight corresponding to the frequency with which the words it connects co-occur. The weight can be represented as:
W_mn = 1 - max{ P(W_m | W_n), P(W_n | W_m) }
where P(W_m | W_n) = freq_mn / freq_n, freq_mn is the frequency with which the words W_m and W_n co-occur, and freq_n is the frequency with which W_n occurs within the context. Words with a higher co-occurrence frequency will have weights close to zero, and words that co-occur rarely will have weights close to one.
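The sketch below builds such a weighted co-occurrence graph from a few invented paragraphs and computes the edge weight defined above.

```python
# Minimal sketch: edge weights of a co-occurrence graph as used by HyperLex,
# W_mn = 1 - max(P(W_m|W_n), P(W_n|W_m)), computed from toy paragraphs.
from collections import Counter
from itertools import combinations

paragraphs = [
    {"bank", "loan", "interest"},
    {"bank", "river", "water"},
    {"bank", "loan", "account"},
    {"river", "water", "fish"},
]

freq = Counter()        # freq_n: paragraphs containing word n
cofreq = Counter()      # freq_mn: paragraphs containing both m and n
for words in paragraphs:
    freq.update(words)
    cofreq.update(frozenset(p) for p in combinations(sorted(words), 2))

def weight(m, n):
    f_mn = cofreq[frozenset((m, n))]
    p_m_given_n = f_mn / freq[n]
    p_n_given_m = f_mn / freq[m]
    return 1 - max(p_m_given_n, p_n_given_m)

print(weight("bank", "loan"))    # frequent co-occurrence -> weight near 0
print(weight("loan", "river"))   # rare co-occurrence -> weight near 1
```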
Another graph based algorithm for deriving word senses is the PageRank algorithm, which is extensively used in the Google search engine. PageRank can be used to estimate the importance of objects whose relations can be described by a graph.
C. Word Clustering
In this technique, words having similar meanings are assigned to the same cluster. One approach, mentioned in [9], is to find the set of words most similar to the target word, where the similarity between words is given by their syntactic dependencies. If W consists of the words that are similar to w_m, a tree is initially formed with only the node w_m, and a node w_i gets w_m as a child node when w_i is found to be the word whose meaning is most similar to that of w_m. Another approach, mentioned in [10], called the clustering by committee algorithm, represents each word as a feature vector. When the target words are encountered, a similarity matrix S_mn is constructed, each element of which is the similarity between two words w_m and w_n. In the subsequent step of this algorithm, committees are formed for a set of words W in a recursive manner. The clustering algorithm then tries to find the words that are not similar to the words of any committee; these words are again used to form more committees. In the final step each target word belonging to W is assigned to a committee depending on its similarity to the centroid of the committee. The clustering technique used is average-link clustering. A sketch of the similarity matrix step is given below.
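As a small illustration of the first step of this algorithm, the sketch below builds the word-by-word cosine similarity matrix S_mn from invented feature vectors; committee formation and average-link clustering would then operate on this matrix.

```python
# Minimal sketch: building the word-by-word cosine similarity matrix S_mn used
# as the first step of committee-based word clustering (toy feature vectors).
import numpy as np

words = ["bank", "loan", "river", "water"]
# Toy feature vectors (e.g. counts of syntactic dependency features per word).
F = np.array([
    [3, 2, 1, 0],
    [2, 3, 0, 0],
    [0, 1, 3, 2],
    [0, 0, 2, 3],
], dtype=float)

norms = np.linalg.norm(F, axis=1, keepdims=True)
S = (F / norms) @ (F / norms).T        # S[m, n] = cosine(w_m, w_n)

print(np.round(S, 2))
# Average-link clustering (or committee formation in CBC) would follow from S.
```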
V. PERFORMANCE METRICS
The evaluation measures for assessing a WSD system, which in turn is responsible for improving the performance of applications such as machine translation and information retrieval, are mentioned below:
Coverage C - Coverage is the percentage of words in the test data for which the WSD system has provided a sense assignment. It is represented as:
C = answers provided / total answers to provide
Precision P - Precision is the ratio of correct answers provided to the answers provided. It is represented as:
P = correct answers provided / answers provided
Recall R - Recall is the ratio of correct answers provided to the total number of answers to provide. It is represented as:
R = correct answers provided / total answers to provide
F1 measure - The F1 measure is the harmonic mean of precision and recall. It is represented as:
F1 = (2 * P * R) / (P + R)
Precision is equal to recall when the coverage is 100%.
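A minimal sketch computing these four measures for a WSD run, with invented counts, is given below.

```python
# Minimal sketch: computing coverage, precision, recall and F1 for a WSD run
# (the counts below are invented for illustration).
def wsd_scores(correct, provided, total):
    coverage = provided / total
    precision = correct / provided if provided else 0.0
    recall = correct / total
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return coverage, precision, recall, f1

# 80 answers given out of 100 target words, 60 of them correct.
print(wsd_scores(correct=60, provided=80, total=100))
```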
Table I
COMPARISON OF DIFFERENT SUPERVISED APPROACHES BASED ON ACCURACY

Approach                  Average precision    Average baseline accuracy
Naïve Bayes               64.13%               60.9%
Exemplar based            68.6%                63.7%
Decision lists            96%                  63.9%
SVM                       72.4%                55.2%
Perceptron Trained HMM    67.6%                60.9%
The above table compares different supervised approaches on the basis of their average precision and average baseline accuracy [16].
VI. CONCLUSION
WSD is a very complex task in natural language processing, as it has to deal with the complexities found in a language. In this paper we have put forward a survey comparing the different approaches available for word sense disambiguation, focusing primarily on knowledge based, supervised and unsupervised approaches. We conclude that the supervised approach performs better, but one of its disadvantages is the requirement of a large corpus, without which training is impossible; this can be overcome with the unsupervised approach, as it does not rely on any such large scale resource for disambiguation. The knowledge based approach, on the other hand, makes use of knowledge sources to decide upon the senses of words in a particular context, provided a machine readable knowledge base is available.
REFERENCES
[1] Samit Kumar, Neetu Sharma, Dr. S. Niranjan, "Word Sense Disambiguation Using Association Rules: A Survey", International Journal of Computer Technology and Electronics Engineering (IJCTEE), Volume 2, Issue 2, 2012.
[2] J. Sreedhar, S. Viswanadha Raju, A. Vinaya Babu, Amjan Shaik, P. Pavan Kumar, "Word Sense Disambiguation: An Empirical Survey", International Journal of Soft Computing and Engineering (IJSCE), Volume-2, Issue-2, May 2012.
[3] A. Blum and S. Chawla, "Learning from labeled and unlabeled data using graph mincuts", ICML, 2001.
[4] D. Hindle and M. Rooth, "Structural ambiguity and lexical relations", Computational Linguistics 19, 1, 103-120, 1993.
[5] R. J. Mooney, "Comparative experiments on disambiguating word senses: An illustration of the role of bias in machine learning", In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 82-91, 1996.
[6] T. Pedersen and R. Bruce, "Distinguishing word senses in untagged text", In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP, Providence, RI), 197-207, 1997.
[7] H. Schutze, "Automatic word sense discrimination", Computational Linguistics 24, 1, 97-124, 1998.
[8] J. Véronis, "HyperLex: Lexical cartography for information retrieval", Computer Speech and Language 18, 3, 223-252, 2004.
[9] D. Lin, "Automatic retrieval and clustering of similar words", In Proceedings of the 17th International Conference on Computational Linguistics (COLING, Montreal, P.Q., Canada), 768-774, 1998.
[10] D. Lin and P. Pantel, "Discovering word senses from text", In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Edmonton, Alta., Canada), 613-619, 2002.
[11] A. Purandar and T. Pedersen, "Improving word sense discrimination with gloss augmented feature vectors", In Proceedings of the Workshop on Lexical Resources for the Web and Word Sense Disambiguation (Puebla, Mexico), 123-130, 2004.
[12] D. Yarowsky, "Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French", In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (Las Cruces, NM), 88-95, 1994.
[13] S. Abney and M. Light, "Hiding a semantic class hierarchy in a Markov model", In Proceedings of the ACL Workshop on Unsupervised Learning in Natural Language Processing (College Park, MD), 1-8, 1999.
[14] Arindam Chatterjee, Salil Joshii, Pushpak Bhattacharyya, Diptesh Kanojia and Akhlesh Meena, "A Study of the Sense Annotation Process: Man v/s Machine", International Conference on Global Wordnets, Matsue, Japan, January 2012.
[15] M. Nameh, S. M. Fakhrahmad, M. Zolghadri Jahromi, "A New Approach to Word Sense Disambiguation Based on Context Similarity", Proceedings of the World Congress on Engineering 2011, Vol I, July 6-8, 2011.
[16] Pankaj Kumar, Atul Vishwakarma and Ashwani Kr. Verma, "Approaches for Disambiguation in Hindi Language", International Journal of Advanced Computer Research, Volume-3, Number-1, Issue-8, March 2013.