Query Sense Disambiguation Leveraging Large
Scale User Behavioral Data
Mohammed Korayem∗, Camilo Ortiz†, Khalifeh AlJadda∗, and Trey Grainger∗
∗CareerBuilder, Norcross, GA, USA
{mohammed.korayem, khalifeh.aljadda, trey.grainger}@careerbuilder.com
†Bloomberg, New York, NY, USA
camiort@gmail.com
Abstract—Term ambiguity - the challenge of having multiple
potential meanings for a keyword or phrase - can be a major
problem for search engines. Contextual information is essential
for word sense disambiguation, but search queries are often
limited to very few keywords, making the available textual context
needed for disambiguation minimal or non-existent. In this paper
we propose a novel system to identify and resolve term ambiguity
in search queries using large-scale user behavioral data. The
proposed system demonstrates that, despite the lack of context
in most keyword queries, multiple potential senses of a keyword
or phrase within a search query can be accurately identified,
disambiguated, and expressed in order to maximize the likelihood
of fulfilling a user’s information need. The proposed system
overcomes the immediate lack of context by leveraging large-
scale user behavioral data from historical query logs. Unlike
traditional word sense disambiguation methods that rely on
knowledge sources or available textual corpora, our system is
language-agnostic, is able to easily handle domain-specific terms
and meanings, and is automatically generated so that it does
not grow out of date or require manual updating as ambiguous
terms emerge or undergo a shift in meaning. The system has
been implemented using the Hadoop eco-system and integrated
within CareerBuilder’s semantic search engine.
I. INTRODUCTION
Terms can be defined as ambiguous when multiple mean-
ings could exist for the same word or phrase depending
upon the context [1], [2]. Techniques to resolve such term
ambiguities in the literature often focus on utilizing
ontologies and dictionaries such as WordNet [3]–[5].
Those solutions are generally inadequate when applied to
ambiguous terms found within search engine queries, how-
ever, as sufficient textual context surrounding an ambiguous
term is generally not present in short-text queries to provide
context for disambiguation. Furthermore, in domain-specific
search applications, queries will often contain words or phrases
not commonly found in available knowledge bases, as well
as jargon that carries different meanings within the specific
domain. For example, with a job search engine, keywords
entered are often job titles, skills, or company names, many of
which are not typical English keywords that would be found
in a dictionary. Consider that the word Java refers to an island
in Indonesia, as well as a kind of coffee, but that within
the job search domain it almost always refers to a computer
programming language.
The difficulty in this problem thus arises from two phe-
nomena. First, dictionary-based word sense definitions are still
ambiguous without additional context being provided (which
is not present in most search queries). Second, much of the
real-world knowledge or common sense needed for word sense
disambiguation is difficult to verbalize in dictionaries [3]–[5].
Our research shows that large-scale search log analysis
can instead be used to derive a context for keywords used
in search queries. Given that users with the same interests
usually use the same terms in and across their searches, we can
collect co-occurring search terms across queries and use them as a
context for a given term. This can be viewed as a form of
collaborative filtering that uses the wisdom of the crowd to
derive the possible meaning(s) of a given term instead of
analyzing the text content in which that term is presented.
The proposed system analyzes massive volumes of search logs
to determine overlapping and non-overlapping clusters of co-
occurring terms in order to identify word sense ambiguity,
expressively represent the possible senses of each term, and
disambiguate terms within future user queries into their ap-
propriate sense.
The main contribution in this paper is presenting a system
that incorporates user behavioral data from search logs to iden-
tify, define, and disambiguate different senses of ambiguous
terms with limited textual context, using a methodology that
is language-independent and domain-agnostic.
II. RELATED WORK
In this section we will cover the two main focuses of related
work: the use of knowledge sources and the kinds of algo-
rithms traditionally leveraged for word sense disambiguation.
A. Knowledge Sources
The fundamental component of word sense disambiguation
(WSD) is knowledge about the various meanings of words [6].
In previous literature, knowledge sources used in WSD can be
divided into two major categories:
1) Structured knowledge sources.
2) Unstructured knowledge sources.
The most common structured knowledge sources include the-
sauri, dictionaries, and ontologies. Thesauri provide informa-
tion about relationships between different words that share
common meanings. Machine-readable dictionaries, such as
Collins English Dictionary [7], provide definitions for common
terms, including different senses of words with multiple mean-
ings. Ontologies represent known concepts and relationships
within specific domains of interest [4], [8]. Some commonly-
referenced ontologies in the literature include WordNet and
its extensions [4], such as the Omega Ontology [9] and
SUMO ontology [10]. Unstructured knowledge sources, most
commonly represented as corpora of textual information, are
commonly used to build language models and to serve as a
source of information about real world textual and semantic
relationships between terms. By containing the co-occurrences
of terms with other terms and their usages within particular
contexts and linguistic constructs, such corpora serve as a
valuable source of minable information for clustering related
concepts, extracting features for classification models, and
for learning surrounding context to better model nuances of
language usage. One class of corpora particularly well suited
for word sense disambiguation is that of sense annotated cor-
pora, which tag textual content with metadata surrounding the
particular usage and meaning of terms [11]. One commonly-
referenced example of a sense annotated corpus is SemCor,
which is the largest and most used sense-tagged corpus [12].
B. Algorithms
WSD algorithms can be broadly divided into two classes:
supervised algorithms and unsupervised algorithms. Both kinds
of algorithms frequently invoke common natural language
processing techniques such as tokenization of words, part-
of-speech detection, stemming/lemmatization, chunking, and
parsing. WSD algorithms typically apply these techniques in a
pre-processing phase to convert the unstructured text from the
corpus into a more structured and normalized representation.
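To make this pre-processing phase concrete, the following minimal Python sketch tokenizes, lowercases, and crudely normalizes a query; the regex and the suffix-stripping rules are illustrative stand-ins, not the stemmers or lemmatizers used by any particular WSD system.

```python
import re

def preprocess(text):
    """Toy pre-processing: tokenize with a regex, lowercase, and apply a
    crude suffix-stripping rule as a stand-in for real stemming."""
    tokens = re.findall(r"[a-z0-9+#.]+", text.lower())
    normalized = []
    for tok in tokens:
        if tok.endswith("ing") and len(tok) > 5:
            tok = tok[:-3]  # "parsing" -> "pars"
        elif tok.endswith("s") and len(tok) > 3 and not tok.endswith("ss"):
            tok = tok[:-1]  # "queries" -> "querie"
        normalized.append(tok)
    return normalized

print(preprocess("Parsing queries requires tokenized words"))
# ['pars', 'querie', 'require', 'tokenized', 'word']
```

Real pipelines would substitute a proper tokenizer, POS tagger, and stemmer/lemmatizer for these toy rules.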
Supervised WSD algorithms require a manually annotated
corpus. Standard machine learning algorithms used in WSD for
supervised learning include naive Bayes, maximum entropy,
and support vector machines (SVMs) [13]–[18]. Lee et al.
[13] trained SVMs using different sets of features including
part-of-speech (POS) of neighboring words, single words in
the surrounding context, local collocations, and syntactic re-
lations. Escudero et al. [14] tested naive Bayes classifiers
and exemplar-based classifiers (i.e., k-NN) on the Defence
Science Organisation of Singapore (DSO) corpus, which is a
semantically-annotated corpus including 192,800 occurrences
of 121 nouns and 70 verbs [19]. Instead of using only a
single classifier, Florian et al. [20] showed different methods
to combine multiple classifiers and how the combination of
classifiers outperforms any single classifier.
Unsupervised methods are usually used when there is
no appropriate annotated corpus available [15], [21]–[23].
Unsupervised approaches mainly try to infer the sense of the
word using the context of neighboring words through clus-
tering or graph methods [24]–[27]. Navigli and Lapata [28]–
[30] presented unsupervised graph-based algorithms for WSD.
They evaluated their algorithm on the SemCor corpus [31],
the Senseval-3 [32], and the Semeval-2007 [33] data sets.
Mann and Yarowsky presented an unsupervised clustering
technique over a rich feature space of biographical facts to ac-
complish personal name disambiguation [34]. Semi-supervised
approaches can also be used when a small corpus is available
that can be used to automatically bootstrap tagging of a larger
untagged corpus [15], [35], [36].
Recently, Deep Learning approaches have gained a lot
of attention across various domains including NLP and text
mining [37], [38]. In [39], a deep neural network is applied
to learn entity representations, leveraging a combination of
supervised and unsupervised approaches. They learn entity
representation through an unsupervised pre-training stage and
then use a supervised approach to optimize the rank function
based on a similarity measure between documents and those
entity representations.
One of the more recent efforts in this field is the system
introduced by IBM research in [40]. In this paper, the authors
introduce a novel term ambiguity detection technique for
large scale data sets. They focused on detecting the general
ambiguity of each term, rather than the specific ambiguity
at the instance level. The idea here is that assessing general
ambiguity can lead to a more robust model across an entire
dataset that can provide later context to improve instance-level
disambiguation. They utilized data from language models,
ontologies, and topic modeling for this detection. Our proposed
system similarly works at the term level instead of the instance
level, but we leverage behavioral data from user search logs
instead of requiring language and domain-specific language
models, dictionaries, and ontologies to be built, fine-tuned, and
maintained.
III. SYSTEM DESIGN
To detect and resolve word sense ambiguity, the proposed
system uses search logs as input instead of a textual corpus or
other knowledge source. We believe that search logs provide
a valuable source of semantic relationships that have not yet
been well utilized in the field of word sense disambiguation.
Our system extracts semantic relationships from search logs by
1) mining all the keyword phrases (terms) used within search
queries, 2) discovering related terms that commonly co-occur
in subsequent searches across multiple users, and finally 3)
deriving the ambiguous terms along with a list of possible
senses based upon how those terms and co-terms cluster
together. In this section we will describe the implementation
of each of these phases.
A. Search Log Analyzer
The search log analyzer starts by collecting all the terms
searched by each user when conducting queries against a
search engine. The semantic relationship between those terms
is then extracted using the model described in [41]. Building
this model involves mining user search logs for a list of com-
mon phrases, performing collaborative filtering on each phrase
to find semantically-related terms ("users who searched for this
term also searched for these other terms"), and then taking
steps to aggressively remove noise from the query logs, such
as segmenting search phrases based upon the classification of
users, requiring the search phrases to co-occur often enough
and in enough contexts that we know they are meaningful,
and verifying that each term co-occurs with its semantically-
related terms in real content present in the search engine. The
end result of this process is a list of domain-specific terms
along with a high-precision list of semantically-related terms
carrying a similar meaning. Once the semantic relationship is
extracted, each term is returned with a vector of semantically-
related terms that are ordered by their relatedness score.
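The collaborative-filtering step described above ("users who searched for this term also searched for these other terms") can be sketched as follows. The per-user histories and the minimum co-occurrence threshold are hypothetical, and the real system applies additional noise-removal steps (user segmentation, content verification) omitted here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-user search histories (user -> set of searched terms).
logs = {
    "u1": {"java", "hadoop", "hive"},
    "u2": {"java", "hadoop", "pig"},
    "u3": {"java", "hive", "hadoop"},
    "u4": {"nurse", "rn"},
}

MIN_COOCCUR = 2  # noise floor: a pair must co-occur across enough users

# Count how often each pair of terms co-occurs in a user's history.
pair_counts = Counter()
for terms in logs.values():
    for a, b in combinations(sorted(terms), 2):
        pair_counts[(a, b)] += 1

related = {}  # term -> list of (co-term, co-occurrence count)
for (a, b), n in pair_counts.items():
    if n >= MIN_COOCCUR:
        related.setdefault(a, []).append((b, n))
        related.setdefault(b, []).append((a, n))
for vec in related.values():
    vec.sort(key=lambda t: -t[1])  # order co-terms by relatedness

print(related["hadoop"])  # [('java', 3), ('hive', 2)]
```

The sorted vectors correspond to the per-term related-term vectors returned by the Search Log Analyzer.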
B. Word Sense Ambiguity Detection
The vectors of semantically-related terms discovered by
the Search Log Analyzer can be used to enhance future search
queries (for example, a search for "Hadoop" could be expanded
to include semantically-related terms found such as "Hive",
"Pig", and "Map/Reduce"). While this technique works well
for most terms, the Search Log Analyzer unfortunately fails
to differentiate between ambiguous senses of terms. As an
example, the term "driver" will be associated with "truck
driver", "linux", "windows", "embedded", "courier", "cdl", and
"delivery". There are clearly two different word senses here:
a person who operates a vehicle and delivers goods ("truck
driver", "cdl") vs. a computer/software component ("linux",
"windows", "embedded"). Since a user executing a search
query is most likely to be searching only for a specific sense
of a term, it is important that we can identify and disambiguate
between the possible senses.
1) Detecting ambiguity: Detecting word sense ambiguity
for each term requires context that we do not have within the
text of a search query, since most queries are for a single
keyword phrase or at best a few phrases. As such, our system
utilizes the user who searched for each term as context for that
term. The intuition here is sound - because a user is likely to be
searching for a specific sense of a term, if we can isolate that
user and similar users into a specific context then we can use
the related terms within that context to identify the sense of
the term being used (versus other potential senses intended by
users classified into a different context). Table I demonstrates
how a single user is classified and represented as a list of
co-occurring terms for input into our disambiguation model.
To do this, we represent the relationship between the
terms and the users who searched for them using a Prob-
abilistic Graphical Model for Massive Hierarchical Data
(PGMHD) [42]. In the PGMHD we leverage the users’ classi-
fications as nodes in the top layer while representing the search
terms in the second layer. With our domain being a job search
engine, all users are divided into classes representing their job
titles (software engineer, registered nurse, etc.).
The edges connecting the nodes in the first and second layer
store the number of searches for the term (represented by the
second layer node) conducted by users in the class (represented
by the first layer node). Figure 1 shows this representation of
the search logs using PGMHD.
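A minimal sketch of this two-layer representation follows; the log rows are hypothetical, and the real PGMHD is built at scale on Hadoop [42].

```python
from collections import defaultdict

# Toy search-log rows: (user's class, search term), one row per search.
searches = [
    ("Java Developer", "java"), ("Java Developer", "java"),
    ("Java Developer", "j2ee"), (".NET Developer", "c#"),
    (".NET Developer", "java"), ("Nurse", "rn"),
]

# Two-layer PGMHD: classes in the first layer, terms in the second; each
# edge weight counts how many searches for the term came from the class.
edges = defaultdict(int)
for cls, term in searches:
    edges[(cls, term)] += 1

print(edges[("Java Developer", "java")])  # 2
```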
The PGMHD then is used to calculate the Normalized
Pointwise Mutual Information (PMI) [43] score for each term
with all of its parents. We determine the ambiguous keywords
by applying the following technique:
TABLE I. SAMPLE OF OUR DATASET REPRESENTED WITHIN THE PGMHD

UserID | Classification  | Search Terms
U1     | Java Developer  | Java, Java Developer, C, Software Engineer
U2     | Nurse           | RN, Registered Nurse, Health Care
U3     | .NET Developer  | C#, ASP, VB, Software Engineer, SE
U4     | Java Developer  | Java, JEE, Struts, Software Engineer, SE
U5     | Health Care     | Health Care Rep, HealthCare
Fig. 1. Representing search log data using PGMHD. The first layer represents
the classes of users who conducted searches, while the second layer represents
the search terms entered by those users. An edge represents the usage of
a term in the second layer by users belonging to a class in the first layer.
The numerical value on each edge represents how many users from the class
searched for the given term [42] .
Let:
• C := {C1, . . . , Cn} be the set of different classes of
jobs (Java Developer, Nurse, Accountant, etc.);
• S := {t1, . . . , tN} be the set of different search terms
entered by the users when they conducted searches (N
is the number of different terms); and
• f(Cj, s) be the number of times (frequency) a user from
class Cj ∈ C searched for the keyword s ∈ S.
◦ To reduce noise, we will only consider pairs
with at least 100 distinct searches, i.e.,
f(c, s) ≥ 100.
Then, define:
• O(c), the number of times a user from class c searched
for any keyword:
O(c) := Σ_{s ∈ S} f(c, s),    c ∈ C;
• T(s), the number of times the keyword s is searched:
T(s) := Σ_{c ∈ C} f(c, s),    s ∈ S;
• T, the total number of keyword searches:
T := Σ_{c,s} f(c, s) = Σ_{c ∈ C} O(c) = Σ_{s ∈ S} T(s).
For every c ∈ C and s ∈ S, and letting C and S be the
random variables representing the class of job and the search
term of a single user query, respectively, we can estimate their
PMI, given by
P(C=c, S=s) / [P(C=c) P(S=s)] = P(C=c | S=s) / P(C=c),
as
pmi(c, s) := log [ (f(c, s) / T(s)) · (T / O(c)) ],    c ∈ C, s ∈ S.
The normalized version [43] of the original PMI estimate
is given by
Fig. 2. Overview of the proposed approach to resolve term sense ambiguity:
(1) discover related terms for term X using PGMHD; (2) get the classes
{C1, C2, . . .} to which term X belongs; (3) classify the related terms into
those classes; (4) each set of related terms classified under the same class
provides a possible sense of term X.
p̃(c, s) := pmi(c, s) / [−log (f(c, s) / T)]
         = [log T + log f(c, s) − log (O(c) T(s))] / [log T − log f(c, s)]
         = −1 + [2 log T − log O(c) − log T(s)] / [log T − log f(c, s)]  ∈ [−1, 1],
c ∈ C, s ∈ S.
This normalized version of the original PMI can then be
leveraged to generate an ambiguity score to determine whether
or not a term should be considered ambiguous.
2) Ambiguity score: For every search keyword s ∈ S and
threshold α, we define the following ambiguity score Aα(s):
Aα(s) := |{c ∈ C : p̃(c, s) > α}|.
We say that a search keyword s is a candidate to be
ambiguous if Aα(s) > 1. Then, we can define the set of
candidate ambiguous terms CA as
CA := {s ∈ S : Aα(s) > 1}.
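Putting these definitions together, the following toy Python sketch computes the normalized PMI and the resulting ambiguity score over a hypothetical frequency table; the counts, and the choice of α = 0, are illustrative assumptions, not the paper's actual data.

```python
import math

# Toy class-term frequency table f(c, s); in the real system these counts
# come from PGMHD edges and have already passed the f(c, s) >= 100 filter.
f = {
    ("Software Engineer", "driver"): 300,
    ("Software Engineer", "java"): 500,
    ("Truck Driver", "driver"): 300,
    ("Truck Driver", "cdl"): 500,
    ("Nurse", "rn"): 800,
    ("Accountant", "cpa"): 800,
}

O = {}   # O(c): total searches issued by class c
Ts = {}  # T(s): total searches for term s
for (c, s), n in f.items():
    O[c] = O.get(c, 0) + n
    Ts[s] = Ts.get(s, 0) + n
T = sum(f.values())  # total number of keyword searches

def npmi(c, s):
    """Normalized PMI of class c and term s, a value in [-1, 1]."""
    pmi = math.log(f[(c, s)] * T / (O[c] * Ts[s]))
    return pmi / -math.log(f[(c, s)] / T)

def ambiguity_score(s, alpha=0.0):
    """A_alpha(s): number of classes whose NPMI with s exceeds alpha."""
    return sum(1 for (c, t) in f if t == s and npmi(c, s) > alpha)

print(ambiguity_score("driver"))  # 2 -> "driver" is a candidate ambiguous term
print(ambiguity_score("java"))    # 1 -> not ambiguous
```

Here "driver" is positively associated with two classes, so it crosses the Aα(s) > 1 threshold, while "java" is tied to a single class.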
C. Resolving Word Sense Ambiguity
Once a term is recognized as ambiguous, the challenge is
to resolve this ambiguity by defining the possible meanings
of that term. Our system tackles this challenge by leveraging
the related term vectors of the ambiguous term within each
independent context to represent the different meanings of
the term. Our approach (see figure 2) starts by discovering
the semantically-related search terms of the ambiguous
term. This is accomplished using a PGMHD as described in
[42]. Since we already created a PGMHD for detecting the
ambiguous terms, we can utilize the same model to find the
semantically-related terms for any given term that falls within
the same class. To do so, we calculate the probabilistic-based
similarity score between a given term X and a term Y, given
that they both share the same parent class(es), as follows:
Fix a level i ∈ {2, . . . , m}, and let X, Y ∈ L2 × · · · × Lm
be identically distributed random variables. We define
the probabilistic-based similarity score CO (Co-Occurrence)
between two independent siblings Xij, Yig ∈ Li by computing
the conditional joint probability
CO(Xij, Yig) := P(Xij, Yig | pa(Xij) ∩ pa(Yig))
as
CO(Xij, Yig) = Π_{C′k ∈ pa(Xij) ∩ pa(Yig)} P(Xij | C′k) P(Yig | C′k),
where P(Xij | C′k) = P(C′k, Xij) / P(C′k) for every
(C′k, Xij) ∈ Li−1 × Li.
Given out(C′k), the total number of occurrences of C′k,
and f(C′k, Xij), the frequency of co-occurrence of C′k
with Xij, we can naturally estimate the joint probabilities
P(Xij, C′k) with p̂(Xij, C′k) defined as
p̂(Xij, C′k) := f(C′k, Xij) / out(C′k).
Hence, we can estimate the correlation between Xij and Yig
by estimating the probabilistic similarity score CO(Xij, Yig).
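A toy sketch of this similarity computation follows; the two-class count table and the helper names (`p_cond`, `co`) are assumptions for illustration, not the production implementation.

```python
# Toy PGMHD counts (hypothetical): out[c] is the total number of occurrences
# of class c, and f[(c, x)] the co-occurrence frequency of class c with term x.
out = {"Java Developer": 100, ".NET Developer": 80}
f = {
    ("Java Developer", "java"): 60, ("Java Developer", "jvm"): 30,
    (".NET Developer", "java"): 20, (".NET Developer", "jvm"): 10,
}

def p_cond(x, c):
    """Estimate P(x | c) as f(c, x) / out(c)."""
    return f.get((c, x), 0) / out[c]

def co(x, y):
    """CO(x, y): product over the shared parent classes of P(x|c) * P(y|c)."""
    shared = {c for (c, t) in f if t == x} & {c for (c, t) in f if t == y}
    if not shared:
        return 0.0  # no common parent class -> no measurable similarity
    score = 1.0
    for c in shared:
        score *= p_cond(x, c) * p_cond(y, c)
    return score

print(co("java", "jvm"))  # (60/100 * 30/100) * (20/80 * 10/80)
```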
Once the list of related terms is generated using PGMHD,
we classify them into the classes (since the term is ambiguous,
they must belong to more than one class) to which the am-
biguous term belongs. This classification phase of the related
terms is also done using PGMHD as follows:
For a random variable at level i ∈ {2, . . . , m}, namely
Xij ∈ Li, where Xij is the jth random variable at level i,
we calculate a classification score Cl(C′k | Xij) for Xij given
its primary parent C′k ∈ Li−1. It is used to estimate the
conditional probability P(C′k | Xij). The notation C′k is used
to denote a parent, and when it is at level 1, it represents a
class Cj as denoted previously. Letting f(C′k, Xij) be the
frequency of co-occurrence of C′k and Xij, the classification
score is defined as
Cl(C′k | Xij) := f(C′k, Xij) / in(Xij).
The classification score is thus the ratio of the co-occurrence
frequency of C′k and Xij to the total occurrence of Xij,
where the total occurrence of Xij is calculated by summing up
the frequencies of the co-occurrence of Xij and all its parents:
in(Xij) := Σ_{C ∈ pa(Xij)} f(C, Xij),    ∀Xij ∈ V.
The group of semantically-related terms that get classified
under the same parent class will form a possible meaning of the
ambiguous term. Using this technique we are not restricted to a
limited number of possible meanings: some terms are assigned
two possible meanings, some receive three possible meanings,
and so on.
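The classification and sense-grouping steps above can be sketched as follows; the frequency table and the rule of assigning each related term to its single highest-scoring parent are illustrative simplifications of the system described in the text.

```python
from collections import defaultdict

# Hypothetical co-occurrence frequencies f(c, x) between parent classes
# and the related terms of the ambiguous term "driver".
f = {
    ("Software Engineer", "linux"): 90, ("Software Engineer", "windows"): 70,
    ("Truck Driver", "cdl"): 80, ("Truck Driver", "courier"): 40,
    ("Truck Driver", "linux"): 5,
}

def cl(c, x):
    """Classification score Cl(c | x) = f(c, x) / in(x)."""
    in_x = sum(n for (_, xi), n in f.items() if xi == x)  # sum over parents
    return f.get((c, x), 0) / in_x

# Assign each related term to its highest-scoring parent class; each group
# classified under the same class forms one candidate sense of "driver".
classes = sorted({c for (c, _) in f})
senses = defaultdict(list)
for x in sorted({x for (_, x) in f}):
    best = max(classes, key=lambda c: cl(c, x))
    senses[best].append(x)

print(dict(senses))
```

With these counts, "linux" and "windows" group under the software sense while "cdl" and "courier" group under the vehicle-operator sense.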
IV. EXPERIMENTAL RESULTS
CareerBuilder1 operates job boards in many countries and
receives tens of millions of search queries every day. Given
the tremendous volume of search data in our logs, our goal
is to detect and resolve the ambiguous senses of the search
terms and phrases for our many region-specific websites using
a novel technique that avoids the need to use traditional natural
language processing (NLP) techniques. Avoiding traditional
NLP makes it possible to apply the same technique to many
different domain-specific websites supporting many languages
without having to change the algorithms, language models,
corpora, or other libraries per-language.
We ran our experiment using 1.6 billion search log entries,
each containing one or more keywords supplied by a user to
search for jobs on CareerBuilder.com. We designed
a distributed probabilistic graphical model (PGMHD) using
Hadoop HDFS [44], Hadoop Map/Reduce [45] and Hive [46].
The experiment was run on a Hadoop cluster with 69 data
nodes, each having a 2.6 GHz AMD Opteron Processor with
12 to 32 cores and 32 to 128 GB RAM.
Our experiment shows that the system was initially so
sensitive at detecting nuances in meaning that some search
terms were detected as ambiguous due to being commonly
used across multiple similar classes with only slight nuances
in usage. For example, the classes Marketing Manager and
Account Executive are semantically very similar to each
other, causing the term "marketing" to be detected as ambigu-
ous because it was used extensively by users classified under
both classes, with only slight nuances in usage across classes.
Table II highlights a few examples where these nuances in
meaning led to terms being identified as having ambiguous
meanings across similar classes.
To overcome this problem, we applied hierarchical cluster-
ing to measure the similarity between classes. We represented
each class as a vector containing the search terms used by the
users within that class, then applied the Euclidean distance
to measure the distance between the classes. The result of
this hierarchical clustering was the merging of classes with
distances less than a specific threshold. Before we applied the
hierarchical clustering we had 2200 highly-nuanced ambiguous
terms, whereas after merging the semantically-similar classes
we ended up with 129 ambiguous terms with 85% precision.
Figure 3 shows the final system design for detecting the
ambiguous terms and term senses.
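This merging step can be sketched as a single greedy pass, shown below; the term-count vectors, the threshold of 50, and the one-pass grouping are illustrative simplifications of the hierarchical clustering actually used.

```python
import math

# Each user class represented as a vector of search-term counts (toy data).
classes = {
    "Marketing Manager": {"marketing": 90, "brand": 40, "seo": 30},
    "Account Executive": {"marketing": 80, "brand": 35, "sales": 20},
    "Registered Nurse":  {"rn": 100, "health care": 60},
}

def distance(a, b):
    """Euclidean distance between two sparse term-count vectors."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0) - b.get(t, 0)) ** 2 for t in terms))

THRESHOLD = 50.0  # merge classes closer than this (an assumed, tunable value)

# One greedy merge pass: group each class with any later class within the
# threshold (a simplification of full agglomerative clustering).
names = list(classes)
merged, used = {}, set()
for i, a in enumerate(names):
    if a in used:
        continue
    group = [a]
    for b in names[i + 1:]:
        if b not in used and distance(classes[a], classes[b]) < THRESHOLD:
            group.append(b)
            used.add(b)
    merged[" / ".join(group)] = group

print(list(merged))
```

Here the two marketing-oriented classes fall under the threshold and merge, while the nursing class stays separate, mirroring how semantically similar PGMHD parent nodes are collapsed before the ambiguity score is computed.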
After we applied this technique for merging related senses,
the system obtained an impressive ability to identify and
separate meaningfully divergent senses, as demonstrated in
Table III. The word "architect", for example, was detected
with two different senses. The first sense is represented by the
terms "enterprise architect", "java architect", "data architect",
"oracle", "java", ".net", while the second is represented by the
terms "architectural designer", "architectural drafter", "cad",
"autocad drafter", etc. This clearly captures the two fundamen-
tal notions of an architect within the job search domain: that of
a software/systems architect, and that of a building architect.
Going down the list, two senses for the word "account"
also arise: one usage related to finance and accounting, and the
1 http://www.careerbuilder.com/
Fig. 3. An abstract view of the proposed system to detect the ambiguous
terms. In phase number 1 the search logs of millions of users are collected
and analyzed to extract the search terms. In phase 2 the extracted data from
search logs is modeled using PGMHD. In phase 3 hierarchical clustering is
performed to discover the semantically similar classes in the top layer of
PGMHD. In phase 4 the similar classes get merged in PGMHD. In phase 5
the ambiguity score is calculated for each search term, so that the term is
identified as ambiguous if the score is greater than 1.
other representing B2B sales ("sales executive", "account man-
ager", etc.). For the term "designer", we can see term vectors
arise representing the three senses of 1) a graphic artist ("illus-
trator", "animation", "graphic artist", "photoshop"), 2) a web
designer ("web designer", "web design", "graphic designer"), and
3) an industrial engineer ("drafter", "cad engineer", "auto cad",
"mechanical designer", "structural designer").
The term "warehouse" represents a particularly interesting
example. While the word itself would not necessarily be
considered ambiguous by most people, we find two somewhat
different senses of the information need represented when
different users search for it. The first sense contains re-
lated terms like "warehouse manager", "logistics", "warehouse
supervisor", "distribution" and "inventory" (the sense being
someone who manages warehouse and logistical operations),
whereas the second sense is represented by related terms like
"forklift", "warehouse worker", "order puller", and "general
labor" (indicating a front-line worker doing manual labor
in a warehouse). While no dictionary or ontology is going
to have this nuance modeled, it is nonetheless quite useful
to differentiate the ambiguous intentions of users running
a job search with the single keyword of "warehouse" by
understanding those users’ classifications and disambiguating
the meaning accordingly based upon their information need.
V. CONCLUSION
In this paper we present a novel method for detecting
and resolving query sense ambiguity by leveraging user be-
havioral data from search logs. Search queries represent one
use case where textual context is very limited, rendering most
traditional solutions for word sense disambiguation ineffective.
Our proposed technique utilizes search logs to detect and
resolve word sense ambiguity with high accuracy and without
a dependence upon textual corpora or other content-based
knowledge sources.
Our system holds several benefits over traditional
knowledge-source-based systems that leverage dictionaries,
ontologies, and textual corpora. First, our methodology is
TABLE II. NON-AMBIGUOUS TERMS DETECTED AS AMBIGUOUS DUE TO BEING CLASSIFIED UNDER TWO DIFFERENT PARENT NODES IN PGMHD,
WHERE THE CLASSES ARE SEMANTICALLY SIMILAR

Term                   | Class1                                   | Class2
Marketing              | Marketing Manager                        | Account Executive
Help Desk Manager      | Computer and Information Systems Manager | Network and Computer Systems Administrators
Truck Drivers          | Couriers and Messengers                  | Truck Drivers, Light or Delivery Services
Director Finance       | Financial Managers                       | Accountants and Auditors
Director of Facilities | First-Line Supervisors                   | Administrative Services Managers
TABLE III. RESULTS OF SEMANTIC AMBIGUITY DISCOVERY PROCESS. THE FIRST COLUMN SHOWS THE KEYWORD, WHILE THE SECOND COLUMN
SHOWS THE RELATED KEYWORDS OF EACH POSSIBLE MEANING, SEPARATED BY A DASHED HORIZONTAL LINE

term      | semantically related terms representing a disambiguated sense
architect | enterprise architect, java architect, data architect, oracle, java, .net
          | architectural designer, architectural drafter, autocad, autocad drafter, designer, drafter, cad, engineer
account   | bookkeeper, accountant, analyst, finance
          | sales executive, account executive, insurance, account manager, outside sales, medical sales, manager, sales
designer  | design, print, animation, artist, illustrator, creative, graphic artist, graphic, photoshop, video
          | graphic, web designer, design, web design, graphic design, graphic designer
          | design, drafter, cad designer, draftsman, autocad, mechanical designer, proe, drafter, drafting designer, autocad, structural designer, revit
driver    | linux, windows, embedded
          | truck driver, cdl driver, delivery driver, class b driver, cdl, courier
writer    | copywriter, communications manager, communications, public relations, marketing communications, social media, consultant
          | editor, writer editor, writing, copywriter, technical writer, editorial, reporter, communications, proposal, proofreader
warehouse | warehouse manager, inventory, warehouse supervisor, distribution, inventory manager, shipping, warehouse worker, warehouse management, logistics
          | warehouse worker, forklift, warehouse associate, warehouse clerk, shipping, order picker, forklift operator, general labor, order puller
suitable when there is limited textual context provided for a
term under consideration (such as in a short search engine
query), as the most likely meaning given a user’s context can
still be identified. Second, our system is well suited to domain-
specific use cases where existing knowledge sources (dictionaries,
ontologies, etc.) are unavailable or contain inadequate
coverage of the domain. Third, our methodology is language-
agnostic, as it crowdsources the meanings of each term
from native-language search queries, thus avoiding the need
for a new textual model to be procured for each language.
Fourth, the system is able to dynamically learn the various
meanings of ambiguous terms and to represent them as human-
readable term vectors. These vectors can subsequently be used
to enhance search queries by searching on the entire vector
as opposed to only the single keyword entry that the user
specified (as most search engines currently work). Finally,
because the term senses are dynamically generated, they will
be automatically updated over time as new terms enter the
vocabulary of users or change meaning, removing the need for
the system to be manually kept up to date like many traditional
knowledge sources.
Our proposed system utilizes a Probabilistic Graphical
Model for Massive Hierarchical Data (PGMHD) to represent
the data in search logs which enables us to detect ambiguous
terms via the classifications of users who conducted those
searches. The proposed technique is able to resolve term
ambiguity by identifying the synonyms that represent each
possible meaning of the ambiguous term. To test the proposed
model, we used a data set provided by CareerBuilder, the
largest job board in the US. From this data set we analyzed
more than 1.6 billion search log entries, enabling us to detect
ambiguous terms with their possible meanings. To the best of
our knowledge we are the first group to conduct such experi-
ments aimed at detecting and resolving word sense ambiguity
for search queries in the recruitment domain. The presented
system has been integrated within CareerBuilder’s semantic
search engine, where it will help improve the accuracy of
millions of job searches every hour.
ACKNOWLEDGMENT
The authors would like to thank the Big Data team at
CareerBuilder for their support with implementing the pro-
posed system within CareerBuilder’s Hadoop ecosystem. The
authors would also like to show deep gratitude to the Search
Development group at CareerBuilder for their help integrating
this system within CareerBuilder’s search engine to enable an
improved semantic search experience.
REFERENCES
[1] H. Jayadianti, L. E. Nugroho, C. S. Pinto, P. I. Santosa, and W. Widayat,
“Solving problem of ambiguity terms using ontology,” 2013.
[2] A. A. Ferreira, M. A. Gonçalves, and A. H. Laender, “A brief survey
of automatic methods for author name disambiguation,” ACM SIGMOD
Record, vol. 41, no. 2, pp. 15–26, 2012.
[3] X. Zhou and H. Han, “Survey of word sense disambiguation approaches,” in FLAIRS Conference, pp. 307–313, 2005.
[4] R. Navigli, “Word sense disambiguation: A survey,” ACM Computing
Surveys (CSUR), vol. 41, no. 2, p. 10, 2009.
[5] J. Sreedhar, S. V. Raju, A. V. Babu, A. Shaik, and P. P. Kumar, “Word
sense disambiguation: An empirical survey,” International Journal of
Soft Computing and Engineering (IJSCE), ISSN 2231-2307, 2012.
[6] E. Agirre and D. Martinez, “Knowledge sources for word sense disam-
biguation,” in Text, Speech and Dialogue, pp. 1–10, Springer, 2001.
[7] R. Turner, “Collins English Dictionary,” New Library World, vol. 107,
no. 1/2, pp. 81–83, 2006.
[8] R. Navigli and P. Velardi, “Structural semantic interconnections: a
knowledge-based approach to word sense disambiguation,” Pattern
Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, no. 7,
pp. 1075–1086, 2005.
[9] A. Philpot, E. Hovy, and P. Pantel, “The Omega ontology,” in Proceedings, IJCNLP Workshop on Ontologies and Lexical Resources (OntoLex-05), 2005.
[10] I. Niles and A. Pease, “Mapping WordNet to the SUMO ontology,” in
Proceedings of the IEEE International Knowledge Engineering Conference, pp. 23–26, 2003.
[11] T. Petrolito and F. Bond, “A survey of WordNet annotated corpora,” in
Proceedings of The Seventh Global WordNet Conference (GWC-7),
pp. 236–245, 2014.
[12] J. J. Jiang and D. W. Conrath, “Semantic similarity based on corpus
statistics and lexical taxonomy,” arXiv preprint cmp-lg/9709008, 1997.
[13] Y. K. Lee, H. T. Ng, and T. K. Chia, “Supervised word sense
disambiguation with support vector machines and multiple knowledge
sources,” in Senseval-3: third international workshop on the evaluation
of systems for the semantic analysis of text, pp. 137–140, 2004.
[14] G. Escudero, L. Màrquez, and G. Rigau, “Naive Bayes and exemplar-based approaches to word sense disambiguation revisited,” arXiv
preprint cs/0007011, 2000.
[15] E. Agirre and P. G. Edmonds, Word sense disambiguation: Algorithms
and applications, vol. 33. Springer Science & Business Media, 2007.
[16] S. Tratz, A. Sanfilippo, M. Gregory, A. Chappell, C. Posse, and P. Whitney, “PNNL: a supervised maximum entropy approach to word sense
disambiguation,” in Proceedings of the 4th International Workshop
on Semantic Evaluations, pp. 264–267, Association for Computational
Linguistics, 2007.
[17] T. Wang, J. Rao, and Q. Hu, “Supervised word sense disambiguation
using semantic diffusion kernel,” Engineering Applications of Artificial
Intelligence, vol. 27, pp. 167–174, 2014.
[18] G. Escudero, L. Màrquez, and G. Rigau, “A comparison between
supervised learning algorithms for word sense disambiguation,” in
Proceedings of the 2nd workshop on Learning language in logic and
the 4th conference on Computational natural language learning-Volume
7, pp. 31–36, Association for Computational Linguistics, 2000.
[19] H. T. Ng and H. B. Lee, “Integrating multiple knowledge sources to dis-
ambiguate word sense: An exemplar-based approach,” in Proceedings of
the 34th annual meeting on Association for Computational Linguistics,
pp. 40–47, Association for Computational Linguistics, 1996.
[20] R. Florian, S. Cucerzan, C. Schafer, and D. Yarowsky, “Combining clas-
sifiers for word sense disambiguation,” Natural Language Engineering,
vol. 8, no. 04, pp. 327–341, 2002.
[21] P. Pantel and D. Lin, “An unsupervised approach to prepositional phrase
attachment using contextually similar words,” in Proceedings of the 38th
Annual Meeting on Association for Computational Linguistics, pp. 101–
108, Association for Computational Linguistics, 2000.
[22] M. Diab and P. Resnik, “An unsupervised method for word sense
tagging using parallel corpora,” in Proceedings of the 40th Annual
Meeting on Association for Computational Linguistics, pp. 255–262,
Association for Computational Linguistics, 2002.
[23] R. Mihalcea, “Unsupervised large-vocabulary word sense disambigua-
tion with graph-based algorithms for sequence data labeling,” in
Proceedings of the conference on Human Language Technology and
Empirical Methods in Natural Language Processing, pp. 411–418,
Association for Computational Linguistics, 2005.
[24] D. Yarowsky, “Unsupervised word sense disambiguation rivaling su-
pervised methods,” in Proceedings of the 33rd annual meeting on
Association for Computational Linguistics, pp. 189–196, Association
for Computational Linguistics, 1995.
[25] E. Agirre, O. L. de Lacalle, and A. Soroa, “Random walks for
knowledge-based word sense disambiguation,” Computational Linguis-
tics, vol. 40, no. 1, pp. 57–84, 2014.
[26] M. T. Pilehvar, D. Jurgens, and R. Navigli, “Align, disambiguate and
walk: A unified approach for measuring semantic similarity,” in ACL
(1), pp. 1341–1351, 2013.
[27] R. Navigli, “A quick tour of word sense disambiguation, induction and
related approaches,” in SOFSEM 2012: Theory and practice of computer
science, pp. 115–129, Springer, 2012.
[28] R. Navigli and M. Lapata, “Graph connectivity measures for unsupervised word sense disambiguation,” in IJCAI, pp. 1683–1688, 2007.
[29] R. Navigli and M. Lapata, “An experimental study of graph connectivity
for unsupervised word sense disambiguation,” Pattern Analysis and
Machine Intelligence, IEEE Transactions on, vol. 32, no. 4, pp. 678–
692, 2010.
[30] W. G. El-Rab, O. R. Zaiane, and M. El-Hajj, “Unsupervised graph-
based word sense disambiguation of biomedical documents,” in e-
Health Networking, Applications & Services (Healthcom), 2013 IEEE
15th International Conference on, pp. 649–652, IEEE, 2013.
[31] G. A. Miller, C. Leacock, R. Tengi, and R. T. Bunker, “A semantic
concordance,” in Proceedings of the workshop on Human Language
Technology, pp. 303–308, Association for Computational Linguistics,
1993.
[32] B. Snyder and M. Palmer, “The English all-words task,” in Senseval-3: Third International Workshop on the Evaluation of Systems for the
Semantic Analysis of Text, pp. 41–43, 2004.
[33] S. S. Pradhan, E. Loper, D. Dligach, and M. Palmer, “SemEval-2007
task 17: English lexical sample, SRL and all words,” in Proceedings of
the 4th International Workshop on Semantic Evaluations, pp. 87–92,
Association for Computational Linguistics, 2007.
[34] G. S. Mann and D. Yarowsky, “Unsupervised personal name disam-
biguation,” in Proceedings of the seventh conference on Natural lan-
guage learning at HLT-NAACL 2003-Volume 4, pp. 33–40, Association
for Computational Linguistics, 2003.
[35] T. P. Pham, H. T. Ng, and W. S. Lee, “Word sense disambiguation with
semi-supervised learning,” in Proceedings of the National Conference
on Artificial Intelligence, vol. 20, pp. 1093–1098, Menlo Park, CA;
Cambridge, MA; London; AAAI Press; MIT Press; 1999, 2005.
[36] S. Faralli and R. Navigli, “A new minimally-supervised framework for
domain word sense disambiguation,” in Proceedings of the 2012 Joint
Conference on Empirical Methods in Natural Language Processing and
Computational Natural Language Learning, pp. 1411–1422, Associa-
tion for Computational Linguistics, 2012.
[37] R. Mihalcea and J. Wiebe, “SimCompass: Using deep learning word
embeddings to assess cross-level similarity,” SemEval 2014, p. 560,
2014.
[38] R. Collobert and J. Weston, “A unified architecture for natural lan-
guage processing: Deep neural networks with multitask learning,” in
Proceedings of the 25th international conference on Machine learning,
pp. 160–167, ACM, 2008.
[39] Z. He, S. Liu, M. Li, M. Zhou, L. Zhang, and H. Wang, “Learning entity
representation for entity disambiguation,” in ACL (2), pp. 30–34, 2013.
[40] T. Baldwin, Y. Li, B. Alexe, and I. R. Stanoi, “Automatic term ambiguity
detection,” in ACL (2), pp. 804–809, Citeseer, 2013.
[41] K. AlJadda, M. Korayem, T. Grainger, and C. Russell, “Crowdsourced
query augmentation through semantic discovery of domain-specific
jargon,” in Big Data (Big Data), 2014 IEEE International Conference
on, pp. 808–815, IEEE, 2014.
[42] K. AlJadda, M. Korayem, C. Ortiz, T. Grainger, J. Miller, W. S.
York, et al., “PGMHD: A scalable probabilistic graphical model for
massive hierarchical data problems,” in Big Data (Big Data), 2014 IEEE
International Conference on, pp. 55–60, IEEE, 2014.
[43] G. Bouma, “Normalized (pointwise) mutual information in collocation
extraction,” in Proceedings of the Biennial GSCL Conference, pp. 31–
40, 2009.
[44] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop
distributed file system,” in Mass Storage Systems and Technologies
(MSST), 2010 IEEE 26th Symposium on, pp. 1–10, IEEE, 2010.
[45] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on
large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–
113, 2008.
[46] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu,
P. Wyckoff, and R. Murthy, “Hive: a warehousing solution over a map-
reduce framework,” Proceedings of the VLDB Endowment, vol. 2, no. 2,
pp. 1626–1629, 2009.