
Abstract

The Internet facilitates large-scale collaborative projects. The emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the "wisdom of the crowd" has inspired successful projects such as Wikipedia, which has become the primary source of crowd-based information in many languages. On the other hand, the decentralized and often un-monitored environment of such projects may make them susceptible to systematic malfunction and misbehavior. In this work, we focus on Urban Dictionary, a crowd-sourced online dictionary. We combine computational methods with qualitative annotation and shed light on the overall features of Urban Dictionary in terms of growth, coverage and types of content. We measure a high presence of opinion-focused entries, as opposed to the meaning-focused entries that we expect from traditional dictionaries. Furthermore, Urban Dictionary covers many informal, unfamiliar words as well as proper nouns. There is also a high presence of offensive content, but highly offensive content tends to receive lower scores through the voting system. Our study highlights that Urban Dictionary has a higher content heterogeneity than found in traditional dictionaries, which poses challenges in terms of processing but also offers opportunities to analyze and track language innovation.
rsos.royalsocietypublishing.org
Research
Cite this article: Nguyen D, McGillivray B,
Yasseri T. 2018 Emo, love and god: making
sense of Urban Dictionary, a crowd-sourced
online dictionary. R. Soc. open sci. 5:172320.
http://dx.doi.org/10.1098/rsos.172320
Received: 21 December 2017
Accepted: 27 March 2018
Subject Category:
Computer science
Subject Areas:
human-computer interaction
Keywords:
natural language processing, linguistic
innovation, computational sociolinguistics,
human–computer interaction
Author for correspondence:
Dong Nguyen
e-mail: dnguyen@turing.ac.uk
Emo, love and god: making
sense of Urban Dictionary,
a crowd-sourced online
dictionary
Dong Nguyen1,2, Barbara McGillivray1,3 and
Taha Yasseri1,4
1The Alan Turing Institute, London, UK
2Institute for Language, Cognition and Computation, School of Informatics,
University of Edinburgh, Edinburgh, UK
3Theoretical and Applied Linguistics, Faculty of Modern and Medieval Languages,
University of Cambridge, Cambridge, UK
4Oxford Internet Institute, University of Oxford, Oxford, UK
DN, 0000-0002-6062-3117; TY, 0000-0002-1800-6094
The Internet facilitates large-scale collaborative projects and
the emergence of Web 2.0 platforms, where producers and
consumers of content unify, has drastically changed the
information market. On the one hand, the promise of the
‘wisdom of the crowd’ has inspired successful projects such
as Wikipedia, which has become the primary source of
crowd-based information in many languages. On the other
hand, the decentralized and often unmonitored environment
of such projects may make them susceptible to low-quality
content. In this work, we focus on Urban Dictionary, a crowd-
sourced online dictionary. We combine computational methods
with qualitative annotation and shed light on the overall
features of Urban Dictionary in terms of growth, coverage
and types of content. We measure a high presence of opinion-
focused entries, as opposed to the meaning-focused entries
that we expect from traditional dictionaries. Furthermore,
Urban Dictionary covers many informal, unfamiliar words
as well as proper nouns. Urban Dictionary also contains
offensive content, but highly offensive content tends to receive
lower scores through the dictionary’s voting system. The low
threshold to include new material in Urban Dictionary enables
quick recording of new words and new meanings, but the
resulting heterogeneous content can pose challenges in using
Urban Dictionary as a source to study language innovation.
2018 The Authors. Published by the Royal Society under the terms of the Creative Commons
Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted
use, provided the original author and source are credited.
1. Introduction
Contemporary information communication technologies open up new ways of cooperation leading to
the emergence of large-scale crowd-sourced collaborative projects [1]. Examples of such projects are open
software development [2], citizen science campaigns [3] and most notably Wikipedia [4]. All these projects
are based on contributions from volunteers, often anonymous and non-experts. Although the success of
most of these examples is beyond expectation, there are challenges and shortcomings to be considered
as well. In the case of Wikipedia, for instance, inaccuracies [5], edit wars and destructive interactions
between contributors [6,7], and biases in coverage and content [8,9] are only a few of the many
undesirable aspects of the project that have been studied in detail.
The affordances of Internet-mediated crowd-sourced platforms have also led to the emergence of
crowd-sourced online dictionaries. Language is constantly changing. Over time, new words enter the
lexicon, others become obsolete, and existing words acquire new meanings (i.e. senses) [10]. Dictionaries
record new words and new meanings, are regularly updated, and sometimes used as a source to study
language change [11]. However, a new word or a new meaning needs to have enough evidence backing it
up before it can enter a traditional dictionary. For example, selfie was the Oxford Dictionaries word of the
year in 2013 and its frequency in the English language increased by 17 000% in that year. Its first recorded
use dates back to 2002,1 but the word was only added to OxfordDictionaries.com in August 2013. Even though
some of the traditional online dictionaries, such as Oxford Dictionaries2or Macmillan Dictionary,3have
considered implementing crowdsourcing in their workflow [12] (see [13, pp. 3–6] for a typology of
crowdsourcing activities in lexicography), for the most part they rely on professional lexicographers to select,
design and compile their entries.
Unlike traditional online dictionaries [13, p. 11], the content in crowd-sourced online dictionaries
comes from non-professional contributors; popular examples are Urban Dictionary4 and Wiktionary
[14].5Collaborative online dictionaries are constantly updated and have a lower threshold for including
new material compared to traditional dictionaries [13, p. 2]. Moreover, it has also been suggested
that such dictionaries might be driving linguistic change, not only reflecting it [15,16]. Crowd-sourced
dictionaries could potentially complement online sources such as Twitter, blogs and websites (e.g.
[17–19]) to study language innovation. However, such dictionaries are subject to spam and vandalism,
as well as ‘unspecific, incorrect, outdated, oversimplified or overcomplicated descriptions’ [12]. Another
concern affecting such collaborative dictionaries is the question of whether their content reflects real
language innovation, as opposed to the concerns of a specific community of users, their opinions, and
generally neologisms and new word meanings that will not last in the language.
This paper presents an explorative study of Urban Dictionary (UD), an online crowd-sourced
dictionary founded in December 1999. Users contribute by submitting an entry describing a word and a
word might, therefore, have multiple entries. According to Aaron Peckham, its founder, ‘People write
really opinionated definitions and incorrect definitions. There are also ones that have poor spelling and poor
grammar [...] I think reading those makes definitions more entertaining and sometimes more accurate and honest
than a heavily researched dictionary definition’ [20]. A UD entry for selfie is shown in figure 1, in which
selfie is defined as ‘The beginning of the end of intelligent civilization’ and accompanied with an example
usage ‘Future sociologists use the selfie as an artifact for the end of times’. Furthermore, entries can contain
tags (e.g. #picture, #photograph). In total, UD contains 76 entries for selfie (July 2016), the earliest submitted
in 2009, and a range of variations (e.g. selfie-conscious, selfied, selfieing and selfie-esteem). Overall, there are
353 entries that describe a word (or phrase) containing the string selfie (see figure 2 for a plot over time).
Figure 3 shows a similar plot for fleek and on fleek, a phrase that went viral in 2014. UD thus not only
captures new words rapidly, but it also captures the many variations that arise over time. Furthermore,
the personal, informal and often offensive nature of the content in this popular site is different from
the content typically found in both traditional dictionaries (see [13, pp. 3–4] and [13, p. 7]) and more
regulated collaborative dictionaries like Wiktionary. The status of UD as source of evidence for popular
and current usage is widely recognized [21–23] and it has even been consulted in some legal cases [24].
UD has also been used as a source to cross-check emerging word forms identified through Twitter [18].
1http://blog.oxforddictionaries.com/press-releases/oxforddictionaries-word-of-the-year-2013/
2https://www.oxforddictionaries.com
3https://www.macmillandictionary.com
4https://www.urbandictionary.com/
5https://en.wiktionary.org/
Figure 1. An Urban Dictionary entry for selfie.
Figure 2. The number of new definitions for selfie and its variations per year (December 1999–July 2016).
Figure 3. The number of new definitions for fleek and on fleek and other variations per year (December 1999–July 2016).
UD has also been used for the development of natural language processing systems that have to
deal with informal language, non-standard language and slang. For example, UD has been consulted
when building a text normalization system for Twitter [25] and it has been used to create more training
data for a Twitter-specific sentiment lexicon [26]. In a recent study, UD is used to automatically generate
explanations of non-standard words and phrases [24].
While UD seems a promising resource to record and analyse language innovation, so far little is
known about the characteristics of its content. In this study, we take the first step towards characterizing
UD. So far, UD has been featured in a few studies, but these qualitative analyses were based on a small
number of entries [23,27]. We study a complete snapshot (December 1999–July 2016) of all the entries in
the dictionary as well as selected samples using content analysis methods. To the best of our knowledge,
this is the first systematic study of UD at this scale.
2. Results
We start with presenting an overall picture of UD (§2.1), such as its growth and how content is
distributed. Next, we compare its size to Wiktionary based on the number of headwords (§2.2). We then
present results based on two crowd-sourcing experiments in which we analyse the types of content and
the offensiveness in the entries (§2.3). Finally, we discuss how characteristics of the entries relate to their
popularity on UD (§2.4).
2.1. Overall picture
Since its inception in 1999, UD has had a rather steady growth. Figure 4 shows the number of new entries
added each week. So far, UD has collected 1 620 438 headwords (after lower casing)6 and 2 661 625 entries
with an average of 1.643 entries per headword. However, as depicted in figure 5a, the distribution of the
number of entries for each headword varies tremendously from one headword to another. While the
majority of headwords have only one definition, there are headwords with more than 1000 definitions.
Table 1 reports the headwords with the largest number of definitions.
This fat-tailed, almost power-law distribution is not limited to the number of definitions per
headword; the number of definitions contributed by each user follows a similar distribution, shown
in figure 5b. The majority of users have contributed only once, while there are a few power-users with
more than 1000 contributed definitions. These types of distributions are common in self-organized
human systems, particularly in similar crowd-based systems such as Wikipedia [28,29] or citizen science
projects such as Zooniverse [3], in social media activity levels such as on Twitter [30], and in content-sharing systems
such as Reddit or Digg [31].
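As a rough illustration, the log-binned probability density function shown in figure 5a can be computed along the following lines. This is a minimal sketch in base R; the data frame entries and its headword column are assumed names, not the released analysis code.

```r
# A base R sketch of the log-binned probability density in figure 5a; the data
# frame `entries` and its `headword` column are assumed names.
defs_per_headword <- table(tolower(entries$headword))   # definitions per headword

# Logarithmic binning: bin edges at powers of two.
breaks <- 2^(0:ceiling(log2(max(defs_per_headword))))
h <- hist(as.numeric(defs_per_headword), breaks = breaks, plot = FALSE)

# Geometric bin mid-points against the estimated density, on log-log axes.
plot(sqrt(head(breaks, -1) * tail(breaks, -1)), h$density, log = "xy",
     xlab = "no. definitions per headword", ylab = "probability density")
```

The same procedure, applied to the number of definitions per user instead of per headword, yields figure 5b.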
A noteworthy feature of UD is that users can express their evaluation of different definitions for each
headword by up or down voting the definition. There is little to no guidance on ‘what a good definition
is’ in UD, and users are supposed to judge the quality of the definitions based on their own subjective
perception of what an urban dictionary should be. Figure 6a shows the distribution of the number of
up/down votes that each definition has received among all the definitions of all the headwords. A similar
pattern is evident, in which many definitions have received very few votes (both up and down) and few
definitions have many votes. Figure 6b shows a scatter plot of the number of down votes versus the
number of up votes for each definition. There is a striking correlation between the number of up and
down votes for each definition which emphasizes the role of visibility rather than quality in the number
of votes. However, there seems to be a systematic deviation from a perfect correlation in which the
number of up votes generally exceeds the number of down votes. This is more evident in figure 6c,
where the distribution of the ratio of up votes to down votes is shown. Evidently, there is a wide variation
among the definitions with some having more than 10 times more up votes than down votes and some
the other way around.
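The quantities plotted in figure 6b,c follow directly from the vote counts. The sketch below shows one way to compute them in base R; the up and down columns on entries are assumed names.

```r
# A base R sketch of the vote ratio in figure 6c; the `up` and `down` vote
# columns on `entries` are assumed names.
log_ratio <- log10((entries$up + 1) / (entries$down + 1))
hist(log_ratio, breaks = 100, xlab = "log10((U + 1)/(D + 1))", main = "")

# Correlation between up and down votes (figure 6b), on a log scale.
cor(log10(entries$up + 1), log10(entries$down + 1))
```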
2.2. Number of headwords
We now compare the number of unique headwords in UD to the number of unique headwords
in Wiktionary, another crowd-sourced dictionary. Wiktionary manifests a different policy from that
of UD. The content in Wiktionary is created and maintained by administrators (selected by the
community), registered users and anonymous contributors [14]. In contrast to UD, there are many
different mechanisms in Wiktionary to ensure that the content adheres to the community guidelines.
Each page is accompanied by a talk page, where users can discuss the content of the page and resolve
any possible conflicts. Furthermore, Wiktionary provides guidelines for the structure and content
of the entries. Capitalization is consistent, and content or headwords that do not meet the Wiktionary
guidelines are removed. For example, while both UD and Wiktionary have misspelled headwords (e.g.
6We use ‘headword’ to refer to the title under which a set of definitions appear. For example, in Wiktionary, the page about bank
covers different parts of speech (e.g. noun and verb) as well as the different senses. In the context of UD, we use ‘entry’ to refer to an
individual content contribution (e.g. the combination of headword, definition, example text and tags submitted by a user). Due to the
heterogeneity in UD, we lower cased the headwords to calculate this statistic. This follows the interface of UD, which also does not
match on case when grouping entries.
Figure 4. Number of contributed definitions to Urban Dictionary per week since its inception in 1999.
Figure 5. The probability density function of (a) the number of definitions contributed to each headword and (b) the number of definitions contributed by each user of Urban Dictionary (logarithmic binning). Both axes are logarithmically scaled.
Table 1. Headwords with the most definitions.

headword              no. definitions
emo                   1204
love                  1140
god                   706
urban dictionary      701
chode                 614
canada’s history      583
sex                   558
school                555
cunt                  541
scene                 537
beleive for believe), Wiktionary guidelines state that only common misspellings should be included while
rare misspellings should be excluded.7 In contrast, such guidelines are not present in UD. Wiktionary
entries thus undergo a deeper level of curation.
7https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion (17 February 2018).
Figure 6. (a) Histogram of the number of votes of each definition, (b) scatter plot of the number of up votes and down votes that each definition has received, with error bars for bins and a fitted line, and (c) the histogram of the ratio of up votes (U) to down votes (D) of each definition.
Table 2. Headword comparison between UD and Wiktionary. The table reports the unique number of headwords in each category. No threshold was applied.

                     no processing        all lowercase        mixed
overlap              93 167 (4%)          112 762 (5%)         108 361 (5%)
only UD              1 698 812 (72%)      1 507 675 (70%)      1 565 794 (70%)
only Wiktionary      569 787 (24%)        540 641 (25%)        546 263 (25%)
total                2 361 766            2 161 078            2 220 418
Because of the inconsistent capitalization in UD, we experiment with three approaches to match the
headwords between both dictionaries: no preprocessing, lower casing of all characters, and mixed.8
Table 2 reports the result of this matching. The number of unique headwords in UD is much higher and
the lexical overlap is relatively low. Sometimes there is a match on the lexical level (i.e. the headwords
match), but UD or Wiktionary cover different or additional meanings. For example, phased is described
in UD as ‘something being done bit by bit—in phases’, a meaning also covered in Wiktionary. However,
UD also describes several other meanings, including ‘A word that is used when your asking if someone
wants to fight’ and ‘to be “buzzed” when you arent drunk, but arent sober’.
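For concreteness, the ‘mixed’ normalization rule described in footnote 8 can be written as a small function. This is a sketch of the rule as stated, not the authors' code; the function name is illustrative.

```r
# A sketch of the 'mixed' normalisation rule stated in footnote 8; the function
# name is illustrative and this is not the authors' code.
normalise_headword <- function(hw) {
  chars <- strsplit(hw, "")[[1]]
  all_upper <- hw == toupper(hw)
  first_up_second_low <- length(chars) >= 2 &&
    chars[1] == toupper(chars[1]) && chars[1] != tolower(chars[1]) &&
    chars[2] == tolower(chars[2]) && chars[2] != toupper(chars[2])
  if (all_upper || first_up_second_low) tolower(hw) else hw
}

normalise_headword("LOL")     # "lol"     (all upper case)
normalise_headword("Selfie")  # "selfie"  (first upper, second lower)
normalise_headword("iPhone")  # "iPhone"  (left unchanged)
```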
Because there is little curation of UD content, there are many headwords that would not typically
be included in a dictionary. Examples include nicknames and proper names (e.g. shaskank defined as
‘Akshay Kaushik’s nick name for his boyfriend Shashank’; dan taylor, defined as ‘A very wonderful
man that cooks the best beef stew in the whole wide world. [...]’), as well as informal spelling (e.g.
AYYYYYYYYYYYYYYYYYYY!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!) and made-up words that no one actually uses (e.g.
Emptybottleaphobia9). Based on manual inspection, it seems that these are often headwords with only one
entry.
8The headword will be lower cased when the headword is all upper case or when the first character is upper case and the second
character is lower case.
9A Google search only returns 14 results, all of them containing the UD definition (17 February 2018).
Table 3. Headword comparison between UD and Wiktionary. The table reports the unique number of headwords in each category. Only UD headwords with at least two entries are included.

                     no processing      all lowercase      mixed
overlap              50 522 (6%)        56 730 (7%)        55 003 (7%)
only UD              220 661 (25%)      165 054 (20%)      178 164 (21%)
only Wiktionary      612 432 (69%)      596 673 (73%)      599 621 (72%)
total                883 615            818 457            832 788
We, therefore, also perform a matching considering only headwords from UD with at least two entries
(table 3). In this way, we use the number of entries as a crude proxy for whether the headword is of
interest to a wider group of people. Note that this filtering is not applied to Wiktionary, because each
headword has only one page and headwords that do not match Wiktionary guidelines are already
removed by the community. For example, an important criterion for inclusion in Wiktionary is that
the term is reasonably widely attested, e.g. has widespread use or is used in permanently recorded
media.10 Compared to the first analysis, the difference is striking. In this comparison, the number of
unique headwords in Wiktionary is higher than that of UD. From a manual inspection we see that
many Wiktionary-specific headwords include domain specific and encyclopaedic words (e.g. acacetins,
dramaturge and shakespearean sonnets), archaic words (e.g. unaffrighted), as well as some commonly
used words (e.g. deceptive, e-voucher). We also find that many of the popular UD headwords (i.e.
headwords that have many entries) that are not covered in Wiktionary are proper nouns: the top five
entries are canada’s history, justin bieber, george w. bush, runescape and green day. In some cases, entries
uniquely appearing in UD refer to words with genuine general coverage, such as loml (in total 11
entries) defined as, for example, ‘Acronym of “Love of My Life”’ or broham ‘a close buddy, compadre,
smoking and/drinking buddy. a term of endearment between men to reaffirm heterosexuality’ (in total
18 entries).
2.3. Content analysis
In this section, we present our analyses on the different types of content as well as the offensiveness of
the content in UD.
2.3.1. Content type
We now analyse several aspects of the content in UD that we expect to be different from content typically
found in traditional dictionaries as well as Wiktionary. For example, manual inspection suggested that
UD has a higher coverage of informal and infrequent words and of proper nouns (e.g. names of places
or specific people). Many of the headwords are not covered in knowledge bases or encyclopaedias.
To characterize the data, we therefore annotated a sample of the data using crowdsourcing (see Data
and methods). In order to limit the dominance of headwords with only one entry (which represent
the majority of headwords in UD), the sample was created by taking headwords from each of the 11
frequency bins (see table 10 for details on the way the bins were created and sampled from). Note that
the last two bins are very small. For each headword, we include up to three entries (top ranked, second
ranked and random based on up and down votes). Annotations were collected on the entry level and
crowd workers were shown the headword, definition and example.
Proper nouns
Dictionaries are usually selective with including proper nouns (e.g. names of places or individuals)
[32, p. 77]. In contrast, in UD many entries describe proper nouns. We therefore asked crowdworkers
whether the entry described a proper noun (yes or no). In our stratified sample, 16.4% of the entries
were annotated as being about a proper noun. Figure 7 shows the fraction of proper nouns by
frequency bin.
10https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion (17 February, 2018).
Figure 7. Proper nouns (proportion of entries annotated as proper nouns, per frequency bin).
Figure 8. Meaning versus opinions (proper nouns were excluded).
Opinions
Most dictionaries strive towards objective content. For example, Wiktionary states ‘Avoid bias. Entries
should be written from a neutral point of view, representing all usages fairly and sympathetically’.11 In
contrast, the entries provided in UD do not always describe the meaning of a word, but they sometimes
contain an opinion (e.g. beer ‘Possibly the best thing ever to be invented ever. I MEAN IT’ or Bush
‘A disgrace to America’). We therefore asked the crowdworkers whether the definition describes the
meaning of the word, expresses a personal opinion, or both. Figures 8 and 9 show the fraction of entries
labeled as opinion, meaning or both, separated according to whether they were annotated as describing
proper nouns. In higher frequency bins, the fraction of entries marked as opinion is higher. We also find
that the number of entries marked as opinion is higher for proper nouns. While most entries are marked
as describing a meaning, the considerable presence of opinions suggests that the type of content in UD is
different from that in traditional dictionaries [13, pp. 3–4].
Familiarity
UD enables quick recording of new words and new meanings, many of which may not have seen
widespread usage yet. Furthermore, as discussed in the previous section, some entries are about made-
up words or words that only concern a small community. In contrast, many dictionaries require that
included headwords should be attested (i.e. have widespread use). These observations suggest that many
definitions in UD may not be familiar to people. To quantify this, we asked crowdworkers whether
they were familiar with the meaning of the word. The majority of the entries in UD were not familiar
to the crowdworkers. Examples are common headwords with an uncommon meaning such as coffee
defined as ‘a person who is coughed upon’ or shipwreck ‘The opposite of shipmate. A crew member
11https://en.wiktionary.org/wiki/Wiktionary:Policies_and_guidelines (16 February 2018).
Figure 9. Meaning versus opinions (proper noun entries only).
Figure 10. Familiarity (proper nouns and opinion entries were excluded).
who is an all round liability and as competent as a one legged man in an arse kicking competition’, as
well as uncommon headwords and uncommon meanings (e.g. Once-A-Meeting defined as ‘An annoying
gathering of people for an hour or more once every pre-defined interval of time (e.g. once a day). Once-A-
Meetings could easily be circumvented by a simple phone call or e-mail but are instead used to validate
a project managers position within the company.’). Figure 10 shows that in higher frequency bins, more
definitions are marked as being familiar, suggesting that the number of definitions per headword is
indeed related to the general usage of a headword.
Formality
The focus of UD on slang words [33] means that many of the words are usually not appropriate in formal
conversations, like a formal job interview. To quantify this, we asked crowdworkers whether the word
in the described meaning can be used in a formal conversation. As figure 11 shows, most of the words in
their described meanings were indeed not appropriate for use in formal settings.
2.3.2. Offensiveness
Online platforms with user generated content are often susceptible to offensive content, which may be
insulting, profane and/or harmful towards individuals as well as social groups [34,35]. Furthermore, the
existence of such content in platforms could signal to other users that such content is acceptable and
impact the social norms of the platform [36]. As a response, various online platforms have integrated
different mechanisms to detect, report and remove inappropriate content. In contrast, regulation is
minimal in UD and one of its characteristics is its often offensive content.
UD not only contains offensive entries describing the meaning of offensive words, but there are also
offensive entries for non-offensive words (e.g. a definition describing women as ‘The root of all evil’). We
Figure 11. Formality (proper nouns and opinion entries were excluded).
Table 4. Average offensiveness rankings (3 = most offensive, 1 = least offensive) by type of definition in UD entries.

type        avg. offensiveness
both        2.025
meaning     1.989
opinion     2.050
Table 5. Average offensiveness rankings (3 = most offensive, 1 = least offensive) by formality in UD definitions.

formal?     avg. offensiveness
no          2.031
unclear     1.884
yes         1.873
note, however, that UD also contains non-offensive definitions for offensive words (e.g. asshole defined
as ’A person with no concept of boundaries, respect or common decency’). To investigate how offensive
content is distributed in UD, we ran a crowdsourcing task on CrowdFlower (see Data and methods for
more details). Workers were shown three definitions for the same headword, which they had to rank
from the most to the least offensive.
We only included headwords with at least three definitions. In total, we obtained annotations for 1322
headwords and thus 3966 definitions. Out of these 1322 headwords there are 326 headwords for which
the majority of the workers agreed that none of the definitions were offensive.
Table 4 reports the offensiveness scores separated by whether the definitions describe a meaning,
opinion or both. A one-way ANOVA test indicates a marginally significant difference (F(2, 3963) = 2.766,
p < 0.1). A post hoc comparison using the Tukey test likewise indicates a marginally significant difference
between the scores of definitions describing a meaning and those describing an opinion (p < 0.1). Thus, definitions stating
an opinion tend to be ranked as more offensive compared to definitions describing a meaning.
Table 5 reports the offensiveness scores by formality. Definitions for words that were annotated as
not being appropriate for formal settings (based on their described meaning) tend to be ranked as
being more offensive. A one-way ANOVA confirms that the differences between the groups are highly
significant (F(2, 3963) = 22.72, p < 0.001). Post hoc comparisons using the Tukey test indicate significant
differences between the formal and not formal categories (p < 0.001), and between the unclear and not
formal categories (p<0.05). We also find that definitions for which crowdworkers had indicated that they
were familiar with the described meaning of the word tended to be perceived as less offensive (table 6,
p<0.001 based on a t-test). We observe the same trends when we only consider definitions that describe
a meaning.
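The comparisons above can be reproduced with base R along the following lines. This is a sketch; the data frame defs and its columns (offensiveness, type, familiar) are assumed names, not the released analysis code.

```r
# A sketch of the group comparisons reported above; `defs` and its columns are
# assumed names, not the released analysis code.
fit <- aov(offensiveness ~ type, data = defs)  # type: meaning / opinion / both
summary(fit)                                   # one-way ANOVA F-test
TukeyHSD(fit)                                  # post hoc pairwise comparisons

# Familiarity has only two groups, so a t-test is used instead.
t.test(offensiveness ~ familiar, data = defs)
```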
Table 6. Average offensiveness rankings (3 = most offensive, 1 = least offensive) by familiarity in UD entries.

familiar?    avg. offensiveness
yes          1.915
no           2.022
Table 7. Characterization of UD entries based on votes. The table reports the proportions of opinion-based versus meaning-based definitions in each of the ranking groups.

                               both      meaning     opinion
no proper nouns (n = 3268)
  top ranked                   0.055     0.852       0.094
  second ranked                0.074     0.850       0.076
  random                       0.051     0.864       0.084
proper nouns (n = 698)
  top ranked                   0.172     0.481       0.347
  second ranked                0.169     0.477       0.354
  random                       0.190     0.444       0.366
2.4. Content and popularity
An important feature of UD is the voting mechanism that allows the users to express their evaluation of
entries by up or down voting them. For a given headword, entries are ranked according to these votes
and the top ranked one is labeled as top definition. The votes thus drive the online visibility of entries,
leading to the following implications. First, the top ranked entries are immediately visible when UD is
consulted to look up the meaning of a headword. Many users might not browse the additional pages
with lower ranked entries. Second, by users expressing their evaluation through votes, social norms are
formed regarding what content is valued in UD.
UD does not provide clear guidelines on ‘what a good definition is’. Various factors could influence
the up and down votes an entry receives, including whether the voter thinks the entry is offensive,
informative, funny and whether the voter (dis)agrees with the expressed view. In this section, we analyse
how characteristics of the content as discussed in the previous section relate to the votes the entries
receive. Because the number of up and down votes varies highly depending on the popularity of the
headword, we perform the analysis based on the rankings of entries (top ranked, second ranked and
random) instead of the absolute number of up and down votes. Only headwords with at least three
entries are included.
Table 7 shows the distribution of opinion-based versus meaning-based definitions separated by
whether the headwords are annotated as proper nouns by the crowdworkers. The proportion of
definitions that are annotated as opinions is much higher for proper nouns, which is consistent with
our previous analysis. However, among the top ranked definitions for proper nouns, the proportion of
opinions is lower, although the difference is not statistically significant.
Table 8 characterizes the entries by formality and familiarity. We discard proper nouns and entries
marked as opinion, since it is less clear what formality and familiarity mean in these contexts. We find
that the top ranked definitions tend to be more familiar (χ2(2, N = 2991) = 15.385, p < 0.001) and more
appropriate for formal settings (but not significant).
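The reported chi-squared statistic corresponds to a test of independence between ranking group and familiarity; a minimal sketch in base R is shown below, with assumed column names.

```r
# A sketch of the chi-squared test of independence between ranking group and
# familiarity; `defs$ranking` and `defs$familiar` are assumed column names.
chisq.test(table(defs$ranking, defs$familiar))
```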
Table 8 also reports the average offensiveness ranking of the definitions separated by their popularity
(again, discarding proper nouns and entries marked as opinions). The difference in rankings between top
ranked and second ranked definitions is minimal, but random definitions are more often ranked as being
more offensive. A one-way ANOVA test confirms that the differences between the groups are highly
significant (F(2, 2988) = 22.07, p < 0.001). Post hoc comparisons using the Tukey test indicate significant
differences between the random and top ranked, and random and second ranked definitions (p<0.001).
Table 8. Familiarity, formality and offensiveness of UD definitions across rankings based on votes.

                  familiar?          formal?                         offensiveness
                  no       yes       no       unclear     yes       avg. ranking
top ranked        0.799    0.201     0.855    0.026       0.119     1.950
second ranked     0.807    0.193     0.876    0.023       0.101     1.966
random            0.861    0.139     0.894    0.020       0.086     2.107

Definitions for proper nouns and definitions annotated as opinions are not included. The table reports the proportions in each of the rankings for familiarity and formality and the average ranking for offensiveness (3 = most offensive, 1 = least offensive); n = 2991.
Table 9. Ordinal regression results. The dependent variable is the ranking: top ranked (0), second ranked (1) or a random rank (2).

                     dependent variable: ranking
familiar (yes)       0.255*** (0.096)
formal (unclear)     0.133 (0.226)
formal (yes)         0.073 (0.123)
offensiveness        0.335*** (0.059)
observations         2991
log likelihood       3262.19
AIC                  6536.38

***p < 0.01.
A similar trend is observed when we consider all definitions (F(2, 3963) = 34.87, p < 0.001). Thus, although
UD contains offensive content, very offensive definitions do tend to be ranked lower through the voting
system. However, the small difference in scores between the groups indicates that offensiveness only
plays a small role in the up and down votes a definition receives.
To analyze the different factors jointly, we fit an ordinal regression model (table 9) using the ordinal
R library based on definitions that were annotated as not being an opinion and not describing proper
nouns. We find that familiarity and offensiveness indeed have a significant effect. More familiar and less
offensive definitions tend to have a higher ranking. Similar trends in coefficients were observed with
fitting logistic regression models when dichotomizing the ranking variable.
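A minimal sketch of such a model with the ordinal package is shown below; the data frame defs and its columns are assumed names, and the exact specification in the released analysis code may differ.

```r
# A sketch with the ordinal package; data frame and column names are assumed.
library(ordinal)

# The ranking must be an ordered factor: top (0) < second (1) < random (2).
defs$ranking <- factor(defs$ranking, levels = c(0, 1, 2), ordered = TRUE)

model <- clm(ranking ~ familiar + formal + offensiveness, data = defs)
summary(model)  # coefficients, log-likelihood and AIC, as reported in table 9
```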
3. Discussion and conclusion
In this article, we have studied a complete snapshot (1999–2016) of UD to shed light on the characteristics
of its content. We found that most contributors of UD only added one entry and very few added a
high number of entries. Moreover, we found a number of skewed distributions, which need to be taken
into account whenever performing analyses on the UD data. Very few headwords have a high number
of entries, while the majority have only one entry. Similarly, few entries are highly popular (i.e. they
collected a high number of votes). We also found a strong correlation between the number of up and
down votes for each entry, illustrating the importance of visibility on the votes an entry receives.
The lexical content of UD is radically different from that of Wiktionary, another crowdsourced, but
more highly moderated dictionary. In general, we can say that the overlap between the two dictionaries
is small. Considering all unique UD headwords that are not found in Wiktionary, we found that this
number is almost three times the number of headwords that uniquely occur in Wiktionary. However, if
we exclude words with only one definition in UD (which tend to be infrequent or idiosyncratic words),
we found the opposite pattern, with Wiktionary-only headwords amounting to almost three times the
UD-only headwords.
Our analyses based on crowd-sourced annotations showed more details on the specific characteristics
of UD content. In particular, we measured a high presence of opinion-focused entries, as opposed to
the meaning-focused entries that we expect from traditional dictionaries. In addition, many entries in
UD describe proper nouns. The crowdworkers were not familiar with most of the definitions presented
to them and many words (and their described meaning) were found not to be appropriate for formal
settings.
UD captures many infrequent, informal words and it also contains offensive content, but highly
offensive definitions tend to get ranked lower through the voting system. The high content heterogeneity
in UD could mean that, depending on the goal, considerable effort is needed to filter and process the data
(e.g. the removal of opinions) compared to when traditional dictionaries are used. We also found that
words with more definitions tended to be more familiar to crowdworkers, suggesting that UD content
does reflect broader trends in language use to some extent.
There are several directions of future work that we aim to explore. We have compared the lexical
overlap with Wiktionary in terms of headwords. As future work, we plan to extend the current study by
performing a deeper semantic analysis and by comparing UD with other non-crowdsourced dictionaries.
Furthermore, we plan to extend the current study by comparing the content in UD with language use in
social media to advance our understanding of the extent to which UD reflects broader trends in language
use.
4. Data and methods
4.1. Data collection
4.1.1. Urban Dictionary
We crawled UD in July 2016. First, the definitions were collected by crawling the ‘browse’ pages of
UD and by following the ‘next’ links. After collecting the list of words, the definitions themselves were
crawled directly after (between 23 July and 29 July 2016). We did not make use of the API, since the API
restricted the maximum number of definitions returned to 10 for each word.
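A heavily simplified sketch of such a two-stage crawl, using the rvest package, is shown below. The URLs, query parameters and CSS selectors are assumptions about the site layout at the time, not taken from the paper or its released code.

```r
# A simplified sketch with the rvest package; URLs, query parameters and CSS
# selectors are assumptions about the site layout, not taken from the paper.
library(rvest)

# Stage 1: collect headword links from a 'browse' page and find the 'next' link.
browse <- read_html("https://www.urbandictionary.com/browse.php?character=A")
links  <- html_attr(html_nodes(browse, "a"), "href")
links  <- links[!is.na(links)]
word_links <- links[grepl("define.php", links, fixed = TRUE)]
next_link  <- links[grepl("browse.php", links, fixed = TRUE)]

# Stage 2: fetch a definition page and extract its text
# ('.meaning' is a hypothetical selector for the definition body).
def_page    <- read_html(paste0("https://www.urbandictionary.com", word_links[1]))
definitions <- html_text(html_nodes(def_page, ".meaning"))
```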
4.1.2. Wiktionary
We downloaded the Wiktionary dump of the English language edition of 20 July 2016, so that the date
matched our crawling process. To parse Wiktionary, we made use of code available through ConceptNet
5.2.2 [37]. Pages in the English Wiktionary edition can also include sections describing other languages
(e.g. the page about boot contains an entry describing the meaning of boot in the Dutch language (‘boat’)).
We only considered the English sections in this study.
4.2. Crowdsourcing
Most headwords in UD have only one entry, and, therefore, these headwords would dominate a random
sample. Because such headwords tend to be uncommon, a random sample would not be able to give
us much insight into the overall content of UD. We therefore sampled the headwords according to the
number of their entries. For each headword (after lower casing), we counted the number of entries and
placed the headword in a frequency bin (after taking a log base 2 transformation). For each bin, we
randomly sampled up to 200 headwords. For each sampled headword, we included the top two highest
scoring entries (scored according to the number of thumbs up minus the number of thumbs down) and
another random entry. In total, we sampled 4465 entries (table 10).
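In outline, the binning and sampling step can be expressed as follows. This is a base R sketch; the object names are illustrative and the released analysis code may differ.

```r
# A base R sketch of the stratified sampling; object names are illustrative.
headword  <- tolower(entries$headword)          # `entries`: one row per entry
n_entries <- table(headword)                    # number of entries per headword
bin       <- floor(log2(as.numeric(n_entries))) # log base-2 frequency bin

set.seed(1)
sampled_headwords <- unlist(lapply(split(names(n_entries), bin), function(hw) {
  sample(hw, size = min(length(hw), 200))       # up to 200 headwords per bin
}))
# For each sampled headword, the top two entries by (thumbs up - thumbs down)
# and one additional random entry would then be selected.
```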
We collected the annotations using CrowdFlower. The quality was ensured using test questions and
by restricting the contributors to quality levels two and three and the countries Australia, Canada,
Ireland, New Zealand, UK, and the USA. We marked the crowdsourcing tasks as containing explicit
content, so that the tasks were only sent to contributors that accepted to work with such content.
4.2.1. Content type
For each task, we collected three judgements. The workers were paid $0.03 per judgement. We collected
13 395 judgements from a total of 201 workers. The median number of judgements per worker is
76. Workers were shown the headword, definition and example. The crowdworkers were asked the
following questions (options for answers are displayed in italic font):
Q1: Is this word a proper noun, for example, a name used for an individual person (like Mark),
place (like Paris) or organization (like Starbucks, Apple)? yes, no
Q2: The definition: describes the meaning of the word, expresses a personal opinion, both
Table 10. Statistics of the sampled definitions.

frequency bin (log 2)    0     1     2     3     4     5     6     7     8     9    10
no. definitions          200   449   600   600   600   600   600   600   180   30   6
Table 11. Agreement statistics.

                                                     Fleiss’ kappa    pairwise agreement
Q1: proper noun (yes, no)                            0.379            0.806
Q2: meaning or opinion? (meaning, opinion, both)     0.207            0.691
Q3: familiar (yes, no)                               0.206            0.713
Q4: formal (yes, no, unclear)                        0.207            0.712
Q3: Were you familiar with this meaning of the word before reading this definition? If you are
familiar with this word but NOT with this meaning, then please select no. Example: If you are
familiar with the meaning of the word ‘cat’ as the animal, but the definition describes cat as ‘A
person, usually male and generally considered or thought to be cool’ and you are not familiar
with this meaning, select no: yes, no
Q4: Can this word in the described meaning be used in a formal conversation? Examples
of formal settings are a formal job interview, meeting an important person, or court of law.
Examples of informal settings are chatting with close friends or family: yes, no, unclear
Agreement
For each definition we have three judgements. We calculate Fleiss’ kappa (using the irr package in R)
and the pairwise agreement (table 11). The agreement for the first question, asking whether the word is a
proper noun, is the highest. In general the agreement is low, due to the difficulty of the task. For example,
in these cases all three workers answered differently to the question whether the definition described a
meaning or an opinion: AR-15 defined as ‘AR does NOT stand for Assault Rifle’ and Law School defined
as ‘Where you go for to school for four years after college to learn to become a lawyer. In these four years,
you will work your butt off every day, slog through endless amounts of reading, suffer through so much
writing, and after you graduate, you do not get to call yourself “doctor”’. We merge the answers for each
question by taking the majority vote. We use ‘both’ for Q2 and ‘unclear’ for Q4 if there was no majority.
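A minimal sketch of these agreement measures with the irr package is shown below; ratings is an assumed n × 3 matrix with one row per annotated entry and one column per judgement for a single question.

```r
# A sketch with the irr package; `ratings` is an assumed n x 3 matrix with one
# row per annotated entry and one column per judgement for a single question.
library(irr)

kappam.fleiss(ratings)   # Fleiss' kappa over the three judgements

# Average pairwise agreement across the three pairs of raters.
pairs <- combn(ncol(ratings), 2)
mean(apply(pairs, 2, function(p) mean(ratings[, p[1]] == ratings[, p[2]])))
```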
4.2.2. Offensiveness
We experimented with different pilot setups in which we asked workers to annotate the level and type of
offensiveness for individual definitions. However, we found that this led to confusion and disagreement
among the crowdworkers. For example, an offensive word can be described in a non-offensive way
and a non-offensive word can be described in an offensive way. Furthermore, people have different
thresholds of what they consider to be offensive, making it challenging to ask for a binary judgement.
In the final setup, we therefore showed the sampled definitions for the same word and asked workers
to rank the definitions according to their offensiveness, with 1 being the most offensive and 3 being the
least offensive. Even if workers have different thresholds of what they consider offensive, they could still
agree when being asked to rank the definitions. Indeed, we found that this led to a higher agreement.
Note that in this article, we have reversed the ratings (3 =most offensive, 1 =least offensive) for a more
intuitive presentation of the results. Workers were also asked to indicate whether they considered all
definitions equally offensive, equally non-offensive, or none. For each task, we collected five judgements.
We paid $0.04 per judgement. We collected 6610 judgements from a total of 158 workers (median number
of judgements per worker: 44). Table 12 provides examples for two words (goosed and dad) and their
ratings.
Table 12. Examples of annotated definitions for offensiveness (3 = most offensive, 1 = least offensive).

goosed
  Def. 1: old school definition: to pinch someone’s buttocks, hopefully the opposite sex, but hey, you take what you get. Always associated in my mind with a British accent...  — ratings: 2, 2, 2, 2, 2
  Def. 2: adj. 1. a feeling of over whelmedness 2. a feeling of frustration 3. a feeling of joy 4. all emotions easily substituted by the word 5. the new ‘owned’  — ratings: 1, 1, 1, 1, 1
  Def. 3: to apply pressure on one’s taint (or space between genitalia and anus), preferably of the opposite sex!  — ratings: 3, 3, 3, 3, 3

dad
  Def. 1: the one who knocked-up your mom  — ratings: 2, 2, 2, 2, 3
  Def. 2: the parent that takes the most shit. Sure, if you had a shitty father, then go ahead and bitch, but not all of us did. Some of us had great fathers, who really loved us, and weren’t assholes. Honestly, if you could see how much damage a mother could do to one’s self esteem, you wouldn’t even place so much blame on ‘dear old dad’  — ratings: 3, 3, 3, 3, 2
  Def. 3: the replacement name for ‘bro’ to call your best friend of whom you have a fatherly bond  — ratings: 1, 1, 1, 1, 1
Agreement
We calculate agreement using Kendall’s W (also called Kendall’s coefficient of concordance), which ranges from 0 (no agreement) to 1 (complete agreement). We calculate Kendall’s W for each word separately. The average value of Kendall’s W is 0.511 (standard deviation = 0.303). If we exclude words for which a worker indicated that the definitions were equal in terms of offensiveness, the value increases to 0.714 (standard deviation = 0.238).
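As an illustration of this computation, the R sketch below (again using the irr package) derives Kendall’s W for a single word from the five workers’ rankings; the example matrix mirrors the goosed ratings in table 12, and the object name is ours. Repeating the call for every word and averaging the resulting W values gives the figures reported above.

    library(irr)  # provides kendall()

    # Rows = the three sampled definitions of one word, columns = the five workers' ranks
    # (3 = most offensive, 1 = least offensive, as in table 12).
    ranks_goosed <- matrix(c(2, 2, 2, 2, 2,
                             1, 1, 1, 1, 1,
                             3, 3, 3, 3, 3),
                           nrow = 3, byrow = TRUE)

    # Kendall's coefficient of concordance for this word; the tie correction is
    # harmless here because no worker ranked two definitions equally.
    kendall(ranks_goosed, correct = TRUE)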
Ethics. In this study we employ crowdsourcing to collect annotations. The tasks were marked as containing explicit content, so that they were only visible to contributors who had agreed to work with such content. The tasks also explicitly stated that the results would be used for scientific research (‘By participating you agree that these results will be used for scientific research’). We closely monitored the crowdsourcing tasks, and contributor satisfaction was consistently high.
Data accessibility. Despite several attempts to contact Urban Dictionary to confirm their data-sharing policies, the authors have not been able to confirm whether deposition of our data in a public repository would breach their terms and conditions. Owing to these concerns, it has not been possible to host the current dataset in a public repository. With this in mind, the authors note that the R analysis code and annotations are available through https://github.com/alan-turing-institute/urban-dictionary-rsos2018. The authors are happy to provide researchers with the original data upon personal request. This statement has been agreed with the journal.
Authors’ contributions. D.N. collected and analysed the data, participated in the design of the study, and drafted the
manuscript; B.M. participated in the design of the study, analysed the data and drafted the manuscript; T.Y. conceived
the study, analysed the data and helped draft the manuscript. All authors gave final approval for publication.
Competing interests. We declare we have no competing interests.
Funding. This work was supported by the Alan Turing Institute under the EPSRC grant no. EP/N510129/1. D.N. was
supported by Turing award TU/A/000006 and B.M. by Turing award TU/A/000010 (RG88751). The crowdsourcing
data collection was supported with an Alan Turing Institute seed funding grant (SF024).