Thesis (PDF available)

Ideograms as Semantic Primes: Emoji in Computational Linguistic Creativity

Authors: Wicke

Abstract and Figures

Our everyday virtual communication underwent a shift in recent years when the Unicode Standard introduced Emoji (Unicode-Standard, since 2000), a set of more than one thousand pictograms that has become standard in most online messaging services. Emoji are now a substantial part of our virtual communication, with more and more words being substituted by Emoji. Companies have emerged around the Emoji trend, such as Emogi.com, which states in its Emoji Report 2015 that about 92% of the online population uses Emoji (Emogi.com, since 2011). Emogi.com uses Machine Learning, NLP and biometric studies to analyse how people communicate with Emoji, and all big social networks such as Facebook, Twitter and Instagram run sentiment analysis on their Emoji usage statistics. Emoji are semiotic objects (Danesi, 2016): they exhibit a multitude of linguistic semiotic layers and can be interpreted as signs, metaphors, analogies or symbols. Nevertheless, there is no effective human language based solely on pictograms. Even the Egyptian hieroglyphs are not pictographic in nature; they are a phonetic, sound-based system (Jespersen and Reintges, 2008). Iconic signs and natural language can both construct complex sentences, but pictograms such as Emoji can hardly express narration, conversation or argumentation in longer sequences (Tijus et al., 2007). Emoji show polysemy (Olson, 1970) and only few are assumed to be universally understood. Theoretically, Emoji, much like hieroglyphs, denote and connote ideas visually, and those ideas might be primitive semantic building blocks of complex meaning. The idea of semantic primes was introduced into AI by Y. Wilks (Wilks, 1975) and R. Schank (Schank, 1972). In this theory, any complex idea can be expressed as the product of two or more simpler ideas, and the irreducibly atomic ideas are the semantic primes. As such, they can be used to derive any complex sentence in any language (Wierzbicka, 1996). Given this theory, Emoji should prove useful for expressing complex ideas as sequences of atomic ideas. Hence, if the Emoji-as-semantic-primes hypothesis has any validity, a storytelling system should be able to articulate its stories entirely in Emoji. We choose the twitterbot @MetaphorMagnet (Veale, 2016), which provides a large set of plot verbs (800+), and translate them into Emoji sequences using a set of distinctive methods. Analysing how well the translations are understood (via crowdsourcing) allows us to investigate the individual methods. This thesis provides an empirical investigation of the techniques used to compose the translation sequences and a manual on how to apply them. The results show with strong statistical significance that the Emoji sequences are comprehensible translations. Moreover, the results suggest that Emoji can tell stories, and thus that Emoji can be considered capable of combining a set of fundamental ideas. The scheme is a novel approach to parsing and translating Emoji that could prove highly beneficial given the aforementioned interest in Emoji.
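To make the translation idea concrete: the thesis evaluates a set of translation strategies such as Literal and Rebus (see the method names in the figures below). The following Python sketch is only a toy illustration of how such strategies might map a plot verb to an Emoji sequence; the lexicon entries and the fallback logic are invented for this example and are not the thesis' actual resources.

# Minimal sketch of verb-to-Emoji translation strategies (illustrative only).
# The lexicons below are hypothetical toy resources, not the thesis' data.

LITERAL = {           # direct pictographic match for the verb
    "explode": ["💥"],
    "love":    ["❤️"],
}

REBUS = {             # sound-/spelling-based substitution per word part
    "be":    ["🐝"],  # "bee" sounds like "be"
    "lieve": ["🍃"],  # "leaf" ~ "lieve" (loose phonetic pun)
}

def translate(verb: str) -> list[str]:
    """Return an Emoji sequence for a plot verb, trying literal first, then rebus."""
    if verb in LITERAL:
        return LITERAL[verb]
    # naive rebus decomposition: greedily match known sound fragments
    sequence, rest = [], verb
    while rest:
        for fragment, emoji in REBUS.items():
            if rest.startswith(fragment):
                sequence.extend(emoji)
                rest = rest[len(fragment):]
                break
        else:
            rest = rest[1:]  # skip characters the toy lexicon cannot cover
    return sequence or ["❓"]

print(translate("explode"))   # ['💥']        (literal strategy)
print(translate("believe"))   # ['🐝', '🍃']  (rebus strategy)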
[Figure: Overall distribution of correct / incorrect answers: 64.9% correct, 35.1% incorrect; x-axis: percentage of answers (0 to 1); μ = 0.649, σ = 0.26189.]
The evaluation tests whether the crowdsourced answers are correct more often than chance. Let $p$ be the probability of a correct answer and $q = 1 - p$; the null hypothesis is $H_0\!: p = 0.2$ (answers at chance level) against the alternative $H_1$ that comprehension exceeds chance. With $n = 1000$ answered questions, the number of correct answers $k$ is binomially distributed:

$$p(k \mid p) = \binom{n}{k} p^{k} (1-p)^{n-k}.$$

Under $H_0$ with $n = 1000$ and $p = 0.2$, the observed $k = 649$ correct answers give

$$p(649 \mid 0.2) = \binom{1000}{649}\, 0.2^{649} (1-0.2)^{1000-649} = 1.6806 \times 10^{-208}.$$

Both this value and the recovered value $1.2302 \times 10^{-97}$ lie far below the significance level $\alpha = 0.01$, so $H_0$ is rejected in favour of $H_1$. The expected value is $E(x) = \mu$. For a second computation with $n = 820$ questions and $k = 532$ correct answers (a ratio of $532/820 = 64.88\%$),

$$p(532 \mid 0.2) = \binom{820}{532}\, 0.2^{532} (1-0.2)^{820-532} = 3.469 \times 10^{-171}.$$

The comprehension ratio is defined as $\#\text{correct} / \#\text{questions}$. Confidence bounds at $\alpha = 0.01$ are computed from the inverse incomplete Beta function (Clopper-Pearson interval):

$$p_{ub} = 1 - \mathrm{BetaIncompInverse}\!\left(k+1,\; n-k,\; \tfrac{\alpha}{2}\right), \qquad p_{lb} = 1 - \mathrm{BetaIncompInverse}\!\left(k,\; n-k+1,\; 1-\tfrac{\alpha}{2}\right),$$

where $k$ is the number of correct answers, $n$ the number of questions, $\alpha$ the confidence level, and $\mathrm{BetaIncompInverse}(k, n, \alpha)$ inverts

$$\mathrm{BetaIncomp}(z;\, a, b) = \int_{0}^{z} u^{a-1} (1-u)^{b-1}\, du.$$

A Spearman rank correlation of $\rho_{\mu,\sigma} = 0.8611$ is reported ($p = 0.00033$, significant at $\alpha < 0.01$), where Spearman's coefficient is

$$\rho_{rg_X, rg_Y} = \frac{\mathrm{cov}(rg_X, rg_Y)}{\sigma_{rg_X}\, \sigma_{rg_Y}}.$$
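To make the arithmetic above reproducible, here is a minimal Python sketch (not code from the thesis) that recomputes the binomial probability and exact Clopper-Pearson bounds with scipy, assuming the recovered values n = 1000, k = 649, chance level p = 0.2 and α = 0.01; scipy's standard parameterisation of the bounds is used, which may differ in argument ordering from the inverse-Beta formulation quoted above.

from scipy import stats

n, k = 1000, 649       # questions asked and correct answers (recovered values)
p0, alpha = 0.2, 0.01  # chance baseline under H0 and significance level

# Probability mass of exactly k correct answers under H0
pmf = stats.binom.pmf(k, n, p0)          # ~1.68e-208

# One-sided exact test: probability of k or more correct answers under H0
p_value = stats.binom.sf(k - 1, n, p0)

# Clopper-Pearson (exact) confidence bounds for the true proportion correct
lower = stats.beta.ppf(alpha / 2, k, n - k + 1)
upper = stats.beta.ppf(1 - alpha / 2, k + 1, n - k)

print(f"pmf={pmf:.3e}  p-value={p_value:.3e}")
print(f"{1 - alpha:.0%} CI for proportion correct: [{lower:.3f}, {upper:.3f}]")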
[Figure: Percentage of correct answers per translation method, with mean μ, standard deviation σ and median md per bar: Rebus μ = 0.81, σ = 0.213, md = 0.85; Literal μ = 0.875, σ = 0.189, md = 0.95; Metaphor μ = 0.7, σ = 0.278, md = 0.8; DA μ = 0.65, σ = 0.24, md = 0.7; ShortStory μ = 0.722, σ = 0.199, md = 0.7; DM μ = 0.59, σ = 0.324, md = 0.6; Example μ = 0.667, σ = 0.28, md = 0.65; IA μ = 0.625, σ = 0.296, md = 0.7; BE μ = 0.667, σ = 0.321, md = 0.8; Idiom μ = 0.55, σ = 0.3, md = 0.6; Double μ = 0.5, σ = 0.361, md = 0.6. For these methods a Spearman correlation of ρ_{μ,σ} = 0.2277 (p = 0.0361) is reported, which is not significant at α = 0.01.]
[Figure: Percentage of correct answers per method, with mean μ, standard deviation σ and median md per bar: Plus μ = 0.933, σ = 0.0577, md = 0.9; Exchange μ = 0.825, σ = 0.171, md = 0.85; Versus μ = 0.75, σ = 0.379, md = 0.9; 1Support μ = 0.606, σ = 0.27, md = 0.7; 2Support μ = 0.593, σ = 0.183, md = 0.6; Return μ = 0.7, σ = 0.173, md = 0.8; ShuffleT. μ = 0.7, σ = 0.1, md = 0.7; 3Support μ = 0.667, σ = 0.0577, md = 0.7; Four μ = 0.633, σ = 0.306, md = 0.7; Mask μ = 0.56, σ = 0.182, md = 0.5; Soon μ = 0.567, σ = 0.252, md = 0.6; End μ = 0.367, σ = 0.153, md = 0.4.]
Each set of methods is tested against the chance baseline $H_0\!: p = 0.2$ at significance levels $\alpha = 0.01$ and $\alpha = 0.05$. The overall values $\mu_{TOTAL} = 3.729$ and $\sigma_{TOTAL} = 1.146$ are reported. Spearman rank correlations between the comprehension ratio and the rating statistics are reported as follows: first order, $\rho^{1st}_{ratio,\,rating_{\mu}} = 0.7654$ ($p = 0.0030$) and $\rho^{1st}_{ratio,\,rating_{\sigma}} = 0.6000$ ($p = 0.0281$), both significant at $\alpha = 0.05$; second order, $\rho^{2nd}_{ratio,\,rating_{\mu}} = 0.6690$ ($p = 0.0087$), significant at $\alpha = 0.05$, and $\rho^{2nd}_{ratio,\,rating_{\sigma}} = 0.3357$ ($p = 0.1433$), not significant at $\alpha = 0.05$. Mean values of $\mu_{Plus} = 4.10$ and $\mu_{End} = 3.30$ are reported for the Plus and End methods.
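As a worked illustration of the Spearman coefficient used above, the following Python sketch correlates a comprehension ratio with a mean rating per method using scipy; the numbers are invented stand-ins, since the underlying per-method data is only partially visible in this preview.

from scipy.stats import spearmanr

# Hypothetical per-method values (NOT the thesis data), purely to show the computation:
ratio       = [0.93, 0.82, 0.75, 0.70, 0.61, 0.37]   # fraction of correct answers per method
rating_mean = [4.10, 3.90, 3.60, 3.70, 3.50, 3.30]   # mean participant rating per method

# Spearman's rho = cov(rank(X), rank(Y)) / (std(rank(X)) * std(rank(Y)))
rho, p_value = spearmanr(ratio, rating_mean)
print(f"Spearman rho = {rho:.4f}, p = {p_value:.4f}")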
... Concerning the role of emoji in written communication, several topics have been addressed: redundancy and part-of-speech category (Donato and Paggio, 2017), complementary vs text-replacing functions of emoji (Dürscheid and Siever, 2017), emoji as text-replacement and its effect on reading time (Gustafsson, 2017), emoji as semantic primes (Wicke, 2017), among others (Cramer, Juan, and Tetreault, 2016; Herring and Dainas, 2017; Kelly and Watts, 2015). ...
... for luck) or literal translations (e.g. for the action to explode). According to Wicke (2017), these strategies enable one to use the semiotic advantages of emoji. ...
... Wicke and Bolognesi (2020) conduct a user study in which they asked participants to provide semantic representations for a sample of 300 English nouns using emoji, with the goal of identifying which representational strategies are most used to represent abstract and concrete concepts. They use a refined version of the classification of representational strategies proposed by Wicke (2017): literal, rebus, phonetic similarity and figurative construction. According to their results, figurative construction is the most used strategy (59%), followed by literal (33.91%). ...
Thesis
The visual representation of concepts has been the focus of multiple studies throughout history and is considered to be behind the origin of existing writing systems. Its exploration has led to the development of several visual language systems and is a core part of graphic design assignments, such as icon design. As is the case with problems from other fields, the visual representation of concepts has also been addressed using computational approaches. In this thesis, we focus on the computational generation of visual symbols to represent concepts, specifically through the use of blending. We started by studying aspects related to the transformation mechanisms used in the visual blending process, which led to the proposal of a visual blending taxonomy that can be used in the study and production of visual blends. In addition to the study of visual blending, we conceived and implemented several systems: a system for the automatic generation of visual blends using a descriptive approach, with which we conducted an experiment with three concepts (pig, angel and cactus); a visual blending system based on the combination of emoji, which we called Emojinating; and a system for the generation of flags, which we called Moody Flags. The experimental results obtained through multiple user studies indicate that the systems that we developed are able to represent abstract concepts, which can be useful in ideation activities and for visualisation purposes. Overall, the purpose of our study is to explore how the representation of concepts can be done through visual blending. We established that visual blending should be grounded on the conceptual level, leading to what we refer to as Visual Conceptual Blending. We delineated a roadmap for the implementation of visual conceptual blending and described resources that can help in such a venture, as is the case of a categorisation of emoji oriented towards visual blending.
... The relation between the word "window" and the thing it names is arbitrary. In Emoji it is the other way around: the icon and the meaning are linked together, and the relation is not arbitrary (Wicke, 2017). Robertson et al. (2021) offered the first longitudinal study of how emoji semantics change over time, applying techniques from computational linguistics to six years of Twitter data. ...
Article
Full-text available
Due to the polysemous and ambiguous nature of the emojis, translators encounter difficulties in rendering them into Kurdish. This paper is an attempt to find out the nature and the frequency of the problems related to emojis and suggest more appropriate ways for dealing with them when they are translated into Kurdish. The paper takes up a descriptive-analytic approach. The data of the study is collected primarily from the 'Emoji movie' produced in 2017. The data are then categorized and analyzed thoroughly to explore the underlying factors of these problems and suggest effective strategies for translating them with minimum ambiguity. The results of this study show that polysemous emojis could be disambiguated through the context and other extralinguistic factors such as the setting and the technological background of the translators.
... There are also several systems that focus on generating visual symbols from natural language. Most relevantly, Wicke developed a system that is capable of translating verbal narratives into emoji symbols (Wicke 2017). This is similar to our work in that narratives are told pictorially, not phonetically, but it differs somewhat in that our system strives for a highly abstract artistic representation, more similar to the primitive style of cave paintings than modern emojis. ...
Article
Telling stories is a central part of human culture. The development of computational systems that can understand and respond to human language is an integral part of AI research in general and narrative technologies in particular. In this paper, we describe a system that is able to understand human spoken English sentences and portray that understanding via a semiotic visual language in real-time. We then employ this system in a communal storytelling environment as part of an interactive art installation and create a medium for collaborative creative expression.
... how different emoji renders affect interpretation [28]), role in communication (e.g. studying emoji as semantic primes [41]), similarity (e.g. semantically measuring emoji similarity [3]) and text-to-emoji translation (e.g. ...
Article
The emoji connection between visual representation and semantic knowledge, together with its large conceptual coverage have the potential to be exploited in computational approaches to the visual representation of concepts. An example of a system that explores this potential is Emojinating, a system that uses a process of visual blending of existing emoji to represent concepts. In this paper, we use the Emojinating system as a case study to analyse the appropriateness of visual blending for the visual representation of concepts. We conduct three experiments in which we analyse output quality, type of blend used, usefulness to the user and ease of interpretation. Our main contributions are the following: (i) the production of a double-word concept list for testing the system; (ii) an extensive user study using two different concept lists (single-word and double-word); and (iii) a study that compares produced blends with user drawings.
... Several related works have inspired the methods applied in the proposed translation system. Closely related to our text-to-emoji system is the one presented by Wicke (2017). The author creates and evaluates a system that can translate action words into sequences of emoji through the use of various linguistic strategies (metaphor, idioms, rebus etc). ...
Conference Paper
Full-text available
The task of translating text to images holds some valid creative potential and has been the subject of study in Computational Creativity. In this paper, we present preliminary work focused on emoji translation. The work-in-progress system is based on techniques of information retrieval. We compare the performance of our system with three deep learning approaches using a text-to-emoji task. The preliminary results suggest some advantages of using a knowledge-base approach as opposed to a purely data-driven approach. This paper aims to situate the research, underline its relevance and attract valuable feedback for its future development.
... One of the approaches consists in gathering a set of individual graphic elements (either pictures or icons), which work as a translation when put side by side, e.g. translating plot verbs into sequences of emoji [7] or the Emojisaurus platform. ...
Conference Paper
Full-text available
Graphic designers visually represent concepts in several of their daily tasks, such as in icon design. Computational systems can be of help in such tasks by stimulating creativity. However, current computational approaches to concept visual representation lack in effectiveness in promoting the exploration of the space of possible solutions. In this paper, we present an evolutionary approach that combines a standard Evolutionary Algorithm with a method inspired by Estimation of Distribution Algorithms to evolve emoji blends to represent user-introduced concepts. The quality of the developed approach is assessed using two separate user-studies. In comparison to previous approaches, our evolutionary system is able to better explore the search space, obtaining solutions of higher quality in terms of concept representativeness.
... Research on the role of emoji in written communication addresses several topics: e.g. redundancy and part-of-speech category [14], emoji function [15], effect on reading time [19], emoji as semantic primes [33], among others [22,7,20]. [Fig. 1: Examples of retrieved emoji: existing (E), related (R) and blended (B).] ...
Conference Paper
Full-text available
The emoji system does not currently cover all possible concepts. In this paper, we present the platform Emojinating, which has the purpose of fostering creativity and aiding in ideation processes. It lets the user introduce a concept and automatically represents it by searching for existing emoji and generating novel ones. The system combines the exploration of semantic networks with visual blending, and integrates data from EmojiNet, ConceptNet and Twemoji. To evaluate the system in terms of production efficiency and output quality, we produced emoji for a set of 1509 nouns from the New General Service List. The results show a coverage of 75% of the list.
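The Emojinating system described above couples semantic-network exploration (ConceptNet) with emoji resources (EmojiNet, Twemoji) and visual blending. Purely as a hypothetical sketch of the retrieval step, and not the authors' implementation, one could query ConceptNet's public API for terms related to a concept and match them against a local table of emoji names; the table and the matching logic below are invented for illustration.

import requests

# Tiny illustrative emoji-name table (a real system would use EmojiNet/Twemoji metadata).
EMOJI_BY_NAME = {"light bulb": "💡", "bulb": "💡", "brain": "🧠", "book": "📖", "books": "📚"}

def related_terms(concept: str, limit: int = 20) -> list[str]:
    """Fetch terms related to `concept` from ConceptNet's public API."""
    url = f"http://api.conceptnet.io/c/en/{concept.replace(' ', '_')}"
    edges = requests.get(url, timeout=10).json().get("edges", [])
    terms = []
    for edge in edges[:limit]:
        label = edge.get("end", {}).get("label", "").lower()
        if label and label != concept:
            terms.append(label)
    return terms

def emoji_for_concept(concept: str) -> list[str]:
    """Return existing emoji whose names match the concept or its related terms."""
    candidates = [concept] + related_terms(concept)
    return [EMOJI_BY_NAME[t] for t in candidates if t in EMOJI_BY_NAME]

print(emoji_for_concept("idea"))   # might print ['💡'] if ConceptNet relates "idea" to "light bulb"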
Article
Full-text available
Literature in the Twitterverse emerges through two cyber-species: the tweeter and the Twitterbot. Both are fighting for relevance in digital creative writing to win the reader’s attention. They tweet, read, and write to each other remotely, and they correct each other as an algorithmic sharpening symbiosis. This essay will analyse the symmetrical linguistics of Twitterbot poetry that incorporates critical code studies of its source code as a subset of technodiscursive analysis to decipher the meanings of the tweet-poems produced by Leonardo Flores’ @Protestitas. This text contextualises itself in a genre of generative electronic literature with multiple interdisciplinary approaches. Our study focuses on this Twitterbot poetry and its relation to socio-technological communication. We also seek to capture the intention of the Twitterary robopoet @Protestitas through a technodiscursive analysis that consists of what we call “Four-Dimensional Analysis (henceforth 4DAs)” of the Twitterbot poetry. How does Twitterbot produce its output as executed codes? How do we read the language of the tweet-poem? Our study seeks to demystify these phenomena in this article. The analysis deciphers Twitterbot poetry in the manner of the rebus, deconstructing the four semantic elements of the ten excerpts of @Protestitas with its source code. We have compared the source codes with the output on the screen. Therein, we discovered that both of them project the same literary spark.
Article
Full-text available
This article reflects on the evolution of emojis, visual signs that initially emerged in the context of text messages mediated by electronic devices. To this end, we start from a review of the academic studies of these elements, which originate in distinct fields such as communication, marketing, linguistics, psychology and semiotics, in addition to frequent interdisciplinary approaches. We find that emojis belong to a set of elements that, over time, have been used as expressive resources and that evolve as they spill over from the digital environment in which they emerged and become part of contemporary culture. This gives rise to discussions about control and surveillance in the production of emojis, which influence communication and brand-consumption practices.
Article
An increasingly large body of converging evidence supports the idea that the semantic system is distributed across brain areas and that the information encoded therein is multimodal. Within this framework, feature norms are typically used to operationalize the various parts of meaning that contribute to define the distributed nature of conceptual representations. However, such features are typically collected as verbal strings, elicited from participants in experimental settings. If the semantic system is not only distributed (across features) but also multimodal, a cognitively sound theory of semantic representations should take into account different modalities in which feature-based representations are generated, because not all the relevant semantic information may be easily verbalized into classic feature norms, and different types of concepts (e.g., abstract vs. concrete concepts) may consist of different configurations of non-verbal features. In this paper we acknowledge the multimodal nature of conceptual representations and we propose a novel way of collecting non-verbal semantic features. In a crowdsourcing task we asked participants to use emoji to provide semantic representations for a sample of 300 English nouns referring to abstract and concrete concepts, which account for (machine readable) visual features. In a formal content analysis with multiple annotators we then classified the cognitive strategies used by the participants to represent conceptual content through emoji. The main results of our analyses show that abstract (vs. concrete) concepts are characterized by representations that: 1. consist of a larger number of emoji; 2. include more face emoji (expressing emotions); 3. are less stable and less shared among users; 4. use representation strategies based on figurative operations (e.g., metaphors) and strategies that exploit linguistic information (e.g. rebus); 5. correlate less well with the semantic representations emerging from classic features listed through verbal strings.
Conference Paper
Full-text available
Emoji are a contemporary and extremely popular way to enhance electronic communication. Without rigid semantics attached to them, emoji symbols take on different meanings based on the context of a message. Thus, like the word sense disambiguation task in natural language processing, machines also need to disambiguate the meaning or 'sense' of an emoji. In a first step toward achieving this goal, this paper presents EmojiNet, the first machine readable sense inventory for emoji. EmojiNet is a resource enabling systems to link emoji with their context-specific meaning. It is automatically constructed by integrating multiple emoji resources with BabelNet, which is the most comprehensive multilingual sense inventory available to date. The paper discusses its construction, evaluates the automatic resource creation process, and presents a use case where EmojiNet disambiguates emoji usage in tweets. EmojiNet is available online for use at http://emojinet.knoesis.org.
Article
Full-text available
There is a new generation of emoticons, called emojis, increasingly used in mobile communications and social media. In the last two years, over ten billion emojis were used on Twitter. Emojis are Unicode graphic symbols, used as a shorthand to express concepts and ideas. In contrast to a small number of well-known emoticons which carry clear emotional content, there are hundreds of emojis. But what is their emotional content? We provide the first emoji sentiment lexicon, called Emoji Sentiment Ranking, and draw a sentiment map of the 751 most frequently used emojis. The sentiment of emojis is computed from the sentiment of tweets in which they occur. We have engaged 83 human annotators to label over 1.6 million tweets in 13 European languages by the sentiment polarity (negative, neutral, or positive). About 4% of the annotated tweets contain emojis. The sentiment analysis of emojis yields several interesting conclusions. It turns out that most of the emojis are positive, especially the most popular ones. The sentiment distribution of the tweets with and without emojis is significantly different. The inter-annotator agreement on the tweets with emojis is higher. Emojis tend to occur at the end of the tweets, and their sentiment polarity increases with the distance. We observe no significant differences in emoji rankings between the 13 languages, and propose our Emoji Sentiment Ranking as a European language-independent resource for automated sentiment analysis. Finally, the paper provides a formalization of sentiment and novel visualization in the form of a sentiment bar.
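The lexicon construction described above derives each emoji's sentiment from the sentiment of the tweets it occurs in. The short Python sketch below illustrates that aggregation on a few made-up annotated tweets; the data, the ternary polarity coding and the crude emoji detection are assumptions for illustration, not the paper's pipeline.

from collections import defaultdict

# Hypothetical annotated tweets: (text, polarity) with polarity -1, 0 or +1.
tweets = [
    ("great day 😍", 1),
    ("so tired 😴", 0),
    ("this is awful 😠", -1),
    ("love it 😍🎉", 1),
]

scores = defaultdict(list)
for text, polarity in tweets:
    for ch in text:
        # crude emoji check: characters outside the Basic Multilingual Plane (covers most emoji)
        if ord(ch) > 0xFFFF:
            scores[ch].append(polarity)

# Mean polarity of the tweets each emoji occurs in
sentiment = {emoji: sum(v) / len(v) for emoji, v in scores.items()}
print(sentiment)   # e.g. {'😍': 1.0, '😴': 0.0, '😠': -1.0, '🎉': 1.0}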
Article
Günter Dreyer’s Umm El-Quaab I—Das prädynastische Königsgrab U-j und seine frühen Schriftzeugnisse presents comprehensively the results of archaeological diggings in the tomb U-j. It also outlines Dreyer’s claim to have discovered the origin of writing. The primary aspect of this review essay is to draw the attention of accounting historians to Dreyer’s book and to the claim therein to have discovered the earliest known writing. Since this discovery is closely connected to an accounting function (though in a somewhat different way from that of the Sumerian proto-cuneiform writing), a review of Dreyer’s book is well justified. Dreyer’s claim is based on a series of small inventory tags (identifying in proto-hieroglyphics the provenance of various commodities) found in the tomb of King Scorpion I (c.3400 B.C. to 3200 B.C.). Another aspect of this review is a discussion of the controversy surrounding Dreyer’s claim and the counter-hypothesis of accounting archaeology, which sees in the token-envelope accounting of Mesopotamia the origin of writing.
Article
The literary imagination may take flight on the wings of metaphor, but hard-headed scientists are just as likely as doe-eyed poets to reach for a metaphor when the descriptive need arises. Metaphor is a pervasive aspect of every genre of text and every register of speech, and is as useful for describing the inner workings of a "black hole" (itself a metaphor) as it is the affairs of the human heart. The ubiquity of metaphor in natural language thus poses a significant challenge for Natural Language Processing (NLP) systems and their builders, who cannot afford to wait until the problems of literal language have been solved before turning their attention to figurative phenomena. This book offers a comprehensive approach to the computational treatment of metaphor and its figurative brethren—including simile, analogy, and conceptual blending—that does not shy away from their important cognitive and philosophical dimensions. Veale, Shutova, and Beigman Klebanov approach metaphor from multiple computa...
Article
There is a sociolinguistic interest in studying the social power dynamics that arise on online social networks and how these are reflected in their users' use of language. Online social power prediction can also be used to build tools for marketing and political campaigns that help them build an audience. Existing work has focused on finding correlations between status and linguistic features in email, Wikipedia discussions, and court hearings. While a few studies have tried predicting status on the basis of language on Twitter, they have proved less fruitful. We derive a rich set of features from literature in a variety of disciplines and build classifiers that assign Twitter users to different levels of status based on their language use. Using various metrics such as number of followers and Klout score, we achieve a classification accuracy of individual users as high as 82.4%. In a second step, we reached up to 71.6% accuracy on the task of predicting the more powerful user in a dyadic conversation. We find that the manner in which powerful users write differs from low status users in a number of different ways: not only in the extent to which they deviate from their usual writing habits when conversing with others but also in pronoun use, language complexity, sentiment expression, and emoticon use. By extending our analysis to Facebook, we also assess the generalisability of our results and discuss differences and similarities between these two sites. Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Article
The Natural Semantic Metalanguage (NSM) approach has a long track record in crosslinguistic lexical semantics (Wierzbicka 1992, 1996, 1999; Goddard 1998, 2006, 2008, 2011; Harkins and Wierzbicka 2001; Goddard and Wierzbicka 2002; Peeters 2006; Gladkova 2010; Ye 2007a, 2007b, 2010; Bromhead 2009, 2011; Wong 2005, 2010; and other works). It is therefore not surprising that it has a clear theoretical position on key issues in lexical semantic typology and a well-developed set of analytical techniques. From a theoretical point of view, the overriding issue concerns the tertium comparationis. What are the optimal concepts and categories to support the systematic investigation of lexicons and lexicological phenomena across the world's languages? To this question, the NSM approach offers the following answer: the necessary concepts can — and must — be based on the shared lexical-conceptual core of all languages, which NSM researchers claim to have discovered over the course of a thirty-five year program of empirical crosslinguistic semantics. This shared lexical-conceptual core is the minilanguage of semantic primes and their associated grammar. In addition, over the past 10 or so years, NSM researchers have developed certain original analytical constructs which promise to enhance the power and systematicity of the approach: in particular, the notions of semantic molecules and semantic templates. This paper sets out to explain and illustrate these notions, to report some key analytical findings (updated, in many cases, from previously published accounts), and to extrapolate their implications for the further development of lexical typology.