Sarcasm Detection on Facebook: A
Supervised Learning Approach
Dipto Das
Anthony J. Clark
Missouri State University
Springfield, Missouri, USA
dipto175@live.missouristate.edu
anthonyclark@missouristate.edu
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
ACM.
ICMI’18 Adjunct, October 16–20, 2018, Boulder, CO, USA
ACM 978-1-4503-6002-9/18/10.
https://doi.org/10.1145/3281151.3281154
Abstract
Sarcasm is a common feature of user interaction on social networking sites. Sarcasm differs from typical communication in the alignment of literal meaning with intended meaning. Humans can recognize sarcasm given sufficient context information, including the various contents available on SNS. Existing literature mainly uses text data to detect sarcasm; however, a few recent studies propose using image data. To date, no study has focused on user interaction patterns as a source of context information for detecting sarcasm. In this paper, we present a supervised machine learning based approach that considers both the contents of posts (e.g., text, image) and users' interactions with those posts on Facebook.
Author Keywords
Sarcasm; Sentiment; Text; Image; Facebook; Supervised
Learning
ACM Classification Keywords
H.5.m [Information interfaces and presentation (e.g., HCI)]:
Miscellaneous
Introduction
Social networking sites (SNS) are a major medium of com-
munication. People assess the sentiment of content shared on SNS by considering all of its various aspects together. Many studies have focused on sentiment analysis on SNS. However, sarcasm is often hard to detect because people convey negative sentiments using seemingly positive words and vice versa; thus, it has not gained as much attention as straightforward positive-negative sentiment analysis.

Terminologies

Description: Posts shared on SNS may include a short explanation of the content's gist, source, and audience.

Message: The user who posts content on SNS may choose to associate the content with a message he or she writes, expressing what he or she thinks about the post, describing the content in detail if necessary, and mentioning topics related to the content such as places, persons, and feelings.

Reactions: Almost all SNSs provide users with some way to react to content on the platform. Some SNSs provide a like/star-based system (e.g., Instagram, Twitter), and some provide an upvote/downvote-based system (e.g., Reddit, Quora). Since February 24, 2016, Facebook has supported a reaction system based on six emotion types (like, love, haha, wow, sad, angry).

Most existing works depend only on text data for detecting sarcasm. A few recent works
propose that multimedia content (e.g., images) shared with posts can be useful for detecting sarcasm [3, 9]. Previous works have identified the importance of context information for detecting sarcasm [2]. However, to the best of our knowledge, no study has used user interaction with a post as an indicator for detecting sarcasm. We propose that users' interactions with a post can help establish its context and thus help detect sarcasm. Here, by user interaction we mean the ways users respond to a post on an SNS, whether with reaction buttons or with comments. To test our hypothesis, we needed multimodal SNS data; we take text, images, and user interaction into consideration. We developed a supervised learning model that can detect sarcasm
on Facebook (FB) data with 93% accuracy. The major con-
tributions of this work are: considering user interaction as
an indicator of sarcasm, and a supervised learning model
for detecting sarcasm.
Related Works
Tepperman et al. [11] were among the first to address automatic sarcasm detection. They proposed experiments to recognize sarcasm using contextual and prosodic cues, but given the limited capability of NLP at that time, they took a naïve approach: detecting sarcasm based on use of the phrase "yeah, right". Later works on sarcasm detection by Filatova et al. [4] and Bamman et al. [2] emphasized context for detecting sarcasm. In their works, Filatova et al. [4] and Riloff et al. [8] indicated that the contrast between positive and negative sentiment can be a sign of sarcasm on Twitter. According to them, the presence of words or phrases yielding both positive and negative sentiment in a tweet can indicate that the tweet is sarcastic. The works by Das et al. [3] and Schifanella et al. [9] emphasized the importance of considering images in addition to text to detect sarcasm. Schifanella et al. [9] suggest that the difference between the sentiments yielded by the caption and the image is an indicator of sarcasm. According to Das et al. [3], even without captions, images alone can convey sarcastic cues, and they proposed a CNN-based model for detecting sarcasm on Flickr. However, no work so far has focused on users' interaction around a particular piece of content to detect sarcasm. In this study, we considered several aspects of SNS posts to detect sarcasm on a popular social networking site, Facebook: the sentiment of posts and the nature of user interaction around those posts. Users may choose to interact through text, images, or reaction buttons.
Data
Previous works on detecting sarcasm employed various methods for data collection. Some researchers used hashtags (e.g., #sarcasm) as indicators of sarcasm on Twitter, while others used a context-based approach for identifying sarcastic content on SNS [5]. We used the Facebook Graph API to collect data; our data was collected after Facebook adopted the GDPR guidelines. To collect SNS posts with sarcastic intent (i.e., positive instances) and posts with non-sarcastic intent, we selected ten public sarcasm-related pages (e.g., Sarcasm Society) and the verified FB pages of ten popular mainstream news media outlets (e.g., The New York Times) that have at least one million followers. For each post we collected the description, message, image (if any), reactions, and comments (without users' identifying information). We only collected content posted on FB with the 'public' privacy setting. Since reactions are a relatively new feature, introduced on February 24, 2016, we chose to collect content posted after February 2016. In total, we collected 20,120 posts in the sarcasm category (48.65%) and 21,230 posts in the non-sarcasm category (51.35%). Among the posts we collected, 98.26% include an image.
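As a rough illustration of the kind of collection pipeline this implies, the sketch below queries a public page's posts for the fields listed above. It is only an assumption of how such a query might look: the API version, the field syntax (including the reactions.type(...).as(...) aliasing), the access token, and the page identifier are placeholders, and current Graph API versions restrict or rename some of these fields.

```python
# Hypothetical sketch of collecting public page posts via the Facebook Graph API.
# Field names and the API version reflect the v2/v3-era API and are assumptions;
# exact syntax, permissions, and availability may differ.
import requests

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"   # placeholder
PAGE_ID = "example.page"             # placeholder page identifier

FIELDS = (
    "message,description,full_picture,created_time,"
    "comments.limit(100){message},"
    "reactions.type(LIKE).limit(0).summary(total_count).as(like),"
    "reactions.type(LOVE).limit(0).summary(total_count).as(love),"
    "reactions.type(HAHA).limit(0).summary(total_count).as(haha),"
    "reactions.type(WOW).limit(0).summary(total_count).as(wow),"
    "reactions.type(SAD).limit(0).summary(total_count).as(sad),"
    "reactions.type(ANGRY).limit(0).summary(total_count).as(angry)"
)

def fetch_posts(page_id, since="2016-03-01"):
    """Yield post dictionaries for a page, following pagination links."""
    url = f"https://graph.facebook.com/v3.1/{page_id}/posts"
    params = {"fields": FIELDS, "since": since, "access_token": ACCESS_TOKEN}
    while url:
        resp = requests.get(url, params=params).json()
        for post in resp.get("data", []):
            yield post
        url = resp.get("paging", {}).get("next")  # full URL with query string
        params = {}  # subsequent requests reuse the "next" URL as-is

if __name__ == "__main__":
    for post in fetch_posts(PAGE_ID):
        print(post.get("message", "")[:80])
```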
Experiment
We have three types of data: numeric, text, and image. De-
scriptions, messages, and comments are text data (1, 3, 5
in Figure 1). For most posts, there is an associated image
(2 in Figure 1). Each post also has counts for the six types of reactions (4 in Figure 1). The message, image, or any other part of a post might not be sarcastic on its own, but taken together they might convey sarcasm. In this study, we try to detect whether a post as a whole conveys sarcasm.
Figure 1: Sample of a Facebook
post. (1) Message of the post;
(2) Image of the post;
(3) Description of the post;
(4) Count of users’ reactions to the
post; (5) Users’ comments on the
post
Reaction Data Pre-processing
The six reaction counts on posts are the only numeric input data. Reactions were first introduced on February 24, 2016. We treated the rest of that month as a burn-in period for users to become familiar with the feature, since usage patterns while users were still adjusting to the new buttons might not reflect how they would use reactions on the updated platform. Another concern is that the number of reactions a post receives varies with how much reach the post gets on FB (i.e., how many users FB showed that post to). Since the algorithm FB uses to arrange users' newsfeeds is not known, we chose to normalize: we divided the count of each reaction type by the total number of reactions received on the post, removing the bias created by differences in reach.
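A minimal sketch of this normalization is shown below; the dictionary of raw counts per post is an assumed input format.

```python
# Normalize raw reaction counts so each post's reaction features sum to 1,
# removing the bias introduced by differences in post reach.
REACTION_TYPES = ["like", "love", "haha", "wow", "sad", "angry"]

def normalize_reactions(raw_counts):
    """Divide each reaction count by the total number of reactions on the post."""
    total = sum(raw_counts.get(r, 0) for r in REACTION_TYPES)
    if total == 0:
        # A post with no reactions contributes a zero vector.
        return {r: 0.0 for r in REACTION_TYPES}
    return {r: raw_counts.get(r, 0) / total for r in REACTION_TYPES}

# Example: a post whose reach inflates all counts still yields the same proportions.
print(normalize_reactions({"like": 300, "haha": 600, "angry": 100}))
# {'like': 0.3, 'love': 0.0, 'haha': 0.6, 'wow': 0.0, 'sad': 0.0, 'angry': 0.1}
```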
Sentiment Analysis
For textual sentiment analysis, we considered two proper-
ties: subjectivity and polarity. Many existing works report context and sentiment as useful signals for sarcasm detection [4, 2, 8]. Subjectivity measures how much of a user's sentiment, feelings, or opinion is expressed in a piece of text. Polarity denotes whether the text yields positive or negative sentiment. We used TextBlob [6] to determine subjectivity and polarity. Subjectivity is measured on a scale of [0, 1] and polarity on a scale of [-1, 1]. Text with subjectivity near zero does not convey much information about a user's feelings (e.g., names tagged in text). A polarity value less than zero indicates negative sentiment, while a polarity greater than zero means that the user expresses positive sentiment in that piece of text.
Though a post can have more than one comment, it can have at most one description, one message, and one image caption. For each of these three, we determined the subjectivity and polarity of the text, giving six sentiment-based features from the textual data. However, since a post can have multiple comments, we modified the technique for comments. For each comment, we calculated the subjectivity, positive sentiment (if polarity > 0), and negative sentiment (if polarity < 0). We then used the sum of subjectivity scores, the sum of all positive sentiment scores, and the sum of all negative sentiment scores across all comments as three additional features.
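The sketch below illustrates these TextBlob-based features; the field names of the post dictionary (description, message, caption, comments) are assumptions about how a collected post might be stored.

```python
# Sentiment features from text fields (TextBlob) and aggregated comment scores.
from textblob import TextBlob

def text_sentiment(text):
    """Return (subjectivity, polarity) for a piece of text; zeros if empty."""
    if not text:
        return 0.0, 0.0
    blob = TextBlob(text)
    return blob.sentiment.subjectivity, blob.sentiment.polarity

def comment_features(comments):
    """Sum subjectivity, positive polarity, and negative polarity over all comments."""
    subj_sum = pos_sum = neg_sum = 0.0
    for comment in comments:
        subj, pol = text_sentiment(comment)
        subj_sum += subj
        if pol > 0:
            pos_sum += pol
        elif pol < 0:
            neg_sum += pol
    return subj_sum, pos_sum, neg_sum

def sentiment_features(post):
    """Nine text-based features: three (subjectivity, polarity) pairs plus
    three aggregated comment scores."""
    features = []
    for field in ("description", "message", "caption"):
        features.extend(text_sentiment(post.get(field, "")))
    features.extend(comment_features(post.get("comments", [])))
    return features
```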
Image Caption Generation Model
Schifanella et al. [9] explored the importance of considering the visual and textual aspects of SNS content. They used semantic representations of the images. However, we argue that captions of images can provide a semantic representation and, at the same time, hint at the sentiment expressed by an image, and thus provide more useful information for detecting sarcasm. For automatically captioning images, we used the image captioning model proposed by Vinyals et al. [12]. Each image thus has a model-generated caption; in addition, it might also have a user-provided caption.
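For illustration only, the sketch below generates a caption for each post image using a publicly available pretrained captioning model as a stand-in; the Show-and-Tell model of Vinyals et al. [12] used in this work is a different architecture, and the model name and pipeline task here are assumptions.

```python
# Hypothetical stand-in for the caption generation step, not the authors' setup.
from transformers import pipeline

captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

def generate_caption(image_path):
    """Return a single machine-generated caption for an image file."""
    outputs = captioner(image_path)
    return outputs[0]["generated_text"] if outputs else ""

# The generated caption is then scored with the same TextBlob-based
# subjectivity/polarity features used for the other text fields.
```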
Image Sarcasm Detection Model
We used the CNN-based model proposed by Das et al. [3], which can detect sarcasm from images based on visual cues with 84% accuracy. If an SNS post does not have a description or a message associated with it, the image is the only medium for knowing whether the post has sarcastic intent. We pass the image of each post to this component, and it outputs the probability (which we call the CNN score) that the image contains sarcastic cues.
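The sketch below shows the general shape of such a component: a small convolutional network that outputs a sarcasm probability per image. It is not the architecture of Das et al. [3]; the layer sizes and input resolution are arbitrary assumptions for demonstration.

```python
# Illustrative CNN producing a per-image "CNN score" (probability of sarcastic cues).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_image_sarcasm_model(input_shape=(128, 128, 3)):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # probability of sarcastic cues
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# cnn_score = model.predict(image_batch) would then be used as one feature per post.
```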
Model Training
From the collected dataset, we constructed the 16 features listed in Table 1. We used scikit-learn [7] for the machine learning algorithms. For missing values of any feature (e.g., caption subjectivity, caption polarity, or CNN score when a post has no image), we used the average value of that feature. We used 10-fold cross-validation to validate our models. We applied five supervised machine learning algorithms: a support vector machine (SVM) with a linear kernel; two ensemble algorithms, AdaBoost with a depth-1 decision tree classifier and Random Forest with scikit-learn's default parameters; a multilayer perceptron (MLP); and Gaussian naïve Bayes.
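A minimal sketch of this setup is shown below, assuming a feature matrix X (one row per post, 16 columns) and binary labels y; hyperparameters not stated above (e.g., the MLP size) are left at scikit-learn defaults.

```python
# Mean imputation of missing features, the five classifiers, and 10-fold CV.
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB

classifiers = {
    "SVM (linear)": SVC(kernel="linear"),
    "AdaBoost (stumps)": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1)),
    "Random Forest": RandomForestClassifier(),
    "MLP": MLPClassifier(),
    "Gaussian NB": GaussianNB(),
}

def evaluate(X, y):
    """Report mean 10-fold cross-validation accuracy for each classifier."""
    for name, clf in classifiers.items():
        pipe = make_pipeline(SimpleImputer(strategy="mean"), clf)
        scores = cross_val_score(pipe, X, y, cv=10, scoring="accuracy")
        print(f"{name}: {scores.mean():.4f} +/- {scores.std():.4f}")
```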
Results
Feature                        Information Gain
Reactions (a)
  angry                        0.3217
  haha                         0.4904
  like                         0.5534
  love                         0.4275
  sad                          0.3328
  wow                          0.4493
Image data
  auto caption polarity        0.0174
  auto caption subjectivity    0.0173
  CNN score                    0.0263
Text data
  comments negativity          0.2503
  comments positivity          0.4185
  comments subjectivity        0.4626
  description polarity         0.0237
  description subjectivity     0.0253
  message polarity             0.1825
  message subjectivity         0.2044

Table 1: Information Gain of Features
(a) Specific to the Facebook platform
Among all features, only the reaction counts (like, love, haha, wow, sad, angry) are specific to FB; the other ten features are general to any SNS platform. Table 1 shows the entropy-based information gain of our features, where entropy is a measure of impurity and information gain is the reduction of entropy achieved by using a particular feature. Information gain can be used to rank the features; a higher information gain indicates that a feature will be more useful to machine learning algorithms [10, 1] (a minimal sketch of this ranking computation appears after Table 2). In Table 2, we present accuracy results for several different classifiers; stochastic algorithms were repeated 25 times.
Algorithm        Accuracy ± Std. Dev.
SVM              88.39 ± 0.0
AdaBoost         90.61 ± 0.0
Random Forest    93.11 ± 0.196
MLP              92.06 ± 0.190
Gaussian NB      73.66 ± 0.0

Table 2: Applied ML Algorithms, Accuracies with Std. Deviation
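As referenced above, the sketch below shows an entropy-based information-gain computation of the kind used to produce Table 1. The binning of continuous features and the bin count are assumptions; a library estimator such as scikit-learn's mutual_info_classif could be used instead.

```python
# Entropy-based information gain for continuous features, via simple binning.
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels, n_bins=10):
    """Entropy reduction obtained by splitting the labels on binned feature values."""
    bins = np.digitize(feature, np.histogram_bin_edges(feature, bins=n_bins))
    conditional = 0.0
    for b in np.unique(bins):
        mask = bins == b
        conditional += mask.mean() * entropy(labels[mask])
    return entropy(labels) - conditional

def rank_features(X, y, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(X[:, i], y) for i, name in enumerate(names)}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
```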
In our study, we used a bag-of-features approach. Each feature we used could, on its own, support a weak classifier for sarcasm detection, so we expected an ensemble approach combining these features to yield a good classifier. Both of the ensemble algorithms we used, Random Forest and AdaBoost, performed very well for sarcasm detection. The MLP-based model also performed well (>90% accuracy), even with a small number of layers and nodes. The SVM-based model's performance was not as good as that of the ensemble models. Naïve Bayes (NB) algorithms are widely used for text and sentiment analysis; since the features we consider have continuous values, we chose Gaussian NB. It is surprising that Gaussian NB's performance was worse than that of the other models.
Conclusion
In this extended abstract, we presented our findings on detecting sarcasm on social media using supervised learning algorithms on a noisy dataset (the data could include duplicate images and spam messages). Our results show that supervised learning algorithms, especially ensemble algorithms, are a good fit for such applications. As part of our continuing work, we will study the impact of spam messages and duplicated data on the accuracy of sarcasm detection. Additionally, we are working to generalize our methods so that they work on other SNSs.
REFERENCES
1. Taqwa Ahmed Alhaj, Maheyzah Md Siraj, Anazida
Zainal, Huwaida Tagelsir Elshoush, and Fatin Elhaj.
2016. Feature selection using information gain for
improved structural-based alert correlation. PloS one
11, 11 (2016), e0166017.
2. David Bamman and Noah A Smith. 2015.
Contextualized Sarcasm Detection on Twitter. In
International AAAI Conference on Web and Social
Media (ICWSM). 574–577.
3. Dipto Das and Anthony J Clark. 2018. Sarcasm
Detection on Flickr Using a CNN. In International
Conference on Computing and Big Data (ICCBD).
4. Elena Filatova. 2012. Irony and Sarcasm: Corpus
Generation and Analysis Using Crowdsourcing. In
International Conference on Language Resources and
Evaluation (LREC). Citeseer, 392–398.
5. Roberto González-Ibánez, Smaranda Muresan, and
Nina Wacholder. 2011. Identifying sarcasm in Twitter: a
closer look. In Proceedings of the 49th Annual Meeting
of the Association for Computational Linguistics:
Human Language Technologies: Short Papers-Volume
2. Association for Computational Linguistics, 581–586.
6. Steven Loria, P. Keen, M. Honnibal, R. Yankovsky, D. Karesh, E. Dempsey, and others. 2014. TextBlob: Simplified Text Processing. (2014).
7. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B.
Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R.
Weiss, V. Dubourg, J. Vanderplas, A. Passos, D.
Cournapeau, M. Brucher, M. Perrot, and E.
Duchesnay. 2011. Scikit-learn: Machine Learning in
Python. Journal of Machine Learning Research 12
(2011), 2825–2830.
8. Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra
De Silva, Nathan Gilbert, and Ruihong Huang. 2013.
Sarcasm as contrast between a positive sentiment and
negative situation. In Proceedings of the 2013
Conference on Empirical Methods in Natural Language
Processing. 704–714.
9. Rossano Schifanella, Paloma de Juan, Joel Tetreault,
and Liangliang Cao. 2016. Detecting sarcasm in
multimodal social platforms. In Proceedings of the 2016
ACM on Multimedia Conference. ACM, 1136–1145.
10. Bangsheng Sui. 2013. Information gain feature
selection based on feature interactions. Ph.D.
Dissertation.
11. Joseph Tepperman, David Traum, and Shrikanth Narayanan. 2006. "Yeah Right": Sarcasm Recognition for Spoken Dialogue Systems. In Ninth International Conference on Spoken Language Processing.
12. Oriol Vinyals, Alexander Toshev, Samy Bengio, and
Dumitru Erhan. 2015. Show and tell: A neural image
caption generator. In Proceedings of the IEEE
conference on computer vision and pattern recognition.
3156–3164.