Conference Paper

Measurement of Online Discussion Authenticity within Online Social Media


Abstract

In this paper, we propose an approach for estimating the authenticity of online discussions based on the similarity of online social media (OSM) accounts participating in the online discussion to known abusers and legitimate accounts. Our method uses similarity functions for the analysis and classification of OSM accounts. The proposed methods are demonstrated using Twitter data collected for this study and a previously published Arabic Honeypot dataset. The data collected during this study includes manually labeled accounts and a ground truth collection of abusers from crowdturfing platforms. Demonstration of the discussion topic's authenticity, derived from account similarity functions, shows that the suggested approach is effective for discriminating between topics that were strongly promoted by abusers and topics that attracted authentic public interest.
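As a rough illustration of the approach, the sketch below scores an account by comparing its aggregated posts to labeled abuser and legitimate reference accounts. The bag-of-words representation with cosine similarity is one plausible choice of similarity function (the citing work below reports bag-of-words performing best); the function names, aggregation scheme, and scoring rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a bag-of-words account-similarity score.
# The aggregation scheme and scoring rule are illustrative
# assumptions, not the paper's actual implementation.
from collections import Counter
from math import sqrt

def bag_of_words(texts):
    """Aggregate an account's posts into one term-frequency vector."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def authenticity_score(account_posts, abuser_vectors, legit_vectors):
    """Score an account by its relative similarity to known abusers
    versus known legitimate accounts (higher = more authentic).
    Assumes both labeled reference sets are non-empty."""
    vec = bag_of_words(account_posts)
    sim_abuser = max(cosine_similarity(vec, v) for v in abuser_vectors)
    sim_legit = max(cosine_similarity(vec, v) for v in legit_vectors)
    return sim_legit - sim_abuser
```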


... Recently, Tacchini et al. [35] identified hoaxes within Facebook based on the users who interacted with these hoaxes rather than the hoaxes' content. Elyashar et al. [20] proposed a method for estimating the authenticity of online discussions based on several similarity functions of OSM accounts participating in an online discussion. They found that the similarity function with the best performance across all of the datasets was the bag-of-words. ...
... Clickbait is responsible for the rapid spread of rumors and misinformation online [16]. This malicious activity is also common among abusers in OSM [20]. Therefore, features that are helpful for abuser detection can also be used for clickbait detection [29]. ...
Article
Full-text available
In this paper, we propose an approach for the detection of clickbait posts in online social media (OSM). Clickbait posts are short, catchy phrases that attract a user's attention and entice them to click through to an article. The approach is based on a machine learning (ML) classifier capable of distinguishing between clickbait and legitimate posts published in OSM. The suggested classifier is based on a variety of features, including image-related features, linguistic analysis, and methods for abuser detection. In order to evaluate our method, we used two datasets provided by the Clickbait Challenge 2017. The best performance obtained by the ML classifier was an AUC of 0.8, an accuracy of 0.812, a precision of 0.819, and a recall of 0.966. In addition, as opposed to previous studies, we found that clickbait post titles are statistically significantly shorter than legitimate post titles. Finally, we found that counting the number of formal English words in the given content is useful for clickbait detection.
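The two findings above suggest simple lexical features. The sketch below extracts them under stated assumptions: the tokenizer is naive, and the reference English vocabulary is a placeholder that would normally be loaded from a standard word list.

```python
# Sketch of two clickbait features reported above: title length and
# the count of formal English words. The vocabulary below is a
# placeholder assumption; any standard English word list would do.
import re

english_words = {"government", "report", "analysis", "election"}  # placeholder

def title_length(title: str) -> int:
    """Number of tokens in the post title (clickbait titles were
    reported to be significantly shorter than legitimate ones)."""
    return len(title.split())

def formal_word_count(text: str) -> int:
    """Count tokens that appear in the reference English vocabulary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for t in tokens if t in english_words)
```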
... For example, in their paper, Zannettou et al. discovered that Russian trolls tend to 1) post longer tweets, 2) be more negative and subjective, and 3) refer to more general topics than random Twitter users. Similarly uninspiring from a linguistic perspective are the results of some other recent studies of state-sponsored and inauthentic internet messages [13][14][15]. ...
Article
Full-text available
Troll internet messages, especially those posted on Twitter, have recently been recognised as a very powerful weapon in hybrid warfare. Hence, an important task for the academic community is to provide a tool for identifying internet troll accounts as quickly as possible. At the same time, this tool must be highly accurate so that its employment will not violate people’s rights and affect the freedom of speech. Though such a task can be effectively fulfilled on purely linguistic grounds, as of yet, very little work has been done that could help to explain the discourse-specific features of this type of writing. In this paper, we suggest a quantitative measure for identifying troll messages which is based on taking into account certain sociolinguistic limitations of troll speech, and discuss two algorithms that both require as few as 50 tweets to establish the true nature of the tweets, whether ‘genuine’ or ‘troll-like’.
... [26] found evidence that socialbots play a key role in the spread of fake news. [27] proposed a method for estimating the authenticity of online discussions based on several similarity functions of OSM accounts participating in the online discussion. They found that the similarity function with the best performance across all the datasets was bag-of-words. ...
Conference Paper
Full-text available
In this paper, we propose an approach for the detection of fake news in online social media (OSM). The approach is based on the authenticity of online discussions published by fake news promoters and legitimate accounts. Authenticity is quantified using a machine learning (ML) classifier that distinguishes between fake news promoters and legitimate accounts. In addition, we introduce novel link prediction features that were shown to be useful for classification. A description of the processes used to divide the dataset into categories representing topics or online discussions and to measure the authenticity of online discussions is provided. We also discuss new data collection methods for OSM, describe the process used to retrieve accounts and their posts in order to train traditional ML classifiers, and present guidelines for manually labeling accounts. The proposed approach is demonstrated using a Twitter pro-ISIS fanboy dataset provided by Kaggle. Our results show that the method can determine a topic's authenticity based on the participation of fake news promoters and legitimate accounts. Thus, the suggested approach is effective for discriminating between topics that were strongly promoted by fake news promoters and those that attracted authentic public interest.
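The abstract does not spell out its novel link prediction features, so the sketch below shows two standard ones (common neighbors and the Jaccard coefficient) purely to illustrate the kind of graph-based signal such a classifier can consume; the graph representation is an assumption.

```python
# Illustrative link-prediction features over a follower graph.
# These are standard textbook features, shown only as examples of
# the feature family; they are not the paper's novel features.
# `graph` maps an account id to the set of its neighbor ids, e.g.
# graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}

def common_neighbors(graph, u, v):
    """Number of accounts adjacent to both u and v."""
    return len(graph[u] & graph[v])

def jaccard_coefficient(graph, u, v):
    """Overlap of the two neighbor sets, normalized by their union."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0
```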
Chapter
This paper proposes a machine learning approach to detect clickbait posts published in social media. Clickbait posts are short, catchy phrases pointing to a longer online article; in many cases, users are encouraged to click on these posts to read the full article. The suggested approach differentiates between clickbait and legitimate posts by training mainstream machine learning (ML) classifiers. The suggested classifiers are trained on various features extracted from image, linguistic, and behavioral analysis. For evaluation, we used two datasets provided by the Clickbait Challenge 2017. The XGBoost classifier obtained the best performance with an AUC of 0.8, an accuracy of 0.812, a precision of 0.819, and a recall of 0.966. Finally, we found that counting the number of formal English words in the given content is helpful for clickbait detection.
Keywords: Clickbait detection, Social media, Machine learning
Article
Full-text available
State-sponsored internet trolls repeat themselves in a unique way. They have a small number of messages to convey, but they have to do it multiple times. Understandably, they are afraid of being repetitive, because that will inevitably lead to their identification as trolls. Hence, their only possible strategy is to keep diluting their target message with ever-changing filler words. That is exactly what makes them so susceptible to automatic detection. One serious challenge to this promising approach is posed by the fact that the same troll-like effect may arise as a result of collaborative repatterning that is not indicative of any malevolent practices in online communication. The current study addresses this issue by analysing more than 180,000 app reviews written in English and Russian and verifying the obtained results in an experimental setting where participants were asked to describe the same picture under two experimental conditions. The main finding of the study is that both observational and experimental samples became less troll-like as the time distance between their elements increased. Their ‘troll coefficient’, calculated as the ratio of the proportion of repeated content words among all content words to the proportion of repeated content word pairs among all content word pairs, was found to be a function of the time distance between separate individual contributions. These findings render the task of developing efficient linguistic algorithms for internet troll detection more complicated. However, the problem can be alleviated by our ability to predict what the value of the troll coefficient of a certain group of texts would be if it depended solely on these texts' creation time.
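The troll coefficient is defined explicitly enough in the abstract to transcribe directly. The sketch below follows that definition, with one labeled assumption: "content word pairs" are read as adjacent bigrams within each text, since the abstract does not specify the pairing scheme.

```python
# Transcription of the troll coefficient defined above:
# (share of repeated content words among all content words) /
# (share of repeated content word pairs among all content word pairs).
from collections import Counter

def repeated_share(items):
    """Proportion of tokens whose type occurs more than once."""
    if not items:
        return 0.0
    counts = Counter(items)
    return sum(c for c in counts.values() if c > 1) / len(items)

def troll_coefficient(texts):
    """texts: list of token lists, content words only (per the
    definition above). 'Pairs' are read as adjacent bigrams within
    each text, which is an assumption, not the paper's definition."""
    words = [w for t in texts for w in t]
    pairs = [(t[i], t[i + 1]) for t in texts for i in range(len(t) - 1)]
    pair_share = repeated_share(pairs)
    return repeated_share(words) / pair_share if pair_share else float("inf")
```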
Article
This study seeks to uncover the effects of source and repetition on the illusory truth effect and the dissemination of fake news on social media through an online experiment. The study found that in a personalized source system, where trustworthy traditional news sources and personal contacts converge on social media, repetition has a strong influence on the perceived trustworthiness of a news source and the balance of a news story. Although most people intend to share real, balanced news stories, the illusory truth effect causes misjudgement, which makes fake news more likely to go viral than real news. A multi-group SEM analysis of the two groups (without source and with source) showed that readers in the no-source group rated the effect of repetition on news evaluation as more significant than the with-source group did. The findings suggest that the effect of source has diminished in the evaluation of news quality. However, sharers on social media are becoming more influential.
Conference Paper
Full-text available
The current study yielded a number of important findings. We built a neural network that achieved an accuracy score of 91% in classifying troll and genuine tweets. By means of regression analysis, we identified a number of features that make a tweet more susceptible to correct labelling and found that they are inherently present in troll tweets as a special type of discourse. We hypothesised that those features are grounded in the sociolinguistic limitations of troll writing, which can be best described as a combination of two factors: speaking with a purpose and trying to mask the purpose of speaking. Next, we contended that the orthogonal nature of these factors must necessarily result in the skewed distribution of language parameters of troll messages. Having chosen as an example distribution of the topics and vocabulary associated with them, we showed some very pronounced distributional anomalies, thus confirming our prediction.
Chapter
Full-text available
The technological advances made in the last twenty years have radically changed our society, improving almost every aspect of our daily life. This change directly affects human habits, transforming the way people share information and knowledge. This exponential technological advancement, together with the related information deluge, is also radically changing Information Warfare and its scenarios. Indeed, the consequent increase of the digital attack surface poses new challenges and threats for both personal and national security.
Conference Paper
Full-text available
Over the past couple of years, anecdotal evidence has emerged linking coordinated campaigns by state-sponsored actors with efforts to manipulate public opinion on the Web, often around major political events, through dedicated accounts, or “trolls.” Although they are often involved in spreading disinformation on social media, there is little understanding of how these trolls operate, what type of content they disseminate, and most importantly their influence on the information ecosystem. In this paper, we shed light on these questions by analyzing 27K tweets posted by 1K Twitter users identified as having ties with Russia’s Internet Research Agency and thus likely state-sponsored trolls. We compare their behavior to a random set of Twitter users, finding interesting differences in terms of the content they disseminate, the evolution of their accounts, as well as their general behavior and use of Twitter. Then, using Hawkes Processes, we quantify the influence that trolls had on the dissemination of news on social platforms like Twitter, Reddit, and 4chan. Overall, our findings indicate that Russian trolls managed to stay active for long periods of time and to reach a substantial number of Twitter users with their tweets. When looking at their ability to spread news content and make it viral, however, we find that their effect on social platforms was minor, with the significant exception of news published by the Russian state-sponsored news outlet RT (Russia Today).
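For reference, the conditional intensity of a univariate Hawkes process (the standard textbook form, not a detail taken from this paper) is

```latex
\lambda(t) = \mu + \sum_{t_i < t} \varphi(t - t_i),
\qquad \varphi(s) = \alpha \, e^{-\beta s},
```

where \mu is the base event rate and each past event at time t_i temporarily raises the rate through the excitation kernel \varphi. In the multivariate version used to compare platforms, cross-excitation kernels \varphi_{jk} quantify how events on platform j raise the event rate on platform k, which is what allows influence between platforms to be measured.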
Article
Full-text available
Online social media (OSM) has a great influence on today's world. Some individuals view OSM as fertile ground for abuse and use it to disseminate misinformation and political propaganda, slander competitors, and spread spam. The crowdturfing industry employs large numbers of bots and human workers to manipulate OSM and misrepresent public opinion. The detection of online discussion topics manipulated by OSM abusers is an emerging problem attracting significant attention. In this paper, we propose an approach for quantifying the authenticity of online discussions based on the similarity of the OSM accounts participating in the discussion to known abusers and legitimate accounts. Our method uses multiple similarity functions for the analysis and classification of OSM accounts. The proposed methods are demonstrated using Twitter data collected for this study and previously published Arabic Honeypot data. The former includes manually labeled accounts and abusers who participated in crowdturfing platforms. Demonstration of a topic's authenticity, derived from account similarity functions, shows that the suggested approach is effective for discriminating between topics that were strongly promoted by abusers and topics that attracted authentic public interest.
Article
Full-text available
Popular Internet services in recent years have shown that remarkable things can be achieved by harnessing the power of the masses using crowd-sourcing systems. However, crowd-sourcing systems can also pose a real challenge to existing security mechanisms deployed to protect Internet services. Many of these techniques make the assumption that malicious activity is generated automatically by machines, and perform poorly or fail if users can be organized to perform malicious tasks using crowd-sourcing systems. Through measurements, we have found surprising evidence showing that not only do malicious crowd-sourcing systems exist, but they are rapidly growing in both user base and total revenue. In this paper, we describe a significant effort to study and understand these "crowdturfing" systems in today's Internet. We use detailed crawls to extract data about the size and operational structure of these crowdturfing systems. We analyze details of campaigns offered and performed in these sites, and evaluate their end-to-end effectiveness by running active, non-malicious campaigns of our own. Finally, we study and compare the source of workers on crowdturfing sites in different countries. Our results suggest that campaigns on these systems are highly effective at reaching users, and their continuing growth poses a concrete threat to online communities such as social networks, both in the US and elsewhere.
Conference Paper
Full-text available
In handwritten character recognition, the rejection of extraneous patterns, such as image noise, strokes, or corrections, can significantly improve the practical usefulness of a system. In this paper, a combination of two confidence measures defined for a k-nearest neighbors (k-NN) classifier is proposed. Experiments are presented comparing the performance of the same system with and without the new rejection rules.
Article
Full-text available
In handwritten character recognition, the rejection of extraneous patterns, such as image noise, strokes, or corrections, can significantly improve the practical usefulness of a system. In this paper, a combination of two confidence measures defined for a k-nearest neighbors (k-NN) classifier is proposed. Experiments are presented comparing the performance of the same system with and without the new rejection rules.
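The two abstracts above do not define their confidence measures, so the sketch below combines two generic ones often used with k-NN (the majority share of the k neighbor labels, and the nearest-to-k-th distance ratio) to show how a rejection rule of this shape operates; the measures and thresholds are illustrative assumptions, not those of the papers.

```python
# Hedged sketch of k-NN classification with rejection, combining two
# generic confidence measures. The actual measures and thresholds
# used in the papers above are not given in these abstracts.
from collections import Counter

def knn_decision(neighbor_labels, neighbor_dists,
                 vote_threshold=0.7, dist_threshold=0.5):
    """neighbor_labels/neighbor_dists are the k nearest neighbors,
    sorted by ascending distance. Returns the predicted label, or
    None to reject the pattern as extraneous."""
    votes = Counter(neighbor_labels)
    label, top = votes.most_common(1)[0]
    vote_conf = top / len(neighbor_labels)
    # A ratio near 0 means the nearest neighbor is much closer than
    # the k-th one, i.e., the pattern sits deep inside a class region.
    dist_conf = neighbor_dists[0] / neighbor_dists[-1] if neighbor_dists[-1] else 0.0
    if vote_conf >= vote_threshold and dist_conf <= dist_threshold:
        return label
    return None  # reject (e.g., image noise, stroke, correction)
```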
Article
Crowdturfing has recently been identified as a sinister counterpart to the enormous positive opportunities of crowdsourcing. Crowdturfers leverage human-powered crowdsourcing platforms to spread malicious URLs in social media, form "astroturf" campaigns, and manipulate search engines, ultimately degrading the quality of online information and threatening the usefulness of these systems. In this paper we present a framework for "pulling back the curtain" on crowdturfers to reveal their underlying ecosystem. Concretely, we analyze the types of malicious tasks and the properties of requesters and workers in crowdsourcing sites such as Microworkers.com, ShortTask.com and Rapidworkers.com, and link these tasks (and their associated workers) on crowdsourcing sites to social media, by monitoring the activities of social media participants. Based on this linkage, we identify the relationship structure connecting these workers in social media, which can reveal the implicit power structure of crowdturfers identified on crowdsourcing sites. We identify three classes of crowdturfers -- professional workers, casual workers, and middlemen -- and we develop statistical user models to automatically differentiate these workers and regular social media users.