Using Behavior and Text Analysis to Detect Propagandists and Misinformers on Twitter

Michael Orlov¹ and Marina Litvak²
¹ NoExec, Beer Sheva, Israel
orlovm@noexec.org
² Shamoon College of Engineering, Beer Sheva, Israel
marinal@ac.sce.ac.il
Abstract. There are organized groups that disseminate similar messages in online forums and social media; they respond to real-time events or as persistent policy, and operate with state-level or organizational funding. Identifying these groups is of vital importance for preventing distribution of sponsored propaganda and misinformation. This paper presents an unsupervised approach using behavioral and text analysis of users and messages to identify groups of users who abuse the Twitter micro-blogging service to disseminate propaganda and misinformation. Groups of users who frequently post strikingly similar content at different times are identified through repeated clustering and frequent itemset mining, with the lack of credibility of their content validated through human assessment. This paper introduces a case study into automatic identification of propagandists and misinformers in social media.
Keywords: Propaganda · Misinformation · Social networks
1 Introduction
The ever-growing popularity of social networks influences everyday life, causing
us to rely on other people’s opinions when making large and small decisions,
from the purchase of new products online to voting for a new government. It
is therefore not surprising that, as a vehicle for spreading disinformation and
misinformation, social media has become a weapon of choice for manipulating
public opinion. Fake content
and propaganda are rampant on social media and must be detected and filtered
out. The problem of information validity in social media has gained significant
traction in recent years, culminating in large-scale efforts by the research com-
munity to deal with “fake news” [7], clickbait [6], “fake reviews” [2], rumors [8],
and other kinds of misinformation.
We are confident that detecting and blocking users who disseminate misin-
formation and propaganda is a much more effective way of dealing with fake
content, as it enables prevention of its massive and consistent distribution in
social media. Therefore, in this paper we deal with detection of propagandists.
© Springer Nature Switzerland AG 2019
J. A. Lossio-Ventura et al. (Eds.): SIMBig 2018, CCIS 898, pp. 67–74, 2019.
https://doi.org/10.1007/978-3-030-11680-4_8
We define propagandists as groups of people who intentionally spread misinfor-
mation or biased statements, typically receiving payment for this task, similarly
to the definition of “fake reviews” disseminators in [2]. An article in the Russian-
language Meduza media outlet [12] describes one example of paid propagandists
performing their task on a social network¹ while neglecting to delete the task
description and requirements, as illustrated in Fig. 1.
Fig. 1. A comment on the VK social network that includes a paid propaganda task
description. The marked section translates as: “ATTENTION!!!—attach a screenshot of the
task performed! The task is paid ONLY when this condition is fulfilled. TEXT OF
COMMENT.” The rest of the post promotes a municipal project.
Twitter is one of the most popular platforms for dissemination of informa-
tion. We would expect Twitter to attract focused attention of propagandists—
organized groups who disseminate similar messages in online forums and social
media, in response to real-time events or as a persistent policy, operating with
state-level or organizational funding. We explore, below, an unsupervised approach
to identifying groups of users who abuse the Twitter micro-blogging service
to disseminate propaganda and misinformation. This task is accomplished via
behavioral analysis of users and text analysis of their content. Users who fre-
quently post strikingly similar content at different times are identified through
repeated clustering, and their groups are subsequently identified via frequent
itemset mining. The lack of credibility of their content is validated manually.
The most influential disseminators are detected by calculating their PageRank
centrality in the social network and the results are visualized. Our purpose is
to present a case study into automatic identification of propagandists in social
media.
2 Related Work
The subject of credibility of information propagated on Twitter has been pre-
viously analyzed. Castillo et al. [5] observed that while most messages posted
on Twitter are truthful, the service also facilitates spreading misinformation and
false rumors. Dissemination of false rumors under critical circumstances was analyzed
in [14], and aggregation analysis of tweets was performed in order to differentiate
between false rumors and confirmed news. Discussion about detecting rumors and
misinformation in social networks remains very popular nowadays.
¹ VK is a social network popular in Russia, see https://vk.com.
Authors of [3] demonstrate the importance of social media for fake news
suppliers by measuring the source of their web traffic. Hamidian and Diab [8]
performed supervised rumor classification using the tweet latent vector feature.
Large-scale datasets for rumor detection were built in [17] and [21].
However, not much attention has been paid to detection of propagandists
in social media. Some works used the term propaganda in relation to spam-
mers [13]. Metaxas [15] associated the theory of propaganda with the behavior
of web spammers and applied social anti-propagandistic techniques to recognize
trust graphs on the web. Lumezanu et al. [11] studied the tweeting behavior of
assumed Twitter propagandists and identified tweeting patterns that character-
ize them as users who consistently express the same opinion or ideology. The
first attempt to automatically detect propaganda on Twitter was made in [20],
where linguistically-infused predictive models were built to classify news posts
as suspicious or verified, and then to predict four subtypes of suspicious news,
including propaganda.
In this paper, we address the problem of automatically identifying paid
propagandists, who have an agenda but do not necessarily spread false rumors, or
even false information. This problem is fundamentally different from the problems
addressed in other papers, which classify propaganda as a rumor or equate it with
spam, a much wider concept. Our approach is intuitive and fully unsupervised.
3 Methodology
When using Twitter as an information source, we would like to detect tweets that
contain propaganda², and users who disseminate it. We assume that propaganda
is disseminated by professionals who are centrally managed and who have the
following characteristics (partly supported by [11]): (1) They work in groups; (2)
Disseminators from the same group write very similar (or even identical) posts
within a short timeframe; (3) Each disseminator writes very frequently (with
short intervals between posts and/or replies); (4) One disseminator may have
multiple accounts; as such, a group of accounts with strikingly similar content
may represent the same person; (5) We assume that propaganda posts are pri-
marily political; (6) The content of tweets from one particular disseminator may
vary according to the subject of an “assignment,” and, as such, each subject
is discussed in a disseminator's accounts during the temporal frame of its rele-
vance; (7) Propaganda carries content similar to an official governance “vision”
depicted in mass media.
² Propaganda is defined as: “posts that contain information, especially of a biased or
misleading nature, that is used to promote or publicize a particular political cause or
point of view” (Oxford English Dictionary, 3rd Online Edition).
Based on the foregoing assumptions, we propose to perform the following
analysis for detection of propagandists:
Based on (1) and (2), given a time dimension, repeatedly cluster tweets posted
during the same time interval (timeframe), based on their content. For each
run, a group of users who posted similar posts (clustered together) can be
obtained. Given N runs for N timeframes, we can obtain a group of users
who consistently write similar content—these are users whose tweets were
clustered together in most of the runs. The retrieved users can be considered
good suspects for propaganda dissemination.
Based on (3), the timeframes must be small, and clustering must be performed
quite frequently.
Based on (4), we do not distinguish between different individuals. Our purpose
is to detect a set of accounts, where each individual (propagandist) can be
represented by a single account or by a set of accounts.
Based on (5), we can verify the final results of our analysis and see whether the
posts published from the detected accounts indeed contain political content.
Based on (6) and (7), we collect data that belongs to content that is discussed
in mass media.
We outline, below, the main algorithm steps for the proposed methodology.
1. Filtering and pre-processing tweets. We consider only tweets in English and
perform standard preprocessing using tokenization, stopword removal, and
stemming. We also filter out numbers, non-textual content (like emoji sym-
bols), and links.
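For illustration only (the published pipeline was built from KNIME nodes, as noted in Sect. 4), a minimal Python sketch of such a preprocessing step might look as follows, assuming NLTK's English stopword list and Porter stemmer as stand-ins for whatever components were actually used:

```python
import re
from nltk.corpus import stopwords   # requires nltk.download("stopwords")
from nltk.stem import PorterStemmer

STOPWORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()
URL_RE = re.compile(r"https?://\S+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # drops numbers, punctuation, emoji, etc.

def preprocess(tweet: str) -> list[str]:
    """Tokenize, remove links/numbers/non-textual symbols and stopwords, then stem."""
    text = URL_RE.sub(" ", tweet.lower())
    text = NON_ALPHA_RE.sub(" ", text)
    tokens = [t for t in text.split() if t and t not in STOPWORDS]
    return [STEMMER.stem(t) for t in tokens]

# e.g. preprocess("Airstrikes reported in #Syria https://t.co/x 123")
# yields stemmed tokens such as ["airstrik", "report", "syria"]
# (exact stems depend on the stemmer used).
```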
2. Split data set into timeframes. We split the data set into N timeframes, so
that each split contains tweets posted at the same period of time (between
two consecutive timeframes n_i and n_{i+1}). The timeframes must be relatively
short, according to assumption (3).
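A minimal sketch of this split, assuming each tweet record carries a created_at timestamp and that fixed-length windows are used (12-hour windows in the case study of Sect. 4):

```python
from collections import defaultdict
from datetime import timedelta

def split_into_timeframes(tweets, window_hours=12):
    """Group tweets into consecutive fixed-length timeframes by posting time.

    `tweets` is assumed to be a list of dicts with keys 'id', 'user', 'text',
    and 'created_at' (a datetime); the 12-hour window matches the case study.
    """
    start = min(t["created_at"] for t in tweets)
    window = timedelta(hours=window_hours)
    frames = defaultdict(list)
    for t in tweets:
        index = int((t["created_at"] - start) / window)  # 0-based timeframe index
        frames[index].append(t)
    return [frames[i] for i in sorted(frames)]
```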
3. Cluster tweets at each timeframe. We cluster tweets at each timeframe n_i in
order to find a group of users who posted similar content (clustered together).
K-means has been chosen as the unsupervised clustering method, using the
elbow method to determine the optimal number of clusters. The simple vector
space model [18] with adapted tf-idf weights³ was used for tweet representation.
We denote the clustering results (set of clusters) for timeframe n_i by
C_i = {c_i1, c_i2, ..., c_ik}. The final clusters are composed of user IDs (after
replacing tweet IDs by the IDs of users who posted them); therefore, the clusters
are not disjoint.
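As an illustration of step 3, a clustering pass over one timeframe could be sketched with scikit-learn's TfidfVectorizer and KMeans as stand-ins for the corresponding KNIME nodes; the preprocess helper is the hypothetical function sketched after step 1:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_timeframe(frame, k=10):
    """Cluster one timeframe's tweets by content and return clusters of user IDs.

    `frame` is a list of tweet dicts with 'text' and 'user'; k = 10 follows the
    case-study setting chosen with the elbow method.
    """
    texts = [" ".join(preprocess(t["text"])) for t in frame]
    X = TfidfVectorizer().fit_transform(texts)          # tweets as tf-idf vectors
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    clusters = [set() for _ in range(k)]
    for tweet, label in zip(frame, labels):
        clusters[label].add(tweet["user"])               # replace tweet IDs by user IDs
    return clusters                                      # clusters need not be disjoint
```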
4. Calculate groups of users⁴ frequently clustered together. We scan the obtained
clusters and, using an adapted version of the AprioriTID algorithm [1,10],
compute groups of users whose posts were frequently clustered together. We start
by generating a list L_1 of all single users u_i appearing in at least T (the
minimum threshold specified by the user) timeframes. Then, we generate a
list of pairs L_2 = {(u_l, u_m) : u_l ∈ L_1, u_m ∈ L_1} of users that are clustered
together in at least T timeframes. According to the Apriori algorithm, we then
join pairs from L_2 in order to obtain L_3, and so forth. This step is necessary
if we want to detect organized groups of propaganda disseminators.
³ A tweet was considered as a document, and the collection of all tweets as a corpus.
⁴ By "user" we mean an account and not an individual, based on assumption (4).
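A simplified Apriori-style sketch of step 4 is given below; it recomputes support naively instead of maintaining transaction lists as AprioriTID does, so it only illustrates the level-wise construction of L_1, L_2, L_3, and so forth:

```python
from itertools import combinations

def frequent_user_groups(clusterings, min_timeframes):
    """Find groups of users co-clustered in at least `min_timeframes` timeframes.

    `clusterings` is a list (one entry per timeframe) of collections of user-ID
    sets, as produced by the clustering step. Simplified Apriori-style sketch,
    not the AprioriTID optimization used in the paper.
    """
    def support(group):
        # Number of timeframes in which the whole group falls into one cluster.
        return sum(any(group <= cluster for cluster in frame_clusters)
                   for frame_clusters in clusterings)

    users = set().union(*(c for frame in clusterings for c in frame))
    # L1: single users appearing in at least `min_timeframes` timeframes.
    current = [frozenset([u]) for u in users
               if support(frozenset([u])) >= min_timeframes]
    result = []
    while current:
        result.extend(current)
        # Join step: merge frequent groups that differ in exactly one user.
        candidates = {g | h for g, h in combinations(current, 2)
                      if len(g | h) == len(g) + 1}
        current = [c for c in candidates if support(c) >= min_timeframes]
    return result
```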
5. Identifying the most influential disseminators with PageRank centrality. We
construct an undirected graph, with nodes standing for users. We add an
edge between two users if they have been clustered together at least once
(in one timeframe). The weights on edges are proportional⁵ to the number
of times they were clustered together. As an option, edges with weights
below a specified threshold t can be removed from the graph. We calculate
PageRank centrality on the resultant graph and keep the obtained scores for
the detected accounts as a disseminator's "influence" measure, as illustrated in
Fig. 2. Using an eigenvector centrality metric to measure influence in the graph
structure of a social network accounts for its "recursive" nature. For example,
in [2] the HITS algorithm [9] is adapted for computing the honesty of users and
the goodness of products.
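A possible sketch of step 5 using networkx; the edge-counting details are our assumption, since the paper does not spell them out, but the normalization of weights to [0, 1] follows footnote 5:

```python
from collections import Counter
from itertools import combinations
import networkx as nx

def influence_scores(clusterings, min_weight=0.0):
    """Build the co-clustering graph and score users with PageRank centrality.

    Edge weights count how often two users were clustered together, normalized
    to [0, 1]; edges below `min_weight` are dropped (the optional threshold t).
    """
    pair_counts = Counter()
    for frame_clusters in clusterings:
        for cluster in frame_clusters:
            for u, v in combinations(sorted(cluster), 2):
                pair_counts[(u, v)] += 1

    max_count = max(pair_counts.values(), default=1)
    G = nx.Graph()
    for (u, v), count in pair_counts.items():
        weight = count / max_count
        if weight >= min_weight:
            G.add_edge(u, v, weight=weight)

    # PageRank on the weighted undirected graph; higher score = more influential.
    return nx.pagerank(G, weight="weight")
```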
6. Visualize the "dissemination network" structure and analyze results. We visualize
the graph obtained in the prior step, where the PageRank centrality of
each node determines its size. We also apply topic modeling in order to visualize
the main topics in the content that was detected as propaganda.
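An illustrative sketch of step 6, with node sizes scaled by PageRank and scikit-learn's LatentDirichletAllocation standing in for KNIME's LDA node (the function names and plotting choices are ours, not the paper's):

```python
import matplotlib.pyplot as plt
import networkx as nx
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def draw_dissemination_graph(G, pagerank, out_file="disseminators.png"):
    """Draw the co-clustering graph with node sizes proportional to PageRank."""
    pos = nx.spring_layout(G, seed=0)
    sizes = [5000 * pagerank.get(n, 0.0) for n in G.nodes()]
    nx.draw_networkx(G, pos, node_size=sizes, with_labels=False)
    plt.savefig(out_file)

def top_topic_words(detected_texts, n_topics=5, n_words=7):
    """LDA over the detected accounts' tweets; returns the top words per topic."""
    vectorizer = CountVectorizer(max_df=0.95, min_df=2)
    X = vectorizer.fit_transform(detected_texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(X)
    vocab = vectorizer.get_feature_names_out()
    return [[vocab[i] for i in topic.argsort()[-n_words:][::-1]]
            for topic in lda.components_]
```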
Fig. 2. Partial example of a list of PageRank centrality values that were computed for
the disseminators graph in step 5 above.
The algorithm's flow is shown in Fig. 3.
⁵ Edge weights are normalized to be in the range [0, 1].
4 Case Study
Dataset. Military airstrikes in Syria in September 2017 attracted worldwide
criticism. The reflection of these events on Twitter can be tracked using the keyword
#syria, determined via Hamilton 68 [19] as the most popular hashtag for 600
monitored Twitter accounts that were linked to Russian influence operations.
Our case study was carried out on a dataset obtained from Twitter, collected
using the Twitter Stream API with the #syria hashtag. The dataset covers
10,848 tweets posted by 3,847 users throughout September 9–12, 2017.
Fig. 3. Pipeline for detecting users who consistently post similar content.
Parameters/Settings. We performed clustering with K = 10 clusters, the optimal
number of clusters according to the elbow method, N = 8 times (every 12 h, in
accordance with our assumption that organized propagandists work regular hours),
and looked for a group of accounts that consistently (in all timeframes without
exception, i.e., T = 100%) post similar content.
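The paper does not detail its elbow procedure; a common way to obtain the curve from which a value such as K = 10 is read off is to plot K-means inertia against candidate values of K, as in this small sketch:

```python
from sklearn.cluster import KMeans

def elbow_curve(X, k_values=range(2, 21)):
    """Return (k, inertia) pairs; the 'elbow' where inertia stops dropping
    sharply suggests the number of clusters."""
    return [(k, KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_)
            for k in k_values]
```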
Tools. We have implemented the above-described process in KNIME, a data
analytics, reporting, and integration platform [4].
Results. Our algorithm detected seven suspicious accounts. The content of mes-
sages posted by these accounts confirmed our suspicions of organized propaganda
dissemination. Formally speaking, manual inspection confirmed a precision of 100%.
However, recall was not measured due to the absence of manual annotation for
all accounts in our data.
Topic modeling⁶ results confirmed that most topics in the detected posts
aligned well with political propaganda vocabulary. For example, the top topic
words attack, russia, report, isis, force, bomb, military represent Russia’s mili-
tary operations in Syria, and trump, attack, chemical, false, flag, weapons rep-
resent an insinuated American undercover involvement in the area.
Activity analysis of the detected accounts confirmed assumption (3) about
propagandists posting significantly more frequently than regular Twitter users. While
regular users had a mean time of 12.8 h between posts, propagandists had a mean
time of only 1.8 h. This assumption has also been confirmed by the empirical
analysis in [20].
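A minimal sketch of how the mean inter-post interval per account could be computed from a list of posting timestamps (the paper does not specify its exact procedure):

```python
from datetime import timedelta

def mean_hours_between_posts(timestamps):
    """Mean gap, in hours, between a user's consecutive posts (needs >= 2 posts)."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return sum(gaps, timedelta()) / len(gaps) / timedelta(hours=1)
```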
⁶ Topic modeling was performed using KNIME's LDA implementation.
5 Conclusions
This paper introduces the initial stages of our research on automatic detection
of propagandists, based on analysis of users’ behavior and messages. We propose
an intuitive unsupervised approach for detecting Twitter accounts that dissem-
inate propaganda. We intend to continue this research in several directions: (a)
Extend our experiments with respect to other (baseline) methods, commercial
domains, and various (standard) IR evaluation metrics; (b) Evaluate in depth
the contribution of each separate stage of our pipeline; (c) Incorporate additional
(or alternative) techniques, like topic modeling, graph clustering, or analyzing
web traffic of news sources, into our pipeline; (d) Adapt and apply our approach
to tweets written in different languages, with focus on Russian, due to high pop-
ularity of Twitter among organized dissemination groups [16]; (e) Combine the
proposed approach with authorship analysis to detect actual users that might
use several accounts, according to assumption (4); (f) Perform geolocation pre-
diction and analysis on the detected accounts to provide additional important
information related to geographical distribution of organized propaganda dis-
semination activity; (g) Perform supervised classification of detected tweets for
more accurate analysis; (h) Incorporate retweeting statistics into our network
centrality analysis (step 5) to detect the most influential disseminators.
Our approach can be of great assistance in collecting a high quality dataset of
propaganda and its disseminators, which then can be used for training supervised
predictive models and for automatic evaluations. An automatic evaluation of our
approach can be performed via verification of automatically detected accounts
with accounts identified by public annotation tools, such as PropOrNot⁷.
References
1. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I., et al.: Fast dis-
covery of association rules. Adv. Knowl. Discov. Data Min. 12(1), 307–328 (1996)
2. Akoglu, L., Chandy, R., Faloutsos, C.: Opinion fraud detection in online reviews
by network effects. In: ICWSM 2013, pp. 2–11 (2013)
3. Allcott, H., Gentzkow, M.: Social media and fake news in the 2016 election. J.
Econ. Perspect. 31(2), 211–36 (2017)
4. Berthold, M.R., et al.: KNIME: the Konstanz information miner. In: Preisach,
C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds.) Data Analysis, Machine
Learning and Applications. Studies in Classification, Data Analysis, and Knowl-
edge Organization, pp. 319–326. Springer, Heidelberg (2008). https://doi.org/10.
1007/978-3-540-78246-9 38
5. Castillo, C., Mendoza, M., Poblete, B.: Information credibility on Twitter. In: Pro-
ceedings of the 20th International Conference on World Wide Web, WWW 2011,
pp. 675–684. ACM, New York (2011). https://doi.org/10.1145/1963405.1963500
6. Chen, Y., Conroy, N.J., Rubin, V.L.: Misleading online content: recognizing click-
bait as false news. In: Proceedings of the 2015 ACM on Workshop on Multimodal
Deception Detection, pp. 15–19. ACM (2015)
⁷ See http://www.propornot.com.
7. Conroy, N.J., Rubin, V.L., Chen, Y.: Automatic deception detection: methods for
finding fake news. Proc. Assoc. Inf. Sci. Technol. 52(1), 1–4 (2015)
8. Hamidian, S., Diab, M.T.: Rumor identification and belief investigation on Twitter.
In: WASSA@ NAACL-HLT, pp. 3–8 (2016)
9. Kleinberg, J.M., Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.S.: The
web as a graph: measurements, models, and methods. In: Asano, T., Imai, H., Lee,
D.T., Nakano, S., Tokuyama, T. (eds.) COCOON 1999. LNCS, vol. 1627, pp. 1–17.
Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48686-0 1
10. Li, Z.C., He, P.L., Lei, M.: A high efficient AprioriTid algorithm for mining associa-
tion rule. In: 2005 International Conference on Machine Learning and Cybernetics,
vol. 3, pp. 1812–1815. IEEE, August 2005. https://doi.org/10.1109/ICMLC.2005.
1527239
11. Lumezanu, C., Feamster, N., Klein, H.: #bias: measuring the tweeting behav-
ior of propagandists. In: Sixth International AAAI Conference on Weblogs and
Social Media (2012). http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/
paper/view/4588
12. Meduza: Authors of paid comments in support of Moscow authorities forgot to
edit assignment (2017). https://meduza.io/shapito/2017/02/03/avtory-platnyh-
kommentariev-v-podderzhku-moskovskih-vlastey-zabyli-otredaktirovat-zadanie
13. Mehta, B., Hofmann, T., Fankhauser, P.: Lies and propaganda: detecting spam
users in collaborative filtering. In: Proceedings of the 12th International Conference
on Intelligent User Interfaces, pp. 14–21. ACM (2007). https://doi.org/10.1145/
1216295.1216307
14. Mendoza, M., Poblete, B., Castillo, C.: Twitter under crisis: can we trust what we
RT? In: Proceedings of the First Workshop on Social Media Analytics, SOMA 2010,
pp. 71–79. ACM, New York (2010). https://doi.org/10.1145/1964858.1964869
15. Metaxas, P.: Using propagation of distrust to find untrustworthy web neighbor-
hoods. In: 2009 Fourth International Conference on Internet and Web Applica-
tions and Services, ICIW 2009, pp. 516–521. IEEE (2009). https://doi.org/10.1109/
ICIW.2009.83
16. Paul, C., Matthews, M.: The Russian “Firehose of Falsehood” Propaganda Model.
RAND Corporation, Santa Monica (2016)
17. Qazvinian, V., Rosengren, E., Radev, D.R., Mei, Q.: Rumor has it: identifying
misinformation in microblogs. In: Proceedings of the Conference on Empirical
Methods in Natural Language Processing, EMNLP 2011, pp. 1589–1599. Asso-
ciation for Computational Linguistics, Stroudsburg (2011). https://www.aclweb.
org/anthology/D11-1147
18. Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing.
Commun. ACM 18(11), 613–620 (1975)
19. The Alliance for Securing Democracy: Hamilton 68 (2017). https://dashboard.
securingdemocracy.org
20. Volkova, S., Shaffer, K., Jang, J.Y., Hodas, N.: Separating facts from fiction: lin-
guistic models to classify suspicious and trusted news posts on Twitter. In: Proceed-
ings of the 55th Annual Meeting of the Association for Computational Linguistics,
Short Papers, vol. 2, pp. 647–653 (2017)
21. Zubiaga, A., Liakata, M., Procter, R., Bontcheva, K., Tolmie, P.: Towards detecting
rumours in social media. In: AAAI Workshop: AI for Cities (2015)