Figure 1 - uploaded by Raad Bin Tareaf
Source publication
Spam bots have become a threat to online social networks through their malicious behavior: posting misinformation and influencing online platforms to fulfill their motives. As spam bots have become more advanced over time, creating algorithms to identify bots remains an open challenge. Learning low-dimensional embeddings for nodes in graph st...
Contexts in source publication
Context 1
... are several well-known datasets collected by different research groups specifically for bot detection on Twitter, such as the one by Lee et al. Figure 1 shows the degree distribution of the accounts in the dataset. Most accounts have a small number of followers and followings, and only a few accounts have more than 1,000 accounts in their neighborhood. ...
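The degree distribution described above can be computed directly from an edge list. The sketch below is purely illustrative: the edge list and node names are hypothetical, and degree here combines followers and followings into a single undirected count.

```python
from collections import Counter

# Hypothetical (follower, followed) pairs, as in a Twitter-style graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c"), ("e", "c")]

def degree_distribution(edges):
    """Map each degree value to the number of nodes that have that degree
    (followers + followings combined, i.e. an undirected view)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())

dist = degree_distribution(edges)
# Node "c" has degree 4; "a" and "b" have 2; "d" and "e" have 1 each.
```

Plotting `dist` on log-log axes is the usual way to produce a figure like the one referenced above.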
Context 2
... a specific extension for future work is to deploy this method in real time on Twitter's streaming API for spambot detection.

GCNN (with features 1, 2, 3, 4)   0.85   0.77   0.80
GCNN (with features 5, 6)         0.80   0.69   0.72
GCNN (All features)               0.89   0.80   0.84

Table 5: Comparison of different algorithms on the dataset ...
Similar publications
Graph neural networks (GNNs) have shown significant success in graph representation learning. However, the performance of existing GNNs degrades seriously as their layers deepen, due to the over-smoothing issue: node embeddings converge toward the same value as GNNs repeatedly aggregate the representations of the receptive field. The mai...
Representation learning models for graphs are a successful family of techniques that project nodes into feature spaces that can be exploited by other machine learning algorithms. Since many real-world networks are inherently dynamic, with interactions among nodes changing over time, these techniques can be defined both for static and for time-varyi...
Entity alignment refers to discovering two entities in different knowledge bases that represent the same thing in reality. Existing methods generally only adopt TransE or TransE-like knowledge graph representation learning models, which usually assume that there are enough training triples for each entity, and entities appearing in few triples are...
Abusive behaviors are common on online social networks. The increasing frequency of antisocial behaviors forces the hosts of online platforms to find new solutions to address this problem. Automating the moderation process has thus received a lot of interest in the past few years. Various methods have been proposed, most based on the exchanged cont...
Citations
... Alhosseini et al. [46] introduced the use of graph convolutional neural networks (GCNN) in bot identification. They noted that besides the users' features, the construction of a social network would enhance a model's ability to distinguish the bots from the genuine users. ...
... • Botometer [23] is a web-based program that leverages more than 1,000 user features. • Alhosseini et al. [46] introduced graph convolutional neural networks in bot detection. • SATAR [27] leverages the user's semantics, property, and neighborhood information. • BotRGCN [12] uses the user's description, tweets, numerical and categorical properties, and neighborhood information. ...
... We see that our model benefits from the search for the fittest architecture that we performed beforehand, as it achieves a higher accuracy, F1-score, and MCC than other state-of-the-art methods.

Model   Accuracy        F1-score        MCC
[9]     0.7456          0.7823          0.4879
[37]    0.8191          0.8546          0.6643
[20]    0.8174          0.7517          0.6710
[21]    0.7126          0.7533          0.4193
[39]    0.4801          0.6266          -0.1372
[10]    0.4793          0.1072          0.0839
[23]    0.5584          0.4892          0.1558
[46]    0.6813          0.7318          0.3543
[27]    0.8412          0.8642          0.6863
[12]    0.8462          0.8707          0.7021
[29]    0.7466          0.7630          —
ours    0.8568 ± 0.004  0.8712 ± 0.003  0.7116 ± 0.007 ...
Social media platforms, including X, Facebook, and Instagram, host millions of daily users, giving rise to bots: automated programs disseminating misinformation and ideologies with tangible real-world consequences. While bot detection on platform X has been the focus of many deep learning models with adequate results, most approaches neglect the graph structure of social media relationships and often rely on hand-engineered architectures. Our work introduces the implementation of a Neural Architecture Search (NAS) technique, namely Deep and Flexible Graph Neural Architecture Search (DFG-NAS), tailored to Relational Graph Convolutional Neural Networks (RGCNs) for the task of bot detection on platform X. Our model constructs a graph that incorporates both the user relationships and their metadata. Then, DFG-NAS is adapted to automatically search for the optimal configuration of Propagation and Transformation functions in the RGCNs. Our experiments are conducted on the TwiBot-20 dataset, constructing a graph with 229,580 nodes and 227,979 edges. We study the five architectures with the highest performance during the search and achieve an accuracy of 85.7%, surpassing state-of-the-art models. Our approach not only addresses the bot detection challenge but also advocates for the broader implementation of NAS models in neural network design automation.
... This shift toward deep learning enables the analysis of unstructured information, including the network structures connected users create. Models utilizing graph convolutional networks (GCNs) have been introduced to exploit these user relationships [31]. ...
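As a rough illustration of how a GCN layer exploits these user relationships, here is a minimal sketch of one symmetrically normalized graph-convolution layer in the style of Kipf & Welling; the toy graph, features, and weights are all invented for demonstration and are not from the cited papers.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # degrees incl. self-loop
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy graph: 3 users, a single edge between user 0 and user 1.
A = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 0.]])
H = np.eye(3)          # one-hot "features" per user
W = np.ones((3, 2))    # a fixed weight matrix, illustrative only
out = gcn_layer(A, H, W)
```

Stacking a few such layers lets each user's representation mix in information from its graph neighborhood before classification.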
Organized misinformation campaigns on Twitter continue to proliferate, even as the platform acknowledges such activities through its transparency center. These deceptive initiatives significantly impact vital societal issues, including climate change, thus spurring research aimed at pinpointing and intercepting these malicious actors. Present-day algorithms for detecting bots harness an array of data drawn from user profiles, tweets, and network configurations, delivering commendable outcomes. Yet, these strategies mainly concentrate on postincident identification of malevolent users, hinging on static training datasets that categorize individuals based on historical activities. Diverging from this approach, we advocate for a forward-thinking methodology, which utilizes user data to foresee and mitigate potential threats before their realization, thereby cultivating more secure, equitable, and unbiased online communities. To this end, our proposed technique forecasts malevolent activities by tracing the projected trajectories of user embeddings before any malevolent action materializes. For validation, we employed a dynamic directed multigraph paradigm to chronicle the evolving engagements between Twitter users. When juxtaposed against the identical dataset, our technique eclipses contemporary methodologies by an impressive 40.66% in F score (F1 score) in the anticipatory identification of harmful users. Furthermore, we undertook a model evaluation exercise to gauge the efficiency of distinct system elements.
... We find that many works collect user profile data from various online social networks for this analysis, like Twitter [212][213][214][215][216][217][218], Instagram [219][220][221], Facebook [222][223][224], YouTube [225], and Sina Weibo [226]. A different approach was used by a study [227] that collected real names from various webpages, schools, and other sources to automatically detect fake names online. ...
Fraud is a prevalent offence that extends beyond financial loss, causing psychological and physical harm to victims. The advancements in online communication technologies allowed online fraud to thrive in this vast network, with fraudsters increasingly using these channels for deception. With the progression of technologies like AI, there is a growing concern that fraud will scale up, using sophisticated methods, like deep-fakes in phishing campaigns, all generated by language generation models like ChatGPT. However, the application of AI in detecting and analyzing online fraud remains understudied. We conduct a Systematic Literature Review on AI and NLP techniques for online fraud detection. The review adhered to the PRISMA-ScR protocol, with eligibility criteria including relevance to online fraud, use of text data, and AI methodologies. We screened 2,457 academic records, 350 met our eligibility criteria, and included 223. We report the state-of-the-art NLP techniques for analysing various online fraud categories; the training data sources; the NLP algorithms and models built; and the performance metrics employed for model evaluation. We find that current research on online fraud is divided into various scam activities and identify 16 different frauds that researchers focus on. This SLR enhances the academic understanding of AI-based detection methods for online fraud and offers insights for policymakers, law enforcement, and businesses on safeguarding against such activities. We conclude that focusing on specific scams lacks generalization, as multiple models are required for different fraud types. The evolving nature of scams limits the effectiveness of models trained on outdated data. We also identify issues in data limitations, training bias reporting, and selective presentation of metrics in model performance reporting, which can lead to potential biases in model evaluation.
... This procedure baits spambots into attacking a specific system aimed at studying their behaviors and profiles [4,22]. Furthermore, some recent methods have been developed by Ali Alhosseini et al. [3] to detect traditional spambots via models based on graph convolutional neural networks. ...
Emerging technologies, particularly artificial intelligence (AI), and more specifically Large Language Models (LLMs) have provided malicious actors with powerful tools for manipulating digital discourse. LLMs have the potential to affect traditional forms of democratic engagements, such as voter choice, government surveys, or even online communication with regulators; since bots are capable of producing large quantities of credible text. To investigate the human perception of LLM-generated content, we recruited over 1,000 participants who then tried to differentiate bot from human posts in social media discussion threads. We found that humans perform poorly at identifying the true nature of user posts on social media. We also found patterns in how humans identify LLM-generated text content in social media discourse. Finally, we observed the Uncanny Valley effect in text dialogue in both user perception and identification. This indicates that despite humans being poor at the identification process, they can still sense discomfort when reading LLM-generated content.
... These methods leverage user relationships to improve accuracy and robustness and demonstrate the efficacy of capturing and utilizing the structural information. They construct various types of graphs, including homogeneous graphs [17,18], heterogeneous graphs [19,20], and multirelational graphs [21,22] based on user relationships, and employ GNNs to obtain user representations for effective bot detection. ...
... The MIU phenomenon in the user representation feature space reveals a notable overlap between features of personalized genuine accounts and bot accounts, whereas the majority of human accounts are readily distinguishable. Therefore, we propose HR-MRG, which expands on [17,20,21]. Since multiple types of user relationships have different impacts on social bot detection, we build the representation models R_b and R_r based on multi-relational graphs and realize the classifiers F_b and F_r with fully connected layers. Specifically, given the dataset D = {V, X, A}, we first generate the adjacency matrix A_r for each relation r from the global adjacency matrix A, where r ∈ {1, 2, ..., R} indexes the interaction types between users. ...
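The step of deriving a per-relation adjacency matrix A_r from a global adjacency matrix A can be sketched as below. The encoding of A (each entry holds a relation id in 1..R, with 0 meaning no edge) is an assumption made for illustration, not necessarily the paper's actual representation.

```python
import numpy as np

def per_relation_adjacency(A, num_relations):
    """Split a relation-labeled adjacency matrix into one binary matrix per relation."""
    return [(A == r).astype(float) for r in range(1, num_relations + 1)]

# Hypothetical 3-user graph: relation 1 = "follows", relation 2 = "mentions".
A = np.array([[0, 1, 2],
              [1, 0, 0],
              [2, 0, 0]])
A_1, A_2 = per_relation_adjacency(A, 2)
```

Each A_r can then feed a relation-specific aggregation step, so that, e.g., "follows" and "mentions" edges contribute separately to a user's representation.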
... In detail, we first select subsets of unlabeled samples for LP_c and LP_f and obtain feature representations using the frozen representation models R_b and R_r to construct adjacency matrices (lines 12-14). Subsequently, we perform coarse and fine label propagation separately and aggregate all pseudo labels (lines 15-17). Finally, we fine-tune the models for a few epochs on the expanded dataset, consisting of labeled and unlabeled samples (lines 18-24). ...
Social bot detection is crucial for ensuring the active participation of digital twins and edge intelligence in future social media platforms. Nevertheless, the performance of existing detection methods is impeded by the limited availability of labeled accounts. Despite the notable progress made in some fields by deep semi-supervised learning with label propagation, which utilizes unlabeled data to enhance method performance, its effectiveness is significantly hindered in social bot detection due to the misdistribution of individuation users (MIU). To address these challenges, we propose a novel deep semi-supervised bot detection method, which adopts a coarse-to-fine label propagation (LP-CF) with the hybridized representation models over multi-relational graphs (HR-MRG) to enhance the accuracy of label propagation, thereby improving the effectiveness of unlabeled data in supporting the detection task. Specifically, considering the potential confusion among accounts in the MIU phenomenon, we utilize HR-MRG to obtain high-quality user representations. Subsequently, we introduce a sample selection strategy to partition unlabeled samples into two subsets and apply LP-CF to generate pseudo labels for each subset. Finally, the predicted pseudo labels of unlabeled samples, combined with labeled samples, are used to fine-tune the detection models. Comprehensive experiments on two widely used real datasets demonstrate that our method outperforms other semi-supervised approaches and achieves comparable performance to the fully supervised social bot detection method.
... Researchers tackled this by leveraging the graphical structure of the Twittersphere, which is composed of social relationships among Twitter users. They used Graph Neural Networks (GNNs) like Graph Convolutional Networks (GCNs) [11], Relational Graph Convolutional Networks (RGCNs) [12], and Relational Graph Transformers (RGTs) [13] for graph node classification to detect bots. Graph-based methods outperform text-based methods in detection performance and exhibit better generalization capabilities [14]. ...
Detecting Twitter Bots is crucial for maintaining the integrity of online discourse, safeguarding democratic processes, and preventing the spread of malicious propaganda. However, advanced Twitter Bots today often employ sophisticated feature manipulation and account farming techniques to blend seamlessly with genuine user interactions, posing significant challenges to existing detection models. In response to these challenges, this paper proposes a novel Twitter Bot Detection framework called BotSAI. This framework enhances the consistency of multimodal user features, accurately characterizing various modalities to distinguish between real users and bots. Specifically, the architecture integrates information from users, textual content, and heterogeneous network topologies, leveraging customized encoders to obtain comprehensive user feature representations. The heterogeneous network encoder efficiently aggregates information from neighboring nodes through oversampling techniques and local relationship transformers. Subsequently, a multi-channel representation mechanism maps user representations into invariant and specific subspaces, enhancing the feature vectors. Finally, a self-attention mechanism is introduced to integrate and refine the enhanced user representations, enabling efficient information interaction. Extensive experiments demonstrate that BotSAI outperforms existing state-of-the-art methods on two major Twitter Bot Detection benchmarks, exhibiting superior performance. Additionally, systematic experiments reveal the impact of different social relationships on detection accuracy, providing novel insights for the identification of social bots.
... To detect software bots, structural graphs are analyzed with the following methods: centrality measures [8], node representation learning [3], and Graph Neural Networks (GNNs) [7]. Combining different graph- and text-analysis methods [32], as well as creating improved GNN architectures for analyzing heterogeneous networks [21], holds significant promise for detecting software bots. ...
The goal of this work is a detailed study of the effectiveness of using Large Language Models (LLMs) to detect software bots in social networks. The work focuses on analyzing the effectiveness of different detection methods and on determining the potential of LLMs as a means of increasing the accuracy and efficiency of the bot identification process. The study covers three main approaches to detecting software bots: metadata analysis, text analysis, and graph analysis. Both traditional machine learning methods and the latest LLMs used for analyzing large volumes of social network data are examined. The core methodology is a comparative analysis, which includes the use of extended datasets such as TwiBot-20 and TwiBot-22 to evaluate the performance of each method using metrics such as accuracy and F1 score, providing an objective view of the effectiveness of the different bot detection approaches. The scientific novelty of this work lies in using LLMs to analyze diverse kinds of social network data for detecting software bots. The authors consider the integration of LLMs into traditional detection methods, which allows detection processes to adapt to the complex behavior of software bots while ensuring high accuracy and efficiency. Conclusions: LLMs demonstrate high effectiveness in detecting software bots but are computationally demanding. It is therefore relevant to apply hybrid approaches that combine LLMs with traditional methods. Such hybridization would reduce resource usage and provide a more robust and adaptable bot detection system. This approach can improve the overall performance of bot detection systems, reduce computational costs, and ensure more accurate and effective detection of malicious programs in social networks.
Further research is recommended to refine the integration of LLMs into bot detection systems, especially in the context of the dynamic behavior of social networks and the evolution of software bots.
... This framework treats network elements as multi-attribute graphs and uses them for semi-supervised learning to classify nodes. Ali Alhosseini and his team [19] create a model using Graph Convolutional Neural Networks that can effectively spot social bots by looking at the characteristics of a node and those around it. Additionally, Thomas Kipf and his colleagues [20] suggest a scalable approach for learning with limited supervision on graphs. ...
... Previous research has found that there is no significant difference in the feature of username length between social bot users and human users [19]. Therefore, evaluating and improving robustness based on the username length attribute may not be significant. ...
Online social networks are easily exploited by social bots. Although current models for detecting social bots show promising results, they mainly rely on Graph Neural Networks (GNNs), which have been shown to be vulnerable to adversarial perturbations, so these detection models likely share similar robustness vulnerabilities. Therefore, it is crucial to evaluate and improve their robustness. This paper proposes a robustness evaluation method, Attribute Random Iteration-Fast Gradient Sign Method (ARI-FGSM), and uses simplified adversarial training to improve the robustness of social bot detection. Specifically, this study performs robustness evaluations of five bot detection models on two datasets under both black-box and white-box scenarios. The white-box experiments achieve a minimum attack success rate of 86.23%, while the black-box experiments achieve a minimum attack success rate of 45.86%. This shows that the social bot detection model is vulnerable to adversarial attacks. Moreover, after executing our robustness improvement method, the robustness of the detection model increased by up to 86.98%.
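For context, plain FGSM, the basis of the ARI-FGSM method described above, perturbs an input in the sign direction of the loss gradient. The sketch below applies it to a hypothetical logistic-regression bot detector; all weights and features are illustrative, and this is standard FGSM, not the paper's ARI-FGSM.

```python
import numpy as np

def fgsm(x, y, w, b, eps):
    """One FGSM step on a logistic-regression detector p = sigmoid(w.x + b)."""
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))
    grad_x = (p - y) * w          # d(cross-entropy)/dx for logistic regression
    return x + eps * np.sign(grad_x)

# Hypothetical bot account features (true label y = 1) and detector weights.
w = np.array([0.5, -1.0])
x = np.array([1.0, 2.0])
x_adv = fgsm(x, 1.0, w, 0.0, eps=0.1)   # features nudged to raise the loss
```

Each feature moves by exactly eps, which models an attacker who makes small, bounded edits to account attributes to evade the detector.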
... With the application of graph neural networks to node representation and node classification tasks, many works have emerged that utilize graph models for social bot detection. Alhosseini et al. [22] first implemented social bot detection based on graph models by applying the GCN method to obtain user representations and classify social accounts. BotRGCN [23] applied the relational graph convolutional network model to social bot classification, obtaining better detection results. ...
Malicious social bots pose a serious threat to social network security by spreading false information and guiding bad opinions in social networks. The singularity and scarcity of single organization data and the high cost of labeling social bots have given rise to the construction of federated models that combine federated learning with social bot detection. In this paper, we first combine the federated learning framework with the Relational Graph Convolutional Neural Network (RGCN) model to achieve federated social bot detection. A class-level cross entropy loss function is applied in the local model training to mitigate the effects of the class imbalance problem in local data. To address the data heterogeneity issue from multiple participants, we optimize the classical federated learning algorithm by applying knowledge distillation methods. Specifically, we adjust the client-side and server-side models separately: training a global generator to generate pseudo-samples based on the local data distribution knowledge to correct the optimization direction of client-side classification models, and integrating client-side classification models’ knowledge on the server side to guide the training of the global classification model. We conduct extensive experiments on widely used datasets, and the results demonstrate the effectiveness of our approach in social bot detection in heterogeneous data scenarios. Compared to baseline methods, our approach achieves a nearly 3–10% improvement in detection accuracy when the data heterogeneity is larger. Additionally, our method achieves the specified accuracy with minimal communication rounds.
... Due to the continuous evolution of social bots [9], including their ability to steal information from legitimate accounts and mimic normal account behaviors [26], these traditional methods turn out to be ineffective in identifying the latest generation bots. A recent advancement in social bot detection is introducing graph neural networks [1] that treat accounts and the interactions in-between as nodes and edges, respectively. Multi-relational heterogeneous graphs can be established [13] and a Relation Graph Transformer (RGT) is responsible for aggregating information from neighbors. ...
... Graph-based social bot detection has been of central importance in modeling the various interactions intrinsic to social networks. Previous methods [1,13,15,46] have focused primarily on designing information aggregation strategies for better detection performance. [1] made the first attempt to use graph convolutional neural networks (GCNs) [25] for detecting social bots. ...
... Typically, BotRGCN [15] utilizes relational graph convolutional networks (RGCNs) [35] to aggregate neighbor information from edges of different relations. ...
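A minimal sketch of the RGCN-style aggregation idea, with one weight matrix per relation and mean aggregation over each relation's neighbors, might look like this; the normalization choice, shapes, and toy graph are assumptions for illustration, not the exact formulation in the cited papers.

```python
import numpy as np

def rgcn_layer(A_list, H, W_list, W_self):
    """H' = ReLU(H @ W_self + sum_r mean-normalized A_r @ H @ W_r)."""
    out = H @ W_self                         # self-loop transformation
    for A_r, W_r in zip(A_list, W_list):
        deg = np.maximum(A_r.sum(axis=1, keepdims=True), 1.0)  # avoid /0
        out = out + (A_r / deg) @ H @ W_r    # mean-aggregate per relation
    return np.maximum(out, 0.0)              # ReLU

# Toy graph: 2 users, one "follows" relation where user 0 follows user 1.
A_follow = np.array([[0., 1.],
                     [0., 0.]])
H = np.eye(2)                                # one-hot features
out = rgcn_layer([A_follow], H, [np.eye(2)], np.eye(2))
```

With several relation types (follows, mentions, retweets), each gets its own A_r and W_r, which is what lets RGCN-based detectors weight different interaction types differently.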
Recent advancements in social bot detection have been driven by the adoption of Graph Neural Networks. The social graph, constructed from social network interactions, contains benign and bot accounts that influence each other. However, previous graph-based detection methods that follow the transductive message-passing paradigm may not fully utilize hidden graph information and are vulnerable to adversarial bot behavior. The indiscriminate message passing between nodes from different categories and communities results in excessively homogeneous node representations, ultimately reducing the effectiveness of social bot detectors. In this paper, we propose SEBot, a novel multi-view graph-based contrastive learning-enabled social bot detector. In particular, we use structural entropy as an uncertainty metric to optimize the entire graph's structure and subgraph-level granularity, revealing the implicitly existing hierarchical community structure. And we design an encoder to enable message passing beyond the homophily assumption, enhancing robustness to adversarial behaviors of social bots. Finally, we employ multi-view contrastive learning to maximize mutual information between different views and enhance the detection performance through multi-task learning. Experimental results demonstrate that our approach significantly improves the performance of social bot detection compared with SOTA methods.