Table 3 - uploaded by Srishti Gupta
Comparison of our feedback-based learning approach with the standard oversampling approach (SMOTE). 'Ratio' denotes the number of synthetic samples generated by the oversampling technique, expressed as a fraction of the training-set size.

Source publication
Conference Paper
Cybercriminals have leveraged the popularity of the large user base of Online Social Networks (OSNs) to spread spam campaigns by propagating phishing URLs, attaching malicious content, etc. However, another kind of spam attack using phone numbers has recently become prevalent on OSNs, where spammers advertise phone numbers to attract user...

Contexts in source publication

Context 1
... on the amount of oversampling required, neighbors are randomly chosen from the k nearest neighbors. Table 3 shows the results for different values of the oversampling ratio, i.e., the number of synthetic samples expressed as a fraction of the training-set size. In addition, we perform the oversampling technique before dividing the data into training and validation sets to ensure that the information from the training set is used in building the classifier. ...
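The SMOTE step described in this context can be illustrated with a minimal pure-Python sketch: a ratio like the one in Table 3 is converted into a number of synthetic samples, and each sample is interpolated between a minority example and a randomly chosen one of its k nearest minority neighbors. The function names, the round-robin choice of base samples, and the toy data are assumptions for illustration; this is not the authors' code.

```python
import random
from typing import List, Sequence

Point = List[float]


def n_synthetic_for_ratio(ratio: float, train_size: int) -> int:
    """Number of synthetic samples = ratio x training-set size (hypothetical helper)."""
    return round(ratio * train_size)


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance; enough for ranking neighbors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Point], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Point]:
    """SMOTE-style oversampling by interpolation between minority points.

    Assumes at least two minority samples. Base samples are visited
    round-robin (an assumption) so the oversampling is spread evenly.
    """
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for i in range(n_synthetic):
        idx = i % len(minority)
        base = minority[idx]
        # k nearest minority-class neighbors of the base sample (itself excluded)
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbors = sorted(others, key=lambda p: _sq_dist(base, p))[:k]
        # Randomly pick one neighbor and interpolate at a random gap in [0, 1)
        nb = rng.choice(neighbors)
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Hypothetical toy example: ratio 0.5 of a 20-sample training set.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
n = n_synthetic_for_ratio(0.5, train_size=20)  # 10 synthetic samples
samples = smote(minority, n, k=2)
print(len(samples))  # 10
```

Because every synthetic point lies on the segment between two real minority points, all generated samples stay inside the region spanned by the minority class.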
Context 2
... Table 3 shows that, even when the oversampling ratio is varied, none of the settings achieves the accuracy obtained by our feedback-based learning approach. This indicates that our feedback-based learning strategy outperforms the SMOTE oversampling strategy. ...

Similar publications

Preprint
In the context of a project to build a national injury surveillance system based on emergency room (ER) visit reports, it was necessary to develop a coding system capable of classifying the causes of these visits based on the automatic reading of clinical notes written by clinicians. Supervised learning techniques have shown good results but requir...

Citations

... Various studies have been proposed in this direction. For example, Gupta et al. [65] developed a model to detect Twitter spammers who spread social spam by accessing the phone numbers of Twitter users to deliver unwanted advertisements for products and services. The authors proposed a Hierarchical Meta-Path Score (HMPS) metric to measure the similarity between two nodes in the network. ...
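The HMPS formula is not given in this excerpt. To show what a meta-path-based similarity between two nodes looks like, the sketch below implements PathSim, a different and widely used meta-path similarity measure, over an assumed User-Phone-User meta-path in a toy user-to-phone-number graph. It is a stand-in for intuition only and is not HMPS; the data and function names are hypothetical.

```python
from typing import List


def commuting_matrix(adj: List[List[int]]) -> List[List[int]]:
    """M = A A^T for a User-Phone adjacency matrix A.

    M[x][y] counts User-Phone-User meta-path instances between users x and y.
    """
    n = len(adj)
    return [[sum(a * b for a, b in zip(adj[x], adj[y])) for y in range(n)] for x in range(n)]


def pathsim(adj: List[List[int]], x: int, y: int) -> float:
    """PathSim(x, y) = 2 M[x][y] / (M[x][x] + M[y][y]); 0.0 if both users have no paths."""
    m = commuting_matrix(adj)
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


# Hypothetical toy data: rows are users u0..u2, columns are phone numbers p0, p1.
A = [
    [1, 1],  # u0 advertises p0 and p1
    [1, 0],  # u1 advertises p0
    [0, 1],  # u2 advertises p1
]
print(pathsim(A, 0, 1))  # 2*1 / (2+1), u0 and u1 share p0
print(pathsim(A, 1, 2))  # 0.0, no shared phone number
```

Normalizing by each node's self-path count keeps a prolific account (u0 here) from appearing similar to everyone simply because it has many links.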
Article
The communication revolution has perpetually reshaped the means through which people send and receive information. Social media is an important pillar of this revolution and has brought profound changes to various aspects of our lives. However, the open environment and popularity of these platforms open windows of opportunity for various cyber threats; thus, social networks have become a fertile venue for spammers and other illegitimate users to execute their malicious activities. These activities include phishing via hot and trending topics and posting a wide range of content on many topics. Hence, it is crucial to continuously introduce new techniques and approaches to detect and stop this category of users. This article proposes a novel and effective approach to detect social spammers. An investigation into several attributes that measure topic-dependent and topic-independent user behaviours on Twitter is carried out. The experiments of this study are undertaken on various machine learning classifiers. The performance of these classifiers is compared, and their effectiveness is measured via a number of robust evaluation measures. Furthermore, the proposed approach is benchmarked against state-of-the-art social spam and anomaly detection techniques. These experiments show the effectiveness and utility of the proposed approach and embedded modules.
... Here, we calculated the degree of similarity between the input UGC and the extracted UGC using the Jaccard coefficient; if the calculated similarity was greater than or equal to a predetermined threshold, the two were considered similar. In this study, the threshold for the Jaccard coefficient was set to 0.7, similar to that used in a previous study [13]. The selected UGC is content posted at a similar time whose text is similar to that of the input UGC. ...
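The thresholded Jaccard test in this context can be sketched in a few lines. This minimal version assumes lower-cased whitespace tokenization (the excerpt does not say how UGC text is tokenized) and omits the posting-time filter; only the 0.7 threshold comes from the text.

```python
def jaccard(text_a: str, text_b: str) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over lower-cased whitespace tokens."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def is_similar(text_a: str, text_b: str, threshold: float = 0.7) -> bool:
    """Similar if the coefficient meets or exceeds the threshold (0.7 in the excerpt)."""
    return jaccard(text_a, text_b) >= threshold


score = jaccard("free gift card click here now", "free gift card click here today")
print(round(score, 3))  # 0.714 (5 shared tokens out of 7 distinct)
print(is_similar("free gift card click here now", "free gift card click here today"))  # True
```

Because the coefficient ignores word order and repetition, a spam template with one swapped word still clears a 0.7 threshold, which is what makes it suitable for grouping near-duplicate posts.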
Article
The growth of user-generated content service platforms has led people to rely on user-generated content (UGC) rather than search engines when searching for and accessing information on the web. Attackers can also use UGC on a UGC service platform to disseminate web-based social engineering (SE) attacks to a large number of people. In this paper, we focus on the event-synced navigation attack, a type of web-based SE attack that generates UGC with links to malicious websites and distributes it in sync with a real-life event at a specific time. To understand these attacks in the wild, we propose a three-step system that detects event-synced navigation attacks in real time by capturing the inevitable footprints left by attackers. We evaluate each step of the proposed system and find that it can classify malicious and non-malicious UGC with 97% accuracy. In addition, we performed a comprehensive measurement study on event-synced navigation attacks spread from popular UGC platforms. We found that 34.1% of the fully qualified domain names (FQDNs) of malicious websites associated with the event-synced navigation attack were spread from two or more UGC platforms. Finally, we also found that 87.8% of FQDNs associated with well-known types of malicious websites (e.g., information theft, survey scams, suspicious browser plugin installations) survive for more than 100 days, and that countermeasures taken by the UGC platforms covered only 31.0% of the malicious UGC we detected in this study, even though the malicious websites were accessed frequently.
... Social media websites such as YouTube are a convenient way for spammers to spread malicious videos associated with pornographic and dating websites. User comments on those videos spread over wired and wireless networks and attract many users to visit these sites [4][5][6]. Sometimes these comments are auto-generated by a bot and invite people to visit the sites. Even prominent channels of communication among users, such as cloud-based social networks, are affected by spammers seeking to obtain users' credentials [7][8][9][10][11][12][13][14][15][16][17][18][19]. ...
Chapter
In recent years, spammers have pervaded every social network platform. The growth in users on social platforms such as Instagram, Facebook, Sina Weibo, Twitter, and YouTube has opened new avenues for spammers. For financial gain, spammers exploit social network platforms through the flood of content shared by users. To protect user information and accounts from spammers, many researchers have proposed defensive mechanisms and detection methods. Over the last few years, researchers have developed many spam detection methods that improve the protection of users' accounts. This motivates us to survey the various methods by which spammers spread spam and the detection mechanisms that protect users and their account information. Solutions for spammer detection differ depending on the collected dataset and the predictive user analysis. This survey covers (1) various detection mechanisms, including the ways spam is spread; (2) analysis of user content based on account information and posts, including a comparative analysis of various methodologies for spammer detection; and (3) open issues and challenges in social network spam detection techniques.
Preprint
The communication revolution has perpetually reshaped the means through which people send and receive information. Social media is an important pillar of this revolution and has brought profound changes to various aspects of our lives. However, the open environment and popularity of these platforms open windows of opportunity for various cyber threats; thus, social networks have become a fertile venue for spammers and other illegitimate users to execute their malicious activities. These activities include phishing via hot and trending topics and posting a wide range of content on many topics. Hence, it is crucial to continuously introduce new techniques and approaches to detect and stop this category of users. This paper proposes a novel and effective approach to detect social spammers. An investigation into several attributes that measure topic-dependent and topic-independent user behaviours on Twitter is carried out. The experiments of this study are undertaken on various machine learning classifiers. The performance of these classifiers is compared, and their effectiveness is measured via a number of robust evaluation measures. Furthermore, the proposed approach is benchmarked against state-of-the-art social spam and anomaly detection techniques. These experiments show the effectiveness and utility of the proposed approach and embedded modules.
... Inclusion criteria | Exclusion criteria
Journal, conference and peer-reviewed papers | Editorial papers, white papers, non-English papers and papers shorter than six pages
Studies that focus on spam detection on Twitter | Books, book chapters, theses and review papers
Research papers that present techniques or innovative solutions to enhance spam detection on the Twitter dataset | Research papers that do not explicitly mention solutions or methods to improve the spam detection rate on the Twitter dataset
... [48], [49], [50], tweet analysis approaches [51], [52], [53], [54], [55], network analysis approaches [56], [57], [58], [59], [60], [61], and hybrid analysis approaches. Hybrid analysis approaches combine various features to accomplish spam detection, such as user and tweet analysis approaches [11], [62], [63], [64], [65], [66], [67], [68], [69], [70], [71], [72], [73], [74], [75]; content, user, and tweet analysis approaches [76], [7], [77], [78]; content and network analysis approaches [79], [80], [81], [82], [83]; user, tweet, and network analysis approaches [84], [85], [86], [87]; network, tweet, and content analysis approaches [88], [89]; or network, content, and user analysis approaches [90]. ...
... Gupta et al. [58] developed a collective classification and feedback-based approach, together with Hierarchical Meta-Path Scores (HMPS) to measure the similarity between nodes in a heterogeneous network, to identify suspended accounts as spammers per campaign; this approach outperformed previous spam detection and oversampling methods. Similarly, [59] falls into this category. ...
Article
Nowadays, with the rise of Internet access and mobile devices around the globe, more people are using social networks for collaboration and for receiving real-time information. Twitter, a microblogging platform that is becoming a critical source of communication and news propagation, has attracted the attention of spammers seeking to distract users. So far, researchers have introduced various defense techniques to detect spam and combat spammer activities on Twitter. In recent years, researchers have offered many novel techniques that have greatly enhanced spam detection performance. This motivates a systematic review of the different approaches to spam detection on Twitter. This review systematically compares existing research techniques for Twitter spam detection. The literature review reveals that most existing methods rely on machine learning algorithms. Among these machine learning algorithms, the major differences relate to the feature selection methods used. Hence, we propose a taxonomy based on different feature selection methods and analyses, namely content analysis, user analysis, tweet analysis, network analysis, and hybrid analysis. Then, we present numerical analyses and comparative studies of current approaches and identify open challenges that can help researchers develop solutions in this area.
... In recent years, there has been growing interest among researchers in analyzing social media. A great deal of work has been done on different aspects of the Twitter network (Chatzakou et al., 2017; Keib et al., 2018; Gupta et al., 2018; Kim & Shim, 2014; Anagnostopoulos et al., 2018). For instance, Cha et al. ...
Article
Telegram is a new instant messaging application that provides key features for both public and private messaging. On the one hand, Telegram is similar to group broadcast or micro-blogging platforms; on the other hand, it has the features of ordinary instant messaging applications such as WhatsApp. In this paper, by investigating a real dataset crawled from Telegram, we provide several observations that explain the information flow, the business model of content providers, and the social sensing aspects of Telegram. The crawled dataset, manually labeled by six people, contains two months of public messages from selected Telegram channels. Moreover, we introduce the notion of viral messages in instant messaging services, propose a formal definition of these messages, and analyze their characteristics and features in depth. Detecting the virality characteristics of messages in Telegram can benefit both end users and digital marketers. Consequently, we propose statistical and word embedding approaches to detect viral messages along with their sentiment and message category. Our experiments indicate that the word embedding approach can significantly outperform other baseline models.
... [31, 32] Collective classification achieves higher classification accuracy than the individual classification methods used in previous techniques [33, 34]. A collective classification method has been proposed to reduce learning and inference changes across domains in which the same set of nodes is connected by multiple networks [35]. Transfer learning has been applied effectively in many areas of machine learning, such as image classification [36], text classification [32], and human activity classification. ...
... [35] Transfer learning has been applied effectively in many areas of machine learning, such as image classification [36], text classification [32], and human activity classification [33][34][35][37][38][39]. ...
Article
The vast amount of data is a key challenge in mining new scholars who are likely to become stars in the upcoming period. The enormous amount of unstructured data generated every year is infeasible for traditional learning to handle; consequently, high-quality preprocessing techniques are needed to improve the performance of traditional learning. We propose a novel approach, the Authors Classification algorithm using Transfer Learning (ACTL), which learns a new task in the target domain by mining external knowledge from the source domain. Comprehensive experimental results on real-world networks showed that ACTL, Node-based Influence Predicting Stars, Corresponding Authors Mutual Influence based on Predicting Stars, and Specific Topic Domain-based Predicting Stars improved node classification accuracy and rising-star prediction compared with contemporary baseline methods.
... Collective classification can also be used in active learning processes, such as the ALFNET (active learning on network data) algorithm [32], which uses local and collective aspects of any classifier to select more useful examples. ALFNET was shown to work as efficiently as ICA in the collective classification task [33]. ...
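ICA (the iterative classification algorithm) repeatedly relabels each unlabeled node using its own features together with the current labels of its neighbors. The sketch below is a minimal version under strong simplifying assumptions: the per-node "classifier" is a fixed weighted average of a local spam score and the fraction of neighbors currently labeled spam, rather than a trained model, and the graph, scores, and parameter names (`alpha`, `max_iters`) are hypothetical.

```python
from typing import Dict, List


def iterative_classification(
    neighbors: Dict[str, List[str]],
    local_score: Dict[str, float],
    known: Dict[str, int],
    alpha: float = 0.5,
    max_iters: int = 10,
) -> Dict[str, int]:
    """Simplified ICA; labels are 1 (spam) or 0 (benign).

    The per-node 'classifier' is a fixed weighted average of the local
    score and the fraction of spam-labeled neighbors (an assumption,
    standing in for a trained model).
    """
    # Bootstrap: known labels stay fixed; the rest come from local scores alone.
    labels = {n: known.get(n, int(local_score[n] >= 0.5)) for n in neighbors}
    for _ in range(max_iters):
        changed = False
        for node in sorted(neighbors):  # fixed visiting order for reproducibility
            if node in known:
                continue
            nbrs = neighbors[node]
            # Relational feature: fraction of neighbors currently labeled spam.
            rel = sum(labels[m] for m in nbrs) / len(nbrs) if nbrs else local_score[node]
            new_label = int(alpha * local_score[node] + (1 - alpha) * rel >= 0.5)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:  # converged: no label changed in a full pass
            break
    return labels


# Hypothetical toy graph: a and b are known spammers, e is known benign.
graph = {"a": ["c"], "b": ["c"], "c": ["a", "b", "d"], "d": ["c", "e"], "e": ["d"]}
scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.2, "e": 0.1}
result = iterative_classification(graph, scores, known={"a": 1, "b": 1, "e": 0})
print(result)  # {'a': 1, 'b': 1, 'c': 1, 'd': 0, 'e': 0}
```

Node c is labeled benign by its local score alone (0.4), but flips to spam once the relational feature is included, because two of its three neighbors are known spammers; node d stays benign because its neighborhood is split.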
Article
The increasing use of social networking platforms has raised the need for automated multi-class classification on document networks. In this paper, we propose a supervised relation topic model that leverages the links between documents to learn their latent content and improve prediction performance. Our model uses a Bayesian generative model to exploit the relation between word features and link features in a document network. We evaluate our model on large-scale data collections, including a scientific citation community and a medical article network. We demonstrate its effectiveness and efficiency on document classification compared with the SLDA model and collective classification approaches.
... Only recently has attention shifted to studying the extent to which these attacks have entered and adapted to social platforms. Existing work has studied different aspects, such as the pervasiveness of spam campaigns in social networks (e.g., [32], [15]), the infrastructure used by attackers to distribute malicious pages [29], and the accounts spreading malicious content (e.g., [35], [9]). Other lines of work have looked at the demographics of victims (e.g., [27]), showing that individual and community behavior influences the likelihood of clicking. ...