Conference Paper

Identifying Suspected Cybermob on Tieba

Abstract

This paper describes an approach to identifying suspected cybermobs on social media. Much existing research predicts group emotion on the Internet (for example, by quantifying sentiment polarity), but this paper instead focuses on the origin of information diffusion, namely its makers and contributors. Our previous findings show that, at the level of Tieba’s content, negative information and emotions spread faster than positive ones. We therefore centre on the makers of negative messages, the so-called cybermobs who post aggressive, provocative or insulting remarks on social websites. We explore how the characteristics of suspected cybermobs differ from those of general netizens and then extract features that are relatively unique to suspected cybermobs. We build a working system that identifies suspected cybermobs automatically, using a machine learning method with these features together with other common features such as user-based and content-based ones. Empirical results, from an evaluation against benchmark models and an application to actual cases, show that our approach detects suspected cybermobs correctly and efficiently.

Article
With the rise and spread of the Web 2.0 culture the nature of “old”/“traditional” social interaction, including shame and shaming, is changing as more and more attention is given to online vs. offline social interactions. Amongst those on-going changes lies the construction of Shaming 2.0, i.e., a public attempt to impose shame on “the Other” by using Web 2.0 technological capabilities. Thus, Shaming 2.0 can be defined as a pragmatic social negotiation regarding the boundaries of what is allowed and forbidden, what is acceptable and unacceptable while performing on-line and off-line social interactions. The illustration of Shaming 2.0 was conducted by utilizing Israeli rabbinical court decisions in the era of Web 2.0 cultural features. Via the implementation of critical discourse analysis, the rise of the ‘Virtual Mirror’ is portrayed side by side with “new” social interactions behind the scenes of Shame 2.0.
Article
Social networks have become a very popular way for internet users to communicate and interact online. Users spend plenty of time on well-known social networks (e.g., Facebook, Twitter, Sina Weibo), reading news, discussing events and posting messages. Unfortunately, this popularity also attracts a significant number of spammers who continuously exhibit malicious behavior (e.g., posting messages containing commercial URLs, following a large number of users), causing great misunderstanding and inconvenience in users’ social activities. In this paper, a supervised machine learning based solution is proposed for effective spammer detection. The main procedure is as follows: first, collect a dataset from Sina Weibo including 30,116 users and more than 16 million messages. Then, construct a labeled dataset of users and manually classify users into spammers and non-spammers. Afterwards, extract a set of features from message content and users’ social behavior, and feed them into an SVM (Support Vector Machine) based spammer detection algorithm. The experiment shows that the proposed solution provides excellent performance, with the true positive rates of spammers and non-spammers reaching 99.1% and 99.9% respectively.
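The per-class true positive rates quoted above can be read as, for each class, the fraction of users with that true label that the classifier labels correctly. A minimal sketch of that computation follows; the function name `per_class_recall` and the toy labels are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Map each true label to the fraction of its items predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy labels, invented for illustration only (not data from the paper).
y_true = ["spammer"] * 4 + ["normal"] * 4
y_pred = ["spammer", "spammer", "spammer", "normal"] + ["normal"] * 4

rates = per_class_recall(y_true, y_pred)
print(rates)
```

Reporting the rate separately for each class matters when classes are imbalanced, since a single overall accuracy figure can hide poor performance on the smaller class.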
Conference Paper
In recent years, opinion mining has attracted a great deal of research attention. However, limited work has been done on detecting opinion spam (or fake reviews). The problem is analogous to spam in Web search [1, 9 11]. However, review spam is harder to detect because it is very hard, if not impossible, to recognize fake reviews by manually reading them [2]. This paper deals with a restricted problem, i.e., identifying unusual review patterns which can represent suspicious behaviors of reviewers. We formulate the problem as finding unexpected rules. The technique is domain independent. Using the technique, we analyzed an Amazon.com review dataset and found many unexpected rules and rule groups which indicate spam activities.
Conference Paper
Users and network administrators need ways to filter email messages based primarily on the reputation of the sender. Unfortunately, conventional mechanisms for sender reputation, notably IP blacklists, are cumbersome to maintain and evadable. This paper investigates ways to infer the reputation of an email sender based solely on network-level features, without looking at the contents of a message. First, we study first-order properties of network-level features that may help distinguish spammers from legitimate senders. We examine features that can be ascertained without ever looking at a packet's contents, such as the distance in IP space to other email senders or the geographic distance between sender and receiver. We derive features that are lightweight, since they do not require seeing a large amount of email from a single IP address and can be gleaned without looking at an email's contents; many such features are apparent from even a single packet. Second, we incorporate these features into a classification algorithm and evaluate the classifier's ability to automatically classify email senders as spammers or legitimate senders. We build an automated reputation engine, SNARE, based on these features using labeled data from a deployed commercial spam-filtering system. We demonstrate that SNARE can achieve comparable accuracy to existing static IP blacklists: about a 70% detection rate for less than a 0.3% false positive rate. Third, we show how SNARE can be integrated into existing blacklists, essentially as a first-pass filter.
Article
With its rising popularity, as evidenced in social networks, online shopping platforms and email systems, Web spammer detection has become one of the hottest topics in the data mining field. The main challenge of Web spammer detection is how to recognize spammer behavior patterns by examining spammer features and attributes in big datasets, in order to limit the proliferation of Internet spam and ensure the quality of Internet service. This paper presents an overview of Web spammer detection, along with a comparison of traditional and burgeoning spammer detection approaches. The key techniques and evaluation methods are classified and discussed from several aspects. Finally, the prospects for future development and suggestions for possible extensions are emphasized.
Article
As the popularity of the social media increases, as evidenced in Twitter, Facebook and China's Renren, spamming activities also picked up in numbers and variety. On social network sites, spammers often disguise themselves by creating fake accounts and hijacking normal users' accounts for personal gains. Different from the spammers in traditional systems such as SMS and email, spammers in social media behave like normal users and they continue to change their spamming strategies to fool anti-spamming systems. However, due to the privacy and resource concerns, many social media websites cannot fully monitor all the contents of users, making many of the previous approaches, such as topology-based and content-classification-based methods, infeasible to use. In this paper, we propose a Supervised Matrix Factorization method with Social Regularization (SMFSR) for spammer detection in social networks that exploits both social activities as well as users' social relations in an innovative and highly scalable manner. The proposed method detects spammers collectively based on users' social actions and social relations. We have empirically tested our method on data from Renren.com, which is one of the largest social networks in China, and demonstrated that our new method can improve the detection performance significantly.
Conference Paper
Microblogs have become an increasingly important source of information, both in the U.S. (Twitter) and in China (Weibo). However, the brevity of microblog updates, combined with increasing access of microblog content through search rather than through direct network connections, makes it challenging to assess the credibility of news relayed in this manner [34]. This paper reports on experimental and survey data that compare the impact of several features of microblog updates (author's gender, name style, profile image, location, and degree of network overlap with the reader) on credibility perceptions among U.S. and Chinese audiences. We reveal the complex mechanism of credibility perceptions, identify several key differences in how users from each country critically consume microblog content, and discuss how to incorporate these findings into the design of improved user interfaces for accessing microblogs in different cultural settings.
Conference Paper
Spamming has been a widespread problem for social networks. In recent years there has been increasing interest in the analysis of anti-spamming for microblogs such as Twitter. In this paper we present a systematic study of spamming on the Sina Weibo platform, which is currently a dominant microblogging service provider in China. Our research objectives are to understand the specific spamming behaviors in Sina Weibo and to find approaches to identify and block spammers in Sina Weibo based on spamming behavior classifiers. To start the analysis of spamming behaviors, we devise several effective methods to collect a large set of spammer samples, including the use of proactive honeypots and crawlers, keyword-based searching, and buying spammer samples directly from online merchants. We processed the database associated with these spammer samples and found three representative spamming behaviors: aggressive advertising, repeated duplicate reposting and aggressive following. We extract various features and compare the behaviors of spammers and legitimate users with regard to these features. We find that spamming behaviors and normal behaviors have distinct characteristics. Based on these findings we design an automatic online spammer identification system. Tests with real data demonstrate that the system can effectively detect the spamming behaviors and identify spammers in Sina Weibo.
Conference Paper
Twitter, with its rising popularity as a micro-blogging website, has inevitably attracted the attention of spammers. Spammers use a myriad of techniques to evade security mechanisms and post spam messages, which are either unwelcome advertisements for the victim or lure victims into clicking malicious URLs embedded in spam tweets. In this paper, we propose several novel features capable of distinguishing spam accounts from legitimate accounts. The features analyze the behavioral and content entropy, bait-techniques, and profile vectors characterizing spammers, which are then fed into supervised learning algorithms to generate models for our tool, CATS. Using our system on two real-world Twitter data sets, we observe a 96% detection rate with about a 0.8% false positive rate, beating the state-of-the-art detection approach. Our analysis reveals detection of more than 90% of spammers with less than five tweets and about half of the spammers detected with only a single tweet. Our feature computation has low latency and resource requirements, making fast detection feasible. Additionally, we cluster the unknown spammers to identify and understand the prevalent spam campaigns on Twitter.
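The abstract does not say how content entropy is computed. One common reading, assumed here purely for illustration, is the Shannon entropy of the token distribution in an account's posts, where highly repetitive content scores low. The function name `shannon_entropy` and both sample strings are hypothetical and do not come from CATS.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no tokens: define the entropy as zero
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Invented sample posts: one repetitive, one varied.
repetitive = "buy now buy now buy now buy now".split()
varied = "met a friend for coffee near the station".split()

print(shannon_entropy(repetitive), shannon_entropy(varied))
```

In a feature-based detector, a value like this would be one column of the feature vector alongside behavioral and profile features, rather than a decision rule on its own.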
Article
The problem of gauging information credibility on social networks has received considerable attention in recent years. Most previous work has chosen Twitter, the world's largest micro-blogging platform, as the premise of research. In this work, we shift the premise and study the problem of information credibility on Sina Weibo, China's leading micro-blogging service provider. With eight times more users than Twitter, Sina Weibo is more of a Facebook-Twitter hybrid than a pure Twitter clone, and exhibits several important characteristics that distinguish it from Twitter. We collect an extensive set of microblogs which have been confirmed to be false rumors based on information from the official rumor-busting service provided by Sina Weibo. Unlike previous studies on Twitter where the labeling of rumors is done manually by the participants of the experiments, the official nature of this service ensures the high quality of the dataset. We then examine an extensive set of features that can be extracted from the microblogs, and train a classifier to automatically detect the rumors from a mixed set of true information and false information. The experiments show that some of the new features we propose are indeed effective in the classification, and even the features considered in previous studies have different implications with Sina Weibo than with Twitter. To the best of our knowledge, this is the first study on rumor analysis and detection on Sina Weibo.
Conference Paper
The Maximum Entropy Model is a probability estimation technique widely used for a variety of natural language tasks. It offers a clean and accommodating framework for combining diverse pieces of contextual information to estimate the probability of a certain linguistic phenomenon. For many NLP tasks this approach performs at a near state-of-the-art level, or outperforms other competing probability methods when trained and tested under similar conditions. In this paper, we use the maximum entropy model for text categorization. We compare and analyze its categorization performance using different approaches to text feature generation, different numbers of features and smoothing techniques. Moreover, in experiments we compare it to Bayes, KNN and SVM, and show that its performance is higher than Bayes and comparable with KNN and SVM. We think it is a promising technique for text categorization.
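For reference, the standard conditional form of a maximum entropy classifier is given below. The notation is an assumption chosen for illustration and is not taken from the paper: $d$ is a document, $c$ a category, $f_i(d, c)$ are feature functions and $\lambda_i$ their learned weights.

```latex
p(c \mid d) = \frac{\exp\left(\sum_i \lambda_i f_i(d, c)\right)}
                   {\sum_{c'} \exp\left(\sum_i \lambda_i f_i(d, c')\right)}
```

The denominator normalizes over all candidate categories $c'$, so the scores for a given document sum to one. A document is assigned to the category with the highest $p(c \mid d)$.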
Conference Paper
Twitter is now used to distribute substantive content such as breaking news, increasing the importance of assessing the credibility of tweets. As users increasingly access tweets through search, they have less information on which to base credibility judgments as compared to consuming content from direct social network connections. We present survey results regarding users' perceptions of tweet credibility. We find a disparity between features users consider relevant to credibility assessment and those currently revealed by search engines. We then conducted two experiments in which we systematically manipulated several features of tweets to assess their impact on credibility ratings. We show that users are poor judges of truthfulness based on content alone, and instead are influenced by heuristics such as user name when making credibility assessments. Based on these findings, we discuss strategies tweet authors can use to enhance their credibility with readers (and strategies astute readers should be aware of!). We propose design improvements for displaying social search results so as to better convey credibility.
Article
In the era of Web 2.0, huge volumes of consumer reviews are posted to the Internet every day. Manual approaches to detecting and analyzing fake reviews (i.e., spam) are not practical due to the problem of information overload. However, the design and development of automated methods of detecting fake reviews is a challenging research problem. The main reason is that fake reviews are specifically composed to mislead readers, so they may appear the same as legitimate reviews (i.e., ham). As a result, discriminatory features that would enable individual reviews to be classified as spam or ham may not be available. Guided by the design science research methodology, the main contribution of this study is the design and instantiation of novel computational models for detecting fake reviews. In particular, a novel text mining model is developed and integrated into a semantic language model for the detection of untruthful reviews. The models are then evaluated based on a real-world dataset collected from amazon.com. The results of our experiments confirm that the proposed models outperform other well-known baseline models in detecting fake reviews. To the best of our knowledge, the work discussed in this article represents the first successful attempt to apply text mining methods and semantic language models to the detection of fake consumer reviews. A managerial implication of our research is that firms can apply our design artifacts to monitor online consumer reviews to develop effective marketing or product design strategies based on genuine consumer feedback posted to the Internet.
Reflections on violence of public opinion in modern media environment: analysis of the kneeling incident of Li Yang’s students
  • LI Jun-Xiang
Analysis on infringement speech of cybermob
  • X Liu
Young children defecation in “Hong Kong” the bedizen comments and events “internet mob” phenomenon analysis
  • N Cao
Edgerank: the secret sauce that makes Facebook’s news feed tick
  • J Kincaid
Analysis of the causes of “cyber violence”. Perspective of Communication Psychology
  • W Sun
“Cybermob” with its legal regulation
  • S Tao
Reconstruction of the sense of responsibility to resolve the cyber violence
  • Z Hou
Comment target extraction and sentiment classification
  • H Liu
  • Y Zhao
  • B Qin
  • T Liu