Chapter

Precedent-Based Approach for the Identification of Deviant Behavior in Social Media

Abstract

This paper addresses the problem of identifying deviant users in social media. To this end, each user of a social media source is described by a profile that aggregates open information about him or her within a special structure. Aggregated user profiles are formally described in terms of a multivariate random process. Special emphasis is placed on methods for identifying users with certain behavior on the basis of a few precedents and for controlling the quality of search results. An experimental study demonstrates the described methods for the case of commercial use of a personal account in social media.

Chapter
We consider the task of developing algorithms for cyber-physical systems (CPS) that proactively manage the state of unstable systems with a chaotically evolving state vector. Examples of such processes include changes in the state of gas- and hydrodynamic environments, stock price evolution, and thermal phenomena. The main problem for this type of CPS is producing forecasts that allow the efficiency of different feasible control actions to be compared. The presence of a chaotic element in the state dynamics of unstable systems does not allow control CPS to be built on conventional statistical extrapolation algorithms. Hence, in this chapter, we consider forecasting algorithms built upon machine learning and instance-based data analysis. Under chaotic influences, which are common in unstable immersion environments, obtaining an accurate forecast is highly complicated. In the computational experiment conducted, which employed direct averaging of three after-effects of analog windows, the average forecast accuracy oscillates between 15 and 20%. Effective forecasting of a chaotic process of a complicated inertia-less nature based on the considered computational schemes has not been achieved yet. This means that additional research, based on multidimensional statistical measures, is required.
Keywords: Precedent forecasting; Controlling of a chaotic system; Matrix similarity measures
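The idea of averaging the after-effects of analog windows can be illustrated with a minimal sketch. It is not the chapter's implementation: the function name `analog_forecast`, the Euclidean distance as similarity measure, and the parameter names are assumptions made for illustration only. The sketch finds the k past windows closest to the most recent window and averages the values that followed them.

```python
from math import dist


def analog_forecast(series, window, horizon=1, k=3):
    """Forecast `horizon` steps ahead by averaging the continuations
    ("after-effects") of the k past windows closest to the latest one.

    Assumption: similarity is plain Euclidean distance between windows.
    """
    n = len(series)
    if n < window + horizon + 1:
        raise ValueError("series is too short for the given window and horizon")

    query = series[n - window:]  # the most recent window
    candidates = []
    # Each candidate window must be followed by `horizon` known values.
    for start in range(n - window - horizon + 1):
        past = series[start:start + window]
        after = series[start + window:start + window + horizon]
        candidates.append((dist(past, query), after))

    # Keep the k nearest analogs and average their after-effects step by step.
    candidates.sort(key=lambda item: item[0])
    nearest = [after for _, after in candidates[:k]]
    return [sum(step) / len(nearest) for step in zip(*nearest)]
```

For a chaotic series the nearest analogs diverge quickly, which is consistent with the low accuracy reported above; the sketch only shows the mechanics of the scheme.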
Article
The R function kofnGA conducts a genetic algorithm search for the best subset of k items from a set of n alternatives, given an objective function that measures the quality of a subset. The function fills a gap in the presently available subset selection software, which typically searches over a range of subset sizes, restricts the types of objective functions considered, or does not include freely available code. The new function is demonstrated on two types of problem where a fixed-size subset search is desirable: design of environmental monitoring networks, and D-optimal design of experiments. Additionally, the performance is evaluated on a class of constructed test problems with a novel design that is interesting in its own right.
Conference Paper
Social networks are popular platforms for users to express themselves, interact, and share knowledge. Today, users in social networks have personalized profiles that contain dynamic attributes representing their interests and behavior over time, such as published content and location check-ins. Several models have been proposed that analyze these profiles and their dynamic content in order to measure the degree of similarity between users. This similarity value can further be used in friend suggestion and link prediction. The main drawback of the majority of these models is that they rely on a static snapshot of attributes, which does not reflect the change in user interest and behavior over time. In this paper, a novel framework for modeling the dynamics of user behavior and measuring the similarity between user profiles on Twitter is proposed. In the proposed framework, dynamic attributes such as topical interests and the associated locations in tweets are used to represent a user's interest and behavior, respectively. Experiments on a real dataset from Twitter showed that the proposed framework, which utilizes those attributes, outperformed multiple standard models that utilize a static snapshot of data.
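The difference between a static snapshot and a time-aware comparison can be shown with a small sketch. It is not the framework from the paper: the profile representation (one topic-count dictionary per time slice), the use of cosine similarity, and the function names are all assumptions chosen for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse topic-count dictionaries."""
    dot = sum(a.get(key, 0) * b.get(key, 0) for key in set(a) | set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def static_similarity(profile_u, profile_v):
    """Merge all time slices into one snapshot, then compare once."""
    def merge(profile):
        merged = {}
        for time_slice in profile:
            for topic, count in time_slice.items():
                merged[topic] = merged.get(topic, 0) + count
        return merged
    return cosine(merge(profile_u), merge(profile_v))


def dynamic_similarity(profile_u, profile_v):
    """Compare aligned time slices one by one and average the results."""
    if len(profile_u) != len(profile_v) or not profile_u:
        raise ValueError("profiles must be non-empty and cover the same slices")
    sims = [cosine(a, b) for a, b in zip(profile_u, profile_v)]
    return sum(sims) / len(sims)
```

Two hypothetical users who discuss the same topics but in different periods look identical in the merged snapshot, while the slice-by-slice comparison separates them.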
Conference Paper
Friendships, relationships and social communications have all gone to a new level with new definitions as a result of the invention of online social networks. Meanwhile, alongside this transition there is increasing evidence that online social applications have been used by children and adolescents for bullying. State-of-the-art studies in cyberbullying detection have mainly focused on the content of the conversations while largely ignoring the users involved in cyberbullying. We hypothesize that incorporating the users' profiles, their characteristics, and their post-harassing behaviour (for instance, posting a new status in another social network as a reaction to their bullying experience) will improve the accuracy of cyberbullying detection. Cross-system analyses of the users' behaviour, that is, monitoring users' reactions in different online environments, can facilitate this process and could lead to more accurate detection of cyberbullying. This paper outlines the framework for this faceted approach.
Conference Paper
The input to an algorithm that learns a binary classifier normally consists of two sets of examples, where one set consists of positive examples of the concept to be learned, and the other set consists of negative examples. However, it is often the case that the available training data are an incomplete set of positive examples, and a set of unlabeled examples, some of which are positive and some of which are negative. The problem solved in this paper is how to learn a standard binary classifier given a nontraditional training set of this nature. Under the assumption that the labeled examples are selected randomly from the positive examples, we show that a classifier trained on positive and unlabeled examples predicts probabilities that differ by only a constant factor from the true conditional probabilities of being positive. We show how to use this result in two different ways to learn a classifier from a nontraditional training set. We then apply these two new methods to solve a real-world problem: identifying protein records that should be included in an incomplete specialized molecular biology database. Our experiments in this domain show that models trained using the new methods perform better than the current state-of-the-art biased SVM method for learning from positive and unlabeled examples.
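The constant-factor result can be made concrete with a short sketch. Under the stated assumption that labeled examples are drawn at random from the positives, a classifier g(x) trained to separate labeled from unlabeled examples satisfies g(x) = c · p(y = 1 | x), where c = p(labeled | positive). One way to estimate c is to average g(x) over held-out labeled positives. The function names below are hypothetical, and the scores are assumed to come from some already trained probabilistic classifier.

```python
def estimate_label_frequency(scores_on_labeled_positives):
    """Estimate c = p(labeled | positive) as the mean score g(x)
    over a held-out set of labeled positive examples."""
    if not scores_on_labeled_positives:
        raise ValueError("need at least one labeled positive example")
    return sum(scores_on_labeled_positives) / len(scores_on_labeled_positives)


def posterior_positive(score, c):
    """Convert g(x) = p(labeled | x) into an estimate of p(y = 1 | x).

    The result is clipped to 1.0 because c is only an estimate,
    so score / c can slightly exceed 1.
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must lie in (0, 1]")
    return min(1.0, score / c)
```

With c estimated at 0.5, for example, a raw score of 0.25 corresponds to an estimated probability of 0.5 that the example is positive.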
Conference Paper
Understanding and forecasting the health of an online community is of great value to its owners and managers, who have vested interests in its longevity and success. Nevertheless, the association between community evolution and the behavioural patterns and trends of its members is not clearly understood, which hinders our ability to make accurate predictions of whether a community is flourishing or diminishing. In this paper we use statistical analysis, combined with a semantic model and rules for representing and computing behaviour in online communities. We apply this model to a number of forum communities from Boards.ie to categorise the behaviour of community members over time, and report on how different behaviour compositions correlate with positive and negative community growth in these forums.
Article
With the rise of social media, people can now form relationships and communities easily regardless of location, race, ethnicity, or gender. However, the power of social media simultaneously enables harmful online behavior such as harassment and bullying. Cyberbullying is a serious social problem, making it an important topic in social network analysis. Machine learning methods can potentially help provide better understanding of this phenomenon, but they must address several key challenges: the rapidly changing vocabulary involved in cyberbullying, the role of social network structure, and the scale of the data. In this study, we propose a model that simultaneously discovers instigators and victims of bullying as well as new bullying vocabulary by starting with a corpus of social interactions and a seed dictionary of bullying indicators. We formulate an objective function based on participant-vocabulary consistency. We evaluate this approach on Twitter and Ask.fm data sets and show that the proposed method can detect new bullying vocabulary as well as victims and bullies.
Flame Wars: Automatic Insult Detection
  • S Sax
Topic-level influencers identification in the Microblog sphere
  • Y Wang
A sub-linear, massive-scale look-alike audience extension system
  • Q Ma
Dynamic modeling of Twitter users
  • A Galal
  • A Elkorany