
Adopting MaxEnt to Identification of Bullying Incidents in Social Networks



Maral Dadvar
Web-Based Information Systems and Services
Stuttgart Media University
Stuttgart, Germany
Aidin Niamir
Data and Modelling Centre
Senckenberg Biodiversity and Climate Research Centre
Frankfurt am Main, Germany
Abstract: Bullying is a widespread problem in cyberspace and
social networks. Therefore, in recent years many studies have
been dedicated to cyberbullying. The lack of appropriate datasets,
for a variety of reasons, is one of the major obstacles faced in
most studies. In this work we suggest that, to overcome some of
these barriers, a model should be employed which is minimally
affected by prevalence and small sample size. To this end we
adopted the Maximum Entropy method (MaxEnt) to identify
bully users on YouTube. The final results were compared with
those of commonly used methods. All models provided a
reasonable prediction of the bullying incidents. MaxEnt models
had the highest discrimination capacity for bullying posts and the
lowest sensitivity to prevalence. We demonstrate that
MaxEnt can be successfully adopted in cyberbullying studies
with imbalanced datasets.
Keywords: Cyberbullying, Maximum Entropy, Prevalence,
Sample Size, Sentiment Analysis, Social Networks, Text Retrieval
Bullying is a widespread problem in cyberspace and social
networks. Cyberbullying is defined as an aggressive, intentional
act carried out by a group or individual, using electronic forms of
contact repeatedly and over time against a victim who cannot
easily defend him or herself [1]. One of the most common
forms of bullying is the posting of hateful comments about
someone in social networks. Identification of bullying incidents
is one of the main courses of action to combat such
misbehaviour in social networks.
To this end, several studies have concentrated on the
detection of bullying comments and harassing content
[2]–[5], as well as on the identification of bully
users [6], [7]. As we have explained extensively in our
previous work [8], one essential obstacle faced in almost
all cyberbullying studies is the lack of a
suitable dataset representing cyberbullying in social networks.
The imbalance between bullying and non-bullying incidents in
online material, as well as the cumbersome process of labelling,
makes it even harder to develop an appropriate dataset
for these studies. Advances in artificial intelligence, along with
powerful computational facilities, have fuelled a rapid increase
in predictive modelling of bullying incidents from massive
social network data. However, the low prevalence of these
incidents makes the labelling process costly and laborious.
Commonly used methods for the identification of bullying
incidents have been criticized for being inherently dependent
on prevalence, and it has been argued that the low number of
bullying incidents introduces statistical artefacts.
In this work we suggest that, to overcome the stated barriers,
a model should be employed which is minimally affected by
prevalence and small sample size. For this purpose, we adopted
the Maximum Entropy (MaxEnt) method for
modelling these incidents in social networks. MaxEnt is a
general-purpose machine learning method with a simple and
precise mathematical formulation, and it has a number of aspects
that make it well suited to studies such as cyberbullying
detection, in which the target incidents are scarce. In order to
evaluate the proposed method, we performed a case study
using a manually labelled YouTube dataset. We compiled a set
of features to identify bully users, representing the personal
characteristics of the users, the content of their online activities
and their behaviour, respectively. MaxEnt predictions, based
solely on bullying incidents, were compared with those of
commonly used modelling methods: Generalized Linear
Models, Random Forests, and Support Vector Machines.
Maximum Entropy is a statistical learning method. It was
developed in other fields and has been
used extensively in modelling the geographical distribution of
species where, as in our case study, datasets with both
observed and not-observed classes are scarce. In this method,
the multivariate distribution of incidents (here, the bully users)
in feature space is estimated according to the principle of
maximum entropy, which states that the best approximation of an
unknown distribution is the one with maximum entropy (the
most spread out) subject to known constraints. The constraints
are defined by the expected values of the features under the
distribution, which are estimated from a set of incidents.
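The principle can be written compactly as a constrained optimisation problem. The notation here is an assumption introduced for illustration and is not taken from the paper: $x$ ranges over the users, $f_j(x)$ is the value of feature $j$ for user $x$, $\bar{f}_j$ is the mean of feature $j$ over the observed incidents, and $\lambda_j$ are the weights to be estimated. This is the hard-constraint form only; regularized variants relax the equality constraints.

```latex
\begin{aligned}
\max_{p}\quad & H(p) = -\sum_{x} p(x)\,\ln p(x) \\
\text{subject to}\quad & \sum_{x} p(x)\, f_j(x) = \bar{f}_j \quad (j = 1,\dots,m),
\qquad \sum_{x} p(x) = 1 .
\end{aligned}
\qquad\Longrightarrow\qquad
p_{\lambda}(x) = \frac{\exp\!\Big(\sum_{j=1}^{m} \lambda_j f_j(x)\Big)}{Z(\lambda)},
\quad
Z(\lambda) = \sum_{x} \exp\!\Big(\sum_{j=1}^{m} \lambda_j f_j(x)\Big)
```

The right-hand side follows from setting the derivative of the Lagrangian to zero: the maximum entropy solution is an exponential (Gibbs) distribution in the features.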
Here we used the Maxent software package (version 3.3.3; [9]),
which is particularly popular in species distribution and
environmental niche modelling, with over 2000 applications
published since 2006. Phillips et al. [9] outlined some advantages
and disadvantages of MaxEnt compared to other methods. MaxEnt
requires only incident data (often called incident-only data) plus
features for the whole dataset. The results are amenable to
interpretation through the form of the feature response functions.
MaxEnt has properties that make it very robust to a limited
amount of training data (i.e. small sample size), and it is well
regularized [10]. Because it uses an exponential model for
probabilities, it can give very large predicted values for
conditions that are outside the range of those found in the data
used to develop the model. Nevertheless, extrapolation outside
the range of values used to develop a model should be done
very cautiously no matter what modelling method is used.
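As a rough illustration of the exponential model mentioned above, the following sketch fits a maximum entropy distribution over a small set of background points so that its expected feature values match the means observed in the incidents. It is a minimal sketch and not the Maxent software: the function names `gibbs` and `fit_maxent`, the plain gradient-ascent fitting, the step size and the toy data are all assumptions, and regularization and feature transformations are omitted.

```python
import math


def gibbs(background, lam):
    """Return q(x) proportional to exp(lam . x) for every background point."""
    scores = [sum(l * v for l, v in zip(lam, x)) for x in background]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def fit_maxent(background, incidents, steps=5000, lr=1.0):
    """Fit weights so that E_q[feature] equals the incident mean.

    Illustrative only: unregularized gradient ascent on the
    log-likelihood of the incidents. Features are assumed to be
    scaled to the range [0, 1].
    """
    n_feat = len(background[0])
    target = [sum(x[j] for x in incidents) / len(incidents)
              for j in range(n_feat)]
    lam = [0.0] * n_feat
    for _ in range(steps):
        q = gibbs(background, lam)
        expected = [sum(qi * x[j] for qi, x in zip(q, background))
                    for j in range(n_feat)]
        # Gradient of the log-likelihood: incident mean minus model mean.
        lam = [l + lr * (t - e) for l, t, e in zip(lam, target, expected)]
    return lam


# Hypothetical one-feature example: 11 background users, 4 incidents.
background = [[i / 10] for i in range(11)]
incidents = [[0.7], [0.8], [0.9], [1.0]]
weights = fit_maxent(background, incidents)
probabilities = gibbs(background, weights)
```

Because the fitted distribution is exponential in the features, scores grow quickly for feature values beyond those seen in training, which is the extrapolation risk noted in the text.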
A. Corpus
We used the labelled YouTube dataset provided by [11]. To
our knowledge, no other comprehensive dataset for cyberbully
detection is publicly available. The dataset consists of the
activity logs of 3,825 users over a period of 4 months (April–
June 2012), along with their profile information, such as their
age and the date they signed up. In total there are 54,050
comments in the dataset. On average there are 15 comments
per user (StDev = 10.7, Median = 14). The average age of the
users is 24, with 1.5 years of membership duration. The users
have been labelled manually as bullies or non-bullies. In total,
765 users (12% of the users) are labelled as bullies.
B. Feature Space
We compiled a set of fourteen features in three categories
to be used in our models (Table 1).
The activity features are the activities that users can
undertake in the social network. These features help to
determine how active the user is in the online environment, for
instance uploading videos, posting comments on uploaded
videos, or responding to other users’ comments. The user
features are the demographic and personal information of the
users that was publicly available in their profiles, such as the
user’s age or membership duration. The
content features are extracted from the
comments posted by the users and pertain to the writing
structure and the usage of specific words, which together
represent their writing style. For more details please see [6].
Since correlation among features [12] violates the
assumption of independence of most standard statistical
procedures [13], [14], the compiled features were investigated
using the variance inflation factor (VIF) as a measure of
multicollinearity.
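For readers unfamiliar with the variance inflation factor, the following minimal sketch computes it for a set of feature columns. It relies on the standard identity that the VIF of feature j equals the j-th diagonal entry of the inverse correlation matrix, which is the same as 1 / (1 - R_j^2) when feature j is regressed on the others. The function names and the toy data are assumptions, and this is not the procedure or code used in the paper.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length feature columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centred = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centred]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centred[i], centred[j]))
             / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """Variance inflation factor of each feature column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical feature columns (one list per feature, one value per user).
n_comments = [1.0, 2.0, 3.0, 4.0, 5.0]
n_profanities = [2.0, 1.0, 4.0, 3.0, 5.0]
factors = vif([n_comments, n_profanities])
```

A VIF of 1 indicates a feature that is uncorrelated with the others; larger values indicate stronger collinearity.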
C. Classification Techniques
We employed three well-known classification methods,
namely the generalized linear model [15], random forests [16],
and the support vector machine [17], along with MaxEnt to
identify bully users.
The generalized linear model (GLM) uses a parametric
function to link the response variable to a linear, quadratic or
cubic combination of explanatory variables. We used an
ordinary polynomial GLM with automatic stepwise model
selection based on the Akaike Information Criterion. The
random forests (RF) algorithm draws many bootstrap samples
from the data and fits a tree to each of them.
Each tree is then used to predict the observations
that were not selected in its bootstrap sample. Each tree
counts as one ‘vote’,
and the predicted class of an observation is determined by the
majority vote among all trees. The models presented here used
1000 trees. The support vector machine (SVM) is a
machine-learning generalized linear classifier that separates the
feature space into bullying and non-bullying regions by means of
hyper-planes, and labels potential bully users according to the
region in which their feature values fall. The optimality criterion
used to find the separating hyper-plane is the maximised distance
to the training data points.
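The bootstrap sampling and voting steps of the random forests description can be sketched as follows. This is only an illustration of those two steps, not a full random forest and not the implementation used in the paper; the function names and labels are assumptions.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n indices with replacement; unused indices are 'out-of-bag'."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


def majority_vote(votes):
    """Return the class predicted by most trees.

    Ties are resolved in favour of the label that was seen first.
    """
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)
in_bag, out_of_bag = bootstrap_indices(100, rng)
# Hypothetical votes from three trees for a single user.
predicted = majority_vote(["bully", "non-bully", "bully"])
```

Each tree would be trained on its `in_bag` observations and evaluated on its `out_of_bag` observations, which gives an error estimate without a separate validation set.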
We randomly split the data: 75% was used to train
the models and the remaining 25% was used to
evaluate model performance. All models except MaxEnt
were trained using both bully and non-bully labelled data,
whereas MaxEnt models were trained using bully-only labelled
data. We iterated this step 25 times to quantify the variation
in performance, and therefore the robustness, of the models.
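The resampling scheme can be sketched as below. The function name, the seed and the use of 1 for bully and 0 for non-bully are assumptions made for illustration; the paper does not describe its implementation.

```python
import random


def repeated_splits(labels, n_iter=25, train_frac=0.75, seed=0):
    """Yield (train, bully_only_train, test) index lists for each iteration.

    `train` would feed the two-class models (GLM, RF, SVM), while
    `bully_only_train` would feed the incident-only MaxEnt model.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    for _ in range(n_iter):
        rng.shuffle(indices)
        cut = int(train_frac * len(indices))
        train, test = indices[:cut], indices[cut:]
        bully_only = [i for i in train if labels[i] == 1]
        yield train, bully_only, test


# Hypothetical labels: 1 = bully, 0 = non-bully.
labels = [1] * 20 + [0] * 80
splits = list(repeated_splits(labels))
```

Evaluating every model on the same 25 test sets makes the spread of the performance scores directly comparable across methods.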
Table 1. Features used in the models, by category

Category            Feature
Activity features   Number of comments
                    Number of subscriptions
                    Number of uploads
User features       Age of the user
                    Membership duration of the user
Content features    Number of profane words in the comments
                    Usernames containing profanities
                    Length of the comments
                    First person pronouns
                    Second person pronouns
                    Non-standard spelling of the words
                    Number of smilies in the comments
                    Number of capital letters in the comments
                    Second person pronouns followed by profanities
*. Variance Inflation Factor
D. Evaluation
The outputs of the models (i.e. the probability of a user being
a bully) are values ranging from 0 to 1. We used a
threshold-independent measure to evaluate and compare the
performance of the models. We evaluated the discrimination
capacity by analysing their receiver operating characteristic
(ROC) curves. A ROC curve plots “sensitivity” values (true
positive fraction) on the y-axis against “1 − specificity” values
(false positive fraction) on the x-axis for all thresholds [18].
The area under such a curve (AUC) is a threshold-independent
metric and provides a single measure of the performance of the
model. AUC scores vary from 0 to 1. AUC values of less than 0.5
indicate discrimination worse than chance; a score of 0.5
implies random predictive discrimination; and a score of 1
indicates perfect discrimination.
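The AUC can be computed without tracing the ROC curve explicitly, because it equals the probability that a randomly chosen bully receives a higher score than a randomly chosen non-bully (ties counted as one half). The sketch below uses this rank-based formulation; the function name and example scores are assumptions for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` uses 1 for bully and 0 for non-bully.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one bully and one non-bully")
    wins = sum((p > n) + 0.5 * (p == n)
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: three of the four bully/non-bully pairs are
# ranked correctly, so the AUC is 0.75.
example = auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```

The pairwise form is quadratic in the number of users, which is acceptable for a dataset of a few thousand users; a sort-based version would be preferable for much larger data.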
We also assessed the goodness-of-fit [19] of the models
using Miller’s calibration statistic [20], [21]. Miller’s
calibration statistic evaluates the ability of a prediction model
to correctly predict the proportion of bully users with a given
feature profile. It is based on the hypothesis that the calibration
line has an intercept of zero and a slope of one, which
corresponds to perfect calibration. The calibration plot shows the
model’s estimated probability (x-axis) against the mean observed
proportion of positive cases (y-axis) for equally sized probability
intervals (number of intervals = 10).
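The data behind such a calibration plot can be tabulated as in the sketch below, which groups predictions into ten equal-width probability intervals and reports the observed proportion of bullies in each. The function name and toy values are assumptions, and the sketch covers only the binning step, not Miller's test of the intercept and slope.

```python
def calibration_table(probabilities, labels, n_bins=10):
    """Return one row per non-empty probability interval.

    Each row is (lower, upper, mean predicted probability,
    observed proportion of positives, number of users).
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, labels):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 joins the last bin
        bins[k].append((p, y))
    rows = []
    for k, members in enumerate(bins):
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            rows.append((k / n_bins, (k + 1) / n_bins,
                         mean_p, observed, len(members)))
    return rows


# Hypothetical predictions and labels (1 = bully, 0 = non-bully).
table = calibration_table([0.05, 0.05, 0.95, 1.0], [0, 0, 1, 1])
```

For a well-calibrated model the mean predicted probability and the observed proportion in each row are close, so the points fall near the diagonal of the plot.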
All models provided a reasonable prediction of the bullying
incidents and were significantly (P < 0.001 in all four models)
better than random in both binomial tests of omission and
receiver operating characteristic (ROC) analyses (Table 2). The
area under the ROC curve was always higher for MaxEnt,
indicating stronger discrimination power for bully users.
Variation in the performance of MaxEnt was as small as that of
the other models (Figure 1).
MaxEnt and RF models were better calibrated than
the GLM and SVM models, meaning that, given a feature profile,
they accurately predict the proportion of bully users in the
whole dataset (Figure 2). Better-calibrated models are of
greater interest if the objective lies in independent training of
the model, and then transferring the model and producing a
general conclusion beyond the training extent over which the
models are fitted.
Analysis of the features’ contributions to the MaxEnt models
revealed that the number of profane words in the comments has
the highest contribution (~ 33%), followed by the number of
comments (Figure 3). Although all the features
contributed significantly to the models (P < 0.01 for all fourteen
features), the number of subscriptions made the smallest
contribution to the models (~ 1%).
Table 2. Models compared by AUC*
Maximum Entropy (MaxEnt)
Generalized Linear Model (GLM)
Random Forests (RF)
Support Vector Machine (SVM)
*. Area under the receiver operating characteristic (ROC) curve
Figure 1. ROC plot of MaxEnt models (n = 25 iterations). Axes: false positive rate (x) against true positive rate (y), both from 0.0 to 1.0.
Figure 2. Calibration plot for MaxEnt (light grey) and RF (dark grey).
Figure 3. Relative contribution of feature variables to the MaxEnt model. Axis: contribution (%).
In this experiment we adopted MaxEnt for the identification of
potential bully users. We compared MaxEnt with a
variety of common models for calculating the probability of a user
being a bully, given the feature profile. We demonstrated that
MaxEnt outperforms the other models in discrimination
capacity and also provides well-calibrated models that are
reliably transferable beyond the training extent over which the
models are fitted. The proposed approach is in principle
language independent and can be adapted to other social
networks as well. Spatial features such as the location of the users,
as well as temporal features such as the time of their activities,
might be useful features to look into. We recommend using
MaxEnt as an incident-only approach in cyberbullying studies
with imbalanced datasets or a small number of target incidents.
[1] P. K. Smith, J. Mahdavi, M. Carvalho, S. Fisher, S. Russell, and N.
Tippett, “Cyberbullying: its nature and impact in secondary school
pupils,” J. Child Psychol. Psychiatry, vol. 49, no. 4, pp. 376–385,
Apr. 2008.
[2] K. Dinakar, R. Reichart, and H. Lieberman, “Modeling the
Detection of Textual Cyberbullying,” Assoc. Adv. Artif. Intell., pp.
11–17, 2011.
[3] D. Yin, Z. Xue, L. Hong, B. D. Davison, A. Kontostathis, and L.
Edwards, “Detection of Harassment on Web 2.0,” Proc. Content
Anal. WEB 2.0 Work. WWW2009, 2009.
[4] M. Dadvar, D. Trieschnigg, R. Ordelman, and F. de Jong,
“Improving cyberbullying detection with user context,” Lect. Notes
Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes
Bioinformatics), vol. 7814 LNCS, pp. 693–696, 2013.
[5] V. Nahar, X. Li, and C. Pang, “An Effective Approach for
Cyberbullying Detection,” ADC, pp. 160–171, 2014.
[6] M. Dadvar, D. Trieschnigg, and F. de Jong, “Experts and Machines
against Bullies: A Hybrid Approach to Detect Cyberbullies,”
Springer International Publishing, 2014, pp. 275–281.
[7] M. Dadvar, D. Trieschnigg, and F. de Jong, “Expert knowledge for
automatic detection of bullies in social networks,” pp. 57–64, 2013.
[8] M. Dadvar, “Experts and machines united against cyberbullying,”
University of Twente, Enschede, The Netherlands, 2014.
[9] S. J. Phillips, R. P. Anderson, and R. E. Schapire, “Maximum
entropy modeling of species geographic distributions,” Ecol.
Modell., vol. 190, no. 3–4, pp. 231–259, Jan. 2006.
[10] S. J. Phillips and M. Dudík, “Modeling of species distributions with
Maxent: new extensions and a comprehensive evaluation,”
Ecography (Cop.), vol. 31, no. 2, pp. 161–175, Apr. 2008.
[11] M. Dadvar, D. Trieschnigg, and F. de Jong, “Experts and Machines
against Bullies: A Hybrid Approach to Detect Cyberbullies,” in
Advances in Artificial Intelligence, Springer International
Publishing, 2014, pp. 275–281.
[12] D. C. Montgomery and E. A. Peck, Introduction to Linear Regression
Analysis. New York, New York, USA: John Wiley and Sons, 1982.
[13] P. Legendre, “Spatial Autocorrelation: Trouble or New Paradigm?,”
Ecology, vol. 74, pp. 1659–1673, 1993.
[14] A. Niamir, A. K. Skidmore, A. G. Toxopeus, A. R. Munoz, and R.
Real, “Finessing atlas data for species distribution models,” Divers.
Distrib., vol. 17, no. 6, pp. 1173–1185, 2011.
[15] P. McCullagh and J. A. Nelder, Generalized Linear Models, vol. 135, no.
3. London: Chapman and Hall, 1989.
[16] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp.
5–32, 2001.
[17] S. P. Maher, C. F. Randin, A. Guisan, and J. M. Drake, “Pattern-
recognition ecological niche models fit to presence-only and
presence-absence data,” Methods Ecol. Evol., vol. 5, no. 8, pp.
761–770, 2014.
[18] A. H. Fielding and J. F. Bell, “A review of methods for the
assessment of prediction errors in conservation presence/absence
models,” Environ. Conserv., vol. 24, no. 01, Mar. 1997.
[19] S. Lemeshow and D. W. Hosmer, “A review of goodness of fit
statistics for use in the development of logistic regression models,”
Am. J. Epidemiol., vol. 115, no. 1, pp. 92–106, 1982.
[20] M. E. Miller, S. L. Hui, and W. M. Tierney, “Validation techniques
for logistic regression models,” Stat. Med., vol. 10, no. 8, pp.
1213–1226, 1991.
[21] J. Pearce and S. Ferrier, “An evaluation of alternative algorithms for
fitting species distribution models using logistic regression,” Ecol.
Modell., vol. 128, no. 2–3, pp. 127–147, 2000.