All content in this area was uploaded by Maral Dadvar on Nov 05, 2015
Maximum Entropy Modelling of Bully Users in Social Networks
Maral Dadvar
Human Media Interaction, University of Twente
United Nations Global Pulse (UNGP)
dadvar.maral@gmail.com
Keywords. Maximum Entropy, Cyberbullying, Social Networks
Abstract. Cyberbullying is a widespread problem among children and adolescents.
Detection of bully users is one of the main courses of action to combat cyberbullying
in social networks. Advances in artificial intelligence, along with powerful
computational facilities, have fueled a rapid increase in predictive modeling of bully
users from massive social network data. However, the low prevalence of bullying
incidents makes the labelling process costly and laborious. Commonly used
cyberbullying detection methods have been criticized for being inherently dependent on
prevalence, and it has been argued that a low number of target class members (i.e. bully
users) introduces statistical artefacts. To overcome this barrier, I proposed the use of
the Maximum Entropy (ME) method for modelling bully users in social networks.
ME is a general-purpose machine learning method with a simple and precise
mathematical formulation, and it has a number of aspects that make it well-suited for
cyberbullying studies. To evaluate the proposed method, I performed a case
study using a YouTube dataset whose posts have been manually labeled as bullying
or non-bullying. We compiled a set of 11 features in three categories to
identify bullying users, namely user features, content features, and activity features,
representing the characteristics, actions and behaviour of the users, respectively.
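For reference, the general maximum-entropy principle can be written as the constrained optimisation below. This is the standard textbook form, given here only as a sketch: the abstract does not state which ME variant or constraint set was used, and the symbols p, f_j, \hat{E}, \lambda_j and Z_\lambda are notation assumed for illustration, with f_j standing for the j-th of the 11 features.

```latex
% Maximum-entropy principle (generic form; notation assumed, not taken from the source)
\begin{aligned}
\max_{p}\quad & H(p) = -\sum_{x} p(x)\,\log p(x) \\
\text{subject to}\quad & \sum_{x} p(x)\, f_j(x) = \hat{E}[f_j], \qquad j = 1,\dots,11, \\
& \sum_{x} p(x) = 1 .
\end{aligned}
% The solution has the exponential (Gibbs) form, with one weight per feature:
p_{\lambda}(x) = \frac{1}{Z_{\lambda}} \exp\!\Big(\sum_{j=1}^{11} \lambda_j f_j(x)\Big),
\qquad
Z_{\lambda} = \sum_{x} \exp\!\Big(\sum_{j=1}^{11} \lambda_j f_j(x)\Big).
```

Among all distributions that match the observed feature averages, the method picks the one that assumes the least beyond those constraints.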
ME predictions were compared with those of commonly used modeling methods:
Boosted Regression Trees (BRT), Random Forests (RF), and Support Vector
Machines (SVM). Predictions were made in 10 steps, and in each step 10% of the
remaining bullying posts were randomly excluded. All models provided reasonable
predictions of the bullying incidents and were significantly better than random in both
binomial tests of omission and receiver operating characteristic (ROC) analyses. The
area under the ROC curve (AUC) was always higher for ME, indicating better
discrimination of bullying posts. The performance of ME was also more robust and less
sensitive to prevalence.
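The stepwise reduction of prevalence and the AUC comparison described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the procedure used in the study: the function names `thin_positives` and `auc` are hypothetical, the rounding of the 10% drop and the random seed are arbitrary choices, and the classifier itself is left out (any model that produces a score per post could be plugged in).

```python
import random


def thin_positives(labels, n_steps=10, drop_frac=0.10, seed=0):
    """Yield (step, kept_indices), dropping drop_frac of the remaining
    positive (bullying) items at each step; negatives are always kept.
    Rounding of the drop count is an assumption of this sketch."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    for step in range(1, n_steps + 1):
        n_drop = round(drop_frac * len(pos))
        dropped = set(rng.sample(pos, n_drop))
        pos = [i for i in pos if i not in dropped]
        yield step, sorted(pos + neg)


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive is scored above a random negative (ties count as 0.5).
    Requires at least one positive and one negative label."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Synthetic labels only: 100 bullying posts among 1000 (made-up numbers).
    labels = [1] * 100 + [0] * 900
    for step, kept in thin_positives(labels):
        n_pos = sum(labels[i] for i in kept)
        print(step, n_pos, round(n_pos / len(kept), 4))
```

At each step one would train every model on the kept posts and compare the resulting AUC values; whether evaluation used a fixed held-out set is not stated in the abstract, so that choice is left open here.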
We believe that the ME method can be used in its present form for many applications
with imbalanced datasets, and that it merits further research and development.