Cross-corpora positive class F1 scores for Experiment I (T1), II (T2), and III (T3). Models are fitted on the training proportion of the corpora row-wise, and tested column-wise. The out-of-domain average (Avg) excludes test performance of the parent training corpus. The best overall test score is noted in bold, the best out-of-domain performance in gray.
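As a minimal sketch of how such a table could be summarized, the snippet below computes a positive class F1 score and an out-of-domain average that skips the parent training corpus, as the caption describes. The function names, the corpus names, and all numbers are hypothetical placeholders for illustration; they are not taken from the source publication, and its actual evaluation code may differ.

```python
from statistics import mean


def positive_f1(y_true, y_pred, positive=1):
    """F1 score computed for the positive class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def out_of_domain_average(scores):
    """Average each row over all test corpora except the training corpus.

    `scores[train][test]` holds the F1 of a model fitted on `train`
    (row) and tested on `test` (column). Each row needs at least one
    test corpus other than its own training corpus.
    """
    return {
        train: mean(f1 for test, f1 in row.items() if test != train)
        for train, row in scores.items()
    }


# Made-up scores, purely illustrative (rows: train, columns: test).
scores = {
    "corpus_a": {"corpus_a": 0.75, "corpus_b": 0.5, "corpus_c": 0.25},
    "corpus_b": {"corpus_a": 0.25, "corpus_b": 0.75, "corpus_c": 0.25},
}
avg = out_of_domain_average(scores)
```

Excluding the diagonal entry keeps the average from being inflated by in-domain performance, which is typically the highest value in each row.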


Source publication
Preprint
The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of th...

Contexts in source publication

Context 1
... will now cover results per experiment, and to what extent these provide support for the hypotheses posed in Section 3. As most of these required backward evaluation (e.g., Experiment III was tested on sets from Experiment I), the results of Experiment I-III are compressed in Table 4. Table 6 comprises the Improving Representations part of Experiment II (under 'word2vec' and 'DistilBERT') along with the preprocessing results effect of our baselines. ...
Context 2
... at Table 4, the upper group of rows under T1 represents the results for Experiment I. We posed in Hypothesis 1 that samples are underpowered regarding their representation of the language variation between platforms, both for bullying and normal language-use. The data analysis in Section 4.5 showed minimal overlap between domains in vocabulary and notable variances in numerous aspects of the available corpora. ...
Context 3
... Experiment I, however, our goal was to assess the out-of-domain performance of these classifiers, not to maximize performance. For this, we turn to the Avg column in Table 4. Between the top portion of the Table, the D_ask model performs best across all domains (achieving highest on three, as mentioned above). ...
Context 4
... that these coefficient values can also flip to negative for particular sets, so for some of the features, the range goes from associated with the other class to highly associated with bullying. Given the results of Table 4 and Figure 4, we can conclude that our baseline model does not generalize out-of-domain. Given the quantitative and qualitative results reported in this Experiment, this particular setting partly supports Hypothesis 1. ...
Context 5
... results for this experiment can be predominantly found in Table 4 (middle and lower parts, and T2 in particular), and partly in Table 6 (word2vec, DistilBERT). In this experiment, we seek to further test Hypothesis 1 by employing three methods: merging all cyberbullying data to increase volume and variety, aggregating on context level for a context change, and improving representations through pre-trained word embedding features. ...
Context 6
... results for this part are listed under D_all in Table 4. For all of the following experiments, we now focus on the full results table (including that of Experiment I) and see which individual classifiers generalize best across all test sets (highlighted in gray). ...
Context 7
... Change As for access to context scopes, we are restricted to the Ask.fm and Formspring data (C_frm and C_ask in Table 4). Nevertheless, in both cases, we see a noticeable increase for in-domain performance: a positive F1 score of .579 ...
Context 8
... add more empirical evidence to this, we trained models on toxicity, or cyber aggression, and tested them on bullying data (and vice versa), providing results on the overlap between the tasks. The results for this experiment can be found in the lower end of Table 4, under D_tox and T3. ...
Context 9
... More strikingly, however, the other way around, toxicity classifiers perform second-best on the out-of-domain averages (Avg in Table 4). In the context scopes (C_frm and C_ask) it is notably close, and for other sets relatively close, to the in-domain performance. ...
