Ben Verhoeven
University of Antwerp | UA · Computational Linguistics & Psycholinguistics Research Center (CLiPS)
PhD in Computational Linguistics
Publications: 32
Reads: 16,705
Citations: 1,523
Additional affiliations
October 2012 - September 2018
Publications (32)
The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of th...
An important bottleneck in the development of accurate and robust personality recognition systems based on supervised machine learning, is the limited availability of training data, and the high cost involved in collecting it. In this paper, we report on a proof of concept of using ensemble learning as a way to alleviate the data acquisition proble...
While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overlo...
We propose a novel way to create categorized discourse lexicons for multiple languages. We combine information from the Penn Discourse Treebank with statistical machine translation techniques on the Europarl corpus. Using gender profiling as an application, we evaluate our approach by comparing it with an approach using features from a knowledge-ba...
Given the common ancestry of Dutch and Afrikaans, it is not surprising that they use similar periphrastic constructions to express progressive meaning: aan het (Dutch) and aan die/’t (Afrikaans) lit. ‘at the’; bezig met/(om) te (Dutch) lit. ‘busy with/to’ and besig om te lit. ‘busy to’ (Afrikaans); and so-called cardinal posture verb constructio...
This article compares the grammaticalizing human impersonal pronoun "('n) mens" in Afrikaans to fully grammaticalized "men" and non-grammaticalized "een mens" in Dutch. It is shown that "'n mens" and "een mens" can still be used lexically, unlike "mens" and "men", and that "('n) mens" and "een mens" are restricted to non-referential indefinite, uni...
CLIN27 conference poster with intermediate results on cyberbullying detection in the AMiCA project.
We present two experiments on the automated detection of racist discourse in Dutch social media. In both experiments, multiple classifiers are trained on the same training set. This training set consists of Dutch posts retrieved from two public Belgian social media pages which are likely to attract racist reactions. The posts were labeled as racist...
We present a dictionary-based approach to racism detection in Dutch social media comments, which were retrieved from two public Belgian social media sites likely to attract racist reactions. These comments were labeled as racist or non-racist by multiple annotators. For our approach, three discourse dictionaries were created: first, we created a di...
The recent development of social media poses new challenges to the research community in analyzing online interactions between people. Social networking sites offer great opportunities for connecting with others, but also increase the vulnerability of young people to undesirable phenomena, such as cybervictimization. Recent research reports that on...
This paper presents an overview of the author identification task at PAN-2015 evaluation lab. Similar to previous editions of PAN, this shared task focuses on the problem of author verification: given a set of documents by the same author and another document of unknown authorship, the task is to determine whether or not the known and unknown docum...
In the current era of online interactions, both positive and negative experiences are abundant on the Web. As in real life, negative experiences can have a serious impact on youngsters. Recent studies have reported cybervictimization rates among teenagers that vary between 20% and 40%. In this paper, we focus on cyberbullying as a particular form o...
We present a computational model for the generation of a Twitter bot that aspires to be considered creative by generating riddles about celebrities and well-known characters. The riddles are created by combining information from both well-structured and poorly-structured information sources. This model has been implemented as an interactive Twit...
This paper describes our submission for the WCPR14 shared task on computational personality recognition. We have investigated whether the features proposed by Soler and Wanner [10] for gender prediction might also be useful in personality recognition. We have compared these features with simple approaches using token unigrams, character trigra...
Compounding, the process of combining several simplex words into a complex whole, is a productive process in a wide range of languages. In particular, concatenative compounding, in which the components are "glued" together, leads to problems, for instance, in computational tools that rely on a predefined lexicon. Here we present the AuCoPro projec...
The linguistic categorisation of compounds dates back to some of the earliest work in linguistics. The cross-linguistic compound taxonomy of Bisetto and Scalise (2005), later refined in Scalise and Bisetto (2009), is well-known in linguistics for understanding the grammatical relations in compounds. Although this taxonomy has not been used extensiv...
Research in computational stylometry has always been constrained by the limited availability of training data since collecting textual data with the appropriate meta-data requires a large effort. We present the CLiPS Stylometry Investigation (CSI) corpus, a new Dutch corpus containing reviews and essays written by university students. It is designe...
The author identification task at PAN-2014 focuses on author verification. Similar to PAN-2013 we are given a set of documents by the same author along with exactly one document of questioned authorship, and the task is to determine whether the known and the questioned documents are by the same author or not. In comparison to PAN-2013, a significan...
This article describes the first attempt to semantically analyse Dutch noun-noun compounds using the distributional hypothesis, which states that the semantics of a word is implicitly represented by the words in its context. The purpose is not only to classify compounds based on their semantics. We also investigate in what circumstances this classi...
The CSI corpus is a yearly expanded corpus of student texts in two genres: essays and reviews. The purpose of this corpus lies primarily in stylometric research, but many other applications are possible. There is a vast amount of meta-data available, both on the author (gender, age, sexual orientation, region of origin, personality profile) and on...
The computational processing of compound semantics poses several interesting challenges. Up to now, the processing of nominal compounds with non-noun left-hand constituents (henceforth XN compounds) has not received any attention, despite the fact that these also seem to be rather productive in Germanic languages. In our research project, we aim to...
This overview presents the framework and the results for the Author Profiling task at PAN 2014. Objective of this year is the analysis of the adaptability of the detection approaches when given different genres. For this purpose a corpus with four different parts (subcorpora) has been compiled: social media, Twitter, blogs, and hotel reviews. The...
This article presents initial results on a supervised machine learning approach to determine the semantics of noun-noun compounds in Dutch and Afrikaans. After a discussion of previous research on the topic, we present our annotation methods used to provide a training set of compounds with the appropriate semantic class. The support vector machine...