Ben Verhoeven

University of Antwerp | UA · Computational Linguistics & Psycholinguistics Research Center (CLiPS)

PhD in Computational Linguistics

Publications: 32
Reads: 16,705
Citations: 1,523
Additional affiliations
October 2012 - September 2018
University of Antwerp
Position
  • Researcher

Publications (32)
Article
Full-text available
The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of th...
Article
An important bottleneck in the development of accurate and robust personality recognition systems based on supervised machine learning, is the limited availability of training data, and the high cost involved in collecting it. In this paper, we report on a proof of concept of using ensemble learning as a way to alleviate the data acquisition proble...
Preprint
Full-text available
The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of th...
Article
Full-text available
While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overlo...
Article
We propose a novel way to create categorized discourse lexicons for multiple languages. We combine information from the Penn Discourse Treebank with statistical machine translation techniques on the Europarl corpus. Using gender profiling as an application, we evaluate our approach by comparing it with an approach using features from a knowledge-ba...
Preprint
Full-text available
While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overlo...
Article
Given the common ancestry of Dutch and Afrikaans, it is not surprising that they use similar periphrastic constructions to express progressive meaning: aan het (Dutch) and aan die/’t (Afrikaans) lit. ‘at the’; bezig met/(om) te (Dutch) lit. ‘busy with/to’ and besig om te lit. ‘busy to’ (Afrikaans); and so-called cardinal posture verb constructio...
Article
Full-text available
This article compares the grammaticalizing human impersonal pronoun "('n) mens" in Afrikaans to fully grammaticalized "men" and non-grammaticalized "een mens" in Dutch. It is shown that "'n mens" and "een mens" can still be used lexically, unlike "mens" and "men", and that "('n) mens" and "een mens" are restricted to non-referential indefinite, uni...
Article
Full-text available
CLIN27 conference poster with intermediate results on cyberbullying detection in the AMiCA project.
Article
Full-text available
We present two experiments on the automated detection of racist discourse in Dutch social media. In both experiments, multiple classifiers are trained on the same training set. This training set consists of Dutch posts retrieved from two public Belgian social media pages which are likely to attract racist reactions. The posts were labeled as racist...
Article
We present a dictionary-based approach to racism detection in Dutch social media comments, which were retrieved from two public Belgian social media sites likely to attract racist reactions. These comments were labeled as racist or non-racist by multiple annotators. For our approach, three discourse dictionaries were created: first, we created a di...
Conference Paper
Full-text available
The recent development of social media poses new challenges to the research community in analyzing online interactions between people. Social networking sites offer great opportunities for connecting with others, but also increase the vulnerability of young people to undesirable phenomena, such as cybervictimization. Recent research reports that on...
Conference Paper
Full-text available
This paper presents an overview of the author identification task at PAN-2015 evaluation lab. Similar to previous editions of PAN, this shared task focuses on the problem of author verification: given a set of documents by the same author and another document of unknown authorship, the task is to determine whether or not the known and unknown docum...
Conference Paper
Full-text available
In the current era of online interactions, both positive and negative experiences are abundant on the Web. As in real life, negative experiences can have a serious impact on youngsters. Recent studies have reported cybervictimization rates among teenagers that vary between 20% and 40%. In this paper, we focus on cyberbullying as a particular form o...
Conference Paper
Full-text available
We present a computational model for the generation of a Twitter bot that aspires to be considered creative by generating riddles about celebrities and well-known characters. The riddles are created by combining information from both well-structured and poorly-structured information sources. This model has been implemented as an interactive Twit...
Article
Full-text available
This paper describes our submission for the WCPR14 shared task on computational personality recognition. We have investigated whether the features proposed by Soler and Wanner [10] for gender prediction might also be useful in personality recognition. We have compared these features with simple approaches using token unigrams, character trigra...
Conference Paper
Full-text available
Compounding, the process of combining several simplex words into a complex whole, is a productive process in a wide range of languages. In particular, concatenative compounding, in which the components are "glued" together, leads to problems, for instance, in computational tools that rely on a predefined lexicon. Here we present the AuCoPro projec...
Conference Paper
Full-text available
The linguistic categorisation of compounds dates back to some of the earliest work in linguistics. The cross-linguistic compound taxonomy of Bisetto and Scalise (2005), later refined in Scalise and Bisetto (2009), is well-known in linguistics for understanding the grammatical relations in compounds. Although this taxonomy has not been used extensiv...
Conference Paper
Research in computational stylometry has always been constrained by the limited availability of training data since collecting textual data with the appropriate meta-data requires a large effort. We present the CLiPS Stylometry Investigation (CSI) corpus, a new Dutch corpus containing reviews and essays written by university students. It is designe...
Article
Full-text available
The author identification task at PAN-2014 focuses on author verification. Similar to PAN-2013 we are given a set of documents by the same author along with exactly one document of questioned authorship, and the task is to determine whether the known and the questioned documents are by the same author or not. In comparison to PAN-2013, a significan...
Article
This article describes the first attempt to semantically analyse Dutch noun-noun compounds using the distributional hypothesis, which states that the semantics of a word is implicitly represented by the words in its context. The purpose is not only to classify compounds based on their semantics. We also investigate in what circumstances this classi...
Data
The CSI corpus is a yearly expanded corpus of student texts in two genres: essays and reviews. The purpose of this corpus lies primarily in stylometric research, but many other applications are possible. There is a vast amount of meta-data available, both on the author (gender, age, sexual orientation, region of origin, personality profile) and on...
Conference Paper
Full-text available
The computational processing of compound semantics poses several interesting challenges. Up to now, the processing of nominal compounds with non-noun left-hand constituents (henceforth XN compounds) has not received any attention, despite the fact that these also seem to be rather productive in Germanic languages. In our research project, we aim to...
Conference Paper
An important bottleneck in the development of accurate and robust personality recognition systems based on supervised machine learning, is the limited availability of training data, and the high cost involved in collecting it. In this paper, we report on a proof of concept of using ensemble learning as a way to alleviate the data acquisition proble...
Article
Full-text available
This overview presents the framework and the results for the Author Profiling task at PAN 2014. Objective of this year is the analysis of the adaptability of the detection approaches when given different genres. For this purpose a corpus with four different parts (subcorpora) has been compiled: social media, Twitter, blogs, and hotel reviews. The...
Article
Full-text available
This overview presents the framework and the results for the Author Profiling task at PAN 2014. Objective of this year is the analysis of the adaptability of the detection approaches when given different genres. For this purpose a corpus with four different parts (subcorpora) has been compiled: social media, Twitter, blogs, and hotel reviews. The c...
Conference Paper
Full-text available
This article presents initial results on a supervised machine learning approach to determine the semantics of noun-noun compounds in Dutch and Afrikaans. After a discussion of previous research on the topic, we present our annotation methods used to provide a training set of compounds with the appropriate semantic class. The support vector machine...
