Chris Emmery

Tilburg University | UVT · Department of Cognitive Science and Artificial Intelligence

PhD

24
Publications
9,548
Reads
540
Citations
Introduction
I'm interested in the effect of intelligent systems on our lives: systems that uncover our personal information, monitor and change our behavior, restrict our exposure to information, and treat us unfairly. My current research focuses on the dual use of computational stylometry, a field that aims to infer information from writing for beneficial purposes but can prove harmfully invasive at the same time. I develop open-source tools to better understand, and defend against, such techniques when they invade one's privacy.
Additional affiliations
November 2016 - October 2022
Tilburg University
Position
  • Lecturer
Description
  • I was a lecturer at Tilburg and a joint PhD student with CLIPS, University of Antwerp. I taught Language & AI (1y), Data Mining (5y), Data Processing (Python, 2y), Text Mining (1y), and Spatio-temporal Data Processing (1y). In addition to the work I did for AMiCA, my dissertation mainly focused on user-centered security and on how to use Machine Learning to protect users from exposing latent information through their language use.
November 2014 - November 2016
University of Antwerp
Position
  • Researcher
Description
  • I worked on the AMiCA project for two years. This involved scientific development of tools for text forensics and online security such as detection of cyberbullying and child grooming, and author profiling.

Publications (24)
Preprint
Full-text available
This dissertation proposes a framework of user-centered security in Natural Language Processing (NLP), and demonstrates how it can improve the accessibility of related research. Accordingly, it focuses on two security domains within NLP with great public interest. First, that of author profiling, which can be employed to compromise online privacy t...
Conference Paper
Full-text available
Written language contains stylistic cues that can be exploited to automatically infer a variety of potentially sensitive author information. Adversarial stylometry intends to attack such models by rewriting an author’s text. Our research proposes several components to facilitate deployment of these adversarial attacks in the wild, where neither da...
Article
Full-text available
The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of th...
Preprint
Full-text available
A limited number of studies investigate the role of model-agnostic adversarial behavior in toxic content classification. As toxicity classifiers predominantly rely on lexical cues, (deliberately) creative and evolving language use can be detrimental to the utility of current corpora and state-of-the-art models when they are deployed for content mo...
Article
Full-text available
This study discusses the effect of semi-supervised learning in combination with pretrained language models for data-to-text generation. It is not known whether semi-supervised learning is still helpful when a large-scale language model is also supplemented. This study aims to answer this question by comparing a data-to-text system only supplemented...
Preprint
Full-text available
While quality estimation (QE) can play an important role in the translation process, its effectiveness relies on the availability and quality of training data. For QE in particular, high-quality labeled data is often lacking due to the high cost and effort associated with labeling such data. Aside from the data scarcity challenge, QE models should...
Preprint
Full-text available
This study discusses the effect of semi-supervised learning in combination with pretrained language models for data-to-text generation. It is not known whether semi-supervised learning is still helpful when a large-scale language model is also supplemented. This study aims to answer this question by comparing a data-to-text system only supplemented...
Preprint
Full-text available
Recent years have seen an increasing need for gender-neutral and inclusive language. Within the field of NLP, there are various mono- and bilingual use cases where gender inclusive language is appropriate, if not preferred due to ambiguity or uncertainty in terms of the gender of referents. In this work, we present a rule-based and a neural approac...
Preprint
Full-text available
Written language contains stylistic cues that can be exploited to automatically infer a variety of potentially sensitive author information. Adversarial stylometry intends to attack such models by rewriting an author's text. Our research proposes several components to facilitate deployment of these adversarial attacks in the wild, where neither dat...
Conference Paper
Full-text available
This paper describes the CACAPO dataset, built for training both neural pipeline and end-to-end data-to-text language generation systems. The dataset is multilingual (Dutch and English), and contains almost 10,000 sentences from human-written news texts in the sports, weather, stocks, and incidents domains, together with aligned attribute-value pair...
Preprint
Full-text available
The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of th...
Article
Full-text available
The suggestions proposed by Lee et al. to improve cognitive modeling practices have significant parallels to the current best practices for improving reproducibility in the field of Machine Learning. In the current commentary on “robust modeling in cognitive science”, we highlight the practices that overlap and discuss how similar proposals have pr...
Preprint
Full-text available
The suggestions proposed by Lee et al. to improve cognitive modeling practices have significant parallels to the current best practices for improving reproducibility in the field of Machine Learning. In the current commentary on “Robust modeling in cognitive science”, we highlight the practices that overlap and discuss how similar proposals have pr...
Article
Full-text available
While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overlo...
Conference Paper
Full-text available
The task of obfuscating writing style using sequence models has previously been investigated under the framework of obfuscation-by-transfer, where the input text is explicitly rewritten in another style. A side effect of this framework is frequent major alterations to the semantic content of the input. In this work, we propose obfuscation-by-i...
Preprint
Full-text available
The task of obfuscating writing style using sequence models has previously been investigated under the framework of obfuscation-by-transfer, where the input text is explicitly rewritten in another style. These approaches also often lead to major alterations to the semantic content of the input. In this work, we propose obfuscation-by-invariance, an...
Preprint
Full-text available
While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overlo...
Article
Full-text available
CLIN27 conference poster with intermediate results on cyberbullying detection in the AMiCA project.
Article
Word embeddings have recently seen a strong increase in interest as a result of strong performance gains on a variety of tasks. However, most of this research also underlined the importance of benchmark datasets, and the difficulty of constructing these for a variety of language-specific tasks. Still, many of the datasets used in these tasks could...
Conference Paper
Full-text available
In most languages, new words can be created through the process of compounding, which combines two or more words into a new lexical unit. Whereas in languages such as English the components that make up a compound are separated by a space, in languages such as Finnish, German, Afrikaans and Dutch these components are concatenated into one word. Com...
