About
Publications: 39
Reads: 13,483
Citations: 439
Publications (39)
A major challenge for survey researchers is dealing with missing data, which restricts the scope of analysis and the reliability of inferences that can be drawn. Recently, researchers have started investigating the potential of Large Language Models (LLMs) to role-play a pre-defined set of "characters" and simulate their survey responses with lit...
We present a novel approach for enhancing diversity and control in data annotation tasks by personalizing large language models (LLMs). We investigate the impact of injecting diverse persona descriptions into LLM prompts across two studies, exploring whether personas increase annotation diversity and whether the impacts of individual personas on th...
Surveys are a cornerstone of empirical social science research, providing invaluable insights into the opinions, beliefs, behaviours, and characteristics of people. However, issues such as refusal to participate, skipping questions, sampling bias, and attrition significantly impact the quality and reliability of survey data. Recently, researchers h...
Machine learning (ML)-based content moderation tools are essential to keep online spaces free from hateful communication. Yet ML tools can only be as capable as the quality of their training data allows. While there is increasing evidence that they underperform in detecting hateful communications directed towards specific identities a...
Online hate has proliferated in step with the rising use of social media. In response, automated tools for identifying harmful text content, grounded in Natural Language Processing and Deep Learning, have advanced significantly. Although it is known that trai...
The characterization and detection of bots with their presumed ability to manipulate society on social media platforms have been subject to many research endeavors over the last decade. In the absence of ground truth data (i.e., accounts that are labeled as bots by experts or self-declare their automated nature), researchers interested in the chara...
This review paper provides a conceptualization of AI-assisted content moderation with various degrees of autonomy and summarizes experimental evidence for how different levels of automation in content moderation and related losses of autonomy affect individuals and groups. Our results show that current research predominantly focuses on individual l...
At the end of October 2022, Elon Musk concluded his acquisition of Twitter. In the weeks and months before that, several questions were publicly discussed that were not only of interest to the platform's future buyers, but also of high relevance to the Computational Social Science research community. For example, how many active users does the plat...
Social media can be a mirror of human interaction, society, and historic disruptions. Their reach enables the global dissemination of information in the shortest possible time and, thus, the individual participation of people worldwide in global events in almost real-time. However, these platforms can be equally efficiently used in information warf...
The characterization and detection of social bots with their presumed ability to manipulate society on social media platforms have been subject to many research endeavors over the last decade, leaving a research gap on the impact of bots and accompanying phenomena on platform users and society. In this systematic data-driven study, we explore the u...
Stream clustering is a technique capable of identifying homogeneous groups of observations that continuously arrive in a digital stream. In this work, we refine a TF-IDF-based text stream clustering algorithm by introducing an automated distance threshold adaptation technique for document insertion and cluster merging, improving th...
Computational social science uses computational and statistical methods in order to evaluate social interaction. The public availability of data sets is thus a necessary precondition for reliable and replicable research. These data allow researchers to benchmark the computational methods they develop, test the generalizability of their findings, an...
Abuse and hate are penetrating social media and many comment sections of news media companies. These platform providers invest considerable effort in moderating user-generated contributions to avoid losing readers who are appalled by inappropriate texts. This is further enforced by legislative actions, which make non-clearance of these comments a...
Online comment sections revolutionised the participatory discourse enabled by news media, lowering the hurdles to participate and speeding up the process from submission to publication. What was initially meant to strengthen public debates and democracy turned out to suffer from abusive use: be it insulting journalists, posting misinformation, o...
While abusive language in online contexts is a long-known problem, algorithmic detection and moderation support are only recently experiencing rising interest. This survey provides a structured overview of the latest academic publications in the domain. Assessed concepts include the used datasets, their language, annotation origins and quality, as...
Nowadays, fake news is heavily discussed in public and political debates. Even though the phenomenon of intentionally false information is rather old, misinformation has reached a new level with the rise of the internet and participatory platforms. Due to Facebook and Co., purposeful false information - often called fake news - can be easily spread by every...
The identification of coordinated campaigns within Social Media is a complex task that is often hindered by missing labels and large amounts of data that have to be processed. We propose a new two-phase framework that uses unsupervised stream clustering for detecting suspicious trends over time in a first step. Afterwards, traditional offline analy...
The past decade has been characterized by a strong increase in the use of social media and a continuous growth of public online discussion. With the failure of purely manual moderation, platform operators started searching for semi-automated solutions, where the application of Natural Language Processing (NLP) and Machine Learning (ML) techniques i...
Recently, social bots, (semi-)automated accounts in social media, have gained global attention in the context of public opinion manipulation. Dystopian scenarios like the malicious amplification of topics, the spreading of disinformation, and the manipulation of elections through “opinion machines” created headlines around the globe. As a consequence...
The detection of orchestrated and potentially manipulative campaigns in social media is far more meaningful than analyzing single account behaviour but also more challenging in terms of pattern recognition, data processing, and computational complexity. While supervised learning methods need an enormous amount of reliable ground truth data to find...
Social bots have recently gained attention in the context of public opinion manipulation on social media platforms. While a lot of research effort has been put into the classification and detection of such automated programs, it is still unclear how technically sophisticated those bots are, which platforms they target, and where they originate from...
The digitization of the world has also led to a digitization of communication processes. Traditional research methods fall short in understanding communication in digital worlds as the scope has become too large in volume, variety, and velocity to be studied using traditional approaches. In this paper, we present computational methods and their use...
Social bots have recently gained attention in the context of public opinion manipulation on social media platforms. While a lot of research effort has been put into the classification and detection of such (semi-)automated programs, it is still unclear how sophisticated those bots actually are, which platforms they target, and where they originate...
The identification of automated activity in social media, specifically the detection of social bots, has become one of the major tasks within the field of social media computation. Recently published classification algorithms and frameworks focus on the identification of single bot accounts. Within different Twitter experiments, we show that these...
Presentation of the junior research group DemoRESILdigital at the DGPUK in Mannheim
This paper proposes a new stream clustering algorithm for text streams. The algorithm combines concepts from stream clustering and text analysis in order to incrementally maintain a number of text droplets that represent topics within the stream. Our algorithm adapts to changes of topic over time and can handle noise and outliers gracefully by deca...
Analysing streaming data has received considerable attention in recent years. A key research area in this field is stream clustering, which aims to recognize patterns in a possibly unbounded data stream of varying speed and structure. Over the past decades, a multitude of new stream clustering algorithms has been proposed. However, to the best...