Bettina Berendt's research while affiliated with Technische Universität Berlin and other places

Publications (239)

Article
When technology enters applications and processes with a long tradition of controversial societal debate, multi-faceted new ethical and legal questions arise. This paper focusses on the process of whistleblowing, an activity with large impacts on democracy and business. Computer science can, for the first time in history, provide for truly anonymo...
Preprint
Large pre-trained language models are successfully being used in a variety of tasks, across many languages. With this ever-increasing usage, the risk of harmful side effects also rises, for example by reproducing and reinforcing stereotypes. However, detecting and mitigating these harms is difficult to do in general and becomes computationally expe...
Preprint
Pre-trained large-scale language models such as BERT have gained a lot of attention thanks to their outstanding performance on a wide range of natural language tasks. However, due to their large number of parameters, they are resource-intensive both to deploy and to fine-tune. Researchers have created several methods for distilling language models...
Preprint
An increasing awareness of biased patterns in natural language processing resources, like BERT, has motivated many metrics to quantify 'bias' and 'fairness'. But comparing the results of different metrics and the works that evaluate with such metrics remains difficult, if not outright impossible. We survey the existing literature on fairness metric...
Preprint
When technology enters applications and processes with a long tradition of controversial societal debate, multi-faceted new ethical and legal questions arise. This paper focusses on the process of whistleblowing, an activity with large impacts on democracy and business. Computer science can, for the first time in history, provide for truly anonymou...
Conference Paper
Algorithmic and data-driven systems have been introduced to assist Public Employment Services (PES) in various countries. However, their deployment has been heavily criticized. This paper is based on a workshop organized by a distributed team of researchers in AI ethics and adjacent fields, which brought together academics, system developers, rep...
Article
Machine learning is being integrated into a growing number of critical systems with far-reaching impacts on society. Unexpected behaviour and unfair decision processes are coming under increasing scrutiny due to this widespread use and to theoretical considerations. Individuals, as well as organisations, notice, test, and criticize unfair results...
Preprint
We classify seven months' worth of Belgian COVID-related Tweets using multilingual BERT and relate them to their governments' COVID measures. We classify Tweets by their stated opinion on Belgian government curfew measures (too strict, ok, too loose). We examine the change in topics discussed and views expressed over time and in reference to dates...
Article
Fact-checking has always been a central task of journalism, but given the ever-growing amount and speed of news offline and online, as well as the growing amounts of misinformation and disinformation, it is becoming increasingly important to support human fact-checkers with (semi-)automated methods to make their work more efficient. Within fact-che...
Article
This article examines benefits and risks of Artificial Intelligence (AI) in education in relation to fundamental human rights. The article is based on an EU scoping study [Berendt, B., A. Littlejohn, P. Kern, P. Mitros, X. Shacklock, and M. Blakemore. 2017. Big Data for Monitoring Educational Systems. Luxembourg: Publications Office of the European...
Preprint
Machine learning is being integrated into a growing number of critical systems with far-reaching impacts on society. Unexpected behaviour and unfair decision processes are coming under increasing scrutiny due to this widespread use and also due to theoretical considerations. Individuals, as well as organisations, notice, test, and criticize unfair...
Article
Artificial Intelligence (AI)-based systems are widely employed nowadays to make decisions that have far-reaching impact on individuals and society. Their decisions might affect everyone, everywhere, and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for...
Preprint
Pre-trained language models have been dominating the field of natural language processing in recent years, and have led to significant performance gains for various complex natural language tasks. One of the most prominent pre-trained language models is BERT (Bidirectional Encoder Representations from Transformers), which was released as an English as well as a m...
Preprint
AI-based systems are widely employed nowadays to make decisions that have far-reaching impacts on individuals and society. Their decisions might affect everyone, everywhere and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and...
Article
Increasingly, algorithms play an important role in everyday decision-making processes. Recommender systems, specifically, are algorithms that serve to influence end-users’ decision-making (e.g. what to read, who to befriend, who to rent to…). However, the companies that develop and produce these systems are not neutral, but have an economic goal an...
Preprint
Graphical emoji are ubiquitous in modern-day online conversations. A single thumbs-up emoji, for example, can signify agreement without any words. We argue that the current state-of-the-art systems are ill-equipped to correctly interpret these emoji, especially in a conversational context. However, in a casual context, the benefits might be high: a...
Conference Paper
It is our great pleasure to welcome you to the Second FairUMAP workshop at UMAP 2019. This full-day workshop brings together researchers working at the intersection of user modeling, adaptation, and personalization on the one hand, and bias, fairness and transparency in algorithmic systems on the other hand. The workshop was motivated by the observatio...
Article
Fake news is increasingly an issue on social media platforms. In this work, rather than detect misinformation, we propose the use of nudges to help steer internet users into fact checking the news they read online. We discuss two types of nudging strategies, by presentation and by information. We present the tool BalancedView, a proof-of-concept th...
Article
This article examines key ethical issues that are continuing to emerge from the task of archiving data scraped from online sources such as social media sites, blogs, and forums, particularly pertaining to online harassment and hostile groups. Given the proliferation of digital social data, an understanding of ethics and data stewardship that evolve...
Article
Recently, many AI researchers and practitioners have embarked on research visions that involve doing AI for “Good”. This is part of a general drive towards infusing AI research and practice with ethical thinking. One frequent theme in current ethical guidelines is the requirement that AI be good for all, or: contribute to the Common Good. But what i...
Preprint
Recently, many AI researchers and practitioners have embarked on research visions that involve doing AI for "Good". This is part of a general drive towards infusing AI research and practice with ethical thinking. One frequent theme in current ethical guidelines is the requirement that AI be good for all, or: contribute to the Common Good. But what...
Article
Fighting crime has historically been a field that drives technological innovation, and it can serve as an example of different governance styles in societies. Predictive policing is one of the recent innovations that covers technical trends such as machine learning, preventive crime fighting strategies, and actual policing in cities. However, it se...
Conference Paper
The Diversity Checker is a tool that aims to make it easier for journalists to author their texts with diversity in mind. To provide helpful hints for them in this respect, it is necessary to define how to quantify diversity so that this can be programmed into the tool. At this early stage in the development of the tool, we present a two-fold contr...
Conference Paper
The FairUMAP Workshop at UMAP 2018 brought together researchers working at the intersection of user modeling, adaptation, and personalization on the one hand, and bias and fairness in machine learning on the other hand.
Preprint
The EU's General Data Protection Regulation is poised to present major challenges in bridging the gap between law and technology. This paper reports on a workshop on the deployment, content and design of the GDPR that brought together academics, practitioners, civil-society actors, and regulators from the EU and the US. Discussions aimed at advanci...
Chapter
The EU’s General Data Protection Regulation is poised to present major challenges in bridging the gap between law and technology. This paper reports on a workshop on the deployment, content and design of the GDPR that brought together academics, practitioners, civil-society actors, and regulators from the EU and the US. Discussions aimed at advanci...
Conference Paper
With the proliferation of online news read on devices ranging from desktops to smart watches, the need for meaningful summaries of long texts is growing. Manual summaries are labour-intensive and cannot be offered for all display sizes, whereas today's abstracts of most news texts are teasers designed to attract the reader's interest more than to p...
Book
Profiles are in vogue. Since the spread of social networking sites, they have become an everyday site of self-presentation. Yet the practices and techniques of profiling are by no means new. Profiles have long described potential offenders. Now they also determine potential creditworthiness. In the field of tension between profile and...
Article
"Big Data" and data-mined inferences are affecting more and more of our lives, and concerns about their possible discriminatory effects are growing. Methods for discrimination-aware data mining and fairness-aware data mining aim at keeping decision processes supported by information technology free from unjust grounds. However, these formal approac...
Conference Paper
Data Protection by Design (DPbD, also known as Privacy by Design) has received much attention in recent years as a method for building data protection into IT systems from the start. In the EU, DPbD will become mandatory from 2018 onwards under the GDPR. In earlier work, we emphasized the multidisciplinary nature of DPbD. The present paper builds o...
Article
This article shows that the collaboration between social science and computer science scholars proves fruitful in enhancing conceptual and methodological innovation in research appropriate for the digital world. It presents arguments for ways in which a multi-disciplinary approach can strengthen media studies and innovatively advance both research...
Article
The increasing popularity of social networking sites has been a source of many privacy concerns. To mitigate these concerns and empower users, different forms of educational and technological solutions have been developed. Developing and evaluating such solutions, however, cannot be considered a neutral process. Instead, it is socially bound and in...
Chapter
The analysis of texts has been central to humanists since at least the Renaissance. Italian humanists like Lorenzo Valla developed critical practices of interpretation and textual analysis as they tried to recover and interpret the classical texts of Greece and Rome. This chapter has been a necessary episodic jump over the evolution of text analysi...
Article
Understanding users’ sentiment expression in social media is important in many domains, such as marketing and online applications. Is one demographic group inherently different from another? Does a group express the same sentiment both in private and public? How can we compare the sentiments of different groups composed of multiple attributes? In t...
Conference Paper
Usage mining has always been, and still is, a key research topic in the context of the Web [16]. This is evidenced by the series of papers that appear in the scientific tracks of the WWW conference year after year. Web usage is being studied to create economic value by placing targeted ads or delivering personalized content, but also in order to better u...
Conference Paper
The concept of Privacy by Design (PbD) is a vision for creating data-processing environments in a way that respects privacy and data protection in the design of products and processes from the start. PbD has been inspired by and elaborated in different disciplines (especially law and computer science). Developments have taken place in research and...
Book
The three volume set LNAI 9851, LNAI 9852, and LNAI 9853 constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2016, held in Riva del Garda, Italy, in September 2016. The 123 full papers and 16 short papers presented were carefully reviewed and selected from a total of 46...
Book
This book constitutes the thoroughly refereed post-conference proceedings of the Third Annual Privacy Forum, APF 2015, held in Luxembourg, Luxembourg, in October 2015. The 11 revised full papers presented in this volume were carefully reviewed and selected from 24 submissions. The topics focus on privacy by design (PbD), i.e. the attempt to combine...
Chapter
News and blogs are temporally indexed online texts and play a key role in today’s information distribution and consumption. News items communicate selected information on current events, written by professional or citizen journalists; blogs are updated publications on the Web that span a much wider range of topics, styles, and authors. Particularly impor...
Raw Data
The USEWOD 2016 research dataset is a collection of usage data from Web of Data sources, which were collected in 2015. It covers sources such as DBpedia, the Linked Data Fragments interface to DBpedia as well as Wikidata page views.
Article
We present a new form of online tracking: explicit, yet unnecessary leakage of personal information and detailed shopping habits from online merchants to payment providers. In contrast to the widely debated tracking of Web browsing, online shops make it impossible for their customers to avoid this dissemination of their data. We record and analyse...
Conference Paper
In the 21st century, the threat to privacy has essentially two dimensions: on the one hand, the data-collection industry (Facebook, Google & Co.) together with the “voluntary” publication of personal data by Internet users; on the other, suspicionless mass surveillance by the intelligence services. In the workshop, different...
Conference Paper
Understanding users’ sentiments in social media is important in many domains, such as marketing and online applications. Is one demographic group inherently different from another? Does a group express the same sentiment both in private and public? How can we compare the sentiments of different groups composed of multiple attributes? In this paper,...
Article
“How to be a knowledge scientist after the Snowden revelations?” is a question we all have to ask as it becomes clear that our work and our students could be involved in the building of an unprecedented surveillance society. In this essay, we argue that this affects all the knowledge sciences such as AI, computational linguistics and the digital hu...
Article
The presence of multiple audiences and the collapse of boundaries between them in Facebook make it difficult for users to know and to control who has access to their online contributions. Previous research has shown how visualizations of Facebook friends are useful, but mainly focused on the instrumental goal of controlling access. It is unclear, h...
Conference Paper
We describe a new form of online tracking: explicit, yet unnecessary leakage of personal information and detailed shopping habits from online merchants to payment providers. In contrast to Web tracking, online shops make it impossible for their customers to avoid this proliferation of their data. We record and analyse leakage patterns for N = 881 U...
Article
The USEWOD 2015 research dataset is a collection of Linked Data endpoint access log files, which have been collected from 2014 until 2015. It contains various sources including DBpedia and the YASGUI SPARQL interface. This dataset can be requested via http://library.soton.ac.uk/datarequest - please also email a scanned copy of the signed Usage Agr...
Article
Social networking services such as Facebook and Twitter enjoy great and still growing popularity among young people and adults alike. At the same time, they are increasingly the focus of a wide range of concerns about their effects on our privacy (cf., e.g., Berthold, 2010). At issue are direct and timely...
Technical Report
ENISA is one of the key stakeholders in Europe in the area of Network and Information Security (NIS). Given its positioning, ENISA is active in the area of education and awareness, using its knowledge to promote NIS skills and supporting the Commission in enhancing the skills and competence of professionals in this area. This document continues wor...
Article
Choice Architecture for Human-Computer Interaction focuses on systems that help people choose for themselves. Realizing this potential requires a well-founded understanding of the ways in which people make everyday choices and the design strategies and computing technologies that can be used to support these processes. This work offers a compact sy...
Article
Decision makers in banking, insurance or employment mitigate many of their risks by telling “good” individuals and “bad” individuals apart. Laws codify societal understandings of which factors are legitimate grounds for differential treatment (and when and in which contexts)—or are considered unfair discrimination, including gender, ethnicity or ag...
Conference Paper
In this paper we describe a tool designed to support crowdsourcing a posteriori provenance information about the datasets used in research publications. It generates PROV data both to capture the data citation graphs, via an extension to the PROV Data Model, and to capture the crowdsourcing process, via prov:bundles.
Chapter
Users of Online Social Networks (OSN) may share private information with the “wrong” friends. To help users choose their audience better, we first designed a tool for the exploratory visualization of friend groupings. These groups (“circles”) are formed by a hierarchical modularity-based algorithm for community detection (MOD). We then conducted a...
Chapter
News production, delivery, and consumption are increasing in ubiquity and speed, spreading over more software and hardware platforms, in particular mobile devices. This has led to an increasing interest in automated methods for multi-document summarization. The authors begin this chapter by discussing several new alternatives for automated news s...
Article
The USEWOD 2014 research dataset is a collection of Linked Data endpoint access log files, which have been collected from 2013 until 2014. It covers sources such as DBpedia, data.semanticweb.org, LinkedGeoData, and BioPortal. This dataset can be requested via http://library.soton.ac.uk/datarequest - please also email a scanned copy of the signed U...
Conference Paper
Visualizations can stand in many relations to texts – and, as research into learning with pictures has shown, they can become particularly valuable when they transform the contents of the text (rather than just duplicate its message or structure it). But what kinds of transformations can be particularly helpful in the learning process? In this pape...
Conference Paper
Users of Online Social Networks (OSN) may share private information with the wrong friends. One approach to tackling this issue is to apply community discovery methods in egocentric networks to automatically generate friend circles for the user. There is, however, a discrepancy between the predicted circles and the circles that the user has in mind....
Article
The USEWOD 2013 research dataset is a collection of Linked Data endpoint access log files, which have been collected from 2009 until 2013. It covers sources such as DBpedia, data.semanticweb.org, and LinkedGeoData. This dataset can be requested via http://library.soton.ac.uk/datarequest - please also email a scanned copy of the signed Usage Agreem...
Article
With the growing number of document sets accessible online, tracking their evolution over time (story tracking) has become an increasingly interesting problem. In this paper we propose a story tracking method based on the dynamics of keyword-association graphs. We create a graph representation of the story evolution that we call story graphs, and investi...
Chapter
Structuring is one of the fundamental activities needed to understand data. Human structuring activity lies behind many of the datasets found on the internet that contain grouped instances, such as file or email folders, tags and bookmarks, ontologies and linked data. Understanding the dynamics of large-scale structuring activities is a key prerequ...
Conference Paper
Discrimination-aware data mining (DADM) aims at deriving patterns that do not discriminate on "unjust grounds" such as gender, ethnicity or nationality. DADM safeguards can be very helpful for decision-support applications in fields such as banking or employment. However, constraining data mining to exclude a fixed enumeration of potentially discri...
Conference Paper
In Online Social Networks (OSNs), it can be difficult to maintain the context of a conversation or action, i.e. to know what the situation is and how to act appropriately. The resulting uncertainties may lead to privacy issues. In this paper we focus on one such issue, Context Collision, and motivate that a first step to address this issue is to help use...
Article
Over the last decade, privacy has been widely recognised as one of the major problems of data collections in general and the Web in particular. This concerns specifically data arising from Web usage (such as querying or transacting) and social networking (characterised by rich self-profiling including relational information) and the inferences draw...
Conference Paper
According to psycholinguistic research, any text contains a lot of implicit information about its writer. The Internet provides an incredible amount of text produced by users. The information potential of texts that can be directly linked to a user (which is especially the case for forum and blog posts) is not yet sufficiently examined. Natural lan...
Chapter
Future-oriented research data management goes beyond documenting research results and processes, and enables new forms of reuse and further use of the stored data. Re-analyses and meta-analyses of these data play a special role here, and visualizations, as a form of exploratory data analysis, can be valu...
Article
Rich information spaces like blogs or news are full of "stories": sets of statements that evolve over time, made in fast-growing streams of documents. Even if one reads a specific source every day and/or subscribes to a selection of feeds, one may easily lose track; in addition, it is difficult to reconstruct a story already in the past. In this pa...
Chapter
One aspect of user preference, which is of high interest especially for the field of e-learning, concerns the mode of presenting information: What sensory system(s) should be addressed to make information interesting and easy to understand for the user? The answer might be found when looking at the user's perceptual preferences. To test the user wo...
Chapter
“Web mining” or “Web Knowledge Discovery” is the analysis of web resources with data-mining techniques such as classification, clustering, association-rule or graph-structure methods. Its applications pervade much of the software web users interact with on a daily basis: search engines’ indexing and ranking choices, recommender systems’ recommendat...
Article
This paper argues for extending the scope of applying data mining towards making it a means to help people better understand, reflect and influence the information and information-producing and -consuming activities that they are surrounded by in today's knowledge societies. Data mining is thereby seen as a means to furthering information literacy...
Article
The workshop on Usage Analysis and the Web of Data (USEWOD2011) was the first workshop in the field to investigate combinations of usage data with semantics and the Web of Data. Examples of questions the workshop aims to address are: How can semantics help in understanding usage data, how can semantic information be derived from usage data, and how...