Kalina Bontcheva
The University of Sheffield | Sheffield · Department of Computer Science (Faculty of Engineering)

About

319
Publications
90,653
Reads
11,455
Citations

Publications (319)
Preprint
Full-text available
In the current era of social media and generative AI, an ability to automatically assess the credibility of online social media content is of tremendous importance. Credibility assessment is fundamentally based on aggregating credibility signals, which refer to small units of information, such as content factuality, bias, or a presence of persuasio...
Preprint
Full-text available
This work introduces EUvsDisinfo, a multilingual dataset of trustworthy and disinformation articles related to pro-Kremlin themes. It is sourced directly from the debunk articles written by experts leading the EUvsDisinfo project. Our dataset is the largest to-date resource in terms of the overall number of articles and distinct languages. It also...
Article
Full-text available
Adapters and Low-Rank Adaptation (LoRA) are parameter-efficient fine-tuning techniques designed to make the training of language models more efficient. Previous results demonstrated that these methods can even improve performance on some classification tasks. This paper complements existing research by investigating how these techniques influence c...
Article
Purpose: The authors investigate how COVID-19 has influenced the amount, type or topics of abuse that UK politicians receive when engaging with the public. Design/methodology/approach: This work covers the first year of COVID-19 in the UK, from March 2020 to March 2021, and analyses Twitter abuse in replies to UK MPs. The authors collected and analys...
Preprint
Full-text available
As Large Language Models (LLMs) become more proficient, their misuse in large-scale viral disinformation campaigns is a growing concern. This study explores the capability of ChatGPT to generate unconditioned claims about the war in Ukraine, an event beyond its knowledge cutoff, and evaluates whether such claims can be differentiated by human read...
Article
Full-text available
A key task in the fact-checking workflow is to establish whether the claim under investigation has already been debunked or fact-checked before. This is essentially a retrieval task where a misinformation claim is used as a query to retrieve from a corpus of debunks. Prior debunk retrieval methods have typically been trained on annotated pairs of m...
Conference Paper
Full-text available
This paper analyses two hitherto unstudied sites sharing state-backed disinformation, Reliable Recent News (rrn.world) and WarOnFakes (waronfakes.com), which publish content in Arabic, Chinese, English, French, German, and Spanish. We describe our content acquisition methodology and perform cross-site unsupervised topic clustering on the resulting...
Preprint
Full-text available
This paper analyses two hitherto unstudied sites sharing state-backed disinformation, Reliable Recent News (rrn.world) and WarOnFakes (waronfakes.com), which publish content in Arabic, Chinese, English, French, German, and Spanish. We describe our content acquisition methodology and perform cross-site unsupervised topic clustering on the resulting...
Preprint
The task of retrieving already debunked narratives aims to detect stories that have already been fact-checked. The successful detection of claims that have already been debunked not only reduces the manual efforts of professional fact-checkers but can also contribute to slowing the spread of misinformation. Mainly due to the lack of readily availab...
Chapter
Full-text available
This chapter focuses on the status of the English language, primarily acting as a benchmark for the level of technological support that other European languages could receive (see Maynard et al. 2022; Ananiadou et al. 2012). While it is rather unlikely that any other European language will ever reach this level, due to the continuing development of...
Article
Vaccine hesitancy has been a common concern, probably since vaccines were created and, with the popularisation of social media, people started to express their concerns about vaccines online alongside those posting pro- and anti-vaccine content. Predictably, since the first mentions of a COVID-19 vaccine, social media users posted about their fears...
Conference Paper
Full-text available
The COVID-19 pandemic led to an infodemic where an overwhelming amount of COVID-19 related content was being disseminated at high velocity through social media. This made it challenging for citizens to differentiate between accurate and inaccurate information about COVID-19. This motivated us to carry out a comparative study of the characteristics...
Preprint
Full-text available
Instruction-tuned Large Language Models (LLMs) have exhibited impressive language understanding and the capacity to generate responses that follow specific instructions. However, due to the computational demands associated with training these models, their applications often rely on zero-shot settings. In this paper, we evaluate the zero-shot perfo...
Preprint
Full-text available
This paper describes our approach for SemEval-2023 Task 3: Detecting the category, the framing, and the persuasion techniques in online news in a multi-lingual setup. For Subtask 1 (News Genre), we propose an ensemble of fully trained and adapter mBERT models which was ranked joint-first for German, and had the highest mean rank of multi-language t...
Preprint
Full-text available
New events emerge over time influencing the topics of rumors in social media. Current rumor detection benchmarks use random splits as training, development and test sets which typically results in topical overlaps. Consequently, models trained on random splits may not perform well on rumor classification on previously unseen topics due to the tempo...
Preprint
Full-text available
Vaccine hesitancy has been a common concern, probably since vaccines were created and, with the popularisation of social media, people started to express their concerns about vaccines online alongside those posting pro- and anti-vaccine content. Predictably, since the first mentions of a COVID-19 vaccine, social media users posted about their fears...
Book
Full-text available
The UNESCO-supported publication features over 100 recommendations for action and practical new tools to help fight a global scourge that threatens journalists’ safety and poisons democratic discourse. The study, spanning three years and representing collaborative research in 15 countries, is the most geographically, linguistically, and ethnically...
Chapter
This paper compares quantitatively the spread of Ukraine-related disinformation and its corresponding debunks, first by considering re-tweets, replies, and favourites, which demonstrate that despite platform efforts Ukraine-related disinformation is still spreading wider than its debunks. Next, bidirectional post-hoc analysis is carried out using G...
Conference Paper
This paper compares quantitatively the spread of Ukraine-related disinformation and its corresponding debunks, first by considering re-tweets, replies, and favourites, which demonstrate that despite platform efforts Ukraine-related disinformation is still spreading wider than its debunks. Next, bidirectional post-hoc analysis is carried out using G...
Preprint
COVID-19 vaccine hesitancy is widespread, despite governments' information campaigns and WHO efforts. One of the reasons behind this is vaccine disinformation, which spreads widely on social media. In particular, recent surveys have established that vaccine disinformation is negatively impacting citizen trust in COVID-19 vaccination. At the same tim...
Chapter
Full-text available
This chapter draws on over 714 women-identifying survey respondents, 15 country case studies (Kenya, Nigeria, South Africa, Pakistan, The Philippines, Sri Lanka, Lebanon, Tunisia, Poland, Serbia, Brazil, Mexico, the UK, the US, and Sweden) produced by the regional research teams attached to this study, and 182 long form interviews with journalists,...
Article
Full-text available
The Coronavirus (COVID-19) pandemic has led to a rapidly growing ‘infodemic’ of health information online. This has motivated the need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents, in multiple languages. To address this challenge, this paper proposes a novel high precision and high recall...
Preprint
Full-text available
The onset of the Coronavirus disease 2019 (COVID-19) pandemic instigated a global infodemic that has brought unprecedented challenges for society as a whole. During this time, a number of manual fact-checking initiatives have emerged to alleviate the spread of dis/mis-information. This study is about COVID-19 debunks published in multiple languages...
Article
The onset of the Coronavirus disease 2019 (COVID-19) pandemic instigated a global infodemic that has brought unprecedented challenges for society as a whole. During this time, a number of manual fact-checking initiatives have emerged to alleviate the spread of dis/mis-information. This study is about COVID-19 debunks published in multiple languages...
Preprint
Full-text available
The spread of COVID-19 misinformation over social media has already drawn the attention of many researchers. According to Google Scholar, about 26,000 COVID-19-related misinformation studies have been published to date. Most of these studies focus on 1) detecting and/or 2) analysing the characteristics of COVID-19-related misinformation. However, the st...
Article
The spread of COVID-19 misinformation over social media has already drawn the attention of many researchers. According to Google Scholar, about 26,000 COVID-19-related misinformation studies have been published to date. Most of these studies focus on 1) detecting and/or 2) analysing the characteristics of COVID-19-related misinformation. However, the st...
Research
Full-text available
The United Nations Educational Scientific and Cultural Organization (UNESCO) has published research produced by the International Center for Journalists (ICFJ) as part of a major interdisciplinary study being led by ICFJ's research team. The research, the most comprehensive of its kind, shows that the disturbing trend of online violence - from doxx...
Preprint
Full-text available
The UK has had a volatile political environment for some years now, with Brexit and leadership crises marking the past five years. With this work, we wanted to understand more about how the global health emergency, COVID-19, influences the amount, type or topics of abuse that UK politicians receive when engaging with the public. With this work, we...
Article
The UK has had a volatile political environment for some years now, with Brexit and leadership crises marking the past five years. With this work, we wanted to understand more about how the global health emergency, COVID-19, influences the amount, type or topics of abuse that UK politicians receive when engaging with the public. With this work, we...
Article
Full-text available
The explosion of disinformation accompanying the COVID-19 pandemic has overloaded fact-checkers and media worldwide, and brought a new major challenge to government responses worldwide. Not only is disinformation creating confusion about medical science amongst citizens, but it is also amplifying distrust in policy makers and governments. To help t...
Preprint
The Coronavirus (COVID-19) pandemic has led to a rapidly growing ‘infodemic’ online. Thus, the accurate retrieval of reliable relevant data from millions of documents about COVID-19 has become urgently needed for the general public as well as for other stakeholders. The COVID-19 Multilingual Information Access (MLIA) initiative is a joint effort to...
Article
The Coronavirus (COVID-19) pandemic has led to a rapidly growing ‘infodemic’ online. Thus, the accurate retrieval of reliable relevant data from millions of documents about COVID-19 has become urgently needed for the general public as well as for other stakeholders. The COVID-19 Multilingual Information Access (MLIA) initiative is a joint effort to...
Conference Paper
Full-text available
Stance classification can be a powerful tool for understanding whether and which users believe in online rumours. The task aims to automatically predict the stance of replies towards a given rumour, namely support, deny, question, or comment. Numerous methods have been proposed and their performance compared in the RumourEval shared tasks in 2017 a...
Conference Paper
Hate speech and toxic comments are a common concern of social media platform users. Although these comments are, fortunately, the minority on these platforms, they are still capable of causing harm. Therefore, identifying these comments is an important task for studying and preventing the proliferation of toxicity in social media. Previous work in...
Article
Full-text available
COVID-19 has given rise to a lot of malicious content online, including hate speech, online abuse, and misinformation. British MPs have also received abuse and hate on social media during this time. To understand and contextualise the level of abuse MPs receive, we consider how ministers use social media to communicate about the pandemic, and the c...
Conference Paper
In online debates, there are two opposing sides in which proponents and opponents sentimentally make arguments on various controversial topics. Currently, most debate summarization systems have focused on the generation of generic summaries. However, we argue that these summaries may not entirely fulfill the needs of readers. On some occasions, read...
Preprint
Stance classification can be a powerful tool for understanding whether and which users believe in online rumours. The task aims to automatically predict the stance of replies towards a given rumour, namely support, deny, question, or comment. Numerous methods have been proposed and their performance compared in the RumourEval shared tasks in 2017 a...
Preprint
Hate speech and toxic comments are a common concern of social media platform users. Although these comments are, fortunately, the minority on these platforms, they are still capable of causing harm. Therefore, identifying these comments is an important task for studying and preventing the proliferation of toxicity in social media. Previous work in...
Preprint
Full-text available
COVID-19 has given rise to malicious content online, including online abuse and hate toward British MPs. In order to understand and contextualise the level of abuse MPs receive, we consider how ministers use social media to communicate about the crisis, and the citizen engagement that this generates. The focus of the paper is on a large-scale, mixe...
Article
COVID-19 has given rise to malicious content online, including online abuse and hate toward British MPs. In order to understand and contextualise the level of abuse MPs receive, we consider how ministers use social media to communicate about the crisis, and the citizen engagement that this generates. The focus of the paper is on a large-scale, mixe...
Article
Full-text available
The 2019 UK general election took place against a background of rising online hostility levels toward politicians, and concerns about the impact of this on democracy, as a record number of politicians cited the abuse they had been receiving as a reason for not standing for re-election. We present a four-factor framework in understanding who receive...
Preprint
Full-text available
As COVID-19 sweeps the globe, outcomes depend on effective relationships between the public and decision-makers. In the UK there were uncivil tweets to MPs about perceived UK tardiness to go into lockdown. The pandemic has led to increased attention on ministers with a role in the crisis. However, generally this surge has been civil. Prime minister...
Article
As COVID-19 sweeps the globe, outcomes depend on effective relationships between the public and decision-makers. In the UK there were uncivil tweets to MPs about perceived UK tardiness to go into lockdown. The pandemic has led to increased attention on ministers with a role in the crisis. However, generally this surge has been civil. Prime minister...
Preprint
Full-text available
The explosion of disinformation related to the COVID-19 pandemic has overloaded fact-checkers and media worldwide. To help tackle this, we developed computational methods to support COVID-19 disinformation debunking and social impacts research. This paper presents: 1) the currently largest available manually annotated COVID-19 disinformation catego...
Article
Full-text available
The explosion of disinformation related to the COVID-19 pandemic has overloaded fact-checkers and media worldwide. To help tackle this, we developed computational methods to support COVID-19 disinformation debunking and social impacts research. This paper presents: 1) the currently largest available manually annotated COVID-19 disinformation catego...
Article
Full-text available
We aimed to investigate whether daily fluctuations in mental health-relevant Twitter posts are associated with daily fluctuations in mental health crisis episodes. We conducted a primary and replicated time-series analysis of retrospectively collected data from Twitter and two London mental healthcare providers. Daily numbers of ‘crisis episodes’ w...
Chapter
The presence of misleading content on the web and messaging applications has proven to be a major contemporary problem. This context has generated some initiatives in Linguistics and Computation to investigate not only the informative content but also the media in which this mis/disinformation circulates. This paper describes one initiative, in par...
Article
The 2019 UK general election took place against a background of rising online hostility levels toward politicians and concerns about its impact on democracy. We collected 4.2 million tweets sent to or from election candidates in the six week period spanning from the start of November until shortly after the December 12th election. We found abuse in...
Preprint
Full-text available
The 2019 UK general election took place against a background of rising online hostility levels toward politicians and concerns about polarisation. We collected 4.2 million tweets sent to or from election candidates in the six week period spanning from the start of November until shortly after the December 12th election. We found abuse in 4.46% of r...
Article
Crowdsourcing platforms provide a convenient and scalable way to collect human-generated labels on-demand. This data can be used to train Artificial Intelligence (AI) systems or to evaluate the effectiveness of algorithms. The datasets generated by means of crowdsourcing are, however, dependent on many factors that affect their quality. These inclu...
Preprint
Full-text available
Against a backdrop of tensions related to EU membership, we find levels of online abuse toward UK MPs reach a new high. Race and religion have become pressing topics globally, and in the UK this interacts with "Brexit" and the rise of social media to create a complex social climate in which much can be learned about evolving attitudes. In 8 million...
Article
Against a backdrop of tensions related to EU membership, we find levels of online abuse toward UK MPs reach a new high. Race and religion have become pressing topics globally, and in the UK this interacts with "Brexit" and the rise of social media to create a complex social climate in which much can be learned about evolving attitudes. In 8 million...
Article
Verification of online rumours is becoming an increasingly important task with the prevalence of event discussions on social media platforms. This paper proposes an inner-attention-based neural network model that uses frequent, recurring terms from past rumours to classify a newly emerging rumour as true, false or unverified. Unlike other methods p...
Article
Full-text available
Assessing the credibility of a source of information is important in combating misinformation. In this work, we tackle source credibility assessment as a regression task. For this purpose, we release a dataset containing around 700 news sources along with detailed credibility and transparency scores. These scores are manually assigned to ever...
Article
Full-text available
The ability to discern news sources based on their credibility and transparency is useful for users in making decisions about news consumption. In this paper, we release a dataset of 673 sources with credibility and transparency scores manually assigned. Upon acceptance we will make this dataset publicly available. Furthermore, we compared features...
Conference Paper
This paper describes the participation of team “bertha-von-suttner” in SemEval-2019 Task 4: Hyperpartisan News Detection. Our system uses sentence representations from averaged word embeddings generated from the pre-trained ELMo model with Convolutional Neural Networks and Batch Normalization for predicting hyperpartisan news. The final pre...
Preprint
Full-text available
We extend previous work about general election-related abuse of UK MPs with two new time periods, one in late 2018 and the other in early 2019, allowing previous observations to be extended to new data and the impact of key stages in the UK withdrawal from the European Union on patterns of abuse to be explored. The topics that draw abuse evolve ove...
Article
We extend previous work about general election-related abuse of UK MPs with two new time periods, one in late 2018 and the other in early 2019, allowing previous observations to be extended to new data and the impact of key stages in the UK withdrawal from the European Union on patterns of abuse to be explored. The topics that draw abuse evolve ove...
Preprint
Full-text available
Societal debates and political outcomes are subject to news and social media influences, which are in turn subject to commercial and other forces. Local press are in decline, creating a "news gap". Research shows a contrary relationship between UK regions' economic dependence on EU membership and their voting in the 2016 UK EU membership referendum...
Article
Societal debates and political outcomes are subject to news and social media influences, which are in turn subject to commercial and other forces. Local press are in decline, creating a "news gap". Research shows a contrary relationship between UK regions' economic dependence on EU membership and their voting in the 2016 UK EU membership referendum...
Preprint
Full-text available
The recent past has highlighted the influential role of social networks and online media in shaping public debate on current affairs and political issues. This paper is focused on studying the role of politically-motivated actors and their strategies for influencing and manipulating public opinion online: partisan media, state-backed propaganda, an...
Article
The recent past has highlighted the influential role of social networks and online media in shaping public debate on current affairs and political issues. This paper is focused on studying the role of politically-motivated actors and their strategies for influencing and manipulating public opinion online: partisan media, state-backed propaganda, an...
Chapter
Usage of online textual media is steadily increasing. Daily, more and more news stories, blog posts and scientific articles are added to the online volumes. These are all freely accessible and have been employed extensively in multiple research areas, e.g. automatic text summarization, information retrieval, information extraction, etc. Meanwhile,...