Ivan Srba, PhD
Kempelen Institute of Intelligent Technologies
Publications: 55
Reads: 11,153
Citations: 779
Publications (55)
In the current time of globalization, collaboration among people in virtual environments is becoming an important precondition of success. This trend is also reflected in the educational domain, where students collaborate in various short-term groups that are created repeatedly but change in each round (e.g. in MOOCs). Students in these kinds of dynamic g...
Enormous amounts of knowledge sharing occur every day in community question answering (CQA) sites, some of which (for example, Stack Overflow or Ask Ubuntu) have become popular with software developers and users. In spite of these systems' overall success, problems are emerging in some of them: increased failure and churn rates. To investigate thi...
Community question-answering (CQA) systems, such as Yahoo! Answers or Stack Overflow, belong to a prominent group of successful and popular Web 2.0 applications, which are used every day by millions of users to find answers to complex, subjective, or context-dependent questions. In order to obtain answers effectively, CQA systems should optimally...
Students' performance in Massive Open Online Courses (MOOCs) is enhanced by high quality discussion forums or recently emerging educational Community Question Answering (CQA) systems. Nevertheless, only a small number of students answer questions asked by their peers. This results in instructor overload, and many unanswered questions. To increase s...
The capabilities of recent large language models (LLMs) to generate high-quality content that humans cannot distinguish from human-written texts raise many concerns regarding their misuse. Previous research has shown that LLMs can be effectively misused for generating disinformation news articles following predefined narratives. Their capabilities to...
In the current era of social media and generative AI, the ability to automatically assess the credibility of online social media content is of tremendous importance. Credibility assessment is fundamentally based on aggregating credibility signals, which refer to small units of information, such as content factuality, bias, or the presence of persuasio...
Generative large language models (LLMs) are increasingly used for data augmentation tasks, where text samples are paraphrased (or generated anew) and then used for classifier fine-tuning. Existing works on augmentation leverage few-shot scenarios, where samples are given to LLMs as part of prompts, leading to better augmentations. Yet, the...
Learning with limited labelled data, such as prompting, in-context learning, fine-tuning, meta-learning or few-shot learning, aims to effectively train a model using only a small amount of labelled samples. However, these approaches have been observed to be excessively sensitive to the effects of uncontrolled randomness caused by non-determinism in...
Prompt tuning is a modular and efficient solution for training large language models (LLMs). One of its main advantages is task modularity, making it suitable for multi-task problems. However, current soft-prompt-based methods often sacrifice multi-task modularity, requiring the training process to be fully or partially repeated for each newly adde...
Recent LLMs are able to generate high-quality multilingual texts that humans cannot distinguish from authentic human-written ones. Research in machine-generated text detection is, however, mostly focused on the English language and longer texts, such as news articles, scientific papers or student essays. Social-media texts are usually much shorter and...
While fine-tuning of pre-trained language models generally helps to overcome the lack of labelled training samples, it also displays model performance instability. This instability mainly originates from randomness in initialisation or data shuffling. To address this, researchers either modify the training process or augment the available samples,...
Adapters and Low-Rank Adaptation (LoRA) are parameter-efficient fine-tuning techniques designed to make the training of language models more efficient. Previous results demonstrated that these methods can even improve performance on some classification tasks. This paper complements existing research by investigating how these techniques influence c...
The latest generative large language models (LLMs) have found their application in data augmentation tasks, where small numbers of text samples are LLM-paraphrased and then used to fine-tune downstream models. However, more research is needed to assess how different prompts, seed data selection strategies, filtering methods, or model settings affec...
Fact-checkers are often hampered by the sheer amount of online content that needs to be fact-checked. NLP can help them by retrieving already existing fact-checks relevant to the content being investigated. This paper introduces a new multilingual dataset -- MultiClaim -- for previously fact-checked claim retrieval. We collected 28k posts in 27 lan...
This paper presents the best-performing solution to subtask 3 of SemEval 2023 Task 3, dedicated to persuasion techniques detection. Due to the highly multilingual character of the input data and the large number of predicted labels (23, causing a lack of labelled data for some language-label combinations), we opted for fine-tuning pre-trained trans...
To mitigate the negative effects of false information more effectively, the development of automated AI (artificial intelligence) tools assisting fact-checkers is needed. Despite the existing research, there is still a gap between the fact-checking practitioners' needs and pains and the current AI research. We aspire to bridge this gap by employing...
In this paper, we present results of an auditing study performed over YouTube aimed at investigating how fast a user can get into a misinformation filter bubble, but also what it takes to “burst the bubble”, i.e., revert the bubble enclosure. We employ a sock puppet audit methodology, in which pre-programmed agents (acting as YouTube users) delve i...
In this paper, we describe a black-box sockpuppeting audit which we carried out to investigate the creation and bursting dynamics of misinformation filter bubbles on YouTube. Pre-programmed agents acting as YouTube users stimulated YouTube's recommender systems: they first watched a series of misinformation promoting videos (bubble creation) and th...
False information has a significant negative influence on individuals as well as on society as a whole. Especially in the current COVID-19 era, we witness an unprecedented growth of medical misinformation. To help tackle this problem with machine learning approaches, we are publishing a feature-rich dataset of approx. 317k medical news articles/blog...
The negative effects of misinformation filter bubbles in adaptive systems have been known to researchers for some time. Several studies investigated, most prominently on YouTube, how fast a user can get into a misinformation filter bubble simply by selecting wrong choices from the items offered. Yet, no studies so far have investigated what it take...
Purpose: Partisan news media, which often publish extremely biased, one-sided or even false news, are gaining popularity worldwide and represent a major societal issue. Due to the growing number of such media, automatic detection approaches are in high demand. Automatic detection relies on various indicators (e.g. content characteristics) t...
Hate speech should be tackled and prosecuted based on how it is operationalized. However, the existing theoretical definitions of hate speech are not sufficiently fleshed out or easily operable. To overcome this inadequacy, and with the help of interdisciplinary experts, we propose an empirical definition of hate speech by providing a list of 10 ha...
From a computer science perspective, addressing on-line hate speech is a challenging task that is attracting the attention of both industry (mainly social media platform owners) and academia. In this chapter, we provide an overview of state-of-the-art data-science approaches – how they define hate speech, which tasks they solve to mitigate the phen...
Massive spreading of medical misinformation on the Web has a significant impact on individuals and on society as a whole. The majority of existing tools and approaches for detection of false information rely on features describing content characteristics without verifying its truthfulness against knowledge bases. In addition, such approaches lack e...
While digital space is a place where users increasingly communicate, the recent threat of COVID-19 infection has emphasised even more the necessity of an effective and well-organised online environment. Therefore, it is now more important than ever to deal with various unhealthy phenomena that hinder effective communication and knowle...
This paper is an extended version of the conference paper presented at DIONE 2020, held online on 17 April 2020.
The presence of external links or sources in articles is considered one of the indicators for assessing their quality by the library and information community. In this article, we explore linking patterns of the most popular traditional and “alternative” (partisan) digital news media in two V4 countries of Central Europe: Czech and Slovak R...
This work deals with time-aware recommender systems in the domain of location-based social networks, such as Yelp or Foursquare. We propose a novel method to recommend Points of Interest (POIs) which considers their yearly seasonality and long-term trends. In contrast to the existing methods, we model these temporal aspects specifically for individua...
In university courses as well as in MOOCs, Community Question Answering (CQA) systems have recently been recognized as a promising alternative to standard discussion forums for mediating online discussions. Despite emerging research on educational CQA systems, a study investigating when and how to use these systems to support university education i...
Systems for Community Question Answering (CQA) are well-known on the open web (e.g. Stack Overflow or Quora). They have recently also been adopted for use in the educational domain (mostly in MOOCs) to mediate communication between students and teachers. As students are only novices in the topics they learn about, they may need various scaffoldings to achi...
The success of Community Question Answering (CQA) systems on the open web (e.g. Yahoo! Answers) has motivated their use in new contexts (e.g. education or enterprise) and environments (e.g. inside organizations). In spite of initial research on how their specifics influence the design of CQA systems, many additional problems have not been addre...
Community Question Answering (CQA) systems (e.g. StackOverflow) have gained popularity in recent years. With the increasing community size and amount of user-generated content, the task of expert identification has arisen. To tackle this problem, various reputation mechanisms exist; however, they estimate user reputation mainly according to overall...
Community Question Answering (CQA) systems, such as Yahoo! Answers and Stack Overflow, represent a well-known example of collective intelligence. The existing CQA systems, despite their overall success and popularity, fail to answer a significant number of questions in the required time. One option for scaffolding collaboration in CQA systems is...
Community Question Answering (CQA) is a well-known example of a knowledge management system for effective knowledge sharing in open online communities. In spite of the increasing research effort in recent years, the beneficial effects of CQA systems have not been fully discovered in organizational and educational environments yet. We present a nove...
Web 2.0 has had a tremendous impact on education. It facilitates access and availability of learning content in a variety of new formats, content creation, learning tailored to students’ individual preferences, and collaboration. The range of Web 2.0 tools and features is constantly evolving, with focus on users and ways that enable users to socializ...
One approach to supporting collaboration during formal or informal learning is the application of concepts which have been successfully verified in different domains. In particular, various web-based knowledge sharing applications have been applied as a model for designing learning environments so far (e.g. social networking sites or forums). H...
Nowadays, it is possible to access almost unlimited sources of information through ubiquitous information and communication technologies. However, it is sometimes difficult to find the required information using standard web search engines. In these situations, Internet users can ask their questions in popular community question answering s...
We propose a method for creating different types of study groups with the aim of supporting effective collaboration during learning. We concentrate on small groups which solve short-term well-defined problems. The method is able to use many types of students' characteristics as inputs, e.g. interests, knowledge, but also their collaborative characte...
In recent years we have witnessed the expansion of Web 2.0. Its main feature is allowing users to collaborate in content creation using various means, e.g. annotations, discussions, wikis, blogs or tags. This approach has also influenced web-based learning, for which the term "Learning 2.0" has been introduced. In this paper we explore using tags in su...
The current web is known as a space with constantly growing interactivity among its users. It is changing from a data store into a place of social interaction where people not only search for interesting information, but also communicate and collaborate. Obviously, social networks are the most used places for common interaction among people. We present a...