Stefano Mizzaro
University of Udine | UNIUD · Department of Mathematical and Computer Science

About

113 Publications
12,905 Reads
2,193 Citations


Publications (113)
Preprint
Full-text available
With the degradation of guardrails against mis- and disinformation online, it is more critical than ever to be able to effectively combat it. In this paper, we explore the efficiency and effectiveness of using crowd-sourced truthfulness assessments based on condensed, large language model (LLM) generated summaries of online sources. We compare the...
Article
Full-text available
IIR 2024, the 14th Italian Information Retrieval Workshop, served as the annual event for the Information Retrieval (IR) and Recommender Systems (RS) communities in Italy and for those collaborating with Italian research institutions. This year's event spanned two days and featured studies on various topics within IR, RS, and Large Language Models (LLMs...
Article
Full-text available
While the concept of responsible AI is becoming more and more popular, practitioners and researchers may often struggle to characterize responsible practices in their own work. This paper presents a four-day, PhD-level course on Responsible Artificial Intelligence conducted at the University of Udine by Dr. Damiano Spina. Using a hands-on approach,...
Chapter
Full-text available
This extended abstract presents results from two recent studies [1, 2] aimed at enhancing the practical application and effectiveness of fact-checking systems. La Barbera et al. [1] detail the implementation of crowdsourcing in fact-checking, demonstrating its practical viability through experimental evaluation using a dataset of political public statements...
Article
Full-text available
There is an important ongoing effort aimed at tackling misinformation and performing reliable fact-checking by employing human assessors at scale, with a crowdsourcing-based approach. Previous studies on the feasibility of employing crowdsourcing for the task of misinformation detection have provided inconsistent results: some of them seem to confirm...
Article
Full-text available
The increasing amount of misinformation spread online every day is a huge threat to society. Organizations and researchers are working to counter this misinformation plague. In this setting, human assessors are indispensable to correctly identify, assess, and/or revise the truthfulness of information items, i.e., to perform the fact-checking...
Article
Full-text available
Emergency Medical Services (EMS) are crucial in delivering timely and effective medical care to patients in need. However, the complex and dynamic nature of operations poses challenges for decision-making processes at strategic, tactical, and operational levels. This paper proposes an action-driven strategy for EMS management, employing a multi-objective...
Article
Envisioning a unique approach toward bias and fairness research.
Article
Online misinformation poses a serious threat to modern society. Assessing the veracity of online information is a complex problem that is nowadays addressed by relying heavily on trained fact-checking experts. This solution is not scalable and, given the importance of the problem, the issue has gained the attention of the scientific community...
Chapter
Full-text available
Conversational agents provide new modalities to access and interact with services and applications. Recently, their popularity has surged thanks to advancements in language models. Such agents have been adopted in various fields such as healthcare and education, yet they have received little attention in public administration. We de...
Article
Crowdsourcing has become a common way to collect relevance judgments for large Information Retrieval collections. Crowdsourcing experiments usually employ 100-10,000 workers, but such a number is often decided heuristically. The downside is that the resulting dataset has no guarantee of meeting predefined statistical...
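The statistical-power idea the abstract points at can be sketched as follows; the design (a two-sample t-test) and all parameter values are hypothetical, not taken from the paper:

```python
# A minimal power-analysis sketch: how many workers per condition are needed
# to detect a hypothetical medium effect with 80% power at the 5% level.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Workers needed per condition: {n_per_group:.0f}")  # ~64
```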
Preprint
Full-text available
Due to the widespread use of data-powered systems in our everyday lives, concepts like bias and fairness have gained significant attention among researchers and practitioners, in both industry and academia. Such issues typically emerge from the data used to train supervised machine learning systems, which comes with varying levels of quality. With the...
Article
Full-text available
The detection of contaminants in several environments (e.g., air, water, sewage systems) is of paramount importance to protect people and predict possible dangerous circumstances. Most works do this using classical Machine Learning tools that act on the acquired measurement data. This paper introduces two main elements: a low-cost platform to acquire...
Article
Full-text available
Fairness is fundamental to all information access systems, including recommender systems. However, the landscape of fairness definition and measurement is quite scattered, with many competing definitions that are partial and often incompatible. There is much work focusing on specific, and different, notions of fairness, and there exist dozens of metrics...
Article
Crowdsourcing is the practice of outsourcing a task that would otherwise be performed by one or a few experts to a crowd of individuals. It is often used to collect large amounts of manually created labels that form datasets for training and evaluating supervised machine learning models. When designing a (micro-task) crowdsourcing experiment, it is...
Conference Paper
Full-text available
Most information retrieval effectiveness evaluation metrics assume that systems appending irrelevant documents at the bottom of the ranking are as effective as (or no worse than) systems that have a stopping criterion to truncate the ranking at the right position, avoiding the retrieval of those irrelevant documents at the end. It can be argued, however...
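As a worked illustration of the claim (this is not the metric the paper proposes), standard DCG assigns identical scores to a run that stops after its relevant documents and to the same run padded with irrelevant ones:

```python
import math

def dcg(gains):
    # Standard discounted cumulative gain over a ranked list of gain values.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

truncated = [3, 2, 1]           # run that stops after its relevant documents
padded = [3, 2, 1, 0, 0, 0, 0]  # same run with irrelevant documents appended

print(dcg(truncated) == dcg(padded))  # True: the padding is not penalized
```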
Article
Full-text available
Automatically detecting online misinformation at scale is a challenging and interdisciplinary problem. Deciding what is to be considered truthful information is sometimes controversial and difficult even for educated experts. As the scale of the problem increases, human-in-the-loop approaches to truthfulness that combine both the scalability of mac...
Conference Paper
Full-text available
Most information retrieval effectiveness evaluation metrics assume that systems appending irrelevant documents at the bottom of the ranking are as effective as (or no worse than) systems that have a stopping criterion to 'truncate' the ranking at the right position, avoiding the retrieval of those irrelevant documents at the end. It can be argued, however...
Conference Paper
Due to the increasing amount of information shared online every day, the need for sound and reliable ways of distinguishing between trustworthy and non-trustworthy information is as present as ever. One technique for performing fact-checking at scale is to employ human intelligence in the form of crowd workers. Although earlier work has suggested t...
Conference Paper
Due to their relatively low cost and ability to scale, crowdsourcing-based approaches are widely used to collect a large amount of human-annotated data. To this end, multiple crowdsourcing platforms exist, where requesters can upload tasks and workers can carry them out and obtain payment in return. Such platforms share a task design and deploy wor...
Article
Relevance is a key concept in information retrieval and widely used for the evaluation of search systems using test collections. We present a comprehensive study of the effect of the choice of relevance scales on the evaluation of information retrieval systems. Our work analyzes and compares four crowdsourced scales (2-level, 4-level, and 100-lev...
Article
Full-text available
Recent work has demonstrated the viability of using crowdsourcing as a tool for evaluating the truthfulness of public statements. Under certain conditions such as: (1) having a balanced set of workers with different backgrounds and cognitive abilities; (2) using an adequate set of mechanisms to control the quality of the collected data; and (3) usi...
Article
Full-text available
Recently, the misinformation problem has been addressed with a crowdsourcing-based approach: to assess the truthfulness of a statement, instead of relying on a few experts, a crowd of non-experts is exploited. We study whether crowdsourcing is an effective and reliable method to assess truthfulness during a pandemic, targeting statements related to...
Preprint
Full-text available
Recent work has demonstrated the viability of using crowdsourcing as a tool for evaluating the truthfulness of public statements. Under certain conditions such as: (1) having a balanced set of workers with different backgrounds and cognitive abilities; (2) using an adequate set of mechanisms to control the quality of the collected data; and (3) usi...
Preprint
Full-text available
Recently, the misinformation problem has been addressed with a crowdsourcing-based approach: to assess the truthfulness of a statement, instead of relying on a few experts, a crowd of non-experts is exploited. We study whether crowdsourcing is an effective and reliable method to assess truthfulness during a pandemic, targeting statements related to...
Article
The design and analysis of experimental research in Data Mining (DM) is anchored in a correct choice of the type of task addressed (clustering, classification, regression, etc.). However, although DM is a relatively mature discipline, there is no consensus yet on the taxonomy of DM tasks, their formal characteristics, and their...
Conference Paper
Full-text available
Misinformation is an ever-increasing problem that is difficult for the research community to solve and has a negative impact on society at large. Very recently, the problem has been addressed with a crowdsourcing-based approach to scale up labeling efforts: to assess the truthfulness of a statement, instead of relying on a few experts, a crowd...
Chapter
Full-text available
The rise of online misinformation is posing a threat to the functioning of democratic processes. The ability to algorithmically spread false information through online social networks together with the data-driven ability to profile and micro-target individual users has made it possible to create customized false content that has the potential to i...
Preprint
Misinformation is an ever-increasing problem that is difficult for the research community to solve and has a negative impact on society at large. Very recently, the problem has been addressed with a crowdsourcing-based approach to scale up labeling efforts: to assess the truthfulness of a statement, instead of relying on a few experts, a crowd...
Preprint
Full-text available
In Ordinal Classification tasks, items have to be assigned to classes that have a relative ordering, such as positive, neutral, negative in sentiment analysis. Remarkably, the most popular evaluation metrics for ordinal classification tasks either ignore relevant information (for instance, precision/recall on each of the classes ignores their relat...
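A toy example of the point, on hypothetical data: two prediction sets with identical accuracy are separated by mean absolute error computed over the ordinal class codes:

```python
import numpy as np

# Hypothetical ordinal codes: negative=0, neutral=1, positive=2.
truth     = np.array([0, 0, 1, 2, 2])
pred_near = np.array([1, 0, 1, 1, 2])  # errors land on adjacent classes
pred_far  = np.array([2, 0, 1, 0, 2])  # errors jump across the whole scale

for name, pred in [("near misses", pred_near), ("far misses", pred_far)]:
    acc = np.mean(pred == truth)         # ignores the class ordering
    mae = np.mean(np.abs(pred - truth))  # respects the class ordering
    print(f"{name}: accuracy={acc:.2f}, MAE={mae:.2f}")
# Both have accuracy 0.60, but MAE is 0.40 vs 0.80.
```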
Article
Full-text available
We provide a uniform, general, and complete formal account of evaluation metrics for ranking, classification, clustering, and other information access problems. We leverage concepts from measurement theory, such as scale types and permissible transformation functions, and we capture the nature of evaluation metrics in many tasks by two formal defin...
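A small sketch of the scale-type idea on synthetic data: Kendall's tau, which only uses order, is invariant under any monotone (order-preserving) transformation, while Pearson's correlation is not:

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr

rng = np.random.default_rng(0)
x = rng.random(50)
y = x + rng.normal(0, 0.1, 50)
y_mono = np.exp(5 * y)  # a monotone transformation of y

print(kendalltau(x, y)[0] == kendalltau(x, y_mono)[0])  # True: order preserved
print(pearsonr(x, y)[0], pearsonr(x, y_mono)[0])        # two different values
```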
Preprint
Full-text available
Truthfulness judgments are a fundamental step in the process of fighting misinformation, as they are crucial to train and evaluate classifiers that automatically distinguish true and false statements. Usually such judgments are made by experts, like journalists for political statements or medical doctors for medical statements. In this paper, we focus...
Conference Paper
Full-text available
News content can sometimes be misleading and influence users’ decision-making processes (e.g., voting decisions). Quantitatively assessing the truthfulness of content becomes key, but it is often challenging and thus done by experts. In this work we look at how experts and non-experts assess truthfulness of content by focusing on the effect of the a...
Article
Full-text available
In test collection based evaluation of retrieval effectiveness, it has been suggested to completely avoid using human relevance judgments. Although several methods have been proposed, their accuracy is still limited. In this paper we present two overall contributions. First, we provide a systematic comparison of all the most widely adopted previous...
Conference Paper
Information Retrieval (IR) researchers have often used existing IR evaluation collections and transformed the relevance scale in which judgments have been collected, e.g., to use metrics that assume binary judgments like Mean Average Precision. Such scale transformations are often arbitrary (e.g., 0,1 mapped to 0 and 2,3 mapped to 1) and it is assumed...
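A minimal sketch of why the mapping matters, on hypothetical 4-level judgments: two common binarization thresholds yield different Average Precision values for the very same ranking (AP here divides by the relevant documents retrieved, assuming all relevant documents appear in the ranking):

```python
def average_precision(binary_rels):
    # AP over a ranked list of 0/1 labels: mean precision at each hit.
    hits, precisions = 0, []
    for rank, rel in enumerate(binary_rels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

graded  = [2, 0, 1, 3, 0, 1]                    # hypothetical 4-level judgments
lenient = [1 if g >= 1 else 0 for g in graded]  # map {1,2,3} to relevant
strict  = [1 if g >= 2 else 0 for g in graded]  # map {2,3} to relevant

print(average_precision(lenient))  # ~0.77
print(average_precision(strict))   # 0.75
```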
Conference Paper
Recently proposed methods allow the generation of simulated scores representing the values of an effectiveness metric, but they do not investigate the generation of the actual lists of retrieved documents. In this paper we address this limitation: we present an approach that exploits an evolutionary algorithm and, given a metric score, creates a si...
Chapter
Full-text available
Information retrieval effectiveness evaluation is often carried out by means of test collections. Many works investigated possible sources of bias in such an approach. We propose a systematic approach to identify bias and its causes, and to remove it, thus enforcing fairness in effectiveness evaluation by means of test collections.
Conference Paper
In a test collection setting, topic difficulty can be defined as the average effectiveness of a set of systems for a topic. In this paper we study the effects on the topic difficulty of: (i) the set of retrieval systems; (ii) the underlying document corpus; and (iii) the system components. By generalizing methods recently proposed to study system c...
Chapter
Full-text available
Peer review is a well-known mechanism exploited within the scholarly publishing process to ensure the quality of scientific literature. Such a mechanism, despite being well established and reasonable, is not free from problems, and alternative approaches to peer review have been developed. Such approaches exploit the readers of scientific publications...
Article
Full-text available
When evaluating IR run effectiveness using a test collection, a key question is: What search topics should be used? We explore what happens to measurement accuracy when the number of topics in a test collection is reduced, using the Million Query 2007, TeraByte 2006, and Robust 2004 TREC collections, which all feature more than 50 topics, something...
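The underlying methodology can be sketched, on synthetic scores, as comparing the system ranking induced by the full topic set with the one induced by a random topic subset:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(42)
scores = rng.random((30, 100))  # hypothetical: 30 systems x 100 topics

full_means = scores.mean(axis=1)                  # mean score on all topics
subset = rng.choice(100, size=20, replace=False)  # keep only 20 topics
subset_means = scores[:, subset].mean(axis=1)

tau, _ = kendalltau(full_means, subset_means)
print(f"Rank correlation between full and reduced topic sets: {tau:.2f}")
```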
Chapter
Full-text available
This paper describes Readersourcing 2.0, an ecosystem providing an implementation of the Readersourcing approach proposed by Mizzaro [10]. Readersourcing is proposed as an alternative to the standard peer review activity that aims to exploit the otherwise lost opinions of readers. Readersourcing 2.0 implements two different models based on the so-called...
Conference Paper
Full-text available
We propose an alternative approach to the standard peer review activity, called Readersourcing and originally proposed by Mizzaro [2012], that aims to exploit the otherwise lost opinions of readers of publications. Such an approach can be formalized by means of different models which share t...
Article
The evaluation of retrieval effectiveness by means of test collections is a commonly used methodology in the information retrieval field. Some researchers have addressed the quite fascinating research question of whether it is possible to evaluate effectiveness completely automatically, without human relevance assessments. Since human relevance assessments...
Article
Full-text available
Effectiveness evaluation of information retrieval systems by means of a test collection is a widely used methodology. However, it is rather expensive in terms of resources, time, and money; therefore, many researchers have proposed methods for a cheaper evaluation. One particular approach, on which we focus in this article, is to use fewer topics:...
Conference Paper
This paper proposes a theoretical framework which models the information provided by retrieval systems in terms of Information Theory. The proposed framework makes it possible to formalize: (i) system effectiveness as an information-theoretic similarity between system outputs and human assessments, and (ii) ranking fusion as an information quantity measure. A...
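A generic illustration of the first idea, not the paper's exact formalization: the mutual information between (hypothetical) system output labels and human assessments can serve as an information-theoretic similarity:

```python
# Mutual information between system outputs and human relevance assessments;
# both label lists are hypothetical toy data.
from sklearn.metrics import mutual_info_score

human  = [1, 1, 0, 0, 1, 0, 0, 1]
system = [1, 1, 0, 1, 1, 0, 0, 0]
print(mutual_info_score(human, system))  # higher = more shared information
```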
Preprint
Full-text available
This paper proposes a theoretical framework which models the information provided by retrieval systems in terms of Information Theory. The proposed framework makes it possible to formalize: (i) system effectiveness as an information-theoretic similarity between system outputs and human assessments, and (ii) ranking fusion as an information quantity measure. A...
Article
Full-text available
The purpose of the Strategic Workshop in Information Retrieval in Lorne is to explore the long-range issues of the Information Retrieval field, to recognize challenges that are on, or even over, the horizon, to build consensus on some of the key challenges, and to disseminate the resulting information to the research community. The intent is that this...
Conference Paper
Some methods have been developed for automatic effectiveness evaluation without relevance judgments. We propose to use those methods, and their combination based on a machine learning approach, for query performance prediction. Moreover, since predicting average precision, as is usually done in the query performance prediction literature, is sensitive...
Conference Paper
The unpredictability of user behavior and the need for effectiveness make it difficult to define a suitable research methodology for Information Retrieval (IR). In order to tackle this challenge, we categorize existing IR methodologies along two dimensions: (1) empirical vs. theoretical, and (2) top-down vs. bottom-up. The strengths and drawbacks o...
Conference Paper
In Information Retrieval evaluation, the classical approach of adopting binary relevance judgments has been replaced by multi-level relevance judgments and by gain-based metrics leveraging such multi-level judgment scales. Recent work has also proposed and evaluated unbounded relevance scales by means of Magnitude Estimation (ME) and compared them...
Conference Paper
We propose IRevalOO, a flexible Object Oriented framework that (i) can be used as-is as a replacement of the widely adopted trec_eval software, and (ii) can be easily extended (or "instantiated", in framework terminology) to implement different scenarios of test collection based retrieval evaluation. Instances of IRevalOO can provide a usable and...
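A hypothetical miniature of the framework idea (this is not IRevalOO's actual API): metrics are subclasses that plug into a common evaluation skeleton, so new scenarios are added by instantiation rather than by rewriting the evaluation loop:

```python
from abc import ABC, abstractmethod

class Metric(ABC):
    # Common interface every effectiveness metric implements.
    @abstractmethod
    def score(self, ranked_doc_ids, relevant_ids):
        ...

class PrecisionAtK(Metric):
    def __init__(self, k):
        self.k = k

    def score(self, ranked_doc_ids, relevant_ids):
        top = ranked_doc_ids[:self.k]
        return sum(doc in relevant_ids for doc in top) / self.k

print(PrecisionAtK(5).score(["d1", "d2", "d3", "d4", "d5"], {"d2", "d5"}))  # 0.4
```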
Conference Paper
Full-text available
Several researchers have proposed to reduce the number of topics used in TREC-like initiatives. One research direction that has been pursued is finding the optimal topic subset of a given cardinality, i.e., the one that evaluates the systems/runs most accurately. Such a research direction has been so far mainly theoretical, with almost no indication on how...
Conference Paper
The agreement between relevance assessors is an important but understudied topic in the Information Retrieval literature because of the limited data available about documents assessed by multiple judges. This issue has gained even more importance recently in light of crowdsourced relevance judgments, where it is customary to gather many relevance l...
Article
In the context of micro-task crowdsourcing, each task is usually performed by several workers. This allows researchers to leverage measures of the agreement among workers on the same task, to estimate the reliability of collected data and to better understand answering behaviors of the participants. While many measures of agreement between annotato...
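One standard chance-corrected agreement measure, Cohen's kappa, computed on hypothetical labels from two workers (the article surveys many more measures and their assumptions):

```python
from sklearn.metrics import cohen_kappa_score

worker_a = [1, 0, 1, 1, 0, 1, 0, 0]  # hypothetical binary judgments
worker_b = [1, 0, 1, 0, 0, 1, 1, 0]
print(cohen_kappa_score(worker_a, worker_b))  # 0.5: observed 0.75 vs chance 0.5
```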
Conference Paper
After a network-based analysis of TREC results, Mizzaro and Robertson [4] found the rather unpleasant result that topic ease (i.e., the average effectiveness of the participating systems, measured with average precision) correlates with the ability of topics to predict system effectiveness (defined as topic hubness). We address this issue by: (i) p...
Article
Magnitude estimation is a psychophysical scaling technique for the measurement of sensation, where observers assign numbers to stimuli in response to their perceived intensity. We investigate the use of magnitude estimation for judging the relevance of documents for information retrieval evaluation, carrying out a large-scale user study across 18 T...
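One common way to make magnitude estimation scores comparable across assessors (not necessarily the exact pipeline of this study) is geometric-mean normalization, i.e., centering each assessor's scores in log space so that only the ratios matter:

```python
import numpy as np

def normalize_me(scores):
    # Divide by the assessor's geometric mean (subtract the mean in log space).
    logs = np.log(scores)
    return np.exp(logs - logs.mean())

assessor_a = np.array([10.0, 50.0, 100.0])  # likes large numbers
assessor_b = np.array([1.0, 5.0, 10.0])     # same ratios, smaller numbers
print(normalize_me(assessor_a))  # identical normalized scores for both
print(normalize_me(assessor_b))
```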
Article
Crowdsourcing has become an alternative approach to collect relevance judgments at scale thanks to the availability of crowdsourcing platforms and quality control techniques that make it possible to obtain reliable results. Previous work has used crowdsourcing to ask multiple crowd workers to judge the relevance of a document with respect to a query and studied...
Conference Paper
Magnitude estimation is a psychophysical scaling technique for the measurement of sensation, where observers assign numbers to stimuli in response to their perceived intensity. We investigate the use of magnitude estimation for judging the relevance of documents in the context of information retrieval evaluation, carrying out a large-scale user stu...
Article
Full-text available
Purpose – The purpose of this paper is to discuss the emerging geographic features of current concepts of relevance, and to improve, modify, and extend the framework proposed by Mizzaro (1998). The objective is to define a new framework able to account, more completely and precisely, for the notions of relevance involved in mobile information seeking...
Conference Paper
Magnitude estimation is a psychophysical scaling technique whereby numbers are assigned to stimuli to reflect the ratios of their perceived intensity. We report on a crowdsourcing experiment aimed at understanding if magnitude estimation can be used to gather reliable relevance judgements for documents, as is commonly required for test collection-b...
Conference Paper
Recent work has shown that when documents in a TREC ad hoc collection are partitioned, different rankers will perform optimally on different partitions. This result suggests that choosing different highly effective rankers for each partition and merging the results should be able to improve overall effectiveness. Analyzing results from a novel ora...
Article
Full-text available
Crowdsourcing, i.e., the outsourcing of tasks typically performed by a few experts to a large crowd as an open call, has been shown to be reasonably effective in many cases, like Wikipedia, the chess match of Kasparov against the world in 1999, and several others. The aim of the present paper is to describe the setup of an experimentation of crowds...
Article
We study whether the tasks currently proposed on crowdsourcing platforms are suited to mobile devices. We aim at understanding both (i) which crowdsourcing platforms, among the existing ones, are more suited to mobile devices, and (ii) which kinds of tasks are more suited to mobile devices. Results of four diversified experiments (three user...
Article
This position paper analyzes the current situation in scholarly publishing and peer review practices and presents three theses: (a) we are going to run out of peer reviewers; (b) it is possible to replace referees with readers, an approach that I have named “Readersourcing”; and (c) it is possible to avoid potential weaknesses in the Readersourcing...
Article
Full-text available
Geographic relevance aims to assess the relevance of physical entities (e.g., shops and museums) in geographic space for a mobile user in a given context, thereby shifting the focus from the digital world (the realm of classical information retrieval) to the physical world. We study the elicitation of geographic relevance criteria by means of both...
Article
We discuss the use of tag clouds as a way of visualizing the results of a clustering search engine. We briefly present a specific tag cloud approach and its implementation in the CloudCredo prototype. Then we describe an experimental user study aimed at demonstrating that tag cloud visualization is: (i) as effective as classical tree-like visualiz...
Article
Full-text available
We present a general-purpose solution to Web content and services perusal by means of mobile devices, named Social Context-Aware Browser. This is a novel approach for information access based on users’ context, which exploits social and collaborative models to overcome the limits of existing solutions. Instead of relying on a pool of experts and...
Article
Web searches from mobile devices such as PDAs and cell phones are becoming increasingly popular. However, the traditional list-based search interface paradigm does not scale well to mobile devices due to their inherent limitations. In this article, we investigate the application of search results clustering, used with some success for desktop compu...
Article
Full-text available
I will present the Context Aware Browser, a novel paradigm for context-aware access to Web contents with mobile devices. The idea is to allow automatic download of Web pages, and even automatic execution of Web applications, on the user's own mobile device. The Web resources are not simply pushed to the mobile device; rather, they are selected on the b...
Conference Paper
Full-text available
The recent trend towards pervasive computing, with information technology becoming omnipresent and entering all aspects of modern living, means that we are moving away from the desktop computer as the traditional paradigm of interaction between humans and technology. This shift towards ubiquitous computing is perhaps most evident in the increa...
Conference Paper
Full-text available
Although mobile information retrieval is seen as the next frontier of the search market, the rendering of results on mobile devices is still unsatisfactory. We present Credino, a clustering engine for PDAs based on the theory of concept lattices that can help overcome some specific challenges posed by small-screen, narrow-band devices. Credino is p...
Article
…able to negotiate automatically with the other departments that are located on the same campus. At present, the negotiation takes place verbally among the deans of the departments, the administrative staffs, and/or the operators of the timetabling system (i.e., ourselves). It requires good "diplomatic skills", it is quite time-consuming, and in gene...
Article
We present an approach to increasing the effectiveness of ranked-output retrieval systems that relies on graphical display and user manipulation of "views" of retrieval results, where a view is the subset of retrieved documents that contain a specified subset of query terms. This approach has been implemented in a system named VIEWER (VIEwing WEb R...