Salvatore Ruggieri's research while affiliated with Università di Pisa and other places

Publications (276)

Chapter
Bias in the training data can be inherited by Machine Learning models and then reproduced in socially-sensitive decision-making tasks leading to potentially discriminatory decisions. The state-of-the-art of pre-processing methods to mitigate unfairness in datasets mainly considers a single binary sensitive attribute. We devise GenFair, a fairness-e...
Chapter
Explaining opaque Machine Learning (ML) models is an increasingly relevant problem. Current explanation in AI (XAI) methods suffer several shortcomings, among others an insufficient incorporation of background knowledge, and a lack of abstraction and interactivity with the user. We propose REASONX, an explanation method based on Constraint Logic Pr...
Preprint
Explaining opaque Machine Learning (ML) models is an increasingly relevant problem. Current explanation in AI (XAI) methods suffer several shortcomings, among others an insufficient incorporation of background knowledge, and a lack of abstraction and interactivity with the user. We propose REASONX, an explanation method based on Constraint Logic Pr...
Preprint
Full-text available
In eXplainable Artificial Intelligence (XAI), several counterfactual explainers have been proposed, each focusing on some desirable properties of counterfactual instances: minimality, actionability, stability, diversity, plausibility, discriminative power. We propose an ensemble of counterfactual explainers that boosts weak explainers, which provid...
Preprint
Full-text available
In this paper we present the initial screening order problem, a crucial step within candidate screening. It involves a human-like screener with an objective to find the first k suitable candidates rather than the best k suitable candidates in a candidate pool given an initial screening order. The initial screening order represents the way in which...
Article
There is a fast-growing literature in addressing the fairness of AI models (fair-AI), with a continuous stream of new conceptual frameworks, methods, and tools. How much can we trust them? How much do they actually impact society? We take a critical focus on fair-AI and survey issues, simplifications, and mistakes that researchers and practitioners...
Article
Selective classification (also known as classification with reject option) conservatively extends a classifier with a selection function to determine whether or not a prediction should be accepted (i.e., trusted, used, deployed). This is a highly relevant issue in socially sensitive tasks, such as credit scoring. State-of-the-art approaches rely on...
Preprint
Many high-performing machine learning models are not interpretable. As they are increasingly used in decision scenarios that can critically affect individuals, it is necessary to develop tools to better understand their outputs. Popular explanation methods include contrastive explanations. However, they suffer several shortcomings, among others an...
Preprint
In uses of pre-trained machine learning models, it is a known issue that the target population in which the model is being deployed may not have been reflected in the source population with which the model was trained. This can result in a biased model when deployed, leading to a reduction in model performance. One risk is that, as the population c...
Preprint
Full-text available
We present counterfactual situation testing (CST), a causal data mining framework for detecting discrimination in classifiers. CST aims to answer in an actionable and meaningful way the intuitive question "what would have been the model outcome had the individual, or complainant, been of a different protected status?" It extends the legally-grounde...
Conference Paper
Many high-performing machine learning models are not interpretable. As they are increasingly used in decision scenarios that can critically affect individuals, it is necessary to develop tools to better understand their outputs. Popular explanation methods include contrastive explanations. However, they suffer several shortcomings, among others an...
Article
Full-text available
Recent years have witnessed the rise of accurate but obscure classification models that hide the logic of their internal decision processes. Explaining the decision taken by a black-box classifier on a specific input instance is therefore of striking interest. We propose a local rule-based model-agnostic explanation method providing stable and acti...
Preprint
Full-text available
Selective classification (or classification with a reject option) pairs a classifier with a selection function to determine whether or not a prediction should be accepted. This framework trades off coverage (probability of accepting a prediction) with predictive performance, typically measured by distributive loss functions. In many application sce...
Article
Full-text available
Turnover intention is an employee’s reported willingness to leave her organization within a given period of time and is often used for studying actual employee turnover. Since employee turnover can have a detrimental impact on business and the labor market at large, it is important to understand the determinants of such a choice. We describe and an...
Article
Full-text available
We present XSPELLS, a model-agnostic local approach for explaining the decisions of black box models in classification of short texts. The explanations provided consist of a set of exemplar sentences and a set of counter-exemplar sentences. The former are examples classified by the black box with the same label as the text to explain. The latter a...
Preprint
Full-text available
Protected attributes are often presented as categorical features that need to be encoded before feeding them into a machine learning algorithm. Encoding these attributes is paramount as they determine the way the algorithm will learn from the data. Categorical feature encoding has a direct impact on the model performance and fairness. In this work,...
Article
Full-text available
Causality is a complex concept, which roots its developments across several fields, such as statistics, economics, epidemiology, computer science, and philosophy. In recent years, the study of causal relationships has become a crucial part of the Artificial Intelligence community, as causality can be a key tool for overcoming some limitations of co...
Article
Full-text available
The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the “phase 2” of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are bei...
Chapter
In eXplainable Artificial Intelligence (XAI), several counterfactual explainers have been proposed, each focusing on some desirable properties of counterfactual instances: minimality, actionability, stability, diversity, plausibility, discriminative power. We propose an ensemble of counterfactual explainers that boosts weak explainers, which provid...
Article
Fairness in Artificial Intelligence rightfully receives a lot of attention these days. Many life-impacting decisions are being partially automated, including health-care resource planning decisions, insurance and credit risk predictions, recidivism predictions, etc. Much of work appearing on this topic within the Data Mining, Machine Learning and A...
Article
We study the problem of estimating the total number of searches (volume) of queries in a specific domain, which were submitted to a search engine in a given time period. Our statistical model assumes that the distribution of searches follows Zipf’s law, and that the observed sample volumes are biased according to three possible scenarios. These...
Preprint
Full-text available
We study the problem of estimating the total number of searches (volume) of queries in a specific domain, which were submitted to a search engine in a given time period. Our statistical model assumes that the distribution of searches follows Zipf's law, and that the observed sample volumes are biased according to three possible scenarios. These...
Chapter
Full-text available
We present XSPELLS, a model-agnostic local approach for explaining the decisions of a black box model for sentiment classification of short texts. The explanations provided consist of a set of exemplar sentences and a set of counter-exemplar sentences. The former are examples classified by the black box with the same label as the text to explain....
Preprint
Full-text available
The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being...
Article
Full-text available
The discovery of discriminatory bias in human or automated decision making is a task of increasing importance and difficulty, exacerbated by the pervasive use of machine learning and data mining. Currently, discrimination discovery largely relies upon correlation analysis of decisions records, disregarding the impact of confounding biases. We prese...
Article
The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the “phase 2” of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are bei...
Article
Full-text available
Artificial Intelligence (AI)‐based systems are widely employed nowadays to make decisions that have far‐reaching impact on individuals and society. Their decisions might affect everyone, everywhere, and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for...
Preprint
Full-text available
AI-based systems are widely employed nowadays to make decisions that have far-reaching impacts on individuals and society. Their decisions might affect everyone, everywhere and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and...
Article
The rise of sophisticated machine learning models has brought accurate but obscure decision systems, which hide their logic, thus undermining transparency, trust, and the adoption of AI in socially sensitive and safety-critical contexts. We introduce a local rule-based explanation method providing faithful explanations of the decision made by a bla...
Article
Black box AI systems for automated decision making, often based on machine learning over (big) data, map a user’s features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases inherited by the algorithms from human prejudices and collection artifacts hidden in...
Conference Paper
We study the problem of estimating the total volume of queries of a specific domain, which were submitted to the Google search engine in a given time period. Our statistical model assumes a Zipf's law distribution of the population in the reference domain, and a non-uniform or noisy sampling of queries. Parameters of the distribution are estimated...
Preprint
Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model, are widely recognized as human-interpretable. However, such models are generated as part of a...
Article
ICT risk assessment and management relies on the analysis of data on the joint behavior of a target system and its attackers. The tools in the Haruspex suite model intelligent, goal-oriented attackers that reach their goals through sequences of attacks. The tools synthetically generate these sequences through a Monte Carlo method that runs multiple...
Article
Full-text available
We introduce a framework for the data-driven analysis of social segregation of minority groups, and challenge it on a complex scenario. The framework builds on quantitative measures of segregation, called segregation indexes, proposed in the social science literature. The segregation discovery problem is introduced, which consists of searching sub-...
Preprint
Full-text available
Black box systems for automated decision making, often based on machine learning over (big) data, map a user's features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases hidden in the algorithms, due to human prejudices and collection artifacts hidden in the...
Preprint
Full-text available
The recent years have witnessed the rise of accurate but obscure decision systems which hide the logic of their internal decision processes to the users. The lack of explanations for the decisions of black box systems is a key ethical issue, and a limitation to the adoption of machine learning components in socially sensitive and safety-critical co...
Chapter
Full-text available
Security risk assessment and prevention in ICT systems rely on the analysis of data on the joint behavior of the system and its (malicious) users. The Haruspex tool models intelligent, goal-oriented agents that reach their goals through attack sequences. Data is synthetically generated through a Monte Carlo method that runs multiple simulations of...
Article
Segregation discovery consists of finding contexts of segregation of social groups distributed across units or communities. The SCube system implements an approach for segregation discovery on top of frequent itemset mining, by offering to the analyst a multi-dimensional (segregation) data cube for exploratory data analysis. The demonstration first...
Conference Paper
Attributed graphs model real networks by enriching their nodes with attributes accounting for properties. Several techniques have been proposed for partitioning these graphs into clusters that are homogeneous with respect to both semantic attributes and to the structure of the graph. However, time and space complexities of state of the art algorith...
Chapter
During the last 35 years, data management principles such as physical and logical independence, declarative querying and cost-based optimization have led to profound pervasiveness of relational databases in any kind of organization. More importantly, these technical advances have enabled the first round of business intelligence applications and...
Article
Full-text available
Attributed graphs model real networks by enriching their nodes with attributes accounting for properties. Several techniques have been proposed for partitioning these graphs into clusters that are homogeneous with respect to both semantic attributes and to the structure of the graph. However, time and space complexities of state of the art algorith...
Conference Paper
The acceptance of analytical methods for discrimination discovery by practitioners and legal scholars can be only achieved if the data mining and machine learning communities will be able to provide case studies, methodological refinements, and the consolidation of a KDD process. We summarize here an approach along these directions.
Article
Social discrimination is considered illegal and unethical in the modern world. Such discrimination is often implicit in observed decisions' datasets, and anti-discrimination organizations seek to discover cases of discrimination and to understand the reasons behind them. Previous work in this direction adopted simple observational data analysis; ho...
Article
Full-text available
The aim of this article is to synthetically describe the research projects that a selection of Italian universities is undertaking in the context of big data. Far from being exhaustive, this article has the objective of offering a sample of distinct applications that address the issue of managing huge amounts of data in Italy, collected in relation...
Conference Paper
We introduce a framework for a data-driven analysis of segregation of minority groups in social networks, and challenge it on a complex scenario. The framework builds on quantitative measures of segregation, called segregation indexes, proposed in the social science literature. The segregation discovery problem consists of searching sub-graphs and...
Article
AI can significantly support the societal and economic development driven by the rapidly emerging demand for innovative technologies in many application areas both in the public and private sectors. A special track of the XIII Symposium of the Italian Association for Artificial Intelligence (AI*IA 2014) held in Pisa, Italy 10-12 December 2014, enti...
Conference Paper
Full-text available
Social discrimination discovery from data is an important task to identify illegal and unethical discriminatory patterns towards protected-by-law groups, e.g., ethnic minorities. We deploy privacy attack strategies as tools for discrimination discovery under hard assumptions which have rarely been tackled in the literature: indirect discrimination dis...
Article
We investigate the relation between t-closeness, a well-known model of data anonymization against attribute disclosure, and α-protection, a model of the social discrimination hidden in data. We show that t-closeness implies bd_f(t)-protection, for a bound function bd_f() depending on the discrimination measure f() at hand. This allows us to adapt i...
Article
Full-text available
The whole computer hardware industry embraced multi-core. The extreme optimisation of sequential algorithms is then no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an...
Article
Objective: Assessing the frequency of Wearing-Off (WO) in Parkinson's disease (PD) patients, and its impact on Quality of Life (QoL). Methods: Consecutive ambulatory patients, who were on dopaminergic treatment for ≥1 year, were included in this multicentre, observational cross-sectional study. In a single visit, WO was diagnosed based on neurologis...
Conference Paper
We investigate the relation between t-closeness, a well-known model of data anonymization, and α-protection, a model of data discrimination. We show that t-closeness implies bd(t)-protection, for a bound function bd() depending on the discrimination measure at hand. This allows us to adapt an inference control method, the Mondrian multidimensional...
Article
The collection and analysis of observational and experimental data represent the main tools for assessing the presence, the extent, the nature, and the trend of discrimination phenomena. Data analysis techniques have been proposed in the last 50 years in the economic, legal, statistical, and, recently, in the data mining literature. This is not sur...
Article
Discovering contexts of unfair decisions in a dataset of historical decision records is a non-trivial problem. It requires the design of ad hoc methods and techniques of analysis, which have to comply with existing laws and with legal argumentations. While some data mining techniques have been adapted to the purpose, the state-of-the-art of researc...
Article
In this paper, we explore the computational complexity of the conjunctive fragment of the first-order theory of linear arithmetic. Quantified propositional formulas of linear inequalities with (k−1) quantifier alternations are log-space complete in Σ_k^P or Π_k^P depending on the initial quantifier. We show that when we restrict ourselves to quant...
Conference Paper
Parameterized linear systems allow for modelling and reasoning over classes of polyhedra. Collections of squares, rectangles, polytopes, and so on, can readily be defined by means of linear systems with parameters. In this paper, we investigate the problem of learning a parameterized linear system whose class of polyhedra includes a given set of ex...
Article
A Quantified Linear Implication (QLI) is an inclusion query over two polyhedral sets, with a quantifier string that specifies which variables are existentially quantified and which are universally quantified. Equivalently, it can be viewed as a quantified implication of two systems of linear inequalities. In this paper, we provide a 2-person game s...
Article
Full-text available
Information about patients' adherence to therapy represents a primary issue in Parkinson's disease (PD) management. To perform the linguistic validation of the Italian version of the self-rated 8-Item Morisky Medical Adherence Scale (MMAS-8) and to describe in a sample of Italian patients affected by PD the adherence to anti-Parkinson drug therapy...
Chapter
Discrimination discovery from data consists in the extraction of discriminatory situations and practices hidden in a large amount of historical decision records. We discuss the challenging problems in discrimination discovery, and present, in a unified form, a framework based on classification rules extraction and filtering on the basis of legally-g...
Article
Discrimination data analysis has been investigated for the last fifty years in a large body of social, legal, and economic studies. Recently, discrimination discovery and prevention has become a blooming research topic in the knowledge discovery community. This chapter provides a multi-disciplinary annotated bibliography of the literature on disc...
Conference Paper
The selection of projects for funding can hide discriminatory decisions. We present a case study investigating gender discrimination in a dataset of scientific research proposals submitted to an Italian national call. The method for the analysis relies on a data mining classification strategy that is inspired by a legal methodology for proving evid...
Article
Extending linear constraints by admitting parameters allows for more abstract problem modeling and reasoning. A lot of focus has been given to conducting research that demonstrates the usefulness of parameterized linear constraints and implementing tools that utilize their modeling strength. However, there is no approach that considers basic theore...
Article
Full-text available
Data mining approaches for discrimination discovery unveil contexts of possible discrimination against protected-by-law groups by extracting classification rules from a dataset of historical decision records. Rules are ranked according to some legally-grounded contrast measure defined over a 4-fold contingency table, including risk difference, risk...
Conference Paper
Full-text available
In this paper we discuss the computational complexities of procedures for inclusion queries over polyhedral sets. The polyhedral sets that we consider occur in a wide range of applications, ranging from logistics to program verification. The goal of our study is to establish boundaries between hard and easy problems in this context.
Conference Paper
With the support of the legally-grounded methodology of situation testing, we tackle the problems of discrimination discovery and prevention from a dataset of historical decisions by adopting a variant of k-NN classification. A tuple is labeled as discriminated if we can observe a significant difference of treatment among its neighbors belonging to...
Technical Report
Full-text available
The whole computer hardware industry embraced multicores. For these machines, the extreme optimisation of sequential algorithms is no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This pa...
Conference Paper
Full-text available
We present a knowledge discovery case study on customer classification having the objective of mining the distinctive characteristics of new customers of a service of tax return. Two general approaches are described. The first one, a symbolic approach, is based on extracting and ranking classification rules on the basis of significativeness measure...
Conference Paper
Concise representations of frequent itemsets sacrifice readability and direct interpretability by a data analyst of the concise patterns extracted. In this paper, we introduce an extension of itemsets, called regular, with an immediate semantics and interpretability, and a conciseness comparable to closed itemsets. Regular itemsets allow for specif...
Article
Full-text available
We present a type system for linear constraints over the reals intended for reasoning about the input-output directionality of variables. Types model the properties of definiteness, range width or approximation, lower and upper bounds of variables in a linear constraint. Several proof procedures are presented for inferring the type of a variable a...
Article
Owing to uncertainty on the pathogenic mechanisms underlying motor neuron degeneration in amyotrophic lateral sclerosis (ALS) riluzole remains the only available therapy, with only marginal effects on disease survival. Here we review some of the recent advances in the search for disease-modifying drugs for ALS based on their putative neuroprotectiv...
Conference Paper
Full-text available
The whole computer hardware industry embraced multicores. For these machines, the extreme optimisation of sequential algorithms is no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This pa...
Conference Paper
Full-text available
Discrimination discovery in databases consists in finding unfair practices against minorities which are hidden in a dataset of historical decisions. The DCUBE system implements the approach of [5], which is based on classification rule extraction and analysis, by centering the analysis phase around an Oracle database. The proposed demonstration gui...
Article
Full-text available
In the context of civil rights law, discrimination refers to unfair or unequal treatment of people based on membership to a category or a minority, without regard to individual merit. Discrimination in credit, mortgage, insurance, labor market, and education has been investigated by researchers in economics and human sciences. With the advent of au...
Conference Paper
Full-text available
We introduce an extension of linear constraints, called linear-range constraints, which allows for (meta-)reasoning about the approximation width of variables. Semantics for linear-range constraints is provided in terms of parameterized linear systems. We devise procedures for checking satisfiability and for entailing the maximal width of a varia...
Article
The intermittent oral intake of the dopamine (DA) precursor L-3,4-dihydroxyphenylalanine (L-DOPA) is the classic therapy of Parkinson's disease (PD). In this way, the drug precursor can be metabolised into the active neurotransmitter DA. Although this occurs throughout the brain, the therapeutic relief is believed to be due to restoring extracellul...
Conference Paper
Full-text available
We present a reference model for finding (prima facie) evidence of discrimination in datasets of historical decision records in socially sensitive tasks, including access to credit, mortgage, insurance, labor market and other benefits. We formalize the process of direct and indirect discrimination discovery in a rule-based framework, by modelling p...
Conference Paper
Full-text available
Discrimination in social sense (e.g., against minorities and disadvantaged groups) is the subject of many laws worldwide, and it has been extensively studied in the social and economic sciences. We tackle the problem of determining, given a dataset of historical decision records, a precise measure of the degree of discrimination suffered by a given...
Article
Seven patients, six suffering from amyotrophic lateral sclerosis (ALS) and one from Friedreich ataxia, were treated with a placebo i.v. infusion during the first day and with TRH-T i.v. infusion at a rate of 2 mg/h for 8 h daily (total daily dosage 16 mg) on the 2 consecutive days. Continuous blood pressure (BP) and EKG monitorings were pe...
Article
Full-text available
Proc Natl Acad Sci U S A. 2008 Feb 12;105(6):2052-7. doi: 10.1073/pnas.0708022105. Epub 2008 Feb 4. Lithium delays progression of amyotrophic lateral sclerosis. Fornai F, Longone P, Cafaro L, Kastsiuchenka O, Ferrucci M, Manca ML, Lazzeri G, Spalloni A, Bellio N, Lenzi P, Modugno N, Siciliano G, Isidoro C, Murri L, Ruggieri S, Paparelli A....