Salvatore Ruggieri's research while affiliated with Università di Pisa and other places
Publications (276)
Bias in the training data can be inherited by Machine Learning models and then reproduced in socially-sensitive decision-making tasks leading to potentially discriminatory decisions. The state-of-the-art of pre-processing methods to mitigate unfairness in datasets mainly considers a single binary sensitive attribute. We devise GenFair, a fairness-e...
Explaining opaque Machine Learning (ML) models is an increasingly relevant problem. Current explanation in AI (XAI) methods suffer from several shortcomings, among others an insufficient incorporation of background knowledge, and a lack of abstraction and interactivity with the user. We propose reasonx, an explanation method based on Constraint Logic Pr...
Explaining opaque Machine Learning (ML) models is an increasingly relevant problem. Current explanation in AI (XAI) methods suffer from several shortcomings, among others an insufficient incorporation of background knowledge, and a lack of abstraction and interactivity with the user. We propose REASONX, an explanation method based on Constraint Logic Pr...
In eXplainable Artificial Intelligence (XAI), several counterfactual explainers have been proposed, each focusing on some desirable properties of counterfactual instances: minimality, actionability, stability, diversity, plausibility, discriminative power. We propose an ensemble of counterfactual explainers that boosts weak explainers, which provid...
In this paper we present the initial screening order problem, a crucial step within candidate screening. It involves a human-like screener with an objective to find the first k suitable candidates rather than the best k suitable candidates in a candidate pool given an initial screening order. The initial screening order represents the way in which...
There is a fast-growing literature in addressing the fairness of AI models (fair-AI), with a continuous stream of new conceptual frameworks, methods, and tools. How much can we trust them? How much do they actually impact society? We take a critical focus on fair-AI and survey issues, simplifications, and mistakes that researchers and practitioners...
Selective classification (also known as classification with reject option) conservatively extends a classifier with a selection function to determine whether or not a prediction should be accepted (i.e., trusted, used, deployed). This is a highly relevant issue in socially sensitive tasks, such as credit scoring. State-of-the-art approaches rely on...
Many high-performing machine learning models are not interpretable. As they are increasingly used in decision scenarios that can critically affect individuals, it is necessary to develop tools to better understand their outputs. Popular explanation methods include contrastive explanations. However, they suffer from several shortcomings, among others an...
In uses of pre-trained machine learning models, it is a known issue that the target population in which the model is being deployed may not have been reflected in the source population with which the model was trained. This can result in a biased model when deployed, leading to a reduction in model performance. One risk is that, as the population c...
We present counterfactual situation testing (CST), a causal data mining framework for detecting discrimination in classifiers. CST aims to answer in an actionable and meaningful way the intuitive question "what would have been the model outcome had the individual, or complainant, been of a different protected status?" It extends the legally-grounde...
Recent years have witnessed the rise of accurate but obscure classification models that hide the logic of their internal decision processes. Explaining the decision taken by a black-box classifier on a specific input instance is therefore of striking interest. We propose a local rule-based model-agnostic explanation method providing stable and acti...
Selective classification (or classification with a reject option) pairs a classifier with a selection function to determine whether or not a prediction should be accepted. This framework trades off coverage (probability of accepting a prediction) with predictive performance, typically measured by distributive loss functions. In many application sce...
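As a minimal illustration of the coverage/performance trade-off described in this abstract, the sketch below pairs a classifier's confidence scores with a threshold-based selection function. The thresholding rule, the function name `selective_metrics`, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Optional, Sequence, Tuple


def selective_metrics(
    confidences: Sequence[float],
    predictions: Sequence[int],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[float, Optional[float]]:
    """Return (coverage, accuracy on accepted predictions).

    Hypothetical selection function: accept a prediction when its
    confidence is at least `threshold`. Accuracy is None if nothing
    is accepted.
    """
    accepted = [i for i, c in enumerate(confidences) if c >= threshold]
    coverage = len(accepted) / len(confidences)
    if not accepted:
        return coverage, None
    correct = sum(1 for i in accepted if predictions[i] == labels[i])
    return coverage, correct / len(accepted)


# Toy data (illustrative only).
conf = [0.95, 0.60, 0.80, 0.55]
pred = [1, 0, 1, 1]
true = [1, 1, 1, 0]

coverage, accuracy = selective_metrics(conf, pred, true, threshold=0.75)
print(coverage, accuracy)
```

Raising the threshold lowers coverage and, on this toy data, raises the accuracy of the accepted predictions; lowering it does the opposite.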
Turnover intention is an employee’s reported willingness to leave her organization within a given period of time and is often used for studying actual employee turnover. Since employee turnover can have a detrimental impact on business and the labor market at large, it is important to understand the determinants of such a choice. We describe and an...
We present xspells, a model-agnostic local approach for explaining the decisions of black box models in classification of short texts. The explanations provided consist of a set of exemplar sentences and a set of counter-exemplar sentences. The former are examples classified by the black box with the same label as the text to explain. The latter a...
Protected attributes are often presented as categorical features that need to be encoded before feeding them into a machine learning algorithm. Encoding these attributes is paramount as they determine the way the algorithm will learn from the data. Categorical feature encoding has a direct impact on the model performance and fairness. In this work,...
Causality is a complex concept, which roots its developments across several fields, such as statistics, economics, epidemiology, computer science, and philosophy. In recent years, the study of causal relationships has become a crucial part of the Artificial Intelligence community, as causality can be a key tool for overcoming some limitations of co...
The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the “phase 2” of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are bei...
Fairness in Artificial Intelligence rightfully receives a lot of attention these days. Many life-impacting decisions are being partially automated, including health-care resource planning decisions, insurance and credit risk predictions, recidivism predictions, etc. Much of the work appearing on this topic within the Data Mining, Machine Learning and A...
We study the problem of estimating the total number of searches (volume) of queries in a specific domain, which were submitted to a search engine in a given time period. Our statistical model assumes that the distribution of searches follows a Zipf’s law, and that the observed sample volumes are biased according to three possible scenarios. These...
We study the problem of estimating the total number of searches (volume) of queries in a specific domain, which were submitted to a search engine in a given time period. Our statistical model assumes that the distribution of searches follows a Zipf's law, and that the observed sample volumes are biased according to three possible scenarios. These...
We present xspells, a model-agnostic local approach for explaining the decisions of a black box model for sentiment classification of short texts. The explanations provided consist of a set of exemplar sentences and a set of counter-exemplar sentences. The former are examples classified by the black box with the same label as the text to explain....
The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being...
The discovery of discriminatory bias in human or automated decision making is a task of increasing importance and difficulty, exacerbated by the pervasive use of machine learning and data mining. Currently, discrimination discovery largely relies upon correlation analysis of decisions records, disregarding the impact of confounding biases. We prese...
Artificial Intelligence (AI)‐based systems are widely employed nowadays to make decisions that have far‐reaching impact on individuals and society. Their decisions might affect everyone, everywhere, and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for...
AI-based systems are widely employed nowadays to make decisions that have far-reaching impacts on individuals and society. Their decisions might affect everyone, everywhere and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and...
The rise of sophisticated machine learning models has brought accurate but obscure decision systems, which hide their logic, thus undermining transparency, trust, and the adoption of AI in socially sensitive and safety-critical contexts. We introduce a local rule-based explanation method providing faithful explanations of the decision made by a bla...
Black box AI systems for automated decision making, often based on machine learning over (big) data, map a user’s features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases inherited by the algorithms from human prejudices and collection artifacts hidden in...
We study the problem of estimating the total volume of queries of a specific domain, which were submitted to the Google search engine in a given time period. Our statistical model assumes a Zipf's law distribution of the population in the reference domain, and a non-uniform or noisy sampling of queries. Parameters of the distribution are estimated...
Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model, are widely recognized as human-interpretable. However, such models are generated as part of a...
ICT risk assessment and management relies on the analysis of data on the joint behavior of a target system and its attackers. The tools in the Haruspex suite model intelligent, goal-oriented attackers that reach their goals through sequences of attacks. The tools synthetically generate these sequences through a Monte Carlo method that runs multiple...
We introduce a framework for the data-driven analysis of social segregation of minority groups, and challenge it on a complex scenario. The framework builds on quantitative measures of segregation, called segregation indexes, proposed in the social science literature. The segregation discovery problem is introduced, which consists of searching sub-...
Black box systems for automated decision making, often based on machine learning over (big) data, map a user's features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases hidden in the algorithms, due to human prejudices and collection artifacts hidden in the...
The recent years have witnessed the rise of accurate but obscure decision systems which hide the logic of their internal decision processes to the users. The lack of explanations for the decisions of black box systems is a key ethical issue, and a limitation to the adoption of machine learning components in socially sensitive and safety-critical co...
Security risk assessment and prevention in ICT systems rely on the analysis of data on the joint behavior of the system and its (malicious) users. The Haruspex tool models intelligent, goal-oriented agents that reach their goals through attack sequences. Data is synthetically generated through a Monte Carlo method that runs multiple simulations of...
Segregation discovery consists of finding contexts of segregation of social groups distributed across units or communities. The SCube system implements an approach for segregation discovery on top of frequent itemset mining, by offering to the analyst a multi-dimensional (segregation) data cube for exploratory data analysis. The demonstration first...
Attributed graphs model real networks by enriching their nodes with attributes accounting for properties. Several techniques have been proposed for partitioning these graphs into clusters that are homogeneous with respect to both semantic attributes and to the structure of the graph. However, time and space complexities of state of the art algorith...
During the last 35 years, data management principles such as physical and logical independence, declarative querying and cost-based optimization have led to profound pervasiveness of relational databases in any kind of organization. More importantly, these technical advances have enabled the first round of business intelligence applications and...
The acceptance of analytical methods for discrimination discovery by practitioners and legal scholars can only be achieved if the data mining and machine learning communities are able to provide case studies, methodological refinements, and the consolidation of a KDD process. We summarize here an approach along these directions.
Social discrimination is considered illegal and unethical in the modern world. Such discrimination is often implicit in observed decisions' datasets, and anti-discrimination organizations seek to discover cases of discrimination and to understand the reasons behind them. Previous work in this direction adopted simple observational data analysis; ho...
The aim of this article is to synthetically describe the research projects that a selection of Italian universities is undertaking in the context of big data. Far from being exhaustive, this article has the objective of offering a sample of distinct applications that address the issue of managing huge amounts of data in Italy, collected in relation...
We introduce a framework for a data-driven analysis of segregation of minority groups in social networks, and challenge it on a complex scenario. The framework builds on quantitative measures of segregation, called segregation indexes, proposed in the social science literature. The segregation discovery problem consists of searching sub-graphs and...
AI can significantly support the societal and economic development driven by the rapidly emerging demand for innovative technologies in many application areas both in the public and private sectors. A special track of the XIII Symposium of the Italian Association for Artificial Intelligence (AI*IA 2014) held in Pisa, Italy 10-12 December 2014, enti...
Social discrimination discovery from data is an important task to identify illegal and unethical discriminatory patterns towards protected-by-law groups, e.g., ethnic minorities. We deploy privacy attack strategies as tools for discrimination discovery under hard assumptions which have rarely been tackled in the literature: indirect discrimination dis...
We investigate the relation between t-closeness, a well-known model of data anonymization against attribute disclosure, and α-protection, a model of the social discrimination hidden in data. We show that t-closeness implies bd_f(t)-protection, for a bound function bd_f() depending on the discrimination measure f() at hand. This allows us to adapt i...
The whole computer hardware industry embraced multi-core. The extreme optimisation of sequential algorithms is then no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an...
Objective: Assessing the frequency of Wearing-Off (WO) in Parkinson's disease (PD) patients, and its impact on Quality of Life (QoL). Methods: Consecutive ambulatory patients, who were on dopaminergic treatment for ≥1 year, were included in this multicentre, observational cross-sectional study. In a single visit, WO was diagnosed based on neurologis...
We investigate the relation between t-closeness, a well-known model of data anonymization, and α-protection, a model of data discrimination. We show that t-closeness implies bd(t)-protection, for a bound function bd() depending on the discrimination measure at hand. This allows us to adapt an inference control method, the Mondrian multidimensional...
The collection and analysis of observational and experimental data represent the main tools for assessing the presence, the extent, the nature, and the trend of discrimination phenomena. Data analysis techniques have been proposed in the last 50 years in the economic, legal, statistical, and, recently, in the data mining literature. This is not sur...
Discovering contexts of unfair decisions in a dataset of historical decision records is a non-trivial problem. It requires the design of ad hoc methods and techniques of analysis, which have to comply with existing laws and with legal argumentations. While some data mining techniques have been adapted to the purpose, the state-of-the-art of researc...
In this paper, we explore the computational complexity of the conjunctive fragment of the first-order theory of linear arithmetic. Quantified propositional formulas of linear inequalities with (k−1) quantifier alternations are log-space complete in Σ_k^P or Π_k^P depending on the initial quantifier. We show that when we restrict ourselves to quant...
Parameterized linear systems allow for modelling and reasoning over classes of polyhedra. Collections of squares, rectangles, polytopes, and so on, can readily be defined by means of linear systems with parameters. In this paper, we investigate the problem of learning a parameterized linear system whose class of polyhedra includes a given set of ex...
A Quantified Linear Implication (QLI) is an inclusion query over two polyhedral sets, with a quantifier string that specifies which variables are existentially quantified and which are universally quantified. Equivalently, it can be viewed as a quantified implication of two systems of linear inequalities. In this paper, we provide a 2-person game s...
Information about patients' adherence to therapy represents a primary issue in Parkinson's disease (PD) management. To perform the linguistic validation of the Italian version of the self-rated 8-Item Morisky Medical Adherence Scale (MMAS-8) and to describe in a sample of Italian patients affected by PD the adherence to anti-Parkinson drug therapy...
Discrimination discovery from data consists in the extraction of discriminatory situations and practices hidden in a large amount of historical decision records. We discuss the challenging problems in discrimination discovery, and present, in a unified form, a framework based on classification rules extraction and filtering on the basis of legally-g...
Discrimination data analysis has been investigated for the last fifty years in a large body of social, legal, and economic studies. Recently, discrimination discovery and prevention has become a blooming research topic in the knowledge discovery community. This chapter provides a multi-disciplinary annotated bibliography of the literature on disc...
The selection of projects for funding can hide discriminatory decisions. We present a case study investigating gender discrimination in a dataset of scientific research proposals submitted to an Italian national call. The method for the analysis relies on a data mining classification strategy that is inspired by a legal methodology for proving evid...
Extending linear constraints by admitting parameters allows for more abstract problem modeling and reasoning. A lot of focus has been given to conducting research that demonstrates the usefulness of parameterized linear constraints and implementing tools that utilize their modeling strength. However, there is no approach that considers basic theore...
Data mining approaches for discrimination discovery unveil contexts of possible discrimination against protected-by-law groups by extracting classification rules from a dataset of historical decision records. Rules are ranked according to some legally-grounded contrast measure defined over a 4-fold contingency table, including risk difference, risk...
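To make the idea of a contrast measure over a 4-fold contingency table concrete, here is a minimal sketch of risk difference: the proportion of negative decisions in the protected group minus the same proportion in the unprotected group. The cell names, the `ContingencyTable` class, and the toy counts are illustrative assumptions, not values or code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """4-fold table of decisions by group (hypothetical field names)."""
    protected_denied: int
    protected_granted: int
    unprotected_denied: int
    unprotected_granted: int


def risk_difference(t: ContingencyTable) -> float:
    """Denial rate of the protected group minus that of the unprotected group."""
    p_protected = t.protected_denied / (t.protected_denied + t.protected_granted)
    p_unprotected = t.unprotected_denied / (
        t.unprotected_denied + t.unprotected_granted
    )
    return p_protected - p_unprotected


# Toy counts (illustrative only): 30% vs 10% denial rates.
table = ContingencyTable(
    protected_denied=30,
    protected_granted=70,
    unprotected_denied=10,
    unprotected_granted=90,
)
rd = risk_difference(table)
print(round(rd, 3))
```

A positive value indicates that the protected group is denied more often in the context covered by the rule; a value near zero indicates similar treatment.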
In this paper we discuss the computational complexities of procedures for inclusion queries over polyhedral sets. The polyhedral sets that we consider occur in a wide range of applications, ranging from logistics to program verification. The goal of our study is to establish boundaries between hard and easy problems in this context.
With the support of the legally-grounded methodology of situation testing, we tackle the problems of discrimination discovery and prevention from a dataset of historical decisions by adopting a variant of k-NN classification. A tuple is labeled as discriminated if we can observe a significant difference of treatment among its neighbors belonging to...
The whole computer hardware industry embraced multicores. For these machines, the extreme optimisation of sequential algorithms is no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This pa...
We present a knowledge discovery case study on customer classification having the objective of mining the distinctive characteristics of new customers of a service of tax return. Two general approaches are described. The first one, a symbolic approach, is based on extracting and ranking classification rules on the basis of significativeness measure...
Concise representations of frequent itemsets sacrifice readability and direct interpretability by a data analyst of the concise patterns extracted. In this paper, we introduce an extension of itemsets, called regular, with an immediate semantics and interpretability, and a conciseness comparable to closed itemsets. Regular itemsets allow for specif...
We present a type system for linear constraints over the reals intended for reasoning about the input-output directionality of variables. Types model the properties of definiteness, range width or approximation, lower and upper bounds of variables in a linear constraint. Several proof procedures are presented for inferring the type of a variable a...
Owing to uncertainty about the pathogenic mechanisms underlying motor neuron degeneration in amyotrophic lateral sclerosis (ALS), riluzole remains the only available therapy, with only marginal effects on disease survival. Here we review some of the recent advances in the search for disease-modifying drugs for ALS based on their putative neuroprotectiv...
Discrimination discovery in databases consists in finding unfair practices against minorities which are hidden in a dataset of historical decisions. The DCUBE system implements the approach of [5], which is based on classification rule extraction and analysis, by centering the analysis phase around an Oracle database. The proposed demonstration gui...
In the context of civil rights law, discrimination refers to unfair or unequal treatment of people based on membership to a category or a minority, without regard to individual merit. Discrimination in credit, mortgage, insurance, labor market, and education has been investigated by researchers in economics and human sciences. With the advent of au...
We introduce an extension of linear constraints, called linear-range constraints, which allows for (meta-)reasoning about the approximation width of variables. Semantics for linear-range constraints is provided in terms of parameterized linear systems. We devise procedures for checking satisfiability and for entailing the maximal width of a varia...
The intermittent oral intake of the dopamine (DA) precursor L-3,4-dihydroxyphenylalanine (L-DOPA) is the classic therapy of Parkinson's disease (PD). In this way, the drug precursor can be metabolised into the active neurotransmitter DA. Although this occurs throughout the brain, the therapeutic relief is believed to be due to restoring extracellul...
We present a reference model for finding (prima facie) evidence of discrimination in datasets of historical decision records in socially sensitive tasks, including access to credit, mortgage, insurance, labor market and other benefits. We formalize the process of direct and indirect discrimination discovery in a rule-based framework, by modelling p...
Discrimination in social sense (e.g., against minorities and disadvantaged groups) is the subject of many laws worldwide, and it has been extensively studied in the social and economic sciences. We tackle the problem of determining, given a dataset of historical decision records, a precise measure of the degree of discrimination suffered by a given...
Seven patients, six suffering from amyotrophic lateral sclerosis (ALS) and one from Friedreich ataxia, were treated with a placebo i.v. infusion during the first day and with TRH-T i.v. infusion at a rate of 2 mg/h for 8 h daily (total daily dosage 16 mg) on the 2 consecutive days. Continuous blood pressure (BP) and EKG monitorings were pe...
Proc Natl Acad Sci U S A. 2008 Feb 12;105(6):2052-7. doi: 10.1073/pnas.0708022105. Epub 2008 Feb 4. Lithium delays progression of amyotrophic lateral sclerosis. Fornai F, Longone P, Cafaro L, Kastsiuchenka O, Ferrucci M, Manca ML, Lazzeri G, Spalloni A, Bellio N, Lenzi P, Modugno N, Siciliano G, Isidoro C, Murri L, Ruggieri S, Paparelli A....