Michele Donini
  • PhD
  • Researcher at Amazon

About

65
Publications
11,670
Reads
1,825
Citations
Current institution
Amazon
Current position
  • Researcher
Additional affiliations
January 2013 - March 2016
University of Padua
Position
  • PhD Student
Description
  • Machine Learning

Publications (65)
Preprint
Hyperparameters are configuration variables controlling the behavior of machine learning algorithms. They are ubiquitous in machine learning and artificial intelligence, and the choice of their values determines the effectiveness of systems based on these technologies. Manual hyperparameter search is often unsatisfactory and becomes unfeasible when t...
Preprint
fmeval is an open source library to evaluate large language models (LLMs) in a range of tasks. It helps practitioners evaluate their model for task performance and along multiple responsible AI dimensions. This paper presents the library and exposes its underlying design principles: simplicity, coverage, extensibility and performance. We then prese...
Chapter
Explainability of machine learning methods is of fundamental importance in healthcare to calibrate trust. A large branch of explainable machine learning uses tools linked to the Shapley value, which have nonetheless been found difficult to interpret and potentially misleading. Taking multiclass classification as a reference task, we argue that a cr...
Preprint
Full-text available
We revisit the problem of fair principal component analysis (PCA), where the goal is to learn the best low-rank linear approximation of the data that obfuscates demographic information. We propose a conceptually simple approach that allows for an analytic solution similar to standard PCA and can be kernelized. Our methods have the same complexity a...
Preprint
Full-text available
We present Fortuna, an open-source library for uncertainty quantification in deep learning. Fortuna supports a range of calibration techniques, such as conformal prediction that can be applied to any trained neural network to generate reliable uncertainty estimates, and scalable Bayesian inference methods that can be applied to Flax-based deep neur...
Preprint
Full-text available
Data-driven methods that detect anomalies in time series data are ubiquitous in practice, but they are in general unable to provide helpful explanations for the predictions they make. In this work we propose a model-agnostic algorithm that generates counterfactual ensemble explanations for time series anomaly detection models. Our method generates...
Preprint
Full-text available
The large size and complex decision mechanisms of state-of-the-art text classifiers make it difficult for humans to understand their predictions, leading to a potential lack of trust by the users. These issues have led to the adoption of methods like SHAP and Integrated Gradients to explain classification decisions by assigning importance scores to...
Preprint
Full-text available
With the increasing adoption of machine learning (ML) models and systems in high-stakes settings across different industries, guaranteeing a model's performance after deployment has become crucial. Monitoring models in production is a critical aspect of ensuring their continued performance and reliability. We present Amazon SageMaker Model Monitor,...
Article
Full-text available
In many machine learning scenarios, looking for the best classifier that fits a particular dataset can be very costly in terms of time and resources. Moreover, it can require deep knowledge of the specific domain. We propose a new technique which does not require profound expertise in the domain and avoids the commonly used strategy of hyper-parame...
Article
We present a simple and effective methodology for the generation of lexicons (word lists) that may be used in natural language scoring applications. In particular, in the finance industry, word lists have become ubiquitous for sentiment scoring. These have been derived from dictionaries such as the Harvard Inquirer and require manual curation. Here...
Preprint
Full-text available
Understanding the predictions made by machine learning (ML) models and their potential biases remains a challenging and labor-intensive task that depends on the application, the dataset, and the specific model. We present Amazon SageMaker Clarify, an explainability feature for Amazon SageMaker that launched in December 2020, providing insights into...
Article
The central goal of algorithmic fairness is to develop AI-based systems which do not discriminate against subgroups in the population with respect to one or multiple notions of inequity, knowing that data is often humanly biased. Researchers are racing to develop AI-based systems able to reach superior performance in terms of accuracy, increasing the risk...
Preprint
Full-text available
Hyperparameter optimization (HPO) is increasingly used to automatically tune the predictive performance (e.g., accuracy) of machine learning models. However, in a plethora of real-world applications, accuracy is only one of the multiple -- often conflicting -- performance criteria, necessitating the adoption of a multi-objective (MO) perspective. W...
Preprint
Full-text available
With the ever-increasing complexity of neural language models, practitioners have turned to methods for understanding the predictions of these models. One of the most well-adopted approaches for model interpretability is feature-based interpretability, i.e., ranking the features in terms of their impact on model predictions. Several prior studies h...
Preprint
Full-text available
Tuning complex machine learning systems is challenging. Machine learning models typically expose a set of hyperparameters, be it regularization, architecture, or optimization parameters, whose careful tuning is critical to achieve good performance. To democratize access to such systems, it is essential to automate this tuning process. This paper pr...
Conference Paper
Full-text available
We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization, aiming at good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rate schedule -- the hypergradient. Based on this, we introduce MARTHE, a novel online algorithm guided by cheap...
Preprint
Full-text available
Given the increasing importance of machine learning (ML) in our lives, algorithmic fairness techniques have been proposed to mitigate biases that can be amplified by ML. Commonly, these specialized techniques apply to a single family of ML models and a specific definition of fairness, limiting their effectiveness in practice. We introduce a general...
Article
We address the problem of randomized learning and generalization of fair and private classifiers. On the one hand, we want to ensure that sensitive information does not unfairly influence the outcome of a classifier. On the other hand, we have to learn from data while preserving the privacy of individual observations. We initially face this issue in...
Preprint
Full-text available
We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization. This allows us to explicitly search for schedules that achieve good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rate, the hypergradient, and based on this we introduce a n...
Preprint
In many machine learning scenarios, looking for the best classifier that fits a particular dataset can be very costly in terms of time and resources. Moreover, it can require deep knowledge of the specific domain. We propose a new technique which does not require profound expertise in the domain and avoids the commonly used strategy of hyper-parame...
Preprint
Full-text available
Developing learning methods which do not discriminate against subgroups in the population is a central goal of algorithmic fairness. One way to reach this goal is by modifying the data representation in order to meet certain fairness constraints. In this work we measure fairness according to demographic parity. This requires the probability of the possible...
Article
Full-text available
Combining neuroimaging and clinical information for diagnosis, such as behavioral tasks and genetic characteristics, is potentially beneficial but presents challenges in terms of finding the best data representation for the different sources of information. Their simple combination usually does not provide an improvement when compared with us...
Preprint
Full-text available
We tackle the problem of algorithmic fairness, where the goal is to avoid the unfair influence of sensitive information, in the general context of regression with possibly continuous sensitive attributes. We extend the framework of fair empirical risk minimization to this general scenario, covering in this way the whole standard supervised learni...
Conference Paper
Full-text available
A central goal of algorithmic fairness is to reduce bias in automated decision making. An unavoidable tension exists between accuracy gains obtained by using sensitive information as part of a statistical model, and any commitment to protect these characteristics. Often, due to biases present in the data, using the sensitive information in the func...
Preprint
Full-text available
Combining neuroimaging and clinical information for diagnosis, such as behavioral tasks and genetic characteristics, is potentially beneficial but presents challenges in terms of finding the best data representation for the different sources of information. Their simple combination usually does not provide an improvement when compared with us...
Conference Paper
Full-text available
We address the problem of algorithmic fairness: ensuring that sensitive information does not unfairly influence the outcome of a classifier. We present an approach based on empirical risk minimization, which incorporates a fairness constraint into the learning problem. It encourages the conditional risk of the learned classifier to be approximately...
Preprint
Full-text available
A central goal of algorithmic fairness is to reduce bias in automated decision making. An unavoidable tension exists between accuracy gains obtained by using sensitive information (e.g., gender or ethnic group) as part of a statistical model, and any commitment to protect these characteristics. Often, due to biases present in the data, using the se...
Article
When dealing with kernel methods, one has to decide which kernel and which values for the hyperparameters to use. Resampling techniques can address this issue but these procedures are time-consuming. This problem is particularly challenging when dealing with structured data, in particular with graphs, since several kernels for graph data have been...
Article
Full-text available
We address the problem of algorithmic fairness: ensuring that sensitive variables do not unfairly influence the outcome of a classifier. We present an approach based on empirical risk minimization, which incorporates a fairness constraint into the learning problem. It encourages the conditional risk of the learned classifier to be approximately con...
Article
Full-text available
Background The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problem...
Article
Full-text available
We consider a class of nested optimization problems involving inner and outer objectives. We observe that by taking into explicit account the optimization dynamics for the inner objective it is possible to derive a general framework that unifies gradient-based hyperparameter optimization and meta-learning (or learning-to-learn). Depending on the...
Article
Full-text available
Recent literature has shown the merits of having deep representations in the context of neural networks. An emerging challenge in kernel learning is the definition of similar deep representations. In this paper, we propose a general methodology to define a hierarchy of base kernels with increasing expressiveness and combine them via Multiple Kernel...
Conference Paper
Full-text available
In neuroimaging-based diagnostic problems, the combination of different sources of information, such as MR images and clinical data, is a challenging task. Their simple combination usually does not provide an improvement when compared with using the best source alone. In this paper, we deal with the well-known Alzheimer's Disease Neuroimaging Initiative (A...
Conference Paper
Full-text available
We address the speaker-independent acoustic inversion (AI) problem, also referred to as acoustic-to-articulatory mapping. The scarce availability of multi-speaker articulatory data makes it difficult to learn a mapping which generalizes from a limited number of training speakers and reliably reconstructs the articulatory movements of unseen speaker...
Conference Paper
Full-text available
We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic gradient descent. These procedures mirror two methods of computing gradients for recurrent neural networks and have different trade-offs in terms of run...
Conference Paper
Full-text available
Model selection is one of the most computationally expensive tasks in a machine learning application. When dealing with kernel methods for structures, the choice with the largest impact on the overall performance is the selection of the feature bias, i.e. the choice of the concrete kernel for structures. Each kernel in turn exposes several hyper-pa...
Conference Paper
Full-text available
Several mechanisms exist in the literature to solve a multiclass classification problem exploiting a binary kernel-machine. Most of them are based on a problem decomposition that consists of splitting the problem into many binary tasks. These tasks have different complexity and they require different kernels. Our goal is to use the Multiple Kernel Lear...
Article
Kernels for Structured Domains are widely adopted in real-world applications that involve learning on structured data. In this context many kernels have been proposed in the literature, but no theoretical comparison among them is present. In this paper we provide different formal definitions of expressiveness of a kernel by exploiting the most recent r...
Article
Full-text available
We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic gradient descent. These procedures mirror two methods of computing gradients for recurrent neural networks and have different trade-offs in terms of run...
Article
Full-text available
The wide diffusion of smartphones makes it possible to sense users' movements and thus monitor the amount of physical activity they do during the day. It also makes it possible to use these devices to persuade people to change their behaviours. In this paper, we present ClimbTheWorld, a serious game which uses a machine learning...
Conference Paper
Full-text available
Multi Conjugated Adaptive Optics is based upon tomographic reconstruction of the atmospheric turbulence over the line of sight of a telescope, achieved by combining measurements from different directions in the sky. Using deformable mirrors optically conjugated to different altitudes, a correction can be performed directly on the reconstructed turb...
Conference Paper
Full-text available
Past research on Multitask Learning (MTL) has focused mainly on devising adequate regularizers and less on their scalability. In this paper, we present a method to scale up MTL methods which penalize the variance of the task weight vectors. The method builds upon the alternating direction method of multipliers to decouple the variance regularizer....
Conference Paper
Full-text available
The increasing number of people that are overweight due to a sedentary life requires persuasive strategies to convince people to change their behaviours. In this paper, we present a machine learning based technique to recognize and count stairsteps when a person climbs or descends stairs. This technique has been used as part of ClimbTheWorld, a rea...
Conference Paper
Full-text available
We present an approach for learning an anisotropic RBF kernel in a game theoretical setting where the value of the game is the degree of separation between positive and negative training examples. The method extends a previously proposed method (KOMD) to perform feature re-weighting and distance metric learning in a kernel-based classification set...
Conference Paper
Full-text available
In the last few years, the number of overweight people in wealthy countries, among both adults and children, has been increasing dramatically, mostly due to poor diet and lack of physical activity [6] [3]: modern technologies allow people to avoid simple strains of everyday life, for example by using elevators and escalators instead of climbing the stairs. But p...
Conference Paper
Full-text available
The goal of Multiple Kernel Learning (MKL) is to combine kernels derived from multiple sources in a data-driven way with the aim to enhance the accuracy of a kernel based machine. In this paper, we propose a time and space efficient MKL algorithm that can easily cope with hundreds of thousands of kernels and more. We compared our algorithm with oth...
Conference Paper
Full-text available
Magnetic resonance imaging (MRI) allows the acquisition of high-resolution images of the brain. The diagnosis of various brain illnesses is supported by the separate analysis of the different kinds of brain tissue, which requires their segmentation and classification. Brain MRI is organized in volumes composed of millions of voxels (at least 6...
Conference Paper
Full-text available
Next generation commercial air transportation will likely see an increased use of rotorcraft, including helicopters and tilt-rotors. Two advantages of rotorcraft with respect to fixed wing aircraft are the optimized aerodynamic levitation and the possibility of takeoff and landing without a runway, which minimizes their interference with fixed wi...
