About
35
Publications
3,302
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
500
Citations
Citations since 2017
Introduction
Skills and Expertise
Publications
Publications (35)
Continual learning enables the incremental training of machine learning models on non-stationary data streams.While academic interest in the topic is high, there is little indication of the use of state-of-the-art continual learning algorithms in practical machine learning deployment. This paper presents Renate, a continual learning library designe...
Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO or NAS rapidly becomes prohibitively expensive for practitioners, even when efficient multi-fidel...
In many real-world scenarios, data to train machine learning models become available over time. However, neural network models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is known as catastrophic forgetting and it is often difficult to prevent due to practical constraints, such as...
The goal of continual learning (CL) is to efficiently update a machine learning model with new data without forgetting previously-learned knowledge. Most widely-used CL methods rely on a rehearsal memory of data points to be reused while training on new data. Curating such a rehearsal memory to maintain a small, informative subset of all the data s...
Learning text classifiers based on pre-trained language models has become the standard practice in natural language processing applications. Unfortunately, training large neural language models, such as transformers, from scratch is very costly and requires a vast amount of training data, which might not be available in the application domain of in...
We devise a coreset selection method based on the idea of gradient matching: The gradients induced by the coreset should match, as closely as possible, those induced by the original training dataset. We evaluate the method in the context of continual learning, where it can be used to curate a rehearsal memory. Our method performs strong competitors...
In this work we consider the problem of repeated hyperparameter and neural architecture search (HNAS). We propose an extension of Successive Halving that is able to leverage information gained in previous HNAS problems with the goal of saving computational resources. We empirically demonstrate that our solution is able to drastically decrease costs...
AutoML systems provide a black-box solution to machine learning problems by selecting the right way of processing features, choosing an algorithm and tuning the hyperparameters of the entire pipeline. Although these systems perform well on many datasets, there is still a non-negligible number of datasets for which the one-shot solution produced by...
Contextual bandit algorithms are extremely popular and widely used in recommendation systems to provide online personalised recommendations. A recurrent assumption is the stationarity of the reward function, which is rather unrealistic in most of the real-world applications. In the music recommendation scenario for instance, people's music taste ca...
Personalization is a crucial aspect of many online experiences. In particular, content ranking is often a key component in delivering sophisticated personalization results. Commonly, supervised learning-to-rank methods are applied, which suffer from bias introduced during data collection by production systems in charge of producing the ranking. To...
Delayed feedback is an ubiquitous problem in many industrial systems employing bandit algorithms. Most of those systems seek to optimize binary indicators as clicks. In that case, when the reward is not sent immediately, the learner cannot distinguish a negative signal from a not-yet-sent positive one: she might be waiting for a feedback that will...
We investigate a novel cluster-of-bandit algorithm CAB for collaborative recommendation tasks that implements the underlying feedback sharing mechanism by estimating the neighborhood of users in a context-dependent manner. CAB makes sharp departures from the state of the art by incorporating collaborative effects into inference as well as learning...
We investigate two data-dependent clustering techniques for content
recommendation based on exploration-exploitation strategies in contextual
multiarmed bandit settings. Our algorithms dynamically group users based on the
items under consideration and, possibly, group items based on the similarity of
the clusterings induced over the users. The resu...
We investigate two context-dependent clustering techniques for content
recommendation based on exploration-exploitation strategies in contextual
multiarmed bandit settings. Our algorithms dynamically group users based on the
items under consideration and, possibly, group items based on the similarity of
the clusterings induced over the users. The r...
We show that the mistake bound for predicting the nodes of an arbitrary weighted graph is characterized (up to logarithmic factors) by the weighted cutsize of a ran- dom spanning tree of the graph. The cutsize is induced by the unknown adversarial labeling of the graph nodes. In deriving our characterizati on, we obtain a simple randomized algorith...
We introduce a novel algorithmic approach to content recommendation based on adaptive clustering of exploration-exploitation ("bandit") strategies. We provide a sharp regret analysis of this algorithm in a standard stochastic noise setting, demonstrate its scalability properties, and prove its effectiveness on a number of artificial and real-world...
Twitter is one of the most popular micro-blogging services in the world, often studied in the context of political opinion mining for its peculiar nature of online public discussion platform. In our work we analyse the phenomenon of political disaffection defined as the "lack of confidence in the political process, politicians, and democratic insti...
Online Social Networks (OSN) have enriched the social lives of millions of users. Discovering new friends in the social network is valuable both for the user and for the health of OSN since users with more friends engage longer and more often with the site. The simplest way to formalize friend-ship recommendation is to cast the problem as a link pr...
Multi-armed bandit problems are receiving a great deal of attention because
they adequately formalize the exploration-exploitation trade-offs arising in
several industrially relevant applications, such as online advertisement and,
more generally, recommendation systems. In many cases, however, these
applications have a strong social component, whos...
In our work we analyse the political disaffection or "the subjective feeling
of powerlessness, cynicism, and lack of confidence in the political process,
politicians, and democratic institutions, but with no questioning of the
political regime" by exploiting Twitter data through machine learning
techniques. In order to validate the quality of the t...
We investigate the problem of active learning on a given tree whose nodes are
assigned binary labels in an adversarial way. Inspired by recent results by
Guillory and Bilmes, we characterize (up to constant factors) the optimal
placement of queries so to minimize the mistakes made on the non-queried nodes.
Our query selection algorithm is extremely...
Predicting the nodes of a given graph is a fascinating theoretical problem
with applications in several domains. Since graph sparsification via spanning
trees retains enough information while making the task much easier, trees are
an important special case of this problem. Although it is known how to predict
the nodes of an unweighted tree in a nea...
We present very efficient active learning algorithms for link classification
in signed networks. Our algorithms are motivated by a stochastic model in which
edge labels are obtained through perturbations of a initial sign assignment
consistent with a two-clustering of the nodes. We provide a theoretical
analysis within this model, showing that we c...
Motivated by social balance theory, we develop a theory of link
classification in signed networks using the correlation clustering index as
measure of label regularity. We derive learning bounds in terms of correlation
clustering within three fundamental transductive learning settings: online,
batch and active. Our main algorithmic contribution is...
We investigate the problem of sequentially predicting the binary labels on
the nodes of an arbitrary weighted graph. We show that, under a suitable
parametrization of the problem, the optimal number of prediction mistakes can
be characterized (up to logarithmic factors) by the cutsize of a random
spanning tree of the graph. The cutsize is induced b...
The task of predicting the label of a network node, based on the labels of the remaining nodes, is an area of growing interest in machine learning, as various types of data are naturally represented as nodes in a graph. As an increasing number of methods and approaches are proposed to solve this task, the problem of comparing their performance beco...
We introduce a scalable algorithm, MUCCA, for multiclass node classification
in weighted graphs. Unlike previously proposed methods for the same task, MUCCA
works in time linear in the number of nodes. Our approach is based on a
game-theoretic formulation of the problem in which the test labels are
expressed as a Nash Equilibrium of a certain game....
Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nea...
Active learning algorithms for graph node classification select a subset L of nodes in a given graph. The goal is to minimize the mistakes made on the remaining nodes by a standard node classifier using L as training set. Bilmes and Guillory introduced a combinatorial quantity, Ψ * (L), and related it to the performance of the mincut classifier run...