## About

- Publications: 66
- Reads: 7,243

- Citations: 1,865

Additional affiliations

- February 2017 - present
- February 2017 - present
- October 2016 - December 2016: **Microsoft Research New England**, Researcher

Education

- January 2005 - December 2007: **Università degli Studi di Torino and ISI Foundation Torino**, field of study: Computational neuroscience
- September 1997 - September 2004

## Publications

Publications (66)

We show that discrete synaptic weights can be efficiently used for learning in large scale neural systems, and lead to unanticipated computational performance. We focus on the representative case of learning random patterns with binary synapses in single layer networks. The standard statistical analysis shows that this problem is exponentially domi...

Significance
Artificial neural networks are some of the most widely used tools in data science. Learning is, in principle, a hard problem in these systems, but in practice heuristic algorithms often find solutions with good generalization properties. We propose an explanation of this good performance in terms of a nonequilibrium statistical physics...

Significance
Most biological processes rely on specific interactions between proteins, but the experimental characterization of protein−protein interactions is a labor-intensive task of frequently uncertain outcome. Computational methods based on exponentially growing genomic databases are urgently needed. It has recently been shown that coevolutio...

The anterior inferotemporal cortex (IT) is the highest stage along the hierarchy of visual areas that, in primates, processes visual objects. Although several lines of evidence suggest that IT primarily represents visual shape information, some recent studies have argued that neuronal ensembles in IT code the semantic membership of visual objects (...

Recent experimental studies indicate that synaptic changes induced by neuronal activity are discrete jumps between a small number of stable states. Learning in systems with discrete synapses is known to be a computationally hard problem. Here, we study a neurobiologically plausible on-line learning algorithm that derives from belief propagation alg...

Multiple neurophysiological experiments have shown that dendritic non-linearities can have a strong influence on synaptic input integration. In this work we model a single neuron as a two-layer computational unit with non-overlapping sign-constrained synaptic weights and a biologically plausible form of dendritic non-linearity, which is analyticall...

We study the binary and continuous negative-margin perceptrons as simple nonconvex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide. These min...

We develop a full-fledged analysis of an algorithmic decision process that, in a multialternative choice problem, produces computable choice probabilities and expected decision times.

We apply digitized quantum annealing (QA) and quantum approximate optimization algorithm (QAOA) to a paradigmatic task of supervised learning in artificial neural networks: the optimization of synaptic weights for the binary perceptron. At variance with the usual QAOA applications to MaxCut, or to quantum spin-chains ground-state preparation, here...

We systematize the approach to the investigation of deep neural network landscapes by basing it on the geometry of the space of implemented functions rather than the space of parameters. Grouping classifiers into equivalence classes, we develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology. On...

We introduce an evolutionary algorithm called recombinator-$k$-means for optimizing the highly nonconvex $k$-means problem. Its defining feature is that its crossover step involves all the members of the current generation, stochastically recombining them with a repurposed variant of the $k$-means++ seeding algorithm. The recombination also uses...
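
The pooled crossover step described here can be sketched compactly. The snippet below is an illustrative simplification (function names and details are ours, not the authors' implementation): a $k$-means++-style seeding rule applied to the pooled centroids of a whole generation of $k$-means solutions, so that every member can contribute to each offspring.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seed_from_pool(pool, k, rng):
    """k-means++-style seeding restricted to a pool of candidate centroids
    (e.g. all centroids from the current generation): each new center is
    drawn with probability proportional to its squared distance from the
    nearest center already chosen."""
    centers = [rng.choice(pool)]
    while len(centers) < k:
        d2 = [min(dist2(p, c) for c in centers) for p in pool]
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(pool, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

# Pool the centroids of several (hypothetical) k-means runs, then
# recombine them into one fresh k-center configuration.
rng = random.Random(0)
pool = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
child = seed_from_pool(pool, 3, rng)
```

Because the proposal distribution favors centroids far from those already picked, offspring tend to cover the data well while reusing good sub-solutions from the whole population.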

Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy predictions of statistical learning an...

We present a meta-method for initializing (seeding) the $k$-means clustering algorithm called PNN-smoothing. It consists in splitting a given dataset into $J$ random subsets, clustering each of them individually, and merging the resulting clusterings with the pairwise-nearest-neighbor (PNN) method. It is a meta-method in the sense that when cluster...
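
The PNN-smoothing pipeline described above (split into $J$ subsets, cluster each, merge with pairwise nearest neighbor) can be sketched as follows. This is a minimal illustration under our own simplifications (unit weights for pooled centroids, plain Lloyd as the inner clusterer), not the paper's implementation:

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(points, centers, iters=20):
    """Plain Lloyd (k-means) iterations: assign points, recompute means."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            clusters[i].append(p)
        centers = [tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

def pnn_merge(cands, k):
    """Pairwise-nearest-neighbor merge: greedily fuse the pair of weighted
    centroids with the lowest merge cost until only k remain."""
    cands = list(cands)
    while len(cands) > k:
        best = None
        for i in range(len(cands)):
            for j in range(i + 1, len(cands)):
                (ci, wi), (cj, wj) = cands[i], cands[j]
                cost = wi * wj / (wi + wj) * dist2(ci, cj)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (ci, wi), (cj, wj) = cands[i], cands[j]
        merged = (tuple((wi * a + wj * b) / (wi + wj) for a, b in zip(ci, cj)),
                  wi + wj)
        cands = [c for t, c in enumerate(cands) if t not in (i, j)] + [merged]
    return [c for c, _ in cands]

def pnn_smoothing(points, k, J, rng):
    """Cluster J random subsets independently, merge the pooled centroids
    down to k with PNN, then refine on the full dataset."""
    pts = list(points)
    rng.shuffle(pts)
    pool = []
    for s in range(J):
        sub = pts[s::J]
        pool.extend((c, 1.0) for c in lloyd(sub, rng.sample(sub, k)))
    return lloyd(points, pnn_merge(pool, k))
```

The returned centers can serve as the seeding for any $k$-means variant; here a final Lloyd refinement on the full data is already included.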

The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly nonconvex loss functions is an unexpected feature of n...

The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests they possess better generalization capabilities with respect to sharp ones. In this work we first discuss the relationship between alternative measures of flatness: the local entropy , which is useful for an...
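
The "local entropy" mentioned here can be made concrete. A common definition in this line of work (the exact normalization below is our assumption, following standard usage) measures the log-volume of low-loss configurations in a neighborhood of a reference weight vector $w$:

```latex
S_\gamma(w) \;=\; \frac{1}{N}\,\log \int \mathrm{d}w' \; e^{-\beta L(w')}\, e^{-\frac{\gamma}{2}\lVert w - w' \rVert^2}
```

Flat, wide minima are those for which $S_\gamma(w)$ remains large as the locality parameter $\gamma$ is increased (tightening the neighborhood), while sharp minima see it drop quickly.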

We analyze the connection between minimizers with good generalizing properties and high local entropy regions of a threshold-linear classifier in Gaussian mixtures with the mean squared error loss function. We show that there exist configurations that achieve the Bayes-optimal generalization error, even in the case of unbalanced clusters. We explor...

Simulated Annealing is the crowning glory of Markov Chain Monte Carlo Methods for the solution of NP-hard optimization problems in which the cost function is known. Here, by replacing the Metropolis engine of Simulated Annealing with a reinforcement learning variation -- that we call Macau Algorithm -- we show that the Simulated Annealing heuristic...
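
For context, the classical baseline that the Macau Algorithm modifies (plain simulated annealing with a Metropolis engine) fits in a dozen lines. The reinforcement-learning replacement itself is not reproduced here, since its details are beyond this summary; the sketch below, on a toy bit-string problem of our own choosing, is the standard heuristic only:

```python
import math
import random

def simulated_annealing(cost, neighbor, x0, schedule, rng):
    """Classical simulated annealing: at temperature T, a proposed move
    that raises the cost by dE is accepted with probability exp(-dE / T)."""
    x, E = x0, cost(x0)
    best, best_E = x, E
    for T in schedule:
        y = neighbor(x, rng)
        dE = cost(y) - E
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            x, E = y, E + dE
            if E < best_E:
                best, best_E = x, E
    return best, best_E

# Toy problem: recover a hidden 30-bit string from a Hamming-distance cost.
rng = random.Random(1)
target = [rng.randint(0, 1) for _ in range(30)]

def cost(s):
    return sum(a != b for a, b in zip(s, target))

def neighbor(s, rng):
    s = list(s)
    i = rng.randrange(len(s))
    s[i] = 1 - s[i]  # flip one random bit
    return s

schedule = [2.0 * 0.999 ** t for t in range(5000)]  # geometric cooling
best, best_E = simulated_annealing(cost, neighbor, [0] * 30, schedule, rng)
```

Note that the cost function must be known and cheap to evaluate; the snippet above is exactly the setting the abstract refers to.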

The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests they possess better generalization capabilities with respect to sharp ones. First, we discuss Gaussian mixture classification models and show analytically that there exist Bayes optimal pointwise estimators...

In this paper, we provide an axiomatic foundation for the value-based version of the drift diffusion model (DDM) of Ratcliff, a successful model that describes two-alternative speeded decisions between consumer goods. Our axioms present a test for model misspecification and connect the externally observable properties of choice with an important ne...

Even if we know that two families of homologous proteins interact, we do not necessarily know which specific proteins interact inside each species. The reason is that most families contain paralogs, i.e., more than one homologous sequence per species. We have developed a tool to predict interacting paralogs between the two protein families, which...

Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to be able to find good minimizers without getting stuck in local critical points and such minimizers are often satisfactory at avoiding overfitting. How t...

The geometrical features of the (non-convex) loss landscape of neural network models are crucial in ensuring successful optimization and, most importantly, the capability to generalize well. While minimizers' flatness consistently correlates with good generalization, there has been little rigorous work in exploring the condition of existence of suc...

Rectified linear units (ReLUs) have become the main model for the neural units in current deep learning systems. This choice was originally suggested as a way to compensate for the so-called vanishing gradient problem which can undercut stochastic gradient descent learning in networks composed of multiple layers. Here we provide analytical results...
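
A concrete way to see the vanishing-gradient point made here: the sigmoid's derivative is at most 1/4 and decays exponentially for large |x|, while the ReLU's derivative is exactly 1 on the whole positive half-line. A minimal sketch (ours, for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # derivative sigma'(x) = sigma(x) * (1 - sigma(x)); peaks at 0.25
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    """Rectified linear unit: identity for positive inputs, zero otherwise."""
    return x if x > 0.0 else 0.0

def relu_grad(x):
    # subgradient convention: 0 at x = 0; exactly 1 for all x > 0, so a
    # product of many such factors along a deep path need not vanish
    return 1.0 if x > 0.0 else 0.0
```

Multiplying ten sigmoid-gradient factors even at their maximum already gives 0.25**10 ≈ 1e-6, whereas ten active ReLU factors multiply to exactly 1.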

Generative processes in biology and other fields often produce data that can be regarded as resulting from a composition of basic features. Here we present an unsupervised method based on autoencoders for inferring these basic features of data. The main novelty in our approach is that the training is based on the optimization of the ‘local entropy’...

We present a simple heuristic algorithm for efficiently optimizing the notoriously hard "minimum sum-of-squares clustering" problem, usually addressed by the classical k-means heuristic and its variants. The algorithm, called recombinator-k-means, is very similar to a genetic algorithmic scheme: it uses populations of configurations that are optim...

Stochastic neural networks are a prototypical computational device able to build a probabilistic representation of an ensemble of external stimuli. Building on the relationship between inference and learning, we derive a synaptic plasticity rule that relies only on delayed activity correlations, and that shows a number of remarkable features. Our d...

We present a brief introduction to the statistical mechanics approaches for the study of inverse problems in data science. We then provide concrete new results on inferring couplings from sampled configurations in systems characterized by an extensive number of stable attractors in the low temperature regime. We also show how these results are conne...

Stochasticity and limited precision of synaptic weights in neural network models is a key aspect of both biological and hardware modeling of learning processes. Here we show that a neural network model with stochastic binary weights naturally gives prominence to exponentially rare dense regions of solutions with a number of desirable properties suc...

We propose a new algorithm called Parle for parallel training of deep networks that converges 2-4x faster than a data-parallel implementation of SGD, while achieving significantly improved error rates that are nearly state-of-the-art on several benchmarks including CIFAR-10 and CIFAR-100, without introducing any additional hyper-parameters. We expl...

Significance
Quantum annealers are physical quantum devices designed to solve optimization problems by finding low-energy configurations of an appropriate energy function by exploiting cooperative tunneling effects to escape local minima. Classical annealers use thermal fluctuations for the same computational purpose, and Markov chains based on thi...

Background
Distinct RNA species may compete for binding to microRNAs (miRNAs). This competition creates an indirect interaction between miRNA targets, which behave as miRNA sponges and eventually influence each other’s expression levels. Theoretical predictions suggest that not only the mean expression levels of targets but also the fluctuations ar...

We present a method for Monte Carlo sampling on systems with discrete variables (focusing on the Ising case), called the reduced-rejection-rate (RRR) method: it introduces a prior on the candidate moves in a Metropolis-Hastings scheme, which can significantly reduce the rejection rate. The method employs the same probability distribution for the choice of...
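
For context, the baseline that the RRR method improves on is the standard single-spin-flip Metropolis sampler, whose uniform proposals cause high rejection rates at low temperature. The prior-on-moves mechanism itself is not reproduced here (its details are beyond this summary); the sketch below is the plain baseline on a 1D Ising ring, with a comment marking where RRR would differ:

```python
import math
import random

def metropolis_ising_ring(n, beta, steps, rng):
    """Single-spin-flip Metropolis for a 1D Ising ring with
    H = -sum_i s_i * s_{i+1}. Proposals are uniform over sites; the RRR
    method would instead bias them toward moves likely to be accepted."""
    s = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(steps):
        i = rng.randrange(n)  # uniform proposal (this is what RRR replaces)
        dE = 2 * s[i] * (s[i - 1] + s[(i + 1) % n])  # cost of flipping s[i]
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            s[i] = -s[i]
    return s

spins = metropolis_ising_ring(20, 2.0, 20000, random.Random(42))
energy = -sum(spins[i] * spins[(i + 1) % 20] for i in range(20))
```

At inverse temperature beta = 2 the chain is deep in the ordered regime, so after many sweeps the configuration is dominated by aligned spins and the energy is negative.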

Understanding protein-protein interactions is central to our understanding of almost all complex biological processes. Computational tools exploiting rapidly growing genomic databases to characterize protein-protein interactions are urgently needed. Such methods should connect multiple scales from evolutionary conserved interactions between familie...

We introduce a novel Entropy-driven Monte Carlo (EdMC) strategy to efficiently sample solutions of random Constraint Satisfaction Problems (CSPs). First, we extend a recent result that, using a large-deviation analysis, shows that the geometry of the space of solutions of the Binary Perceptron Learning Problem (a prototypical CSP) contains regions...

Learning in neural networks poses peculiar challenges when using discretized rather than continuous synaptic states. The choice of discrete synapses is motivated by biological reasoning and experiments, and possibly by hardware implementation considerations as well. In this paper we extend a previous large deviations analysis which unveiled the exi...

Understanding the theoretical foundations of how memories are encoded and retrieved in neural populations is a central challenge in neuroscience. A popular theoretical scenario for modeling memory function is the attractor neural network scenario, whose prototype is the Hopfield model. The model has a poor storage capacity, compared with the capaci...

We present an efficient learning algorithm for the problem of training neural networks with discrete synapses, a well-known hard (NP-complete) discrete optimization problem. The algorithm is a variant of the so-called Max-Sum (MS) algorithm. In particular, we show how, for bounded integer weights with $q$ distinct states and independent concave a p...

Recent studies reported complex post-transcriptional interplay among targets of a common pool of microRNAs, a class of small non-coding downregulators of gene expression. Behaving as microRNA-sponges, distinct RNA species may compete for binding to microRNAs and coregulate each other in a dose-dependent manner. Although previous studies in cell pop...

The scope of these lecture notes is to provide an introduction to modern statistical physics mean-field methods for the study of phase transitions and optimization problems over random structures. We first give a brief introduction to the field using as tutorial example the percolation problem in random graphs. Next we describe the so-called cavity...

Memristors are memory resistors that promise the efficient implementation of synaptic weights in artificial neural networks [1]. This kind of technology has made it possible to train an evolutionary learning artificial system on large amounts of real-world data. The human brain is capable of processing such data with standard always equal signal...

In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring p...

Advances in experimental techniques resulted in abundant genomic, transcriptomic, epigenomic, and proteomic data that have the potential to reveal critical drivers of human diseases. Complementary algorithmic developments enable researchers to map these data onto protein-protein interaction networks and infer which signaling pathways are perturbed...

Neural networks are able to extract information from the timing of spikes. Here we provide new results on the behavior of the simplest neuronal model which is able to decode information embedded in temporal spike patterns, the so-called tempotron. Using statistical physics techniques we compute the capacity for the case of "material" discrete synap...

Object categories of the three clustering hypotheses. The 11 semantic categories (A), the 15 shape-based categories (B) and the 8 low-level object categories (C). See main text (Materials and Methods) for a definition of the categories.
(TIF)

Computation of the stability region in the parameter space of the D-MST clustering algorithm. Average number of clusters and average overlap (inset) in repeated D-MST clustering outcomes, showing the only stable region of the parameters (found at d_max = 6, λ ∈ [0.74, 0.88]). The main panel shows the average number of clusters at d_max = 6 as a functi...

Face selectivity of the recorded inferotemporal neurons. (A) Histogram showing the distribution of the Face Selectivity Index (FSI) across the recorded population of IT neurons. The index was defined, according to Tsao et al. (Science, 2006), as: FSI = (mean response_faces − mean response_non-face objects) / (mean response_faces + mean response_non-face objec...
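
The index defined in this caption is straightforward to compute; a minimal helper (ours, for illustration) given the two mean firing rates:

```python
def face_selectivity_index(mean_face_response, mean_nonface_response):
    """FSI = (Rf - Rn) / (Rf + Rn), following Tsao et al. (Science, 2006):
    +1 means the cell responds only to faces, -1 only to non-face objects,
    and 0 means no preference."""
    rf, rn = mean_face_response, mean_nonface_response
    return (rf - rn) / (rf + rn)
```

For example, a cell firing at 10 Hz to faces and 5 Hz to objects has FSI = 5/15 ≈ 0.33.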

Supporting Materials and Methods. Description of how the D-MST clustering algorithm was applied in the context of this study.
(DOC)

We consider the generalization problem for a perceptron with binary synapses, implementing the Stochastic Belief-Propagation-Inspired (SBPI) learning algorithm which we proposed earlier, and perform a mean-field calculation to obtain a differential equation which describes the behaviour of the device in the limit of a large number of synapses N. We...