Publications (98) · 255.16 Total impact
ABSTRACT: The mean-field (MF) approximation offers a simple, fast way to infer direct interactions between elements in a network of correlated variables, a common, computationally challenging problem with practical applications in fields ranging from physics and biology to the social sciences. However, MF methods achieve their best performance with strong regularization, well beyond Bayesian expectations, an empirical fact that is poorly understood. In this work, we study the influence of pseudocount and L2-norm regularization schemes on the quality of inferred Ising or Potts interaction networks from correlation data within the MF approximation. We argue, based on the analysis of small systems, that the optimal value of the regularization strength remains finite even if the sampling noise tends to zero, in order to correct for systematic biases introduced by the MF approximation. Our claim is corroborated by extensive numerical studies of diverse model systems and by the analytical study of the m-component spin model for large but finite m. Additionally, we find that pseudocount regularization is robust against sampling noise and often outperforms L2-norm regularization, particularly when the underlying network of interactions is strongly heterogeneous. Much better performance is generally obtained for the Ising model than for the Potts model, for which only couplings incoming onto medium-frequency symbols are reliably inferred.
Physical Review E 07/2014; 90(11):012132. · 2.33 Impact Factor
ABSTRACT: Experiments indicate that unbinding rates of proteins from DNA can depend on the concentration of proteins in nearby solution. Here we present a theory of multistep replacement of DNA-bound proteins by solution-phase proteins. For four different kinetic scenarios we calculate the dependence of protein unbinding and replacement rates on solution protein concentration. We find (1) strong effects of progressive 'rezipping' of the solution-phase protein onto DNA sites liberated by 'unzipping' of the originally bound protein; (2) that a model in which solution-phase proteins bind nonspecifically to DNA can describe experiments on exchanges between the nonspecific DNA-binding proteins Fis-Fis and Fis-HU; (3) that a binding-specific model describes experiments on the exchange of CueR proteins on specific binding sites.
Physical Review Letters 05/2014; 112(23). · 7.73 Impact Factor
ABSTRACT: We outline a theory to quantify the interplay of entropic and selective forces on nucleotide organization and apply it to the genomes of single-stranded RNA viruses. We quantify these forces as intensive variables that can easily be compared between sequences, outline a computationally efficient transfer-matrix method for their calculation, and apply this method to influenza and HIV viruses. We find viruses altering their dinucleotide motif use under selective forces, with these forces on CpG dinucleotides growing stronger in influenza the longer it replicates in humans. For a subset of genes in the human genome, many involved in antiviral innate immunity, the forces acting on CpG dinucleotides are even greater than the forces observed in viruses, suggesting that both effects are in response to similar selective forces involving the innate immune system. We further find that the dynamics of entropic forces balancing selective forces can be used to predict how long it will take a virus to adapt to a new host, and that it would take H1N1 several centuries to adapt to humans from birds, typically contributing many of its synonymous substitutions to the forcible removal of CpG dinucleotides. By examining the probability landscape of dinucleotide motifs, we predict where motifs are likely to appear using only a single-force parameter and uncover the localization of UpU motifs in HIV. Essentially, we extend the natural language and concepts of statistical physics, such as entropy and conjugated forces, to understanding viral sequences and, more generally, constrained genome evolution.
Proceedings of the National Academy of Sciences 03/2014; · 9.81 Impact Factor
ABSTRACT: Two methods for reconstructing the free-energy landscape of a DNA molecule from the knowledge of the equilibrium unzipping force versus extension signal are introduced: a simple and fast procedure, based on a parametric representation of the experimental force signal, and a maximum-likelihood inference of coarse-grained free-energy parameters. In addition, we propose a force alignment procedure to correct for the drift in the experimental measure of the opening position, a major source of error. For unzipping data obtained by Huguet et al., the reconstructed base-pair (bp) free energies agree with the running average of the true free energies on a 20-50 bp scale, depending on the region in the sequence. Features of the landscape at a smaller scale (5-10 bp) could be recovered in favorable regions at the beginning of the molecule. Based on the analysis of synthetic data corresponding to the 16S rDNA gene of bacteria, we show that our approach could be used to identify specific DNA sequences among thousands of homologous sequences in a database.
Biophysical Journal 01/2014; 106(2):430-9. · 3.83 Impact Factor
ABSTRACT: The dynamical evolution of complex systems is often intrinsically stochastic and subject to external random forces. In order to study the resulting variability in dynamics, it is essential to make measurements on replicate systems and to separate arbitrary variation of the average dynamics of these replicates from fluctuations around the average dynamics. Here we do so for population time-series data from replicate ecosystems sharing a common average dynamics or common trend. We explain how model parameters, including the effective interactions between species and dynamical noise, can be estimated from the data and how replication reduces errors in these estimates. For this, it is essential that the model can fit a variety of average dynamics. We then show how one can judge the quality of a model, compare alternate models, and determine which combinations of parameters are poorly determined by the data. In addition, we show how replicate population dynamics experiments could be designed to optimize the acquired information of interest about the systems. Our approach is illustrated on a set of time series gathered from replicate microbial closed ecosystems.
Physical Review E 12/2013; 88(61):062714. · 2.31 Impact Factor
ABSTRACT: We consider the Hopfield-Potts model for the covariation between residues in protein families recently introduced in Cocco, Monasson, Weigt (2013). The patterns of the model are inferred from the data within a new gauge, more symmetric in the residues. We compute the statistical error bars on the pattern components. Results are illustrated on real data for a response regulator receiver domain (Pfam ID PF00072) family.
Journal of Physics Conference Series 11/2013; 473(1).
ABSTRACT: Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant 'patterns' of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold.
PLoS Computational Biology 08/2013; 9(8):e1003176. · 4.83 Impact Factor
ABSTRACT: We study the stable phases of an attractor neural network model, with binary units, for hippocampal place cells encoding one-dimensional (1D) or 2D spatial maps or environments. Different maps correspond to random allocations (permutations) of the place fields. Based on replica calculations we show that, below critical levels for the noise in the neural response and for the number of environments, the network activity is spatially localized in one environment. For high noise and loads the network activity extends over space, either uniformly or with spatial heterogeneities due to the crosstalk between the maps, and memory of environments is lost. Remarkably, the spatially localized regime is very robust against the neural noise until it reaches its critical level. Numerical simulations are in excellent quantitative agreement with our theoretical predictions.
Physical Review E 06/2013; 87(61):062813. · 2.31 Impact Factor
BMC Neuroscience 01/2013; 14(1). · 2.85 Impact Factor

Article: Lorenza Saitta, Attilio Giordana, Antoine Cornuéjols: Phase Transitions in Machine Learning
Journal of Statistical Physics 12/2012; 149(6). · 1.28 Impact Factor
ABSTRACT: A simple dynamical scheme for Attractor Neural Networks with non-monotonic three-state effective neurons is discussed. For the unsupervised Hebb learning rule, we give some basic numerical results which are interpreted in terms of a combinatorial task realized by the dynamical process (dynamical selection of optimal subspaces). An analytical estimate of optimal performance is given by resorting to two different simplified versions of the model. We show that replica symmetry breaking is required since the replica symmetric solutions are unstable.
International Journal of Neural Systems 11/2011; 03(supp01). · 6.06 Impact Factor
ABSTRACT: We present a procedure to solve the inverse Ising problem, that is, to find the interactions between a set of binary variables from the measurement of their equilibrium correlations. The method consists in constructing and selecting specific clusters of variables, based on their contributions to the cross-entropy of the Ising model. Small contributions are discarded to avoid overfitting and to make the computation tractable. The properties of the cluster expansion and its performance on synthetic data are studied. To make the implementation easier we give the pseudocode of the algorithm.
Journal of Statistical Physics 10/2011; · 1.28 Impact Factor
ABSTRACT: We present two Bayesian procedures to infer the interactions and external currents in an assembly of stochastic integrate-and-fire neurons from the recording of their spiking activity. The first procedure is based on the exact calculation of the most likely time courses of the neuron membrane potentials conditioned by the recorded spikes, and is exact for a vanishing noise variance and for an instantaneous synaptic integration. The second procedure takes into account the presence of fluctuations around the most likely time courses of the potentials, and can deal with moderate noise levels. The running time of both procedures is proportional to the number S of spikes multiplied by the squared number N of neurons. The algorithms are validated on synthetic data generated by networks with known couplings and currents. We also reanalyze previously published recordings of the activity of the salamander retina (including from 32 to 40 neurons, and from 65,000 to 170,000 spikes). We study the dependence of the inferred interactions on the membrane leaking time; the differences and similarities with the classical cross-correlation analysis are discussed.
Journal of Computational Neuroscience 10/2011; 31(2):199-227. · 2.09 Impact Factor
ABSTRACT: Multielectrode recordings allow the recording of the activity of a neural population of tens to hundreds of cells over periods of hours. Two examples are given by the recording of the activity of ganglion cells in the retina [13,6], and the recording of the activity of the prefrontal cortex in behaving rats [4,5]. Two important issues in neuroscience are 1) to find a predictive model able to reproduce the statistical features of the recorded activity, such as the spiking frequencies of the cells, the two-cell correlations and the occurrence of multi-cell patterns; 2) to infer from the recorded activity some functional couplings between cells, which could give an insight into neural circuits. Schneidman et al. [2] and Shlens et al. [3] first used the Ising model to analyze retina recordings, as a Boltzmann machine (BM) able to reproduce both the average activity of the cells and the pairwise correlations between cells in an equal time window of size Δt. However, efficient algorithms to infer the Ising couplings for a large population of cells remain to be found. In addition, an important question is how to deal with time (finite recording time) and space (small area recorded) undersampling. In the present work we propose a new and efficient algorithm to infer fields and pairwise couplings of an Ising model from the data. Our procedure considerably improves over the algorithm presented in [7] and is based on an adaptive cluster expansion of the cross entropy between the Ising model and the data. The interaction network is progressively unveiled, through a recursive processing of larger and larger subsets of variables, which we call clusters. Each cluster is associated with an entropy contribution which assesses how relevant the cluster is for inferring the BM. Clusters whose entropy contribution is smaller than a fixed threshold are discarded; the other clusters are kept and recursively used to generate larger clusters. The threshold must be large enough to avoid overfitting of the data corrupted by the sampling noise and small enough not to miss important components of the interaction network. Contrary to previous cluster expansions [7], the number, size, and composition of the clusters automatically adapt to the data and, rather than the size of the cell population alone, determine the running time of the algorithm. We provide pseudocode for the practical implementation of our algorithm and intend to release open-access code soon. Our procedure has been validated on synthetic data sets, and used to reanalyze multielectrode recordings of the activity of salamander ganglion cells previously published in [2,6] and of the activity in a region of the prefrontal cortex of a behaving rat previously published in [4,5]. The algorithm can efficiently deal with population sizes that varied across the data sets from 32 cells to 120 cells. To illustrate the potential applications we have tested the dependence of couplings upon stimulus on two data sets recorded by Schnitzer and Meister [6] from the same retina in dark condition (spontaneous activity) and with a random flickering checkerboard. We have compared, for each pair of cells i,j, the values of the interactions J_ij inferred from Dark and Flicker. We have found that most of the couplings are conserved under the two stimuli but some pairs of neurons with large interactions in Flicker have weak couplings in Dark. We have used the inferred couplings to draw retinal maps in the receptive-field plane of the cells. For Dark, the map of the largest couplings defines a planar graph with short-range (almost nearest-neighbor) connections. For Flicker, the strong non-conserved couplings noted above are often long-range interactions.
BMC Neuroscience 08/2011; 12:2328. · 2.85 Impact Factor
ABSTRACT: We consider the problem of inferring the interactions between a set of N binary variables from the knowledge of their frequencies and pairwise correlations. The inference framework is based on the Hopfield model, a special case of the Ising model where the interaction matrix is defined through a set of patterns in the variable space, and is of rank much smaller than N. We show that maximum likelihood inference is deeply related to principal component analysis when the amplitude of the pattern components ξ is negligible compared to √N. Using techniques from statistical mechanics, we calculate the corrections to the patterns to the first order in ξ/√N. We stress the need to generalize the Hopfield model and include both attractive and repulsive patterns in order to correctly infer networks with sparse and strong interactions. We present a simple geometrical criterion to decide how many attractive and repulsive patterns should be considered as a function of the sampling noise. We moreover discuss how many sampled configurations are required for a good inference, as a function of the system size N and of the amplitude ξ. The inference approach is illustrated on synthetic and biological data.
Physical Review E 05/2011; 83(5 Pt 1):051123. · 2.31 Impact Factor
Article: On the trajectories and performance of Infotaxis, an information-based greedy search algorithm
ABSTRACT: We present a continuous-space version of Infotaxis, a search algorithm where a searcher greedily moves to maximize the gain in information about the position of the target to be found. Using a combination of analytical and numerical tools we study the nature of the trajectories in two and three dimensions. The probability that the search is successful and the running time of the search are estimated. A possible extension to non-greedy search is suggested.
EPL (Europhysics Letters) 04/2011; 94(2):20005. · 2.27 Impact Factor
ABSTRACT: We introduce a procedure to infer the interactions among a set of binary variables, based on their sampled frequencies and pairwise correlations. The algorithm builds the clusters of variables contributing most to the entropy of the inferred Ising model, and rejects the small contributions due to the sampling noise. Our procedure successfully recovers benchmark Ising models even at criticality and in the low temperature phase, and is applied to neurobiological data.
Physical Review Letters 02/2011; · 7.73 Impact Factor
BMC Neuroscience 01/2011; 12:12. · 2.85 Impact Factor
ABSTRACT: We consider the Sinai model, in which a random walker moves in a random quenched potential V, and ask the following questions: 1. how can the quenched potential V be inferred from the observations of one or more realizations of the random motion? 2. how many observations (walks) are required to make a reliable inference, that is, to be able to distinguish between two similar but distinct potentials, V1 and V2? We show how question 1 can be easily solved within the Bayesian framework. In addition, we show that the answer to question 2 is, in general, intimately connected to the calculation of the survival probability of a fictitious walker in a potential W defined from V1 and V2, with partial absorption at sites where V1 and V2 do not coincide. For the one-dimensional Sinai model, this survival probability can be analytically calculated, in excellent agreement with numerical simulations.
Journal of Physics Conference Series 12/2009; 197(1).
Publication Stats
3k  Citations  
255.16  Total Impact Points  
Top Journals
Institutions

2009–2014

Pierre and Marie Curie University - Paris 6
 • Laboratoire de Physique Théorique ENS (LPTENS)
 • Laboratoire de physique statistique de l'Ecole Normale Supérieure (LPS) - UMR 8550
Lutetia Parisorum, Île-de-France, France
The Rockefeller University
New York City, New York, United States


1996–2014

French National Centre for Scientific Research
 Laboratoire Statistique et Génome
Lutetia Parisorum, Île-de-France, France


1995–2014

Ecole Normale Supérieure de Paris
 Laboratoire de Physique Théorique
Lutetia Parisorum, Île-de-France, France
INFN - Istituto Nazionale di Fisica Nucleare
Frascati, Latium, Italy


2001–2002

University of Chicago
 James Franck Institute
Chicago, IL, United States 
Université Paris-Sud 11
Orsay, Île-de-France, France
