Francesco Zamponi’s research while affiliated with Paris Diderot University and other places

Publications (284)


Fluctuations and the limit of predictability in protein evolution
  • Preprint

December 2024 · 5 Reads

Saverio Rossi · Leonardo Di Bari · Martin Weigt · Francesco Zamponi

Protein evolution involves mutations occurring across a wide range of time scales [Di Bari et al., PNAS 121, e2406807121 (2024)]. In analogy with other disordered systems, this dynamical heterogeneity suggests strong correlations between mutations happening at distinct sites and times. To quantify these correlations, we examine the role of various fluctuation sources in protein evolution, simulated using a data-driven epistatic landscape. By applying spatio-temporal correlation functions inspired by statistical physics, we disentangle fluctuations originating from the ancestral protein sequence from those driven by stochastic mutations along independent evolutionary paths. Our analysis shows that, in diverse protein families, fluctuations from the ancestral sequence predominate at shorter time scales. This allows us to identify a time scale over which ancestral sequence information persists, enabling its reconstruction. We link this persistence to the strength of epistatic interactions: ancestral sequences with stronger epistatic signatures impact evolutionary trajectories over extended periods. At longer time scales, however, ancestral influence fades as epistatically constrained sites evolve collectively. To confirm this idea, we apply a standard ancestral sequence reconstruction algorithm and verify that the time-dependent recovery error is influenced by the properties of the ancestor itself.
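The split between fluctuations inherited from the ancestor and those generated along independent paths can be illustrated, in a much simpler setting than the spatio-temporal correlation functions described above, with the law of total variance. The sketch below is only an illustration under stated assumptions: a single scalar observable (for example, the Hamming distance from the ancestor at a fixed time) measured on the same number of independent trajectories for each ancestor. The function name `decompose_variance` is hypothetical.

```python
from statistics import fmean


def decompose_variance(samples_by_ancestor):
    """Split the total variance of an observable into an ancestral part and a
    stochastic (trajectory-to-trajectory) part via the law of total variance.

    samples_by_ancestor: list of lists; each inner list holds the observable
    measured on independent trajectories started from the same ancestor.
    Assumes every ancestor has the same number of trajectories, so that the
    population variances below add up exactly.
    """
    all_values = [x for group in samples_by_ancestor for x in group]
    grand_mean = fmean(all_values)
    total = fmean((x - grand_mean) ** 2 for x in all_values)

    group_means = [fmean(g) for g in samples_by_ancestor]
    # Ancestral part: spread of the per-ancestor averages.
    ancestral = fmean((m - grand_mean) ** 2 for m in group_means)
    # Stochastic part: average spread among trajectories sharing an ancestor.
    stochastic = fmean(
        fmean((x - m) ** 2 for x in g)
        for g, m in zip(samples_by_ancestor, group_means)
    )
    return total, ancestral, stochastic
```

Repeating this decomposition at each time point would give, under these assumptions, an ancestral term and a stochastic term whose ratio gives a rough indication of how long the ancestor's imprint persists.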


Author Correction: Understanding epistatic networks in the B1 β-lactamases through coevolutionary statistical modeling and deep mutational scanning

November 2024 · 5 Reads

M. Bisardi · D. Lee · [...]


Exact full-RSB SAT/UNSAT transition in infinitely wide two-layer neural networks

October 2024

We analyze the problem of storing random pattern-label associations using two classes of continuous-weight, non-convex models, namely the perceptron with negative margin and an infinite-width two-layer neural network with non-overlapping receptive fields and a generic activation function. Using a full-RSB ansatz, we compute the exact value of the SAT/UNSAT transition. Furthermore, in the case of the negative perceptron model, we show that, depending on the value of the margin and the constraint density, there is a line separating a phase in which the distribution of overlaps of typical states does not possess a gap from one in which it does. Our results show that the hypothesis underlying some recently developed theorems, which claim that Approximate Message Passing (AMP) based algorithms are able to reach capacity, does not hold in general. Finally, we show that Gradient Descent is not able to reach the maximal capacity, both when the typical states exhibit a non-overlap gap phase and when they do not. This, similarly to what occurs in binary-weight models, suggests that gradient-based algorithms are biased towards highly atypical states, whose inaccessibility determines the algorithmic threshold.
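As a concrete picture of the storage problem for the negative perceptron, the sketch below checks whether a given weight vector satisfies all pattern-label constraints. It assumes the usual spherical convention (weights normalized so that their squared norm equals N), ±1 patterns and labels, a stability defined as label × (w · pattern) / √N, and a constraint density α = P/N; these conventions and all function names are assumptions for illustration, not the paper's code. With a negative margin κ, the set of satisfying weights on the sphere is non-convex.

```python
import math
import random


def normalize(w):
    """Rescale w onto the sphere sum_i w_i^2 = N."""
    n = len(w)
    norm = math.sqrt(sum(x * x for x in w))
    return [x * math.sqrt(n) / norm for x in w]


def stabilities(w, patterns, labels):
    """Delta^mu = y^mu * (w . xi^mu) / sqrt(N) for every pattern mu."""
    n = len(w)
    return [
        y * sum(wi * xi for wi, xi in zip(w, x)) / math.sqrt(n)
        for x, y in zip(patterns, labels)
    ]


def is_sat(w, patterns, labels, kappa):
    """True if every stability is at least kappa (kappa < 0: negative margin)."""
    return all(d >= kappa for d in stabilities(w, patterns, labels))


def random_instance(n, alpha, seed=0):
    """P = alpha * N random +-1 patterns with random +-1 labels."""
    rng = random.Random(seed)
    p = round(alpha * n)
    patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]
    labels = [rng.choice((-1, 1)) for _ in range(p)]
    return patterns, labels
```

The SAT/UNSAT transition discussed above is the largest α at which, for typical random instances in the large-N limit, some normalized `w` still makes `is_sat` true.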


Family-wide residue mutability by direct coupling analysis (DCA)
a Schematic of the computational and experimental workflow. The B1 β-lactamase enzyme family is isolated through sequence space exploration via a sequence similarity network. The entire set of sequences is used to generate a co-evolutionary model via DCA. Two highly diversified homologs within the family are selected for DMS to generate a large mutational dataset. b Each square in the heat map is colored by the predicted mutational tolerance (measured as context-dependent entropy, CDE) for each position of the 100 aligned homologs. Blank cells represent alignment gaps. The distribution of CDEs by position is presented as box plots on top (N = number of aligned homologs at the position, exact count in source data): 0–100% as whiskers, 25–75% (IQR) as bars. The bars are colored by the median value, using the same color scale as the heat map. The secondary structure of a representative homolog (VIM-2, PDB ID: 5YD7), as well as the active-site residues (circles), are depicted under the heat map; arrows represent β-sheets, bars represent α-helices. The maximum likelihood phylogenetic tree of the 100 homologs is shown on the left of the heat map. c Distributions of the per-position spread in mutational tolerance across homologs, measured as the distance (in bits) of the IQR or max–min mutational tolerance. Source data are provided as a Source Data file.
Overview of DMS for NDM-1 and structural similarity to VIM-2
a Workflow for DMS of NDM-1. b Correlation between replicates of the NDM-1 library selected at 256 µg/mL AMP. The R² and P-value of a linear regression are shown at the top. c Comparisons of mutational tolerance at each aligned position for the NDM-1 and VIM-2 experiments. The R² and line of best fit for a linear regression are shown. d Distribution of differences in mutational tolerance at the same aligned position in DMS or DCA between NDM-1 and VIM-2, or the difference in DCA between 100 random pairs of homologs. e Comparison of mutational tolerance between DMS and DCA for NDM-1 and VIM-2. The R² and line of best fit for a linear regression are shown. f Comparison of the mutational tolerance difference between NDM-1 and VIM-2 at each position between DMS and DCA. The data is colored by the IQR of mutational tolerance across the 102 homologs (100 + NDM-1 and VIM-2), with the colors scaled to the distribution (median is the center). The R² and line of best fit for a linear regression are shown. Source data are provided as a Source Data file.
Structural basis of tolerance classifications
a Mutational tolerance data for all homologs overlaid on the crystal structure of VIM-2 (5YD7), with the backbone thickness representing the IQR across the 102 homologs and the color representing the median. b DMS mutational tolerance data overlaid on the crystal structure of VIM-2, with the backbone thickness representing the absolute difference between NDM-1 and VIM-2 and the color representing the average mutational tolerance. c DCA mutational tolerance of VIM-2 and NDM-1 overlaid on the crystal structure of VIM-2, with the backbone thickness representing the absolute difference and the color representing the average. d Scatter plot of the median mutational tolerance of the 102 homologs versus the average ASA of VIM-2 and NDM-1, with positions colored by the mutational tolerance IQR. e Scatter plot of the average DMS mutational tolerance of NDM-1 and VIM-2 versus the average ASA, with positions colored by the difference in mutational tolerance between NDM-1 and VIM-2. f Same as panel e but for the DCA predictions: scatter plot of the average mutational tolerance of VIM-2 and NDM-1 versus their average ASA, colored by the mutational tolerance differences. In d–f, the ρ² and p-value of a Spearman correlation between the x and y variables are shown. Source data are provided as a Source Data file.
Residue level epistasis between NDM-1 and VIM-2
a Flowchart of epistasis classification. b Correlation of DMS data between NDM-1 and VIM-2 at shared and compatible positions. Dashed lines highlight the range of neutral fitness effects (1.96 × standard deviation (SD) of the synonymous-variant distribution of the respective homolog; the diagonal uses the y-range), centered on 0 (axes) or on the 1:1 line (diagonal). The fit line (linear correlation) is shown in black, with R² and p-value displayed. c Distribution of fitness-effect differences between NDM-1 and VIM-2 at shared and compatible positions. Dashed lines represent the range of neutral fitness around 0 (1.96 × SD of synonymous variants for NDM-1). d Fraction of epistatic mutations (Nmut epistatic/Ntotal observed) by position, overlaid on the VIM-2 structure (shared and compatible positions; others transparent) and colored according to the scale at lower right (with distribution). The thickness of the structure corresponds to the average ASA of the NDM-1 and VIM-2 crystal structures. e Plot of the fraction of epistatic mutations at each position versus the ASA. Only positions highlighted in panel d are included. f Scatter plot of fitness effects for reverting VIM-2 WT to NDM-1 WT (x-axis) and NDM-1 WT to VIM-2 WT (y-axis) for equivalent positions. The vertical dashed line indicates the lower bound of the region of neutral effects for VIM-2 based on the synonymous-variant distribution, and the horizontal dashed line shows the lower bound of the region of neutral effects for NDM-1. Quadrants with different behavioral classes are colored as in a. g Differing WT positions between NDM-1 and VIM-2, colored by entrenchment class. The structure thickness corresponds to the average ASA, and regions outside the classification are transparent. h Scatter plot of ΔE for reverting VIM-2 WT to NDM-1 WT (x-axis) and NDM-1 WT to VIM-2 WT (y-axis) for equivalent positions, colored as in a. The dashed line represents the expected behavior without epistasis. i Bar plot showing the relative fraction of points in each epistatic class at various distances from the diagonal of h. The total number of points in each distance bin is shown on top. Source data are provided as a Source Data file.
Testing interactions of entrenched positions in NDM-1
a Example of potentially interacting entrenched WT positions in the crystal structures of NDM-1 and VIM-2. b Experimental scheme for testing single or combined mutational effects in the NDM-1 background. c Entrenched WT positions that were chosen for testing of epistatic interactions. Positions with the same color are mutated together to test for compensation of entrenchment; A204L overlaps two sets and is also tested with G192Y (red). d Plot of the IQR in mutational tolerance across 102 homologs versus the average ASA of the NDM-1 and VIM-2 structures, with the tested positions highlighted. Tested combinations are shown as lines. e Scatter plot of the DCA energy change of all selected double mutants, with the expected additive single-mutant effects on the x-axis and the observed double-mutant effects on the y-axis. The dashed line indicates a 1:1 correlation. f Scatter plot of all tested double (and one triple, purple) mutants, with the expected additive single-mutant effects on the x-axis and the observed double-mutant effects on the y-axis. Effects are calculated as fold change relative to wtNDM-1. The line of best fit for a linear correlation is shown in black, with the R² and p-value displayed. Source data are provided as a Source Data file.
Understanding epistatic networks in the B1 β-lactamases through coevolutionary statistical modeling and deep mutational scanning

September 2024 · 94 Reads · 1 Citation

Throughout evolution, protein families undergo substantial sequence divergence while preserving structure and function. Although most mutations are deleterious, evolution can explore sequence space via epistatic networks of intramolecular interactions that alleviate the harmful mutations. However, comprehensive analysis of such epistatic networks across protein families remains limited. We therefore conduct a family-wide analysis of the B1 metallo-β-lactamases, combining experiments (deep mutational scanning, DMS) on two distant homologs (NDM-1 and VIM-2) and computational analyses (in silico DMS based on Direct Coupling Analysis, DCA) of 100 homologs. The methods jointly reveal and quantify prevalent epistasis, as ~1/3 of equivalent mutations are epistatic in DMS. From DCA, half of the positions have a >6.5-fold difference in the effective number of tolerated mutations across the entire family. Notably, both methods locate the residues with the strongest epistasis in regions of intermediate residue burial, suggesting a balance of residue packing and mutational freedom in forming epistatic networks. We identify entrenched WT residues between NDM-1 and VIM-2 in DMS, which display statistically distinct behaviors in DCA from non-entrenched residues. Entrenched residues are not easily compensated by changes in single nearby interactions, reinforcing existing findings in which a complex epistatic network compounds smaller effects from many interacting residues.
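In silico DMS with a DCA model amounts to scoring mutants with a Potts energy. The sketch below assumes the common convention E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), with P(s) ∝ exp(-E(s)), so that lower energy means higher fitness. It computes a single-mutant ΔE, a context-dependent entropy (CDE, in bits) from the conditional distribution at one site given the rest of the sequence, and the non-additive part of a double mutant. Treating 2^CDE as the "effective number of tolerated" states is an assumption here, the parameter storage format and all function names are hypothetical, and the DCA inference step itself is not shown.

```python
import math


def coupling(J, i, j, a, b):
    """J_ij(a, b); each pair is stored once, keyed (i, j) with i < j, as a q x q table."""
    if i > j:
        i, j, a, b = j, i, b, a
    table = J.get((i, j))
    return 0.0 if table is None else table[a][b]


def energy(seq, h, J):
    """Potts energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j)."""
    L = len(seq)
    e = -sum(h[i][seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e -= coupling(J, i, j, seq[i], seq[j])
    return e


def delta_e(seq, site, new, h, J):
    """Energy change of the single mutant seq[site] -> new (positive = deleterious)."""
    mutant = list(seq)
    mutant[site] = new
    return energy(mutant, h, J) - energy(seq, h, J)


def context_dependent_entropy(seq, site, h, J):
    """Entropy (bits) of P(s_site = a | rest of seq) under the Potts model."""
    q = len(h[site])
    fields = [
        h[site][a] + sum(coupling(J, site, j, a, seq[j]) for j in range(len(seq)) if j != site)
        for a in range(q)
    ]
    m = max(fields)  # subtract the maximum for numerical stability
    weights = [math.exp(f - m) for f in fields]
    z = sum(weights)
    return -sum((w / z) * math.log2(w / z) for w in weights if w > 0)


def effective_number(seq, site, h, J):
    """Assumed definition: effective number of tolerated states = 2 ** CDE."""
    return 2 ** context_dependent_entropy(seq, site, h, J)


def double_mutant_epistasis(seq, i, a, j, b, h, J):
    """Delta-Delta-E = dE(double) - dE(i -> a) - dE(j -> b); zero without couplings."""
    double = list(seq)
    double[i], double[j] = a, b
    return (energy(double, h, J) - energy(seq, h, J)
            - delta_e(seq, i, a, h, J) - delta_e(seq, j, b, h, J))
```

In this form the fields cancel in the double-mutant term, so only the couplings between the two mutated sites contribute; with all couplings set to zero the model is independent-site and every double mutant is exactly additive.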


Emergent time scales of epistasis in protein evolution

September 2024 · 6 Reads · 4 Citations

Proceedings of the National Academy of Sciences

We introduce a data-driven epistatic model of protein evolution, capable of generating evolutionary trajectories spanning very different time scales reaching from individual mutations to diverged homologs. Our in silico evolution encompasses random nucleotide mutations, insertions and deletions, and models selection using a fitness landscape, which is inferred via a generative probabilistic model for protein families. We show that the proposed framework accurately reproduces the sequence statistics of both short-time (experimental) and long-time (natural) protein evolution, suggesting applicability also to relatively data-poor intermediate evolutionary time scales, which are currently inaccessible to evolution experiments. Our model uncovers a highly collective nature of epistasis, gradually changing the fitness effect of mutations in a diverging sequence context, rather than acting via strong interactions between individual mutations. This collective nature triggers the emergence of a long evolutionary time scale, separating fast mutational processes inside a given sequence context, from the slow evolution of the context itself. The model quantitatively reproduces epistatic phenomena such as contingency and entrenchment, as well as the loss of predictability in protein evolution observed in deep mutational scanning experiments of distant homologs. It thereby deepens our understanding of the interplay between mutation and selection in shaping protein diversity and functions, allows one to statistically forecast evolution, and challenges the prevailing independent-site models of protein evolution, which are unable to capture the fundamental importance of epistasis.
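The combination of random mutation and selection on an inferred landscape can be sketched with a Metropolis rule. This is a simplified illustration, not the model described above: it assumes single-site substitutions among q states (no nucleotide or codon level, no insertions or deletions), a user-supplied energy function in which lower energy means higher fitness, and a selection strength `beta`. The names `evolve` and `hamming` are hypothetical.

```python
import math
import random


def evolve(seq, energy_fn, q, n_steps, beta=1.0, seed=0):
    """Metropolis in-silico evolution: propose a random single-site substitution,
    accept it with probability min(1, exp(-beta * dE)). Returns the trajectory."""
    rng = random.Random(seed)
    seq = list(seq)
    e = energy_fn(seq)
    trajectory = [tuple(seq)]
    for _ in range(n_steps):
        site = rng.randrange(len(seq))
        new = rng.randrange(q - 1)
        if new >= seq[site]:
            new += 1  # uniform choice among the q - 1 states different from the current one
        old = seq[site]
        seq[site] = new
        e_new = energy_fn(seq)
        d_e = e_new - e
        if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
            e = e_new  # mutation fixed
        else:
            seq[site] = old  # mutation rejected by selection
        trajectory.append(tuple(seq))
    return trajectory


def hamming(a, b):
    """Number of sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))
```

Because the proposal is symmetric (a uniformly chosen site and a uniformly chosen different state), this rule samples at long times from a distribution proportional to exp(-beta * E); tracking `hamming(trajectory[0], trajectory[t])` then gives a simple divergence-from-ancestor curve.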


Expanding the space of self-reproducing RNAs using probabilistic generative models

July 2024 · 58 Reads

Estimating the plausibility of RNA self-reproduction is central to origin-of-life scenarios, but self-reproduction has been demonstrated in only a handful of systems. Here, we populated a vast sequence space of ribozymes using statistical covariation models and secondary structure prediction. Experimentally assayed sequences were found to be active as far as 65 mutations from a reference natural sequence. The number of potentially generated sequences, together with the experimental success rate, indicates that at least 10³⁹ such ribozymes may exist. Randomly sampled artificial ribozymes exhibited autocatalytic self-reproduction akin to the reference sequence. The combination of high-throughput screening and probabilistic modeling considerably improves our estimate of the number of self-reproducing systems, paving the way for a statistical approach to the origin of life.


Nearest-Neighbours Neural Network architecture for efficient sampling of statistical physics models

July 2024 · 11 Reads

Efficiently sampling the Gibbs-Boltzmann distribution of disordered systems is important both for the theoretical understanding of these models and for solving practical optimization problems. Unfortunately, this task is known to be hard, especially for spin glasses at low temperatures. Recently, many attempts have been made to tackle the problem by combining classical Monte Carlo schemes with newly devised neural networks that learn to propose smart moves. In this article, we introduce the Nearest-Neighbours Neural Network (4N) architecture, a physically interpretable deep architecture whose number of parameters scales linearly with the size of the system and that can be applied to a large variety of topologies. We show that the 4N architecture can accurately learn the Gibbs-Boltzmann distribution of the two-dimensional Edwards-Anderson model, including some of its most difficult instances. In particular, it captures properties such as the energy, the correlation function and the overlap probability distribution. Finally, we show that the 4N performance increases with the number of layers, in a way that clearly connects to the correlation length of the system, thus providing a simple and interpretable criterion for choosing the optimal depth.
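To make two of the observables mentioned above concrete, here is a minimal sketch of the two-dimensional Edwards-Anderson energy and of the overlap between two spin configurations. It assumes an L × L lattice with periodic boundaries (L ≥ 3, so each bond is counted once) and ±1 couplings; the coupling distribution, the storage format and all function names are assumptions for illustration, and the 4N architecture itself is not reproduced.

```python
import random


def ea_couplings(L, seed=0):
    """Random +-1 couplings on an L x L periodic square lattice.
    right[(x, y)] couples (x, y)-(x, y+1); down[(x, y)] couples (x, y)-(x+1, y)."""
    rng = random.Random(seed)
    right = {(x, y): rng.choice((-1, 1)) for x in range(L) for y in range(L)}
    down = {(x, y): rng.choice((-1, 1)) for x in range(L) for y in range(L)}
    return right, down


def ea_energy(spins, right, down):
    """H = - sum_<ij> J_ij s_i s_j over nearest-neighbour bonds, each counted once."""
    L = len(spins)
    e = 0
    for x in range(L):
        for y in range(L):
            s = spins[x][y]
            e -= right[(x, y)] * s * spins[x][(y + 1) % L]
            e -= down[(x, y)] * s * spins[(x + 1) % L][y]
    return e


def overlap(spins_a, spins_b):
    """q = (1/N) sum_i s_i^a s_i^b between two configurations (replicas)."""
    L = len(spins_a)
    return sum(spins_a[x][y] * spins_b[x][y] for x in range(L) for y in range(L)) / (L * L)
```

The overlap probability distribution is the histogram of `overlap` over pairs of independent samples drawn at the same temperature for a fixed set of couplings; since the energy is invariant under a global spin flip, that distribution is symmetric around zero.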



Unlearning regularization for Boltzmann machines

June 2024 · 27 Reads · 4 Citations

Boltzmann machines (BMs) are graphical models with interconnected binary units, employed for the unsupervised modeling of data distributions. When trained on real data, BMs tend to behave like critical systems, displaying a high susceptibility under a small rescaling of the inferred parameters. This behavior is inconvenient for generating data, because it slows down the sampling process and induces the model to overfit the training data. In this study, we introduce a regularization method for BMs that improves the robustness of the model under rescaling of the parameters. The new technique shares formal similarities with the unlearning algorithm, an iterative procedure used to improve memory associativity in Hopfield-like neural networks. We test our unlearning regularization on synthetic data generated by two simple models, the Curie–Weiss ferromagnetic model and the Sherrington–Kirkpatrick spin glass model. We show that it outperforms Lp-norm schemes and discuss the role of parameter initialization. Finally, the method is applied to learn the activity of real neuronal cells, confirming its efficacy at shifting the inferred model away from criticality and making it a strong candidate for practical scientific applications.
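The abstract only states that the regularizer shares formal similarities with unlearning. For orientation, the sketch below shows the classic Hopfield unlearning ("dreaming") rule of Hopfield, Feinstein and Palmer (1983), not the paper's BM regularizer: relax a random configuration to a fixed point of zero-temperature dynamics, then weaken that attractor with an anti-Hebbian update. The step size `epsilon`, the Hebbian initialization and all function names are assumptions for illustration.

```python
import random


def hebbian(patterns):
    """J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, with zero diagonal."""
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    J[i][j] += xi[i] * xi[j] / n
    return J


def relax(J, s, rng, max_sweeps=100):
    """Zero-temperature asynchronous dynamics until no spin flips (a fixed point)."""
    n = len(s)
    s = list(s)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.sample(range(n), n):  # random update order
            field = sum(J[i][j] * s[j] for j in range(n))
            new = 1 if field >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s


def unlearning_step(J, epsilon, rng):
    """One unlearning step: relax a random configuration, then weaken the
    attractor reached with J_ij -= epsilon * s_i s_j / N (in place)."""
    n = len(J)
    start = [rng.choice((-1, 1)) for _ in range(n)]
    s = relax(J, start, rng)
    for i in range(n):
        for j in range(n):
            if i != j:
                J[i][j] -= epsilon * s[i] * s[j] / n
    return s
```

Because the update is anti-Hebbian on whatever state the dynamics falls into, repeated steps preferentially erode attractors with large basins, which is the intuition behind using a similar term to push an inferred model away from high-susceptibility regions.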


Citations (50)


... Yet, the sequence diversity of natural evolution still remains out of reach of such experiments, thus leaving an unexplored gap in evolutionary time scales. In order to fill this gap, one can simulate the evolution of protein sequences in silico [26][27][28][29][30], relying on the data-driven approach that goes under the name of Direct Coupling Analysis (DCA) [31,32], in which a fitness landscape (analogous to an energy function in the statistical physics vocabulary) is inferred starting from a Multiple Sequence Alignment (MSA) of natural homologs constituting a given protein family [33][34][35][36][37]. The energy function that results from this inference procedure can then be used to assign a probability to each sequence. ...

Reference:

Fluctuations and the limit of predictability in protein evolution
Emergent time scales of epistasis in protein evolution
  • Citing Article
  • September 2024

Proceedings of the National Academy of Sciences

... To further illustrate the absence of directional memory in the rheological response of acid-induced CMC gels (Divoux et al. 2024), we perform a strain-sweep experiment on the same 3% CMC gel, first increasing the strain amplitude γ₀ and then immediately decreasing it. The result is shown in Fig. 6b. ...

Ductile-to-brittle transition and yielding in soft amorphous materials: perspectives and open questions

Soft Matter

... Pasimeni (2022) stated that the effectiveness of monetary policy in reducing inflation is directly proportional to the relative importance of demand factors in driving price pressure. Knicker et al. (2024) additionally emphasized that, without appropriate fiscal policy, the shocked economy can take years to recover or can even tip over into a deep recession, and that the success of monetary policy in reducing inflation depends not only on the direct economic impact of interest rate hikes but also on the anchoring of customers' expectations. Szabó and Jančovič (2022) also extended the set of main inflation drivers, arguing that inflation expectations determine inflation dynamics strongly and statistically significantly. ...

Post-COVID inflation and the monetary policy dilemma: an agent-based scenario analysis

Journal of Economic Interaction and Coordination

... Subsequently, in Section V we will test the algorithm on a larger amount of data to show that it can learn a model that approximates their probability distribution very well, emulating the Boltzmann Machine mechanism [5]. More insights into the use of the Unlearning rule for the generative task are reported in [12]. The Conclusion of the paper will present some open problems left by our investigation, highlighting possible useful contributions to the domains of Biology, Computer Science and Physics. ...

Unlearning regularization for Boltzmann machines

... The three DBD sequences that correspond to the blue, green, and red colors are identified, respectively, with the code Figure S1. Evolution of χ_bg, χ_dyn and χ_tot as a function of Monte Carlo sweeps using a sparse edge-activated version [56] of the Potts model (a) and using the dynamics of Ref. [30] implementing insertions (b), deletions and single nucleotide substitutions for the DBD family. ...

Towards parsimonious generative modeling of RNA families

Nucleic Acids Research

... Several mean-field arguments [27,28] and phenomenological studies [29,30] support the quartic scaling D(ω) ∼ ω⁴ for the distribution of low-frequency QLEs in disordered solids. However, effective medium theories, such as the "Fluctuating Elasticity Theory", propose deviations from the D(ω) ∼ ω⁴ law [31][32][33]. While various aspects of the quartic law have been tested in simulations of amorphous solids, several studies have also observed deviations from this behavior. ...

The nature of non-phononic excitations in disordered systems

... Computer simulations are a promising tool to elucidate this conjecture. However, drawing more quantitative conclusions for molecular liquids will require models that more closely resemble molecular liquids, e.g. by reducing the polydispersity [138] or implementing rotational degrees of freedom [139,140]. ...

Creating equilibrium glassy states via random particle bonding

Journal of Statistical Mechanics Theory and Experiment

... Jamming results. As generally expected for systems with a complex landscape [49][50][51], different optimization algorithms reach IS at different "depths". Although the maximal radius r_IS achieved when starting within the basin of attraction of a particular IS is geometrically fixed, the probability of ending in that basin is algorithm dependent, as in Eq. (5). ...

On weak ergodicity breaking in mean-field spin glasses

SciPost Physics

... We note that the established approaches used to estimate heat-transport length scales in disordered solids (based on the Dynamical Structure Factor (DSF) [63] and its vibrational extension [64], or on velocity-current correlations [52]) all rely on the approximate identification of a blurred band structure. This identification is often (albeit not always [91]) possible at low frequency, but in general prohibitively challenging at high frequency in structurally disordered materials where disorder cannot be directly obtained from a reference crystalline structure [92]. A paradigmatic example is given by crystalline and strongly irradiated graphite, whose very different densities and structural properties do not allow one to find a direct and univocal mapping relating the atoms of one structure to those of the other. ...

Finding defects in glasses through machine learning

... While several other parameters are needed to fully define Mark-0 (see e.g. [53] for the most recent version), the numerical investigation of [47] has suggested that the parameters R and Θ play a particularly important role in determining the aggregate behaviour, as illustrated in the phase diagram shown in Fig. 1. As expected from the general intuition gained from the study of physical systems, [47]. ...

Post-COVID Inflation & the Monetary Policy Dilemma: An Agent-Based Scenario Analysis
  • Citing Article
  • January 2023

SSRN Electronic Journal