ABSTRACT: Proteins are essential components of living systems, capable of performing a huge variety of tasks at the molecular level, such as recognition, signalling, copying and transport. The protein sequences realizing a given function may vary widely across organisms, giving rise to a protein family. Here, we estimate the entropy of those families based on different approaches, including Hidden Markov Models used for protein databases and inferred statistical models reproducing the low-order (1- and 2-point) statistics of multiple-sequence alignments. We also compute the entropic cost, that is, the loss in entropy resulting from a constraint acting on the protein, such as the fixation of one particular amino acid on a specific site, and relate this notion to the escape probability of HIV. The case of lattice proteins, for which the entropy can be computed exactly, provides another illustration of the concept of cost, arising here from the competition between different folds. The relevance of entropy to directed-evolution experiments is stressed.
Article · Dec 2015 · Journal of Statistical Physics
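As an illustration of the entropy and entropic-cost notions above, the following Python sketch assumes the simplest possible model: an independent-site (profile) model that reproduces only the 1-point statistics of an alignment. The abstract's models also use 2-point statistics or are Hidden Markov Models, which this sketch does not implement; the alphabet, the pseudocount scheme and the function names are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical alphabet: the 20 amino acids plus the gap symbol "-".
ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"


def site_entropies(alignment, alphabet=ALPHABET, pseudocount=0.0):
    """Per-column Shannon entropies (in nats) of a multiple-sequence alignment.

    The (assumed) pseudocount mixes empirical frequencies with a uniform
    distribution: f -> (1 - pseudocount) * f + pseudocount / q.
    Symbols outside the alphabet are ignored.
    """
    q = len(alphabet)
    n_seqs = len(alignment)
    entropies = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment)
        h = 0.0
        for a in alphabet:
            f = (1.0 - pseudocount) * counts[a] / n_seqs + pseudocount / q
            if f > 0.0:
                h -= f * math.log(f)
        entropies.append(h)
    return entropies


def independent_site_entropy(alignment, fixed_site=None, **kwargs):
    """Entropy of the independent-site model, optionally with one site fixed.

    Fixing site i to any residue removes that site's contribution, so in this
    model the entropic cost of the constraint is the entropy of site i.
    """
    ents = site_entropies(alignment, **kwargs)
    return sum(h for i, h in enumerate(ents) if i != fixed_site)
```

In this independent-site picture, fixing a residue at site i lowers the entropy by exactly the entropy of that site, whatever residue is chosen; once couplings between sites are included, the cost generally depends on which residue is fixed.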
ABSTRACT: Recent studies have demonstrated abundant transcription of a set of noncoding RNAs (ncRNAs) preferentially within tumors as opposed to normal tissue. Using an approach from statistical physics, we quantify global transcriptome-wide motif use for the first time, to our knowledge, in human and murine ncRNAs, determining that most have motif use consistent with the coding genome. However, an outlier subset of tumor-associated ncRNAs, typically of recent evolutionary origin, has motif use that is often indicative of pathogen-associated RNA. For instance, we show that the tumor-associated human satellite repeat II (HSATII) is enriched in motifs containing CpG dinucleotides in AU-rich contexts that most of the human genome and human-adapted viruses have evolved to avoid. We demonstrate that a key subset of these ncRNAs functions as immunostimulatory "self-agonists" and directly activates cells of the mononuclear phagocytic system to produce proinflammatory cytokines. These ncRNAs arise from endogenous repetitive elements that are normally silenced, yet are often very highly expressed in cancers. We propose that the innate response in tumors may partially originate from direct interaction of immunogenic ncRNAs expressed in cancer cells with innate pattern recognition receptors, and thereby assign a previously unidentified danger-associated function to a set of dark matter repetitive elements. These findings potentially reconcile several observations concerning the role of ncRNA expression in cancers and their relationship to the tumor microenvironment.
Article · Nov 2015 · Proceedings of the National Academy of Sciences
ABSTRACT: Despite the biological importance of non-coding RNAs, their structural characterization remains challenging. Making use of the rapidly growing sequence databases, we analyze nucleotide coevolution across homologous sequences via Direct-Coupling Analysis to detect nucleotide-nucleotide contacts. For a representative set of riboswitches, we show that the results of Direct-Coupling Analysis, in combination with a generalized Nussinov algorithm, systematically improve RNA secondary structure prediction beyond traditional covariance approaches based on mutual information. Even more importantly, we show that the results of Direct-Coupling Analysis are enriched in tertiary structure contacts. By integrating these predictions into molecular modeling tools, systematically improved tertiary structure predictions can be obtained, as compared to using secondary structure information alone.
Article · Sep 2015 · Nucleic Acids Research
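The idea of turning pairwise coevolution scores into a secondary structure can be sketched with a plain Nussinov-style dynamic program that maximizes the summed score of nested base pairs. This is not the generalized Nussinov algorithm of the paper: the score matrix (assumed to come from some coevolution analysis such as DCA), the minimum hairpin length `min_loop` and the function name are assumptions for illustration.

```python
import math


def nussinov(scores, min_loop=3):
    """Maximum-score set of nested (non-crossing) base pairs.

    scores[i][j] (used only for i < j) is a pair score, e.g. a coupling
    strength from a coevolution analysis; pairs with non-positive score are
    never formed, and paired positions must satisfy j - i > min_loop.
    Returns (best_total_score, sorted list of (i, j) pairs).
    """
    n = len(scores)
    # dp[i][j] = best score on the subsequence i..j (0 for short spans).
    dp = [[0.0] * n for _ in range(n)]

    def pair_value(i, j, k):
        # Score if i pairs with k: the pair, the interior i+1..k-1, the rest k+1..j.
        right = dp[k + 1][j] if k + 1 <= j else 0.0
        return scores[i][k] + dp[i + 1][k - 1] + right

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # position i left unpaired
            for k in range(i + min_loop + 1, j + 1):
                if scores[i][k] > 0:
                    best = max(best, pair_value(i, j, k))
            dp[i][j] = best

    # Traceback of one optimal structure.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if math.isclose(dp[i][j], dp[i + 1][j]):
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if scores[i][k] > 0 and math.isclose(dp[i][j], pair_value(i, j, k)):
                pairs.append((i, k))
                stack.append((i + 1, k - 1))
                if k + 1 <= j:
                    stack.append((k + 1, j))
                break
    return (dp[0][n - 1] if n else 0.0), sorted(pairs)
```

For example, with a 10-position score matrix whose only positive entries are 1.0 for (0, 9), 0.5 for (1, 8) and 0.2 for the crossing pair (2, 9), the function returns a total of 1.5 with pairs [(0, 9), (1, 8)].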
ABSTRACT: The maximum entropy principle (MEP) is a very useful working hypothesis in a wide variety of inference problems, ranging from biological to engineering tasks. To better understand the reasons for the success of MEP, we propose a statistical-mechanical formulation to treat the space of probability distributions constrained by the measurements of (experimental) observables. In this paper, we first review the results of a detailed analysis of the simplest case of randomly chosen observables. In addition, we investigate by numerical and analytical means the case of smooth observables, which is of practical relevance. Our preliminary results are presented and discussed with respect to the efficiency of the MEP.
Article · Sep 2015 · Journal of Physics Conference Series
ABSTRACT: The spontaneous transitions between D-dimensional spatial maps in an attractor neural network are studied. Two scenarios for the transition from one map to another are found, depending on the level of noise: (1) through a mixed state, partly localized in both maps around positions where the maps are most similar; (2) through a weakly localized state in one of the two maps, followed by a condensation in the arrival map. Our predictions are confirmed by numerical simulations and qualitatively compared to recent recordings of hippocampal place cells during quick-environment-changing experiments in rats.
ABSTRACT: The mean-field (MF) approximation offers a simple, fast way to infer direct interactions between elements in a network of correlated variables, a common, computationally challenging problem with practical applications in fields ranging from physics and biology to the social sciences. However, MF methods achieve their best performance with strong regularization, well beyond Bayesian expectations, an empirical fact that is poorly understood. In this work, we study the influence of pseudocount and $L_2$-norm regularization schemes on the quality of inferred Ising or Potts interaction networks from correlation data within the MF approximation. We argue, based on the analysis of small systems, that the optimal value of the regularization strength remains finite even if the sampling noise tends to zero, in order to correct for systematic biases introduced by the MF approximation. Our claim is corroborated by extensive numerical studies of diverse model systems and by the analytical study of the $m$-component spin model for large but finite $m$. Additionally, we find that pseudocount regularization is robust against sampling noise and often outperforms $L_2$-norm regularization, particularly when the underlying network of interactions is strongly heterogeneous. Much better performance is generally obtained for the Ising model than for the Potts model, for which only couplings incoming onto medium-frequency symbols are reliably inferred.
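A minimal sketch of mean-field inference with pseudocount regularization, restricted to binary (Ising, ±1) variables: the couplings are read off the inverse of the regularized connected-correlation matrix, J = -C^{-1}. The function name and the way the pseudocount enters (a uniform mixing of the empirical single- and two-site frequencies) are assumptions of this sketch, not the paper's code; the Potts case and the $L_2$ scheme are not covered.

```python
import numpy as np


def mean_field_couplings(spins, pseudocount):
    """Naive mean-field Ising couplings J = -C^{-1} with pseudocount regularization.

    spins: (B, N) array of +/-1 samples. Mixing the empirical frequencies with
    uniform ones (weight pseudocount) rescales <s_i> and <s_i s_j> (i != j)
    by (1 - pseudocount) for +/-1 spins.
    """
    spins = np.asarray(spins, dtype=float)
    lam = pseudocount
    m = (1.0 - lam) * spins.mean(axis=0)
    pair = (1.0 - lam) * (spins.T @ spins) / spins.shape[0]
    np.fill_diagonal(pair, 1.0)          # s_i**2 = 1 exactly
    cov = pair - np.outer(m, m)          # regularized connected correlations
    J = -np.linalg.inv(cov)
    np.fill_diagonal(J, 0.0)             # self-couplings are not interactions
    return J
```

Increasing the pseudocount pushes the correlation matrix towards the identity, which shrinks the inferred couplings; this is the knob whose optimal value the abstract discusses.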
ABSTRACT: We consider the problem of learning a target probability distribution over a set of $N$ binary variables from the knowledge of the expectation values (under this target distribution) of $M$ observables, drawn uniformly at random. The space of all probability distributions compatible with these $M$ expectation values within some fixed accuracy, called the version space, is studied. We introduce a biased measure over the version space, which gives a boost increasing exponentially with the entropy of the distributions and with an arbitrary inverse 'temperature' $\Gamma$. The choice of $\Gamma$ allows us to interpolate smoothly between the unbiased measure over all distributions in the version space ($\Gamma=0$) and the pointwise measure concentrated at the maximum entropy distribution ($\Gamma \to \infty$). Using the replica method, we compute the volume of the version space and other quantities of interest, such as the distance $R$ between the target distribution and the center-of-mass distribution over the version space, as functions of $\alpha=(\log M)/N$ and $\Gamma$ for large $N$. Phase transitions at critical values of $\alpha$ are found, corresponding to qualitative improvements in the learning of the target distribution and to a decrease of the distance $R$. However, for fixed $\alpha$, the distance $R$ does not vary with $\Gamma$, which means that the maximum entropy distribution is no closer to the target distribution than any other distribution compatible with the observable values. Our results are confirmed by Monte Carlo sampling of the version space for small system sizes.
Article · Mar 2015 · Journal of Statistical Physics
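The version-space picture can be explored on a toy scale by computing the maximum entropy distribution directly. The sketch below assumes N = 4 binary variables (K = 16 configurations), M = 3 observables taking random ±1 values on each configuration, and a randomly drawn target distribution; it fits the maximum entropy distribution by gradient ascent on the dual problem. It does not reproduce the replica calculation or the biased measure with inverse temperature $\Gamma$, and the function names are hypothetical.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def maxent_fit(obs, targets, lr=0.2, n_iter=20000):
    """Maximum-entropy distribution over K states matching M expectation values.

    obs: (M, K) array, obs[m, k] = value of observable m on configuration k.
    p(k) is proportional to exp(sum_m lam_m * obs[m, k]); the multipliers lam
    follow gradient ascent on the concave dual: lam += lr * (targets - <obs>_p).
    """
    lam = np.zeros(obs.shape[0])
    for _ in range(n_iter):
        logits = lam @ obs
        p = np.exp(logits - logits.max())   # shift for numerical stability
        p /= p.sum()
        lam += lr * (targets - obs @ p)
    logits = lam @ obs
    p = np.exp(logits - logits.max())
    return p / p.sum()


# Toy setting (assumed): N = 4 binary variables, M = 3 random +/-1 observables.
rng = np.random.default_rng(1)
N, M = 4, 3
obs = rng.choice([-1.0, 1.0], size=(M, 2 ** N))
target = rng.dirichlet(np.ones(2 ** N))
p_me = maxent_fit(obs, obs @ target)
```

The target distribution itself lies in the version space, so the fitted distribution has at least its entropy; this says nothing, however, about which distribution is closer to the target, the question the distance $R$ addresses.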
ABSTRACT: This article is about resettled Afghan Hazaras in Australia, many of whom are currently undergoing a complex process of transition (from transience into a more stable position) for the first time in their lives. We show how, despite their permanent residency status, resettlement can be a challenging transitional experience. For these new migrants, we argue that developing a sense of belonging during the transition period is a critical rite of passage in the context of their political and cultural identity. A study of forced migrants such as these, moving out of one transient experience into another transitional period (albeit one that holds greater promise and permanence), poses a unique intellectual challenge. New understandings of the ongoing, unpredictable consequences of ‘transience’ for refugee communities are crucial as we discover what social support structures might be necessary to facilitate the process of transition into a distinctly new environment. The article is based on a doctoral ethnographic study of 30 resettled Afghan Hazaras living in the region of Dandenong in Melbourne, Australia. Here, we include four of these participants’ reflections on transition during different phases of their resettlement. These reflections were particularly revealing of the ways in which some migrants deal with change and acquire a sense of belonging to the community. Taking a historical view, and drawing on Bourdieu’s notion of symbolic social capital to highlight themes in individual experiences of belonging, we show how some new migrants adjust and learn to ‘embody’ their place in the new country. Symbolic social capital illuminates how people access and use resources such as social networks as tools of empowerment, reflecting how Hazara post-arrival experiences are tied to complex power relations in their everyday social interactions and in their life trajectories as people in transition. We learned that such tools can facilitate the formation of Hazara migrant identities and are closely tied to their civic community participation, English language development, and orientation in, as well as comprehension of, local cultural knowledge and place. This kind of theorization allows refugee, post-refugee and recent migrant narratives to be viewed not merely as static expressions of loss, trauma or damage, but rather as individual experiences of survival, adaptation and upward mobility.
ABSTRACT: Experiments indicate that unbinding rates of proteins from DNA can depend on the concentration of proteins in nearby solution. Here we present a theory of multi-step replacement of DNA-bound proteins by solution-phase proteins. For four different kinetic scenarios, we calculate the dependence of protein unbinding and replacement rates on solution protein concentration. We find (1) strong effects of progressive 'rezipping' of the solution-phase protein onto DNA sites liberated by 'unzipping' of the originally bound protein; (2) that a model in which solution-phase proteins bind non-specifically to DNA can describe experiments on exchanges between the non-specific DNA-binding proteins Fis-Fis and Fis-HU; and (3) that a specific-binding model describes experiments on the exchange of CueR proteins on specific binding sites.
Article · May 2014 · Physical Review Letters
ABSTRACT: We outline a theory to quantify the interplay of entropic and selective forces on nucleotide organization and apply it to the genomes of single-stranded RNA viruses. We quantify these forces as intensive variables that can easily be compared between sequences, outline a computationally efficient transfer-matrix method for their calculation, and apply this method to influenza and HIV viruses. We find that viruses alter their dinucleotide motif use under selective forces, with the forces on CpG dinucleotides in influenza growing stronger the longer it replicates in humans. For a subset of genes in the human genome, many involved in antiviral innate immunity, the forces acting on CpG dinucleotides are even greater than the forces observed in viruses, suggesting that both effects are in response to similar selective forces involving the innate immune system. We further find that the dynamics of entropic forces balancing selective forces can be used to predict how long it will take a virus to adapt to a new host, and that it would take H1N1 several centuries to adapt to humans from birds, with many of its synonymous substitutions typically devoted to the forcible removal of CpG dinucleotides. By examining the probability landscape of dinucleotide motifs, we predict where motifs are likely to appear using only a single force parameter and uncover the localization of UpU motifs in HIV. Essentially, we extend the natural language and concepts of statistical physics, such as entropy and conjugated forces, to understanding viral sequences and, more generally, constrained genome evolution.
Article · Mar 2014 · Proceedings of the National Academy of Sciences
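The transfer-matrix calculation mentioned above can be sketched for a single "force" x conjugate to the number of CpG dinucleotides. This sketch assumes a model in which a sequence of length L has weight proportional to the product of its observed nucleotide frequencies times exp(x × #CpG), and fits x by matching the expected CpG count to the observed one; the paper's exact model and parametrization may differ, and all names are hypothetical.

```python
import numpy as np

BASES = "ACGU"
C, G = BASES.index("C"), BASES.index("G")


def cpg_count(seq):
    """Number of CpG dinucleotides in an RNA sequence."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a == "C" and b == "G")


def log_partition(x, freqs, length):
    """log Z(x) = log sum over sequences of prod_i f(s_i) * exp(x * #CpG).

    Transfer matrix T[a, b] = f(b) * exp(x * [a == C and b == G]); the running
    vector is renormalized at each step to avoid overflow.
    """
    T = np.tile(freqs, (4, 1))
    T[C, G] *= np.exp(x)
    v = np.array(freqs, dtype=float)
    log_z = np.log(v.sum())
    v /= v.sum()
    for _ in range(length - 1):
        v = v @ T
        s = v.sum()
        log_z += np.log(s)
        v /= s
    return log_z


def expected_cpg(x, freqs, length, h=1e-5):
    # <#CpG> = d log Z / dx, evaluated by a central finite difference.
    return (log_partition(x + h, freqs, length)
            - log_partition(x - h, freqs, length)) / (2 * h)


def cpg_force(seq, lo=-10.0, hi=10.0, tol=1e-8):
    """Force x whose expected CpG count matches the observed one (bisection).

    <#CpG> increases monotonically with x, since its derivative is a variance.
    """
    freqs = np.array([seq.count(b) for b in BASES], dtype=float) / len(seq)
    target = cpg_count(seq)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_cpg(mid, freqs, len(seq)) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With uniform frequencies and L = 100, the expected CpG count at x = 0 is 99/16 ≈ 6.19; a sequence with more CpGs than that gets a positive force and one that avoids CpG gets a negative force (for a sequence with no CpG at all, the fit runs to the lower bracket `lo`, since no finite force gives zero expected CpGs).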
ABSTRACT: Two methods for reconstructing the free-energy landscape of a DNA molecule from the knowledge of the equilibrium unzipping force versus extension signal are introduced: a simple and fast procedure, based on a parametric representation of the experimental force signal, and a maximum-likelihood inference of coarse-grained free-energy parameters. In addition, we propose a force alignment procedure to correct for the drift in the experimental measure of the opening position, a major source of error. For unzipping data obtained by Huguet et al., the reconstructed basepair (bp) free energies agree with the running average of the true free energies on a 20-50 bp scale, depending on the region in the sequence. Features of the landscape at a smaller scale (5-10 bp) could be recovered in favorable regions at the beginning of the molecule. Based on the analysis of synthetic data corresponding to the 16S rDNA gene of bacteria, we show that our approach could be used to identify specific DNA sequences among thousands of homologous sequences in a database.
Article · Jan 2014 · Biophysical Journal
ABSTRACT: The dynamical evolution of complex systems is often intrinsically stochastic and subject to external random forces. In order to study the resulting variability in dynamics, it is essential to make measurements on replicate systems and to separate arbitrary variation of the average dynamics of these replicates from fluctuations around the average dynamics. Here we do so for population time-series data from replicate ecosystems sharing a common average dynamics or common trend. We explain how model parameters, including the effective interactions between species and dynamical noise, can be estimated from the data and how replication reduces errors in these estimates. For this, it is essential that the model can fit a variety of average dynamics. We then show how one can judge the quality of a model, compare alternate models, and determine which combinations of parameters are poorly determined by the data. In addition, we show how replicate population dynamics experiments could be designed to optimize the acquired information of interest about the systems. Our approach is illustrated on a set of time series gathered from replicate microbial closed ecosystems.
Article · Dec 2013 · Physical Review E
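To illustrate how pooling replicates constrains interaction and noise estimates, here is a least-squares fit of a first-order multivariate autoregressive model, x(t+1) = b + B x(t) + noise, using all transitions from all replicates at once. This is a simplification: it assumes a time-independent average dynamics and does not separate a common trend from fluctuations as the abstract describes; the model form and the function name are assumptions.

```python
import numpy as np


def fit_mar1(x):
    """Least-squares fit of x[t+1] = b + B @ x[t] + noise, pooled over replicates.

    x: array of shape (R, T, S): R replicate time series of length T for S
    species (e.g. log-abundances). Returns (b, B, noise_cov), where B[i, j]
    is the effect of species j at time t on species i at time t + 1.
    """
    R, T, S = x.shape
    X = x[:, :-1, :].reshape(-1, S)       # states at time t, all replicates
    Y = x[:, 1:, :].reshape(-1, S)        # matching states at time t + 1
    design = np.hstack([np.ones((X.shape[0], 1)), X])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    b, B = coef[0], coef[1:].T
    resid = Y - design @ coef
    # Unbiased residual covariance: an estimate of the dynamical noise.
    noise_cov = resid.T @ resid / (resid.shape[0] - design.shape[1])
    return b, B, noise_cov
```

Each additional replicate contributes T - 1 more transitions to the regression, which shrinks the standard errors of the estimated entries of B and of the noise covariance.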
ABSTRACT: We consider the Hopfield-Potts model for the covariation between residues in protein families recently introduced in Cocco, Monasson, Weigt (2013). The patterns of the model are inferred from the data within a new gauge, more symmetric in the residues. We compute the statistical error bars on the pattern components. Results are illustrated on real data for a response regulator receiver domain (Pfam ID PF00072) family.
Article · Nov 2013 · Journal of Physics Conference Series
ABSTRACT: Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant 'patterns' of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated on a few sites, which we find to be in close contact in the three-dimensional protein fold.
Article · Aug 2013 · PLoS Computational Biology
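The construction of patterns from eigenmodes can be sketched for binary (±1) variables, a simplification of the Potts case treated in the paper. Under this assumption, each eigenpair (λ, v) of the Pearson correlation matrix gives a pattern ξ = sqrt(|1 - 1/λ|) v, rescaled by the standard deviations; patterns with λ > 1 add to the couplings and those with λ < 1 subtract from them. The function name and the choice of which modes to keep are illustrative.

```python
import numpy as np


def hopfield_couplings(spins, n_top, n_bottom):
    """Couplings built from the n_top largest and n_bottom smallest modes.

    spins: (B, N) array of +/-1 samples (a binary stand-in for the Potts case).
    Each eigenpair (lam, v) of the Pearson correlation matrix gives a pattern
    xi_i = sqrt(|1 - 1/lam|) * v_i / sigma_i; patterns with lam > 1 add
    xi xi^T to J, those with lam < 1 subtract it.
    """
    C = np.cov(spins, rowvar=False, bias=True)
    sigma = np.sqrt(np.diag(C))
    gamma = C / np.outer(sigma, sigma)              # Pearson correlation matrix
    lam, V = np.linalg.eigh(gamma)                  # ascending eigenvalues
    xi = np.sqrt(np.abs(1 - 1 / lam)) * V / sigma[:, None]   # column mu = pattern mu
    N = len(lam)
    keep = sorted(set(range(n_bottom)) | set(range(N - n_top, N)))
    J = np.zeros((N, N))
    for mu in keep:
        J += np.sign(1 - 1 / lam[mu]) * np.outer(xi[:, mu], xi[:, mu])
    np.fill_diagonal(J, 0.0)
    return J
```

Keeping every mode reproduces the off-diagonal mean-field couplings -C^{-1}; keeping only a few large- and small-eigenvalue modes gives a reduced-parameter approximation in the spirit of the abstract.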
ABSTRACT: We study the stable phases of an attractor neural network model, with binary units, for hippocampal place cells encoding one-dimensional (1D) or 2D spatial maps or environments. Different maps correspond to random allocations (permutations) of the place fields. Based on replica calculations, we show that, below critical levels for the noise in the neural response and for the number of environments, the network activity is spatially localized in one environment. For high noise and loads, the network activity extends over space, either uniformly or with spatial heterogeneities due to the crosstalk between the maps, and memory of environments is lost. Remarkably, the spatially localized regime is very robust against neural noise until the noise reaches its critical level. Numerical simulations are in excellent quantitative agreement with our theoretical predictions.
Article · Jun 2013 · Physical Review E
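The coupling structure of such a model can be sketched under assumptions not spelled out in the abstract: N binary units, place-field centres evenly spaced on a 1D ring of unit length, each environment a random permutation of those centres, and a step-function kernel that adds 1/N to J_ij for every map in which cells i and j have centres within a distance d_c. The sketch builds only the couplings; the network dynamics, noise and activity constraint are omitted, and the function name is hypothetical.

```python
import numpy as np


def place_cell_couplings(n_cells, n_maps, d_c, rng):
    """Couplings for binary place cells storing n_maps 1D environments.

    In each map, place-field centres are a random permutation of the n_cells
    evenly spaced points of the ring [0, 1); two cells contribute 1/n_cells
    to their coupling in every map where their centres lie within d_c.
    """
    J = np.zeros((n_cells, n_cells))
    for _ in range(n_maps):
        pos = rng.permutation(n_cells) / n_cells    # random allocation of place fields
        diff = np.abs(pos[:, None] - pos[None, :])
        dist = np.minimum(diff, 1.0 - diff)         # periodic distance on the ring
        J += (dist <= d_c).astype(float)
    np.fill_diagonal(J, 0.0)
    return J / n_cells
```

With a single map, 100 cells and d_c = 0.055, every cell is coupled to its 10 nearest neighbours on the ring; additional maps superimpose unrelated neighbourhoods, which is the crosstalk between maps mentioned in the abstract.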