
Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein



The distribution of fitness effects of protein mutations is still unknown. Of particular interest is whether accumulating deleterious mutations interact, and how the resulting epistatic effects shape the protein's fitness landscape. Here we apply a model system in which bacterial fitness correlates with the enzymatic activity of TEM-1 β-lactamase (antibiotic degradation). Subjecting TEM-1 to random mutational drift and purifying selection (to purge deleterious mutations) produced changes in its fitness landscape indicative of negative epistasis; that is, the combined deleterious effects of mutations were, on average, larger than expected from the multiplication of their individual effects. As observed in computational systems, negative epistasis was tightly associated with higher tolerance to mutations (robustness). Thus, under a low selection pressure, a large fraction of mutations was initially tolerated (high robustness), but as mutations accumulated, their fitness toll increased, resulting in the observed negative epistasis. These findings, supported by FoldX stability computations of the mutational effects, prompt a new model in which the mutational robustness (or neutrality) observed in proteins, and other biological systems, is due primarily to a stability margin, or threshold, that buffers the deleterious physico-chemical effects of mutations on fitness. Threshold robustness is inherently epistatic: once the stability threshold is exhausted, the deleterious effects of mutations become fully pronounced, thereby making proteins far less robust than generally assumed.
Shimon Bershtein, Michal Segal, Roy Bekerman, Nobuhiko Tokuriki & Dan S. Tawfik
Interactions between mutations have been studied in a number of fields, although the nomenclature differs: geneticists term interactions within, or between, genes ‘epistasis’, whereas protein biophysicists use ‘double mutant cycles’ to describe intragenic interactions. Because the ultimate impact of mutations depends both on the effect of each mutation on fitness and on the interactions between accumulating mutations, insights from these different disciplines can nevertheless be combined in one model.
In the simplest case, if deleterious mutations do not interact, then under no selection, fitness (or the stability, or activity, of a protein) should decline exponentially:

$$W = \exp(-\alpha n) \tag{1}$$

where n is the number of mutations and α is the exponential decline parameter. If, however, deleterious mutations interact so that their combined impact on fitness is greater than expected from multiplying their individual effects (or more than additive in terms of log W, or ΔΔG for protein stability and function), the decline in fitness accelerates as mutations accumulate, giving rise to ‘negative epistasis’:

$$W = \exp(-\alpha n - \beta n^{2}) \tag{2}$$

where α reflects the fraction of multiplicative deleterious mutations and β is the epistasis parameter.
Because neither of these crucial factors (α, β) is quantitatively understood, particularly at the level of a single protein, we set up an experimental system that measured the fitness landscape of a protein subjected to a lengthy random drift (up to 20 mutations per gene). We derived the exponential decline and epistasis parameters and examined their impact on the rate and dynamics of mutation accumulation.
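To make the difference between equations (1) and (2) concrete, the short Python sketch below computes the drop in ln W caused by each additional mutation. Under equation (1) this marginal cost is constant (α); under equation (2) it equals α + β(2n + 1), so it grows with the mutational load whenever β > 0. The parameter values are those reported for Lib0 at 12.5 µg ml⁻¹ ampicillin in the Fig. 1 caption; the function names are illustrative and not taken from the paper.

```python
import math


def fitness_eq1(n: float, alpha: float) -> float:
    """Equation (1): W = exp(-alpha * n); mutations act multiplicatively."""
    return math.exp(-alpha * n)


def fitness_eq2(n: float, alpha: float, beta: float) -> float:
    """Equation (2): W = exp(-alpha * n - beta * n**2); beta > 0 means negative epistasis."""
    return math.exp(-alpha * n - beta * n ** 2)


def marginal_log_cost(n: int, alpha: float, beta: float) -> float:
    """Decrease in ln W when one more mutation is added to a gene already carrying n."""
    return math.log(fitness_eq2(n, alpha, beta)) - math.log(fitness_eq2(n + 1, alpha, beta))


if __name__ == "__main__":
    # Parameters from the Fig. 1 caption (Lib0, 12.5 ug/ml ampicillin).
    alpha, beta = 0.072, 0.009
    for n in (0, 5, 10, 20):
        print(f"mutation {n + 1:2d}: ln W drops by {marginal_log_cost(n, alpha, beta):.3f}")
```

With these values the first mutation lowers ln W by about 0.081, whereas the 21st lowers it by about 0.441; this growing per-mutation toll is the accelerating decline that equation (2) describes.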
As our model protein we chose TEM-1 β-lactamase, an enzyme that degrades the antibiotic ampicillin and thereby confers ampicillin resistance on Gram-negative bacteria such as Escherichia coli. The fitness of a TEM-1 variant is directly correlated with the maximal concentration of ampicillin that E. coli carrying this gene variant can tolerate. At the level of a population, fitness (W) refers to the fraction of variants that confer resistance at a given concentration of ampicillin. By determining the fraction of viable variants over the entire range of ampicillin concentrations, a fitness landscape was obtained (Supplementary Fig. 1).
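The population-level fitness defined above amounts to a simple counting rule. The sketch below assumes that each clone is summarized by the highest ampicillin concentration at which it still grew (the `max_tolerated` list is hypothetical, not data from the paper) and returns W for each plated concentration; sweeping the concentration gives the fitness landscape.

```python
from typing import Dict, Sequence


def fitness_landscape(max_tolerated: Sequence[float],
                      concentrations: Sequence[float]) -> Dict[float, float]:
    """W(c) = fraction of variants whose maximal tolerated ampicillin concentration is >= c."""
    if not max_tolerated:
        raise ValueError("at least one variant is required")
    total = len(max_tolerated)
    return {c: sum(1 for m in max_tolerated if m >= c) / total for c in concentrations}


if __name__ == "__main__":
    # Hypothetical per-clone values in ug/ml, for illustration only.
    clones = [2500, 2500, 1000, 250, 12.5, 0]
    for c, w in fitness_landscape(clones, [0, 12.5, 250, 1000, 2500]).items():
        print(f"{c:>7} ug/ml: W = {w:.2f}")
```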
The TEM-1 gene was cloned into a plasmid (as it occurs in nature) under its endogenous promoter. Recloning after each round of mutagenesis confined the mutational drift to the open reading frame of TEM-1. Our in vitro random mutagenesis protocol was optimized for high reproducibility and was calibrated to obtain, on average, two mutations per gene per round of mutagenesis. We maintained three populations of randomly drifting TEM-1 genes: one population under no selection (Lib0), and the other two under purifying selection at ‘high’ and ‘low’ stringencies. Each population, or plasmid library, was separately mutated, ligated into an empty vector and transformed into E. coli host cells; the two selected libraries then underwent purifying selection at either ‘high’ selection pressure (250 µg ml⁻¹ ampicillin; Lib250; Supplementary Fig. 2) or ‘low’ selection pressure (12.5 µg ml⁻¹ ampicillin; Lib12.5). After growth on selection plates, plasmid DNA was extracted from the surviving E. coli colonies, and the TEM-1 genes were subjected to the next round of mutagenesis. Altogether, ten successive rounds of mutagenesis and purifying selection were performed. Loss of diversity was less than 50% per round, and a diversity of at least 10 variants per library was maintained throughout.
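The calibration of two mutations per gene per round can be put in rough numerical terms. Assuming, for illustration only, that the number of mutations a gene acquires in one round is Poisson-distributed with mean 2 (the text states the mean, not the distribution), about 13.5% of genes would pass through a round unmutated, and under no selection the load would grow by about two mutations per round, i.e. about 20 after ten rounds. That is close to the 20.7 mutations per gene measured in Lib0 at round 10 (Table 1).

```python
import math

MEAN_PER_ROUND = 2.0  # calibrated average stated in the text


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k mutations if counts are Poisson(lam) (an assumption)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_load(rounds: int, mean_per_round: float = MEAN_PER_ROUND) -> float:
    """Expected mutations per gene after `rounds` rounds with no selection.

    Sums of independent Poisson counts are Poisson, so the means simply add.
    """
    return mean_per_round * rounds


if __name__ == "__main__":
    print(f"P(no mutation in one round) = {poisson_pmf(0, MEAN_PER_ROUND):.3f}")
    print(f"expected load after 10 rounds = {expected_load(10):.1f}")
```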
Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel.
Vol 444
14 December 2006

As expected, a rapid fitness decline was observed in Lib0 (no selection). The fitness of the selected populations (Lib12.5 and Lib250) remained unchanged below the threshold of selection and decreased above that threshold (Supplementary Fig. 3). Fitness values (W) of the randomly drifting population (Lib0) were plotted as a function of the mutational load n. These plots showed consistent and significant deviations from simple exponential decays. This was particularly true at low fitness levels; indeed, at less than 1,000 µg ml⁻¹ ampicillin, the data could only be fitted reliably to equation (2) (Fig. 1). The ‘net’ effect of deleterious mutations accumulating randomly in the TEM-1 gene was therefore found to be synergistic (β > 0; negative epistasis).
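How α and β might be extracted from such fitness-decay plots is sketched below. The fitting procedure is not stated in this section, so the estimator is an assumption: ordinary least squares on −ln W, with the curve pinned to W = 1 at n = 0 (no intercept). The demo uses noise-free synthetic points (the Lib0 loads from Table 1, with fitness generated from equation (2) using the Fig. 1 parameters), so equation (2) recovers α = 0.072 and β = 0.009, whereas forcing equation (1) onto the same points inflates α, much as the broken line in Fig. 1 does.

```python
import math
from typing import List, Sequence, Tuple


def _neg_log(w: Sequence[float]) -> List[float]:
    if any(wi <= 0 for wi in w):
        raise ValueError("fitness values must be positive to take logarithms")
    return [-math.log(wi) for wi in w]


def fit_equation_1(n: Sequence[float], w: Sequence[float]) -> float:
    """Least-squares alpha for -ln W = alpha * n (curve pinned to W = 1 at n = 0)."""
    y = _neg_log(w)
    return sum(ni * yi for ni, yi in zip(n, y)) / sum(ni * ni for ni in n)


def fit_equation_2(n: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (alpha, beta) for -ln W = alpha * n + beta * n**2.

    Solves the 2x2 normal equations [[S2, S3], [S3, S4]] @ [alpha, beta] = [T1, T2],
    where Sk = sum(n**k), T1 = sum(n * y) and T2 = sum(n**2 * y).
    """
    y = _neg_log(w)
    s2 = sum(ni ** 2 for ni in n)
    s3 = sum(ni ** 3 for ni in n)
    s4 = sum(ni ** 4 for ni in n)
    t1 = sum(ni * yi for ni, yi in zip(n, y))
    t2 = sum(ni ** 2 * yi for ni, yi in zip(n, y))
    det = s2 * s4 - s3 * s3
    if det == 0:
        raise ValueError("need at least two distinct non-zero mutational loads")
    alpha = (t1 * s4 - s3 * t2) / det  # Cramer's rule
    beta = (s2 * t2 - s3 * t1) / det
    return alpha, beta


if __name__ == "__main__":
    # Synthetic, noise-free data: loads from Table 1 (Lib0), W generated from
    # equation (2) with the Fig. 1 parameters. These are not measured W values.
    loads = [1.6, 3.9, 5.0, 9.5, 20.7]
    fitness = [math.exp(-0.072 * x - 0.009 * x ** 2) for x in loads]
    print("equation (1): alpha =", round(fit_equation_1(loads, fitness), 4))
    print("equation (2): alpha, beta =", [round(v, 4) for v in fit_equation_2(loads, fitness)])
```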
The α and β parameters deduced from fitness decay plots for all ampicillin concentrations (Supplementary Fig. 4) are plotted in Fig. 2a. The exponential decay parameter of TEM-1 (α) increased almost linearly with fitness level. Extrapolating α to zero ampicillin indicated that the fraction of TEM-1 mutations that are unconditionally lethal is about 7% (Fig. 2a), corresponding to mutations that severely undermine the stability of TEM-1 or abolish its catalytic activity. The degree of negative epistasis was tightly correlated with fitness levels; a steeper decline in fitness (larger α values) was accompanied by weaker epistasis (β/α) (Fig. 2b). It therefore seems that when high fitness levels are maintained, the fraction of lethal mutations is also high (large α, low robustness). However, the remaining fraction of mutations is largely neutral, and their epistatic potential is minimal (β/α ≈ 0). Conversely, under low fitness levels a much larger fraction of mutations is tolerated (small α, high robustness), but these exhibit large negative epistatic effects (β/α ≫ 0).
In terms of protein stability and function (ΔΔG), the negative epistasis observed here implies that TEM-1 mutations, on average, interact in a more-than-additive manner. However, this result contradicts our current knowledge of proteins. Double mutant cycles are a commonly used tool to dissect proteins: the effects of different mutations on various physico-chemical properties (thermodynamic stability, binding affinity or enzymatic rates) are measured on their own and together. They therefore provide a measure of epistasis. Numerous cycles performed on many different proteins indicate that the physico-chemical effects of deleterious mutations (in terms of ΔΔG, or log W) are mostly additive, or less than additive for interacting residues. Effects that are more than additive are rare. We therefore surmise that the negative epistasis observed here is the outcome of a non-additive decline in fitness in response to additive physico-chemical changes (see also ref. 2).
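The additivity test performed by a double mutant cycle can be written as a coupling energy, ΔΔG_int = ΔΔG_AB − (ΔΔG_A + ΔΔG_B), using this paper's convention that a larger ΔΔG means stronger destabilization. The sketch below classifies a single cycle; the 0.5 kcal mol⁻¹ tolerance and the example values are hypothetical choices for illustration, not thresholds from the paper.

```python
def coupling_energy(ddg_a: float, ddg_b: float, ddg_ab: float) -> float:
    """Deviation of the double mutant from additivity (kcal/mol; positive ddG = destabilizing)."""
    return ddg_ab - (ddg_a + ddg_b)


def classify_cycle(ddg_a: float, ddg_b: float, ddg_ab: float, tol: float = 0.5) -> str:
    """Label a double mutant cycle; `tol` is a hypothetical noise margin in kcal/mol."""
    coupling = coupling_energy(ddg_a, ddg_b, ddg_ab)
    if coupling > tol:
        return "more than additive"
    if coupling < -tol:
        return "less than additive"
    return "additive"


if __name__ == "__main__":
    # Hypothetical single and double mutant ddG values (kcal/mol).
    print(classify_cycle(1.0, 1.5, 2.6))
    print(classify_cycle(1.0, 1.5, 1.2))
```

In these terms, the passage's point is that most measured cycles fall into the "additive" or "less than additive" categories, whereas the TEM-1 fitness data behave as if mutations combined in a more-than-additive way.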
To examine this hypothesis, we assumed that most deleterious mutations had undermined stability, thereby reducing the levels of soluble, active protein. We then applied the FoldX algorithm to predict the stability changes induced by mutations in the drifting TEM-1 populations. FoldX analysis indicated that the destabilizing mutations frequently observed in the unselected library Lib0 were purged by the purifying selections (Fig. 3a). Initially (as indicated by sampling of the fifth round), mutations with considerable destabilizing effects (ΔΔG ≤ 6.5 kcal mol⁻¹) could accumulate in Lib12.5, whereas in Lib250 only mutations exhibiting ΔΔG ≤ 3.5 kcal mol⁻¹ were tolerated (a higher ΔΔG value corresponds to a more destabilizing mutation). However, by the tenth round, only weakly destabilizing mutations (ΔΔG ≤ 3.5 kcal mol⁻¹) accumulated in both populations (Fig. 3b).
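The binning behind Fig. 3a,b can be sketched as follows, assuming the FoldX ΔΔG predictions (kcal mol⁻¹) for one library are already available as a list. Only destabilizing mutations (ΔΔG > 0) are kept and counted in half-open 1 kcal mol⁻¹ bins; the bin edges and the `share_above` helper are illustrative assumptions. `share_above` reports what fraction of a library's mutations exceed a cutoff such as the 3.5 kcal mol⁻¹ value discussed above.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def bin_frequencies(ddg_values: Sequence[float], bin_width: float = 1.0) -> Dict[float, float]:
    """Percentage of destabilizing mutations (ddG > 0) in each bin [k*w, (k+1)*w)."""
    destabilizing = [d for d in ddg_values if d > 0]
    if not destabilizing:
        return {}
    counts = Counter(math.floor(d / bin_width) for d in destabilizing)
    total = len(destabilizing)
    return {k * bin_width: 100.0 * c / total for k, c in sorted(counts.items())}


def share_above(ddg_values: Sequence[float], cutoff: float) -> float:
    """Fraction of all listed mutations whose predicted ddG exceeds `cutoff`."""
    if not ddg_values:
        return 0.0
    return sum(1 for d in ddg_values if d > cutoff) / len(ddg_values)


if __name__ == "__main__":
    # Hypothetical FoldX outputs for one library, in kcal/mol.
    sample = [0.3, 0.7, 1.2, 3.4, 4.0, -0.5]
    print(bin_frequencies(sample))
    print(share_above(sample, 3.5))
```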
Figure 1 | Negative epistasis underlies the random drift of TEM-1. The fitness decline of unselected library Lib0 at the highest fitness level (2,500 µg ml⁻¹ ampicillin; filled circles) is exponential and fits equation (1) with α = 0.371 (lower black line). At the lowest ampicillin concentration (12.5 µg ml⁻¹; open circles) the data fit equation (2) well, with α = 0.072 and β = 0.009 (upper black line), but fit equation (1) poorly (α = 0.145; broken line). Error bars represent s.d. for two to five independent measurements.

Figure 2 | The correlation between mutational robustness and negative epistasis. a, The decay parameter α (filled circles) and directional epistasis β/α (open circles) were extracted from the fitness decline fits of Lib0 for each ampicillin concentration (Fig. 1, and Supplementary Fig. 4). These parameters are presented as a function of ampicillin concentration, corresponding to the fitness levels in our system. The vertical dotted lines show the fitness levels for the purifying selections. b, The correlation between mutational robustness (higher α values indicate lower tolerance to mutations, and hence lower robustness) and epistasis. The same correlation was observed when an alternative measure of epistasis was applied (Supplementary Fig. 7).

The FoldX analysis therefore indicated the existence of a ‘neutral’ region in which changes in stability had only a mild effect on fitness and, notably, that the width of this region changed with fitness level. As more mutations accumulated, this threshold was exhausted, and the tolerance to mutations under both regimes became essentially the same (Fig. 3b). This analysis is also in line with the finding that the percentage of tolerated mutations within Lib12.5 was high at the beginning and decreased during subsequent rounds (Table 1 and Supplementary Fig. 5). By the fifth round of Lib12.5, an average of 3.6 non-synonymous mutations per gene had accumulated, whereas only 1.9 mutations were added over the subsequent five rounds. In contrast, Lib250 proceeded at a steady rate (2.2 non-synonymous mutations at the fifth round, and 4.6 at the tenth round). Indeed, the average numbers of mutations found in Lib12.5 and Lib250 by the tenth round of selection were nearly identical, despite a 20-fold difference in the stringency of selection.
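The slowdown in Lib12.5 is easier to see as per-round rates. The arithmetic below uses the non-synonymous averages from Table 1 and, as a simplifying assumption, spreads each five-round increment evenly across its rounds: Lib12.5 drops from about 0.72 to 0.38 new non-synonymous mutations per round, while Lib250 stays near 0.44–0.48, despite a 250/12.5 = 20-fold difference in selection stringency.

```python
# Average non-synonymous mutations per gene at rounds 5 and 10 (values from Table 1).
NONSYNONYMOUS = {
    "Lib12.5": {5: 3.6, 10: 5.5},
    "Lib250": {5: 2.2, 10: 4.6},
}


def per_round_rates(load: dict) -> tuple:
    """Average mutations gained per round in rounds 1-5 and in rounds 6-10."""
    early = load[5] / 5
    late = (load[10] - load[5]) / 5
    return early, late


if __name__ == "__main__":
    for library, load in NONSYNONYMOUS.items():
        early, late = per_round_rates(load)
        print(f"{library}: {early:.2f} per round early, {late:.2f} per round late")
    print("stringency ratio:", 250 / 12.5)
```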
It therefore seems that mutational robustness (or neutrality) should be described with two terms: ‘threshold’ and ‘gradient’ (Fig. 3c). A threshold delays the decline in fitness in the face of mutations. It is observed not because the protein is unaffected by mutations but because the decline in its thermodynamic stability is buffered and has no immediate effect on fitness. As mutations accumulate, however, the threshold is largely exhausted, and a gradient phase appears in which fitness declines in parallel with ΔΔG changes. Because the robustness observed under lower fitness levels is largely due to a wider threshold, it is also inevitably epistatic (the fitness toll of later mutations is higher than that of early ones). In fact, the manner in which α (robustness) and β (negative epistasis) correlate in our experiments (Fig. 2b) is in striking agreement with both theory and simulations. The origins of our observed negative epistasis, and its correlation with robustness, therefore suggest that ‘threshold robustness’ is a general phenomenon that goes well beyond TEM-1.
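The threshold and gradient picture of Fig. 3c can be instantiated with a standard two-state folding model. This is an illustrative assumption, not the authors' computation: fitness is taken to be proportional to the fraction of folded protein, 1/(1 + exp(−ΔG/RT)), where ΔG is the remaining stability margin in kcal mol⁻¹ (positive means stable; RT ≈ 0.593 kcal mol⁻¹ near 298 K), and every mutation removes the same ΔΔG, so stability falls linearly with n. Because ln W is then concave in n, the per-mutation fitness cost can only grow as mutations accumulate (negative epistasis), and a larger initial margin (a wider threshold) keeps early mutations nearly neutral for longer.

```python
import math
from typing import List

RT = 0.593  # kcal/mol at about 298 K


def fraction_folded(stability: float) -> float:
    """Two-state model: folded fraction for a stability margin (kcal/mol, positive = stable)."""
    return 1.0 / (1.0 + math.exp(-stability / RT))


def log_fitness(n: int, margin: float, ddg_per_mutation: float) -> float:
    """ln W after n mutations, assuming W is proportional to the folded fraction (wild type W = 1)."""
    return math.log(fraction_folded(margin - n * ddg_per_mutation) / fraction_folded(margin))


def marginal_costs(n_max: int, margin: float, ddg_per_mutation: float) -> List[float]:
    """Drop in ln W caused by mutation n + 1, for n = 0 .. n_max - 1."""
    return [log_fitness(n, margin, ddg_per_mutation) - log_fitness(n + 1, margin, ddg_per_mutation)
            for n in range(n_max)]


if __name__ == "__main__":
    # Hypothetical margins of 5 and 8 kcal/mol, with 1 kcal/mol lost per mutation.
    for margin in (5.0, 8.0):
        costs = marginal_costs(12, margin, 1.0)
        print(margin, [round(c, 3) for c in costs])
```

In this toy model the early costs are close to zero (the threshold) and later costs approach ΔΔG/RT per mutation (the gradient), which mirrors the two regimes described above without implying that TEM-1 follows these exact numbers.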
Two conclusions can be derived from our results and model. First, proteins may not be as robust as is generally assumed. So far, experiments have only tested the response to several random mutations and have reported an exponential decline in activity with no epistasis. The tolerance to mutations observed in these experiments is therefore likely to contain a major component of threshold robustness, especially as they were performed at low fitness levels. We, too, observed high levels of mutation tolerance under low fitness levels. Indeed, over the first five rounds of mutagenesis (70% of total mutations and 55% of non-synonymous mutations tolerated; Table 1), only 7% of mutations were unconditionally deleterious. These figures are in agreement with the common view that the vast majority of protein mutations are tolerated. Yet once the stability threshold is exhausted, tolerance to mutations under the gradient regime is markedly lower and fits the view that most mutations affect fitness.
Second, our model provides general insights into the origins of robustness and epistasis, and why the two are interlinked. Threshold robustness relates to the margin of excess stability and function that can be compromised with little or no immediate effect on fitness, and is therefore inherently epistatic. Indeed, a quantitative correlation between the threshold (in ΔΔG terms) and directional epistasis (β/α) has been observed in our system and will be described in future work. Threshold robustness would be the type of robustness derived from redundancy at the genome level (for example, duplicated genes), from stabilizing mutations, or from global suppressors in single genes (Fig. 3c). It has been shown, for example, that TEM-1 carrying a global suppressor mutation that increased stability by 2.7 kcal mol⁻¹ shows higher robustness relative to the wild type (α = 0.042 versus 0.152 for wild type, by equation (1)). However, when we applied our model (equation (2)), we also recorded a much higher epistasis (β/α = 0.598 versus 0.144; Supplementary Fig. 6).
Thus, theory and simulations have predicted a tight correlation between robustness and epistasis. Our work provides an experimental verification of this correlation and proposes a mechanism that accounts for it. Our model implies that any biological system that exhibits threshold robustness, or redundancy robustness, is inevitably epistatic. In such systems, mechanisms that purge potentially deleterious mutations, such as recombination (through sexual reproduction and other mechanisms), are of crucial importance, as they help to maintain this threshold. In this way, recombination, threshold robustness and negative epistasis may be interlinked, each being an inevitable by-product of the others.
Table 1 | Sequence data of the TEM-1 libraries

| Library | Round of mutagenesis | No. of variants (base pairs) | Total no. of mutations detected | Average no. of mutations per gene | Average no. of non-synonymous mutations per gene | Tolerated mutations, total (%) | Tolerated mutations, non-synonymous (%) |
|---|---|---|---|---|---|---|---|
| Lib0 | 1 | 10 (8,610) | 16 | 1.6 ± 1.4 | 1.11 ± 1.0 | | |
| Lib0 | 2 | 21 (18,081) | 81 | 3.9 ± 1.6 | 2.67 ± 1.1 | | |
| Lib0 | 3 | 5 (4,305) | 25 | 5.0 ± 0.7 | 3.46 ± 0.5 | | |
| Lib0 | 5 | 17 (14,637) | 161 | 9.5 ± 3.7 | 6.55 ± 2.6 | | |
| Lib0 | 10 | 14 (12,054) | 290 | 20.7 ± 2.8 | 14.3 ± 1.9 | | |
| Lib12.5 | 2 | 16 (13,776) | 52 | 3.3 ± 1.6 | 1.6 ± 1.2 | 84 | 61 |
| Lib12.5 | 5 | 39 (33,579) | 255 | 6.6 ± 3.0 | 3.6 ± 1.7 | 70 | 55 |
| Lib12.5 | 10 | 17 (14,637) | 171 | 10.7 ± 3.5 | 5.5 ± 2.2 | 52 | 39 |
| Lib250 | 2 | 14 (12,054) | 24 | 1.7 ± 1.2 | 1.3 ± 0.9 | 44 | 48 |
| Lib250 | 5 | 39 (33,579) | 193 | 5.0 ± 2.2 | 2.2 ± 1.5 | 52 | 33 |
| Lib250 | 10 | 18 (15,498) | 170 | 9.4 ± 3.0 | 4.6 ± 2.2 | 46 | 32 |

Where errors are shown, results are means ± s.d.
Figure 3 | The physico-chemical and fitness changes accompanying random drifts. a, b, The stability changes induced by 980 mutations identified in the three drifting populations were individually computed with FoldX. The calculated ΔΔG values were arranged in 1 kcal mol⁻¹ bins. Plotted are the frequencies of all destabilizing mutations found in the fifth round (a) and tenth round (b) for the unselected library (Lib0, black) and the two selected libraries (Lib12.5, blue; Lib250, red). c, Our model surmises that ΔΔG (physico-chemical changes relative to wild type, and in particular stability) declines linearly with the number of mutations (ΔΔG ∝ n). However, the decline in fitness W that accompanies these ΔΔG changes is nonlinear and comprises both a threshold (dashed two-headed arrow) that buffers the deleterious effects of mutations, and a gradient in which fitness declines concomitantly with ΔΔG changes. At high fitness levels (solid blue line) the threshold is relatively narrow. At low fitness levels (red line) the threshold widens, giving rise to higher epistasis. A wider threshold is also predicted for protein variants carrying stabilizing mutations or global suppressors (dotted line; Supplementary Fig. 6).
Detailed methods are provided in Supplementary Information.
Library construction and selection. The TEM-1 β-lactamase open reading frame was recloned (NcoI/NotI) under the control of its endogenous promoter into a modified pUC19 plasmid containing a chloramphenicol resistance gene. Random mutagenesis was performed by ‘wobble’-base polymerase chain reaction with TEM-1 plasmids as template. The protocol was optimized to achieve an average of two mutations per gene per generation, and a characteristic pattern of nucleotide exchange (Supplementary Table 2). Plasmid libraries were obtained by electroporation into E. coli, and more than 10 individual transformants were grown in Luria–Bertani medium in the presence of chloramphenicol. DNA extracted from these cultures was retransformed into XL-1 Blue E. coli cells and plated on agar plates supplemented with ampicillin (0, 12.5 or 250 µg ml⁻¹). Plasmid DNA from surviving colonies was extracted and served as a template for the subsequent round of mutagenesis.
Fitness measurements. Several hundred colonies from each library were randomly picked and grown in 96-well plates in the presence of chloramphenicol. The 96-well plates were replica-plated onto agar plates covering the entire range of ampicillin concentrations (0–2,500 µg ml⁻¹). Fitness values (W) were determined from the fraction of clones that survived each ampicillin concentration (two to five independent repetitions of this protocol were performed, and standard deviations are presented). The reproducibility of the mutagenesis protocol was verified by an independent repetition of one round of mutagenesis, which showed a variability within the deviation of the fitness measurements (Supplementary Table 1).
Computation of stability effects. FoldX is a structure-based algorithm that predicts the ΔΔG values of mutations with considerable fidelity (R = 0.83 for a data set of 1,030 mutations in 27 proteins). We applied the published procedure, including an adjustment of the calculated values. A correlation between the deleterious effect of mutations and their predicted ΔΔG values was also obtained (Supplementary Table 3).
Received 1 September; accepted 27 October 2006.
Published online 19 November 2006.
1. Pal, C., Papp, B. & Lercher, M. J. An integrated view of protein evolution. Nature Rev. Genet. 7, 337–348 (2006).
2. DePristo, M. A., Weinreich, D. M. & Hartl, D. L. Missense meanderings in sequence space: a biophysical view of protein evolution. Nature Rev. Genet. 6, 687 (2005).
3. Wilke, C. O. & Adami, C. Interaction between directional epistasis and average mutational effects. Proc. R. Soc. Lond. B 268, 1469–1474 (2001).
4. Azevedo, R. B., Lohaus, R., Srinivasan, S., Dang, K. K. & Burch, C. L. Sexual reproduction selects for robustness and negative epistasis in artificial gene networks. Nature 440, 87–90 (2006).
5. Elena, S. F., Carrasco, P., Daros, J. A. & Sanjuan, R. Mechanisms of genetic robustness in RNA viruses. EMBO Rep. 7, 168–173 (2006).
6. Guerois, R., Nielsen, J. E. & Serrano, L. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J. Mol. Biol. 320, 369–387 (2002).
7. Wolf, J. B., Brodie, E. D. & Wade, M. J. Epistasis and the Evolutionary Process (Oxford Univ. Press, Oxford, 2000).
8. Horovitz, A. Double-mutant cycles: a powerful tool for analyzing protein structure and function. Fold. Des. 1, R121–R126 (1996).
9. Charlesworth, B. Mutation–selection balance and the evolutionary advantage of sex and recombination. Genet. Res. 55, 199–221 (1990).
10. Bloom, J. D. et al. Thermodynamic prediction of protein neutrality. Proc. Natl Acad. Sci. USA 102, 606–611 (2005).
11. Wagner, G. P., Laubichler, M. D. & Bagheri-Chaichian, H. Genetic measurement theory of epistatic effects. Genetica 102/103, 569–580 (1998).
12. Shafikhani, S., Siegel, R. A., Ferrari, E. & Schellenberger, V. Generation of large libraries of random mutants in Bacillus subtilis by PCR-based plasmid multimerization. Biotechniques 23, 304–310 (1997).
13. Daugherty, P. S., Chen, G., Iverson, B. L. & Georgiou, G. Quantitative analysis of the effect of the mutation frequency on the affinity maturation of single chain Fv antibodies. Proc. Natl Acad. Sci. USA 97, 2029–2034 (2000).
14. Guo, H. H., Choe, J. & Loeb, L. A. Protein tolerance to random amino acid change. Proc. Natl Acad. Sci. USA 101, 9205–9210 (2004).
15. Bowie, J. U., Reidhaar-Olson, J. F., Lim, W. A. & Sauer, R. T. Deciphering the message in protein sequences: tolerance to amino acid substitutions. Science 247, 1310 (1990).
16. Sauer, U. H., San, D. P. & Matthews, B. W. Tolerance of T4 lysozyme to proline substitutions within the long interdomain α-helix illustrates the adaptability of proteins to potentially destabilizing lesions. J. Biol. Chem. 267, 2393
17. Huang, W., Petrosino, J., Hirsch, M., Shenkin, P. S. & Palzkill, T. Amino acid sequence determinants of β-lactamase structure and activity. J. Mol. Biol. 258, 703 (1996).
18. Wagner, A. Distributed robustness versus redundancy as causes of mutational robustness. BioEssays 27, 176–188 (2005).
19. Bloom, J. D., Labthavikul, S. T., Otey, C. R. & Arnold, F. H. Protein stability promotes evolvability. Proc. Natl Acad. Sci. USA 103, 5869–5874 (2006).
20. Kondrashov, A. S. Deleterious mutations and the evolution of sexual reproduction. Nature 336, 435–440 (1988).
21. Lynch, M., Burger, R., Butcher, D. & Gabriel, W. The mutational meltdown in asexual populations. J. Hered. 84, 339–344 (1993).
22. Kiel, C. & Serrano, L. The ubiquitin domain superfold: structure-based sequence alignments and characterization of binding epitopes. J. Mol. Biol. 355, 821
Supplementary Information is linked to the online version of the paper at
Acknowledgements We thank F. Kondrashov for insights and inspiration,
L. Serrano, U. Alon and G. Schreiber for contributions, and our Department,
Institute and the Feinberg Graduate School, for uncompromising support.
Author Contributions S.B. designed, performed and analysed the experiments.
M.S. and R.B. provided technical assistance. N.T. performed the FoldX
computations and helped devise the model. D.S.T. designed and supervised the
experiments, and devised the model. S.B. and D.S.T. wrote the manuscript.
Author Information Reprints and permissions information is available at The authors declare no competing financial interests.
Correspondence and requests for materials should be addressed to D.S.T.
Vol 444
14 December 2006
... The shape of fitness peaks and ridges, and their distribution in genotype space has implications for fundamental questions in evolution (de Visser and Krug, 2014) and practical applications (Sardanyés et al., 2008). Evolution starting at a sharp fitness peak is expected to proceed at a different pace than evolution on a flat one (Bershtein et al., 2006;Codoñer et al., 2006;de Visser et al., 2003;Draghi et al., 2010;Wagner, 2008). Furthermore, it has been suggested that flat fitness peaks, representing robust genotypes, may be evolutionarily preferable to sharp peaks, which represent fragile genotypes (Bershtein et al., 2006;de Visser et al., 2003;Draghi et al., 2010;Klug et al., 2019;Zheng et al., 2020). ...
... Evolution starting at a sharp fitness peak is expected to proceed at a different pace than evolution on a flat one (Bershtein et al., 2006;Codoñer et al., 2006;de Visser et al., 2003;Draghi et al., 2010;Wagner, 2008). Furthermore, it has been suggested that flat fitness peaks, representing robust genotypes, may be evolutionarily preferable to sharp peaks, which represent fragile genotypes (Bershtein et al., 2006;de Visser et al., 2003;Draghi et al., 2010;Klug et al., 2019;Zheng et al., 2020). However, how different shapes of fitness peaks may be distributed in genotype space has not been explored (Chan et al., 2017;Kemble et al., 2019). ...
... The online version of this article includes the following figure supplement(s) for figure 3: Flat fitness peaks correspond to mutationally robust proteins, those that are capable of withstanding multiple mutations without losing function, while sharp fitness peaks correspond to mutationally fragile ones. The observed differences in mutational robustness of different proteins may be explained by thermodynamic stability (Bershtein et al., 2006;Echave and Wilke, 2017;Gong et al., 2013;Kurahashi et al., 2018;Poelwijk et al., 2019;Sarkisyan et al., 2016). Therefore, we performed an array of assays aimed at the biophysical characterisation of the four wildtype proteins and an additional genotype, amacGFP:V12L, which differed from amacGFP by the V12L mutation that was extremely common in the amacGFP mutant library. ...
Full-text available
Studies of protein fitness landscapes reveal biophysical constraints guiding protein evolution and empower prediction of functional proteins. However, generalisation of these findings is limited due to scarceness of systematic data on fitness landscapes of proteins with a defined evolutionary relationship. We characterized the fitness peaks of four orthologous fluorescent proteins with a broad range of sequence divergence. While two of the four studied fitness peaks were sharp, the other two were considerably flatter, being almost entirely free of epistatic interactions. Mutationally robust proteins, characterized by a flat fitness peak, were not optimal templates for machine-learning-driven protein design - instead, predictions were more accurate for fragile proteins with epistatic landscapes. Our work paves insights for practical application of fitness landscape heterogeneity in protein engineering.
... Synergistic epistasis has been shown to be a common form in evolutionary trajectories [163], where neutral permissive mutations allow the introduction of other mutations that bring about new functions. In contrast, it has been suggested that antagonistic epistasis confers mutational robustness throughout evolution [164]. ...
Synthetic biology is a fast-evolving research field that combines biology and engineering principles to develop new biological systems for medical, pharmacological, and industrial applications. Synthetic biologists use iterative “design, build, test, and learn” cycles to efficiently engineer genetic systems that are reliable, reproducible, and predictable. Protein engineering by directed evolution can benefit from such a systematic engineering approach for various reasons. Learning can be carried out before starting, throughout or after finalizing a directed evolution project. Computational tools, bioinformatics, and scanning mutagenesis methods can be excellent starting points, while molecular dynamics simulations and other strategies can guide engineering efforts. Similarly, studying protein intermediates along evolutionary pathways offers fascinating insights into the molecular mechanisms shaped by evolution. The learning step of the cycle is not only crucial for proteins or enzymes that are not suitable for high-throughput screening or selection systems, but it is also valuable for any platform that can generate a large amount of data that can be aided by machine learning algorithms. The main challenge in protein engineering is to predict the effect of a single mutation on one functional parameter—to say nothing of several mutations on multiple parameters. This is largely due to nonadditive mutational interactions, known as epistatic effects—beneficial mutations present in a genetic background may not be beneficial in another genetic background. In this work, we provide an overview of experimental and computational strategies that can guide the user to learn protein function at different stages in a directed evolution project. We also discuss how epistatic effects can influence the success of directed evolution projects. 
Since machine learning is gaining momentum in protein engineering and the field is becoming more interdisciplinary thanks to collaboration between mathematicians, computational scientists, engineers, molecular biologists, and chemists, we provide a general workflow that familiarizes nonexperts with the basic concepts, dataset requirements, learning approaches, model capabilities and performance metrics of this intriguing area. Finally, we also provide some practical recommendations on how machine learning can harness epistatic effects for engineering proteins in an “outside-the-box” way.
... Finally, Gonzalez Somermeyer et al. found that the increased mutational sensitivity of avGFP and cgreGFP (and to a lesser degree ppluGFP2) was due to negative epistasis -that is, when an individual mutation is well tolerated, but has a negative effect on the protein's function when combined with other mutations (Bershtein et al., 2006;Domingo et al., 2019). The reduced fluorescence of amacGFP, however, could be ascribed almost entirely to additive effects, with each mutation incrementally making the protein less functional. ...
Using a neural network to predict how green fluorescent proteins respond to genetic mutations illuminates properties that could help design new proteins.
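To make the contrast between negative epistasis and additivity concrete, the sketch below uses a toy two-state stability model in the spirit of the stability-threshold idea described in the main article's abstract. It assumes that mutations change the folding free energy additively, that fitness is proportional to the fraction of folded protein, and that epistasis is measured on the log-fitness scale. All parameter values (wild-type ΔG, per-mutation ΔΔG, RT at about 298 K) are hypothetical, and this is not the analysis performed by Gonzalez Somermeyer et al.

```python
import math

RT = 0.593  # kcal/mol at about 298 K (assumed temperature)


def fraction_folded(dg_fold):
    """Two-state fraction folded; dg_fold < 0 means the folded state is favored."""
    return 1.0 / (1.0 + math.exp(dg_fold / RT))


def relative_fitness(dg_wt, ddgs):
    """Fitness assumed proportional to fraction folded, normalized to wild type.
    Stability effects (ddG > 0 = destabilizing) are assumed to be additive."""
    return fraction_folded(dg_wt + sum(ddgs)) / fraction_folded(dg_wt)


def epistasis(dg_wt, ddg_a, ddg_b):
    """Log-scale epistasis: a negative value means the double mutant is less fit
    than the product of the two single-mutant fitnesses predicts."""
    w_a = relative_fitness(dg_wt, [ddg_a])
    w_b = relative_fitness(dg_wt, [ddg_b])
    w_ab = relative_fitness(dg_wt, [ddg_a, ddg_b])
    return math.log(w_ab) - math.log(w_a) - math.log(w_b)


if __name__ == "__main__":
    dg_wt = -5.0  # hypothetical wild-type stability, kcal/mol
    ddg = 2.5     # hypothetical destabilization per mutation, kcal/mol
    print("single mutant:", round(relative_fitness(dg_wt, [ddg]), 3))       # buffered by the margin
    print("double mutant:", round(relative_fitness(dg_wt, [ddg, ddg]), 3))  # margin exhausted
    print("epistasis:", round(epistasis(dg_wt, ddg, ddg), 3))
```

Although the stability effects add up linearly, the sigmoidal mapping from stability to fitness makes the first mutation nearly neutral and the second strongly deleterious, which yields negative epistasis on the fitness scale.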
... Indeed, EstN7 DM can be considered a more generalist enzyme owing to its ability to hydrolyse substrates of different chain lengths. Stabilising or suppressor mutations, which allow functionally useful but structurally deleterious mutations to be productively expressed in a particular protein, play an important role in the natural and directed evolutionary process [39][40][41]. Here, we inadvertently discovered a suppressor mutation (N211A) in EstN7, which clearly shows the impact such mutations can have. ...
Cold active esterases have gained great interest in several industries. The recently determined structure of a family IV cold active esterase (EstN7) from Bacillus cohnii strain N1 was used to expand its substrate range and to probe its commercially valuable substrates. Database mining suggested that triacetin was a potential commercially valuable substrate for EstN7, which was subsequently proved experimentally with the final product being a single isomeric product, 1,2-glyceryl diacetate. Enzyme kinetics revealed that EstN7’s activity is restricted to C2 and C4 substrates due to a plug at the end of the acyl binding pocket that blocks access to a buried water-filled cavity. Residues M187, N211 and W206 were identified as key plug forming residues. N211A stabilised EstN7 allowing incorporation of the destabilising M187A mutation. The M187A-N211A double mutant had the broadest substrate range, capable of hydrolysing a C8 substrate. W206A did not appear to have any significant effect on substrate range either alone or when combined with the double mutant. Thus, the enzyme kinetics and engineering together with a recently determined structure of EstN7 provide new insights into substrate specificity and the role of acyl binding pocket plug residues in determining family IV esterase stability and substrate range.
Errors in DNA replication generate genetic mutations, while errors in transcription and translation lead to phenotypic mutations. Phenotypic mutations are orders of magnitude more frequent than genetic ones, yet they are less understood. Here, we review the types of phenotypic mutations, their quantification, and their role in protein evolution and disease. The diversity generated by phenotypic mutation can facilitate adaptive evolution. Indeed, phenotypic mutations, such as ribosomal frameshift and stop codon readthrough, sometimes serve to regulate protein expression and function. Phenotypic mutations have often been linked to fitness decrease and diseases. Thus, understanding the protein heterogeneity and phenotypic diversity caused by phenotypic mutations will advance our understanding of protein evolution and have implications for human health and diseases.
How do proteins evolve? How do changes in sequence mediate changes in protein structure, and in turn in function? This question has multiple angles, ranging from biochemistry and biophysics to evolutionary biology. This review provides a brief integrated view of some key mechanistic aspects of protein evolution. First, we explain how protein evolution is primarily driven by randomly acquired genetic mutations and selection for function, and how these mutations can even give rise to completely new folds. Then, we also comment on how phenotypic protein variability, including promiscuity, transcriptional and translational errors, may also accelerate this process, possibly via “plasticity‐first” mechanisms. Finally, we highlight open questions in the field of protein evolution, with respect to the emergence of more sophisticated protein systems such as protein complexes, pathways, and the emergence of pre‐LUCA enzymes.
In polyandrous internally fertilizing species, a multiply-mated female can use stored sperm from different males in a biased manner to fertilize her eggs. The female's ability to assess sperm quality and compatibility is essential for her reproductive success, and represents an important aspect of postcopulatory sexual selection. In Drosophila melanogaster, previous studies demonstrated that the female nervous system plays an active role in influencing progeny paternity proportion, and suggested a role for octopaminergic/tyraminergic Tdc2 neurons in this process. Here, we report that inhibiting Tdc2 neuronal activity causes females to produce a higher-than-normal proportion of first-male progeny. This difference is not due to differences in sperm storage or release, but instead is attributable to the suppression of the second-male sperm usage bias that normally occurs in control females. We further show that a subset of Tdc2 neurons innervating the female reproductive tract is largely responsible for the progeny proportion phenotype that is observed when Tdc2 neurons are inhibited globally. By contrast, overactivation of Tdc2 neurons does not further affect sperm storage and release or progeny proportion. These results suggest that octopaminergic/tyraminergic signaling allows a multiply-mated female to bias sperm usage, and identify a new role for the female nervous system in postcopulatory sexual selection.
A large number of beneficial substitutions can be obtained from a successful directed enzyme evolution campaign and/or (semi)rational design. Recombining some of these beneficial substitutions is expected to yield a much higher level of performance through synergistic effects. However, systematic recombination studies show that poorly performing variants are often obtained after recombining three to four individual beneficial substitutions, which limits protein engineers' ability to exploit nature's potential for generating better-performing enzymes. The Computer-assisted Recombination (CompassR) strategy allows identified beneficial substitutions to be recombined effectively and efficiently to generate active enzymes with improved performance. Here, we describe the CompassR procedure in detail, using the recombination of four substitutions as an example, and discuss important practical issues to consider when applying the CompassR rule (such as the selection of protein structures, the number of FoldX runs, and the evaluation of calculations). The core part of this protocol (system setup, ΔΔGfold calculation, and CompassR application) is transferable to other enzymes and to any recombination of single beneficial substitutions.
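The practical points about the number of FoldX runs and the evaluation of calculations largely come down to bookkeeping over replicate estimates. The sketch below assumes that ΔΔGfold values (kcal/mol, positive = destabilizing, as in FoldX output) have already been computed for each variant in several replicate runs; it averages the replicates, reports their spread, and keeps variants whose mean lies below a user-chosen cutoff. The variant labels, values, and the `threshold` parameter are hypothetical; the sketch does not reproduce the published CompassR rule or its cutoff.

```python
from statistics import mean, stdev


def summarize_runs(ddg_runs):
    """Mean and sample standard deviation of replicate ddG_fold estimates per variant."""
    return {
        variant: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for variant, values in ddg_runs.items()
    }


def screen(ddg_runs, threshold):
    """Variants whose mean ddG_fold is below `threshold`, most stable first.

    `threshold` is a user-supplied cutoff, not a value taken from CompassR."""
    summary = summarize_runs(ddg_runs)
    kept = [(v, m, s) for v, (m, s) in summary.items() if m < threshold]
    return sorted(kept, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical replicate ddG_fold values (kcal/mol) for single and combined variants.
    runs = {
        "A": [1.0, 1.2, 0.8],
        "A+B": [4.0, 4.4, 3.6],
        "A+B+C": [9.0, 9.5, 8.5],
    }
    for variant, m, s in screen(runs, threshold=5.0):
        print(f"{variant}: {m:.2f} +/- {s:.2f} kcal/mol")
```

Reporting the replicate spread alongside the mean makes it easier to judge whether a variant near the cutoff is genuinely different from one just above it.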
Epistasis can markedly affect evolutionary trajectories. In recent decades, protein-level fitness landscapes have revealed extensive idiosyncratic epistasis among specific mutations. By contrast, other work has found ubiquitous and apparently nonspecific patterns of global diminishing-returns and increasing-costs epistasis among mutations across the genome. Here, we used a hierarchical CRISPR gene drive system to construct all combinations of 10 missense mutations from across the genome in budding yeast and measured their fitness in six environments. We show that the resulting fitness landscapes exhibit global fitness–correlated trends but that these trends emerge from specific idiosyncratic interactions. We thus provide experimental validation of recent theoretical work arguing that fitness-correlated trends can emerge as the generic consequence of idiosyncratic epistasis.
Significance: Evolution through natural selection is an overwhelmingly complex process, and it is not surprising that theoretical approaches simplify it considerably. For instance, population genetics mainly considers the dynamics of gene allele frequencies. Here, we develop a complementary approach to evolutionary dynamics based on three elements (organism reproduction, variations, and selection) that are essential for any evolutionary theory. By considering such general dynamics as a stochastic thermodynamic process, we clarify the nature and action of the evolutionary forces. We show that some of the forces cannot be described solely in terms of fitness landscapes. We also find that one force contribution can make organism reproduction insensitive (robust) to variations.
To investigate the ability of a protein to accommodate potentially destabilizing amino acid substitutions, and also to investigate the steric requirements for catalysis, proline was substituted at different sites within the long alpha-helix that connects the amino-terminal and carboxyl-terminal domains of T4 lysozyme. Of the four substitutions attempted, three yielded folded, functional proteins. The catalytic activities of these three mutant proteins (Q69P, D72P, and A74P) were 60–90% that of wild-type. Their melting temperatures were 7–12 °C lower than that of wild-type at pH 6.5. Mutant D72P formed crystals isomorphous with wild-type, allowing the structure to be determined at high resolution. In the crystal structure of wild-type lysozyme the interdomain alpha-helix has an overall bend angle of 8.5°. In the mutant structure the introduction of the proline causes this bend angle to increase to 14° and also causes a corresponding rotation of 5.5° of the carboxyl-terminal domain relative to the amino-terminal one. Except for the immediate location of the proline substitution there is very little change in the geometry of the interdomain alpha-helix. The results support the view that protein structures are adaptable and can compensate for potentially destabilizing amino acid substitutions. The results also suggest that the precise shape of the active site cleft of T4 lysozyme is not critical for catalysis.
Loss of fitness due to the accumulation of deleterious mutations appears to be inevitable in small, obligately asexual populations, as these are incapable of reconstituting highly fit genotypes by recombination or back mutation. The cumulative buildup of such mutations is expected to lead to an eventual reduction in population size, and this facilitates the chance accumulation of future mutations. This synergistic interaction between population size reduction and mutation accumulation leads to an extinction process known as the mutational meltdown, and provides a powerful explanation for the rarity of obligate asexuality. We give an overview of the theory of the mutational meltdown, showing how the process depends on the demographic properties of a population, the properties of mutations, and the relationship between fitness and number of mutations incurred.
We describe a PCR-based method for the generation of plasmid multimers that can be directly transformed into Bacillus subtilis with very high efficiency. This technique is particularly useful for the generation of large libraries of randomly mutagenized genes, which are required for the optimization of enzymes by directed evolution. We subjected the gene coding for the protease subtilisin to six consecutive rounds of PCR at three different levels of mutagenicity. The resulting 18 populations were cloned using our PCR multimerization protocol, and the mutation frequencies were determined by DNA sequencing. The resulting data demonstrate that the mutation frequency during PCR can be controlled by adding varying concentrations of manganese chloride to the reaction mixture. We observed a bias in the type of base pair changes with A and T being mutated much more frequently than C and G. We determined the fraction of active clones in all populations and found that its natural logarithm is proportional to the average mutation frequency of the populations. These data reveal that a fraction of about 0.27 of all possible mutations leads to the inactivation of the subtilisin gene, which provides a measure for its structural plasticity.
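The log-linear relationship reported here follows from a simple model in which the number of mutations per gene is Poisson-distributed with mean m and each mutation independently inactivates the enzyme with probability x. Summing over the Poisson distribution gives an expected active fraction of exp(−x·m), so ln(active fraction) = −x·m. The sketch below assumes that model, with m taken as the mean number of mutations per gene, and estimates x as the negative slope of a least-squares line through the origin. The function names and mutation loads are hypothetical, and this is not necessarily how the original estimate of about 0.27 was obtained.

```python
import math


def fraction_active(m, x):
    """Expected active fraction when mutations per gene ~ Poisson(m) and each one
    independently inactivates with probability x: sum_k P(k) * (1 - x)**k = exp(-x * m)."""
    return math.exp(-x * m)


def estimate_inactivating_fraction(mutation_loads, active_fractions):
    """Least-squares slope through the origin of ln(active fraction) versus mutation
    load, with the sign flipped so that it estimates x."""
    num = sum(m * math.log(f) for m, f in zip(mutation_loads, active_fractions))
    den = sum(m * m for m in mutation_loads)
    return -num / den


if __name__ == "__main__":
    loads = [0.5, 1.0, 2.0, 4.0]  # hypothetical mean mutations per gene
    observed = [fraction_active(m, 0.27) for m in loads]  # synthetic, noise-free data
    print(round(estimate_inactivating_fraction(loads, observed), 3))
```

With real measurements the points scatter around the line, and deviations from linearity at high loads would indicate that mutations do not act independently.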
Epistasis is defined as the influence of the genotype at one locus on the effect of a mutation at another locus. As such it plays a crucial role in a variety of evolutionary phenomena such as speciation, population bottlenecks, and the evolution of genetic architecture (i.e., the evolution of dominance, canalization, and genetic correlations). In mathematical population genetics, however, epistasis is often represented as a mere noise term in an additive model of gene effects. In this paper it is argued that epistasis needs to be scaled in a way that is more directly related to the mechanisms of evolutionary change. A review of general measurement theory shows that the scaling of a quantitative concept has to reflect the empirical relationships among the objects. To apply these ideas to epistatic mutation effects, it is proposed to scale A × A epistatic effects as the change in the magnitude of the additive effect of a mutation at one locus due to a mutation at a second locus. It is shown that the absolute change in the additive effect at locus A due to a substitution at locus B is always identical to the absolute change in B due to the substitution at A. The absolute A × A epistatic effects of A on B and of B on A are identical, even if the relative effects can be different. The proposed scaling of A × A epistasis leads to particularly simple equations for the decomposition of genotypic variance. The Kacser–Burns model of metabolic flux is analyzed for the presence of epistatic effects on flux. It is shown that the non-linearity of the Kacser–Burns model is not sufficient to cause A × A epistasis among the genes coding for enzymes. It is concluded that non-linearity of the genotype-phenotype map is not sufficient to cause epistasis. Finally, it is shown that there exist correlations among the additive and epistatic effects among pairs of loci, caused by the inherent symmetries of Mendelian genetic systems.
For instance, it is shown that a mutation that has a larger than average additive effect will tend to decrease the additive effect of a second mutation, i.e., it will tend to have a negative (canalizing) interaction with a subsequent gene substitution. This is confirmed in a preliminary analysis of QTL-data for adult body weight in mice.
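The symmetry claim can be checked with a small two-locus example. The sketch below assumes a simplified haploid, two-allele setting with hypothetical genotypic values (the paper itself treats Mendelian diploid systems) and scales A × A epistasis as the change in the effect of a substitution at one locus caused by a substitution at the other.

```python
def effect_of(w, locus, background):
    """Effect of substituting `locus` (0 = A, 1 = B) when the other locus carries `background`.
    `w` maps (allele_A, allele_B) -> genotypic value."""
    if locus == 0:
        return w[(1, background)] - w[(0, background)]
    return w[(background, 1)] - w[(background, 0)]


def axa_epistasis(w, locus):
    """Absolute A x A effect: change in the effect at `locus` caused by substituting the other locus."""
    return effect_of(w, locus, 1) - effect_of(w, locus, 0)


def relative_axa(w, locus):
    """The same change expressed relative to the effect in the original background."""
    return axa_epistasis(w, locus) / effect_of(w, locus, 0)


if __name__ == "__main__":
    # Hypothetical genotypic values (e.g. a quantitative trait).
    w = {(0, 0): 10.0, (1, 0): 12.0, (0, 1): 11.0, (1, 1): 11.5}
    print("A x A, effect of B on A:", axa_epistasis(w, 0))
    print("A x A, effect of A on B:", axa_epistasis(w, 1))
    print("relative:", relative_axa(w, 0), relative_axa(w, 1))
```

Both absolute quantities reduce algebraically to w11 − w10 − w01 + w00, which is why they always agree, whereas the relative effects differ because they are divided by different single-locus effects. In this example the locus with the larger effect (A) is accompanied by a negative interaction, in line with the canalizing tendency described above.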
An amino acid sequence encodes a message that determines the shape and function of a protein. This message is highly degenerate in that many different sequences can code for proteins with essentially the same structure and activity. Comparison of different sequences with similar messages can reveal key features of the code and improve understanding of how a protein folds and how it performs its function.
Mutation-selection balance in a multi-locus system is investigated theoretically, using a modification of Bulmer's infinitesimal model of selection on a normally-distributed quantitative character, taking the number of mutations per individual (n) to represent the character value. The logarithm of the fitness of an individual with n mutations is assumed to be a quadratic, decreasing function of n. The equilibrium properties of infinitely large asexual populations, random-mating populations lacking genetic recombination, and random-mating populations with arbitrary recombination frequencies are investigated. With 'synergistic' epistasis on the scale of log fitness, such that log fitness declines more steeply as n increases, it is shown that equilibrium mean fitness is least for asexual populations. In sexual populations, mean fitness increases with the number of chromosomes and with the map length per chromosome. With 'diminishing returns' epistasis, such that log fitness declines less steeply as n increases, mean fitness behaves in the opposite way. Selection on asexual variants and genes affecting the rate of genetic recombination in random-mating populations was also studied. With synergistic epistasis, zero recombination always appears to be disfavoured, but free recombination is disfavoured when the mutation rate per genome is sufficiently small, leading to evolutionary stability of maps of intermediate length. With synergistic epistasis, an asexual mutant is unlikely to invade a sexual population if the mutation rate per diploid genome greatly exceeds unity. Recombination is selectively disadvantageous when there is diminishing returns epistasis. These results are compared with the results of previous theoretical studies of this problem, and with experimental data.
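The quadratic log-fitness assumption can be written as ln w(n) = −αn − (β/2)n², where α and β are hypothetical parameter names; β > 0 corresponds to the "synergistic" case and β < 0 to "diminishing returns". The sketch below computes the drop in log fitness caused by each additional mutation, α + β(2n + 1)/2, which grows with n under synergistic epistasis and shrinks under diminishing returns. It illustrates only the fitness function, not the paper's mutation-selection equilibrium analysis.

```python
def log_fitness(n, alpha, beta):
    """Quadratic log fitness of an individual carrying n mutations (assumed parameterization)."""
    return -alpha * n - 0.5 * beta * n * n


def marginal_cost(n, alpha, beta):
    """Drop in log fitness from adding the (n + 1)-th mutation: alpha + beta * (2n + 1) / 2."""
    return log_fitness(n, alpha, beta) - log_fitness(n + 1, alpha, beta)


if __name__ == "__main__":
    for label, beta in [("synergistic", 0.002), ("multiplicative", 0.0), ("diminishing returns", -0.0005)]:
        costs = [round(marginal_cost(n, 0.01, beta), 5) for n in (0, 5, 10)]
        print(f"{label:>20}: marginal cost at n = 0, 5, 10 -> {costs}")
```

With β = 0 every mutation costs the same amount of log fitness, which corresponds to multiplicative (non-epistatic) fitness effects.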
The origin and maintenance of sexual reproduction continues to be an important problem in evolutionary biology. If the deleterious mutation rate per genome per generation is greater than 1, then the greater efficiency of selection against these mutations in sexual populations may be responsible for the evolution of sex and related phenomena. In modern human populations detrimental mutations with small individual effects are probably accumulating faster than they are being eliminated by selection.
TEM-1 beta-lactamase catalyzes the hydrolysis of beta-lactam antibiotics such as the penicillins and cephalosporins, thus providing for bacterial resistance to these compounds. To determine the amino acid residues critical for the structure and function of TEM-1 beta-lactamase, the codons for each of the 263 amino acid residues that constitute the mature form of the enzyme were randomized using a site-directed mutagenesis procedure. Functional random mutants were selected based on their ability to confer ampicillin resistance to Escherichia coli. The DNA sequence of several functional mutants was determined for each set of random mutants. It was found that 43 out of the 263 amino acid residues do not tolerate substitutions and therefore are critical for the structure and activity of the enzyme. In addition, a comparison of conserved residue positions among functional beta-lactamase mutants with conserved residues in the beta-lactamase gene family identified many positions which did not tolerate substitutions in the mutagenesis studies but are freely substituted among members of the gene family. This observation may be due to the accumulation of compensating mutations among members of the gene family. Finally, the sequence variability at residue positions among functional mutants was quantitated by calculating the effective number of substitutions at each position using information-theoretical entropy. These values were used to obtain a quantitative estimate of the correlation between the sequence variability at a position and the fractional accessible surface area of the residue. The correlation is found to be statistically significant in that buried residues tend to exhibit low variability and invariant residues tend to exhibit low solvent exposure. However, the correlation is weak because most residues are neither completely buried nor invariant.
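One common way to turn positional entropy into an "effective number of substitutions" is to exponentiate the Shannon entropy of the residue distribution at a position, which gives k when k residues occur equally often and 1 for an invariant position. The sketch below assumes that definition; the exact normalization used in the study is not reproduced here, and the residue columns in the example are hypothetical.

```python
import math
from collections import Counter


def effective_number(residues):
    """exp of the Shannon entropy (natural log) of the residue distribution at one position."""
    counts = Counter(residues)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


if __name__ == "__main__":
    # Hypothetical residues observed at three positions among functional mutants.
    columns = {"invariant": "SSSS", "partly variable": "AACD", "fully variable": "ACDE"}
    for name, column in columns.items():
        print(f"{name}: {effective_number(column):.2f}")
```

Because exp(H) in nats equals 2 raised to H in bits, the result does not depend on the logarithm base as long as it is used consistently.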
A double-mutant cycle involves wild-type protein, two single mutants and the corresponding double mutant protein. If the change in free energy associated with a structural or functional property of the protein upon a double mutation differs from the sum of changes in free energy due to the single mutations, then the residues at the two positions are coupled. Such coupling reflects either direct or indirect interactions between these residues. Double-mutant cycle analysis can be used to measure the strength of intramolecular and intermolecular pairwise interactions in proteins or protein-ligand complexes with known structure. Double-mutant cycles can also be employed to characterize structures that are inaccessible to NMR and X-ray crystallography, such as those of transition states for protein folding, ligand binding and enzyme catalysis, or of membrane proteins. Multidimensional mutant cycle analysis can be used to measure higher-order cooperativity between intramolecular or intermolecular interactions. In the absence of coupling between residues, prediction of mutational effects is possible by assuming their additivity.
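The additivity test behind a double-mutant cycle is a short calculation. The sketch below assumes free energies (kcal/mol) for the wild type, both single mutants, and the double mutant, all measured for the same property; the coupling energy ΔΔG_int = ΔG(AB) − ΔG(A) − ΔG(B) + ΔG(WT) is zero when the two mutations act additively. The numbers in the example are hypothetical.

```python
def additive_prediction(dg_wt, dg_a, dg_b):
    """Double-mutant free energy expected if the two mutational effects simply add."""
    return dg_wt + (dg_a - dg_wt) + (dg_b - dg_wt)


def coupling_energy(dg_wt, dg_a, dg_b, dg_ab):
    """Double-mutant-cycle coupling energy:
    (dG_AB - dG_WT) - (dG_A - dG_WT) - (dG_B - dG_WT).
    Zero means no coupling; how to read the sign depends on the dG convention used."""
    return (dg_ab - dg_wt) - (dg_a - dg_wt) - (dg_b - dg_wt)


if __name__ == "__main__":
    # Hypothetical folding free energies (kcal/mol; more negative = more stable).
    dg_wt, dg_a, dg_b, dg_ab = -8.0, -6.5, -7.0, -6.0
    print("additive prediction for AB:", additive_prediction(dg_wt, dg_a, dg_b))
    print("coupling energy:", coupling_energy(dg_wt, dg_a, dg_b, dg_ab))
```

A nonzero coupling energy indicates that the residues interact directly or indirectly; when it is close to zero, the effect of the double mutation can be predicted from the single mutations, as described above.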