Special issue: Protein engineering and chemoenzymatic synthesis
Exploiting enzyme evolution for computational protein design
Gaspar P. Pinto, Marina Corbella, Andrey O. Demkiv, Shina Caroline Lynn Kamerlin
Recent years have seen an explosion of interest in understanding the physicochemical parameters that shape enzyme evolution, as well as substantial advances in computational enzyme design. This review discusses three areas where evolutionary information can be used as part of the design process: (i) using ancestral sequence reconstruction (ASR) to generate new starting points for enzyme design efforts; (ii) learning from how nature uses conformational dynamics in enzyme evolution to mimic this process in silico; and (iii) modular design of enzymes from smaller fragments, again mimicking the process by which nature appears to create new protein folds. Using showcase examples, we highlight the importance of incorporating evolutionary information to continue to push forward the boundaries of enzyme design studies.
Computational enzyme design based on protein evolution: an overview
Roughly three decades have passed since the first attempts to design new enzymes using computational approaches [1,2], and the field has matured considerably since then. While the earliest attempts at computational enzyme design focused primarily on side-chain positioning [1–4] or on focusing the search space for in vitro directed evolution (see Glossary) studies [5], subsequent work broadly expanded the scope of the field, including the fully de novo design of new enzymes [6] (typically followed by optimization using directed evolution) and the repurposing of existing enzymes to catalyze ever more complex chemical reactions [7,8]. In addition, computational design approaches are becoming ever more streamlined, such that there now exists a range of powerful web servers that can assist in the design process [9].
In principle, computational design approaches can take two very loosely defined directions: structure-based design approaches that require some level of knowledge of the system of interest, including information about the chemical mechanisms, transition states, and key catalytic residues involved; and sequence-based design approaches that can, for example, draw on evolutionary information to predict potential hotspots for protein engineering as well as new variants with desired physicochemical properties, something that is increasingly being achieved using machine-learning approaches in particular [10].
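As a rough illustration of how a sequence-based approach can draw on evolutionary information, the sketch below scores each column of a multiple sequence alignment by its Shannon entropy and flags the most variable columns as candidate positions for engineering. The toy alignment, the function names, and the 1-bit threshold are assumptions made for this example; column entropy is only one of many possible variability measures and is not a method prescribed in this review.

```python
import math
from collections import Counter


def column_entropies(alignment):
    """Return the Shannon entropy (in bits) of every alignment column, ignoring gaps."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] != "-")
        total = sum(counts.values())
        entropy = 0.0
        for n in counts.values():
            p = n / total
            entropy -= p * math.log2(p)
        entropies.append(entropy)
    return entropies


def variable_positions(alignment, threshold=1.0):
    """Indices (0-based) of columns whose entropy reaches the chosen threshold."""
    return [i for i, h in enumerate(column_entropies(alignment)) if h >= threshold]


# Hypothetical, already-aligned toy sequences (not real enzymes).
toy_msa = ["MKVLA", "MKILA", "MRVLG", "MKLLS"]

hotspots = variable_positions(toy_msa)
print("Column entropies:", [round(h, 3) for h in column_entropies(toy_msa)])
print("Candidate variable positions:", hotspots)
```

In this toy case the fully conserved columns score zero, while the third and fifth columns are flagged. In practice one would combine such a score with structural or mechanistic knowledge before choosing positions to mutate.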
Computational approaches that require minimal knowledge of the molecular details of the chemical processes involved are attractive for their speed and efficiency, as exploring the underlying mechanisms and transition states typically requires significant experimental and/or computational effort. However, much like their experimental counterparts, such approaches are likely to hit optimization plateaus [11,12] where further improvement in activity becomes extremely challenging, and without knowledge of the underlying chemistry it can be difficult-to-impossible to overcome such plateaus. Therefore, rather than competing with each other, sequence- and structure-based approaches are highly complementary as each provides different types of insights into how to improve a given system. In addition, nature has already provided a blueprint for how new enzymes evolve, and by reconstructing evolutionary trajectories and probing the natural evolution of enzymes, we can learn from nature's tricks to improve enzymes both in vitro and in silico.

Highlights
We can learn from nature's tricks by reconstructing evolutionary trajectories to design improved enzymes.
Ancestral sequence reconstruction (ASR) provides a compelling tool to obtain enzymes with customized catalytic properties.
Conformational dynamics in enzyme design is crucial in increasing the sampling of states with new catalytic functions as well as reducing the sampling of non-productive conformations.
A catalog of fragments characterized by specific biophysical features may provide an invaluable resource for the design of custom-made enzymes.

Department of Chemistry BMC, Uppsala University, BMC Box 576, S-751 23 Uppsala, Sweden
These authors contributed equally
(S.C.L. Kamerlin).
Twitter: @GasparRPPP (G.P. Pinto), @CorbellaMorato (M. Corbella), and @kamerlinlab (S.C.L. Kamerlin).

Trends in Biochemical Sciences, Month 2021, Vol. xx, No. xx
© 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license.
Experimental 'protein engineers turned evolutionists' [13] have made substantial progress in enzyme design using insights from natural evolution. However, there has also been significant progress in computational design studies based on evolutionary information, in particular due to increasing awareness of the role of conformational dynamics in the natural evolution of enzymes [12,14–18], which is now being increasingly incorporated into the computational design process [15,17]. In this review, we discuss three directions where evolutionary information is being used to guide the computational design process, specifically: (i) repurposing of protein scaffolds identified through ASR as potential starting points for the generation of new enzyme activities; (ii) harnessing conformational dynamics, a key driver of natural enzyme evolution, in computational enzyme design; and (iii) modular design processes based on identifying evolutionarily important subdomain segments that can be rearranged to create new enzymes. These are just some of the current directions where evolutionary information can be used to drive the design process, but they showcase the potential of this field.
Repurposing ancient enzymes for new catalytic functions
While there are various ways evolutionary information can be used in enzyme design, one of the most obvious is the use of ASR to identify potential starting points for subsequent experimental or computational design effort. Enzymes obtained from ASR make attractive starting points for enzyme design as they tend to be highly thermostable, conformationally flexible, and evolvable [19–21]. While many different computational algorithms exist with which to perform ASR (Table 1), the basic principle of all of these is to use the sequences of known, extant proteins to reconstruct phylogenetic trees containing sequences of putative ancestors, based on the probability of finding a given amino acid substitution at a given point in the amino acid sequence [22,23]. Such ancestral inference yields a 'cloud' of sequences that relate to putative historical ancestors [24–26], although typically only the most probable of these sequences is subject to further experimental or computational characterization of the evolutionary trajectory.
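The last step of this principle can be sketched in a few lines. The example assumes that an ASR program has already produced, for one ancestral node, a posterior probability for each amino acid at each site; the probabilities and function names below are hypothetical. It extracts the single most probable sequence and also draws alternative sequences from the 'cloud' by sampling each site independently, which is a simplifying assumption of this sketch.

```python
import random

# Hypothetical per-site posterior probabilities for a 4-residue ancestral node.
posteriors = [
    {"M": 0.99, "L": 0.01},
    {"K": 0.60, "R": 0.40},
    {"V": 0.45, "I": 0.35, "L": 0.20},
    {"G": 0.90, "A": 0.10},
]


def most_probable_sequence(site_posteriors):
    """Pick the highest-posterior residue at every site."""
    return "".join(max(site, key=site.get) for site in site_posteriors)


def sample_sequence(site_posteriors, rng):
    """Draw one alternative ancestor, sampling each site from its posterior."""
    residues = []
    for site in site_posteriors:
        states = list(site)
        weights = [site[state] for state in states]
        residues.append(rng.choices(states, weights=weights, k=1)[0])
    return "".join(residues)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
ml_ancestor = most_probable_sequence(posteriors)
cloud = [sample_sequence(posteriors, rng) for _ in range(5)]

print("Most probable ancestor:", ml_ancestor)
print("Sampled alternatives:", cloud)
```

Characterizing a handful of sampled alternatives alongside the most probable sequence is one way to check whether a measured property is robust to reconstruction uncertainty.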
Glossary
Ancestral sequence reconstruction (ASR): statistical approach used in the study of protein evolution to predict the sequences of putative ancestors using the sequences of extant proteins.
Catalytic promiscuity: ability of an enzyme to catalyze more than one, chemically distinct, reaction.
Conformational ensemble: collection of structures that describes the conformational space a protein can
Conformational selection: functionally desirable shift in the conformational ensemble of a protein, triggered by ligand binding or amino acid substitutions/structural modifications to
De novo design: in the context of enzyme design, designing a new scaffold with catalytic functionality, or a new active site in an existing scaffold, from first principles.
Directed evolution: protein engineering technique that iteratively introduces mutations into a protein, refining for a property of interest, thus mimicking the process of Darwinian evolution either in vivo or in vitro.
Distal mutations: outer-sphere mutations of functional or catalytic
Dynamic allostery: regulation of the conformational properties of a protein.
Enzyme evolvability: ability of an enzyme to adapt and acquire new functions on mutation.
Evolutionary trajectory: in the context of enzymes, the sequence of amino acid substitutions that take an enzyme from point A to point B in sequence/function
Excess positional mutual information analysis: approach to obtain information about the interdependence of the positions of two variables in a system.
Machine learning: branch of artificial intelligence in which an analytical model is designed to automatically self-improve on the acquisition of new data.
Markov state model: stochastic model used to describe the long-timescale dynamics of a molecular system.
Modular design: in the context of protein design, this refers to design using protein fragments that are pieced together like molecular LEGO® to (re)create a larger scaffold.
Optimization plateau: in the context of directed evolution, a flat region of the

There is, of course, doubt about how realistic the sequence predictions from ASR are: as pointed out by Copley [27], due to ambiguities in the reconstruction, the probability of even the most probable sequence being the real sequence is very low, as, in practice, many positions are predicted with a probability below 50%, particularly if the data set of extant proteins used for the reconstruction is highly diverged. In addition, particular challenges are posed by gaps in alignments (i.e., insertion/deletion evolutionary events), since different ways of dealing with these can lead to different sequences and phenotypes [28]. Furthermore, while many biochemical studies of reconstructed ancestral proteins suggest enhanced thermostability of the putative ancestors, it has been argued that there is a risk that this enhanced thermostability is a result of reconstruction bias rather than a true property of the putative ancestors (e.g., see the detailed discussion in [29]). However, a recent study has used experimental phylogenetics [30] to reproduce in the laboratory an evolutionary trajectory of an RFP [24]. The advantage of this approach is that the sequences of the true ancestral nodes are actually known and can be phenotypically characterized and compared with the sequences predicted by ASR. This study demonstrated that although the reconstructed sequences contain some errors (which become more frequent the more ancient the node), the phenotypes of the actual and reconstructed ancestors are similar. In addition, one can reconstruct multiple putative ancestors and compare their properties to increase the likelihood that one is observing the phenotypic properties of the true ancestor [27].
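To make the point about ambiguity concrete: if sites are treated as independent (an assumption of this sketch), the probability that the most probable sequence is correct at every position is the product of the per-site posteriors, and the expected number of incorrect residues is the sum of the per-site error probabilities. The 250-residue profile below is invented purely for illustration.

```python
import math


def probability_all_sites_correct(site_max_posteriors):
    """Product of per-site posteriors, assuming sites are independent."""
    return math.prod(site_max_posteriors)


def expected_number_of_errors(site_max_posteriors):
    """Sum of per-site error probabilities (1 - posterior)."""
    return sum(1.0 - p for p in site_max_posteriors)


# Hypothetical 250-residue ancestor: 200 well-resolved sites, 50 ambiguous ones.
profile = [0.99] * 200 + [0.50] * 50

p_exact = probability_all_sites_correct(profile)
n_errors = expected_number_of_errors(profile)

print(f"P(entire sequence correct) = {p_exact:.2e}")
print(f"Expected number of incorrect residues = {n_errors:.1f}")
```

Even with most sites resolved at 99%, the chance of an exact match is vanishingly small and roughly 27 residues are expected to be wrong, which is why comparing the phenotypes of several reconstructed candidates is more informative than trusting any single sequence.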
Clearly, however, while there are challenges in inferring actual evolutionary information through the use of ASR, what is obvious is that protein scaffolds obtained through ASR are excellent starting points for subsequent protein design effort due to their high thermostability and evolvability [31]. There have been a great number of experimental studies using ancestrally reconstructed proteins for enzyme design due to their greater thermostability; more recently, interest has also shifted to using ASR to obtain scaffolds that can be used as starting points for the engineering of catalytic properties [22,23,27,31]. Several such studies have focused specifically on using ASR to obtain flexible scaffolds that can be manipulated for engineering purposes in terms of their conformational properties. These are discussed in more detail in the subsequent section. However, as some (of many) other examples of recent success stories, ASR has been used to understand allosteric communication in a multienzyme complex of a key metabolic enzyme, tryptophan synthase [32,33], to obtain a high-redox-potential laccase [34], to modulate the catalytic adaptability of an extremophile kinase [35], and to identify novel heme binding that modulates the allosteric regulation of an ancestral glycosidase (Figure 1) [36]. This latter study is notable as heme binding was not observed in any of ~5500 crystal structures of ~1400 modern glycosidases. Experimental characterization of a number of modern glycosidases showed appreciable levels of heme binding but significantly lower than that of the ancestral glycosidase, indicating that the ability to efficiently bind heme to allosterically regulate catalysis is a specific feature of the ancestral enzyme [36].

Overall, the use of evolutionary information obtained from ASR is a powerful tool for enzyme engineering, and with growing interest in the use of ASR to obtain enzymes with tailored catalytic properties [22,31], it is likely that we will observe much greater usage of this technique in protein design in the coming years.
Harnessing conformational dynamics for enzyme engineering
Recent years have seen increasing awareness of the importance of conformational dynamics to catalytic promiscuity and enzyme evolvability [14,15,17,18,37]. This dates back to early work by James and Tawfik [14], who presented a model for the role of conformational selection in enzyme evolution in which conformational plasticity allows enzymes to sample conformations that can bind new ligands and facilitate new chemistry. While such conformations may be only rarely sampled in the wild-type enzyme, evolutionary pressure can alter the ensemble such that a previously rare conformation becomes dominant and a new activity becomes preferred. One would expect this phenomenon to be observed more frequently in laboratory than in natural evolution, as in the laboratory even low-level promiscuous activities can be detected and amplified, whereas natural evolution requires selective pressure to amplify a promiscuous activity. There are exceptions to this, however [38,39], and the importance of conformational dynamics in shaping enzyme evolutionary trajectories has been observed in both natural and artificial (directed) evolutionary trajectories [15–17]. Here, the fine-tuning of conformational dynamics to enhance activity has been proposed to occur in two ways: (i) increased sampling of states capable of conferring new catalytic activities; and (ii) reduced sampling of non-productive conformations [16]. In line with the rest of this review, our focus in this section is on computational studies, as learning from evolution and harnessing conformational dynamics in computational protein engineering is an actively growing area. We note that related methodological aspects have been recently reviewed in, for example, [40], and a summary of selected methods is presented in Table 1. In addition, the field is growing continuously, and the number of user-friendly tools is growing accordingly [9].
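The idea that a rarely sampled conformation can become dominant can be illustrated with a simple two-state Boltzmann model. The sketch below is a toy thermodynamic picture under stated assumptions: the state names, the free energies (in kcal/mol), and the size of the mutational stabilization are all hypothetical, and real ensembles contain many more states.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def boltzmann_populations(free_energies, temperature=298.15):
    """Equilibrium populations p_i = exp(-G_i/RT) / sum_j exp(-G_j/RT)."""
    rt = R_KCAL * temperature
    lowest = min(free_energies.values())  # shift energies for numerical stability
    weights = {
        state: math.exp(-(g - lowest) / rt) for state, g in free_energies.items()
    }
    partition = sum(weights.values())
    return {state: w / partition for state, w in weights.items()}


# Hypothetical wild type: the promiscuous conformation lies 3 kcal/mol above the native one.
wild_type = {"native": 0.0, "promiscuous": 3.0}
# Hypothetical evolved variant: mutations stabilize the promiscuous state by 4 kcal/mol.
evolved = {"native": 0.0, "promiscuous": -1.0}

wt_pop = boltzmann_populations(wild_type)
ev_pop = boltzmann_populations(evolved)

print(f"Wild type, promiscuous state: {wt_pop['promiscuous']:.3f}")
print(f"Evolved,   promiscuous state: {ev_pop['promiscuous']:.3f}")
```

With these invented numbers the promiscuous state goes from well under 1% of the ensemble to the majority population, which is the kind of population shift invoked to explain how a new activity can become preferred.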
Glossary (continued)
Optimization plateau (continued): search space where further improvements are challenging.
Population shift: change in the conformational ensemble of a protein, usually triggered by either ligand binding or amino acid substitutions/structural modifications to a protein.
Productive and non-productive conformations: conformations of a protein that are either beneficial or detrimental for a desired function (for the purposes of this review, catalytic
Structure- versus sequence-based design approaches: for simplicity, we distinguish between approaches that primarily use structural information and those that primarily use sequence information in the design process. Such a clean distinction is, however, not usually possible as there tends to be overlap between the two.
Table 1. Overview of computational resources of relevance to enzyme design
Columns: Category; Name; Description; URL; Features and limitations

Category: ASR

Name: Bali-phy
Description: A standalone tool for the estimation of multiple sequence alignments and evolutionary trees
Features: Excellent performance on tested protein data sets
Limitations: Tends to systematically under-align on the biological sequence data

Name: IQ-TREE
Description: Standalone software and web tool for the inference of phylogenetic trees using the maximum-likelihood approach
Features: It is an integral component of many biomedical research open-source applications such as Galaxy, Nextstrain, and QIIME 2
Limitations: Both IQ-Tree- and RaxML-NG-inferred maximum-likelihood gene trees have been suggested to have reproducibility issues

Name: Genetics Analysis
Description: User-friendly software for molecular evolution analysis and construction of phylogenetic trees
Features: Easy-to-use graphical user interface (GUI) with good documentation available; works across all platforms
Limitations: Works like a black box for some parameters, with no user control

Name: MrBayes
Description: A program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models
Features: GPU acceleration available
Limitations: Requires quite complex input data, which can hinder non-experts from performing ASR successfully

Name: Analysis by Likelihood (PAML)
Description: Standalone software and web tool to perform phylogenetic analysis using the maximum likelihood approach
Features: Has a GUI version; very
Limitations: Steep learning curve for novice users

Name: (missing)
Description: A standalone tool for ASR using the maximum-likelihood approach; a GUI is being developed
Features: Has a GUI (under development but already available for use)
Limitations: Both IQ-Tree- and RaxML-NG-inferred maximum-likelihood gene trees have been suggested to have reproducibility issues

Name: The FastML Server (FastML)
Description: A web tool for ASR using the maximum-likelihood approach
Features: Easy-to-use web tool; can take unaligned sequences as input
Limitations: Accepts only the FASTA file format as input

Category: Allostery

Name: DFI
Description: A protocol developed to identify per-residue contributions to a protein's overall dynamical profile
Features: Code easily available and ready to use
Limitations: Requires atomic coordinates

Name: Ohm
Description: Web tool that predicts allosteric sites and inter-residue correlation and identifies the allosteric pathways between them
Features: Easy-to-use web tool requiring only the insertion of a 3D structure
Limitations: Structure-based predictions do not take dynamics into account

Name: SPM
Description: Tool developed for the identification of distal mutations affecting function, based on the shortest-path-map algorithm
Features: Uses long dynamic simulations to predict allosteric sites
Limitations: Poor availability of the code

Name: BRENDA
Description: Electronic repository containing molecular and biochemical information on enzymes
Features: Several different queries are available as examples; integrates CATH and SCOPe
Limitations: Requires login and there is a

Name: PDB
Description: A protein databank containing more than 179 000 macromolecular structures
Features: General database for protein structures obtained through NMR, X-ray crystallography, and cryoelectron
Limitations: Contains redundant data and the search function could be better
Table 1. (continued)

Name: UniProt
Description: A central repository of protein data created by combining the Swiss-Prot, TrEMBL, and PIR-PSD databases
Features: De facto go-to database for general protein information
Limitations: Overwhelming amount of data for newcomers

Name: CATH Protein
Description: A database containing information about the evolutionary relationships of protein domains
Features: Huge open-source database helps with automated implementation in one's own workflows
Limitations: Drug compound information is still being developed

Name: Fuzzle
Description: A database containing evolutionary information about protein fragments
Features: Evolutionary information on
Limitations: Hierarchical databases can lead to misclassification when slight differences are present in the

Name: Pfam
Description: A database with information about protein families, represented by multiple sequence alignments generated using Hidden Markov models
Features: Constant development and integration with other European Bioinformatics Institute (EBI) tools
Limitations: There is a high-quality database and a low-quality database that can lead to errors

Name: SCOP
Description: A database of proteins classified based on structural relatedness, such as superfamilies, families, and folds
Features: Uses data deposited in the PDB to group proteins; curated to be
Limitations: New version is not totally backwards compatible

Category: Modeling

Name: AlphaFold
Description: Protein prediction tool using neural networks; achieved a score twice as good as the second-best protein predictor in
Features: New gold standard for protein structure prediction; makes use of a newly developed neural
Limitations: It is better where others were good, but still lacks good loop region predictions

Name: iTasser
Description: Both a web tool and a standalone tool, it predicts protein structure using a hierarchical approach; runs iteratively until the lowest-energy structures are achieved and then uses publicly available function information to identify closely related templates with the same function
Features: Good and easy-to-use web tool and a powerful standalone tool
Limitations: Lacks accuracy when few templates are available and is slower than similar options available

Name: Modeller
Description: Protein structure modeling tool that predicts structures by the satisfaction of spatial restraints obtained from a sequence alignment and shown as a probability-density function
Features: Can be installed on any platform and is very fast in creating a new model
Limitations: Speed comes at the cost of accuracy for models where slower tools yield better results

Name: Clustal Omega
Description: Multiple sequence alignment web tool
Features: Has been constantly developed since the 1980s for MSA
Limitations: Does not yield good results when a large number of sequences is provided as input

Name: Alignment Using Fast Fourier
Description: Multiple sequence alignment web tool; can also be used locally as a standalone
Features: Users can choose between various multiple alignment methods
Limitations: It requires more memory to run
Table 1. (continued)

Name: Comparison by
Description: Multiple sequence alignment web tool; as with Clustal Omega, it is integrated in the EBI ecosystem
Features: Can achieve better average accuracy and speed than ClustalW2 or
Limitations: The Kimura distance used at the second stage, although fast, does not consider which changes of amino acids occur between sequences

Name: AbDesign
Description: An algorithm for backbone design using structure- and sequence-based
Features: Stepwise workflow for the design of antibodies that focuses on stability and binding affinity
Limitations: Requires sufficient sequence data on homologs as well as atomic

Name: FuncLib
Description: Web tool to design and rank multiple point mutations based on evolutionary information and protein-folding stability calculations
Features: Multipoint variant design tool with an easy-to-use web server
Limitations: Works better with a pre-stabilized protein scaffold; poor knowledge of one's system may lead to poor results

Name: Loop Grafter
Description: A web tool with a workflow to compare loop dynamics between proteins and transplant loops from one protein to
Features: Automated way to transfer loops from one protein to the other
Limitations: Works only with the input of both the template and the scaffold

Name: PROSS
Description: A user-friendly web tool to predict protein amino acid substitutions that yield higher-stability variants
Features: Automated way to stabilize proteins by inserting mutations in the original protein; the method is reliable enough that only a more limited number of designs as output is
Limitations: Requires a structure (which may not always be available) for stability calculations

Name: ProtLego
Description: A Python library for chimera design and
Features: Automatic construction and ranking of chimeras
Limitations: The correlation between structural features and experimental

Name: Rosetta
Description: A comprehensive software suite with several algorithms that can be used for the modeling and analysis of proteins
Features: Encompasses many modules under the same umbrella name
Limitations: Not unified and developed by many people through the years; can be hard to implement and use different

Name: SEWING
Description: A protocol to design new tertiary protein structures by 'sewing' together secondary-structure building blocks
Features: Continuous and discontinuous SEWING can be merged to create additional diversity
Limitations: At present, it appears to have been applied only to the construction of all-α-helix chimeras

Name: CAVER
Description: Software tool for the analysis and visualization of tunnels and channels in protein structure
Features: Integration with other CaverSuite tools allows deeper analysis
Limitations: Still lacks the possibility of calculating pores

Name: POVME
Description: Standalone tool for ligand-binding pocket
Features: Calculates ligand-binding pockets using MD snapshots
Limitations: The lack of a GUI makes it less accessible to non-bioinformaticians
A showcase model system for understanding the role of conformational dynamics in protein evolution and design, particularly from a computational perspective, has been β-lactamases, enzymes that are capable of hydrolyzing the lactam core of nearly all β-lactam antibiotics (including penicillin derivatives, cephalosporins, and carbapenems) and are thus major contributors to bacterial antibiotic resistance [41]. In particular, the evolution of serine β-lactamases has been studied extensively through ASR, with thorough characterization of their biochemical and biophysical properties, including their structure, function, and stability [42,43]. Subsequent experimental and computational analysis suggested an important role for conformational dynamics in the evolvability of these enzymes, with a narrowing of the conformational ensemble on transitioning from the Precambrian nodes to the more specialized modern enzymes [18,20,24,43]. In addition, it has been suggested that mutations utilize dynamic allostery to confer antibiotic resistance in the modern TEM-1 β-lactamase [44], which is one of the most commonly encountered β-lactamases in Gram-negative bacteria [45]. Interestingly, despite large differences in both sequence and scaffold flexibility, the overall tertiary structure remains largely unchanged over the course of these enzymes' evolution [46]. Excess positional mutual information analysis and molecular dynamics (MD) simulations have been used to predict allosteric mutations that affect β-lactamase drug resistance [47] and Markov state models have been used to identify hidden conformational states that are important for conferring antibiotic resistance [48], which can allow for the prediction of potential resistance to new compounds and aid drug discovery efforts. Finally, the conformational flexibility at the Precambrian nodes could be exploited to insert a de novo active site capable of catalyzing Kemp elimination [46], which could be further optimized [49] using ultralow-throughput computational screening (using FuncLib [50]) to reach catalytic activities comparable with those of modern enzymes towards their natural substrates (Figure 2) [51].
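For readers unfamiliar with Markov state models, the core idea can be sketched as follows: an MD trajectory is discretized into conformational states, transitions between states are counted at a chosen lag time, and the resulting transition matrix gives the equilibrium population of each state, including rarely visited ones. The state labels, the toy trajectory, and the function names are assumptions of this sketch; it omits the clustering of MD frames into states and does not enforce detailed balance, both of which dedicated MSM software handles.

```python
def transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition probabilities estimated from a discrete trajectory."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for start, end in zip(dtraj[:-lag], dtraj[lag:]):
        counts[start][end] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state has no outgoing transitions at this lag time")
        matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, n_iter=1000):
    """Equilibrium populations obtained by repeatedly applying the transition matrix."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical discretized trajectory: state 0 = dominant conformation,
# state 1 = rarely visited ("hidden") conformation.
toy_dtraj = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

T = transition_matrix(toy_dtraj, n_states=2)
populations = stationary_distribution(T)

print("Transition matrix:", [[round(p, 3) for p in row] for row in T])
print("Equilibrium populations:", [round(p, 3) for p in populations])
```

For this toy trajectory the rare state carries about 20% of the equilibrium population. Real applications build such models from many long simulations and validate the choice of lag time.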
Following from this, Kemp elimination is a model reaction for base-catalyzed proton abstraction from carbon and is one of the commonly targeted reactions in artificial enzyme design [52]. In related recent work [53], Chica and coworkers used molecular simulations to recapitulate the effect of the directed evolution of the most proficient Kemp eliminase designed to date, HG3.17 [54]. They then exploited changes in conformational dynamics during the evolutionary trajectory leading to HG3.17 from a de novo design, HG3 [55], to engineer a biocatalyst, HG4, with catalytic efficiencies close to that of HG3.17 but using only key first- and second-shell substitutions picked up during the evolutionary trajectory based on a template obtained using ensemble-based

Table 1. (continued)

Name: BLAST
Description: A tool to calculate statistical significance between biological sequences
Features: One of the most-used tools for local alignment search; fast and easy to
Limitations: Developed in 1990 and has not changed substantially since then

Name: FASTA
Description: A web tool to provide a heuristic local search with a protein query
Features: First tool in the field; as with any other tool from the EBI, it is integrated with many other tools and analyses; newer tools exist that have evolved from this
Limitations: As opposed to BLAST, it does not remove low-complexity regions

Name: Bio3D
Description: R package for the analysis of protein structure and trajectory data; provides a variety of approaches for conformational analysis of a protein
Features: Comprehensive tool with many tutorials and easy installation
Limitations: Lack of GUI and a webserver makes for a steep learning curve for

Note that this list is based on a constantly expanding toolkit, and therefore it is impossible to be exhaustive.
Another group that has invested significant effort in understanding the role of conformational dynamics and distal mutations in enzyme catalysis and evolution is that of Osuna and coworkers. This includes, in 2017, the development of a new approach, the 'Shortest Path Map' (SPM) [56]. This approach can be applied to MD simulations to identify catalytically important communication pathways in proteins as well as the pairs of residues with the greatest contributions to the communication pathway. It has so far been successfully applied either to understand the evolution of catalysis or to engineer the activity of retro-aldolases [56], tryptophan synthase [33,57], monoamine oxidase [58], and, most recently, cytochrome P450 monooxygenase [59]. In this latter case, the authors were able to identify two mutations that initially affected only activity, without a corresponding population shift of conformations, and thus did not lead to a selectivity change. However, a third mutation identified using the SPM approach was able to modify the conformational ensemble and change both the activity and the selectivity. Further examples of approaches that can be used to predict distal mutations are discussed in Box 1.
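The general idea of a shortest-path analysis on a residue network can be sketched as below. This is a simplified illustration loosely inspired by the SPM concept, not the published implementation: it assumes that a residue-residue correlation matrix and a mean Cα distance matrix have already been computed from MD, and the 6 Å contact cutoff, the -log|correlation| edge weight, the toy matrices, and all function names are assumptions of this sketch.

```python
import heapq
import math
from collections import Counter


def build_graph(corr, dist, cutoff=6.0):
    """Connect residues that are in contact; strongly correlated pairs get short edges."""
    n = len(corr)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            c = abs(corr[i][j])
            if cutoff >= dist[i][j] and c > 0.0:
                weight = -math.log(c)
                graph[i][j] = weight
                graph[j][i] = weight
    return graph


def predecessors(graph, source):
    """Dijkstra's algorithm; returns each reachable node's predecessor on its shortest path."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            candidate = d + w
            if best.get(v, math.inf) > candidate:
                best[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    return prev


def edge_usage(graph):
    """Count how often each edge lies on a shortest path between any pair of residues."""
    usage = Counter()
    for source in graph:
        prev = predecessors(graph, source)
        for target in prev:
            if source > target:
                continue  # count each unordered pair once
            node = target
            while node != source:
                parent = prev[node]
                usage[tuple(sorted((node, parent)))] += 1
                node = parent
    return usage


# Hypothetical 4-residue system (correlations and mean C-alpha distances in angstroms).
corr = [[1.0, 0.9, 0.2, 0.1],
        [0.9, 1.0, 0.8, 0.3],
        [0.2, 0.8, 1.0, 0.9],
        [0.1, 0.3, 0.9, 1.0]]
dist = [[0.0, 3.8, 5.5, 9.0],
        [3.8, 0.0, 3.8, 5.5],
        [5.5, 3.8, 0.0, 3.8],
        [9.0, 5.5, 3.8, 0.0]]

usage = edge_usage(build_graph(corr, dist))
print(dict(usage))
```

Edges that carry many shortest paths (here the central 1-2 contact) mark residues that dominate communication in the network; positions on such edges, including ones far from the active site, are natural candidates for closer inspection.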
Following from this, there is increasing evidence that the conformational dynamics of flexible loops [17,60] and tunnels [61] can be exploited for protein design. For example, in 2019 Damborsky and collaborators published a study where ASR was used to revive a common ancestor of the Renilla luciferase and haloalkane dehalogenases [62]. The resurrected ancestor showed higher thermostability than both of the extant enzymes (which is a common outcome of ASR, as discussed in the prior section), with the conformations visited during similar-length MD simulations showing slower dynamics in the ancestor than in the modern enzymes. The monooxygenase activity was only residual and the hydrolase activity was mostly lower in the ancestor than in their modern enzyme of choice. A recent continuation of this study [63] hypothesized that a helix at the tunnel mouth was controlling the rate of the monooxygenase activity. With an insertion variant of the ancestor and with a fragment grafted from the modern Renilla luciferase into the ancestor, it was possible to engineer the conformational dynamics of this luciferase to obtain both lower product inhibition and highly stable bioluminescence [63], resulting in a more proficient reporter system for gene expression.

Figure 1. A comparison of representative structures extracted from 10 × 500-ns molecular dynamics (MD) simulations of an ancestral glycosidase both (A) in the absence and (B) in the presence of heme, as well as (C) its corresponding extant counterpart (in this case a glycosidase from Halothermothrix orenii). Also shown here are (D) the absolute and (E) the relative (heme-bound vs non-heme-bound ancestral glycosidase) root mean square fluctuations (RMSFs) (Å) of the backbone Cα-atoms of each relevant system. It can be seen from these simulations that there are clear differences in flexibility in the region spanning residues 227–334, when comparing both the ancestral and extant proteins and the ancestral glycosidase with and without heme bound. This region corresponds to missing residues in the electron density of the ancestral protein, an effect that is particularly pronounced in the absence of heme. The corresponding rigidification of the ancestral protein by bound heme was shown, in turn, to lead to differences in catalytic activity. This figure was originally published in [36]. Copyright 2018, Springer Nature. Published under a CC-BY license.
Enzyme engineering from protein LEGO®
While the repurposing of either extant or reconstructed ancestral enzymes for new catalytic functions has been shown to be tremendously powerful as a starting point for protein design, repurposed enzymes could in principle also carry unnecessary (or even potentially undesirable) features for the desired function, which are just leftover traces of their evolutionary path. Therefore, substantial effort has also been invested in the design of new protein scaffolds, completely de novo, with a focus on obtaining tailored physicochemical properties. The past decades witnessed tremendous progress in the design of novel enzymes completely de novo by grafting minimal active sites onto pre-existing scaffolds [64]. However, the activities of the resulting constructs were typically low, with significant gains obtained only through subsequent directed evolution [6]. While it is unclear whether de novo designer scaffolds will necessarily provide more efficient enzymes, they have great potential to vastly expand the scope of new catalytic functionalities that a scaffold can accommodate. This makes de novo protein design once again an exciting area, particularly as the field has matured significantly [65–67], with significant contributions also likely to be made from recent developments in protein structure prediction (such as AlphaFold [68] and a three-track neural network approach [69]).
Figure 2. Enhancing the de novo Kemp eliminase activity of a Precambrian β-lactamase (GNCA4-WT) [46] using FuncLib [50]. (A) Biochemical characterization of the activity of the top-20-ranked variants obtained from FuncLib shows four variants with significant improvements in activity compared with the wild-type enzyme. The most efficient of these, GNCA4-12, has a turnover number (kcat) and a catalytic efficiency (kcat/KM) which is comparable with the catalytic efficiency of the average modern enzyme towards its native substrate [51]. (B) An overlay of the crystal structures of the GNCA4-WT (tan) and GNCA-2 (light blue) β-lactamases, in complex with the transition-state analog 6-nitrobenzotriazole, shows the overall scaffolds to be essentially superimposable, with only minor differences in side-chain positioning at the de novo active site. Simulations of the reaction catalyzed by these enzymes indicate that enhancements in catalysis can be directly linked to improved positioning of the reacting fragments for optimal proton transfer at the Michaelis complex. As the catalytic D229 side chain is placed on a flexible loop, this positioning is likely to be further improved by rigidifying the flexible loop. Adapted from [49]. Copyright 2020, Royal Society of Chemistry.
Various approaches have been used to try to design new protein scaffolds [65–67], one of which is modular design to assemble new protein folds [70]. This approach is attractive because it effectively allows new proteins to be assembled as if they are made of molecular LEGO® building blocks. Modular design is relevant from an evolutionary perspective, as there is increasing evidence that natural proteins evolve from fragments or motifs, such as short propeller-like motifs that have evolved into modern β-barrels [71], an omnipresent Rossmann ribose-binding motif that hints at common ancestry among Rossmann-fold enzymes that use ribose-based cofactors [72], and a β-α-β fragment that appears to be the minimal functional motif leading to modern P-loop NTPase and Rossmann enzymes [73], among other examples [74–76]. Therefore, modular design mimics the repeating themes that lead to the natural evolution of functional proteins.
Box 1. Computational tools to predict distal mutations
Enzyme activity can be engineered through substitutions made either to active-site residues or to distal residues. In the first case, such substitutions typically change the shape of the binding pocket and/or modulate key interactions with the substrate, whereas in the latter case distal mutations affect allosteric regulation of the enzyme and/or stabilize alternative active-site configurations [16,56,93]. Although it is nontrivial, there is currently rapidly increasing interest in the prediction of distal mutations that can modulate enzyme activity [17,40]. There are a number of computational approaches that have shown promise as tools to reliably predict the allosteric effect of distal mutations. We focus here on three of them: (i) the Dynamic Flexibility Index (DFI) [94]; (ii) the SPM approach [56]; and (iii) Ohm [95]. While all three approaches aim to predict long-range allosteric interaction networks, they use different sources of information to achieve this.
The DFI approach uses elastic network models to roughly sample the different conformations of a protein [94]. Recently, this approach has used structural information on not only one protein but also proteins in the same phylogenetic branch, to compare their dynamical properties and assess which positions affect the function of the protein when substitutions occur [18]. The SPM approach [56] uses long-timescale MD simulations to sample as much of the conformational ensemble as possible and then uses that information to predict mutations that will have an allosteric effect on the protein's function. It was developed to study the role of dynamics in the evolution of enzymes. Finally, the Ohm web tool is the most recent approach of the three, and requires less knowledge of the system and of computational tools for protein engineering [95]. This approach identifies allosterically important residues that are distal to the active site based on the position of the active site, the allosteric pathway connecting the position to the active site, and the critical residues between the active site and the allosteric site. Taken together, these examples showcase rapid progress in user-friendly computational tools that can be used to predict distal mutations that can alter the conformational landscape of an enzyme and thus regulate activity.
Significant contributions to this area were made in early studies by Höcker, Sterner, and colleagues [76–80], who helped to elucidate the modular nature of protein evolution and develop the concept that even TIM-barrel folds could be amenable to structure-based recombination. As a more recent example, in 2016 Kuhlman and coworkers developed a design strategy they called SEWING [81], as this approach effectively 'sews' together connected or disconnected pieces of existing proteins to create novel scaffolds. This allowed them to design highly stable α-helical proteins, showcasing the potential of the modular design approach. It was suggested that this approach can be used as a means with which to rapidly generate a wide variety of designable protein scaffolds. Related to this, Fleishman and coworkers have presented a novel approach, AbDesign [82], which assembles new backbone architectures by combining naturally occurring modular fragments using the combinatorial backbone-conformational space of many protein families. This approach allows for the generation of a repertoire of scaffolds that exceeds the diversity of the entire Protein Data Bank (PDB) [83] for even a modestly sized family of homologs, giving the user significant control over the design process.
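The combinatorial argument behind modular backbone assembly can be illustrated with a toy calculation. The sketch below is not the AbDesign implementation; the segment names and homolog labels are hypothetical placeholders, and it only shows how a handful of observed conformations per segment multiplies into a much larger repertoire of candidate backbones.

```python
from itertools import islice, product
from math import prod

# Hypothetical segment library: each segment of a fold is observed in several
# backbone conformations across the homologs of one family (labels are made up).
segment_library = {
    "segment_1": ["homolog_A", "homolog_B", "homolog_C"],
    "segment_2": ["homolog_A", "homolog_B", "homolog_D", "homolog_E"],
    "segment_3": ["homolog_B", "homolog_C"],
    "segment_4": ["homolog_A", "homolog_C", "homolog_D", "homolog_E", "homolog_F"],
}


def count_backbones(library):
    """Number of chimeric backbones if segments are combined freely."""
    return prod(len(conformations) for conformations in library.values())


def enumerate_backbones(library):
    """Lazily yield every combination as a {segment: source homolog} mapping."""
    names = list(library)
    for combo in product(*(library[name] for name in names)):
        yield dict(zip(names, combo))


n_backbones = count_backbones(segment_library)
first_three = list(islice(enumerate_backbones(segment_library), 3))
```

Here six hypothetical homologs already give 3 × 4 × 2 × 5 = 120 candidate backbones. A real pipeline would still need to score and rank such combinations for structural compatibility, a step this sketch deliberately leaves out.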
These provide just a few of many examples of the successful application of modular assembly to artificial protein design (for a summary of selected methods, see Table 1). However, nature has also harnessed billions of years of evolution to generate a library of building blocks that can be identified recurrently across the protein universe. These are most commonly classified in terms of domains, typically comprising ~50–250 amino acids that represent an independent folding unit [84]. However, a number of studies have suggested that duplication and recombination of segments shorter than domains could have been the origin of the domains themselves and that these are therefore the ultimate building blocks [71,75,81,85,86]. As has been argued, for instance by Kolodny and coworkers [75], the proteins in the ancient universe were presumably shorter, but over extended periods of evolutionary time their sequences were afforded many 'copy–paste' opportunities, leading to the scaffolds we observe today.
In parallel with these developments, there are now increasing efforts in the construction of a catalog of locally similar segments across globally different domains [prior efforts, e.g., the Structural Classification of Proteins (SCOP) [87], have focused on cataloguing full domains]. As an example, Lupas and coworkers have constructed a vocabulary of ancient peptides that have led to modern folded proteins, borrowing from the strategies that linguists have used to compare modern languages and reconstruct ancient vocabularies [88]. Following from this, Kolodny and coworkers have presented a global view of the protein universe based on protein networks that connect domains that share fragments [89]. More recently, they have introduced the concept of 'themes' as recurrent short protein segments of at least 35 amino acids that are unexpectedly found in domains of independent evolutionary origin [75]. Along similar lines, Höcker and coworkers have developed the 'Fuzzle' database [86], which was obtained by clustering the SCOPe95 database by sequence identity and performing a sequence- and structure-based comparison of all domains. This allowed the authors to identify 1337 fragments with lengths ranging from 11 to 229 amino acids that populate a wide diversity of folds (519/1221 folds in the SCOP/Fuzzle database). A subsequent Python package, ProtLego, uses the Fuzzle database to obtain evolutionarily conserved fragments for the automated and high-throughput in silico design of chimeric proteins [90]. Taken together, these approaches, some of which are summarized in Figure 3, pave the way for the construction of a catalog of fragments that can be characterized by specific biophysical features, or even functional roles, such as metal binding, nucleotide binding, or nucleic acid binding, on the basis of known structures [75,88]. This, in turn, provides an extremely valuable resource for the facile design of custom-made proteins.
While modular design has frequently focused on the generation of novel stably folded proteins, there is also increasing interest in using this approach to generate functional enzymes. For example, in the case of AbDesign [82], Fleishman and coworkers were able to use this approach to develop stable and highly active new enzyme scaffolds. Specifically, they applied this approach to two TIM-barrel enzyme families with very different sequences, active-site structures, and catalytic activities (GH10 and PLLs) and were able to obtain enzymes with catalytic and stability profiles similar to those of the natural enzymes for the GH10 designs, while for the PLLs some designs exhibited higher catalytic efficiencies than the natural enzymes, as well as broad substrate promiscuity. This was achieved by: (i) segmenting all homologous structures within the protein family of interest (e.g., all GH10 xylanases) into modular parts; (ii) performing an 'idealization' procedure that forces ideal bonds and relaxes the associated torsion angles; (iii) computing backbone conformational databases based on this information as well as enforcing position-specific sequence constraints based on the results of multiple sequence alignment; (iv) performing a precomputation step involving the design and ranking of thousands of unique backbones; (v) assembling the resulting backbones; and (vi) performing stability optimization using the Protein Repair One Stop Shop (PROSS) [91] to generate a seamless structure. It is perhaps unsurprising that such a strategy would lead to highly efficient catalysts, as this appears to be the strategy also taken by nature itself in designing new proteins [74,75,89,92]. However, it opens a new (and highly promising) door for evolutionary-based computational protein design.
Figure 3. Schematic classification of current subdomain approaches. This figure shows the process of generation of the different subdomain databases discussed in the main text. 'Fragments' [88] used the SCOPe30 database as input and clustered by sequence similarity, yielding 40 clusters of fragments of about 35 amino acids in length. 'Themes' [75] used Evolutionary Classification of Protein Domains (ECOD) [96] and the Protein Data Bank (PDB) [83] as input, clustered by sequence similarity, and performed Cα-RMSD calculations (3.5-Å threshold), obtaining 2195 curated hits of at least 35 amino acids. Finally, the 'Fuzzle' [86] data set is based on clustering of the SCOPe95 database by sequence identity combined with sequence and structure alignments to yield 1337 fragments with lengths ranging from 11 to 229 amino acids. This figure was adapted from [92], which was published under a CC-BY license. Copyright 2021, Elsevier.
Concluding remarks
The past decade has witnessed an explosion of interest in both computational enzyme design and, in parallel, understanding the molecular evolution of enzyme function. These two developments are interlinked, as greater insight into how enzymes naturally emerge, evolve, and gain new functions can be incorporated in turn to guide the design process: to borrow from Tawfik, 'protein engineers turned evolutionists' [13]. While evolutionary insight is being used extensively to guide experimental protein design studies, it has only more recently also been incorporated into computational enzyme design, driven in part by increases in computational power that allow us to now model enzyme evolutionary trajectories in atomic detail and pinpoint the physicochemical parameters that shape the emergence of new functions.
This review has focused on three key directions being taken in the field: (i) the use of evolutionary information, gained through ancestral inference by bioinformatics analysis, to identify scaffolds that can be used as starting seeds for protein engineering; (ii) increased appreciation of the role of conformational dynamics and its importance in enzyme design; and (iii) renewed appreciation of and impetus towards modular design, following the modular nature of the natural emergence of new protein scaffolds. These three areas, particularly the first two, are independently evolving subfields but also highly interlinked. In addition, as these are all fast-growing areas, the list of studies presented here is by no means exhaustive, but rather is meant to showcase examples of success in the field. Finally, as this is a fast-growing field, there remain a number of outstanding questions that require addressing, some of which have been highlighted in the Outstanding questions.
The use of computational models to efficiently design new proteins has long been a major goal of computational biologists and chemists. Past decades have seen substantial progress in this area, through both the development of new methodologies and their successful application in protein design. We are at an exciting moment in the field as these investments are starting to bear fruit, and theory is becoming an indispensable component of the design process. As we demonstrate here, 'turning evolutionist' is also important from a computational perspective, as learning from nature will allow us to circumvent significant remaining challenges in the field. Therefore, in time, it is likely that we will all become 'protein engineers turned evolutionists', exploiting evolutionary insight to guide the design process.
This work was funded by the Knut and Alice Wallenberg Foundation (Wallenberg Academy Fellowship and Wallenberg Scholar Fellowship to S.C.L.K., grants KAW2018.0140 and 2019.0431), the Human Frontier Science Program (grant RGP0041/2017), and the Swedish Research Council (grant 2019-03499). This project has received funding from the European Union's Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement no. 890562 to M.C.
Declaration of interests
No interests are declared.
Outstanding questions
Thermostability is a desirable property when designing new enzymes, and reconstructed proteins obtained from ancestral sequence reconstruction are frequently reported to be highly thermostable. Is this real or artifactual from the reconstruction? Resolving this debate would be valuable in understanding the biophysical properties of ancestral proteins.
Despite significant improvements in methodologies, predicting distal mutations that are catalytically relevant remains challenging as the impact of these mutations on the catalytic rate is often a sum of multiple small effects. How can we improve our predictions?
There is now an increasing number of user-friendly webservers for the prediction of variants with improved catalytic activities, as well as structural bioinformatics tools that can characterize protein flexibility without the need for extensive and computationally intensive enhanced sampling simulations. How far can the characterization of conformational properties and allosteric interactions be streamlined and simplified to, for instance, a webserver that can predict hotspots based on conformational properties in a physically meaningful way? To what extent can machine learning contribute?
Despite ever-increasing knowledge about the evolution of protein folds, as well as the availability of huge catalogs of segments, the de novo design of highly efficient enzymes remains challenging. Part of this is due to inaccuracies in the predictions of the positions of the side chains of key residues involved in substrate binding, transition-state stabilization, or substrate release, each of which can have a critical impact on the resulting activity. To what extent can the cataloging of segments and domains reduce current inaccuracies in scaffold design?
1. Hellinga, H.W. and Richards, F.M. (1991) Construction of new ligand binding sites in proteins of known structure: I. Computer-aided modeling of sites with pre-defined geometry. J. Mol. Biol. 222, 763–785
2. Dahiyat, B.I. and Mayo, S.L. (1996) Protein design automation. Protein Sci. 5, 895–903
3. Voigt, C.A. et al. (2001) Computational method to reduce the search space for directed protein evolution. Proc. Natl. Acad. Sci. U. S. A. 98, 3778–3783
4. Looger, L.L. et al. (2003) Computational design of receptor and sensor proteins with novel functions. Nature 423, 185–190
5. Currin, A. et al. (2015) Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently. Chem. Soc. Rev. 44, 1172–1239
6. Kries, H. et al. (2013) De novo enzymes by computational design. Curr. Opin. Chem. Biol. 17, 221–228
7. Lutz, S. and Iamurri, S.M. (2018) Protein engineering: past, present and future. Methods Mol. Biol. 1685, 1–12
8. Drienovská, I. and Roelfes, G. (2020) Expanding the enzyme universe with genetically encoded unnatural amino acids. Nat. Catal. 3, 193–202
9. Marques, S.M. et al. (2021) Web-based tools for computational enzyme design. Curr. Opin. Struct. Biol. 69, 19–34
10. Xu, Y. et al. (2020) Deep dive into machine learning models for protein engineering. J. Chem. Inf. Model. 60, 2773–2790
11. Chou, H.-H. et al. (2011) Diminishing returns epistasis among beneficial mutations decelerates adaptation. Science 332,
12. Tokuriki, N. et al. (2012) Diminishing returns and tradeoffs constrain the laboratory optimization of an enzyme. Nat. Commun. 3, 1257
13. Trudeau, D.L. and Tawfik, D.S. (2019) Protein engineers turned evolutionists – the quest for the optimal starting point. Curr. Opin. Biotechnol. 60, 46–52
14. James, L.C. and Tawfik, D.S. (2003) Conformational diversity and protein evolution – a 60-year-old hypothesis revisited. Trends Biochem. Sci. 28, 361–368
15. Maria-Solano, M.A. et al. (2018) Role of conformational dynamics in the evolution of novel enzyme function. Chem. Commun. 54, 6622–6634
16. Campbell, E.C. et al. (2018) Laboratory evolution of protein conformational dynamics. Curr. Opin. Struct. Biol. 50, 49–57
17. Crean, R.M. et al. (2020) Harnessing conformational plasticity to generate designer enzymes. J. Am. Chem. Soc. 142, 11324–11342
18. Campitelli, P. et al. (2020) The role of conformational dynamics and allostery in modulating protein evolution. Annu. Rev. Biophys. 49, 267–288
19. Romero-Romero, M.L. et al. (2016) Engineering ancestral protein hyperstability. Biochem. J. 473, 3611–3620
20. Zou, T. et al. (2015) Evolution of conformational dynamics determines the conversion of a promiscuous generalist into a specialist enzyme. Mol. Biol. Evol. 32, 132–143
21. Trudeau, D.L. et al. (2016) On the potential origins of the high stability of reconstructed ancestral proteins. Mol. Biol. Evol. 33,
22. Spence, M.A. et al. (2021) Ancestral sequence reconstruction for protein engineers. Curr. Opin. Struct. Biol. 69, 131–141
23. Selberg, A.G.A. et al. (2021) Ancestral sequence reconstruction: from chemical paleogenetics to maximum likelihood algorithms and beyond. J. Mol. Evol. 89, 157–164
24. Randall, R.N. et al. (2016) An experimental phylogeny to benchmark ancestral sequence reconstruction. Nat. Commun. 7, 12847
25. Bar-Rogovsky, H. et al. (2015) Assessing the prediction fidelity of ancestral reconstruction by a library approach. Protein Eng. Des. Sel. 28, 507–518
26. Eick, G.N. et al. (2017) Robustness of reconstructed ancestral protein functions to statistical uncertainty. Mol. Biol. Evol. 34,
27. Copley, S.D. (2021) Setting the stage for evolution of a new enzyme. Curr. Opin. Struct. Biol. 69, 41–49
28. Thomas, A. et al. (2019) Highly thermostable carboxylic acid reductases generated by ancestral sequence reconstruction. Commun. Biol. 2, 429
29. Wheeler, L.C. et al. (2016) The thermostability and specificity of ancient proteins. Curr. Opin. Struct. Biol. 38, 37–43
30. Hillis, D.M. et al. (1992) Experimental phylogenetics: generation of a known phylogeny. Science 255, 589–592
31. Gardner, J.M. et al. (2020) Manipulating conformational dynamics to repurpose ancient proteins for modern catalytic functions. ACS Catal. 10, 4863–4870
32. Schupfner, M. et al. (2020) Analysis of allosteric communication in a multienzyme complex by ancestral sequence reconstruction. Proc. Natl. Acad. Sci. U. S. A. 117, 346–354
33. Maria-Solano, M.A. et al. (2021) Rational prediction of distal activity enhancing mutations in tryptophan synthase. ChemRxiv Published online March 4, 2021.
34. Gomez-Fernandez, B.J. et al. (2020) Consensus design of an evolved high-redox potential laccase. Front. Bioeng. Biotechnol. 8, 354
35. Zamora, R.A. et al. (2020) Tuning of conformational dynamics through evolution-based design modulates the catalytic adaptability of an extremophile kinase. ACS Catal. 10, 10847–10857
36. Gamiz-Arco, G. et al. (2021) Heme-binding enables allosteric modulation in an ancient TIM-barrel glycosidase. Nat. Commun. 12, 380
37. Tokuriki, N. and Tawfik, D.S. (2009) Protein dynamism and evolvability. Science 324, 203–207
38. Babtie, A.C. et al. (2009) Efficient catalytic promiscuity for chemically distinct reactions. Angew. Chem. Int. Ed. Engl. 48,
39. Bigley, A.N. and Raushel, F.M. (2013) Catalytic mechanisms for phosphotriesterases. Biochim. Biophys. Acta 1834, 443–453
40. Osuna, S. (2020) The challenge of predicting distal active site mutations in computational enzyme design. WIREs Comput. Mol. Sci. 11, e1502
41. Worthington, R.J. and Melander, C. (2013) Overcoming resistance to β-lactam antibiotics. J. Org. Chem. 78, 4207–4213
42. Hall, B.G. and Barlow, M. (2003) Structure-based phylogenies of the serine beta-lactamases. J. Mol. Evol. 57, 255–260
43. Risso, V.A. et al. (2013) Hyperstability and substrate promiscuity in laboratory resurrections of Precambrian β-lactamases. J. Am. Chem. Soc. 135, 2899–2902
44. Modi, T. and Banu Ozkan, S. (2018) Mutations utilize dynamic allostery to confer resistance in TEM-1 β-lactamase. Int. J. Mol. Sci. 19, 3808
45. Shah, A.A. et al. (2004) Characteristics, epidemiology and clinical importance of emerging strains of Gram-negative bacilli producing extended-spectrum beta-lactamases. Res. Microbiol. 155,
46. Risso, V.A. et al. (2017) De novo active sites for resurrected Precambrian enzymes. Nat. Commun. 8, 16113
47. Cortina, G.A. and Kasson, P.M. (2016) Excess positional mutual information predicts both local and allosteric mutations affecting beta lactamase drug resistance. Bioinformatics 32, 3420–3427
48. Hart, K.M. et al. (2016) Modelling proteins' hidden conformations to predict antibiotic resistance. Nat. Commun. 7, 12965
49. Risso, V.A. et al. (2020) Enhancing a de novo enzyme activity by computationally-focused ultra-low-throughput screening. Chem. Sci. 11, 6134–6148
50. Khersonsky, O. et al. (2018) Automated design of efficient and functionally diverse enzyme repertoires. Mol. Cell 72, 178–186.
51. Bar-Even, A. et al. (2011) The moderately efficient enzyme: evolutionary and physicochemical trends shaping enzyme parameters. Biochemistry 50, 4402–4410
52. Korendovych, I.V. and DeGrado, W.F. (2014) Catalytic efficiency of designed catalytic proteins. Curr. Opin. Struct. Biol. 27,
53. Broom, A. et al. (2020) Ensemble-based enzyme design can recapitulate the effects of laboratory directed evolution in silico. Nat. Commun. 11, 4808
54. Blomberg, R. et al. (2013) Precision is essential for efficient catalysis in an evolved Kemp eliminase. Nature 503, 418–421
55. Privett, H.K. et al. (2012) Iterative approach to computational enzyme design. Proc. Natl. Acad. Sci. U. S. A. 109, 3790–3795
56. Romero-Rivera, A. et al. (2017) Role of conformational dynamics in the evolution of retro-aldolase activity. ACS Catal. 7, 8524–8532
57. Maria-Solano, M.A. et al. (2019) Deciphering the allosterically driven conformational ensemble in tryptophan synthase. J. Am. Chem. Soc. 141, 140913056
58. Curado-Carballada, C. et al. (2019) Hidden conformations in Aspergillus niger monoamine oxidase are key for catalytic efficiency. Angew. Chem. Int. Ed. Engl. 58, 3097–3101
59. Acevedo-Rocha, C.G. et al. (2021) Pervasive cooperative mutational effects on multiple catalytic enzyme traits emerge via long-range conformational dynamics. Nat. Commun. 12, 1621
60. Nestl, B.M. and Hauer, B. (2014) Engineering of flexible loops in enzymes. ACS Catal. 4, 3201–3211
61. Kokkonen, P. et al. (2019) Engineering enzyme access tunnels. Biotechnol. Adv. 37, 107386
62. Chaloupkova, R. et al. (2019) Light-emitting dehalogenases: reconstruction of multifunctional biocatalysts. ACS Catal. 9,
63. Schenkmayerova, A. et al. (2021) Engineering the protein dynamics of an ancestral luciferase. Nat. Commun. 12, 3616
64. Zanghellini, A. (2014) De novo computational enzyme design. Curr. Opin. Biotechnol. 29, 132–138
65. Dawson, W.M. et al. (2019) Towards functional de novo designed proteins. Curr. Opin. Chem. Biol. 52, 102–111
66. Korendovych, I.V. and DeGrado, W.F. (2020) De novo protein design, a retrospective. Q. Rev. Biophys. 53, e3
67. Pan, X. and Kortemme, T. (2021) Recent advances in de novo protein design: principles, methods, and applications. J. Biol. Chem. 296, 100558
68. Jumper, J. et al. (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589
69. Baek, M. et al. (2021) Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876
70. Lutz, S. and Benkovic, S.J. (2000) Homology-independent protein engineering. Curr. Opin. Biotechnol. 11, 319–324
71. Smock, R.G. et al. (2016) De novo evolutionary emergence of a symmetrical protein is shaped by folding constraints. Cell
72. Laurino, P. et al. (2016) An ancient fingerprint indicates the common ancestry of Rossmann-fold enzymes utilizing different ribose-based cofactors. PLoS Biol. 14, e1002396
73. Longo, L.M. et al. (2020) On the emergence of P-Loop NTPase and Rossmann enzymes from a beta-alpha-beta ancestral fragment. eLife 9, e64415
74. Kolodny, R. (2021) Searching protein space for ancient sub-domain segments. Curr. Opin. Struct. Biol. 68, 105–112
75. Kolodny, R. et al. (2021) Bridging themes: short protein segments found in different architectures. Mol. Biol. Evol. 38,
76. Höcker, B. et al. (2002) A common evolutionary origin of two elementary enzyme folds. FEBS Lett. 510, 133–135
77. Höcker, B. et al. (2001) Dissection of a (βα)8-barrel enzyme into two folded halves. Nat. Struct. Biol. 8, 32–36
78. Höcker, B. et al. (2004) Mimicking enzyme evolution by generating new (betaalpha)8-barrels from (betaalpha)4-half-barrels. Proc. Natl. Acad. Sci. U. S. A. 101, 16448–16453
79. Bharat, T.A.M. et al. (2008) A beta alpha-barrel built by the combination of fragments from different folds. Proc. Natl. Acad. Sci. U. S. A. 105, 9942–9947
80. Claren, J. et al. (2009) Establishing wild-type levels of catalytic activity on natural and artificial (βα)8-barrel protein scaffolds. Proc. Natl. Acad. Sci. U. S. A. 106, 3704–3709
81. Jacobs, T.M. et al. (2016) Design of structurally distinct proteins using strategies inspired by evolution. Science 352, 687–690
82. Lipsh-Sokolik, R. et al. (2020) The AbDesign computational pipeline for modular backbone assembly and design of binders and enzymes. Protein Sci. 30, 151–159
83. Berman, H.M. et al. (2000) The Protein Data Bank. Nucleic Acids Res. 28, 235–242
84. Wetlaufer, D.B. (1973) Nucleation, rapid folding, and globular intrachain regions in proteins. Proc. Natl. Acad. Sci. U. S. A. 70, 697–701
85. Lupas, A.N. et al. (2001) On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world. J. Struct. Biol. 134, 191–203
86. Ferruz, N. et al. (2020) Identification and analysis of natural building blocks for evolution-guided fragment-based protein design. J. Mol. Biol. 432, 3898–3914
87. Murzin, A.G. et al. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540
88. Alva, V. et al. (2015) A vocabulary of ancient peptides at the origin of folded proteins. eLife 4, e09410
89. Nepomnyachiy, S. et al. (2014) Global view of the protein universe. Proc. Natl. Acad. Sci. U. S. A. 111, 11691–11696
90. Ferruz, N. et al. (2021) ProtLego: a Python package for the analysis and design of chimeric proteins. Bioinformatics Published online April 26, 2021.
91. Goldenzwig, A. et al. (2016) Automated structure- and sequence-based design of proteins for high bacterial expression and stability. Mol. Cell 63, 337–346
92. Romero-Romero, S. et al. (2021) Evolution, folding, and design of TIM barrels and related proteins. Curr. Opin. Struct. Biol. 68, 94–104
93. Hong, N.-S. et al. (2018) The evolution of multiple active site configurations in a designed enzyme. Nat. Commun. 9, 3900
94. Nevin Gerek, Z. et al. (2013) Structural dynamics flexibility informs function and evolution at a proteome scale. Evol. Appl. 6, 423–433
95. Wang, J. et al. (2020) Mapping allosteric communications within individual proteins. Nat. Commun. 11, 3862
96. Schaeffer, R.D. et al. (2016) ECOD: new developments in the evolutionary classification of domains. Nucleic Acids Res. 45,