Available via license: CC BY 4.0
Special issue: Protein engineering and chemoenzymatic synthesis
Review
Exploiting enzyme evolution for computational protein design
Gaspar P. Pinto,1,2,@ Marina Corbella,1,2,@ Andrey O. Demkiv,1 and Shina Caroline Lynn Kamerlin1,*,@
Recent years have seen an explosion of interest in understanding the physicochemical parameters that shape enzyme evolution, as well as substantial advances in computational enzyme design. This review discusses three areas where evolutionary information can be used as part of the design process: (i) using ancestral sequence reconstruction (ASR) to generate new starting points for enzyme design efforts; (ii) learning from how nature uses conformational dynamics in enzyme evolution to mimic this process in silico; and (iii) modular design of enzymes from smaller fragments, again mimicking the process by which nature appears to create new protein folds. Using showcase examples, we highlight the importance of incorporating evolutionary information to continue to push forward the boundaries of enzyme design studies.
Computational enzyme design based on protein evolution: an overview
Roughly three decades have passed since the first attempts to design new enzymes using computational approaches [1,2], and the field has matured considerably since then. While the earliest attempts at computational enzyme design focused primarily on side-chain positioning [1–4] or on narrowing the search space for in vitro directed evolution (see Glossary) studies [5], subsequent work broadly expanded the scope of the field, including the fully de novo design of new enzymes [6] (typically followed by optimization using directed evolution) and the repurposing of existing enzymes to catalyze ever more complex chemical reactions [7,8]. In addition, computational design approaches are becoming ever more streamlined, such that there now exists a range of powerful web servers that can assist in the design process [9].
In principle, computational design approaches can take two very loosely defined directions: structure-based design approaches that require some level of knowledge of the system of interest, including information about the chemical mechanisms, transition states, and key catalytic residues involved; and sequence-based design approaches that can, for example, draw on evolutionary information to predict potential hotspots for protein engineering as well as new variants with desired physicochemical properties, a task that is increasingly being tackled using machine-learning approaches [10].
Computational approaches that require minimal knowledge of the molecular details of the chemical processes involved are attractive for their speed and efficiency, as exploring the underlying mechanisms and transition states typically requires significant experimental and/or computational effort. However, much like their experimental counterparts, such approaches are likely to hit optimization plateaus [11,12] where further improvement in activity becomes extremely challenging, and without knowledge of the underlying chemistry it can be difficult or even impossible to overcome such plateaus. Therefore, rather than competing with each other, sequence- and structure-based approaches are highly complementary, as each provides different types of
Highlights
We can learn from nature’s tricks by reconstructing evolutionary trajectories to design improved enzymes.
Ancestral sequence reconstruction (ASR) provides a compelling tool to obtain enzymes with customized catalytic properties.
Conformational dynamics in enzyme design is crucial in increasing the sampling of states with new catalytic functions as well as reducing the sampling of non-productive conformations.
A catalog of fragments characterized by specific biophysical features may provide an invaluable resource for the design of custom-made enzymes.
1 Department of Chemistry – BMC, Uppsala University, BMC Box 576, S-751 23 Uppsala, Sweden
2 These authors contributed equally
*Correspondence: lynn.kamerlin@kemi.uu.se (S.C.L. Kamerlin).
@ Twitter: @GasparRPPP (G.P. Pinto), @CorbellaMorato (M. Corbella), and @kamerlinlab (S.C.L. Kamerlin).
Trends in Biochemical Sciences, Month 2021, Vol. xx, No. xx https://doi.org/10.1016/j.tibs.2021.08.008 1
© 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
insights into how to improve a given system. In addition, nature has already provided a blueprint
for how new enzymes evolve, and by reconstructing evolutionary trajectories and probing the
natural evolution of enzymes, we can learn from nature’s tricks to improve enzymes both in vitro
and in silico.
Experimental ‘protein engineers turned evolutionists’ [13] have made substantial progress in enzyme design using insights from natural evolution. There has also been significant progress in computational design studies based on evolutionary information, in particular due to increasing awareness of the role of conformational dynamics in the natural evolution of enzymes [12,14–18], which is now being increasingly incorporated into the computational design process [15,17]. In this review, we discuss three directions where evolutionary information is being used to guide the computational design process, specifically: (i) repurposing of protein scaffolds identified through ASR as potential starting points for the generation of new enzyme activities; (ii) harnessing conformational dynamics, a key driver of natural enzyme evolution, in computational enzyme design; and (iii) modular design processes based on identifying evolutionarily important subdomain segments that can be rearranged to create new enzymes. These are just some of the current directions where evolutionary information can be used to drive the design process, but they showcase the potential of this field.
Repurposing ancient enzymes for new catalytic functions
While there are various ways evolutionary information can be used in enzyme design, one of the most obvious is the use of ASR to identify potential starting points for subsequent experimental or computational design effort. Enzymes obtained from ASR make attractive starting points for enzyme design as they tend to be highly thermostable, conformationally flexible, and evolvable [19–21]. While many different computational algorithms exist with which to perform ASR (Table 1), the basic principle of all of these is to use the sequences of known, extant proteins to reconstruct phylogenetic trees containing sequences of putative ancestors, based on the probability of finding a given amino acid substitution at a given position in the amino acid sequence [22,23]. Such ancestral inference yields a ‘cloud’ of sequences that relate to putative historical ancestors [24–26], although typically only the most probable of these sequences is subject to further experimental or computational characterization of the evolutionary trajectory.
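As a minimal illustration of the idea that ancestral inference yields a distribution of sequences rather than a single answer, the sketch below takes per-site posterior probabilities for one ancestral node and extracts both the most probable sequence and a small ‘cloud’ of alternatives. The posterior values are invented for illustration and the function names are hypothetical; in practice such probabilities would be taken from the output of an ASR program such as those listed in Table 1. Sampling each site independently is a simplifying assumption.

```python
import random

# Hypothetical per-site posterior probabilities for one ancestral node
# (illustrative numbers only, not taken from any real reconstruction).
posteriors = [
    {"M": 0.99, "L": 0.01},
    {"K": 0.55, "R": 0.40, "Q": 0.05},
    {"A": 0.48, "S": 0.47, "G": 0.05},
    {"D": 0.97, "E": 0.03},
]

def most_probable_sequence(site_posteriors):
    """Pick the highest-posterior residue at every site."""
    return "".join(max(site, key=site.get) for site in site_posteriors)

def sample_sequence(site_posteriors, rng):
    """Draw one member of the 'cloud', sampling each site independently."""
    residues = []
    for site in site_posteriors:
        amino_acids = list(site)
        weights = [site[aa] for aa in amino_acids]
        residues.append(rng.choices(amino_acids, weights=weights, k=1)[0])
    return "".join(residues)

def ambiguous_sites(site_posteriors, threshold=0.5):
    """0-based indices where even the best residue falls below the threshold."""
    return [i for i, site in enumerate(site_posteriors)
            if max(site.values()) < threshold]

rng = random.Random(0)  # fixed seed so the sampled cloud is reproducible
ml_sequence = most_probable_sequence(posteriors)
cloud = [sample_sequence(posteriors, rng) for _ in range(5)]
flagged = ambiguous_sites(posteriors)
print(ml_sequence, cloud, flagged)
```

Sites flagged as ambiguous are natural candidates for testing alternative reconstructions experimentally.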
There is, of course, doubt about how realistic the sequence predictions from ASR are: as pointed out by Copley [27], due to ambiguities in the reconstruction, the probability of even the most probable sequence being the real sequence is very low, as, in practice, many positions are predicted with a probability of <50%, particularly if the data set of extant proteins used for the reconstruction is highly diverged. In addition, particular challenges are posed by gaps in alignments (i.e., insertion/deletion evolutionary events), since different ways of dealing with these can lead to different sequences and phenotypes [28]. Furthermore, while many biochemical studies of reconstructed ancestral proteins suggest enhanced thermostability of the putative ancestors, it has been argued that there is a risk that this enhanced thermostability is a result of reconstruction bias rather than a true property of the putative ancestors (e.g., see the detailed discussion in [29]). However, a recent study has used experimental phylogenetics [30] to reproduce in the laboratory an evolutionary trajectory of a red fluorescent protein (RFP) [24]. The advantage of this approach is that the sequences of the true ancestral nodes are actually known and can be phenotypically characterized and compared with the sequences predicted by ASR. This study demonstrated that although the reconstructed sequences contain some errors (which become more frequent the more ancient the node), the phenotypes of the actual and reconstructed ancestors are similar. In addition, one can reconstruct multiple putative ancestors and compare their
Glossary
Ancestral sequence reconstruction (ASR): statistical approach used in the study of protein evolution to predict the sequences of putative ancestors using the sequences of extant proteins.
Catalytic promiscuity: ability of an enzyme to catalyze more than one, chemically distinct, reaction.
Conformational ensemble: collection of structures that describes the conformational space a protein can sample.
Conformational selection: functionally desirable shift in the conformational ensemble of a protein, triggered by ligand binding or amino acid substitutions/structural modifications to a protein.
De novo design: in the context of enzyme design, designing a new scaffold with catalytic functionality, or a new active site in an existing scaffold, from first principles.
Directed evolution: protein engineering technique that iteratively introduces mutations into a protein, refining for a property of interest, thus mimicking the process of Darwinian evolution either in vivo or in vitro.
Distal mutations: outer-sphere mutations of functional or catalytic importance.
Dynamic allostery: regulation of the conformational properties of a protein.
Enzyme evolvability: ability of an enzyme to adapt and acquire new functions on mutation.
Evolutionary trajectory: in the context of enzymes, the sequence of amino acid substitutions that take an enzyme from point A to point B in sequence/function space.
Excess positional mutual information analysis: approach to obtain information about the interdependence of the positions of two variables in a system.
Machine learning: branch of artificial intelligence in which an analytical model is designed to automatically self-improve on the acquisition of new data.
Markov state model: stochastic model used to describe the long-timescale dynamics of a molecular system.
Modular design: in the context of protein design, this refers to design using protein fragments that are pieced together like molecular LEGO® to (re)create a larger scaffold.
Optimization plateau: in the context of directed evolution, a flat region of the
properties to increase the likelihood that one is observing the phenotypic properties of the ‘true’
ancestor [27].
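The argument that even the most probable sequence is unlikely to be the true ancestor can be made concrete with simple arithmetic. Assuming, as a simplification, that sites are reconstructed independently, the probability that the entire sequence is correct is the product of the per-site posterior probabilities. The sequence length and probabilities below are hypothetical and chosen only to illustrate the scale of the effect.

```python
import math

def prob_all_sites_correct(site_probs):
    """Probability that every site is correct, assuming independent sites."""
    return math.prod(site_probs)

def expected_errors(site_probs):
    """Expected number of incorrectly reconstructed sites."""
    return sum(1.0 - p for p in site_probs)

# Hypothetical 250-residue ancestor: 200 confidently reconstructed sites
# and 50 ambiguous sites predicted with a probability below 50%.
site_probs = [0.99] * 200 + [0.45] * 50

p_exact = prob_all_sites_correct(site_probs)
n_wrong = expected_errors(site_probs)
print(f"P(entire sequence correct) = {p_exact:.2e}")
print(f"Expected number of incorrect sites = {n_wrong:.1f}")
```

Under these assumptions the chance of recovering the exact ancestral sequence is vanishingly small even though most individual sites are well supported, which is why comparing the phenotypes of several alternative reconstructions is informative.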
Clearly, while there are challenges in inferring actual evolutionary information through the use of ASR, protein scaffolds obtained through ASR are excellent starting points for subsequent protein design effort due to their high thermostability and evolvability [31]. There have been a great number of experimental studies using ancestrally reconstructed proteins for enzyme design due to their greater thermostability; more recently, interest has also shifted to using ASR to obtain scaffolds that can be used as starting points for the engineering of catalytic properties [22,23,27,31]. Several such studies have focused specifically on using ASR to obtain flexible scaffolds whose conformational properties can be manipulated for engineering purposes. These are discussed in more detail in the subsequent section. However, as some (of many) other examples of recent success stories, ASR has been used to understand allosteric communication in a multienzyme complex of a key metabolic enzyme, tryptophan synthase [32,33], to obtain a high-redox-potential laccase [34], to modulate the catalytic adaptability of an extremophile kinase [35], and to identify novel heme binding that modulates the allosteric regulation of an ancestral glycosidase (Figure 1) [36]. This latter study is notable as heme binding was not observed in any of ~5500 crystal structures of ~1400 modern glycosidases. Experimental characterization of a number of modern glycosidases showed appreciable levels of heme binding, but significantly lower than that of the ancestral glycosidase, indicating that the ability to efficiently bind heme to allosterically regulate catalysis is a specific feature of the ancestral enzyme [36].
Overall, the use of evolutionary information obtained from ASR is a powerful tool for enzyme engineering, and with growing interest in the use of ASR to obtain enzymes with tailored catalytic properties [22,31], it is likely that we will observe much greater usage of this technique in protein design in the coming years.
Harnessing conformational dynamics for enzyme engineering
Recent years have seen increasing awareness of the importance of conformational dynamics to catalytic promiscuity and enzyme evolvability [14,15,17,18,37]. This dates back to early work by James and Tawfik [14], who presented a model for the role of conformational selection in enzyme evolution in which conformational plasticity allows enzymes to sample conformations that can bind new ligands and facilitate new chemistry. While such conformations may be only rarely sampled in the wild-type enzyme, evolutionary pressure can alter the ensemble such that a previously rare conformation becomes dominant and a new activity becomes preferred. One would expect this phenomenon to be observed more frequently in laboratory than in natural evolution, as in the laboratory even low-level promiscuous activities can be detected and amplified, whereas natural evolution requires selective pressure to amplify a promiscuous activity. There are exceptions to this, however [38,39], and the importance of conformational dynamics in shaping enzyme evolutionary trajectories has been observed in both natural and artificial (directed) evolutionary trajectories [15–17]. The fine-tuning of conformational dynamics to enhance activity has been proposed to occur in two ways: (i) increased sampling of states capable of conferring new catalytic activities; and (ii) reduced sampling of non-productive conformations [16]. In line with the rest of this review, our focus here is on computational studies, as learning from evolution and harnessing conformational dynamics in computational protein engineering is an actively growing area. We note that related methodological aspects have been recently reviewed in, for example, [40], and a summary of selected methods is presented in Table 1. In addition, the field is growing continuously, and the number of user-friendly tools is growing accordingly [9].
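The population shift underlying this picture can be illustrated with a two-state Boltzmann model. The sketch below is a deliberately simplified illustration, not a method taken from the studies cited above: the free-energy values are hypothetical, and real enzymes sample far more than two states.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def populations(free_energies, temperature=298.15):
    """Boltzmann populations for states with relative free energies in kcal/mol."""
    weights = [math.exp(-g / (R_KCAL * temperature)) for g in free_energies]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical two-state enzyme:
#   state 0 = conformation supporting the native activity
#   state 1 = rarely sampled conformation supporting a promiscuous activity
wild_type = populations([0.0, 3.0])

# Assumed net effect of mutations along a trajectory: state 1 is stabilized
# by 4 kcal/mol relative to state 0, so it becomes the dominant conformation.
evolved = populations([0.0, -1.0])

print(f"wild type: {wild_type[1]:.3%} of the ensemble in state 1")
print(f"evolved:   {evolved[1]:.3%} of the ensemble in state 1")
```

Because populations depend exponentially on free-energy differences, a change of only a few kcal/mol is enough to turn a rarely sampled conformation into the dominant one.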
search space where further improvements are challenging.
Population shift: change in the conformational ensemble of a protein, usually triggered by either ligand binding or amino acid substitutions/structural modifications to a protein.
Productive and non-productive conformations: conformations of a protein that are either beneficial or detrimental for a desired function (for the purposes of this review, catalytic function).
Structure- versus sequence-based design approaches: for simplicity, we distinguish between approaches that primarily use structural information and those that primarily use sequence information in the design process. Such a clean distinction is, however, not usually possible as there tends to be overlap between the two.
Table 1. Overview of computational resources of relevance to enzyme design^a
Category | Name | Description | URL | Features and limitations
ASR | Bali-phy | A standalone tool for the estimation of multiple sequence alignments and evolutionary trees | http://www.bali-phy.org | Features: Excellent performance on tested protein data sets. Limitations: Tends to systematically under-align on the biological sequence data.
ASR | IQ-TREE | Standalone software and web tool for the inference of phylogenetic trees using the maximum-likelihood approach | http://www.iqtree.org | Features: It is an integral component of many biomedical research open-source applications such as Galaxy, Nextstrain, and QIIME 2. Limitations: Both IQ-TREE- and RAxML-NG-inferred maximum-likelihood gene trees have been suggested to have reproducibility issues (https://doi.org/10.1038/s41467-020-20005-6).
ASR | Molecular Evolutionary Genetics Analysis (MEGA X) | User-friendly software for molecular evolution analysis and construction of phylogenetic trees | https://www.megasoftware.net | Features: Easy-to-use graphical user interface (GUI) with good documentation available; works across all platforms. Limitations: Works like a black box for some parameters, with no user control.
ASR | MrBayes | A program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models | http://nbisweden.github.io/MrBayes | Features: GPU acceleration available. Limitations: Requires quite complex input data, which can hinder non-experts from performing ASR successfully.
ASR | Phylogenetic Analysis by Maximum Likelihood (PAML) | Standalone software and web tool to perform phylogenetic analysis using the maximum-likelihood approach | http://abacus.gene.ucl.ac.uk/software/paml.html | Features: Has a GUI version; very customizable. Limitations: Steep learning curve for novice users.
ASR | Randomized Axelerated Maximum Likelihood (RAxML) | A standalone tool for ASR using the maximum-likelihood approach; a GUI is being developed | https://cme.h-its.org/exelixis/web/software/raxml | Features: Has a GUI (under development but already available for use). Limitations: Both IQ-TREE- and RAxML-NG-inferred maximum-likelihood gene trees have been suggested to have reproducibility issues (https://doi.org/10.1038/s41467-020-20005-6).
ASR | The FastML Server (FastML) | A web tool for ASR using the maximum-likelihood approach | http://fastml.tau.ac.il | Features: Easy-to-use web tool; can take unaligned sequences as input. Limitations: Accepts only the FASTA file format as input.
Allostery | DFI | A protocol developed to identify per-residue contributions to a protein’s overall dynamical profile | https://github.com/avishekrk/DFI | Features: Code easily available and ready to use. Limitations: Requires atomic coordinates.
Allostery | Ohm | Web tool that predicts allosteric sites and inter-residue correlation and identifies the allosteric pathways between them | https://dokhlab.med.psu.edu/ohm/# | Features: Easy-to-use web tool requiring only the insertion of a 3D structure. Limitations: Structure-based predictions do not take dynamics into account.
Allostery | SPM | Tool developed for the identification of distal mutations affecting function, based on the shortest-path-map algorithm | https://silviaosuna.wordpress.com/tools | Features: Uses long dynamic simulations to predict allosteric sites. Limitations: Poor availability of the code.
Databases (assorted) | BRENDA | Electronic repository containing molecular and biochemical information on enzymes | https://www.brenda-enzymes.org | Features: Several different queries are available as examples; integrates CATH and SCOPe. Limitations: Requires login and there is a ‘professional’ version.
Databases (assorted) | PDB | A protein databank containing more than 179 000 macromolecular structures | https://www.rcsb.org | Features: General database for protein structures obtained through NMR, X-ray crystallography, and cryoelectron microscopy. Limitations: Contains redundant data and the search function could be better.
Table 1. (continued)
Category | Name | Description | URL | Features and limitations
Databases (assorted) | UniProt | A central repository of protein data created by combining the Swiss-Prot, TrEMBL, and PIR-PSD databases | https://www.uniprot.org | Features: De facto go-to database for general protein information. Limitations: Overwhelming amount of data for newcomers.
Databases (family information) | CATH Protein Structure Classification database | A database containing information about the evolutionary relationships of protein domains | http://cathdb.info | Features: Huge open-source database helps with automated implementation in one’s own workflows. Limitations: Drug compound information is still being developed.
Databases (family information) | Fuzzle | A database containing evolutionary information about protein fragments | https://fuzzle.uni-bayreuth.de | Features: Evolutionary information on fragments as opposed to full proteins only. Limitations: Hierarchical databases can lead to misclassification when slight differences are present in the sequences.
Databases (family information) | Pfam | A database with information about protein families, represented by multiple sequence alignments generated using hidden Markov models | http://pfam.xfam.org | Features: Constant development and integration with other European Bioinformatics Institute (EBI) tools. Limitations: There is a high-quality database and a low-quality database that can lead to errors.
Databases (family information) | SCOP | A database of proteins classified based on structural relatedness, such as superfamilies, families, and folds | https://scop.mrc-lmb.cam.ac.uk | Features: Uses data deposited in the PDB to group proteins; curated to be non-redundant. Limitations: New version is not totally backwards compatible.
Modeling | AlphaFold | Protein prediction tool using neural networks; achieved a score twice as good as the second-best protein predictor in CASP14 | https://github.com/deepmind/alphafold | Features: New gold standard for protein structure prediction; makes use of a newly developed neural network. Limitations: It is better where others were good, but still lacks good loop region predictions.
Modeling | I-TASSER | Both a web tool and a standalone tool, it predicts protein structure using a hierarchical approach; runs iteratively until the lowest-energy structures are achieved and then uses publicly available function information to identify closely related templates with the same function | https://zhanglab.dcmb.med.umich.edu/I-TASSER | Features: Good and easy-to-use web tool and a powerful standalone tool. Limitations: Lacks accuracy when few templates are available and is slower than similar options available.
Modeling | Modeller | Protein structure modeling tool that predicts structures by the satisfaction of spatial restraints obtained from a sequence alignment and shown as a probability-density function | https://salilab.org/modeller | Features: Can be installed on any platform and is very fast in creating a new model. Limitations: Speed comes at the cost of accuracy for models where slower tools yield better results.
Multiple sequence alignment (MSA) | Clustal Omega | Multiple sequence alignment web tool | https://www.ebi.ac.uk/Tools/msa/clustalo | Features: Has been constantly developed since the 1980s for MSA. Limitations: Does not yield good results when a large number of sequences is provided as input.
Multiple sequence alignment (MSA) | Multiple Alignment Using Fast Fourier Transform (MAFFT) | Multiple sequence alignment web tool; can also be used locally as a standalone tool | https://mafft.cbrc.jp/alignment/software | Features: Users can choose between various multiple alignment methods. Limitations: It requires more memory to run.
Table 1. (continued)
Category | Name | Description | URL | Features and limitations
Multiple sequence alignment (MSA) | Multiple Sequence Comparison by Log-Expectation (MUSCLE) | Multiple sequence alignment web tool; as with Clustal Omega, it is integrated in the EBI ecosystem | https://www.ebi.ac.uk/Tools/msa/muscle | Features: Can achieve better average accuracy and speed than ClustalW2 or T-Coffee. Limitations: The Kimura distance used at the second stage, although fast, does not consider which changes of amino acids occur between sequences.
Protein design | AbDesign | An algorithm for backbone design using structure- and sequence-based information | https://github.com/Fleishman-Lab/AbDesign_for_enzymes | Features: Stepwise workflow for the design of antibodies that focuses on stability and binding affinity. Limitations: Requires sufficient sequence data on homologs as well as atomic coordinates.
Protein design | FuncLib | Web tool to design and rank multiple point mutations based on evolutionary information and protein-folding stability calculations | https://funclib.weizmann.ac.il | Features: Multipoint variant design tool with an easy-to-use web server. Limitations: Works better with a pre-stabilized protein scaffold; poor knowledge of one’s system may lead to poor results.
Protein design | Loop Grafter | A web tool with a workflow to compare loop dynamics between proteins and transplant loops from one protein to another | https://loschmidt.chemi.muni.cz/loopgrafter | Features: Automated way to transfer loops from one protein to the other. Limitations: Works only with the input of both the template and the scaffold proteins.
Protein design | PROSS | A user-friendly web tool to predict protein amino acid substitutions that yield higher-stability variants | https://pross.weizmann.ac.il/step/pross-terms | Features: Automated way to stabilize proteins by inserting mutations in the original protein; the method is reliable enough that only a more limited number of designs as output is sufficient. Limitations: Requires a structure (which may not always be available) for stability calculations.
Protein design | ProtLego | A Python library for chimera design and analysis | https://hoecker-lab.github.io/protlego | Features: Automatic construction and ranking of chimeras. Limitations: The correlation between structural features and experimental success is not yet clear.
Protein design | Rosetta | A comprehensive software suite with several algorithms that can be used for the modeling and analysis of proteins | https://www.rosettacommons.org | Features: Encompasses many modules under the same umbrella name. Limitations: Not unified and developed by many people through the years; can be hard to implement and use different modules.
Protein design | SEWING | A protocol to design new tertiary protein structures by ‘sewing’ together secondary-structure building blocks | https://klab.web.unc.edu/sewing-new-proteins | Features: Continuous and discontinuous SEWING can be merged to create additional diversity. Limitations: At present, it appears to have been applied only to the construction of all-α-helix chimeras.
Tunnels and cavities | CAVER | Software tool for the analysis and visualization of tunnels and channels in protein structure | https://www.caver.cz | Features: Integration with other CaverSuite tools allows deeper analysis. Limitations: Still lacks the possibility of calculating pores.
Tunnels and cavities | POVME | Standalone tool for ligand-binding pocket calculations | https://github.com/POVME/POVME | Features: Calculates ligand-binding pockets using MD snapshots. Limitations: The lack of a GUI makes it less accessible to non-bioinformaticians.
A showcase model system for understanding the role of conformational dynamics in protein evolution and design, particularly from a computational perspective, has been β-lactamases, enzymes that are capable of hydrolyzing the lactam core of nearly all β-lactam antibiotics (including penicillin derivatives, cephalosporins, and carbapenems) and are thus major contributors to bacterial antibiotic resistance [41]. In particular, the evolution of serine β-lactamases has been studied extensively through ASR, and the resulting enzymes have been thoroughly characterized in terms of their biochemical and biophysical properties, including their structure, function, and stability [42,43]. Subsequent experimental and computational analysis suggested an important role for conformational dynamics in the evolvability of these enzymes, with a narrowing of the conformational ensemble on transitioning from the Precambrian nodes to the more specialized modern enzymes [18,20,24,43]. In addition, it has been suggested that mutations utilize dynamic allostery to confer antibiotic resistance in the modern TEM-1 β-lactamase [44], which is one of the most commonly encountered β-lactamases in Gram-negative bacteria [45]. Interestingly, despite large differences in both sequence and scaffold flexibility, the overall tertiary structure remains largely unchanged over the course of these enzymes’ evolution [46]. Excess positional mutual information analysis and molecular dynamics (MD) simulations have been used to predict allosteric mutations that affect β-lactamase drug resistance [47], and Markov state models have been used to identify hidden conformational states that are important for conferring antibiotic resistance [48], which can allow for the prediction of potential resistance to new compounds and aid drug discovery efforts. Finally, the conformational flexibility at the Precambrian nodes could be exploited to insert a de novo active site capable of catalyzing Kemp elimination [46], which could be further optimized [49] using ultralow-throughput computational screening (using FuncLib [50]) to reach catalytic activities comparable with those of modern enzymes towards their natural substrates (Figure 2) [51].
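To indicate how a Markov state model can reveal a rarely populated state, the toy sketch below estimates a transition matrix from a discretized trajectory and derives the equilibrium population of each state. This is an illustrative simplification, not the protocol used in the studies cited above: the trajectory is invented, the state labels are hypothetical, and a real workflow would additionally involve featurization, clustering, and lag-time validation using dedicated software.

```python
def transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts at the given lag time.

    Assumes every state is visited and left at least once.
    """
    counts = [[0.0] * n_states for _ in range(n_states)]
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1.0
    return [[c / sum(row) for c in row] for row in counts]

def stationary_distribution(matrix, n_iter=10000, tol=1e-12):
    """Equilibrium populations obtained by repeated propagation (power iteration)."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Invented discretized trajectory: 0 = dominant state, 1 = intermediate,
# 2 = rarely visited ('hidden') state that is only reached via state 1.
dtraj = [0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 1, 0, 0, 0,
         0, 1, 0, 0, 0, 0, 0, 1, 2, 2, 1, 0, 0, 0]

T = transition_matrix(dtraj, n_states=3)
pi = stationary_distribution(T)
print([round(p, 3) for p in pi])
```

In such a model, a state that is almost invisible in any single structure still appears with a well-defined equilibrium population and defined routes of access, which is the kind of information that can be related to resistance phenotypes.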
Following from this, Kemp elimination is a model reaction for base-catalyzed proton abstraction from carbon and is one of the commonly targeted reactions in artificial enzyme design [52]. In related recent work [53], Chica and coworkers used molecular simulations to recapitulate the effect of the directed evolution of the most proficient Kemp eliminase designed to date, HG3.17 [54]. They then exploited changes in conformational dynamics during the evolutionary trajectory leading to HG3.17 from a de novo design, HG3 [55], to engineer a biocatalyst, HG4, with a catalytic efficiency close to that of HG3.17 but using only key first- and second-shell substitutions picked
Table 1. (continued)
Category | Name | Description | URL | Features and limitations
Similarity search | BLAST | A tool to calculate statistical significance between biological sequences | https://blast.ncbi.nlm.nih.gov/Blast.cgi | Features: One of the most-used tools for local alignment search; fast and easy to use. Limitations: Developed in 1990 and has not changed substantially since then.
Similarity search | FASTA | A web tool to provide a heuristic local search with a protein query | https://www.ebi.ac.uk/Tools/sss/fasta/ | Features: First tool in the field; as with any other tool from the EBI, it is integrated with many other tools and analyses; newer tools exist that have evolved from this. Limitations: As opposed to BLAST, it does not remove low-complexity regions.
Structure and trajectory data | Bio3D | R package for the analysis of protein structure and trajectory data; provides a variety of approaches for conformational analysis of a protein | http://thegrantlab.org/bio3d | Features: Comprehensive tool with many tutorials and easy installation. Limitations: Lack of GUI and a webserver makes for a steep learning curve for non-bioinformaticians.
^a Note that this list is based on a constantly expanding toolkit, and therefore it is impossible to be exhaustive.
up during the evolutionary trajectory based on a template obtained using ensemble-based
design.
Another group that has invested significant effort in understanding the role of conformational dy-
namics and distal mutations in enzyme catalysis and evolution is that of Osuna and coworkers.
This includes, in 2017, the development of a new approach, the ‘Shortest Path Map’(SPM) [56].
This approach can be applied to MD simulations to identify catalytically important communication
pathways in proteins as well as the pairs of residues with the greatest contributions to those
pathways. It has so far been successfully applied to understand the evolution of catalysis and/or
to engineer the activity of retro-aldolases [56], tryptophan synthase [33,57],
monoamine oxidase [58], and, most recently, cytochrome P450 monooxygenase [59]. In this
latter case, the authors were able to identify two mutations that initially affected only activity, with-
out a corresponding population shift of conformations, and thus did not lead to a selectivity
change. However, a third mutation identified using the SPM approach was able to modify the
conformational ensemble and change both the activity and the selectivity. Further examples of
approaches that can be used to predict distal mutations are discussed in Box 1.
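The idea behind such a path-based analysis can be illustrated with a minimal Python sketch. It is not the published SPM implementation: the use of -log|correlation| as an edge length, the 6 Å contact cutoff, the function name `shortest_path_map`, and the toy four-residue matrices are all assumptions made here for illustration. Residues are treated as nodes of a graph, spatially close pairs are connected, and each edge is scored by how many shortest paths pass through it.

```python
import heapq
import math
from collections import Counter


def shortest_path_map(corr, dist, cutoff=6.0):
    """Count how often each residue-residue edge lies on a shortest path.

    corr[i][j]: correlation of residue motions from MD (values in [-1, 1]).
    dist[i][j]: mean C-alpha distance in angstroms.
    The -log|corr| edge length and the cutoff are illustrative assumptions.
    """
    n = len(corr)
    # Build the graph: only spatially close residue pairs are connected.
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            c = abs(corr[i][j])
            if dist[i][j] <= cutoff and c > 0.0:
                w = -math.log(c)  # strongly correlated pairs are "short"
                graph[i][j] = w
                graph[j][i] = w

    usage = Counter()
    for source in range(n):
        # Dijkstra from this source, remembering each node's predecessor.
        best = {source: 0.0}
        prev = {}
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > best.get(u, math.inf):
                continue  # stale heap entry
            for v, w in graph[u].items():
                nd = d + w
                if nd < best.get(v, math.inf):
                    best[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
        # Walk back each shortest path once (source < target) and count edges.
        for target in range(source + 1, n):
            node = target
            while node in prev:
                usage[tuple(sorted((node, prev[node])))] += 1
                node = prev[node]
    return usage


# Toy input for a four-residue chain (made-up numbers).
corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.9, 0.2],
    [0.2, 0.9, 1.0, 0.9],
    [0.1, 0.2, 0.9, 1.0],
]
dist = [
    [0.0, 3.8, 5.5, 9.0],
    [3.8, 0.0, 3.8, 5.5],
    [5.5, 3.8, 0.0, 3.8],
    [9.0, 5.5, 3.8, 0.0],
]
usage = shortest_path_map(corr, dist)
print(usage.most_common(1))  # [((1, 2), 4)]
```

In this toy case the central edge between residues 1 and 2 carries the most shortest paths, which is the kind of signal that would flag a residue pair as a candidate contributor to a communication pathway.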
Following from this, there is increasing evidence that the conformational dynamics of flexible
loops [17,60] and tunnels [61] can be exploited for protein design. For example, in 2019
Figure 1. A comparison of representative structures extracted from 10 × 500-ns molecular dynamics (MD)
simulations of an ancestral glycosidase both (A) in the absence and (B) in the presence of heme, as well as
(C) its corresponding extant counterpart (in this case a glycosidase from Halothermothrix orenii). Also shown
here are (D) the absolute and (E) the relative (heme-bound vs non-heme-bound ancestral glycosidase) root mean square
fluctuations (RMSFs) (Å) of the backbone Cα-atoms of each relevant system. It can be seen from these simulations that
there are clear differences in flexibility in the region spanning residues 227–334, when comparing both the ancestral and
extant proteins and the ancestral glycosidase with and without heme bound. This region corresponds to missing residues
in the electron density of the ancestral protein, an effect that is particularly pronounced in the absence of heme. The
corresponding rigidification of the ancestral protein by bound heme was shown, in turn, to lead to differences in catalytic
activity. This figure was originally published in [36]. Copyright 2018, Springer Nature. Published under a CC-BY license
(http://creativecommons.org/licenses/by/4.0).
Damborsky and collaborators published a study where ASR was used to revive a common an-
cestor of the Renilla luciferase and haloalkane dehalogenases [62]. The resurrected ancestor
showed higher thermostability than both of the extant enzymes (which is a common outcome
of ASR, as discussed in the prior section), with the conformations visited during similar-length
MD simulations showing slower dynamics in the ancestor than in the modern enzymes. The
monooxygenase activity was only residual, and the hydrolase activity was mostly lower in the
ancestor than in their modern enzyme of choice. A recent continuation of this study [63]
hypothesized that a helix at the tunnel mouth was controlling the rate of the monooxygenase ac-
tivity. With an insertion variant of the ancestor and with a fragment grafted from the modern Renilla
luciferase into the ancestor, it was possible to engineer the conformational dynamics of this lucif-
erase to obtain both lower product inhibition and highly stable bioluminescence [63], resulting in a
more proficient reporter system for gene expression.
Enzyme engineering from protein LEGO®
While the repurposing of either extant or reconstructed ancestral enzymes for new catalytic func-
tions has been shown to be tremendously powerful as a starting point for protein design,
repurposed enzymes could in principle also carry unnecessary (or even potentially undesirable)
features for the desired function, which are just leftover traces of their evolutionary path. There-
fore, substantial effort has also been invested in the design of new protein scaffolds, completely
de novo, with a focus on obtaining tailored physicochemical properties. The past decades
witnessed tremendous progress in the design of novel enzymes completely de novo by grafting
minimal active sites onto pre-existing scaffolds [64]. However, the activities of the resulting con-
structs were typically low, with significant gains obtained only through subsequent directed
Figure 2. Enhancing the de novo Kemp eliminase activity of a Precambrian β-lactamase (GNCA4-WT) [46]
using FuncLib [50]. (A) Biochemical characterization of the activity of the top-20-ranked variants obtained from FuncLib
shows four variants with significant improvements in activity compared with the wild-type enzyme. The most efficient of
these, GNCA4-12, has a turnover number (kcat) of ~10² s⁻¹ and a catalytic efficiency (kcat/KM) of ~2 × 10⁴ M⁻¹ s⁻¹ [49],
which is comparable with the catalytic efficiency of the average modern enzyme towards its native substrate [51]. (B) An
overlay of the crystal structures of the GNCA4-WT (tan) and GNCA-2 (light blue) β-lactamases, in complex with the
transition-state analog 6-nitrobenzotriazole, shows the overall scaffolds to be essentially superimposable, with only minor
differences in side-chain positioning at the de novo active site. Simulations of the reaction catalyzed by these enzymes
indicate that enhancements in catalysis can be directly linked to improved positioning of the reacting fragments for optimal
proton transfer at the Michaelis complex. As the catalytic D229 side chain is placed on a flexible loop, this positioning is
likely to be further improved by rigidifying the flexible loop. Adapted from [49]. Copyright 2020, Royal Society of Chemistry.
evolution [6]. While it is unclear whether de novo designer scaffolds will necessarily provide more
efficient enzymes, they have great potential to vastly expand the scope of new catalytic
functionalities that a scaffold can accommodate. This makes de novo protein design once again an
exciting area, particularly as the field has matured significantly [65–67], with significant
contributions also likely to come from recent developments in protein structure prediction, such as
AlphaFold [68] and a three-track neural network approach [69].
Various approaches have been used to try to design new protein scaffolds [65–67], one of which
is modular design to assemble new protein folds [70]. This approach is attractive because it effec-
tively allows new proteins to be assembled as if they are made of molecular LEGO® building
blocks. Modular design is relevant from an evolutionary perspective, as there is increasing
evidence that natural proteins evolve from fragments or motifs, such as short propeller-like motifs
that have evolved into modern β-barrels [71], an omnipresent Rossmann ribose-binding motif
that hints at common ancestry among Rossmann-fold enzymes that use ribose-based cofactors
[72], and a β-α-β fragment that appears to be the minimal functional motif leading to modern
P-loop NTPase and Rossmann enzymes [73], among other examples [74–76]. Therefore, modular
design mimics the repeating themes that lead to the natural evolution of functional protein
scaffolds.
Significant contributions to this area were made in early studies by Höcker, Sterner, and col-
leagues [76–80], who helped to elucidate the modular nature of protein evolution and develop
the concept that even TIM-barrel folds could be amenable to structure-based recombination.
As a more recent example, in 2016 Kuhlman and coworkers developed a design strategy they
called SEWING [81], as this approach effectively ‘sews’ together connected or disconnected
pieces of existing proteins to create novel scaffolds. This allowed them to design highly stable
α-helical proteins, showcasing the potential of the modular design approach. It was suggested
that this approach can be used as a means with which to rapidly generate a wide variety of
designable protein scaffolds. Related to this, Fleishman and coworkers have presented a novel
approach, AbDesign [82], which assembles new backbone architectures by combining naturally
occurring modular fragments using the combinatorial backbone-conformational space of many
Box 1. Computational tools to predict distal mutations
Enzyme activity can be engineered through substitutions made either to active-site residues or to distal residues. In the first
case, such substitutions typically change the shape of the binding pocket and/or modulate key interactions with the sub-
strate, whereas in the latter case distal mutations affect allosteric regulation of the enzyme and/or stabilize alternative active
site configurations [16,56,93]. Although it is nontrivial, there is currently rapidly increasing interest in the prediction of distal
mutations that can modulate enzyme activity [17,40]. There are a number of computational approaches that have shown
promise as tools to reliably predict the allosteric effect of distal mutations. We focus here on three of them: (i) the Dynamic
Flexibility Index (DFI) [94]; (ii) the SPM approach [56]; and (iii) Ohm [95]. While all three approaches aim to predict long-range
allosteric interaction networks, they use different sources of information to achieve this.
The DFI approach uses elastic network models to roughly sample the different conformations of a protein [94]. Recently,
this approach has used structural information on not only one protein but also proteins in the same phylogenetic branch, to
compare their dynamical properties and assess which positions affect the function of the protein when substitutions occur
[18]. The SPM approach [56] uses long-timescale MD simulations to sample as much of the conformational ensemble as
possible and then uses that information to predict mutations that will have an allosteric effect on the protein’s function. It
was developed to study the role of dynamics in the evolution of enzymes. Finally, the Ohm web tool is the most recent ap-
proach of the three, and requires less knowledge of the system and of computational tools for protein engineering [95].
This approach identifies allosterically important residues that are distal to the active site based on the position of the active
site, the allosteric pathway connecting the position to the active site, and the critical residues between the active site and
the allosteric site. Taken together, these examples showcase rapid progress in user-friendly computational tools that can
be used to predict distal mutations that can alter the conformational landscape of an enzyme and thus regulate activity.
protein families. This approach allows for the generation of a repertoire of scaffolds that exceeds
the diversity of the entire Protein Data Bank (PDB) [83] for even a modestly sized family of homo-
logs, giving the user significant control over the design process.
These provide just a few of many examples of the successful application of modular assembly
to artificial protein design (for a summary of selected methods, see Table 1). However, nature
has also harnessed billions of years of evolution to generate a library of building
blocks that can be identified recurrently across the protein universe. These are most commonly
classified in terms of ‘domains’, typically comprising ~50–250 amino acids that represent an
independent folding unit [84]. However, a number of studies have suggested that duplication
and recombination of segments shorter than domains could have been the origin of the do-
mains themselves and that these are therefore the ultimate building blocks [71,75,81,85,86].
As has been argued, for instance by Kolodny and coworkers [75], the proteins in the ancient
universe were presumably shorter, but over extended periods of evolutionary time their
sequences were afforded many ‘copy–paste’ opportunities, leading to the scaffolds we
observe today.
In parallel with these developments, there are now increasing efforts in the construction of a
catalog of locally similar segments across globally different domains [prior efforts, e.g., the
Structural Classification of Proteins (SCOP) [87], have focused on cataloguing full domains].
As an example, Lupas and coworkers have constructed a vocabulary of ancient peptides that have
led to modern folded proteins, borrowing from the strategies linguists use to compare modern
languages and reconstruct ancient vocabularies [88]. Following from this, Kolodny and coworkers
have presented a global view of the protein universe based on protein networks that connect
domains that share fragments [89]. More recently, they have introduced the concept of ‘themes’:
recurrent short protein segments of at least 35 amino acids that are unexpectedly found in
domains of independent evolutionary origin [75]. Along similar lines, Höcker and coworkers have
developed the ‘Fuzzle’ database [86], which
was obtained by clustering the SCOPe95 database by sequence identity and performing a
sequence- and structure-based comparison of all domains. This allowed the authors to identify
1337 fragments with lengths ranging from 11 to 229 amino acids that populate a wide diversity
of folds (519/1221 folds in the SCOP/Fuzzle database). A subsequent Python package,
ProtLego, uses the Fuzzle database to obtain evolutionarily conserved fragments for the auto-
mated and high-throughput in silico design of chimeric proteins [90]. Taken together, these
approaches, some of which are summarized in Figure 3, pave the way for the construction of
a catalog of fragments that can be characterized by specific biophysical features, or even func-
tional roles, such as metal binding, nucleotide binding, or nucleic acid binding, on the basis of
known structures [75,88]. This, in turn, provides an extremely valuable resource for the facile
design of custom-made proteins.
While modular design has frequently focused on the generation of novel stably folded proteins,
there is also increasing interest in using this approach to generate functional enzymes. For exam-
ple, in the case of AbDesign [82], Fleishman and coworkers were successfully able to use this
approach for the development of stable and highly active new enzyme scaffolds. Specifically,
they applied this approach to two TIM-barrel enzyme families with very different sequences,
active-site structures, and catalytic activities (GH10 and PLLs) and were able to obtain
enzymes with catalytic and stability profiles similar to those of the natural enzymes for the
GH10 designs, while for the PLLs some designs exhibited higher catalytic efficiencies than
the natural enzymes, as well as broad substrate promiscuity. This was achieved by first
segmenting all homologous structures within the protein family of interest (e.g., all GH10
xylanases) into modular parts, then performing an ‘idealization’ procedure that forces ideal
bonds and relaxes the associated torsion angles, computing backbone conformational data-
bases based on this information as well as enforcing position-specific sequence constraints
based on the results of multiple sequence alignment, performing a precomputation step involv-
ing the design and ranking of thousands of unique backbones, assembling the resulting back-
bones, and, finally, performing stability optimization using the Protein Repair One Stop Shop
(PROSS) [91] to generate a seamless structure. It is perhaps unsurprising that such a strategy
would lead to highly efficient catalysts, as this appears to be the strategy also taken by nature
itself in designing new proteins [74,75,89,92]. However, it opens a new (and highly promising)
door for evolutionary-based computational protein design.
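The combinatorial character of the segmentation and assembly steps described above can be made concrete with a toy enumeration in Python. This is only a sketch under stated assumptions, not the AbDesign pipeline: the segment labels, homolog names, and the functions `count_assemblies` and `enumerate_assemblies` are hypothetical, and the idealization, sequence-constraint, ranking, and PROSS stabilization steps are deliberately left out.

```python
import itertools
import math

# Hypothetical toy family: for each segment position, the homologs that can
# donate a backbone fragment (all names are made up for illustration).
segments = {
    "seg1": ["homA", "homB", "homC"],
    "seg2": ["homA", "homB"],
    "seg3": ["homA", "homC", "homD"],
}


def count_assemblies(segments):
    """Size of the combinatorial backbone space: product of fragment counts."""
    return math.prod(len(donors) for donors in segments.values())


def enumerate_assemblies(segments):
    """Yield every assembly as a mapping from segment position to donor."""
    names = list(segments)
    for combo in itertools.product(*(segments[name] for name in names)):
        yield dict(zip(names, combo))


total = count_assemblies(segments)
# An assembly is chimeric if its fragments come from more than one homolog.
chimeras = [a for a in enumerate_assemblies(segments) if len(set(a.values())) > 1]
print(total, len(chimeras))  # 18 17
```

Even this small example yields 17 chimeric backbones from four homologs. Because the count is a product over segment positions, it grows multiplicatively with the number of segments and homologs, which is why a precomputation and ranking step becomes necessary for real protein families.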
Figure 3. Schematic classification of current subdomain approaches. This figure shows the process of generation of
the different subdomain databases discussed in the main text. ‘Fragments’ [88] used the SCOPe30 database as input and
clustered by sequence similarity, yielding 40 clusters of fragments of about 35 amino acids in length. ‘Themes’ [75] used
Evolutionary Classification of Protein Domains (ECOD) [96] and the Protein Data Bank (PDB) [83] as input, clustered by
sequence similarity, and performed Cα-RMSD calculations (3.5-Å threshold), obtaining 2195 curated hits of at least 35
amino acids. Finally, the ‘Fuzzle’ [86] data set is based on clustering of the SCOPe95 database by sequence identity
combined with sequence and structure alignments to yield 1337 fragments with lengths ranging from 11 to 229 amino
acids. This figure was adapted from [92], which was published under a CC-BY license. Copyright 2021, Elsevier.
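As an illustration of the Cα-RMSD criterion mentioned in the caption, the following Python sketch computes the RMSD between two equal-length Cα traces after optimal superposition and compares it with the 3.5-Å threshold. The caption does not state how the structures were superposed, so the use of the Kabsch algorithm is an assumption, as are the function names and the idealized 35-residue helix used as test input.

```python
import numpy as np


def ca_rmsd(P, Q):
    """C-alpha RMSD (angstroms) of two (N, 3) traces after Kabsch superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)  # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc  # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def within_threshold(P, Q, threshold=3.5):
    """True if two segments are structurally similar under the given cutoff."""
    return ca_rmsd(P, Q) <= threshold


def ideal_helix(n, rise=1.5):
    """Idealized helical C-alpha trace: 2.3 A radius, 100 degrees per residue."""
    i = np.arange(n)
    ang = np.deg2rad(100.0) * i
    return np.column_stack([2.3 * np.cos(ang), 2.3 * np.sin(ang), rise * i])


helix = ideal_helix(35)
t = np.deg2rad(40.0)
Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t), np.cos(t), 0.0],
               [0.0, 0.0, 1.0]])
moved = helix @ Rz.T + np.array([10.0, -5.0, 3.0])  # same shape, new pose
stretched = ideal_helix(35, rise=3.0)  # different shape

print(within_threshold(helix, moved))      # True
print(within_threshold(helix, stretched))  # False
```

A rigidly rotated and translated copy superposes essentially perfectly, whereas a segment of the same length but different geometry exceeds the cutoff.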
Concluding remarks
The past decade has witnessed an explosion of interest in both computational enzyme de-
sign and, in parallel, understanding the molecular evolution of enzyme function. These two
developments are interlinked, as greater insight into how enzymes naturally emerge,
evolve, and gain new functions can be incorporated in turn to guide the design process:
to borrow from Tawfik, ‘protein engineers turned evolutionists’ [13]. While evolutionary
insight is being used extensively to guide experimental protein design studies, it has only
more recently also been incorporated into computational enzyme design, driven in part
by increases in computational power that allow us to now model enzyme evolutionary
trajectories in atomic detail and pinpoint the physicochemical parameters that shape the
emergence of new functions.
This review has focused on three key directions being taken in the field: (i) the use of evolution-
ary information, gained through ancestral inference by bioinformatics analysis, to identify scaf-
folds that can be used as starting seeds for protein engineering; (ii) increased appreciation of
the role of conformational dynamics and its importance in enzyme design; and (iii) renewed ap-
preciation of and impetus towards modular design, following the modular nature of the natural
emergence of new protein scaffolds. These three areas, particularly the first two, are indepen-
dently evolving subfields but also highly interlinked. In addition, as these are all fast-growing
areas, the list of studies presented here is by no means exhaustive, but rather is meant to
showcase examples of success in the field. Finally, as this is a fast-growing field, a number of
outstanding questions remain to be addressed, some of which are highlighted in
the Outstanding questions.
The use of computational models to efficiently design new proteins has long been a major goal of
computational biologists and chemists. Past decades have seen substantial progress in this area,
through both the development of new methodologies and their successful application in protein
design. We are at an exciting moment in the field as these investments are starting to bear fruit,
and theory is becoming an indispensable component of the design process. As we demonstrate
here, ‘turning evolutionist’ is also important from a computational perspective, as learning from
nature will allow us to circumvent significant remaining challenges in the field. Therefore, in
time, there is a likelihood that we will all become protein engineers turned evolutionists, exploiting
evolutionary insight to guide the design process.
Acknowledgments
This work was funded by the Knut and Alice Wallenberg Foundation (Wallenberg Academy Fellowship and Wallenberg Scholar
Fellowship to S.C.L.K., grants KAW2018.0140 and 2019.0431), the Human Frontier Science Program (grant RGP0041/2017),
and the Swedish Research Council (grant 2019-03499). This project has received funding from the European Union’s Horizon
2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement no. 890562 to M.C.
Declaration of interests
No interests are declared.
References
1. Hellinga, H.W. and Richards, F.M. (1991) Construction of new li-
gand binding sites in proteins of known structure: I. Computer-
aided modeling of sites with pre-defined geometry. J. Mol.
Biol. 222, 763–785
2. Dahiyat, B.I. and Mayo, S.L. (1996) Protein design automation.
Protein Sci. 5, 895–903
3. Voigt, C.A. et al. (2001) Computational method to reduce the
search space for directed protein evolution. Proc. Natl. Acad.
Sci. U. S. A. 98, 3778–3783
4. Looger, L.L. et al. (2003) Computational design of receptor and
sensor proteins with novel functions. Nature 423, 185–190
5. Currin, A. et al. (2015) Synthetic biology for the directed evolu-
tion of protein biocatalysts: navigating sequence space intelli-
gently. Chem. Soc. Rev. 44, 1172–1239
6. Kries, H. et al. (2013) De novo enzymes by computational de-
sign. Curr. Opin. Chem. Biol. 17, 221–228
7. Lutz, S. and Iamurri, S.M. (2018) Protein engineering: past,
present and future. Methods Mol. Biol. 1685, 1–12
Outstanding questions
Thermostability is a desirable property when designing new enzymes, and reconstructed proteins
obtained from ancestral sequence reconstruction (ASR) tend to be more thermostable. Is this real
or artifactual from the reconstruction? Resolving this debate would be valuable in understanding
the biophysical properties of ancestral scaffolds.
Despite significant improvements in methodologies, predicting distal mutations that are
catalytically relevant remains challenging as the impact of these mutations on the catalytic rate
is often a sum of multiple small effects. How can we improve our predictions?
There is now an increasing number of user-friendly webservers for the prediction of variants with
improved catalytic activities, as well as structural bioinformatics tools that can characterize
protein flexibility without the need for extensive and computationally intensive enhanced sampling
simulations. How far can the characterization of conformational properties and allosteric
interactions be streamlined and simplified to, for instance, a webserver that can predict hotspots
based on conformational properties in a physically meaningful way? To what extent can machine
learning contribute?
Despite ever-increasing knowledge about the evolution of protein folds, as well as the availability
of huge catalogs of segments, the de novo design of highly efficient enzymes remains challenging.
Part of this is due to inaccuracies in the predictions of the positions of the side chains of key
residues involved in substrate binding, transition-state stabilization, or substrate release, each
of which can have a critical impact on the resulting activity. To what extent can the cataloging
of segments and domains reduce current inaccuracies in scaffold design?
8. Drienovská, I. and Roelfes, G. (2020) Expanding the enzyme universe
with genetically encoded unnatural amino acids. Nat.
Catal. 3, 193–202
9. Marques, S.M. et al. (2021) Web-based tools for computational
enzyme design. Curr. Opin. Struct. Biol. 69, 19–34
10. Xu, Y. et al. (2020) Deep dive into machine learning models for
protein engineering. J. Chem. Inf. Model. 60, 2773–2790
11. Chou, H.-H. et al. (2011) Diminishing returns epistasis among
beneficial mutations decelerates adaptation. Science 332,
1190–1192
12. Tokuriki, N. et al. (2012) Diminishing returns and tradeoffs con-
strain the laboratory optimization of an enzyme. Nat. Commun.
3, 1257
13. Trudeau, D.L. and Tawfik, D.S. (2019) Protein engineers turned
evolutionists – the quest for the optimal starting point. Curr.
Opin. Biotechnol. 60, 46–52
14. James, L.C. and Tawfik, D.S. (2003) Conformational diversity
and protein evolution – a 60-year-old hypothesis revisited.
Trends Biochem. Sci. 28, 361–368
15. Maria-Solano, M.A. et al. (2018) Role of conformational dynam-
ics in the evolution of novel enzyme function. Chem. Commun.
54, 6622–6634
16. Campbell, E.C. et al. (2018) Laboratory evolution of protein con-
formational dynamics. Curr. Opin. Struct. Biol. 50, 49–57
17. Crean, R.M. et al. (2020) Harnessing conformational plasticity to
generate designer enzymes. J. Am. Chem. Soc. 142, 11324–11342
18. Campitelli, P. et al. (2020) The role of conformational dynamics
and allostery in modulating protein evolution. Annu. Rev.
Biophys. 49, 267–288
19. Romero-Romero, M.L. et al. (2016) Engineering ancestral pro-
tein hyperstability. Biochem. J. 473, 3611–3620
20. Zou, T. et al. (2015) Evolution of conformational dynamics deter-
mines the conversion of a promiscuous generalist into a special-
ist enzyme. Mol. Biol. Evol. 32, 132–143
21. Trudeau, D.L. et al. (2016) On the potential origins of the high
stability of reconstructed ancestral proteins. Mol. Biol. Evol. 33,
2633–2641
22. Spence, M.A. et al. (2021) Ancestral sequence reconstruction for
protein engineers. Curr. Opin. Struct. Biol. 69, 131–141
23. Selberg, A.G.A. et al. (2021) Ancestral sequence reconstruction:
from chemical paleogenetics to maximum likelihood algorithms
and beyond. J. Mol. Evol. 89, 157–164
24. Randall, R.N. et al. (2016) An experimental phylogeny to benchmark
ancestral sequence reconstruction. Nat. Commun. 7, 12847
25. Bar-Rogovsky, H. et al. (2015) Assessing the prediction fidelity of
ancestral reconstruction by a library approach. Protein Eng. Des.
Sel. 28, 507–518
26. Eick, G.N. et al. (2017) Robustness of reconstructed ancestral
protein functions to statistical uncertainty. Mol. Biol. Evol. 34,
247–261
27. Copley, S.D. (2021) Setting the stage for evolution of a new
enzyme. Curr. Opin. Struct. Biol. 69, 41–49
28. Thomas, A. et al. (2019) Highly thermostable carboxylic acid re-
ductases generated by ancestral sequence reconstruction.
Commun. Biol. 2, 429
29. Wheeler, L.C. et al. (2016) The thermostability and specificity of
ancient proteins. Curr. Opin. Struct. Biol. 38, 37–43
30. Hillis, D.M. et al. (1992) Experimental phylogenetics: generation
of a known phylogeny. Science 255, 589–592
31. Gardner, J.M. et al. (2020) Manipulating conformational dynam-
ics to repurpose ancient proteins for modern catalytic functions.
ACS Catal. 10, 4863–4870
32. Schupfner, M. et al. (2020) Analysis of allosteric communication
in a multienzyme complex by ancestral sequence reconstruction.
Proc. Natl. Acad. Sci. U. S. A. 117, 346–354
33. Maria-Solano, M.A. et al. (2021) Rational prediction of distal ac-
tivity enhancing mutations in tryptophan synthase. ChemRxiv
Published online March 4, 2021. https://doi.org/10.26434/
chemrxiv.14151989.v1
34. Gomez-Fernandez, B.J. et al. (2020) Consensus design of an
evolved high-redox potential laccase. Front. Bioeng. Biotechnol.
8, 354
35. Zamora, R.A. et al. (2020) Tuning of conformational dynamics
through evolution-based design modulates the catalytic adapt-
ability of an extremophile kinase. ACS Catal. 10, 10847–10857
36. Gamiz-Arco, G. et al. (2021) Heme-binding enables allosteric
modulation in an ancient TIM-barrel glycosidase. Nat. Commun.
12, 380
37. Tokuriki, N. and Tawfik, D.S. (2009) Protein dynamism and
evolvability. Science 324, 203–207
38. Babtie, A.C. et al. (2009) Efficient catalytic promiscuity for chem-
ically distinct reactions. Angew. Chem. Int. Ed. Engl. 48,
3692–3694
39. Bigley, A.N. and Raushel, F.M. (2013) Catalytic mechanisms for
phosphotriesterases. Biochim. Biophys. Acta 1834, 443–453
40. Osuna, S. (2020) The challenge of predicting distal active site
mutations in computational enzyme design. WIREs Comput.
Mol. Sci. 11, e1502
41. Worthington, R.J. and Melander, C. (2013) Overcoming resis-
tance to β-lactam antibiotics. J. Org. Chem. 78, 4207–4213
42. Hall, B.G. and Barlow, M. (2003) Structure-based phylogenies of
the serine beta-lactamases. J. Mol. Evol. 57, 255–260
43. Risso, V.A. et al. (2013) Hyperstability and substrate promiscuity
in laboratory resurrections of Precambrian β-lactamases. J. Am.
Chem. Soc. 135, 2899–2902
44. Modi, T. and Banu Ozkan, S. (2018) Mutations utilize dynamic allostery
to confer resistance in TEM-1 β-lactamase. Int. J. Mol.
Sci. 19, 3808
45. Shah, A.A. et al. (2004) Characteristics, epidemiology and clinical
importance of emerging strains of Gram-negative bacilli produc-
ing extended-spectrum beta-lactamases. Res. Microbiol. 155,
409–421
46. Risso, V.A. et al. (2017) De novo active sites for resurrected Pre-
cambrian enzymes. Nat. Commun. 8, 16113
47. Cortina, G.A. and Kasson, P.M. (2016) Excess positional mutual
information predicts both local and allosteric mutations affecting
beta lactamase drug resistance. Bioinformatics 32, 3420–3427
48. Hart, K.M. et al. (2016) Modelling proteins’ hidden conformations
to predict antibiotic resistance. Nat. Commun. 7, 12965
49. Risso, V.A. et al. (2020) Enhancing a de novo enzyme activity by
computationally-focused ultra-low-throughput screening. Chem.
Sci. 11, 6134–6148
50. Khersonsky, O. et al. (2018) Automated design of efficient and
functionally diverse enzyme repertoires. Mol. Cell 72, 178–186.e5
51. Bar-Even, A. et al. (2011) The moderately efficient enzyme: evo-
lutionary and physicochemical trends shaping enzyme parame-
ters. Biochemistry 50, 4402–4410
52. Korendovych, I.V. and DeGrado, W.F. (2014) Catalytic efficiency
of designed catalytic proteins. Curr. Opin. Struct. Biol. 27,
113–121
53. Broom, A. et al. (2020) Ensemble-based enzyme design can re-
capitulate the effects of laboratory directed evolution in silico.
Nat. Commun. 11, 4808
54. Blomberg, R. et al. (2013) Precision is essential for efficient catal-
ysis in an evolved Kemp eliminase. Nature 503, 418–421
55. Privett, H.K. et al. (2012) Iterative approach to computational en-
zyme design. Proc. Natl. Acad. Sci. U. S. A. 109, 3790–3795
56. Romero-Rivera, A. et al. (2017) Role of conformational dynamics in
the evolution of retro-aldolase activity. ACS Catal. 7, 8524–8532
57. Maria-Solano, M.A. et al. (2019) Deciphering the allosterically
driven conformational ensemble in tryptophan synthase. J. Am.
Chem. Soc. 141, 13049–13056
58. Curado-Carballada, C. et al. (2019) Hidden conformations in As-
pergillus niger monoamine oxidase are key for catalytic effi-
ciency. Angew. Chem. Int. Ed. Engl. 58, 3097–3101
59. Acevedo-Rocha, C.G. et al. (2021) Pervasive cooperative muta-
tional effects on multiple catalytic enzyme traits emerge via long-
range conformational dynamics. Nat. Commun. 12, 1621
60. Nestl, B.M. and Hauer, B. (2014) Engineering of flexible loops in
enzymes. ACS Catal. 4, 3201–3211
61. Kokkonen, P. et al. (2019) Engineering enzyme access tunnels.
Biotechnol. Adv. 37, 107386
62. Chaloupkova, R. et al. (2019) Light-emitting dehalogenases: re-
construction of multifunctional biocatalysts. ACS Catal. 9,
4810–4823
63. Schenkmayerova, A. et al. (2021) Engineering the protein dy-
namics of an ancestral luciferase. Nat. Commun. 12, 3616
64. Zanghellini, A. (2014) De novo computational enzyme design.
Curr. Opin. Biotechnol. 29, 132–138
65. Dawson, W.M. et al. (2019) Towards functional de novo de-
signed proteins. Curr. Opin. Chem. Biol. 52, 102–111
66. Korendovych, I.V. and DeGrado, W.F. (2020) De novo protein
design, a retrospective. Q. Rev. Biophys. 53, e3
67. Pan, X. and Kortemme, T. (2021) Recent advances in de novo
protein design: principles, methods, and applications. J. Biol.
Chem. 296, 100558
68. Jumper, J. et al. (2021) Highly accurate protein structure predic-
tion with AlphaFold. Nature 596, 583–589
69. Baek, M. et al. (2021) Accurate prediction of protein structures
and interactions using a three-track neural network. Science
373, 871–876
70. Lutz, S. and Benkovic, S.J. (2000) Homology-independent
protein engineering. Curr. Opin. Biotechnol. 11, 319–324
71. Smock, R.G. et al. (2016) De novo evolutionary emergence of a
symmetrical protein is shaped by folding constraints. Cell 164,
476–486
72. Laurino, P. et al. (2016) An ancient fingerprint indicates the
common ancestry of Rossmann-fold enzymes utilizing different
ribose-based cofactors. PLoS Biol. 14, e1002396
73. Longo, L.M. et al. (2020) On the emergence of P-Loop NTPase
and Rossmann enzymes from a beta-alpha-beta ancestral
fragment. eLife 9, e64415
74. Kolodny, R. (2021) Searching protein space for ancient sub-
domain segments. Curr. Opin. Struct. Biol. 68, 105–112
75. Kolodny, R. et al. (2021) Bridging themes: short protein
segments found in different architectures. Mol. Biol. Evol. 38,
2191–2208
76. Höcker, B. et al. (2002) A common evolutionary origin of two
elementary enzyme folds. FEBS Lett. 510, 133–135
77. Höcker, B. et al. (2001) Dissection of a (βα)8-barrel enzyme into
two folded halves. Nat. Struct. Biol. 8, 32–36
78. Höcker, B. et al. (2004) Mimicking enzyme evolution by generating
new (βα)8-barrels from (βα)4-half-barrels.
Proc. Natl. Acad. Sci. U. S. A. 101, 16448–16453
79. Bharat, T.A.M. et al. (2008) A βα-barrel built by the
combination of fragments from different folds. Proc. Natl. Acad. Sci.
U. S. A. 105, 9942–9947
80. Claren, J. et al. (2009) Establishing wild-type levels of catalytic
activity on natural and artificial (βα)8-barrel protein scaffolds.
Proc. Natl. Acad. Sci. U. S. A. 106, 3704–3709
81. Jacobs, T.M. et al. (2016) Design of structurally distinct proteins
using strategies inspired by evolution. Science 352, 687–690
82. Lipsh-Sokolik, R. et al. (2020) The AbDesign computational
pipeline for modular backbone assembly and design of binders
and enzymes. Protein Sci. 30, 151–159
83. Berman, H.M. et al. (2000) The Protein Data Bank. Nucleic Acids
Res. 28, 235–242
84. Wetlaufer, D.B. (1973) Nucleation, rapid folding, and globular
intrachain regions in proteins. Proc. Natl. Acad. Sci. U. S. A.
70, 697–701
85. Lupas, A.N. et al. (2001) On the evolution of protein folds: are similar
motifs in different protein folds the result of convergence, insertion, or
relics of an ancient peptide world? J. Struct. Biol. 134, 191–203
86. Ferruz, N. et al. (2020) Identification and analysis of natural
building blocks for evolution-guided fragment-based protein design.
J. Mol. Biol. 432, 3898–3914
87. Murzin, A.G. et al. (1995) SCOP: a structural classification of
proteins database for the investigation of sequences and structures.
J. Mol. Biol. 247, 536–540
88. Alva, V. et al. (2015) A vocabulary of ancient peptides at the
origin of folded proteins. eLife 4, e09410
89. Nepomnyachiy, S. et al. (2014) Global view of the protein
universe. Proc. Natl. Acad. Sci. U. S. A. 111, 11691–11696
90. Ferruz, N. et al. (2021) ProtLego: a Python package for the analysis
and design of chimeric proteins. Bioinformatics Published online
April 26, 2021. https://doi.org/10.1093/bioinformatics/btab253
91. Goldenzwig, A. et al. (2016) Automated structure- and
sequence-based design of proteins for high bacterial expression
and stability. Mol. Cell 63, 337–346
92. Romero-Romero, S. et al. (2021) Evolution, folding, and design of
TIM barrels and related proteins. Curr. Opin. Struct. Biol. 68, 94–104
93. Hong, N.-S. et al. (2018) The evolution of multiple active site
configurations in a designed enzyme. Nat. Commun. 9, 3900
94. Nevin Gerek, Z. et al. (2013) Structural dynamics flexibility
informs function and evolution at a proteome scale. Evol. Appl.
6, 423–433
95. Wang, J. et al. (2020) Mapping allosteric communications within
individual proteins. Nat. Commun. 11, 3862
96. Schaeffer, R.D. et al. (2016) ECOD: new developments in the
evolutionary classification of domains. Nucleic Acids Res. 45,
D296–D302