Anna Lauko’s research while affiliated with University of Washington and other places


Publications (9)


Computational design of serine hydrolases
  • Article

February 2025 · 110 Reads · 31 Citations

Science

Anna Lauko · Samuel J. Pellock · Kiera H. Sumida · [...] · David Baker

The design of enzymes with complex active sites that mediate multistep reactions remains an outstanding challenge. With serine hydrolases as a model system, we combined the generative capabilities of RFdiffusion with an ensemble generation method for assessing active site preorganization to design enzymes starting from minimal active site descriptions. Experimental characterization revealed catalytic efficiencies (kcat/Km) up to 2.2 × 10⁵ M⁻¹ s⁻¹ and crystal structures that closely match the design models (Cα RMSDs < 1 Å). Selection for structural compatibility across the reaction coordinate enabled identification of new catalysts in low-throughput screens with five different folds distinct from those of natural serine hydrolases. Our de novo approach provides insight into the geometric basis of catalysis and a roadmap for designing enzymes that catalyze multistep transformations.


Computational Design of Metallohydrolases
  • Preprint
  • File available

November 2024 · 218 Reads · 4 Citations

New enzymes can be designed by starting from a description of an ideal active site composed of catalytic residues surrounding the reaction transition state(s) and identifying or generating a protein scaffold that supports the site, but there are a few current limitations. First, the catalytic efficiencies achieved by such efforts have generally been quite low, and considerable optimization by directed evolution has been required to reach activities typical of native enzymes. Second, generative AI methods such as RFdiffusion now enable the direct generation of proteins around active sites, but to date, such scaffolding has required specification of both the position in the sequence and the backbone coordinates of the catalytic residue, which complicates sampling. Here we introduce a generative AI method called RFflow that overcomes these limitations and use it to design zinc metallohydrolases starting from a density functional theory description of active site geometry. Of 96 designs tested experimentally, the most active has a kcat/KM of 23,000 M⁻¹ s⁻¹, orders of magnitude higher than previously designed metallohydrolases. This 148 amino acid protein has a novel fold with an enclosed chamber which positions the reaction substrate nearly perfectly for attack by a catalytic water molecule activated by the bound metal and is predicted by ChemNet to have a highly preorganized active site. The ability to generate high activity catalysts starting from quantum chemistry calculated active site geometries without experimental optimization should open the door to a new generation of potent designer enzymes.


Modeling protein-small molecule conformational ensembles with ChemNet

September 2024 · 74 Reads · 4 Citations

Modeling the conformational heterogeneity of protein-small molecule systems is an outstanding challenge. We reasoned that while residue level descriptions of biomolecules are efficient for de novo structure prediction, for probing heterogeneity of interactions with small molecules in the folded state an entirely atomic level description could have advantages in speed and generality. We developed a graph neural network called ChemNet trained to recapitulate correct atomic positions from partially corrupted input structures from the Cambridge Structural Database and the Protein Data Bank; the nodes of the graph are the atoms in the system. ChemNet accurately generates structures of diverse organic small molecules given knowledge of their atom composition and bonding, and given a description of the larger protein context, and builds up structures of small molecules and protein side chains for protein-small molecule docking. Because ChemNet is rapid and stochastic, ensembles of predictions can be readily generated to map conformational heterogeneity. In enzyme design efforts described here and elsewhere, we find that using ChemNet to assess the accuracy and pre-organization of the designed active sites results in higher success rates and higher activities; we obtain a preorganized retroaldolase with a kcat/KM of 11,000 M⁻¹ min⁻¹, considerably higher than any pre-deep learning design for this reaction. We anticipate that ChemNet will be widely useful for rapidly generating conformational ensembles of small molecule and small molecule-protein systems, and for designing higher activity preorganized enzymes.


Computational design of serine hydrolases

August 2024 · 298 Reads · 9 Citations

Enzymes that proceed through multistep reaction mechanisms often utilize complex, polar active sites positioned with sub-angstrom precision to mediate distinct chemical steps, which makes their de novo construction extremely challenging. We sought to overcome this challenge using the classic catalytic triad and oxyanion hole of serine hydrolases as a model system. We used RFdiffusion¹ to generate proteins housing catalytic sites of increasing complexity and varying geometry, and a newly developed ensemble generation method called ChemNet to assess active site geometry and preorganization at each step of the reaction. Experimental characterization revealed novel serine hydrolases that catalyze ester hydrolysis with catalytic efficiencies (kcat/Km) up to 3.8 × 10³ M⁻¹ s⁻¹, closely match the design models (Cα RMSDs < 1 Å), and have folds distinct from natural serine hydrolases. In silico selection of designs based on active site preorganization across the reaction coordinate considerably increased success rates, enabling identification of new catalysts in screens of as few as 20 designs. Our de novo buildup approach provides insight into the geometric determinants of catalysis that complements what can be obtained from structural and mutational studies of native enzymes (in which catalytic group geometry and active site makeup cannot be so systematically varied), and provides a roadmap for the design of industrially relevant serine hydrolases and, more generally, for designing complex enzymes that catalyze multi-step transformations.


De novo design of protein structure and function with RFdiffusion

July 2023 · 1,749 Reads · 1,048 Citations

Nature

There has been considerable recent progress in designing new proteins using deep learning methods¹⁻⁹. Despite this progress, a general deep learning framework for protein design that enables solution of a wide range of design challenges, including de novo binder design and design of higher order symmetric architectures, has yet to be described. Diffusion models¹⁰,¹¹ have had considerable success in image and language generative modeling but limited success when applied to protein modeling, likely due to the complexity of protein backbone geometry and sequence-structure relationships. Here we show that by fine tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, we obtain a generative model of protein backbones that achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding, and symmetric motif scaffolding for therapeutic and metal-binding protein design. We demonstrate the power and generality of the method, called RoseTTAFold Diffusion (RFdiffusion), by experimentally characterizing the structures and functions of hundreds of designed symmetric assemblies, metal binding proteins and protein binders. The accuracy of RFdiffusion is confirmed by the cryo-EM structure of a designed binder in complex with Influenza hemagglutinin which is nearly identical to the design model. In a manner analogous to networks which produce images from user-specified inputs, RFdiffusion enables the design of diverse functional proteins from simple molecular specifications.


Figures
Figure S4: RFdiffusion designs are diverse and dissimilar to proteins in the PDB. A) Comparing unconditional designs to one another (100 designs per length) demonstrates that, by TM score alignment, designs are diverse (medians 100-400aa: 0.39, 0.36, 0.37, 0.35). B-C) Designs also bear little resemblance to the training set (PDB). B) Example of the most diverse (lowest TM score hit) to the PDB for a set of 300 amino acid designs. The folds of the design (left) and native protein (middle) are highly dissimilar, aligning only across a portion of the β-sheet. C) Example designs demonstrating extrapolation beyond the training set for generating novel folds. Gray: closest protein in the PDB by TM score, colors: RFdiffusion design model, overlaid by TM alignment. For each protein length, the median and most diverse samples are shown (the 300aa design is the same as in B). While for short proteins, designs typically show some similarity to known protein folds, with increasing length, designs become increasingly dissimilar to the PDB. TM score (closest PDB, TM score; median, most diverse): 100aa: 5WVE_A, 0.71; 4W5T_A, 0.59; 200aa: 4AV3_A, 0.58; 4CLY_A, 0.47; 300aa: 4PEW_B, 0.53; 4RDR_A, 0.46; 400aa: 4AIP_A, 0.49; 6R9T_A, 0.42.
Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models

December 2022 · 948 Reads · 88 Citations

There has been considerable recent progress in designing new proteins using deep learning methods. Despite this progress, a general deep learning framework for protein design that enables solution of a wide range of design challenges, including de novo binder design and design of higher order symmetric architectures, has yet to be described. Diffusion models have had considerable success in image and language generative modeling but limited success when applied to protein modeling, likely due to the complexity of protein backbone geometry and sequence-structure relationships. Here we show that by fine tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, we obtain a generative model of protein backbones that achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding, and symmetric motif scaffolding for therapeutic and metal-binding protein design. We demonstrate the power and generality of the method, called RoseTTAFold Diffusion (RFdiffusion), by experimentally characterizing the structures and functions of hundreds of new designs. In a manner analogous to networks which produce images from user-specified inputs, RFdiffusion enables the design of diverse, complex, functional proteins from simple molecular specifications.


Figure 4. Design and structural characterization of conformation switching macrocycles
Figure 5. Designed macrocycles are orally bioavailable in vivo in rodent models
Accurate de novo design of membrane-traversing macrocycles

August 2022 · 277 Reads · 114 Citations

Cell

We use computational design coupled with experimental characterization to systematically investigate the design principles for macrocycle membrane permeability and oral bioavailability. We designed 184 6–12 residue macrocycles with a wide range of predicted structures containing noncanonical backbone modifications and experimentally determined structures of 35; 29 are very close to the computational models. With such control, we show that membrane permeability can be systematically achieved by ensuring all amide (NH) groups are engaged in internal hydrogen bonding interactions. 84 designs over the 6–12 residue size range cross membranes with an apparent permeability greater than 1 × 10⁻⁶ cm/s. Designs with exposed NH groups can be made membrane permeable through the design of an alternative isoenergetic fully hydrogen-bonded state favored in the lipid membrane. The ability to robustly design membrane-permeable and orally bioavailable peptides with high structural accuracy should contribute to the next generation of designed macrocycle therapeutics.


An overview of the workflow for modeling FAcc. Initially, multiple sequence alignments (MSAs) of all protein sequences are generated, the sequences are segmented into domains using the MSAs and the individual domains are folded using trRosetta. These domains are individually docked into the cryoEM density. Monte Carlo sampling finds the domain assignment that is maximally consistent with the experimental data (electron density, XL-MS etc., Fig. S5). Finally, linkers between domains are sampled and the entire structure is refined.
An overview of FAcc. (a) Domain organization of the seven subunits of FAcc. Based on our modeling, we find that the complex consists of 18 domains, indicated with narrow bars. FANCB and FAAP100 have the same domain organization, with a β-propeller (βprop) followed by a long coiled coil (CC), a β-sandwich (βsand) and then an α/β domain, finally followed by a C-terminal helical region. FANCC, FANCF and FANCG are all comprised of a single helical-repeat domain, while we find FANCE to have two separated helical-repeat domains (one N-terminal and one C-terminal). Finally, FANCL is organized as an ELF domain followed by a URD domain and then lastly a RING domain. Also indicated is the availability of known structures or homologous proteins throughout the modeling process with striations. Domains with known structures or available homologous proteins used include the C-terminal helices of FANCE, the helical repeats of FANCF, all of FANCG and FANCL, and the β-propeller of FAAP100. (b) Three views of the complete model of FAcc as determined by our modeling protocol. Colors are matched to the diagram in (a), with those that have multiple copies (FANCB, FANCG and FAAP100) having different shades of the coloring. The orientations of the top, middle and bottom lobes are indicated.
An overview of trRosetta-predicted domains. (a) The top three models from trRosetta for ten representative domains indicate a tight convergence of modeling. The identity of the domains follows the coloring in Fig. 2(a). Domains from FANCB, FANCE and FANCC are shown in the top row, while those from FAAP100 and FANCG are shown in the bottom row. (b) Several examples of trRosetta models docked into density before refinement, showing the role that the map plays in the validation and selection of models. From left to right [the colors match those in Fig. 2(a)]: the helical repeats of FANCC, the N-terminal repeats of FANCE, the α/β domain of FANCB and the β-sandwich of FAAP100. (c, d) Two examples illustrating the importance of domain segmentation when docking trRosetta-generated models. (c) The trRosetta model of FANCG (magenta) poorly matches the final structure (green); segmenting this model into two domains (red and blue) shows a much better match, as the individual domain structures are accurate, even though their relative orientation is not. (d) Similarly, a trRosetta prediction of the FANCB β-sandwich–α/β domain (pink) is dissimilar from the final structure (blue); splitting it into domains (brown and green) shows good overall agreement. (e) trRosetta models (blue) generally fit the map well, although some refinement was necessary to maximize agreement with the density (orange).
Model validation by mutational and cross-linking data. (a) 30 nonbenign human mutations mapped to our model of FAcc. All interface mutations are marked with magenta spheres; non-interface mutations are marked with tan spheres. (b) Close-up renderings of cross-links throughout the FAcc model. Black lines indicate cross-links that are satisfied (<30 Å) by the final refined structure. Representatives from each cross-link cluster are shown for the middle lobe (left) and the bottom lobe (right).
Deep learning enables the atomic structure determination of the Fanconi Anemia core complex from cryoEM

August 2020 · 240 Reads · 13 Citations

Cryo-electron microscopy of protein complexes often leads to moderate resolution maps (4–8 Å), with visible secondary-structure elements but poorly resolved loops, making model building challenging. In the absence of high-resolution structures of homologues, only coarse-grained structural features are typically inferred from these maps, and it is often impossible to assign specific regions of density to individual protein subunits. This paper describes a new method for overcoming these difficulties that integrates predicted residue distance distributions from a deep-learned convolutional neural network, computational protein folding using Rosetta, and automated EM-map-guided complex assembly. We apply this method to a 4.6 Å resolution cryoEM map of Fanconi Anemia core complex (FAcc), an E3 ubiquitin ligase required for DNA interstrand crosslink repair, which was previously challenging to interpret as it comprises 6557 residues, only 1897 of which are covered by homology models. In the published model built from this map, only 387 residues could be assigned to the specific subunits with confidence. By building and placing into density 42 deep-learning-guided models containing 4795 residues not included in the previously published structure, we are able to determine an almost-complete atomic model of FAcc, in which 5182 of the 6557 residues were placed. The resulting model is consistent with previously published biochemical data, and facilitates interpretation of disease-related mutational data. We anticipate that our approach will be broadly useful for cryoEM structure determination of large complexes containing many subunits for which there are no homologues of known structure.
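The residue counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable and function names are hypothetical, and reading the 5182 placed residues as the sum of the 387 previously assigned and the 4795 newly modeled residues is an assumption that the numbers happen to support, not something the abstract states outright.

```python
# Residue counts as quoted in the abstract above.
TOTAL_RESIDUES = 6557       # residues in the full FAcc complex
HOMOLOGY_COVERED = 1897     # residues covered by homology models
PREVIOUSLY_ASSIGNED = 387   # residues confidently assigned in the earlier model
NEWLY_MODELED = 4795        # residues in the 42 deep-learning-guided models
REPORTED_PLACED = 5182      # residues placed in the final model


def fraction(part: int, whole: int) -> float:
    """Return part/whole as a plain fraction (hypothetical helper)."""
    return part / whole


# Assumption: the final count is the earlier assignment plus the new models.
placed = PREVIOUSLY_ASSIGNED + NEWLY_MODELED

print(f"placed residues: {placed} (reported: {REPORTED_PLACED})")
print(f"final model coverage: {fraction(placed, TOTAL_RESIDUES):.1%}")
print(f"homology-model coverage: {fraction(HOMOLOGY_COVERED, TOTAL_RESIDUES):.1%}")
```

Under that reading, the final model places roughly four fifths of the complex, compared with under a third covered by homology models.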


Deep learning enables the atomic structure determination of the Fanconi Anemia core complex from cryoEM

May 2020 · 144 Reads · 1 Citation

Cryo-electron microscopy of protein complexes often leads to moderate resolution maps (4–8 Å), with visible secondary structure elements but poorly resolved loops, making model-building challenging. In the absence of high-resolution structures of homologues, only coarse-grained structural features are typically inferred from these maps, and it is often impossible to assign specific regions of density to individual protein subunits. This paper describes a new method for overcoming these difficulties that integrates predicted residue distance distributions from a deep-learned convolutional neural network, computational protein folding using Rosetta, and automated EM-map-guided complex assembly. We apply this method to a 4.6 Å resolution cryoEM map of Fanconi Anemia core complex (FAcc), an E3 ubiquitin ligase required for DNA interstrand crosslink repair, which was previously challenging to interpret as it comprises 6557 residues, only 1897 of which are covered by homology models. In the published structure built from this map, only 387 residues could be assigned to specific subunits. By building and placing into density 42 deep-learning-guided models containing 4795 residues not included in the previously published structure, we are able to determine an almost-complete atomic model of FAcc, in which 5182 of the 6557 residues were placed. The resulting model is consistent with previously published biochemical data, and facilitates interpretation of disease-related mutational data. We anticipate that our approach will be broadly useful for cryoEM structure determination of large complexes containing many subunits for which there are no homologues of known structure.

Citations (8)


... Recent advances in computational design have enabled rapid and effective optimization of natural enzyme stability, expressibility, catalytic rate and selectivity through fully computational workflows 12,13. Furthermore, advances in fold design enabled the grafting of natural or engineered active sites into idealized de novo backbones 14,15. By contrast, enzymes designed de novo, that is, without recourse to naturally occurring enzymes that catalyse the same reaction, were orders of magnitude less active relative to comparable natural ones 1–5,11. ...

Reference:

Complete computational design of high-efficiency Kemp elimination enzymes
Computational design of serine hydrolases
  • Citing Article
  • February 2025

Science

... Therefore, de novo designed proteins, once folded, often do not permit changes in structural conformation. This feature can result in high thermostability and solubility, but it limits the generation of proteins with functions that rely on dynamic behaviour, such as proteins involved in catalysis 100,150–153. Computational methods thus need to be developed that facilitate the design of dynamic, that is, more 'natural', proteins. ...

Computational Design of Metallohydrolases

... For instance, algorithms such as LigandMPNN, RFdiffusionAA, and AlphaFold3, which can account for nonprotein ligands, could be integrated to model ligand-bound structures with increased accuracy. [14,43,44] In addition, methods such as MD simulations or deep-learning tools like ChemNet [45] could further enhance design by probing conformational dynamics. Our MD screening narrowed the range of design hits from 96 to 7 variants, suggesting that short simulations (≤10 ns) can eliminate designs with nonproductive conformations. ...

Modeling protein-small molecule conformational ensembles with ChemNet

... This study highlights the efficacy of dynamic docking simulations to infer substrate specificity especially for enzymes with flexible active sites, such as cold-active enzymes [57]. In general, the proposed workflow, which combines dynamic docking and MD simulations, could be applied in a wide range of scenarios to predict and describe the substrate specificity of enzymes, including those poorly represented in training datasets, such as sequences from metagenomic campaigns or from de novo design [58][59][60][61][62]. In addition, this computational workflow can be integrated with conventional high-throughput activity-based assays for the discovery of novel catalytic functions [63]. ...

Computational design of serine hydrolases

... We reasoned that recent breakthroughs in generative DL methods could be leveraged to develop a robust pipeline for the accurate and efficient design of macrocycle binders. Diffusion models for protein design, such as RFdiffusion 17, are trained to generate diverse protein structures from randomly initialized residues as starting points and have demonstrated remarkable success in designing protein monomers, binders and symmetric oligomers of medium-sized to large-sized proteins. However, despite considerable recent progress in DL-based protein design methods, these methods are not readily applicable to designing macrocyclic peptides. ...

De novo design of protein structure and function with RFdiffusion

Nature

... Structural validation in the lab of these models is currently underway, with some researchers reporting partial success, even when solving for previously unknown structures [46]. Successful applications of sequence-to-shape models in protein research include vaccine design [47], binding affinity ranking [48], protein sequence design [49,50] and benchmarking [51]. ...

Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models

... Therefore, we next used Rosetta 24 and structural diversity accessible to macrocycles. Moreover, such approaches frequently fail to simultaneously optimize for multiple biophysical properties, such as target binding, selectivity and membrane permeability, because of the precise structural control required to achieve such functional properties 5. ...

Accurate de novo design of membrane-traversing macrocycles

Cell

... Over the next few years ML and AI methods will permeate much of the cutting-edge research reported in IUCr journals, including IUCrJ. We note that the Physics and Chemistry Nobel Prizes in 2024 were awarded for achievements associated with the development and application of AI and ML methods, and it is an honor to acknowledge one of the Chemistry Nobel Laureates, David Baker, for his two papers published in IUCrJ (Simkovic et al., 2017; Farrell et al., 2020), as well as multiple papers in other IUCr journals. ...

Deep learning enables the atomic structure determination of the Fanconi Anemia core complex from cryoEM