ABSTRACT: The DNA sequence used to encode a polypeptide can have dramatic effects on its expression. Until recently, the lack of readily available tools has inhibited meaningful experimental investigation of this phenomenon. Advances in synthetic biology and the application of modern engineering approaches now provide the tools for systematic analysis of the sequence variables affecting heterologous expression of recombinant proteins. Here we discuss how these new tools are being applied and how they circumvent the constraints of previous approaches, highlighting some of the surprising and promising results emerging from the developing field of gene engineering.
Protein Expression and Purification 03/2012; 83(1):37-46. DOI:10.1016/j.pep.2012.02.013 · 1.70 Impact Factor
ABSTRACT: Omega-hydroxyfatty acids are excellent monomers for synthesizing a unique family of polyethylene-like biobased plastics. However, ω-hydroxyfatty acids are difficult and expensive to prepare by traditional organic synthesis, precluding their use in commodity materials. Here we report the engineering of a strain of the diploid yeast Candida tropicalis to produce commercially viable yields of ω-hydroxyfatty acids. To develop the strain, we identified and eliminated 16 genes encoding 6 cytochrome P450s, 4 fatty alcohol oxidases and 6 alcohol dehydrogenases from the C. tropicalis genome. We also show that fatty acids with different chain lengths and degrees of unsaturation can be oxidized more efficiently by expressing different P450s within this strain background. Engineered C. tropicalis is thus a potentially attractive biocatalytic platform for producing commodity chemicals from renewable resources.
Journal of the American Chemical Society 10/2010; 132(43):15451-5. DOI:10.1021/ja107707v · 12.11 Impact Factor
ABSTRACT: Production of proteins as therapeutic agents, research reagents and molecular tools frequently depends on expression in heterologous hosts. Synthetic genes are increasingly used for protein production because sequence information is easier to obtain than the corresponding physical DNA. Protein-coding sequences are commonly re-designed to enhance expression, but there are no experimentally supported design principles.
To identify sequence features that affect protein expression, we synthesized and expressed in E. coli two sets of 40 genes encoding two commercially valuable proteins, a DNA polymerase and a single-chain antibody. Genes differing only in synonymous codon usage expressed protein at levels ranging from undetectable to 30% of cellular protein. Using partial least squares regression, we tested the correlation of protein production levels with parameters that have been reported to affect expression. We found that the amount of protein produced in E. coli was strongly dependent on the codons used to encode a subset of amino acids. Favorable codons were predominantly those read by tRNAs that are most highly charged during amino acid starvation, not codons that are most abundant in highly expressed E. coli proteins. Finally, we confirmed the validity of our models by designing, synthesizing and testing new genes using codon biases predicted to perform well.
The systematic analysis of gene design parameters shown in this study has allowed us to identify codon usage within a gene as a critical determinant of achievable protein expression levels in E. coli. We propose a biochemical basis for this, as well as design algorithms to ensure high protein production from synthetic genes. Replication of this methodology should allow similar design algorithms to be empirically derived for any expression system.
PLoS ONE 09/2009; 4(9):e7002. DOI:10.1371/journal.pone.0007002 · 3.23 Impact Factor
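The study fits protein yield against codon usage with partial least squares regression. A minimal sketch of the feature-extraction step, assuming each gene is an in-frame coding sequence, counts how often each synonymous codon is used per amino acid. The function name `codon_fractions` and the deliberately partial codon table (Ala, Gly and Lys only) are illustrative assumptions, not the authors' code; the regression itself is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately partial codon table: standard genetic-code
# assignments for three amino acids only. Extend it for real use.
CODON_TABLE = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "AAA": "K", "AAG": "K",
}


def codon_fractions(cds: str) -> dict[str, dict[str, float]]:
    """Return {amino_acid: {codon: fraction}} for an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Count every codon in reading frame 1.
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))
    # Group the counts by the amino acid each codon encodes.
    per_aa: dict[str, dict[str, int]] = defaultdict(dict)
    for codon, aa in CODON_TABLE.items():
        per_aa[aa][codon] = counts.get(codon, 0)
    fractions = {}
    for aa, codon_counts in per_aa.items():
        total = sum(codon_counts.values())
        # Amino acids absent from the gene get no entry rather than 0/0.
        if total:
            fractions[aa] = {c: n / total for c, n in codon_counts.items()}
    return fractions
```

For `"GCTGCCAAAAAAAAG"` this gives 0.5 for each of GCT and GCC, about 0.67 for AAA and about 0.33 for AAG; glycine does not occur in the gene and so gets no entry. Codons outside the table, including stop codons, are simply ignored.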
ABSTRACT: Using engineered Candida tropicalis strains, fatty acids were transformed into the corresponding ω-carboxyl and ω-hydroxyl fatty acids. A wide range of α,ω-dicarboxylic acids with different carbon chain lengths (C12-C22) and internal functionality (e.g. alkene, alkyne, hydroxyl and epoxy groups) were prepared using the commercially available strain C. tropicalis ATCC20962. Furthermore, in collaboration with DNA 2.0, a whole-cell biotransformation process was developed for converting long-chain fatty acids into ω-hydroxyfatty acids. This required deleting selected genes encoding enzymes responsible for the further oxidation of ω-hydroxyfatty acids from the C. tropicalis host strain and re-integrating an ω-specific P450 enzyme. For the first time, high levels of ω-hydroxyfatty acids were produced by microbial fermentation. For example, after process optimization, one of the engineered C. tropicalis strains produced more than 150 g/L of 14-hydroxytetradecanoic acid from methyl myristate. Compared with chemical synthesis, this microbial route generates fewer by-products and offers extraordinary selectivity. Through improvements to both the strain and the fermentation process, we expect to achieve high levels of product formation and to establish a new platform of ω-hydroxyfatty acids for a variety of uses.
13th Annual Green Chemistry and Engineering Conference; 06/2009
ABSTRACT: Altering a protein's function by changing its sequence allows natural proteins to be converted into useful molecular tools. Current protein engineering methods are limited by a lack of high-throughput physical or computational tests that can accurately predict protein activity under conditions relevant to its final application. Here we describe a new synthetic biology approach to protein engineering that avoids these limitations by combining high-throughput gene synthesis with machine learning-based design algorithms.
We selected 24 amino acid substitutions to make in proteinase K from alignments of homologous sequences. We then designed and synthesized 59 specific proteinase K variants containing different combinations of the selected substitutions. Each variant was first heated to 68 °C for 5 minutes and then tested for its ability to hydrolyze a tetrapeptide substrate. The sequence and activity data were analyzed using machine learning algorithms, and this analysis was used to design a new set of variants predicted to be more active than those in the training set; these were then synthesized and tested. By performing two cycles of machine learning analysis and variant design, we obtained proteinase K variants improved 20-fold while testing a total of only 95 variant enzymes.
The number of protein variants that must be tested to obtain significant functional improvements determines the type of tests that can be performed. Protein engineers who want to modify a protein so that it shrinks tumours or catalyzes chemical reactions under industrial conditions have until now been forced to rely on high-throughput surrogate screens, measuring properties that they hope will correlate with the functions they intend to modify. By reducing the number of variants that must be tested to fewer than 100, machine learning algorithms make it possible to use more complex and expensive tests, so that only the protein properties directly relevant to the desired application need to be measured. Protein design algorithms that require testing only a small number of variants represent a significant step towards a generic, resource-optimized protein engineering process.
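The abstract does not name the learning algorithm. As a hedged sketch of one design cycle, assuming a linear model in which each variant is encoded as the presence (1) or absence (0) of each chosen substitution, ridge regression can be fitted to measured activities and the untested combinations ranked by predicted activity. All names are hypothetical, the solver is a plain Gauss-Jordan elimination to keep the example dependency-free, and exhaustive enumeration of candidates is only practical for a small number of substitutions (24 substitutions already give 2^24 = 16,777,216 combinations).

```python
from itertools import product


def _solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def fit_ridge(X, y, lam=0.1):
    """Fit y ~ w0 + sum(wi * xi) with an L2 penalty on every weight except w0."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    for i in range(1, k):  # leave the intercept unpenalized
        xtx[i][i] += lam
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return _solve(xtx, xty)


def propose_variants(X, y, n_sub, k=1, lam=0.1):
    """Rank untested 0/1 substitution combinations by predicted activity."""
    w = fit_ridge(X, y, lam)
    tested = {tuple(x) for x in X}

    def predict(c):
        return w[0] + sum(wi * ci for wi, ci in zip(w[1:], c))

    candidates = [c for c in product((0, 1), repeat=n_sub) if c not in tested]
    return sorted(candidates, key=predict, reverse=True)[:k]
```

With toy data generated from activity = 1 + 2·x0 + 0.5·x1 - x2 on five tested variants, `propose_variants` ranks the untested combination (1, 1, 1) first. In a real cycle the top-ranked designs would be synthesized, assayed and appended to `X` and `y` before refitting.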
ABSTRACT: Direct synthesis of genes is rapidly becoming the most efficient way to make functional genetic constructs, and it enables applications such as codon optimization, RNAi-resistant genes and protein engineering. Here we introduce a software tool that greatly simplifies the design of synthetic genes.
Gene Designer is stand-alone software for the fast and easy design of synthetic DNA segments. Users can add, edit and combine genetic elements such as promoters, open reading frames and tags through an intuitive drag-and-drop graphical interface and a hierarchical DNA/Protein object map. Using advanced optimization algorithms, open reading frames within a DNA construct can be codon optimized for protein expression in any host organism. Gene Designer also includes a real-time sliding calculator of oligonucleotide annealing temperatures, a sequencing primer generator, tools for avoiding or including restriction sites, and options to maximize or minimize sequence identity to a reference.
Gene Designer is an expandable Synthetic Biology workbench suitable for molecular biologists interested in the de novo creation of genetic constructs.
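Gene Designer's own algorithms are not described here. Two of the listed features can be illustrated with deliberately simple stand-ins: the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a rough estimate intended only for short oligonucleotides (nearest-neighbour thermodynamic models are more accurate), and a search for a restriction site on both strands. The function names are hypothetical, and a non-palindromic site such as BsaI (GGTCTC) is used to show why both strands must be checked.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(oligo: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule (short oligos only)."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def find_sites(seq: str, site: str) -> list[int]:
    """0-based forward-strand start positions of a recognition site on either strand."""
    seq, site = seq.upper(), site.upper()
    hits = set()
    # A palindromic site equals its reverse complement, so the set holds one motif.
    for motif in {site, reverse_complement(site)}:
        start = seq.find(motif)
        while start != -1:  # step by one so overlapping matches are found
            hits.add(start)
            start = seq.find(motif, start + 1)
    return sorted(hits)
```

For example, `wallace_tm("ATGC")` is 2·2 + 4·2 = 12, and `find_sites("AAGGTCTCAAGAGACCAA", "GGTCTC")` returns [2, 10], the second hit being the reverse-strand copy GAGACC. Avoiding a site then amounts to choosing synonymous codons until this list is empty.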
ABSTRACT: There are two main reasons to try to predict an enzyme's function from its sequence: first, to identify the components and thus the functional capabilities of an organism; second, to create enzymes with specific properties. Genomics, expression analysis, proteomics and metabonomics are largely directed towards understanding how information flows from DNA sequence to protein function within an organism. This review focuses on information flow in the opposite direction: how what is being learned from natural enzymes can be applied to improve methods for catalyst design.
Current Opinion in Chemical Biology 05/2005; 9(2):202-9. DOI:10.1016/j.cbpa.2005.02.003 · 6.81 Impact Factor
ABSTRACT: Almost all protein engineering methods rely upon making changes to naturally occurring proteins that already possess some of the desired properties. This will probably remain the case as long as we lack a complete understanding of the way that an amino acid sequence gives rise to a protein with a precisely defined biological function. Common to all methods for altering an existing protein is the selection of a subset of amino acids in the protein for variation and a choice of which substitutions to make at each position. Variants are then tested empirically, and further variants are created based upon their performance. Protein engineering methods differ in how amino acids are chosen for variation, in the protocols followed to create the variants, and in how information about variant properties is used to create subsequent variants. In this article, we describe these differences and provide examples of how the experimental parameters of specific projects determine which method is most suitable.
ABSTRACT: The members of the mechanistically diverse enolase superfamily, which share the (β/α)8-barrel fold, evolved from a common progenitor but catalyze different reactions using a conserved partial reaction. The molecular pathway for natural divergent evolution of function in the superfamily is unknown. We have identified single-site mutants of the (β/α)8-barrel domains in both the L-Ala-D/L-Glu epimerase from Escherichia coli (AEE) and the muconate lactonizing enzyme II from Pseudomonas sp. P51 (MLE II) that catalyze the o-succinylbenzoate synthase (OSBS) reaction as well as the wild-type reaction. These enzymes are members of the MLE subgroup of the superfamily and share conserved lysines on opposite sides of their active sites, but they catalyze acid- and base-mediated reactions with different mechanisms. The D297G mutant of AEE was designed by comparing the structures of AEE and the OSBS from E. coli; the E323G mutant of MLE II was isolated from directed evolution experiments. Although neither wild-type enzyme catalyzes the OSBS reaction, both mutants complement an E. coli OSBS auxotroph and have measurable levels of OSBS activity. The analogous substitutions, D297G in AEE and E323G in MLE II, are each located at the end of the eighth β-strand of the (β/α)8-barrel and alter the ability of AEE and MLE II to bind the substrate of the OSBS reaction. The substitutions relax the substrate specificity, thereby allowing catalysis of the mechanistically diverse OSBS reaction with the assistance of the active-site lysines. The generation of functionally promiscuous and mechanistically diverse enzymes via single amino acid substitutions likely mimics the natural divergent evolution of enzymatic activities and also highlights the utility of the (β/α)8-barrel as a scaffold for new function.
ABSTRACT: During protein evolution, amino acids change due to a combination of functional constraints and genetic drift. Proteins frequently contain pairs of amino acids that appear to change together (covariation). Analysis of covariation from naturally occurring sets of orthologs cannot distinguish between residue pairs retained by functional requirements of the protein and those pairs existing due to changes along a common evolutionary path. Here, we have separated the two types of covariation by independently recombining every naturally occurring amino acid variant within a set of 15 subtilisin orthologs. Our analysis shows that in this family of subtilisin orthologs, almost all possible pairwise combinations of amino acids can coexist. This suggests that amino acid covariation found in the subtilisin orthologs is almost entirely due to common ancestral origin of the changes rather than functional constraints. We conclude that naturally occurring sequence diversity can be used to identify positions that can vary independently without destroying protein function.
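One way to picture the test described above is to ask, for a pair of variable positions, what fraction of the possible parental residue combinations actually occurs among functional recombinants. The sketch below does this for toy aligned sequences; the function name and data are hypothetical, and the abstract does not give the statistics the study actually used.

```python
from itertools import product


def pair_coverage(active_seqs, parents, i, j):
    """Fraction of parental residue combinations at positions i, j seen in active variants.

    All sequences are assumed to be aligned to the same length.
    """
    # Every combination that independent recombination could produce.
    possible = set(product({p[i] for p in parents}, {p[j] for p in parents}))
    # Combinations actually present among functional variants.
    observed = {(s[i], s[j]) for s in active_seqs}
    return len(observed & possible) / len(possible)
```

With parents "AK" and "GR", the natural sequences alone show only two of the four combinations, which looks like covariation. If active recombinants "AK", "GR" and "AR" are recovered, coverage is 0.75; recovering "GK" as well raises it to 1.0, which would indicate that the two positions vary independently.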
ABSTRACT: We describe synthetic shuffling, an evolutionary protein engineering technology in which every amino acid from a set of parents is allowed to recombine independently of every other amino acid. With the use of degenerate oligonucleotides, synthetic shuffling provides a direct route from database sequence information to functional libraries. Physical starting genes are unnecessary, and additional design criteria such as optimal codon usage or known beneficial mutations can also be incorporated. We performed synthetic shuffling of 15 subtilisin genes and obtained active and highly chimeric enzymes with desirable combinations of properties that we did not obtain by other directed-evolution methods.
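Synthetic shuffling encodes alternative parental residues with degenerate oligonucleotides. As an illustrative sketch (not the published design procedure), the parental codons at one position can be merged into the narrowest IUPAC degenerate codon. Expanding that codon back out exposes a design trade-off: merging AAA (Lys) and GAT (Asp) gives RAW, which also encodes AAT (Asn) and GAA (Glu). The function names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes keyed by the set of bases each one covers.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_codon(codons):
    """Narrowest IUPAC codon that matches every given codon."""
    codons = [c.upper() for c in codons]
    if not codons or any(len(c) != 3 for c in codons):
        raise ValueError("expected one or more 3-nt codons")
    # Merge the bases observed at each of the three codon positions.
    return "".join(IUPAC[frozenset(c[k] for c in codons)] for k in range(3))


def expand(degenerate):
    """All concrete codons encoded by a degenerate codon, in sorted base order."""
    inverse = {code: sorted(bases) for bases, code in IUPAC.items()}
    return ["".join(p) for p in product(*(inverse[b] for b in degenerate))]
```

Here `degenerate_codon(["GCT", "ACT"])` gives RCT, which expands to exactly ACT and GCT (Thr and Ala), so no extra residues are introduced; choosing synonymous parental codons that merge cleanly is one place where codon usage enters the design.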
ABSTRACT: Directed evolution by DNA shuffling has been used to modify physical and catalytic properties of biological systems. We have shuffled two highly homologous triazine hydrolases and explored the substrate specificities of the resulting enzymes to better understand how novel functions may be distributed in sequence space.
Both parental enzymes and a library of 1600 variant triazine hydrolases were screened against a synthetic library of 15 triazines. The shuffled library contained enzymes with up to 150-fold greater transformation rates than either parent. It also contained enzymes that hydrolyzed five of eight triazines that were not substrates for either starting enzyme.
Permutation of nine amino acid differences resulted in a set of enzymes with surprisingly diverse patterns of reactions catalyzed. The functional richness of this small area of sequence space may aid our understanding of both natural and artificial evolution.
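With two parents differing at n positions, independently permuting the differences gives 2^n sequences, so nine differences define 2^9 = 512 possible enzymes, just under a third of the size of the 1600-member screened library. The sketch below enumerates that space for two aligned sequences; the function name and inputs are hypothetical, since the triazine hydrolase sequences are not given here.

```python
from itertools import product


def all_chimeras(parent_a: str, parent_b: str) -> list[str]:
    """Every sequence that takes each differing position from either parent."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    # Positions where the two parents disagree.
    diff = [k for k, (a, b) in enumerate(zip(parent_a, parent_b)) if a != b]
    variants = []
    # Each 0/1 tuple picks parent A (0) or parent B (1) at every differing position.
    for choice in product((0, 1), repeat=len(diff)):
        seq = list(parent_a)
        for pos, pick in zip(diff, choice):
            if pick:
                seq[pos] = parent_b[pos]
        variants.append("".join(seq))
    return variants
```

Two parents that differ at one position yield just the two parents themselves; two 9-residue toy sequences that differ everywhere yield 512 distinct chimeras.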
ABSTRACT: Directed evolution can be a powerful tool for predicting antibiotic resistance. Resistance involves the accumulation of mutations that benefit the pathogen while preserving the residue interactions and core packing that are critical for function. The need to maintain stability while increasing activity drastically reduces the number of possible mutational pathways. To test this hypothesis, TEM-1 β-lactamase was evolved under cefotaxime selection using a directed evolution technique based on a hypermutator E. coli strain. The selected mutants were compared with those from two previous directed evolution studies and with a database of clinical isolates. In all cases, evolution produced the E104K/M182T/G238S combination of mutations (approximately 500-fold increased resistance), which is equivalent to the clinical isolate TEM-52. The structure of TEM-52 was determined to 2.4 Å. G238S widens access to the active site by 2.8 Å, whereas E104K stabilizes the reorganized topology. The M182T mutation is located 17 Å from the active site and appears to be a global suppressor mutation that stabilizes the new enzyme structure. Our results demonstrate that directed evolution coupled with structural analysis can be used to predict future mutations that lead to increased antibiotic resistance.
ABSTRACT: DNA family shuffling of 26 protease genes was used to create a library of chimeric proteases that was screened for four distinct enzymatic properties. Multiple clones were identified that were significantly improved over any of the parental enzymes for each individual property. Family shuffling, also known as molecular breeding, efficiently created all of the combinations of parental properties, producing a great diversity of property combinations in the progeny enzymes. Thus, molecular breeding, like classical breeding, is a powerful tool for recombining existing diversity to tailor biological systems for multiple functional parameters.