ABSTRACT: Omega-hydroxyfatty acids are excellent monomers for synthesizing a unique family of polyethylene-like biobased plastics. However, ω-hydroxyfatty acids are difficult and expensive to prepare by traditional organic synthesis, precluding their use in commodity materials. Here we report the engineering of a strain of the diploid yeast Candida tropicalis to produce commercially viable yields of ω-hydroxyfatty acids. To develop the strain we identified and eliminated 16 genes encoding 6 cytochrome P450s, 4 fatty alcohol oxidases, and 6 alcohol dehydrogenases from the C. tropicalis genome. We also show that fatty acids with different chain lengths and degrees of unsaturation can be more efficiently oxidized by expressing different P450s within this strain background. Engineered C. tropicalis is thus a potentially attractive biocatalytic platform for producing commodity chemicals from renewable resources.
Journal of the American Chemical Society 10/2010; 132(43):15451-5. · 10.68 Impact Factor
ABSTRACT: Production of proteins as therapeutic agents, research reagents and molecular tools frequently depends on expression in heterologous hosts. Synthetic genes are increasingly used for protein production because sequence information is easier to obtain than the corresponding physical DNA. Protein-coding sequences are commonly re-designed to enhance expression, but there are no experimentally supported design principles.
To identify sequence features that affect protein expression we synthesized and expressed in E. coli two sets of 40 genes encoding two commercially valuable proteins, a DNA polymerase and a single chain antibody. Genes differing only in synonymous codon usage expressed protein at levels ranging from undetectable to 30% of cellular protein. Using partial least squares regression we tested the correlation of protein production levels with parameters that have been reported to affect expression. We found that the amount of protein produced in E. coli was strongly dependent on the codons used to encode a subset of amino acids. Favorable codons were predominantly those read by tRNAs that are most highly charged during amino acid starvation, not codons that are most abundant in highly expressed E. coli proteins. Finally we confirmed the validity of our models by designing, synthesizing and testing new genes using codon biases predicted to perform well.
The systematic analysis of gene design parameters shown in this study has allowed us to identify codon usage within a gene as a critical determinant of achievable protein expression levels in E. coli. We propose a biochemical basis for this, as well as design algorithms to ensure high protein production from synthetic genes. Replication of this methodology should allow similar design algorithms to be empirically derived for any expression system.
PLoS ONE 01/2009; 4(9):e7002. · 3.73 Impact Factor
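The analysis in the abstract above relates codon usage in synonymous gene variants to measured expression. A minimal sketch of the first step, turning each variant into codon-frequency features, is given below. It substitutes a plain Pearson correlation for the partial least squares regression used in the study, and the sequences and expression values are hypothetical toy inputs, not data from the paper.

```python
from collections import Counter


def codon_frequencies(orf):
    """Return the fraction of each codon in an in-frame coding sequence."""
    if len(orf) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation; a simple stand-in for PLS regression (assumption)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, so no measurable correlation
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy variants: three synonymous encodings of four leucines,
# paired with made-up expression levels for illustration only.
variants = ["CTGCTGCTGCTG", "CTGCTGTTATTA", "TTATTATTATTA"]
expression = [3.0, 2.0, 1.0]

ctg_usage = [codon_frequencies(v).get("CTG", 0.0) for v in variants]
print("CTG usage per variant:", ctg_usage)
print("correlation with expression:", pearson(ctg_usage, expression))
```

With real data, one feature per codon would be computed for every variant and fed to a multivariate method, since single-codon correlations ignore interactions between features.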
ABSTRACT: Altering a protein's function by changing its sequence allows natural proteins to be converted into useful molecular tools. Current protein engineering methods are limited by a lack of high throughput physical or computational tests that can accurately predict protein activity under conditions relevant to its final application. Here we describe a new synthetic biology approach to protein engineering that avoids these limitations by combining high throughput gene synthesis with machine learning-based design algorithms.
We selected 24 amino acid substitutions to make in proteinase K from alignments of homologous sequences. We then designed and synthesized 59 specific proteinase K variants containing different combinations of the selected substitutions. The 59 variants were tested for their ability to hydrolyze a tetrapeptide substrate after the enzyme was first heated to 68 °C for 5 minutes. Sequence and activity data were analyzed using machine learning algorithms. This analysis was used to design a new set of variants predicted to have increased activity over the training set, which were then synthesized and tested. By performing two cycles of machine learning analysis and variant design we obtained 20-fold improved proteinase K variants while testing a total of only 95 variant enzymes.
The number of protein variants that must be tested to obtain significant functional improvements determines the type of tests that can be performed. Protein engineers wishing to modify the property of a protein to shrink tumours or catalyze chemical reactions under industrial conditions have until now been forced to accept high throughput surrogate screens to measure protein properties that they hope will correlate with the functionalities that they intend to modify. By reducing the number of variants that must be tested to fewer than 100, machine learning algorithms make it possible to use more complex and expensive tests so that only protein properties that are directly relevant to the desired application need to be measured. Protein design algorithms that only require the testing of a small number of variants represent a significant step towards a generic, resource-optimized protein engineering process.
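One cycle of the design loop described above (measure a small set of variants, model the contribution of each substitution, propose new combinations) might be sketched as follows. This is an illustrative simplification, not the authors' algorithm: each substitution's effect is estimated as the difference in mean activity between variants with and without it, and the substitution labels and activity values are hypothetical.

```python
def substitution_effects(variants, activities):
    """Estimate each substitution's effect as mean(with) - mean(without).

    variants: list of sets of substitution labels, one set per tested variant.
    activities: measured activity for each variant, in the same order.
    """
    all_subs = set().union(*variants)
    effects = {}
    for sub in sorted(all_subs):
        with_sub = [a for v, a in zip(variants, activities) if sub in v]
        without = [a for v, a in zip(variants, activities) if sub not in v]
        if not with_sub or not without:
            continue  # effect cannot be estimated from this training set
        effects[sub] = sum(with_sub) / len(with_sub) - sum(without) / len(without)
    return effects


def propose_variant(effects):
    """Combine every substitution whose estimated effect is positive."""
    return {sub for sub, effect in effects.items() if effect > 0}


# Hypothetical training set: labels S1 and S2 and the activities are made up.
tested = [{"S1"}, {"S2"}, {"S1", "S2"}, set()]
measured = [2.0, 0.5, 1.5, 1.0]

effects = substitution_effects(tested, measured)
print(effects)
print("next variant to synthesize:", propose_variant(effects))
```

A real application would use a regularized regression or similar model and would test several proposed variants per cycle, feeding their measurements back into the training set.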
ABSTRACT: Direct synthesis of genes is rapidly becoming the most efficient way to make functional genetic constructs and enables applications such as codon optimization, RNAi-resistant genes and protein engineering. Here we introduce a software tool that greatly simplifies the design of synthetic genes.
Gene Designer is stand-alone software for fast and easy design of synthetic DNA segments. Users can easily add, edit and combine genetic elements such as promoters, open reading frames and tags through an intuitive drag-and-drop graphic interface and a hierarchical DNA/Protein object map. Using advanced optimization algorithms, open reading frames within the DNA construct can readily be codon optimized for protein expression in any host organism. Gene Designer also includes features such as a real-time sliding calculator of oligonucleotide annealing temperatures, a sequencing primer generator, tools for avoidance or inclusion of restriction sites, and options to maximize or minimize sequence identity to a reference.
Gene Designer is an expandable Synthetic Biology workbench suitable for molecular biologists interested in the de novo creation of genetic constructs.
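To illustrate what a sliding calculator of oligonucleotide annealing temperatures does, the sketch below applies the Wallace rule (2 °C per A/T base plus 4 °C per G/C base, a rough estimate for short oligonucleotides) to every window along a sequence. The abstract does not say which formula Gene Designer uses, so the rule, the function names and the example sequence are all assumptions made for illustration.

```python
def wallace_tm(oligo):
    """Approximate melting temperature in °C using the Wallace rule."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def sliding_tm(seq, window):
    """Wallace-rule estimate for every window of the given length along seq."""
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    return [wallace_tm(seq[i:i + window]) for i in range(len(seq) - window + 1)]


# Hypothetical example sequence.
print(sliding_tm("ATGCGC", 4))
```

More accurate estimates for longer oligonucleotides generally rely on nearest-neighbor thermodynamic models, which also account for salt and oligonucleotide concentrations.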
ABSTRACT: There are two main reasons to try to predict an enzyme's function from its sequence. The first is to identify the components and thus the functional capabilities of an organism; the second is to create enzymes with specific properties. Genomics, expression analysis, proteomics and metabonomics are largely directed towards understanding how information flows from DNA sequence to protein functions within an organism. This review focuses on information flow in the opposite direction: the applicability of what is being learned from natural enzymes to improve methods for catalyst design.
Current Opinion in Chemical Biology 05/2005; 9(2):202-9. · 9.47 Impact Factor
ABSTRACT: Almost all protein engineering methods rely upon making changes to naturally occurring proteins that already possess some of the desired properties. This will probably remain the case as long as we lack a complete understanding of the way that an amino acid sequence gives rise to a protein with a precisely defined biological function. Common to all methods for altering an existing protein is the selection of a subset of amino acids in the protein for variation and a choice of which substitutions to make at each position. Variants are then tested empirically and further variants are created based upon their performance. Differences between protein engineering methods are the ways in which amino acids are chosen for variation, the protocols followed for creating the variants, and how information regarding variant properties is used in creating subsequent variants. In this article, we describe these differences and provide examples of how the experimental parameters of specific projects determine which method is most suitable.
ABSTRACT: The members of the mechanistically diverse, (β/α)8-barrel fold-containing enolase superfamily evolved from a common progenitor but catalyze different reactions using a conserved partial reaction. The molecular pathway for natural divergent evolution of function in the superfamily is unknown. We have identified single-site mutants of the (β/α)8-barrel domains in both the L-Ala-D/L-Glu epimerase from Escherichia coli (AEE) and the muconate lactonizing enzyme II from Pseudomonas sp. P51 (MLE II) that catalyze the o-succinylbenzoate synthase (OSBS) reaction as well as the wild-type reaction. These enzymes are members of the MLE subgroup of the superfamily, share conserved lysines on opposite sides of their active sites, but catalyze acid- and base-mediated reactions with different mechanisms. A comparison of the structures of AEE and the OSBS from E. coli was used to design the D297G mutant of AEE; the E323G mutant of MLE II was isolated from directed evolution experiments. Although neither wild-type enzyme catalyzes the OSBS reaction, both mutants complement an E. coli OSBS auxotroph and have measurable levels of OSBS activity. The analogous mutations in the D297G mutant of AEE and the E323G mutant of MLE II are each located at the end of the eighth β-strand of the (β/α)8-barrel and alter the ability of AEE and MLE II to bind the substrate of the OSBS reaction. The substitutions relax the substrate specificity, thereby allowing catalysis of the mechanistically diverse OSBS reaction with the assistance of the active site lysines. The generation of functionally promiscuous and mechanistically diverse enzymes via single-amino acid substitutions likely mimics the natural divergent evolution of enzymatic activities and also highlights the utility of the (β/α)8-barrel as a scaffold for new function.
ABSTRACT: During protein evolution, amino acids change due to a combination of functional constraints and genetic drift. Proteins frequently contain pairs of amino acids that appear to change together (covariation). Analysis of covariation from naturally occurring sets of orthologs cannot distinguish between residue pairs retained by functional requirements of the protein and those pairs existing due to changes along a common evolutionary path. Here, we have separated the two types of covariation by independently recombining every naturally occurring amino acid variant within a set of 15 subtilisin orthologs. Our analysis shows that in this family of subtilisin orthologs, almost all possible pairwise combinations of amino acids can coexist. This suggests that amino acid covariation found in the subtilisin orthologs is almost entirely due to common ancestral origin of the changes rather than functional constraints. We conclude that naturally occurring sequence diversity can be used to identify positions that can vary independently without destroying protein function.
Journal of Molecular Biology 06/2003; 328(5):1061-9. · 3.91 Impact Factor
ABSTRACT: We describe synthetic shuffling, an evolutionary protein engineering technology in which every amino acid from a set of parents is allowed to recombine independently of every other amino acid. With the use of degenerate oligonucleotides, synthetic shuffling provides a direct route from database sequence information to functional libraries. Physical starting genes are unnecessary, and additional design criteria such as optimal codon usage or known beneficial mutations can also be incorporated. We performed synthetic shuffling of 15 subtilisin genes and obtained active and highly chimeric enzymes with desirable combinations of properties that we did not obtain by other directed-evolution methods.
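The idea of letting every position recombine independently can be sketched in silico as follows. Given aligned parent sequences, the code collects the amino acids seen at each position, computes the size of the resulting combinatorial library, and draws a random chimera. The parent sequences are hypothetical three-residue toys, and the sketch models only the sequence space, not the degenerate-oligonucleotide chemistry used to build the library.

```python
import random
from math import prod


def position_diversity(parents):
    """List the amino acids observed at each position of aligned parents."""
    if len({len(p) for p in parents}) != 1:
        raise ValueError("parents must be aligned to the same length")
    return [sorted(set(column)) for column in zip(*parents)]


def library_size(diversity):
    """Number of distinct sequences if every position varies independently."""
    return prod(len(options) for options in diversity)


def sample_chimera(diversity, rng):
    """Draw one chimera by choosing each position independently."""
    return "".join(rng.choice(options) for options in diversity)


# Hypothetical aligned parents, for illustration only.
parents = ["AKT", "ART", "GKS"]
diversity = position_diversity(parents)

print(diversity)
print("library size:", library_size(diversity))
print("one chimera:", sample_chimera(diversity, random.Random(0)))
```

Because the library size is the product of the per-position counts, it grows very quickly with the number of variable positions, which is why such libraries are sampled and screened instead of enumerated.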