ABSTRACT: The DNA sequence used to encode a polypeptide can have dramatic effects on its expression. Lack of readily available tools has until recently inhibited meaningful experimental investigation of this phenomenon. Advances in synthetic biology and the application of modern engineering approaches now provide the tools for systematic analysis of the sequence variables affecting heterologous expression of recombinant proteins. We here discuss how these new tools are being applied and how they circumvent the constraints of previous approaches, highlighting some of the surprising and promising results emerging from the developing field of gene engineering.
Protein Expression and Purification 03/2012; 83(1):37-46. · 1.43 Impact Factor
ABSTRACT: The promise of synthetic biology lies in the creation of novel function from the proper combination of genetic elements. De novo gene synthesis has become a cost-effective method for building virtually any conceptualized genetic construct, removing the constraints of extant sequences, and greatly facilitating study of the relationships between gene sequence and function. With the rapid increase in the number and variety of characterized and cataloged genetic elements, tools that facilitate assembly of such parts into functional constructs (genes, vectors, circuits, etc.) are essential. The Gene Designer software allows scientists and engineers to readily manage and recombine genetic elements into novel assemblies. It also provides tools for the simulation of molecular cloning schemes as well as the engineering and optimization of protein-coding sequences. Together, the functions in Gene Designer provide a complete capability to design functional genetic constructs.
Methods in molecular biology (Clifton, N.J.) 01/2012; 852:197-213.
ABSTRACT: DNA sequences are now far more readily available in silico than as physical DNA. De novo gene synthesis is an increasingly cost-effective method for building genetic constructs, and effectively removes the constraint of basing constructs on extant sequences. This allows scientists and engineers to experimentally test their hypotheses relating sequence to function. Molecular biologists, and now synthetic biologists, are characterizing and cataloging genetic elements with specific functions, aiming to combine them to perform complex functions. However, the most common purpose of synthetic genes is for the expression of an encoded protein. The huge number of different proteins makes it impossible to characterize and catalog each functional gene. Instead, it is necessary to abstract design principles from experimental data: data that can be generated by making predictions followed by synthesizing sequences to test those predictions. Because of the degeneracy of the genetic code, design of gene sequences to encode proteins is a high-dimensional problem, so there is no single simple formula to guarantee success. Nevertheless, there are several straightforward steps that can be taken to greatly increase the probability that a designed sequence will result in expression of the encoded protein. In this chapter, we discuss gene sequence parameters that are important for protein expression. We also describe algorithms for optimizing these parameters, and troubleshooting procedures that can be helpful when initial attempts fail. Finally, we show how many of these methods can be accomplished using the synthetic biology software tool Gene Designer.
Methods in enzymology 01/2011; 498:43-66. · 1.90 Impact Factor
ABSTRACT: A vast number of different nucleic acid sequences can all be translated by the genetic code into the same amino acid sequence. These sequences are not all equally useful however; the exact sequence chosen can have profound effects on the expression of the encoded protein. Despite the importance of protein-coding sequences, there has been little systematic study to identify parameters that affect expression. This is probably because protein expression has largely been tackled on an ad hoc basis in many independent projects: once a sequence has been obtained that yields adequate expression for that project, there is little incentive to continue work on the problem. Synthetic biology may now provide the impetus to transform protein expression folklore into design principles, so that DNA sequences may easily be designed to express any protein in any system. In this review, we offer a brief survey of the literature, outline the major challenges in interpreting existing data and constructing robust design algorithms, and propose a way to proceed towards the goal of rational sequence engineering.
Journal of The Royal Society Interface 04/2009; 6 Suppl 4:S467-76. · 4.91 Impact Factor
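To give a sense of scale for the "vast number" of synonymous sequences mentioned in the abstract above, the following is a minimal sketch that counts how many distinct coding sequences encode a given peptide under the standard genetic code. The function name `count_synonymous_sequences` and the example peptides are illustrative assumptions, not taken from the cited review.

```python
# Number of codons per amino acid in the standard genetic code
# (61 sense codons in total; stop codons are not counted here).
CODON_DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2,
    "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def count_synonymous_sequences(peptide: str) -> int:
    """Return how many DNA sequences encode `peptide` (one-letter codes).

    Hypothetical helper for illustration: the count is the product of the
    per-residue codon degeneracies, since each position is chosen
    independently.
    """
    total = 1
    for residue in peptide.upper():
        if residue not in CODON_DEGENERACY:
            raise ValueError(f"unknown amino acid: {residue!r}")
        total *= CODON_DEGENERACY[residue]
    return total


if __name__ == "__main__":
    # A three-residue peptide already has 1 * 2 * 6 = 12 encodings.
    print(count_synonymous_sequences("MKL"))
```

Because the count is a product over residues, it grows exponentially with protein length, which is why exhaustive testing of encodings is not an option and design principles have to be inferred from sampled variants.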
ABSTRACT: Production of proteins as therapeutic agents, research reagents and molecular tools frequently depends on expression in heterologous hosts. Synthetic genes are increasingly used for protein production because sequence information is easier to obtain than the corresponding physical DNA. Protein-coding sequences are commonly re-designed to enhance expression, but there are no experimentally supported design principles.
To identify sequence features that affect protein expression we synthesized and expressed in E. coli two sets of 40 genes encoding two commercially valuable proteins, a DNA polymerase and a single chain antibody. Genes differing only in synonymous codon usage expressed protein at levels ranging from undetectable to 30% of cellular protein. Using partial least squares regression we tested the correlation of protein production levels with parameters that have been reported to affect expression. We found that the amount of protein produced in E. coli was strongly dependent on the codons used to encode a subset of amino acids. Favorable codons were predominantly those read by tRNAs that are most highly charged during amino acid starvation, not codons that are most abundant in highly expressed E. coli proteins. Finally we confirmed the validity of our models by designing, synthesizing and testing new genes using codon biases predicted to perform well.
The systematic analysis of gene design parameters shown in this study has allowed us to identify codon usage within a gene as a critical determinant of achievable protein expression levels in E. coli. We propose a biochemical basis for this, as well as design algorithms to ensure high protein production from synthetic genes. Replication of this methodology should allow similar design algorithms to be empirically derived for any expression system.
PLoS ONE 01/2009; 4(9):e7002. · 3.73 Impact Factor
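The study above relates expression levels to the codons used for each amino acid. As a rough illustration of the kind of per-gene feature such an analysis could start from, the sketch below computes, for each codon in a coding sequence, its share among the synonymous codons used for the same amino acid. This is an assumed feature definition for illustration only; it is not the authors' published model or their partial least squares procedure, and the name `synonymous_codon_fractions` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering
# ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codon_fractions(cds: str) -> dict:
    """Map each codon in `cds` to its fraction among synonymous codons.

    Illustrative assumption: `cds` is an in-frame DNA coding sequence
    containing only A, C, G and T.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))

    # Total codon count per encoded amino acid.
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n

    return {
        codon: n / per_amino_acid[CODON_TABLE[codon]]
        for codon, n in counts.items()
    }


if __name__ == "__main__":
    # Met-Ala-Ala-Ala, with alanine encoded as GCT twice and GCC once.
    print(synonymous_codon_fractions("ATGGCTGCTGCC"))
```

Vectors of this kind, computed for each synthesized gene variant, could then serve as predictor variables in a regression against measured expression; the choice of regression method and of which amino acids to include is left open here.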