GenomeCarver: harvesting genetic parts from genomes to support biological design automation

Emily Scher
School of Informatics
University of Edinburgh
Edinburgh EH9 3JR, UK
Yisha Luo
School of Biological Sciences
University of Edinburgh
Edinburgh EH9 3JR, UK
Aaron Berliner
Autodesk Research
San Francisco, California
94111, USA
Jacqueline Quinn
Autodesk Research
San Francisco, California
94111, USA
Carlos Olguin
Autodesk Research
San Francisco, California
94111, USA
Dr. Yizhi Cai
School of Biological Sciences
University of Edinburgh
Edinburgh EH9 3JR, UK
ABSTRACT
Concept
The advance of genome sequencing and annotation has provided a “gold mine” of genetic parts, which all synthetic biologists wish to include in their toolbox of parts with which to build synthetic biological systems. The currently available computer-assisted design (CAD) systems focus heavily, if not exclusively, on composing biological systems using genetic parts [3][7][9]; however, how a user obtains parts in the first place remains an open question. To make matters worse, a few dozen part standards have been proposed and are in use in the synthetic biology community (104 RFCs on part standards as of today). Even though one can extract a few parts from a genome manually, there is no software to ensure that parts are compatible with a standard, and it is also very difficult to scale up the design of parts.
With these problems in mind, we present GenomeCarver, a computational tool for the harvesting and packaging of biological parts from model genomes. GenomeCarver interfaces with various genomes, identifies regions of interest according to user specification (e.g., promoters, open reading frames and terminators) and extraction rules (e.g., a promoter is defined as the 500 bp upstream of the ATG start codon or the sequence up to the previous gene boundary, whichever is shorter), extracts the corresponding DNA sequences from the genome feature files (GFFs), checks each sequence’s compatibility with the selected standard (e.g., whether the given sequence includes the forbidden restriction sites of certain parts standards), and finally outputs optimized primer sequences to amplify the parts from genomic DNA, adding the necessary flanking sequences to standardize the parts.
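The extraction rule quoted above (a fixed maximum length, or up to the neighbouring gene boundary, whichever is shorter) can be sketched as a small coordinate calculation. This is only an illustration under stated assumptions, not GenomeCarver's actual code: the `Gene` record, the `flanking_region` function and the `chrom_len` parameter are hypothetical names, coordinates are assumed to be 1-based and inclusive as in GFF, and genes on either strand are assumed to count as boundaries. The 500 bp and 200 bp defaults are the ones stated in the Implementation section.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Default maximum lengths stated in the text (promoter 500 bp, terminator 200 bp).
DEFAULT_MAX_LEN = {"promoter": 500, "terminator": 200}


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record with GFF-style 1-based inclusive coordinates."""
    name: str
    start: int   # always <= end, regardless of strand
    end: int
    strand: str  # "+" or "-"


def flanking_region(
    gene: Gene,
    genes: List[Gene],
    kind: str,
    chrom_len: int,
    max_len: Optional[int] = None,
    ignore_boundaries: bool = False,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the promoter or terminator region, or None if empty."""
    if kind not in DEFAULT_MAX_LEN:
        raise ValueError(f"unknown part category: {kind}")
    if max_len is None:
        max_len = DEFAULT_MAX_LEN[kind]

    # A promoter lies upstream of the gene and a terminator downstream. On the
    # "-" strand, upstream means higher coordinates, so the side is flipped.
    on_left = (kind == "promoter") == (gene.strand == "+")

    # Assumption: genes on both strands act as boundaries.
    others = [] if ignore_boundaries else [g for g in genes if g != gene]

    if on_left:
        lo, hi = max(1, gene.start - max_len), gene.start - 1
        for g in others:
            if g.start < gene.start:
                # Trim so the region begins just after the neighbouring gene.
                lo = max(lo, min(g.end, hi) + 1)
    else:
        lo, hi = gene.end + 1, min(chrom_len, gene.end + max_len)
        for g in others:
            if g.end > gene.end:
                # Trim so the region stops just before the neighbouring gene.
                hi = min(hi, max(g.start, lo) - 1)

    return (lo, hi) if lo <= hi else None
```

For example, with genes at 100..400 (+), 700..1000 (+) and 1100..1500 (-) on a 2000 bp chromosome, the promoter of the middle gene is trimmed from 200..699 to 401..699 because the first gene ends at position 400, whereas passing `ignore_boundaries=True` returns the full 500 bp window.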
Through its compatibility with multiple genomes and multiple parts standards, GenomeCarver bridges the fields of systems biology and synthetic biology, and greatly enriches synthetic biologists’ design toolbox. It complements many parts-based design tools which currently exist by supporting the Synthetic Biology Open Language standard [6].
Implementation
GenomeCarver can be accessed as an application built on Autodesk’s Project Cyborg (http://autodeskresearch.com/projects/cyborg). Project Cyborg is a cloud-based platform for computational tools in the life sciences and programmable matter space, supporting design and engineering across domains and scales. Cyborg enables elastic computing through a node framework that natively provides support for simulation, optimization, and visualization. Being built on Cyborg, GenomeCarver is composed of nodes for each step of the workflow, connected to form a cohesive user experience that guides the user through the tool.
These authors contributed equally to this work.
Corresponding author. E-mail: yizhi.cai@ed.ac.uk
GenomeCarver currently supports three model organisms: the yeast Saccharomyces cerevisiae, the bacterium Escherichia coli, and the plant Arabidopsis thaliana. However, GenomeCarver is flexible enough to be extended to interface with a variety of organisms, which we plan to do in the near future. Similarly, GenomeCarver currently supports a finite number of mainstream parts standards, such as the BioBrick 1.0 [1] and yeast Golden Gate [2] standards, but new standards could easily be incorporated. In a future implementation, we plan to allow users to import their own custom standards. While interfacing with multiple genomes and standards has made GenomeCarver flexible, being built on top of Cyborg furthers the tool’s flexibility, as GenomeCarver will be usable in conjunction with the other tools currently being developed on the same platform.
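One way to picture how parts standards could be made pluggable is a registry that maps each standard to its forbidden restriction sites and flanking sequences. The sketch below is an assumption-laden illustration, not GenomeCarver's implementation: `PartStandard`, `register_standard`, `find_forbidden_sites`, `package_part` and `tailed_primers` are hypothetical names. The `biobrick_rfc10` entry uses the recognition sites and prefix/suffix commonly published for BioBrick RFC 10 and should be verified against the standard before use. The primers use a fixed-length annealing region, which is a simplification and not the optimized primer design described in the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (upper-cased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


@dataclass
class PartStandard:
    """Hypothetical description of a parts standard."""
    name: str
    forbidden_sites: Dict[str, str]  # enzyme name -> recognition sequence
    prefix: str                      # flanking sequence added 5' of the part
    suffix: str                      # flanking sequence added 3' of the part


# Hypothetical registry; user-defined standards are added via register_standard.
STANDARDS: Dict[str, PartStandard] = {}


def register_standard(standard: PartStandard) -> None:
    STANDARDS[standard.name] = standard


# Values as commonly published for BioBrick RFC 10 (assumption: verify before use).
register_standard(PartStandard(
    name="biobrick_rfc10",
    forbidden_sites={
        "EcoRI": "GAATTC", "XbaI": "TCTAGA", "SpeI": "ACTAGT",
        "PstI": "CTGCAG", "NotI": "GCGGCCGC",
    },
    prefix="GAATTCGCGGCCGCTTCTAGAG",
    suffix="TACTAGTAGCGGCCGCTGCAG",
))


def find_forbidden_sites(part: str, standard: PartStandard) -> List[Tuple[str, int]]:
    """Return (enzyme, 0-based position) for every forbidden site on either strand."""
    part = part.upper()
    hits = []
    for enzyme, site in standard.forbidden_sites.items():
        # Non-palindromic sites must also be searched as their reverse complement.
        for pattern in {site, reverse_complement(site)}:
            pos = part.find(pattern)
            while pos != -1:
                hits.append((enzyme, pos))
                pos = part.find(pattern, pos + 1)
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


def package_part(part: str, standard: PartStandard) -> str:
    """Add the standard's prefix and suffix to the part sequence."""
    return standard.prefix + part.upper() + standard.suffix


def tailed_primers(part: str, standard: PartStandard, anneal_len: int = 20) -> Tuple[str, str]:
    """Return (forward, reverse) primers, both 5'->3', carrying the flanking tails."""
    part = part.upper()
    forward = standard.prefix + part[:anneal_len]
    reverse = reverse_complement(part[-anneal_len:] + standard.suffix)
    return forward, reverse
```

A caller would run `find_forbidden_sites` first and surface any hits as warnings before calling `package_part`. Supporting a custom standard then amounts to one more `register_standard` call.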
Figure 1 shows the application’s workflow. First, a user chooses a genome, a part category and the locus of the part. For instance, a user may choose the promoter of a Gal locus from Saccharomyces cerevisiae. Optionally, the user can then define the preferred promoter and terminator lengths, or specify that gene boundaries should be ignored. The default maximum promoter length is 500 base pairs, and the default maximum terminator length is 200 base pairs. If the user does not specify that gene boundaries should be ignored, then GenomeCarver will identify a gene’s promoter as the upstream (5’ to 3’) sequence of at most 500 base pairs (or the specified maximum length) that does not overlap another gene. It will identify the terminator as the downstream sequence of at most 200 base pairs that does not overlap another gene. GenomeCarver then returns the specified sequence(s). The user can then assign the sequence(s) to a standard using another drop-down menu. Once a standard is selected, the sequence is checked for restriction sites, and a warning is returned if an incompatible restriction site is found. The part is then packaged by adding the appropriate prefix and suffix. GenomeCarver then allows the packaged parts to be exported in CSV and SBOL formats [6].
Figure 1: The workflow of GenomeCarver
IWBDA 2014, June 11–12, 2014, Boston, Massachusetts, USA. Copyright is held by the owner/author(s). Publication rights licensed to BDAC.
BDAC acknowledges that this contribution was authored or co-authored by a contractor or affiliate of the U.S. Government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Experimental verification
GenomeCarver has been used extensively in several labs in the USA, UK and China to systematically design thousands of yeast parts of each category conforming to the yeast Golden Gate standard. We used the designed primers to amplify parts from genomic DNA in a high-throughput fashion, cloned the parts into a TOPO vector backbone, and sequence-verified them all (data not shown in the abstract). We also demonstrated highly efficient assembly of various genetic switches using these parts and the standard Golden Gate reaction, and transformed the assembled switches into yeast for functional assays. Most recently, GenomeCarver has been used to design all 6000 yeast promoters and 6000 yeast terminators, which demonstrates that the design automation scales up easily.
Future plan
In the next version of GenomeCarver, we are planning to include additional genomes, such as mammalian ones, and to support user-customized standards. Batch design functionality will also be developed to support large projects, such as BioFab-type (http://biofab.synberc.org/) projects for various genomes. We will also develop better primer design strategies [8] to maximize the success rate of part amplification. We also plan to support codon optimization for gene parts, so that a user can carve out a gene from one species and codon-optimize it for another, with GenomeCarver outputting oligonucleotides for de novo DNA synthesis. Finally, better integration with existing parts-based design tools will be needed to improve the user’s design experience.
Conclusion
GenomeCarver has been built to fill a gap left by existing
Synthetic Biology computational tools. It allows users to
extract parts directly from genomes, and to package them
into standardized formats for parts synthesis. We have used
this tool to design over 12,000 parts, and constructed and
verified several hundred of them. This tool, along with the
parts repository we created using it, will be a useful and
important addition to the synthetic biology community.
Acknowledgement
ES is supported by an Autodesk research fellowship. The project is funded by an Edinburgh Chancellor’s Fellowship to YC. We thank Drs. Jef D. Boeke (New York University, USA) and Junbiao Dai (Tsinghua University, China) for helpful discussions to initiate the project.
1. REFERENCES
[1] Technical report, August 2005.
[2] Technical report, Johns Hopkins University, October 2012.
[3] Yizhi Cai, Brian Hartnett, Claes Gustafsson, and Jean Peccoud. A syntactic model to design and verify synthetic genetic constructs derived from standard biological parts. Bioinformatics, 2007.
[4] Carola Engler, Ramona Gruetzner, Romy Kandzia, and Sylvestre Marillonnet. Golden Gate shuffling: A one-pot DNA shuffling method based on type IIs restriction enzymes. PLoS ONE, 4(5):e5553, 05 2009.
[5] Carola Engler, Romy Kandzia, and Sylvestre Marillonnet. A one pot, one step, precision cloning method with high throughput capability. PLoS ONE, 3(11):e3647, 11 2008.
[6] Michal Galdzicki, Cesar Rodriguez, Deepak Chandran, Herbert M. Sauro, and John H. Gennari. Standard biological parts knowledgebase. PLoS ONE, 6(2):e17005, 02 2011.
[7] Nathan J. Hillson, Rafael D. Rosengarten, and Jay D. Keasling. j5 DNA assembly design automation software. ACS Synthetic Biology, 1(1):14–21, 2012.
[8] Andreas Untergasser, Ioana Cutcutache, Triinu Koressaar, Jian Ye, Brant C. Faircloth, Maido Remm, and Steven G. Rozen. Primer3 - new capabilities and interfaces. Nucleic Acids Research, 40(15):e115, 2012.
[9] Bing Xia, Swapnil Bhatia, Ben Bubenheim, Maisam Dadgar, Douglas Densmore, and J. Christopher Anderson. Chapter five - Developer’s and user’s guide to Clotho v2.0: A software platform for the creation of synthetic biological systems. In Christopher Voigt, editor, Synthetic Biology, Part B: Computer Aided Design and DNA Assembly, volume 498 of Methods in Enzymology, pages 97–135. Academic Press, 2011.