ORFeome Cloning and Systems Biology:
Standardized Mass Production of the Parts
From the Parts-List
Michael A. Brasch,1 James L. Hartley,2 and Marc Vidal3,4
1Atto Bioscience, Rockville, Maryland 20850, USA; 2SAIC/National Cancer Institute, Frederick, Maryland 21702, USA; 3Center for
Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute and Department of Genetics, Harvard
Medical School, Boston, Massachusetts 02115, USA
Together with metabolites, proteins and RNAs form complex biological systems through highly intricate networks of
physical and functional interactions. Large-scale studies aimed at a molecular understanding of the structure,
function, and dynamics of proteins and RNAs in the context of cellular networks require novel approaches and
technologies. This Special Issue of Genome Research features strategies for the high-throughput construction and
manipulation of complete sets of protein-encoding open reading frames (ORFeome), gene promoters (promoterome),
and noncoding RNAs, as predicted from genome and transcriptome sequences. Here we discuss the use of a
recombinational cloning system that allows efficiency, adaptability, and compatibility in the generation of ORFeome,
promoterome, and other resources.
An important transition is taking place in biological research.
The field of genome sequencing and annotation (Goffeau et al.
1996; Blattner et al. 1997; The C. elegans Sequencing Consortium
1998; Adams et al. 2000; Arabidopsis Genome Initiative 2000;
Lander et al. 2001; Venter et al. 2001; Waterston et al. 2002;
Gibbs et al. 2004) is now complemented by systems biology ap-
proaches that aim to decipher the biological networks in which
cellular macromolecules function. Nearly complete lists of genes,
the “parts-lists,” are available for several model organisms and for
humans. With such parts-lists in hand, it is possible to produce
nearly complete collections of proteins, RNAs, or promoters, that
is, to mass produce the parts from the parts-lists, and then to
functionally characterize macromolecules in highly parallel as-
says, enabling global studies of the networks and systems in
which macromolecules function (Vidal 2001).
Emerging methodologies and strategies enable this
transition (Rual et al. 2004b). In particular, new operating systems
for mass cloning allow the generation of flexible, standardized
clone collections that provide compatibility between resource
collections, not only for a single organism, but across collections
from different organisms. DNA cloning tools based on recombi-
national cloning (RC), such as the Gateway cloning technology
(Hartley et al. 2000), allow the development of such resources for
systems biology (Walhout et al. 2000a,b). Here we focus prima-
rily on Gateway and Gateway-generated resources. Alternative
RC methods (Liu et al. 1998; Paddison et al. 2004) are described
and compared elsewhere in this Special Issue (Marsischky and
LaBaer 2004; see also Table 1).
Macromolecules as Components of Biological Networks
Proteins, RNAs, and DNA (e.g., promoters) require intricate
physical and functional interactions with other macromolecules,
each with particular temporal and spatial aspects, to mediate
their function. Understanding the complex networks formed by
all these interactions is critical for a global perspective of biologi-
cal systems (Barabasi and Oltvai 2004). For example, the incred-
ible robustness exhibited by many cell types might relate to the
overall topology of cellular networks (see below; Jeong et al.
2001). Thus, in addition to the reductionist characterization of
macromolecules necessary for the detailed understanding of in-
dividual interactions (molecular biology), the complex networks
they form by interacting with each other also need to be studied
(systems biology; Fig. 1; Ideker et al. 2001; Vidal 2001).
Mapping Biological Networks
Modeling biological networks requires understanding of the
structure, function, and dynamics of both the individual players
and their effect on each other. Although some information is
already contained in the scientific literature for a few thousand
proteins, promoters, and RNAs studied individually in various
organisms, these data cover only ∼10%–20% of the macromol-
ecules encoded by those organisms (Costanzo et al. 2000). Thus,
global studies of biological networks require the establishment of
“maps” providing information simultaneously on hundreds, if
not thousands, of proteins, RNAs, promoter sites, and other mac-
romolecules (and often domains thereof).
Recent work in this emerging field of biological networks
has focused on analyzing the structure of metabolic (Jeong et al.
2000), regulatory (Lee et al. 2002), and “interactome” networks
(Uetz et al. 2000; Walhout et al. 2000a). Network maps are gen-
erally visualized as graphs composed of nodes and edges. Nodes
represent the components of biological networks (metabolites,
transcription factors or promoters, and proteins or RNAs in meta-
bolic, regulatory, and interactome networks, respectively). Edges
represent interactions between those components (enzymatic re-
actions in metabolic networks and physical interactions in regu-
latory and interactome networks).
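The node-and-edge description above can be made concrete with a minimal sketch. The class name `NetworkMap`, the placeholder protein names, and the choice of an undirected graph are assumptions made for illustration only; regulatory edges in particular are often better modeled as directed.

```python
from collections import defaultdict


class NetworkMap:
    """Undirected graph: nodes are network components, edges are interactions."""

    def __init__(self):
        self.node_kind = {}                # node name -> kind, e.g. "protein"
        self.neighbors = defaultdict(set)  # node name -> names of adjacent nodes
        self.edge_kind = {}                # frozenset({a, b}) -> kind, e.g. "physical"

    def add_node(self, name, kind):
        self.node_kind[name] = kind
        self.neighbors[name]  # accessing the key registers an empty neighbor set

    def add_edge(self, a, b, kind):
        for name in (a, b):
            if name not in self.node_kind:
                raise KeyError(f"unknown node: {name}")
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)
        self.edge_kind[frozenset((a, b))] = kind

    def degree(self, name):
        """Number of distinct interaction partners of a node."""
        return len(self.neighbors[name])


# Hypothetical interactome fragment with placeholder protein names.
interactome = NetworkMap()
for protein in ("ProtA", "ProtB", "ProtC"):
    interactome.add_node(protein, "protein")
interactome.add_edge("ProtA", "ProtB", "physical")
interactome.add_edge("ProtA", "ProtC", "physical")
```

Storing the node kind and edge kind separately lets one structure hold metabolic, regulatory, or interactome maps without changing the code.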
Intriguing biological hypotheses have already emerged from
early attempts at mapping the global structure of cellular net-
works. First, cellular networks appear scale-free, that is, they con-
tain a small but significant proportion of nodes that are highly
connected, whereas most nodes are sparsely connected (Jeong et
al. 2000). The scale-free topology of cellular networks (Jeong et al.
2000; Lee et al. 2002; Li et al. 2004) might relate to cellular ro-
bustness (Jeong et al. 2001). Second, regulatory networks may
contain subnetwork motifs, defined as logical subsystems linking
E-MAIL firstname.lastname@example.org; FAX (617) 632-5739.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/
14:2001–2009 ©2004 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/04; www.genome.org
Downloaded from genome.cshlp.org on December 26, 2015 - Published by Cold Spring Harbor Laboratory Press
small numbers of network components and potentially exhibit-
ing semi-independent functions (Lee et al. 2002; Milo et al. 2002;
Shen-Orr et al. 2002). Third, analyses of interactome networks
suggest that biological processes are more interconnected mo-
lecularly than previously imagined (Walhout et al. 2000a; Wal-
hout and Vidal 2001b). Together, these observations illustrate
the need for further systematic mapping and modeling of cellular networks.
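The first observation, that a few nodes are highly connected whereas most are sparsely connected, can be checked on any edge list by tabulating node degrees. The sketch below is illustrative only: the toy edge list and the function names are hypothetical, and a network this small cannot demonstrate a scale-free (power-law) distribution, which would require fitting the degree histogram of a real map.

```python
from collections import Counter


def degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def degree_histogram(deg):
    """Map each degree value k to the number of nodes that have degree k."""
    return Counter(deg.values())


# Toy network (hypothetical): one hub linked to six nodes, plus one extra edge.
toy_edges = [("hub", f"n{i}") for i in range(1, 7)] + [("n1", "n2")]
toy_deg = degrees(toy_edges)
toy_hist = degree_histogram(toy_deg)

# Nodes sorted from most to least connected; the hub comes first.
ranked = sorted(toy_deg, key=toy_deg.get, reverse=True)
```

In a real analysis the histogram would be examined on log-log axes, where a scale-free network appears approximately linear.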
Efficiency and the Mass Production of Nodes
The challenges ahead lie in both the manipulation of hundreds
of thousands of “biological nodes” (the components) and the
determination of large numbers of “biological edges” (interac-
tions) between them (Fig. 1). Network maps for half a dozen
model organisms, humans, and key pathogens and parasites will
be of value to both academic and pharmaceutical research. Ide-
ally, nearly all proteins, noncoding RNAs, and promoters of these
organisms should be available for network analyses.
Considering these challenges globally, the generation of
nearly complete ORFeome, promoterome, and other “parts” re-
sources for these different species will require the handling of
hundreds of thousands, perhaps millions, of DNA segments. In
the context of this enormous task, conventional cloning techniques
have been increasingly replaced by recombinational cloning.
Adaptability and the Mass Production of Edges
In concert with the production and manipulation of large num-
bers of biological nodes, hundreds of thousands of biological
edges need to be mapped between them (Fig. 1).
Early attempts at mapping regulatory and interactome net-
works exist for Saccharomyces cerevisiae (Uetz et al. 2000; Ito et al.
2001; Gavin et al. 2002; Ho et al. 2002; Lee et al. 2002), Cae-
norhabditis elegans (Walhout et al. 2000a; Davy et al. 2001; Boul-
ton et al. 2002; Walhout et al. 2002; Reboul et al. 2003; Li et al.
2004), and Drosophila melanogaster (Giot et al. 2003). Although
highly informative, such projects need to be extended to greater
proportions of the respective proteomes and promoteromes. Cur-
rently available interactome maps strongly suggest that a variety
of methods will need to be applied to cross-validate the data
quality of each edge (von Mering et al. 2002; Han et al. 2004; Li
et al. 2004). False positives and false negatives inherent to high-
throughput interaction assays are reduced to more acceptable
levels when edges are tested in a variety of different assays (Han
et al. 2004).
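The cross-validation idea above amounts to keeping only those edges that are reported by more than one assay. A minimal sketch follows, assuming each assay's result is a list of unordered protein pairs; the function names, the `min_assays` threshold, and all protein names are hypothetical and chosen for illustration.

```python
from collections import Counter


def normalize(pairs):
    """Treat (A, B) and (B, A) as the same undirected edge."""
    return {frozenset(pair) for pair in pairs}


def supported_edges(assay_results, min_assays=2):
    """Return the edges reported by at least `min_assays` independent assays."""
    counts = Counter()
    for pairs in assay_results.values():
        for edge in normalize(pairs):  # each assay counts at most once per edge
            counts[edge] += 1
    return {edge for edge, n in counts.items() if n >= min_assays}


# Hypothetical results from two assays on placeholder proteins.
y2h = [("ProtA", "ProtB"), ("ProtB", "ProtC"), ("ProtC", "ProtD")]
pull_down = [("ProtB", "ProtA"), ("ProtC", "ProtD"), ("ProtA", "ProtD")]

confirmed = supported_edges({"Y2H": y2h, "GST pull-down": pull_down})
```

Requiring agreement between assays lowers the false-positive rate at the cost of discarding true interactions that only one assay can detect, so the threshold is a design choice rather than a fixed rule.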
Although metabolic, regulatory, and interactome networks
represent an important scaffold to comprehend the structure of
cellular networks, other types of edges need consideration as well
(Vidal 2001), such as protein modification networks in which
nodes represent proteins and edges represent phosphorylation,
acetylation, methylation, ubiquitination, or other posttransla-
tional regulation relationships between them; and protein–RNA
and RNA–RNA interaction networks.
Considering the many types of edges required, and that
these edges need verification by different assays, ORFeome, pro-
moterome, and other functional resources need to be produced
in ways that are compatible with multiple different biological
assay formats. For example, the production of proteins from
ORFeome resources typically involves the handling of tens of
thousands of ORFs and their incorporation into a myriad of ex-
pression vectors (Walhout et al. 2000b). Conventional cloning
techniques that use restriction enzymes and ligase pose substan-
tial challenges given such adaptability requirements of ORFeome
projects. This is primarily because such conventional cloning
strategies need to be redesigned for every new construct, both
DNA segments and vectors.
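The adaptability argument can be illustrated with a simple count. The sketch below assumes, as a simplification of the point made above, that conventional cloning needs one custom design per ORF-vector pair, whereas a recombinational system needs one archived clone per ORF plus one adapted vector per assay format. The function names and the example figures (10,000 ORFs, 5 vectors) are hypothetical.

```python
from itertools import product


def conventional_designs(n_orfs, n_vectors):
    """Assumed model: every ORF-vector pair needs its own cloning design."""
    return n_orfs * n_vectors


def recombinational_designs(n_orfs, n_vectors):
    """Assumed model: one archived clone per ORF plus one adapted vector per format."""
    return n_orfs + n_vectors


def expression_constructs(orfs, vectors):
    """Enumerate every ORF-in-vector combination that the assays would need."""
    return [f"{vector}-{orf}" for orf, vector in product(orfs, vectors)]


# Hypothetical small example using the fusion tags named in the text (DB, AD, GST).
constructs = expression_constructs(["ORF1", "ORF2", "ORF3"], ["DB", "AD", "GST"])

# Hypothetical genome-scale example.
n_conventional = conventional_designs(10_000, 5)
n_recombinational = recombinational_designs(10_000, 5)
```

The number of transfer reactions is the same in both cases (one per construct); what differs under these assumptions is the design effort, which grows multiplicatively for conventional cloning and additively for recombinational cloning.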
Function of Biological Networks
Whereas the structure of biological networks is informative,
studying the functional and dynamic features of biological networks
is also crucial to understanding cellular biology. Ultimately,
the functional consequence of interactions must be considered
to appreciate the global role of nodes and edges within a network.
A global approach to assess the function of biological net-
works in vivo uses systematic node perturbations. In S. cerevisiae,
gene knockouts are available for the whole ORFeome (Winzeler
et al. 1999; Giaever et al. 2002). In multicellular organisms, “phe-
nome” mapping analyses are now feasible with the development
of RNA interference or RNAi (Fire et al. 1998). Genome-scale
RNAi resources are available for C. elegans and D. melanogaster
Table 1. Summary of Recombinational Cloning Strategies in Terms of Demonstrated Efficiency, Adaptability, and
Compatibility (See Text), Relative to Currently Available Full-Length cDNA Resources
[Table body not recoverable; the only surviving column-header fragment is “N- & C-”.]
References to alternative strategies are as follows: gap repair (Orr-Weaver et al. 1983; Ma et al. 1987;
Oldenburg et al. 1997), UPS (Liu et al. 1998), MAGIC (Paddison et al. 2004).
Figure 1. Modeling cellular networks requires efficient recombinational cloning “operating systems.”
(A) Physical interactions. This example illustrates how the mapping of physical interactions between
proteins can benefit from the use of two different assays (“double-edged” networks). Here the yeast
two-hybrid (Y2H) system is first used with individual DB-X baits and a pooled AD-ORFeome library
(Reboul et al. 2003). Subsequently, positive Y2H interactions are retested using a different assay
(Li et al. 2004): GST pull-down followed by anti-Myc Western-blot analysis. To express high numbers
of proteins with the appropriate tags (DB, DNA binding domain; AD, activation domain; GST,
glutathione-S-transferase), large collections of archived protein-encoding open reading frames
(ORFeomes) need to be available (efficiency), in ways that allow their subcloning in many different
vectors (adaptability). The resulting “double-edged” networks are of greater overall quality.
Michael A. Brasch, James L. Hartley and Marc Vidal. ORFeome Cloning and Systems Biology: Standardized Mass
Production of the Parts From the Parts-List. Genome Res. 2004 14: 2001-2009.
Access the most recent version at doi: 10.1101/gr.2769804
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months
after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months,
it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License).