Development of a citrus genome-wide EST collection and cDNA microarray as resources for genomic studies

Plant Molecular Biology 57(3):375–391 · March 2005
DOI: 10.1007/s11103-004-7926-1
Abstract
A functional genomics project has been initiated to approach the molecular characterization of the main biological and agronomical traits of citrus. As a key part of this project, a citrus EST collection has been generated from 25 cDNA libraries covering different tissues, developmental stages and stress conditions. The collection includes a total of 22,635 high-quality ESTs, grouped in 11,836 putative unigenes, which represent at least one third of the estimated number of genes in the citrus genome. Functional annotation of unigenes which have Arabidopsis orthologues (68% of all unigenes) revealed gene representation in every major functional category, suggesting that a genome-wide EST collection was obtained. A Citrus clementina Hort. ex Tan. cv. Clemenules genomic library, which will contribute to further characterization of relevant genes, has also been constructed. To initiate the analysis of the citrus transcriptome, we have developed a cDNA microarray containing 12,672 probes corresponding to 6875 putative unigenes of the collection. Technical characterization of the microarray showed high intra- and inter-array reproducibility, as well as a good range of sensitivity. We have also validated gene expression data obtained with this microarray using an independent technique, RNA gel blot analysis.
J. Forment¹†, J. Gadea¹†, L. Huerta¹†, L. Abizanda¹, J. Agusti², S. Alamar³, E. Alos², F. Andres², R. Arribas¹, J.P. Beltran¹, A. Berbel¹, M.A. Blazquez¹, J. Brumos², L.A. Canas¹, M. Cercos², J.M. Colmenero-Flores², A. Conesa², B. Estables³, M. Gandia², J.L. Garcia-Martinez¹, J. Gimeno¹, A. Gisbert¹, G. Gomez¹, L. Gonzalez-Candelas³, A. Granell¹, J. Guerri², M.T. Lafuente³, F. Madueno¹, J.F. Marcos³, M.C. Marques¹, F. Martinez², M.A. Martinez-Godoy¹, S. Miralles¹, P. Moreno², L. Navarro², V. Pallas¹, M.A. Perez-Amador¹, J. Perez-Valle¹, C. Pons¹, I. Rodrigo¹, P.L. Rodriguez¹, C. Royo¹, R. Serrano¹, G. Soler², F. Tadeo², M. Talon², J. Terol², M. Trenor¹, L. Vaello¹, O. Vicente¹, Ch. Vidal¹, L. Zacarias³ and V. Conejero¹*
¹ Instituto de Biología Molecular y Celular de Plantas (IBMCP), Universidad Politécnica de Valencia, Laboratorio de Genómica, Avenida de los Naranjos, s/n, 46022 Valencia, Spain (*author for correspondence; e-mail vconejer@ibmcp.upv.es)
² Instituto Valenciano de Investigaciones Agrarias (IVIA), Carretera Moncada-Náquera, Km. 4.5, 46113 Moncada (Valencia), Spain
³ Instituto de Agroquímica y Tecnología de Alimentos (IATA), Apdo. 73, 46100 Burjasot (Valencia), Spain
† These authors contributed equally to this work.
Received 6 September 2004; accepted in revised form 20 December 2004
Key words: bioinformatics, citrus, EST, genomics, microarray, transcriptome
Abbreviations: BLAST, Basic Local Alignment Search Tool; CFGP, Citrus Functional Genomics Project; CEVd, Citrus exocortis viroid; CTV, Citrus tristeza virus; CV, coefficient of variation; EC, enzyme classification; EST, expressed sequence tag; FAO, Food and Agriculture Organization; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; MIPS, Munich Information Center for Protein Sequences; NCBI, National Center for Biotechnology Information; TAIR, The Arabidopsis Information Resource; UTR, untranslated region
Introduction
Citrus is the most important fruit tree crop in the world, with a total yield of more than 100 million metric tons produced in both developed and developing countries (FAO; http://apps.fao.org). Citrus is grown as a combination tree composed of a fruit-producing scion variety bud-grafted onto a rootstock variety adapted to the soil and environment of the local production area. Rootstock and scion genotype improvement in citrus is inherently costly because citrus crops require both large field acreage and many years (20–40) to adequately evaluate field performance. In addition to the long-term nature of tree breeding, variety development in citrus is difficult because of general characteristics of citrus biology, such as apomixis, sexual incompatibility, prolonged juvenility, and complex hybrid nature. This combination of adverse factors has hampered variety improvement by traditional breeding, and most current varieties have arisen by spontaneous mutation in field trees. In this scenario, functional genomics approaches, with their potential to isolate genes of agronomical relevance, generate molecular markers, characterize germplasm collections, etc., will be of great help in developing cultivars that are well adapted to environmental conditions and produce fruits with the characteristics desired by most consumers.
The first functional genomics studies carried out in plants focused on the model plant Arabidopsis thaliana (Schena et al., 1995; Girke et al., 2000; White et al., 2000; Schaffer et al., 2001). Recently, there has also been increasing interest in functional genomics initiatives applied to plants of higher economic interest, such as rice (Yamamoto and Sasaki, 1997; Xue et al., 2003; Yazaki et al., 2004), tomato (Emmanuel and Levy, 2002), sugarcane (Vettore et al., 2003), grape (http://www.vitaceae.org), Prunus spp. (Abbott et al., 2002), maize (Lawrence et al., 2004), or pine (Whetten et al., 2001; http://pine.ccgb.umn.edu). In spite of its economic importance, citrus has not been the subject of large-scale functional genomics studies, and only isolated efforts have been reported so far (Bausher et al., 2003; Shimada et al., 2003).
In an effort to create a resource for the citrus research community, a Citrus Functional Genomics Project (CFGP; http://citrusgenomics.ibmcp-ivia.upv.es) has been initiated in Spain. The project aims to develop genomic tools for large-scale studies of biological processes relevant to citriculture, ranging from developmental biology (vegetative growth, flowering, fruit set and development, fruit quality, ripening, senescence, and abscission), to biotic and abiotic stresses (virus and fungal infection, salinity, drought, iron deficiency), and post-harvest processes (responses to cold and fungal attack). Due to its economic importance, the Clementine mandarin Clemenules (Citrus clementina Hort. ex Tan. cv. Clemenules) has been selected for this study. Spain is the leading producer of clementines, with an estimated cultivated area of 90,000 hectares and a production close to 1,300,000 metric tons, of which about 900,000 tons are exported as fresh fruit. During the last ten years, 27.7 million clementine nursery plants have been planted, with the variety Clemenules representing 65% of this figure.
High-throughput sequencing technology has provided a mechanism to gain insight into genomes at the RNA level by large-scale single-pass sequencing of cDNA clones to generate expressed sequence tags (ESTs; Adams et al., 1993). By broadening the diversity of tissues from which the cDNA libraries are constructed, the rate of gene discovery can be increased. In addition, robotics and bioinformatics have provided a way to monitor the expression of thousands of genes through the construction and use of DNA microarrays (Schena et al., 1995). Accordingly, the genomic tools to be developed by the CFGP included the generation of a genome-wide EST collection from a complex set of cDNA libraries, and the construction of cDNA microarrays from the clones generated.
In this paper, we report the construction of 25 different citrus cDNA libraries covering a wide range of tissues, developmental stages, and biotic and abiotic stress conditions, as well as a C. clementina genomic library. We have also isolated and functionally annotated 22,635 high-quality ESTs, representing up to 11,836 unique transcripts or unigenes, generated so far from these libraries. As a first step to characterize the citrus transcriptome, 12,672 clones, representing 6875 putative unigenes, have been spotted on glass slides, and this cDNA microarray is currently being used for genome-wide gene expression analysis. We present the technical characterization of this microarray, as well as an example of its use to identify differentially expressed genes.
Materials and methods
Plant material
All citrus material was collected from the Citrus Germplasm Bank of IVIA, which is composed of pathogen-free plants (Navarro et al., 2002). Plant material was collected from either field trees or trees growing in insect-proof greenhouses. Table 1 details the plant material used as RNA source for the construction of cDNA libraries. For further information, visit the web pages of the Germplasm Bank (http://www.ivia.es/deps/biot/germop.htm) and the CFGP. The work is focused mainly on Clementine (C. clementina cv. Clemenules). In Spain, this scion is generally budded onto Carrizo citrange hybrid rootstock (C. sinensis × Poncirus trifoliata). Therefore, most cDNA libraries have been constructed using this rootstock-scion combination. However, to study the response to biotic and abiotic stresses, other genotypes have been chosen according to their tolerance or susceptibility to each specific stress (see Table 1).
For gene expression analysis during fruit set and early fruit development, adult C. clementina trees grafted onto Carrizo citrange were used. Flowers between anthesis and 1 week before anthesis were emasculated (i.e. petals and stamens were removed). Ovaries were collected immediately after emasculation. Fruit set was induced by dipping emasculated flowers for 10 s in 500 µM gibberellic acid (GA₃), with 0.2% Tween-20 as wetting agent. Fruits were harvested at 1, 3, 7, and 14 days after treatment.
Table 1. RNA sources used to construct the cDNA libraries.
Library     Species                            Tissue / Developmental stage
AbsAOv1     C. clementina                      Abscission zone A of developing ovaries treated with ethylene
AbsLeaSub1  C. clementina                      Laminar abscission zone of leaves from drought stressed and rehydrated plants (a)
CTVLeafMc1  C. macrophylla                     Leaves infected with Citrus tristeza virus
DroRLeaf1   C. clementina                      Leaves from plants rehydrated after drought stress for 24 h
Drought1    C. clementina                      Leaves from plants after drought stress for 5, 10 and 24 h (b)
Drought2    C. reshni                          Roots from plants after drought stress for 5, 10 and 24 h (c)
ExocortL1   C. medica                          Leaves infected with Citrus exocortis viroid
FerrChloL1  C. clementina                      Leaves from plants exposed to iron deficiency for 1 year
FerrChloR1  C. sinensis × Poncirus trifoliata  Roots from plants exposed to iron deficiency for 1 year
FlavCurFr1  C. clementina × C. tangerina       Flavedo from mature fruits heat-treated at 37 °C and stored at 2 °C
FlavFr1     C. clementina                      Flavedo from mature fruits stored at 2 °C
FlavFrSub1  C. clementina × C. tangerina       Flavedo from mature fruits stored at 2 °C (d)
FlavRip1    C. clementina                      Flavedo from ripening fruits
FlavSen1    C. clementina                      Flavedo from senescing fruits
IF1         C. clementina                      Flowers at different developmental stages, from floral buds to anthesis
NTLeaf1     C. clementina                      Leaves
OF1         C. clementina                      Senescent ovaries and 5-week old fruits treated with GA₃
OF2         C. clementina                      Ovaries from floral buds and 1-week and 3-week old fruits treated with GA₃
PhIFruit1   C. clementina                      Young and developing fruits
PhyRootSr1  C. aurantium                       Roots infected with Phytophthora citrophthora
PhyRootSw1  C. sinensis                        Roots infected with Phytophthora citrophthora
RindPdig24  C. clementina                      Flavedo and albedo from fruits infected with Penicillium digitatum
Roots1      C. sinensis × Poncirus trifoliata  Root tips from seedlings grown in Paclobutrazol or auxin for 1–4 days
SaltLeaf1   C. clementina                      Leaves from plants treated with 150 mM NaCl for 6 and 24 h and 7 days
Veg1        C. clementina                      Vegetative shoots and internodes after foliar application of GA₃ and Paclobutrazol
(a) Subtracted with RNA from laminar abscission zone of unstressed plants.
(b) Subtracted with RNA from leaves of unstressed plants.
(c) Subtracted with RNA from roots of unstressed plants.
(d) Subtracted with RNA from fruits stored at 12 °C.
Library construction and EST sequencing
Total RNA was isolated with either a standard guanidinium-based method, an acid phenol method, Trizol (Invitrogen), or a method for recalcitrant tissues described in Bugos et al. (1995), depending on the tissue. Oligotex (Qiagen) was used for poly(A)+ RNA purification. For libraries constructed from complex RNA sources (e.g., different treatments or time points), equal amounts of each poly(A)+ RNA preparation were mixed. Standard, unidirectional cDNA libraries were constructed using either the Lambda ZAP II system with the cDNA Synthesis Kit (Stratagene) or the Lambda TriplEx2 system with the SMART cDNA Library Construction Kit (Clontech), following the manufacturer's instructions. For subtraction libraries, the Smart PCR cDNA Synthesis Kit and the PCR-Select cDNA Subtraction Kit (Clontech) were used, following the manufacturer's instructions. For Lambda-based libraries, pBluescript and pTriplEx2 phagemids were excised in vivo from amplified phage libraries and selected on LB agar plates supplemented with ampicillin. For subtraction libraries, transformants in pCR2.1 were plated on LB agar plates supplemented with ampicillin for direct picking without library amplification. Randomly selected clones were grown overnight in standard selective bacterial growth media, and plasmids were isolated by the alkaline lysis method using PerfectPrep (Eppendorf) or Montage (Millipore) kits. Sequencing reactions were carried out on plasmids from the 5′ end of the cDNA insert, using an ABI 3100 capillary automatic sequencer (Applied Biosystems) with fluorescent dye terminator technology.
A C. clementina (cv. Clemenules) genomic library was constructed in the cosmid binary vector pBIC20 (Meyer et al., 1994). Genomic DNA was extracted by a modified cetyltrimethylammonium bromide (CTAB)-based procedure (McKinney et al., 1995) and partially digested with HindIII. DNA fragments were size-selected by agarose gel electrophoresis and ligated with HindIII-digested pBIC20. Ligation products were packaged in vitro using Gigapack III Gold packaging extracts (Stratagene), and used to transfect Escherichia coli NM544 cells. To determine the average insert size, 20 randomly picked clones were analyzed by HindIII restriction.
EST processing and functional annotation
A semi-automated pipeline consisting of shell and Perl scripts, running different open-source programs on Linux, has been developed for EST processing, assembly, and annotation. Raw sequences and base confidence scores were obtained from chromatogram files using the program Phred (Ewing and Green, 1998; Ewing et al., 1998) with default parameters. Vector masking and trimming were performed with cross_match (http://www.phrap.org). Sequences that had <50 non-vector good-quality bases after trimming were discarded. Reads were assembled into contigs with Phrap (http://www.phrap.org) to estimate the redundancy of the ESTs, obtain consensus sequences for the redundant ones, and derive the unigene set, using stringent clustering parameters (minmatch 50, minscore 100, trim_qual 28). To prevent over- or under-clustering, the clustering stringency was trained on the assembly of the cDNA clones representing 5 different randomly selected genes, based on their similarity to public databases and visual inspection of the sequences. Manual adjustment of contigs was made with Consed (Gordon et al., 1998).
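The read-discarding rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the project's shell and Perl pipeline. It assumes that vector bases have been masked with the character X, and it treats a base as good-quality when its Phred score is at least 20. That cutoff and the function names are assumptions, since the text does not state them.

```python
MIN_GOOD_BASES = 50   # from the text: reads with <50 such bases are discarded
MIN_PHRED = 20        # assumed quality cutoff; not stated in the text
MASK_CHAR = "X"       # assumed vector-masking character


def count_good_bases(seq, quals, min_phred=MIN_PHRED, mask_char=MASK_CHAR):
    """Count unmasked bases whose quality score reaches min_phred."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists differ in length")
    return sum(
        1
        for base, qual in zip(seq, quals)
        if base.upper() != mask_char and qual >= min_phred
    )


def keep_read(seq, quals, min_good=MIN_GOOD_BASES):
    """Return True when the read has at least min_good usable bases."""
    return count_good_bases(seq, quals) >= min_good


if __name__ == "__main__":
    # Hypothetical read: 20 masked vector bases followed by a 60-base insert
    # whose last 5 bases have low quality.
    read = "X" * 20 + "ACGT" * 15
    quals = [30] * 75 + [10] * 5
    print(count_good_bases(read, quals), keep_read(read, quals))
```

Keeping the quality cutoff and the mask character as parameters makes it easy to match whatever conventions a given trimming step actually uses.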
For functional annotation of ESTs and functional categorization of unigenes, BLASTX searches against the NCBI (National Center for Biotechnology Information) non-redundant protein (nr) database and the Munich Information Center for Protein Sequences (MIPS, ftp://ftpmips.gsf.de/cress/arabiprot) full set of Arabidopsis proteins were run locally, using default parameters and an arbitrary non-stringent E-value threshold of 10⁻⁵. The BLASTX outputs and the Arabidopsis functional annotations were parsed with scripts to extract information and to assign the descriptions, MIPS functional categories, Gene Ontology terms (GO terms), and KEGG (Kyoto Encyclopedia of Genes and Genomes) enzyme classification numbers (EC numbers) of the Arabidopsis top-scoring hits to the corresponding citrus sequences. All information generated in the pipeline was automatically uploaded to the project database by other in-house scripts. This database is a searchable MySQL relational database (http://www.mysql.com) containing data about the project (members, groups, plant material, libraries, etc.) and all relevant information about every EST (nucleotide raw and trimmed sequence, contig consensus sequences, functional annotations, orthologues in Arabidopsis and other organisms, BLAST results, etc.). The database can be accessed by the research community to retrieve data in a customized way (http://citrusgenomics.ibmcp-ivia.upv.es/database/login_page.shtml). The site also provides the opportunity to search the citrus EST sequences using BLAST.
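The top-scoring-hit selection described above might look roughly like the following Python sketch. It is not the in-house parser: it assumes the BLASTX results are available in the standard 12-column tab-separated layout (query, subject, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score), and it assumes that a hit exactly at the threshold is accepted. The sequence identifiers in the usage example are hypothetical.

```python
E_VALUE_CUTOFF = 1e-5  # threshold given in the text


def top_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue, bitscore)} for the best hit per query.

    Hits with an E-value above the cutoff are ignored; among the rest,
    the hit with the highest bit score is kept.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


if __name__ == "__main__":
    # Hypothetical records: two hits for est1, one weak hit for est2.
    records = [
        "est1\tAt2g02020\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-10\t80.0",
        "est1\tAt1g01010\t85.0\t120\t18\t0\t1\t360\t5\t124\t2e-30\t120.0",
        "est2\tAt3g03030\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.01\t25.0",
    ]
    for query, hit in top_hits(records).items():
        print(query, hit)
```

Annotations such as functional categories or GO terms can then be looked up by the subject identifier of each retained hit; queries with no entry in the result have no hit below the threshold.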
Microarray construction
DNA probes for microarrays were obtained by PCR from the cDNA clone collection. Reactions were carried out in a final volume of 100 µl using 4 ng plasmid DNA as a template, 400 nM of each primer, and 200 µM dNTPs. For cDNAs cloned in pCR2.1, M13-forward and M13-reverse primers were used. For cDNAs cloned in pBluescript, modified T7 (5′-CGACTCACTATAGGGCGAATTGG-3′) and Reverse (5′-GGAAACAGCTATGACCATGATTAC-3′) primers were used. For cDNAs cloned in pTriplEx2, Trip5 (5′-CTCGGGAAGCGCGCCATTGTGTTGGT-3′) and Trip3 (5′-ATACGACTCACTATAGGGCGAATTGGCC-3′) primers were used. Amplifications were performed in a 96-well format at an annealing temperature of 55 °C for 5 cycles and 51 °C for 30 cycles for pCR2.1, 57 °C for 5 cycles and 54 °C for 25 cycles for pTriplEx2, and 66 °C for 5 cycles and 60 °C for 30 cycles for pBluescript. Denaturing and elongation steps were 94 °C for 30 s and 72 °C for 2 min, respectively, for all plasmids. The reaction products were analyzed on agarose gels. PCR products were purified using the Multiscreen-PCR 96-well Filtration System (Millipore), and resuspended in water to a final concentration of 200–400 ng/µl. Before printing, purified PCR products were transferred to 384-well plates at a final concentration of 100–200 ng/µl in 50% DMSO. Lucidea Universal ScoreCard (Amersham) spike controls were diluted to 100 ng/µl in 50% DMSO and used to evaluate the quality of the arrays. Each calibration and negative control from the Lucidea kit was spotted eight times across the whole area of the array. Every citrus clone was spotted once. All samples were spotted on Corning UltraGAPS glass slides, using a MicroGrid II spotting device from Biorobotics, in a 48-block format with 16 by 17 spots per block. Relative humidity was kept at 45%. After printing, slides were crosslinked at 150 mJ and stored.
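As a quick consistency check on the print layout described above, the number of spot positions can be worked out from the block format. The sketch below only does that arithmetic; the constant and function names are illustrative, and the figure of 12,672 citrus probes is the one reported elsewhere in the paper. The text does not say how the positions not taken by citrus clones are divided between controls and empty spots, so the code makes no claim about that.

```python
BLOCKS = 48             # blocks per slide, from the text
SPOTS_PER_SIDE_A = 16   # "16 by 17 spots per block"
SPOTS_PER_SIDE_B = 17
CITRUS_PROBES = 12_672  # probes on the array, as reported in the paper


def array_capacity(blocks=BLOCKS, side_a=SPOTS_PER_SIDE_A, side_b=SPOTS_PER_SIDE_B):
    """Total number of spot positions on one slide."""
    return blocks * side_a * side_b


def positions_left(probes=CITRUS_PROBES):
    """Positions not occupied by singly spotted citrus clones."""
    return array_capacity() - probes


if __name__ == "__main__":
    print("spot positions per slide:", array_capacity())
    print("positions not used by citrus clones:", positions_left())
```

With these figures the layout offers 13,056 positions, leaving 384 for the replicated Lucidea controls and any blank spots.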
Microarray hybridization, scanning and data analyses
RNA samples for microarray hybridizations were labelled by the indirect method, by incorporation of 5-(3-aminoallyl)-2′-deoxy-UTP (aa-dUTP) into single-stranded cDNA during reverse transcription, followed by conjugation of fluorescent Cy3 and Cy5 as reactive N-hydroxysuccinimidyl dyes (NHS-dyes), as described on the TIGR web page (The Institute for Genomic Research; http://atarrays.tigr.org/PDF/Aminoallyl.pdf). Poly(A)+ RNA (500 ng) was used for each reaction, using both oligo-dT and random nonamers to prime the reverse transcriptase reaction. Single-stranded cDNA synthesis was carried out for 3 h at 42 °C with Superscript II reverse transcriptase (Invitrogen), and the cDNA was purified on a Qiaquick column (Qiagen) with the modifications indicated on the TIGR web page. Coupling was allowed to proceed for 1 h at room temperature in the dark using either Cy3 or Cy5 CyDye NHS-ester (Amersham Biosciences), and fluorescent cDNA was purified with a Qiaquick column (Qiagen), as described by the manufacturer. The size of the labelled fragments was monitored by running 1/10 aliquots of labelled cDNA in 1% agarose gels, using a miniprotean gel electrophoresis system (Bio-Rad Laboratories), and scanning with a GenePix 4000B scanner (Axon). The remaining probe was used to prepare the hybridization mixture.
Microarray hybridization was performed manually using Telechem Hybridization Chambers, following Corning instructions. Briefly, slides were prehybridized for 30 min in 3× SSC, 0.1% SDS, 0.1 mg/ml BSA, and hybridized overnight at 50 °C in 3× SSC, 0.1% SDS, 0.1 mg/ml salmon sperm DNA. After hybridization, slides were washed in 2× SSC, 0.1% SDS for 5 min at 42 °C; 0.1× SSC, 0.1% SDS for 10 min at room temperature; and 0.1× SSC for 5 min at room temperature. A final rinse in 0.001× SSC was done for 30 s before drying the slides in a table centrifuge (5 min at 600 g).
Slides were scanned in a GenePix 4000B scanner (Axon Instruments) at 10 µm resolution and 100% laser power, with PMT values adjusted so that the overall ratio between channels was equal to 1.0. Microarray images were analyzed and globally normalized using GenePix 4.1 (Axon Instruments) software. Only spots with background-subtracted intensity greater than two-fold the mean background intensity in at least one channel were selected and used for normalization and further analysis. Lowess normalization and data analysis were carried out using Acuity (Axon Instruments) and/or SAM (Tusher et al., 2001) software.
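The spot-selection rule above can be illustrated with a small Python sketch. It is not a reimplementation of GenePix or Acuity. Two points are assumptions made for illustration: the mean background is taken as the array-wide mean of the per-spot background values in each channel, and global normalization is modelled as centring the log2(Cy5/Cy3) ratios on their median. The field names of the spot records are hypothetical, and the Lowess step is not shown.

```python
import math
from statistics import mean, median


def filter_spots(spots, fold=2.0):
    """Keep spots whose background-subtracted signal exceeds fold times the
    mean background in at least one channel (assumed array-wide mean)."""
    bg_cy3 = mean(s["cy3_bg"] for s in spots)
    bg_cy5 = mean(s["cy5_bg"] for s in spots)
    kept = []
    for s in spots:
        sig_cy3 = s["cy3"] - s["cy3_bg"]
        sig_cy5 = s["cy5"] - s["cy5_bg"]
        if sig_cy3 > fold * bg_cy3 or sig_cy5 > fold * bg_cy5:
            kept.append({"id": s["id"], "cy3": sig_cy3, "cy5": sig_cy5})
    return kept


def global_normalized_log_ratios(kept):
    """Median-centre log2(Cy5/Cy3); spots with a non-positive signal in
    either channel are skipped because the log ratio is undefined."""
    raw = {
        s["id"]: math.log2(s["cy5"] / s["cy3"])
        for s in kept
        if s["cy3"] > 0 and s["cy5"] > 0
    }
    if not raw:
        return {}
    centre = median(raw.values())
    return {spot_id: value - centre for spot_id, value in raw.items()}


if __name__ == "__main__":
    # Hypothetical intensities for three spots.
    spots = [
        {"id": "A", "cy3": 1100, "cy3_bg": 100, "cy5": 2100, "cy5_bg": 100},
        {"id": "B", "cy3": 150, "cy3_bg": 100, "cy5": 160, "cy5_bg": 100},
        {"id": "C", "cy3": 500, "cy3_bg": 100, "cy5": 500, "cy5_bg": 100},
    ]
    print(global_normalized_log_ratios(filter_spots(spots)))
```

In this toy input spot B is dropped because neither channel clears twice the mean background, so it does not influence the normalization of the remaining spots.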
RNA gel blot analysis
For probe synthesis, 6 EST clones were selected from microarray data and used to amplify the corresponding cDNA by PCR, essentially as described for microarray printing. PCR products were sequenced to confirm the identity of the gene being tested. Total RNA (10 µg) was separated by electrophoresis on 6% formaldehyde/1.2% agarose gels and blotted onto Nytran Plus nylon membranes (Schleicher and Schuell). DNA probes were labelled with [α-³²P]dCTP using the Ready to Go DNA labelling kit (Amersham Biosciences), and purified with Quick Spin probe purification columns (Roche). The RNA blots were hybridized with the ³²P-labelled probes according to the protocol described by Church and Gilbert (1984). Ethidium bromide staining of the RNA gel was used as a control for equal loading of all lanes. Blots were stripped between hybridizations in 0.1% SDS at 90–95 °C for 1 h. Hybridization signals were quantified using a FujiBass (Fuji).
Results and discussion
Libraries constructed
Plant breeders face challenges concerning almost every aspect of plant biology, from vegetative and reproductive development to tolerance to different biotic and abiotic stresses. It is, therefore, necessary to obtain information about the genes involved in all these processes. EST projects offer the possibility of collecting genomic information on organisms whose complete genomes have not been sequenced (Whitfield et al., 2002; Brenner et al., 2003; Poustka et al., 2003; Vettore et al., 2003; Rise et al., 2004), and EST analysis has become a rapid and relatively affordable way to identify genes, proteins, and metabolic pathways through homology with other sequences in public databases. These projects obtain sequence data from a subset of the total number of genes of an organism, namely those being expressed in the tissues used to construct the cDNA libraries. The selection of the RNA sources employed for library construction is therefore a key issue when designing EST collections. The rate of gene discovery can be increased by increasing the depth of sequencing within a library or by broadening the diversity of tissues from which the libraries are constructed. In this regard, the cost of synthesizing numerous cDNA libraries must be considered, and this approach may not reveal large numbers of relatively low-copy transcripts. On the other hand, given continually lower sequencing costs, it may be more cost-effective to sequence fewer random clones from numerous non-normalized libraries to maximize the return on each sequencing reaction. Accordingly, in order to get a genome-wide overview of the repertoire and the spatial and temporal expression of citrus genes, we have constructed 19 standard non-normalized and 4 subtraction citrus cDNA libraries, covering a wide range of tissues, developmental stages, and biotic and abiotic stress conditions (Table 1).
The specific tissues and processes used to construct the libraries were selected to identify transcripts involved in the main issues relevant to citriculture. Vegetative development is important in determining the shape and general performance of trees, and has a decisive influence on the amount and quality of fruit production, as well as on cultural practices. Therefore, some libraries were constructed from vegetative tissues such as roots, leaves, shoots, and internodes. Flowering, fruit set and maturation, as well as seed production and fruit quality, are of obvious interest in citrus production; accordingly, other libraries were constructed from flowers, ovaries, and fruits covering the whole process of reproductive development, from flowering to fruit senescence and abscission. As for stress factors, drought, salinity, and iron deficiency are among the most important abiotic agents affecting citrus crops. With regard to biotic stresses, pathogens such as viruses, viroids, and fungi cause great losses. "Tristeza", caused by Citrus tristeza virus (CTV), is the most important viral disease of citrus trees in the world. Citrus exocortis viroid (CEVd) is the causal agent of exocortis disease, a bark shelling and scaling disorder on trifoliate orange rootstock. Citrus trees are also sensitive to infections caused by several fungal Phytophthora species (mainly P. citrophthora) that produce the diseases known as gummosis, foot rot (also named root rot or collar rot), and brown rot of fruits. To identify genes involved in the response of citrus trees to all these stress agents, different libraries were constructed from plants exposed to these abiotic and biotic challenges. Finally, post-harvest processes are decisive in ensuring the quality of the fruit and its acceptance by the consumer. Storage at low temperature is necessary to maintain fruit quality after harvest, as well as to reduce the amount of fungicides currently used to prevent fungal diseases (attack by Penicillium digitatum is one of the most important causes of post-harvest losses of citrus fruits). However, many cultivars do not tolerate these temperatures and develop cold damage. In order to identify genes involved in cold tolerance, chilling sensitivity, and response to fungal attack in citrus fruits, libraries from cold-stored and P. digitatum-infected fruits were also constructed.
Most libraries were constructed from Citrus clementina. However, to isolate genes relevant to tolerance or susceptibility to different stresses, several libraries were made from other citrus species that are known to respond differentially to each stress (Table 1). Thus, Cleopatra mandarin (C. reshni Hort. ex Tan.), a rootstock tolerant to different abiotic stresses, was selected for constructing the Drought2 library. Similarly, C. macrophylla Wester, a rootstock susceptible to Citrus tristeza virus (CTV) infection, was selected for the CTVLeafMc1 library, whereas C. medica, an indicator for the biological indexing of Citrus exocortis viroid (CEVd) displaying a variety of symptoms ranging from severe to very mild, was selected for the ExocortL1 library. Sour orange (C. aurantium L.) and sweet orange (C. sinensis (L.) Osb.) were chosen for their respective tolerance and susceptibility to Phytophthora citrophthora infection; PhyRootSr1 and PhyRootSw1 are the corresponding libraries. The hybrid Fortune mandarin (C. clementina × C. tangerina), whose fruits are sensitive to chilling injury during cold storage, was used for the FlavCurFr1 and FlavFrSub1 libraries. In addition, some libraries from root tissues were constructed with RNA from the Carrizo citrange hybrid rootstock (C. sinensis × Poncirus trifoliata), since Clementine is usually grafted onto this rootstock in Spain. The species studied in the present work are unlikely to be genetically very distant from each other, since most cultivated Citrus species known today originated by sexual hybridization from common ancestors (Barrett and Rhodes, 1976). Therefore, homologous genes are expected to have a very high level of identity, and the origin of the different ESTs was not taken into account during contig assembly and analysis.
To help characterize relevant genes identified in the EST collection, a C. clementina Hort. ex Tan. cv. Clemenules genomic library was constructed in the cosmid binary vector pBIC20 (Meyer et al., 1994). Two main features of the library make it suitable for this purpose. First, DNA fragments in the range of 8–25 kb can be efficiently cloned, allowing the characterization of relevant genes in terms of sequence completion or isolation of gene promoters. Second, transfer of large DNA fragments into plant genomes has been successfully reported for this binary vector (Meyer et al., 1994), facilitating studies of the function of the cloned genes. Approximately 115,000 primary recombinant clones with an average insert size of approximately 17 kb were obtained, which implies a statistical completion above 99% (data not shown).
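The text does not say how the statistical completion was calculated. One common way to estimate it is the Clarke and Carbon relation, P = 1 - (1 - I/G)^N, where N is the number of clones, I the average insert size and G the haploid genome size. The Python sketch below applies that relation under clearly hypothetical assumptions: the genome size of 380 Mb is a placeholder chosen for illustration and does not come from the paper, and the function names are invented here.

```python
import math

N_CLONES = 115_000               # primary recombinant clones, from the text
INSERT_BP = 17_000               # average insert size, from the text
ASSUMED_GENOME_BP = 380_000_000  # hypothetical placeholder, NOT from the text


def library_completeness(n_clones, insert_bp, genome_bp):
    """Probability that a given sequence is present in the library."""
    fraction = insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** n_clones


def clones_needed(probability, insert_bp, genome_bp):
    """Smallest number of clones giving at least the requested probability."""
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


if __name__ == "__main__":
    p = library_completeness(N_CLONES, INSERT_BP, ASSUMED_GENOME_BP)
    fold = N_CLONES * INSERT_BP / ASSUMED_GENOME_BP
    print(f"genome equivalents cloned: {fold:.2f}")
    print(f"estimated completeness: {p:.4f}")
    print("clones needed for 99%:", clones_needed(0.99, INSERT_BP, ASSUMED_GENOME_BP))
```

With this placeholder genome size the relation gives a probability above 0.99, in line with the reported figure; a different genome size would change the estimate accordingly.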
EST collection
A total of 25,536 independent cDNA clones were
randomly isolated from the 25 cDNA libraries and
single-pass sequenced, from the 5
0
end of the clone,
to generate an EST collection. These cDNA clones
are publicly available from the corresponding au-
thor upon request. In total, 22,635 ESTs having
more than 50 non-vector high-quality bases were
obtained, which indicates a global sequencing
success of 89%. All libraries sequenced well, with a
range of 67–98% high-quality sequences. Only 4
libraries had a sequencing success lower than 80%,
and 12 libraries yielded more than 90% good
sequences. High-quality sequences were deposited
in the dbEST division of GenBank (accession
numbers CX286781-CX309414). Size distribution
(Figure 1A) showed that, after trimming of vector
and poor quality sequences, most ESTs (75%)
were longer than 400 bp, with an average sequence
length of 500 nucleotides. EST assembly revealed
that ESTs could be clustered in 8387 singletons
and 3449 contigs, yielding a total of 11,836 puta-
tive unique transcripts. This number of unigenes is
probably an overestimation of the number of un-
ique transcripts isolated, as failure of ESTs from a
single transcript to assemble can result from non-
overlapping ESTs, alternate splicing, sequence
polymorphism, and sequencing errors. On the
other hand, in some cases, members of closely
related gene families cannot be resolved into
individual contigs, resulting in an underestimation.
Levels of redundancy after EST assembly have
been estimated to be 20–22% in other EST col-
lections (Kawai et al., 2001; Whitfield et al., 2002;
Vettore et al., 2003). In our case, a total of 8060
citrus putative unigenes with an Arabidopsis hit in
the BLASTX searches had ‘‘best hits’’ to 6023
different Arabidopsis sequences, suggesting a
maximum of 25% redundancy. However, this
higher redundancy may reflect the stringent clus-
tering parameters used to assemble our ESTs into
contigs (see Material and methods), combined
with the genetic diversity present in the plants
sampled. In addition, genomic contamination is
often a minor problem with EST sequencing pro-
grams (Clark et al., 2003). Although genomic
contamination is likely to be highly library-
dependent, one could expect that some of the ESTs
are the result of genomic contamination. Thus the
11,836 putative unigenes may represent some-
where in the order of 8877–9469 citrus genes (75–
80% of the total number of putative unigenes).
Assuming that citrus genomes have about the
same number of protein encoding genes as Ara-
bidopsis does (26,484) (MIPS, November 2004),
our collection represents about one third of the
citrus genes (in the worst scenario). In the best
case, this figure would increase to half of the total
number of citrus genes. Furthermore, the se-
quences reported in this study provide a set of
valuable data on citrus since the ESTs generated
were derived from biologically and agronomically
relevant tissues.
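The figures in this paragraph follow from a few ratios, recomputed in the short sketch below. All inputs are numbers quoted in the text. The variable names are illustrative, and the best-case value assumes that it refers to the uncorrected unigene count, which is only one possible reading of the text.

```python
# Figures quoted in the text.
clones = 25_536
ests = 22_635
singletons, contigs = 8_387, 3_449
unigenes = singletons + contigs                      # 11,836

sequencing_success = 100 * ests / clones             # about 89%

# 8060 unigenes had best hits to 6023 distinct Arabidopsis sequences.
max_redundancy = 100 * (1 - 6_023 / 8_060)           # about 25%

# Unigene count corrected for 20-25% redundancy.
low, high = 0.75 * unigenes, 0.80 * unigenes

# Arabidopsis protein-coding gene count used as a proxy for citrus.
arabidopsis_genes = 26_484
worst_case = 100 * low / arabidopsis_genes           # about one third
# ASSUMPTION: best case taken as the uncorrected unigene count.
best_case = 100 * unigenes / arabidopsis_genes

print(f"success {sequencing_success:.1f}%, redundancy {max_redundancy:.1f}%")
print(f"corrected unigenes {low:.0f}-{high:.0f}")
print(f"coverage {worst_case:.1f}% to {best_case:.1f}%")
```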
The number of ESTs per contig ranged between
2 (1650 contigs) and 76 (one contig, corresponding
Figure 1. (A) Size distribution of citrus ESTs (x-axis: EST length in nucleotides; y-axis: number of ESTs). (B) Distribution of citrus ESTs among contigs (x-axis: number of ESTs per contig; y-axis: number of contigs).
to a polyubiquitin gene), while most contigs (80%)
contained 4 or fewer ESTs (Figure 1B). ESTs that
constitute large clusters partially overlap, thus
allowing the reconstitution of the full-length
sequence of genes without having to use expensive
and labor-intensive primer walking sequencing.
These redundant sequences are also a source of
single nucleotide polymorphisms (SNPs) for
molecular marker development. Only 59 contigs
(1.7% of total) included >20 EST sequences,
mainly coming from either subtracted libraries or
standard libraries from plants subjected to different
stresses, in which these genes are highly represented.
In order to keep the number of contigs
with a high number of ESTs low, we are currently
using hybridization methods to avoid isolation of
highly represented clones. Table 2 shows the dis-
tribution per library of the number of ESTs, sin-
gletons, contigs, and unigenes, as well as their
redundancy and novelty. Since we are interested in
the optimization of the sequencing effort of the
entire project, redundancy for each library was
calculated as the percentage of ESTs in this library
that correspond to unigenes already obtained in
the whole project. Although this number is nec-
essarily higher than redundancy within the library,
Table 2 shows that most libraries (17) had a level
of redundancy below 30%, while in others (4) it
was higher but still acceptable (30–40%). Only 4
libraries, mainly those that were subtracted and
enriched in differentially expressed transcripts, had
more than 40% redundant ESTs. Novelty is cal-
culated as the percentage of unigenes in each li-
brary that have been isolated only from that
particular library (unique unigenes). This number
represents the level of uniqueness of the library,
which can be considered as an estimation of the
capacity of the library to provide new genes to the
Table 2. cDNA libraries statistics.
Library Clones ESTs Singletons Contigs Unigenes Redundancy (%)ᵃ Unique unigenes Novelty (%)ᵇ
AbsAOv1 288 224 47 96 143 37 57 39
AbsLeaSub1 768 713 134 224 358 50 210 58
CTVLeafMc1 1248 1076 63 87 150 87 122 81
DroRLeaf1 576 565 264 236 500 12 277 55
Drought1 768 639 116 226 342 47 136 39
Drought2 1632 1443 686 402 1088 25 769 70
ExocortL1 672 650 343 248 591 10 368 62
FerrChloL1 864 640 237 263 500 22 254 50
FerrChloR1 1056 877 317 382 699 21 347 49
FlavCurFr1 1152 1123 427 554 981 13 442 45
FlavFr1 2592 2184 627 793 1420 35 695 48
FlavFrSub1 960 643 161 140 301 54 186 61
FlavRip1 384 312 106 171 277 12 110 39
FlavSen1 288 213 68 126 194 9 69 35
IF1 2496 2345 726 905 1631 31 912 55
NTLeaf1 384 344 105 165 270 22 112 41
OF1 192 164 81 78 159 4 82 51
OF2 1824 1492 691 586 1277 15 735 57
PhIFruit1 576 557 204 272 476 15 216 45
PhyRootSr1 960 885 566 276 842 5 578 68
PhyRootSw1 1920 1820 1126 492 1618 12 1176 72
RindPdig24 1152 1129 384 569 953 16 409 42
Roots1 192 175 48 93 141 20 55 39
SaltLeaf1 768 688 141 323 464 33 151 32
Veg1 1824 1734 719 703 1422 18 726 51
Whole collection 25,536 22,635 8387 3449 11,836 48
ᵃ Redundancy = (1 − (Unigenes/ESTs)) × 100.
ᵇ Novelty = (Unique unigenes/Unigenes) × 100.
collection, or gene discovery. So far, libraries have
novelty in the range of 32–81%, with only 2
libraries having a novelty lower than 39%, and 6
of them exceeding 60% (Table 2). It has to be
noted, however, that libraries with higher novelty
were made from citrus species other than
C. clementina, thus indicating that inter-species
sequence divergence is probably higher than
expected. This possibly produces overestimation of
the number of putative unigenes. The potential of
the library collection is determined by the low
levels of redundancy and high percentages of
novelty of most libraries. A major advantage of
EST sequencing from multiple libraries is that it
increases the possibility to identify genes that are
putatively transcribed specifically within a certain
tissue, during a particular developmental phase, or
under some environmental conditions. Our anal-
ysis revealed that 9194 unigenes (78% of 11,836)
are so far library-specific. Although the number of
ESTs in our collection is still low, these results
support the suitability of using a complex set of
cDNA libraries to generate a genome-wide EST
collection. Thus, as long as the redundancy level of
the libraries is reasonably low, generation of ESTs
to identify new citrus unigenes will be continued.
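The two library statistics can be recomputed directly from the footnote formulas of Table 2. The sketch below does this for three rows copied from the table. The function names are illustrative, and the results may differ from the tabulated percentages by up to one point, depending on how the published values were rounded.

```python
def redundancy(unigenes: int, ests: int) -> float:
    """Redundancy (%) = (1 - unigenes/ESTs) x 100, as in the Table 2 footnote."""
    return (1 - unigenes / ests) * 100

def novelty(unique_unigenes: int, unigenes: int) -> float:
    """Novelty (%) = (unique unigenes/unigenes) x 100, as in the Table 2 footnote."""
    return unique_unigenes / unigenes * 100

# Rows copied from Table 2: library -> (ESTs, unigenes, unique unigenes)
libraries = {
    "DroRLeaf1": (565, 500, 277),
    "CTVLeafMc1": (1076, 150, 122),
    "OF1": (164, 159, 82),
}

for name, (n_ests, n_unigenes, n_unique) in libraries.items():
    print(f"{name}: redundancy {redundancy(n_unigenes, n_ests):.1f}%, "
          f"novelty {novelty(n_unique, n_unigenes):.1f}%")
```

The subtracted CTVLeafMc1 library illustrates the trade-off described above: it is highly redundant, yet most of its few unigenes are found nowhere else in the collection.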
A total of 4177 ESTs had BLASTX matches to
the NCBI (National Center for Biotechnology
Information) non-redundant protein database en-
tries with alignments that include the first amino
acid of the protein present in the NCBI database.
In most of these cases, the EST also contains
additional sequence 5′ of the alignment, possibly
corresponding to 5′-UTR. These cDNA clones
were assumed to be full-length, since cDNA li-
braries were constructed using oligo-dT as a pri-
mer in the cDNA synthesis reaction. These ESTs
represent 26% of the total number of CFGP ESTs
having matches in the nr database, indicating that
about 6600 of the cDNA clones isolated are
full-length (26% of the total number of CFGP clones).
However, identification of full-length clones using
such BLASTX sequence similarity is a relatively
crude method, and this figure of 26% has to be
taken only as a rough estimate.
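The criterion described above (an alignment that includes the first amino acid of the database protein, preferably with extra query sequence upstream) can be sketched as a simple filter. This is not the pipeline used in the project. It assumes that the BLASTX results are available in the common 12-column tab-separated layout (query, subject, identity, length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score) and that the first line for each query is its best hit. The function name is hypothetical.

```python
def putative_full_length(tabular_lines):
    """Map each query to (starts_at_first_residue, has_upstream_sequence).

    ASSUMPTIONS: 12-column tab-separated BLASTX output, best hit listed
    first for every query, and forward-frame hits only (5' reads).
    """
    calls = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query = cols[0]
        if query in calls:
            continue                      # keep only the first hit per query
        qstart, qend, sstart = int(cols[6]), int(cols[7]), int(cols[8])
        forward = qstart < qend           # hit in a forward reading frame
        full_length = forward and sstart == 1
        # Query bases before the alignment may correspond to 5'-UTR.
        calls[query] = (full_length, full_length and qstart > 1)
    return calls

example = [
    "EST1\tprotA\t85.0\t120\t18\t0\t61\t420\t1\t120\t1e-50\t200",
    "EST1\tprotC\t60.0\t80\t32\t0\t90\t329\t40\t119\t1e-10\t60",
    "EST2\tprotB\t70.0\t100\t30\t0\t4\t303\t25\t124\t1e-30\t150",
]
print(putative_full_length(example))
```

As noted in the text, a call of this kind is only a rough indication, since it depends entirely on the annotated start of the matched protein.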
For libraries constructed from plants infected
with fungi, we did not separate pathogen tissue
from host tissue. It is therefore possible that a
portion of the sequences derived from these li-
braries is of fungal, not citrus, origin. In order to
estimate the proportion of fungal ESTs in these
libraries, we analyzed the presence of Phytophthora
sequences in the 2705 ESTs from the PhyRootSr1
and PhyRootSw1 libraries, constructed
from Phytophthora-infected roots (see Table 1).
Using BLASTN against the collection of 77,544
Phytophthora spp. sequences from Phytophthora
Functional Genomics Database (http://www.pfgd.org)
we identified 266 ESTs with significant hits to
Phytophthora sequences (E-value < 10⁻⁵).
Most of them (243) were also very
similar to Arabidopsis sequences when compared
with the Arabidopsis full set of proteins, and 5
more were similar to non-Arabidopsis plant pro-
teins in the NCBI nr database. Consequently, these
248 ESTs probably represent conserved sequences
and we cannot discriminate whether they are plant
or fungal ESTs. The remaining 18 ESTs have a
high probability to correspond to Phytophthora
genes. The number of Phytophthora genes in these
libraries will therefore range between 18 and 243
(0.7–9.0%). A similar proportion of ESTs of fun-
gal origin (4%) was obtained in a similar analysis
of the 1129 ESTs of the RindPdig24 library, con-
structed from P. digitatum-infected fruits. We have
flagged these ESTs in order to take this into
account in subsequent analyses. These data suggest
that only a small fraction of the sequences
obtained from the pathogen-challenged libraries
were derived from the pathogen.
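The reasoning in this paragraph amounts to a set comparison followed by two ratios. The sketch below uses made-up EST identifiers for the classification step, while the percentages use the counts quoted in the text. The function names are illustrative.

```python
def classify_pathogen_hits(pathogen_hits: set, plant_hits: set):
    """Split ESTs with a pathogen hit into likely-pathogen and ambiguous sets."""
    ambiguous = pathogen_hits & plant_hits        # conserved, origin unclear
    likely_pathogen = pathogen_hits - plant_hits  # no plant counterpart found
    return likely_pathogen, ambiguous

def percent(n: int, total: int) -> float:
    return 100 * n / total

# Hypothetical identifiers, for illustration only.
likely, ambiguous = classify_pathogen_hits({"e1", "e2", "e3"}, {"e2", "e3", "e9"})

# Counts quoted in the text for the two Phytophthora-infected root libraries.
total_ests = 2705
lower = percent(18, total_ests)    # ESTs with a pathogen hit only
upper = percent(243, total_ests)   # upper bound as quoted in the text
print(f"{lower:.1f}% to {upper:.1f}%")
```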
Functional annotation of citrus unigenes
Functional categorization of citrus putative unigenes
was made on the basis of the functional category
attributed by MIPS (Munich Information
Center for Protein Sequences; http://mips.gsf.de)
to the most similar Arabidopsis protein. This
analysis indicated that 3776 citrus unigenes (nearly
32%) do not produce any significant matches
(Figure 2). This group probably includes citrus- or
tree-specific genes, but also conserved genes whose
ESTs in this project contain mainly noncoding
sequence (e.g. 5′- or 3′-UTR) or are too short to
yield a significant score in the BLASTX searches.
To estimate the proportion of truly citrus- or tree-
specific genes, the citrus unigenes were screened to
identify those with a high-quality sequence long
enough to include a significant portion of the open
reading frame. We performed this analysis for
unigenes with at least 450, 500, 550, 600, and
650 bp high-quality sequence, obtaining 7417,
6735, 5893, 4965, and 4031 different sequences,
respectively. The analysis was also performed for
unigenes with more than 700 bp high-quality
sequence, but the number of unigenes was highly
reduced (2989) and skewed towards contigs com-
posed by a high number of ESTs. Accordingly, we
did not consider them in the analysis. We calcu-
lated then the proportion of these unigenes which
fail to get an Arabidopsis BLASTX hit (Figure 3).
The results obtained revealed that an increase in
query sequence length decreases the probability of
not finding a significant match in protein data-
bases. However, longer sequences did not reduce
the proportion of citrus unigenes without an
Arabidopsis orthologue below 15%. This result
indicates that at least 15% of the unigenes in our
EST collection (about 1800 out of 11,836) are
estimated to be citrus- or tree-specific, and
demonstrates the importance of EST sequencing in crop
plants, as it is highly likely that there are genes
specific to crop species which are absent in model
plants. Further studies on these genes could reveal
new important and interesting proteins, as well as
unique biosynthetic pathways not yet discovered
in other systems. In any case, additional citrus
ESTs will improve gene annotation in the collec-
tion by assembling more ESTs into larger contig
sequences.
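The length-threshold analysis can be expressed compactly: for each minimum length, keep the unigenes at least that long and compute the share without an Arabidopsis hit. The sketch below uses a small made-up data set, not the project data, and the function name is hypothetical.

```python
def no_hit_percent_by_length(unigenes, thresholds):
    """unigenes: iterable of (high_quality_length_bp, has_arabidopsis_hit).

    Returns {threshold: percentage of unigenes >= threshold with no hit},
    or None for a threshold that no unigene reaches.
    """
    unigenes = list(unigenes)
    result = {}
    for t in thresholds:
        subset = [has_hit for length, has_hit in unigenes if length >= t]
        if subset:
            no_hit = sum(1 for has_hit in subset if not has_hit)
            result[t] = 100 * no_hit / len(subset)
        else:
            result[t] = None
    return result

# Toy data (hypothetical): (length in bp, BLASTX hit found?)
toy = [(300, False), (460, False), (480, False), (520, True),
       (610, True), (640, False), (700, True)]
print(no_hit_percent_by_length(toy, [450, 600, 1000]))
```

In the real data the percentage levels off at about 15% as the threshold rises, which is the basis for the estimate of roughly 1800 citrus- or tree-specific unigenes (15% of 11,836).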
Of the remaining citrus unigenes, 1893 (16%)
were similar to Arabidopsis proteins with no
function assigned yet by MIPS (Fig. 2). Putative
MIPS-style functional categories could be
assigned to the rest of the transcripts (represent-
ing 6167 unigenes). Most of the putative proteins
appeared to be involved in metabolism (22%),
followed by those participating in protein synthesis
and processing (9%), transcription (6%),
and cell cycle and DNA processing (5%). It is
worth noting that we have obtained citrus ESTs
for every major MIPS functional category, indi-
cating that a genome-wide EST collection has
been generated. Moreover, the distribution of the
functionally annotated citrus unigenes in these
functional categories roughly resembles that of
Arabidopsis full set of proteins (Figure 4). The
Figure 2. Distribution of citrus EST unigenes amongst functional categories.
Category                                        % of unigenes
No similar Arabidopsis protein                  31.90
Metabolism                                      22.28
No functional classification                    15.99
Transcription                                    6.16
Cell cycle and DNA processing                    4.89
Protein fate                                     4.67
Protein synthesis                                3.99
Signal transduction                              2.73
Cell rescue and defense                          1.80
Energy                                           1.65
Cellular transport                               1.23
Regulation of / Interaction with environment     0.71
Cell fate                                        0.53
Control of cellular organization                 0.44
Development (systemic)                           0.35
Transport facilitation                           0.34
Transposable elements and viral proteins         0.21
Tissue differentiation                           0.06
Storage proteins                                 0.05
Cell type differentiation                        0.01
Figure 3. Percentage of citrus unigenes with no Arabidopsis similar protein when only sequences above different lengths were considered (x-axis: minimum length in bp, 0 and 450–650; y-axis: % of unigenes with no Arabidopsis match).
biggest difference was found in the lower per-
centage of transposable elements and viral pro-
teins in the citrus unigene set. This is probably
due to the fact that the full set of proteins of
Arabidopsis was obtained from the genome
sequence, where gene discovery is independent of
the level of gene expression, while the citrus EST
collection was obtained from the citrus tran-
scriptome, and genes with low expression are
therefore under-represented. In addition, genes
encoding for proteins involved in transcription or
in signal transduction were also under-represented
in the citrus collection, probably due to their low
level of expression. Finally, genes encoding pro-
teins for metabolism and protein synthesis were
over-represented in the citrus collection when
compared with the representation of these genes
in the Arabidopsis genome. This probably reflects
the housekeeping role of these proteins and their
high level of expression in many different tissues
and developmental stages. The similarity between
the functional distribution of our EST collection
and the functional distribution of an organism’s
complete set of genes indicates again that a gen-
ome-wide EST collection has been generated.
ESTs were also annotated with the Gene
Ontology (GO) terms and the KEGG enzyme
classification (EC) number assigned to the most
similar Arabidopsis protein. A total of 16,092
ESTs (71% of the 22,635 ESTs) displayed mat-
ches to Arabidopsis proteins (data not shown).
Of these, 9294 (41% of the total number of ESTs)
could be annotated in at least one of the 3 dif-
ferent GO ontologies (Molecular Function, Bio-
logical Process, or Cellular Component), with
3925 ESTs (17%) annotated in the Molecular
Function ontology, 2479 (11%) in the Biological
Process ontology, 8177 (36%) in the Cellular
Component ontology, and 1867 (8%) in the 3
ontologies. Only 2743 ESTs (12%) could be
annotated with a KEGG EC number. We expect
that ongoing improvements in functional anno-
tation for Arabidopsis proteins will lead to future
improvements in citrus EST annotation and will
allow assignment of putative functions to many
of the citrus genes with no function ascribed.
All these annotations, along with information
on source cDNA library, trimmed nucleotide
sequence, contig assembly, and full BLAST
reports, were recorded in the database of the
project and are available on the web site, with
hyperlinks to NCBI Entrez nucleotide
(http://www.ncbi.nih.gov/Entrez/), Gene Ontology
(http://www.geneontology.org), KEGG
(http://www.genome.ad.jp/kegg), TAIR (The Arabidopsis
Information Resource; http://www.arabidopsis.org)
and MIPS (http://mips.gsf.de) databases.
Moreover, the database may be queried for
annotated data or for sequence/library informa-
tion, returning all the CFGP clones matching the
search criteria. For each clone returned, a set of
links leads the user to detailed information on that
Figure 4. Comparison of the distribution of unigenes in MIPS functional categories between Arabidopsis thaliana genome and CFGP citrus EST collection (x-axis: % of unigenes; series: Arabidopsis thaliana and Citrus spp.).
clone, including library information, nucleotide
raw and trimmed sequence, contig consensus
sequences and EST members, functional annota-
tions, orthologues in Arabidopsis and other
organisms, BLAST results, etc.
Microarray construction and technical
characterization
In order to generate a cDNA microarray for gene
expression analysis in citrus, 12,672 clones of the
CFGP collection were PCR-amplified and pre-
pared for glass slide printing. Analysis of PCR
products revealed that 92% of the clones yielded
unique bands in the PCR reaction. The clones
grouped in 6875 unigenes, following the same
contig distribution shown in Figure 1B for the
whole EST collection. Accordingly, a high per-
centage of unigenes represented in the microarray
were printed 2 to 5-fold, allowing intra-slide
confirmation of expression data. On the other
hand, around 5400 genes were represented only
once in this microarray. The clones printed onto
the microarray come from 18 different cDNA li-
braries, with a tissue distribution of 35% leaves,
15% flowers, 19% roots, 20% young fruits, and
11% flavedo. Most clones (85%) were isolated
from Clementine libraries, whereas 15% were
obtained from other citrus varieties or rootstocks.
Cross-hybridization is expected to occur in cDNA
microarrays under normal hybridization condi-
tions when sequence identity between probes and
samples is higher than 70% (Evertsz et al., 2001),
and the use of cDNA microarrays for inter-spe-
cies hybridizations has been reported in plants
(Horvath et al., 2003) and animals (Rise et al.,
2004). Consequently, this microarray should be
suitable for experiments carried out with different
citrus species. Although we have not yet per-
formed a detailed study of heterologous hybrid-
ization using different citrus varieties, preliminary
results showed that no significant differences in
signal intensity were found between non-clemen-
tine and clementine probes when hybridized with
citrus RNA samples from different citrus varieties
or species (unpublished results).
We next assessed the reliability of our micro-
array platform. Technical replicates (8) from
the same plant material (young leaves from
C. clementina greenhouse plants) were performed.
For every replicate, mRNA was extracted
separately, and the different preparations were
Cy5-labelled and hybridized to slides from the
same batch. A common Cy3-labelled reference
made from leaves of 40 independent greenhouse
C. clementina plants was used in all replicates.
Spots that showed background-subtracted fluo-
rescence signals below threshold levels (two-times
the mean background in at least one channel) were
filtered out for further analysis. Negative controls
fell always within this group. Microarray data
were then Lowess-normalized to account for
intensity-dependent dye differences between chan-
nels (Yang et al., 2002). Figure 5 displays box
plots of the Lowess log₂-ratios for each of the 8
replicates. The box plots are centered at zero and
have fairly similar spreads. Accordingly, we chose
not to adjust for scale, as the noise introduced by a
scale normalization of different replicates may be
more detrimental than a small difference in scale.
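The filtering rule and the log-ratio computation that precede normalization can be sketched as follows. This is a minimal illustration rather than the analysis code of the project. The function name and the example intensities are hypothetical, and the Lowess fit itself is not implemented here. In practice an intensity-dependent curve would be fitted to M against A and subtracted from M.

```python
import math

def filter_and_log_ratio(spots, mean_bg_cy5, mean_bg_cy3, factor=2.0):
    """Drop weak spots and return (spot_id, M, A) for the remaining ones.

    spots: iterable of (spot_id, cy5, cy3) background-subtracted signals.
    A spot is removed when either channel is below factor x mean background.
    M = log2(Cy5/Cy3); A = mean log2 intensity of the two channels.
    """
    kept = []
    for spot_id, cy5, cy3 in spots:
        if cy5 < factor * mean_bg_cy5 or cy3 < factor * mean_bg_cy3:
            continue
        m = math.log2(cy5 / cy3)
        a = 0.5 * math.log2(cy5 * cy3)
        kept.append((spot_id, m, a))
    return kept

# Hypothetical intensities for three spots, mean background of 100 per channel.
spots = [("s1", 800.0, 400.0), ("s2", 150.0, 900.0), ("s3", 1000.0, 1000.0)]
for spot_id, m, a in filter_and_log_ratio(spots, 100.0, 100.0):
    print(spot_id, round(m, 2), round(a, 2))
```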
Dynamic range and sensitivity levels were
determined using Lucidea Universal ScoreCard
spiked controls. Although in some cases we were
able to detect as low as 15 pg mRNA, reproducible
and consistent results were obtained when
at least 50 pg mRNA were included in the sample
(500 ng of poly(A)⁺ RNA) (data not shown),
which represents around 10 copies per cell (Ruan
et al., 1998). Dynamic range was found to cover
2.5 orders of magnitude.
Intra-array reproducibility was first estimated
comparing the intensity level and ratio of dupli-
cate calibration controls distributed across the
array. Although signal intensity varied slightly
along the different sections of the array (data not
Figure 5. Box-plot displaying the log ratio for different microarray replicates after Lowess normalization (x-axis: replicates 1–8; y-axis: Lowess log ratio).
shown), Cy5/Cy3 ratio values were quite consis-
tent, with coefficient of variation (CV) values not
higher than 14%. We next calculated the Cy5/
Cy3 ratio values for different clones belonging to
the same contig. This analysis was done for 10
randomly selected genes with 3–5 representatives
in the microarray (data not shown). Average CV
was 12%, with a range of 8% to 16%, and only 3
contigs showed a CV intra-array higher than
13%. This result indicates that relative expression
levels between the Cy5 and Cy3 labelled samples
were consistent between members of the same
contig, even though the length and origin of the
different representatives were different.
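The coefficient of variation used throughout this section is the standard deviation expressed as a percentage of the mean. The sketch below shows the calculation for the ratios of several clones from one contig. The ratio values are invented for illustration, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was applied.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV (%) = 100 x standard deviation / mean.

    ASSUMPTION: sample standard deviation (n - 1 denominator).
    """
    return 100 * stdev(values) / mean(values)

# Hypothetical Cy5/Cy3 ratios for four clones belonging to the same contig.
contig_ratios = [1.9, 2.1, 2.3, 2.0]
print(f"CV = {coefficient_of_variation(contig_ratios):.1f}%")
```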
Inter-array reproducibility was assessed by
comparing the results obtained in the 8 technical
replicates. An average of 74% of the spots showed
background-subtracted fluorescence levels higher
than 2-fold background levels. This value ranged
from 66% to 79% in the different slides, and only
two slides had fewer than 70% of spots above
threshold levels (data not shown). Moreover, when
comparing the spots with signal intensity above
threshold level in the different slides, we found that
virtually the same spots were consistently selected
in the 8 replicates (see Table 3). We also calculated
the CV in replicates for each of the spots. The
frequency histogram in Figure 6A shows that this
CV was 8–15% for most of the spots, with an
average CV of 14% and 86% of the spots showing
a CV of <20%. We then plotted CV values
against overall dynamic signal range, as a function
of log₁₀(Cy5 + Cy3) fluorescence signal (Figure 6B).
The average CV observed was also
around 14% for spots with log₁₀ signal below 4, a
group including 96% of spots. CVs between
technical microarray replicates usually range
between 10–12% (Yue et al., 2001), indicating that
our microarray platform is yielding reproducible
results.
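The pairwise comparison summarized in Table 3 can be computed from the sets of spots that pass the threshold in each replicate. The following sketch uses three small made-up sets. The function name is hypothetical, and the row and column convention follows the table legend (percentage of spots selected in the row replicate that were also selected in the column replicate).

```python
def overlap_percentages(selected):
    """selected: list of sets of spot ids above threshold, one per replicate.

    Returns a square table where entry [i][j] is the percentage of spots
    selected in replicate i that were also selected in replicate j
    (None on the diagonal).
    """
    n = len(selected)
    table = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and selected[i]:
                shared = len(selected[i] & selected[j])
                table[i][j] = 100 * shared / len(selected[i])
    return table

# Hypothetical spot sets for three replicates.
replicates = [{1, 2, 3, 4}, {1, 2, 3}, {2, 3, 4, 5}]
for row in overlap_percentages(replicates):
    print(row)
```

The table is not symmetric, because each row is normalized by the number of spots selected in that replicate.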
Validation of microarray to identify changes
in gene expression
Gene expression analysis in ovaries and young
fruits of C. clementina was carried out as a pre-
liminary experiment to validate the use of the
cDNA microarray by comparing data obtained
with RNA gel blot analysis. Ovaries were collected
from untreated emasculated flowers, while fruits
were obtained by treating emasculated flowers
with GA₃ and harvesting at 1, 3, 7, and 14 days
after treatment. Total RNA was extracted, and
RNA from fruits from all 4 stages was mixed at a
1:1:1:1 ratio. Ovary and fruit samples were labelled
with both Cy3 and Cy5 dyes and used to hybridize
six slides, three of them dye-swapped. Statistical
significance was assessed using one-class SAM
analysis (Tusher et al., 2001). A total of 1016 sig-
nificant genes (328 up-regulated, 688 down-regu-
lated during ovary-fruit transition) were identified
in this way with a q-value < 0.05 (Δ = 1.61) (see
supplementary data).
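Two steps of this design lend themselves to a short sketch: bringing dye-swapped slides to a common orientation, and computing a one-class statistic per gene in the style of SAM, d = mean / (standard error + s0). The code below is a simplified illustration, not the SAM software. The log ratios, the swap pattern, and the default s0 are invented, and the permutation step that yields q-values is omitted.

```python
import math
from statistics import mean, stdev

def orient_log_ratios(log_ratios, swapped):
    """Flip the sign of log2 ratios from dye-swapped slides so that all
    values refer to the same comparison (for example fruit versus ovary)."""
    return [-m if is_swapped else m for m, is_swapped in zip(log_ratios, swapped)]

def one_class_d(log_ratios, s0=0.0):
    """SAM-style one-class statistic: mean / (standard error + s0).

    ASSUMPTION: s0 is a small positive constant chosen by the analyst;
    0.0 is used here only as a placeholder default.
    """
    n = len(log_ratios)
    standard_error = stdev(log_ratios) / math.sqrt(n)
    return mean(log_ratios) / (standard_error + s0)

# Hypothetical log2 ratios for one gene on four slides, two of them dye-swapped.
raw = [1.0, -1.2, 0.9, -1.1]
oriented = orient_log_ratios(raw, [False, True, False, True])
print(oriented, round(one_class_d(oriented), 2))
```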
Among genes with elevated mRNA levels, the
rubisco small subunit showed the largest set of
cDNA clones (9 cDNAs), maybe reflecting the
high expression level of this gene in the tissues
employed to construct the cDNA libraries. A cit-
rus gene similar to late embryogenesis abundant
protein Lea5 and a probable AAA-type ATPase
Table 3. Comparison of spots with fluorescence levels above threshold in different microarray replicates.
Replicate    1    2    3    4    5    6    7    8
1            –   92   82   84   94   91   93   85
2           91    –   85   86   95   92   95   89
3           92   96    –   92   95   93   97   93
4           91   94   90    –   94   93   96   90
5           88   90   79   81    –   90   92   84
6           85   87   78   80   91    –   91   83
7           87   91   82   83   93   92    –   87
8           90   95   88   88   96   94   98    –
Values correspond to the percentage of spots selected in the replicate indicated on the left that were also selected in the replicate
indicated on top.
were also represented by more than 1 spot. Other
up-regulated genes included members of the
guanine nucleotide exchange protein family, heat
shock proteins (Hsp20.1 and 18), a plastidic
aldolase, a senescence-associated cysteine prote-
ase, several genes encoding polypeptides similar to
unknown proteins, and citrus cDNAs with no
homologous sequences in public databases. Genes
with decreased mRNA levels upon fruit set
included, among others, an invertase/pectin
methylesterase inhibitor protein, an
S-adenosyl-L-methionine:salicylic acid methyltransferase, and
genes encoding proteins similar to major allergen
proteins of different plants (see supplementary
data). Work in progress using the cDNA clones
corresponding to these up- and down-regulated
genes will allow us to understand their role during
early stages of fruit development.
To confirm gene expression data obtained with
cDNA microarray, we carried out RNA gel
blot analysis. Six cDNAs identified either as
up-regulated or down-regulated in the microarray
analysis were selected and used as probes on RNA
gel blots containing total RNA extracted from the
same tissues and developmental stages as for the
microarray experiment. The selected genes were
annotated as late embryogenesis abundant protein
Lea5 (clone C20007E03), AAA-type ATPase
(clone C20001C03), rubisco small subunit (clone
C01006G06), invertase/pectin methylesterase
inhibitor family protein (clone C20006G11), major
allergen Pru ar 1 (clone C02015B09), and S-ade-
nosyl-L-methionine:salicylic acid methyltransfer-
ase (clone C20010B07). All 6 genes tested
hybridized to single RNA species (Figure 7) and
showed very similar changes in mRNA levels when
RNA gel blots and cDNA microarray analysis
were compared (Table 4). Table 4 shows fold
changes in RNA gel blot analysis as the signal in
Ovary compared with average signal in Fruit1 and
Fruit2 from Figure 7. These results indicate that
the cDNA microarray generated within our pro-
ject is able to identify, at least, 2-fold changes in
mRNA levels, and that the data obtained with the
microarray are consistent with the results obtained
by RNA gel blot analysis.
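The agreement between the two techniques can be checked with a few lines that compare the direction and magnitude of the fold changes in Table 4. The values below are copied from the table, reading the repressed genes as negative fold changes, which is how the garbled signs in the table are interpreted here. Variable and function names are illustrative.

```python
# (clone, microarray fold change, RNA gel blot fold change) from Table 4.
# ASSUMPTION: negative values denote fold repression.
table4 = [
    ("C20007E03", 1.7, 1.5),
    ("C20001C03", 2.5, 2.0),
    ("C01006G06", 1.9, 2.4),
    ("C20006G11", -4.1, -4.8),
    ("C02015B09", -3.1, -3.3),
    ("C20010B07", -2.2, -3.4),
]

def same_direction(a: float, b: float) -> bool:
    """True when both techniques report induction or both report repression."""
    return (a > 0) == (b > 0)

concordant = [clone for clone, array, blot in table4 if same_direction(array, blot)]
largest_gap = max(abs(abs(array) - abs(blot)) for _, array, blot in table4)

print(f"{len(concordant)} of {len(table4)} genes agree in direction")
print(f"largest difference in fold change: {largest_gap:.1f}")
```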
Future prospects
The CFGP citrus genome-wide cDNA clone and
EST collections, genomic library and microarrays
presented here constitute some of the first high-
coverage genomics tools available to carry out
genome analysis in citrus. Their use will undoubt-
edly facilitate research on function, structure, and
Figure 6. Coefficient of variation (CV) in replicates for each of the spots in the microarray. (A) Frequency histogram of CV (x-axis: CV (%); y-axis: frequency). (B) CV plotted as a function of the overall dynamic signal range (as log₁₀ (Cy5 + Cy3) signal).
Figure 7. RNA gel blot analysis. (A) Genes upregulated upon fruit-set. (B) Genes downregulated upon fruit-set. Ovary, ovaries from emasculated flowers between 1 week before anthesis and anthesis; Fruit1, fruits collected 1 and 3 days after GA₃ treatment of emasculated flowers; Fruit2, fruits collected 7 and 14 days after GA₃ treatment of emasculated flowers. Ethidium bromide (EtBr) staining was used as loading control.
regulation of the citrus genome, as well as the iso-
lation of genes of interest for citrus improvement.
The identification of citrus genes up- and down-
regulated using these genomic resources illustrates
their suitability to assist in the characterization of
citrus genes relevant to different biological and
agronomical issues. We are currently using these
resources for transcription profiling in citrus under
different conditions and to evaluate the popula-
tions of citrus deposited in the Germplasm Bank of
the IVIA. Future reports will detail these studies.
The clones which were used to generate the EST
collection, the genomic library, and the cDNA
microarrays are publicly available and represent a
valuable resource for the citrus research commu-
nity, ensuring that large amounts of data will be
collected on the same physical set of resource
clones. We believe that these genomic resources
will contribute to achieve an integrated and com-
prehensive knowledge of the molecular and cellular
mechanisms underlying citrus biological properties,
and thereby facilitate development of improved
citrus cultivars and production practices.
Acknowledgements
The authors thank the technicians and researchers
who contribute to the CFGP and whose names are
listed at the web site of the project
(http://citrusgenomics.ibmcp-ivia.upv.es). We are also indebted
to J.J. Tuset, J.L. Mira, C. Hinarejos, and
A. Medina for kindly preparing specific plant
material and treatments. This project was jointly
sponsored by ‘‘Conselleria de Agricultura, Pesca y
Alimentacion de la Comunidad Valenciana’’ and
Spanish ‘‘Ministerio de Ciencia y Tecnologia’’
(research grant GEN2001-4885-C05). Construction
of cDNA library FlavCurFr1 and isolation of
their corresponding ESTs were supported by
research grant AGL 2002-01727 from Spanish
‘‘Ministerio de Ciencia y Tecnologia’’.
References
Abbott, A., Georgi, L., Yvergniaux, D., Wang, Y., Blenda, A.,
Reighard, G., Inigo, M. and Sosinski, B. 2002. Peach the
model genome for Rosaceae. Acta Hort. 575: 145–155.
Adams, M.D., Soares, M.B., Kerlavage, A.R., Fields, C. and
Venter, J.C. 1993. Rapid cDNA sequencing (expressed
sequence tags) from a directionally cloned human infant
brain cDNA library. Nat. Genet. 4: 373–380.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman,
D.J. 1990. Basic Local Alignment Search Tool. J. Mol. Biol.
215: 403–410.
Barrett, H.C. and Rhodes, A.M. 1976. A numerical taxonomic
study of the affinity relationships in cultivated Citrus and its
close relatives. Systematic Botany 1: 105–136.
Bausher, M., Shatters, R., Chaparro, J., Dang, P., Hunter, V.
and Niedz, R. 2003. An expressed sequence tag (EST) set
from Citrus sinensis L. Osbeck whole seedlings and the
implications of further perennial source investigations. Plant
Sci. 165: 415–422.
Brenner, E.D., Stevenson, D.W., McCombie, R.W., Katari,
M.S., Rudd, S.A., Mayer, K.F., Palenchar, P.M., Runko,
S.J., Twigg, R.W., Dai, G., Martienssen, R.A., Benfey, P.N.
and Coruzzi, G.M. 2003. Expressed sequence tag analysis in
Cycas, the most primitive living seed plant. Genome Biol. 4:
R78.
Bugos, R.C., Chiang, V.L., Zhang, X.-H., Campbell, E.R.,
Podila, G.K. and Campbell, W.H. 1995. RNA isolation from
plant tissues recalcitrant to extraction in guanidine. Bio-
Techniques 19: 734–737.
Church, G.M. and Gilbert, W. 1984. Genomic sequencing.
Proc. Natl. Acad. Sci. USA 81: 1991–1995.
Clark, M.S., Edwards, Y.J.K., Peterson, S., Clifton, S.W.,
Thompson, A.J., Sasaki, M., Suzuki, Y., Kikuchi, K., Watabe,
S., Kawakami, K., Sugano, S., Elgar, G. and Johnson, S.L.
Table 4. Comparison of gene expression data between cDNA microarray and RNA gel blot analysis.

                                                            Average fold change
EST         Annotation                                      Microarray   RNA gel blot (a)
Induced
C20007E03   Late embryogenesis abundant protein Lea5           1.7           1.5
C20001C03   Putative AAA-type ATPase                           2.5           2.0
C01006G06   Rubisco small subunit                              1.9           2.4
Repressed
C20006G11   Invertase/pectin methylesterase inhibitor         -4.1          -4.8
C02015B09   Major allergen Pru ar1                            -3.1          -3.3
C20010B07   SAM:salicylic acid methyltransferase              -2.2          -3.4

(a) Fold change in RNA gel blot analysis is the signal in Ovary compared to the average signal in Fruit1 and Fruit2 from Figure 7.
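
The comparison in Table 4 can be checked with a few lines of Python. The sketch below is illustrative only and is not the procedure used by the authors. It assumes that negative values in the table follow the common signed fold-change convention (a ratio below 1 is reported as the negative reciprocal). The function names `signed_fold_change`, `same_direction` and `pearson` are hypothetical helpers introduced here; only the six value pairs are taken from Table 4.

```python
from math import sqrt
from statistics import mean


def signed_fold_change(sample: float, reference: float) -> float:
    """Signed fold change (assumed convention): ratio if >= 1, else -1/ratio."""
    if sample <= 0 or reference <= 0:
        raise ValueError("signals must be positive")
    ratio = sample / reference
    return ratio if ratio >= 1.0 else -1.0 / ratio


# Average fold changes transcribed from Table 4: (microarray, RNA gel blot).
TABLE4 = {
    "C20007E03": (1.7, 1.5),
    "C20001C03": (2.5, 2.0),
    "C01006G06": (1.9, 2.4),
    "C20006G11": (-4.1, -4.8),
    "C02015B09": (-3.1, -3.3),
    "C20010B07": (-2.2, -3.4),
}


def same_direction(table: dict) -> bool:
    """True if both techniques agree on induced vs. repressed for every EST."""
    return all((m > 0) == (b > 0) for m, b in table.values())


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    microarray = [m for m, _ in TABLE4.values()]
    blot = [b for _, b in TABLE4.values()]
    print("Same direction for all ESTs:", same_direction(TABLE4))
    print("Pearson r:", round(pearson(microarray, blot), 3))
    # Hypothetical signal intensities, only to show the sign convention.
    print("Example fold change:", signed_fold_change(100.0, 200.0))
```

With only six genes, the correlation is a rough indication of agreement rather than a statistical validation; the direction check is the more robust of the two summaries.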
390
2003. Fugu ESTs: news resources for transcriptionanalysisand
genome annotation. Genome Res. 13: 2747–2753.
Emmanuel, E. and Levy, A.A. 2002. Tomato mutants as tools
for functional genomics. Curr. Opin. Plant Biol. 5: 112–117.
Evertsz, E.M., Au-Young, J., Ruvolo, M.V., Lim, A.C. and
Reynolds, M.A. 2001. Hybridization cross-reactivity within
homologous gene families on glass cDNA microarrays.
Biotechniques 31: 1182–1192.
Ewing, B. and Green, P. 1998. Base-calling of automated
sequencer traces using Phred. II. Error probabilities.
Genome Res. 8: 186–194.
Ewing, B., Hillier, L., Wendl, M.C. and Green, P. 1998. Base-
calling of automated sequencer traces using Phred. I.
Accuracy assessment. Genome Res. 8: 175–185.
Girke, T., Todd, J., Ruuska, S., White, J., Benning, Ch. and
Ohlrogge, J. 2000. Microarray analysis of developing Ara-
bidopsis seed. Plant Physiol. 124: 1570–1581.
Gordon, D., Abajian, C. and Green, P. 1998. Consed: a
graphical tool for sequence finishing. Genome Res. 8: 195–
202.
Horvath, D.P., Schaffer, R., West, M. and Wisman, E. 2003.
Arabidopsis microarrays identify conserved and differentially
expressed genes involved in shoot growth and development
from distantly related plant species. Plant J. 34: 125–134.
Kawai, J., Shinagawa, A., Shibata, K., Yoshino, M., Itoh, M.,
Ishii, Y., Arakawa, T., Hara, A., Fukunishi, Y., Konno, H.,
et al. (2001). Functional annotation of a full-length mouse
cDNA collection. Nature 409: 685–690.
Lawrence, C.J., Dong, Q., Polacco, M.L., Seigfred, T.E. and
Brendel, V. 2004. MaizeGDB, the community database for
maize genetics and genomics. Nucl. Acids Res. 32 (Database
issue): D393–D397.
McKinney, E.C., Ali, N., Traut, A., Feldmann, K.A., Belos-
totsky, D.A., McDowell, J.M. and Meagher, R.B. 1995.
Sequence-based identification of T-DNA insertion mutations
in Arabidopsis: actin mutants act2-1 and act4-1. Plant J. 8:
613–622.
Meyer, K., Leube, M.P. and Grill, E. 1994. A protein
phosphatase 2C involved in ABA signal transduction in
Arabidopsis thaliana. Science 264: 1452–1455.
Navarro, L., Pina, J.A., Juarez, J., Ballester-Olmos, J.F.,
Arregui, J.M., Ortega, C., Navarro, A, Duran-Vila, N.,
Guerri, J., Moreno, P., Cambra, M. and Zaragoza, S. 2002.
The Citrus variety improvement program in Spain in the
period 1975–2001. Proceedings of the 15th Conference of the
International Organization for Citrus Virol, In: N. Duran-
Vila, R.G. Milne and J.V. da Grac¸ a (Eds.), International
Organization for Citrus Virology, Riverside, CA, pp. 306–
316.
Poustka, A.J., Groth, D., Hennig, S., Thamm, S., Cameron, A.,
Beck, A., Reinhardt, R., Herwig, R., Panopoulou, G. and
Lehrach, H. 2003. Generation, annotation, evolutionary
analysis, and database integration of 20.000 unique sea
urchin EST clusters. Genome Res. 13: 2736–2746.
Rise, M.L., von Schalburg, K.R., Brown, G.D., Mawer, M.A.,
Devlin, R.H., Kuipers, N., Busby, M., Beetz-Sargent, M.,
Alberto, R., Gibbs, A.R., Hunt, P., Shukin, R., Zeznik, J.A.,
Nelson, C., Jones, S.R.M., Smailus, D.E., Jones, S.J.M.,
Schein, J.E., Marra, M.A., Butterfield, Y.S.N., Stott, J.M.,
Ng, S.H.S., Davidson, W.S. and Koop, B.F. 2004. Devel-
opment and application of a salmonid EST database and
cDNA microarray: data mining and interspecific hybridiza-
tion characteristics. Genome Res. 14: 478–490.
Ruan, Y., Gilmore, J. and Conner, T. 1998. Towards Arabid-
opsis genome analysis: monitoring expression profiles of
1400 genes using cDNA microarrays. Plant J. 15: 821–833.
Schaffer, R., Landgraf, J., Accerbi, M., Simon, V.V., Larson,
M. and Wisman, E. 2001. Microarray analysis of diurnal and
circadian-regulated genes in Arabidopsis. Plant Cell 13:
113–123.
Schena, M., Shalon, D., Davis, R.W. and Brown, P.O. 1995.
Quantitative monitoring of gene expression patterns with a
complementary DNA microarray. Science 270: 467–470.
Shimada, T., Kita, M., Endo, T., Fujii, H., Ueda, T.,
Moriguchi, T. and Omura, M. 2003. Expressed sequence
tags of ovary tissue cDNA library in Citrus unshiu Marc.
Plant Sci. 165: 167–168.
The Arabidopsis Genome Initiative (AGI). 2000. Analysis of
the genome sequence of the flowering plant Arabidopsis
thaliana. Nature 408: 796–815.
Tusher, G.T., Tibshirani, R. and Chu, G. 2001. Significance
analysis of microarrays applied to the ionizing radiation
response. Proc. Natl. Acad. Sci. USA 98: 5116–5121.
Vettore, A.L., da Silva, F.R., Kemper, E.L., Souza, G.M., da
Silva, A.M., Ferro, M.I.T., Henrique-Silva, F., Giglioti,
E.A., Lemos, M.V.F., Coutinho, L.L. et al. 2003. Analysis
and functional annotation of an expressed sequence tag
collection for tropical crop sugarcane. Genome Res. 13:
2725–2735.
Whetten, R., Sun, Y.H., Zhang, Y. and Sederoff, R. 2001.
Functional genomics and cell wall biosynthesis in loblolly
pine. Plant Mol. Biol. 47: 275–291.
White, J.A., Todd, J., Newman, T., Focks, N., Girke, T.,
Martinez de Ilarduya, O., Jaworski, J.G., Ohlrogge, J.B. and
Benning, Ch. 2000. A new set of Arabidopsis expressed
sequence tags from developing seeds. The metabolic pathway
from carbohydrates to seed oil. Plant Physiol. 124: 1582–
1594.
Whitfield, Ch.W., Band, M.R., Bonaldo, M.F., Kumar, Ch.G.,
Liu, L., Pardinas, J.R., Robertson, H.M., Soares, M.B. and
Robinson, G.E. 2002. Annotated expressed sequence tags
and cDNA microarrays for studies of brain and behavior in
the honey bee. Genome Res. 12: 555–566.
Xue, Y., Li, J. and Xu, Z. 2003. Recent highlights of the China
Rice Functional Genomics Program. Trends Genet. 19: 390–
394.
Yamamoto, K. and Sasaki, T. 1997. Large-scale EST sequenc-
ing in rice. Plant Mol. Biol. 35: 135–144.
Yang, Y.H., Dudoit, S., Luu, P., Lin, D.M., Peng, V., Ngai, J.
and Speed, T.P. 2002. Normalization for cDNA microarray
data: a robust composite method addressing single and
multiple slide systematic variation. Nucleic Acids Res. 30:
e15.
Yazaki, J., Kojima, K., Suzuki, K., Kishimoto, N. and Kikuchi,
S. 2004. The Rice PIPELINE: a unification tool for plant
functional genomics. Nucleic Acids Res. 32 Database issue:
D383–D387.
Yue, H., Eastman, P.S., Wang, B.B., Minor, J., Doctolero,
M.H., Nuttall, R.L., Stack, R., Becker, J.W., Montgomery,
J.R., Vainer, M. and Johnston, R. 2001. An evaluation of the
performance of cDNA microarrays for detecting changes in
global mRNA expression. Nucleic Acids Res. 29: e41.
391