Transcriptome Profiling of a Toxic Dinoflagellate Reveals
a Gene-Rich Protist and a Potential Impact on Gene
Expression Due to Bacterial Presence
Ahmed Moustafa1, Andrew N. Evans2, David M. Kulis3, Jeremiah D. Hackett4, Deana L. Erdner2,
Donald M. Anderson3, Debashish Bhattacharya1*
1 Ecology, Evolution and Natural Resources, Institute of Marine and Coastal Sciences, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, United
States of America, 2 Marine Science Institute, University of Texas at Austin, Port Aransas, Texas, United States of America, 3 Woods Hole Oceanographic Institution, Woods
Hole, Massachusetts, United States of America, 4 Ecology and Evolutionary Biology Department, University of Arizona, Tucson, Arizona, United States of America
Abstract
Background: Dinoflagellates are unicellular, often photosynthetic protists that play a major role in the dynamics of the
Earth’s oceans and climate. Sequencing of dinoflagellate nuclear DNA is thwarted by their massive genome sizes that are
often several times that in humans. However, modern transcriptomic methods offer promising approaches to tackle this
challenging system. Here, we used massively parallel signature sequencing (MPSS) to understand global transcriptional
regulation patterns in Alexandrium tamarense cultures that were grown under four different conditions.
Methodology/Principal Findings: We generated more than 40,000 unique short expression signatures gathered from the
four conditions. Of these, about 11,000 signatures displayed detectable differential expression patterns at a permissive p-value
threshold of 0.05. At a p-value < 1E-10, 1,124 signatures were differentially expressed in the three treatments, xenic, nitrogen-limited, and
phosphorus-limited, compared to the nutrient-replete control, with the presence of bacteria explaining the largest set of these
differentially expressed signatures.
Conclusions/Significance: Among microbial eukaryotes, dinoflagellates contain the largest number of genes in their
nuclear genomes. These genes occur in complex families, many of which have evolved via recent gene duplication events.
Our expression data suggest that about 73% of the Alexandrium transcriptome shows no significant change in gene
expression under the experimental conditions used here and may comprise a ‘‘core’’ component for this species. We report
a fundamental shift in expression patterns in response to the presence of bacteria, highlighting the impact of biotic
interaction on gene expression in dinoflagellates.
Citation: Moustafa A, Evans AN, Kulis DM, Hackett JD, Erdner DL, et al. (2010) Transcriptome Profiling of a Toxic Dinoflagellate Reveals a Gene-Rich Protist and a
Potential Impact on Gene Expression Due to Bacterial Presence. PLoS ONE 5(3): e9688. doi:10.1371/journal.pone.0009688
Editor: Ramy K. Aziz, Cairo University, Egypt
Received November 13, 2009; Accepted February 22, 2010; Published March 12, 2010
Copyright: © 2010 Moustafa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was primarily funded by a collaborative grant from the National Institutes of Health (R01 ES 013679-01A2) awarded to DB, DMA, and M.
Bento Soares. Funding support for DMA and DLE was also provided from the Woods Hole Center for Oceans and Human Health from the NSF/NIEHS Centers for
Oceans and Human Health program, NIEHS (P50 ES 012742) and (NSF OCE-043072). Additional support came from the National Science Foundation (EF-0732440)
in a grant awarded to F. Gerald Plumley, DB, JDH, and DMA. AM was supported by an Institutional NRSA (T 32 GM98629). The funders had no role in study design,
data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: bhattacharya@aesop.rutgers.edu
Introduction
Dinoflagellates (Phylum Alveolata, Supergroup Chromalveolata)
are unicellular protists that are among the most abundant
phytoplankton in marine and freshwater ecosystems. Dinoflagellates
display a range of lifestyles that together make these organisms of
central ecological and economic importance. On the one hand, as
oxygenic photosynthesizers, about 50% of the known species play a
vital role in oxygen evolution and ocean primary production. On
the other hand, some dinoflagellate species form massive toxic or
non-toxic harmful algal blooms (commonly known as ‘‘red tides’’) in
the oceans, leading to negative impacts on human health, fisheries,
and many other coastal resources.
Dinoflagellates can exhibit different trophic states, some of
which are obligatory, whereas others reflect rapid and transient
responses to cellular or environmental conditions. Many
dinoflagellates are able to exist autotrophically via photosynthesis in some
stages of their life cycle. However, there are also cases of strict
heterotrophy due to the absence of plastids, as in Protoperidinium,
which feeds on other dinoflagellates [1], and Paulsenella, which
parasitizes diatoms [2]. In addition, alternation between
autotrophy and heterotrophy (i.e., mixotrophy) exists in many
dinoflagellates and is supported by the presence of food vacuoles and
plastids in these taxa (e.g., Alexandrium ostenfeldii [3,4]).
In dinoflagellates, sexuality and subsequent encystment play a
key role in bloom dynamics [5]. Encystment allows dinoflagellates
to survive unfavorable environmental conditions in the form of
resistant cysts, which remain dormant for a mandatory period of
several months and then germinate when conditions become
favorable. The exponential proliferation of germinated cells results
PLoS ONE | www.plosone.org 1 March 2010 | Volume 5 | Issue 3 | e9688
in blooms, which terminate through induction of encystment.
Cysts can also be geographically dispersed, giving rise to blooms in
regions with no previous history of that species [6,7,8,9].
Although dinoflagellates follow a typical eukaryotic G1-S-G2-M
cell cycle [10], they have genetic and cytological properties that
distinguish them starkly from other eukaryotes. One of the most
remarkable characteristics of dinoflagellates is the large amount of
nuclear DNA. On average, algal and plant nuclei contain 0.5 pg/cell;
in dinoflagellates, however, DNA content varies from 2.0 pg/cell
in Amphidinium carterae [11] to 200.0 pg/cell in
Lingulodinium polyedrum (formerly Gonyaulax polyedra) [12],
corresponding to ca. 200,000 Mb. Such a massive amount of DNA has
made dinoflagellates a challenging system for complete genome
sequencing approaches. However, modern transcriptomic methods
provide promising strategies for gene discovery in dinoflagellates
and an opportunity to address key questions about their
ecology and life cycles.
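The relationship between DNA mass and sequence length used in the figures above can be sketched as follows. This is a minimal illustration, assuming the commonly used factor of roughly 978 Mb per picogram of double-stranded DNA. That factor and the function name `pg_to_mb` are assumptions for illustration and are not taken from the article.

```python
# Assumed conversion factor (not stated in the article): 1 pg of
# double-stranded DNA corresponds to roughly 978 Mb.
MB_PER_PG = 978.0


def pg_to_mb(picograms):
    """Convert a DNA mass in picograms to an approximate length in Mb."""
    return picograms * MB_PER_PG


if __name__ == "__main__":
    # DNA contents per cell quoted in the text (pg/cell).
    contents = {
        "average algal/plant nucleus": 0.5,
        "Amphidinium carterae": 2.0,
        "Lingulodinium polyedrum": 200.0,
    }
    for name, pg in contents.items():
        print(f"{name}: {pg} pg is about {pg_to_mb(pg):,.0f} Mb")
```

Under this assumption, 200 pg corresponds to about 195,600 Mb, in line with the "ca. 200,000 Mb" quoted for Lingulodinium polyedrum.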
Bacterial assemblages were shown to be associated with and
attached to dinoflagellates [13] where their availability markedly
affects different aspects of dinoflagellate life cycles such as the
quantity of toxin that is produced [14,15], level of motility [16],
growth rate [15,17], and bloom formation and termination [18].
To investigate the influence of the biotic interaction between
dinoflagellates and associated bacterial communities, we prepared
RNA from a xenic (X) strain of Alexandrium tamarense (hereafter,
Alexandrium) and compared its expression profile to that of the
nutrient-replete control condition (F) and nutrient-stressed cells
under nitrogen (N) and phosphorus (P) limitation. A previous study
[19] validated the utilization of ‘‘massively parallel signature
sequencing’’ (MPSS) [20] to analyze transcriptional regulation in a
closely related dinoflagellate (Alexandrium fundyense) and provided
evidence for the complexity of the transcriptome, the presence of
gene families, and the extent of transcriptional regulation. Here,
we report the results of a comprehensive profiling of the Alexandrium
transcriptome using MPSS. Our results provide novel insights into
the extent of gene richness, the dynamics of gene family evolution,
the magnitude of transcriptional regulation, and the impact of the
presence of bacteria on global gene expression patterns in
dinoflagellates.
Results and Discussion
Using MPSS, each sample resulted in a library of ~3,000,000
short signature sequences, containing an average of 290,941
unique sequences (hereafter, simply signatures) with 1,073,382
signatures from all treatments. After screening for deterministic
(i.e., absence of nucleotide ambiguities) and significantly expressed
signatures (i.e., ≥4 signatures per million [TPM] in at least one
library), we found between 38,000 and 39,000 usable signatures per
culture treatment (Table 1). We identified 40,029 unique
signatures when the data from all treatments were combined. In
agreement with earlier findings [21], our data show that the most
abundant transcripts among the examined conditions belong to
families that encode chlorophyll a-b binding protein, histone family
protein, S-adenosylmethionine synthetase, and S-adenosylhomo-
cysteine hydrolase. Of a total of 40,029, only 18, 2, and 12
signatures were found exclusively in the nutrient-replete (control),
N-depleted, and P-depleted cultures, respectively. In contrast, 487
signatures were found exclusively in the xenic culture, suggesting
the presence of bacteria had the most significant impact on the
transcriptome of Alexandrium under the conditions used here; i.e.,
exclusive transcription of 1.3% of the total number of transcribed
genes. Our data also showed the expected transcriptional
responses to nutrient limitation, in particular the up-regulation
of genes involved in the pathways of cell-death and gamete
formation, which will be discussed in detail elsewhere. Here, we
focus on genome-wide aspects of dinoflagellate gene expression
with a specific focus on the impact of associated bacteria on gene
expression.
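The screening and counting steps described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions: the data layout (a dictionary of TPM values per condition), the function names, and the toy signatures are all hypothetical, and "detected" is taken to mean a TPM greater than zero, which the article does not specify.

```python
def is_deterministic(signature):
    """True if the signature contains only unambiguous nucleotides."""
    return set(signature) <= set("ACGT")


def reliable_signatures(libraries, min_tpm=4):
    """Keep unambiguous signatures that reach min_tpm in at least one library.

    libraries maps a condition name to a {signature: TPM} dictionary.
    """
    candidates = set()
    for counts in libraries.values():
        candidates.update(counts)
    return {
        sig
        for sig in candidates
        if is_deterministic(sig)
        and any(counts.get(sig, 0) >= min_tpm for counts in libraries.values())
    }


def condition_specific(libraries, kept):
    """Map each condition to the kept signatures detected only there.

    Assumption: "detected" means TPM > 0 in that library.
    """
    specific = {}
    for name, counts in libraries.items():
        others = [c for other, c in libraries.items() if other != name]
        specific[name] = {
            sig
            for sig in kept
            if counts.get(sig, 0) > 0
            and all(c.get(sig, 0) == 0 for c in others)
        }
    return specific


# Toy data (made up for illustration; real signatures are longer).
TOY_LIBRARIES = {
    "control": {"GATCAAAA": 12, "GATCCCCC": 5, "GATCNNGT": 40},
    "xenic": {"GATCAAAA": 3, "GATCGGGG": 9, "GATCTTTT": 2},
}

if __name__ == "__main__":
    kept = reliable_signatures(TOY_LIBRARIES)
    print(sorted(kept))
    print(condition_specific(TOY_LIBRARIES, kept))
```

In the toy data, the ambiguous signature and the one below 4 TPM are discarded, and each condition retains one signature that is found nowhere else.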
Gene Content and Gene Families
Previous MPSS analyses using well-annotated genomes have
shown a strong correlation between the number of transcribed
signatures and the total number of nuclear genes. In addition,
these studies have demonstrated that as more libraries and
conditions are examined, the number of unique signatures more
closely represents the total number of predicted gene models in a
genome. For example, in Arabidopsis, the number of annotated
genes is 27,165 [22] and the number of unique MPSS signatures
associated with protein coding regions gathered from 17 libraries is
at least 29,569 [23]. Based on this correlation, we postulate that
there are about 40,000 transcribed genes in Alexandrium, making it
the most complex protist transcriptome yet described. It should be
remembered, however, that although this number is relatively
large compared to other free-living protists (e.g., 27,000 genes in
the ciliate Tetrahymena thermophila [24] and 12,000 genes in the
diatom Phaeodactylum tricornutum [25]), it does not account for the
massive amount of nuclear DNA (ca. 150 Gb, estimated using
pulsed-field gel electrophoresis) in haploid Alexandrium cells. Clearly,
gene number and genome size are uncoupled in these taxa. It is
worth pointing out that in a recent genome size versus gene
content regression study, dinoflagellates were predicted to contain
40,086 genes in the smallest genome and 92,013 genes in the
largest [26].
Table 1. Summary of Alexandrium tamarense MPSS signatures that were significant and reliable.
Condition | Common | Specific | Unique | >10 | >100 | >1000
Nutrient-replete (Control) | 38633 | 18 | 38651 | 17041 | 895 | 35
Nitrogen-limited | 38948 | 2 | 38950 | 14128 | 1180 | 45
Phosphorus-limited | 38780 | 12 | 38792 | 14580 | 1068 | 38
Xenic (bacterized) | 38078 | 487 | 38565 | 15213 | 878 | 39
Average | 38610 | 130 | 38740 | 15241 | 1005 | 39
Total | 39426 | 603 | 40029 | 23412 | 1843 | 61
Column headers are defined as follows. Common: signatures that are expressed under at least one other treatment. Specific: signatures that are exclusively
expressed under the corresponding treatment. Unique: the total number of unique signatures expressed under the corresponding treatment. >10, >100, and
>1000: signatures with expression values of at least 10, 100, and 1000 TPM, respectively.
doi:10.1371/journal.pone.0009688.t001
Dinoflagellate transcriptome
This unusually high number of transcribed genes in Alexandrium
is unlikely to represent unique functional categories; rather many
may comprise large gene families that arose by extensive gene
duplication events. To address this hypothesis in a conservative
fashion, we first identified 4,341 expressed sequence tags (ESTs)
from this strain that match perfectly and uniquely the identified set
of reliable and significant MPSS signatures. Then we used KEGG
Orthology (KO) [27] to functionally cluster these ESTs into
families, resulting in the assignment of 1,020 KO entries to 2,169
ESTs (Table 2). The largest gene family comprises 31 members
that encode peptidylprolyl isomerase (EC 5.2.1.8; cyclophilin).
Subsequently, we counted the number of pairwise mismatches
between signatures that correspond to ESTs clustered into the
same families and ESTs belonging to different families. By
comparing the numbers of pairwise mismatches between signa-
tures from the two groups, we found that five mismatches can
distinguish significantly between the two categories with p-value <
1E-10. Thus, using five mismatches as the maximum number of
pairwise mismatches between signatures to obtain a rough
estimate of the genome-wide distribution of gene families, we
found 56 families with more than 100 members and the largest
family contains 139 members (Figure 1). The largest family with
members of known function contains 81 members and encodes
pyruvate kinase (EC 2.7.1.40). The second largest family of known
function encodes ribosomal protein L27a and contains 74
members. However, using KO-predicted families, we found cases
where signatures within the same families shared low to zero
identity. These cases are interpreted as duplicated genes with a
relatively ancient common ancestor, in which the accumulation of
mutations in the 3′ UTR has erased the phylogenetic signal in the
signature sequences.
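The mismatch-based grouping described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the article does not state how signatures were linked into families, so single-linkage grouping (two signatures join the same family if they differ at no more than five positions) is an assumption, as are the function names and the toy signatures.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length signatures differ."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(x != y for x, y in zip(a, b))


def cluster_signatures(signatures, max_mismatches=5):
    """Group signatures by single linkage (assumed) using union-find."""
    parent = list(range(len(signatures)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(signatures)):
        for j in range(i + 1, len(signatures)):
            if mismatches(signatures[i], signatures[j]) <= max_mismatches:
                parent[find(i)] = find(j)

    families = {}
    for i, sig in enumerate(signatures):
        families.setdefault(find(i), []).append(sig)
    # Largest families first.
    return sorted(families.values(), key=len, reverse=True)


# Toy 20-nt signatures (made up for illustration).
S1 = "GATC" + "A" * 16
S2 = "GATC" + "A" * 14 + "TT"
S3 = "GATC" + "A" * 10 + "CCCC" + "TT"
S4 = "GATC" + "G" * 16

if __name__ == "__main__":
    for family in cluster_signatures([S1, S2, S3, S4]):
        print(len(family), family)
```

Single linkage allows chaining: S1 and S3 differ at six positions but still fall into one family because each is within five mismatches of S2. A stricter rule that requires every pair in a family to be within the threshold would give smaller families.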
Examining the Alexandrium expression data drew our attention to
several examples of different genes that belong to the same family
and exhibit similar transcriptional profiles. For example, three S-
adenosylmethionine synthetase (SAMS) genes were down-regulated
in the bacterized culture. Similarly, three serine hydroxymethyl-
transferase (SHMT) genes were also down-regulated under this
treatment. Genes encoding light-harvesting chlorophyll binding
proteins followed the same pattern. In addition, four members of the
ubiquitin family were up-regulated under nutrient limitation. To
examine this association between gene family members and gene
expression, we identified six signatures with a single mismatch
between each pair with each of the six signatures having perfect
matches to ESTs that encode the alpha subunit of the eukaryotic
translation elongation factor (EF-1α). The multiple sequence
alignment (Figure 2A) of the signatures and their matching ESTs
shows co-segregation of the mismatches among the signatures along
with mismatches among the ESTs, suggesting these mismatches are
not due to sequencing errors. Next, we found that the expression
values of these six signatures are strongly correlated (Figure 2B) with
a general pattern of up-regulation under nitrogen limitation and
down-regulation in the presence of bacteria; i.e., when both are
compared to the nutrient-replete culture. Therefore, the expression
profiles among members of this (and perhaps many other) gene
family are strongly correlated. This suggests that gene family
expansion in Alexandrium may be a general mechanism used to
enhance transcript abundance. Searching for similar patterns of co-
regulation among family members, we found several families of
different sizes (2, 4, and 8 family members; Table 3) that follow the
same trend. In summary, our data indicate that dinoflagellate
genomes contain large gene families with evidence for expression
correlation among studied family members.
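The expression correlation among family members can be illustrated with a Pearson coefficient computed across the four libraries. This is a minimal sketch: the article does not give the exact statistic or software used, and the two expression profiles below are made-up values, ordered as control, N-limited, P-limited, and xenic.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical TPM profiles of two members of one family
# (control, N-limited, P-limited, xenic); not data from the article.
MEMBER_A = [100, 150, 110, 60]
MEMBER_B = [200, 310, 215, 130]

if __name__ == "__main__":
    print(f"r = {pearson_r(MEMBER_A, MEMBER_B):.3f}")
```

Both toy profiles rise under nitrogen limitation and fall in the xenic culture, so the coefficient is close to 1. With only four conditions, any such test has just two degrees of freedom, so even high coefficients should be read with caution.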
Table 2. Gene families identified using KEGG orthology that have sizes >10 members.
Gene | Definition | Class | Size
E5.2.1.8 | peptidylprolyl isomerase [EC:5.2.1.8] | Genetic Information Processing; Folding, Sorting and Degradation | 31
ANK | ankyrin | Cellular Processes and Signaling; Cytoskeleton | 29
E2.5.1.18, gst | glutathione S-transferase [EC:2.5.1.18] | Metabolism; Metabolism of Other Amino Acids; Glutathione metabolism | 23
fabD | [acyl-carrier-protein] S-malonyltransferase [EC:2.3.1.39] | Metabolism; Lipid Metabolism; Fatty acid biosynthesis | 19
CALM | calmodulin | Environmental Information Processing; Signal Transduction; Calcium | 17
fabG | 3-oxoacyl-[acyl-carrier protein] reductase [EC:1.1.1.100] | Metabolism; Lipid Metabolism; Fatty acid biosynthesis | 16
ATPF0C, atpE | F-type H+-transporting ATPase subunit c [EC:3.6.3.14] | Metabolism; Energy Metabolism; Oxidative phosphorylation | 15
eEF-1A, ef1A | elongation factor EF-1 alpha subunit [EC:3.6.5.3] | Genetic Information Processing; Translation | 15
dnaJ | molecular chaperone DnaJ | Genetic Information Processing; Chaperones and | 15
E4.2.1.17, paaG | enoyl-CoA hydratase [EC:4.2.1.17] | Metabolism; Carbohydrate Metabolism; Propanoate metabolism | 14
E1.14.11.16 | aspartate beta-hydroxylase [EC:1.14.11.16] | Unclassified; Metabolism; Other enzymes | 13
rluD | ribosomal large subunit pseudouridine synthase D [EC:5.4.99.12] | Genetic Information Processing; Translation; Other translation | 13
HSPA1_8 | heat shock 70 kDa protein 1/8 | Environmental Information Processing; Membrane Transport; Pores ion | 12
YWHA | tyrosine 3-monooxygenase | Cellular Processes; Cell Growth and Death; Cell cycle | 12
RAB | Rab family, other | Cellular Processes and Signaling; GTP-binding | 12
E1.1.1.37B, mdh | malate dehydrogenase [EC:1.1.1.37] | Metabolism; Carbohydrate Metabolism; Citrate cycle (TCA cycle) | 11
E1.1.1.95, serA | D-3-phosphoglycerate dehydrogenase [EC:1.1.1.95] | Metabolism; Amino Acid Metabolism; Glycine, serine and threonine | 11
GAPDH, gapA | glyceraldehyde 3-phosphate dehydrogenase [EC:1.2.1.12] | Metabolism; Carbohydrate Metabolism; Glycolysis/Gluconeogenesis | 11
E3.1.3.16 | protein phosphatase [EC:3.1.3.16] | Unclassified; Metabolism; Other enzymes | 11
RP-L40e, RPL40 | large subunit ribosomal protein L40e | Genetic Information Processing; Translation; Ribosome | 11
doi:10.1371/journal.pone.0009688.t002
Bacterial Presence and Gene Expression
Although complex and multi-species bacterial assemblages have
been shown to be associated with dinoflagellates both in extra- and
intra-cellular environments [28,29], taxa appear to be limited to
the Cytophaga-Flavobacterium-Bacteroides (CFB) group and the
α- and γ-classes of Proteobacteria. In this study, we did not
attempt to identify the prokaryotes present in the bacterized
Alexandrium culture. Previous studies have shown, however, that
members of the genera Roseobacter (α-Proteobacteria) and
Alteromonas (γ-Proteobacteria) are the dominant bacterial groups
associated with Alexandrium sp. [30]. Here, we focused on the
effect of the presence of bacteria in the culture on gene expression
in the dinoflagellate. To identify transcriptionally regulated genes,
we used the nutrient-replete culture as the control condition and
identified signatures that were significantly up- or down-regulated
in the other conditions using Fisher’s test. At p-value < 1E-10, we
found 1,124 signatures that were differentially expressed among
the three (xenic, N-limited, P-limited) treatments compared to the
control (Figure 3). By relaxing the p-value to a relatively more
permissive threshold of 0.05 to detect even slight changes in
expression among treatments, we identified ca. 11,000 differen-
tially expressed signatures, indicating that about 29,000 signatures
are consistently expressed with non-significant differences under
the culture conditions used here. In dramatic contrast to a recent
study which showed that about 6% of the expressed genes in rice
are uniformly expressed, housekeeping genes [31], our results
suggest that about 73% of the Alexandrium transcriptome comprises
a ‘‘core’’ component and 27% comprises the regulated compo-
nent, under differing cellular or environmental conditions. Of the
1,124 signatures, 307 (27%) were differentially expressed in the
xenic culture, of which 119 and 188 were up- and down-regulated,
respectively. Of these differentially regulated transcripts, two sets
of genes stand out because they are collectively involved in the
regulation of two important cellular processes, the methionine-
homocysteine cycle and photosynthesis.
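The comparison of one signature between a treatment library and the control can be sketched as a two-sided Fisher's exact test on a 2×2 table. This is a minimal illustration under stated assumptions: the article does not describe how the tables were built, so treating TPM values as counts out of 1,000,000 per library is an assumption, and the function names are hypothetical. Probabilities are computed in log space because the library sizes are large.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    log_denominator = log_comb(row1 + row2, col1)

    def log_prob(x):
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_denominator

    log_observed = log_prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = 0.0
    for x in range(lowest, highest + 1):
        lp = log_prob(x)
        # Small tolerance guards against floating-point ties.
        if lp <= log_observed + 1e-7:
            total += math.exp(lp)
    return min(1.0, total)


def compare_tpm(control_tpm, treatment_tpm, library_size=1_000_000):
    """Assumed table layout: signature count versus all other tags."""
    return fisher_exact_two_sided(
        control_tpm, library_size - control_tpm,
        treatment_tpm, library_size - treatment_tpm,
    )


if __name__ == "__main__":
    # 4038 vs 2534 TPM are values quoted in the text for one SAMS signature.
    print(f"p = {compare_tpm(4038, 2534):.3g}")
```

Under these assumptions, a difference such as 4038 versus 2534 TPM gives a p-value far below 1E-10, whereas equal values give a p-value of essentially 1.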
Methionine-Homocysteine Cycle. The majority of the
signatures that showed a significant expression change in the
xenic culture were down-regulated. Of these, three signatures
match perfectly (i.e., 20/20 matching nucleotides) three different
ESTs encoding S-adenosylmethionine synthetase (SAMS; EC
2.5.1.6). Although the log2 fold-change ratios were not dramatic
for these signatures, 0.7 (4038/2534), 2.0 (579/142), and 1.5
(826/286), expression differences were statistically significant with
p-values < 1E-10. SAMS catalyzes the synthesis
of S-adenosylmethionine (SAM) from methionine and ATP
[32,33] and is vital for prokaryotic and eukaryotic cellular
growth and proliferation. SAM is the primary methyl group
(CH3) donor and a precursor for the biosynthesis of polyamines
[34]. In saxitoxin-producing microorganisms, e.g., Alexandrium
and the cyanobacterium Anabaena circinalis, SAM is thought to
act as an alkylating agent in the biosynthesis of saxitoxin [35,36].
Given such a critical role for SAM, the observed significant
decrease in the transcriptional level of three different SAMS-
encoding genes in Alexandrium in the presence of the bacterial
community may potentially be of biological significance. A similar
interaction between Amoeba proteus and its proteobacterial
Legionella-like symbionts was shown to repress the transcription
of amoebal host SAMS genes [37,38]. It was proposed that
plasmids from the bacterial symbionts [39] transfer defective
copies of SAMS to the nuclear genome of the amoeba host,
thereby repressing transcription of native SAMS. This establishes
complete dependence of the amoeba on symbiont supply of
bacterial SAMS, with removal of the latter resulting in host death
[37]. Although such an irreversible repression of host SAMS
Figure 1. Distribution of gene family size with a maximum of five pairwise mismatches. Histogram of the extrapolated sizes of gene
families and the frequency of each class of family size.
doi:10.1371/journal.pone.0009688.g001
activity in the presence of bacteria has not been previously
reported in dinoflagellates, a possible, albeit speculative,
explanation for our result is that bacterial effectors employ a
mechanism that ‘‘transiently’’ down-regulates the transcription of
Alexandrium SAMS. Related to SAMS, among the
significantly down-regulated genes is S-adenosylhomocysteine
hydrolase (SAHH) with a log2 fold-change of 1.13 (636/291) and
p-value of 1.92E-28. SAHH is a key player in the methionine cycle
by catalyzing the reversible hydrolysis of S-adenosylhomocysteine
(SAH) to homocysteine (HCY) and adenosine [40]. This takes
place after the transfer of the methyl group from SAM to an
acceptor and the conversion of SAM to SAH in SAM-dependent
methylation reactions [41]. Although preliminary, these results
begin to demonstrate the significant impact of bacterial presence
on Alexandrium via the regulation of key enzymes that share
metabolic connections.
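The fold-change values quoted in this section can be reproduced from the TPM pairs given in parentheses. This is a minimal sketch, assuming each pair is written as control over xenic (so that a positive value means lower expression in the xenic culture); the function name and labels are hypothetical.

```python
import math


def log2_fold_change(control_tpm, xenic_tpm):
    """log2 of the control/xenic expression ratio (assumed orientation)."""
    return math.log2(control_tpm / xenic_tpm)


# TPM pairs quoted in the text (control, xenic).
QUOTED_PAIRS = {
    "SAMS signature 1": (4038, 2534),
    "SAMS signature 2": (579, 142),
    "SAMS signature 3": (826, 286),
    "SAHH": (636, 291),
}

if __name__ == "__main__":
    for label, (control, xenic) in QUOTED_PAIRS.items():
        print(f"{label}: {log2_fold_change(control, xenic):.2f}")
```

The three SAMS pairs round to 0.7, 2.0, and 1.5, matching the text, and the SAHH pair (636/291) gives 1.13, which indicates that this value is also a log2 ratio rather than a plain fold-change.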
Photosynthesis. The second set of genes significantly
affected by the presence of bacteria in the Alexandrium
culture comprises those involved in photosynthesis. These genes are
categorized into two groups (Table 4). The first group is primarily
associated with light absorption and carbon fixation and was
down-regulated, whereas the second group was up-regulated.
Members of the latter group play a role in photoprotection and
response to light stress. Among the down-regulated genes are three
Figure 2. Co-regulation of elongation factor 1-α gene family members. (A) Multiple sequence alignment of six signatures and their matching
ESTs. The six signatures contain one or two pairwise mismatches. The mismatches among the signatures co-segregate along with mismatches in the
ESTs. (B) Heatmap of the expression of the six signatures.
doi:10.1371/journal.pone.0009688.g002
Table 3. Gene families with significant within-family co-regulated expression patterns.
Gene | Definition | r² | p-value | Size
atpE | F-type H+-transporting ATPase subunit c [EC:3.6.3.14] | 0.952 | 4.82E-02 | 8
CETN2 | centrin-2 | 0.995 | 4.62E-03 | 2
metK | S-adenosylmethionine synthetase [EC:2.5.1.6] | 0.994 | 5.73E-03 | 2
petF | ferredoxin | 0.953 | 4.70E-02 | 2
psaC | photosystem I subunit VII | 0.910 | 8.95E-02 | 2
psaE | photosystem I subunit IV | 0.974 | 2.57E-02 | 2
rbcL | ribulose-bisphosphate carboxylase large chain [EC:4.1.1.39] | 0.998 | 2.16E-03 | 4
SMD2 | small nuclear ribonucleoprotein D2 | 0.927 | 7.33E-02 | 3
tktB | transketolase [EC:2.2.1.1] | 0.992 | 8.12E-03 | 2
YWHA | tyrosine 3-monooxygenase | 0.949 | 5.13E-02 | 2
doi:10.1371/journal.pone.0009688.t003