Letter to the Editor
Getting Useful Information from RNA-Seq Contaminants: A Case of Study in the Oil-Collecting Bee Tetrapedia diversipes Transcriptome
Natalia de Souza Araujo,¹ Alexandre Rizzo Zuntini,² and Maria Cristina Arias¹
To the Editor:
RNA-Seq is a straightforward technique widely used in studies of
gene expression, especially for nonmodel species. This approach
yields a comprehensive data set of expressed genes and their
frequency without the need for species-specific probes or a
reference genome. Because plant and animal species are constantly
interacting with each other and RNA-Seq is not a species-specific
approach, it is highly probable that one can find genes from alien
species in a transcriptome data set. Indeed, as reported herein,
analysis of the Tetrapedia diversipes transcriptome revealed
contaminant transcripts from plants and parasites. In-depth
exploration of the plant contaminant data set proved to be a useful
source of information concerning the biology and behavior of this bee.
The oil-collecting bee T. diversipes is a solitary species native to
the Neotropical region. This species is bivoltine, that is, it has
two main reproductive generations during the year: one in the hot
and wet season (generation one, G1) and the second during the cold
and dry season (generation two, G2). The developmental time from egg
to adult varies significantly between the two generations because
the prepupal larvae from G2 enter diapause (Alves-dos-Santos et al.,
2006). To understand the differences between the reproductive
generations, we used RNA-Seq to sequence the transcriptome of female
foundresses from G1 and G2.
Nine T. diversipes from each reproductive generation were collected
in front of trap nests in the city of São Paulo (Brazil) in an area
close to a small secondary semideciduous forest containing many
native and ornamental plants (Alves-dos-Santos et al., 2006). To
identify the contaminant transcripts, complete assembled
transcriptomes from G1 and G2 of T. diversipes foundresses were
searched with BLAST against the UniRef database (August 2015) using
the Annocript program (v1.2) (Musacchia et al., 2015). Scripts in R
(v3.1.3), bash, and Python (v2.7.9), together with manual checking,
were then used to identify and select contaminant transcripts from
plants (pipeline and scripts available at
https://github.com/nat2bee/trans-contamination/tree/master).
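
The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which is available at the repository cited above); it only assumes that each annotated transcript carries the taxonomic lineage of its best database hit. The row layout, the transcript identifiers, and the helper names `plant_family` and `select_plant_contaminants` are hypothetical and do not reflect the actual Annocript output format.

```python
import re

# Hypothetical annotation rows: (transcript_id, lineage of the best hit).
# This layout is an assumption for illustration, not the Annocript format.
ANNOTATIONS = [
    ("t1", "Eukaryota; Metazoa; Arthropoda; Insecta; Hymenoptera; Apidae"),
    ("t2", "Eukaryota; Viridiplantae; Streptophyta; Malpighiales; Euphorbiaceae; Ricinus"),
    ("t3", "Eukaryota; Viridiplantae; Streptophyta; Caryophyllales; Amaranthaceae; Beta"),
    ("t4", "Bacteria; Proteobacteria"),
]

# Most plant family names end in "-aceae"; families with alternative
# names would need an explicit lookup table instead of this pattern.
FAMILY_RE = re.compile(r"\b[A-Z][a-z]+aceae\b")


def plant_family(lineage):
    """Return the plant family named in a lineage, or None for non-plant hits."""
    if "Viridiplantae" not in lineage:
        return None
    match = FAMILY_RE.search(lineage)
    return match.group(0) if match else "unassigned"


def select_plant_contaminants(rows):
    """Map each transcript with a plant best hit to its plant family."""
    selected = {}
    for transcript_id, lineage in rows:
        family = plant_family(lineage)
        if family is not None:
            selected[transcript_id] = family
    return selected


plant_hits = select_plant_contaminants(ANNOTATIONS)
print(plant_hits)
```

Automatic assignments of this kind would still call for the manual checking mentioned in the text, because a lineage string alone cannot reveal ambiguous or low-quality hits.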
From the transcriptomes of G1 and G2 female foundresses, 857 and 538
transcripts, respectively, were identified as plant contaminants.
Contaminant transcripts from G1 matched 28 plant families, almost
50% of which (13) were found exclusively in G1. In G2, by contrast,
19 families were identified, four of which were found only in G2
(Table 1). These results indicate that the richness of plants
visited by females from G1 is greater than that of plants visited by
females from G2, which may be related to the floral bloom during
spring. Our data corroborate an earlier study on the diversity of
pollen stored in T. diversipes nests (Menezes et al., 2012).
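
The family richness comparison above reduces to simple set operations. The following Python sketch uses the family names listed in Table 1; the variable names are illustrative only and are not taken from the authors' scripts.

```python
# Plant families with at least one contaminant transcript, per Table 1.
G1_FAMILIES = {
    "Aizoaceae", "Amaranthaceae", "Arecaceae", "Asteraceae", "Brassicaceae",
    "Cactaceae", "Caryophyllaceae", "Chenopodiaceae", "Cleomaceae",
    "Cucurbitaceae", "Euphorbiaceae", "Fabaceae", "Lamiaceae",
    "Lentibulariaceae", "Malvaceae", "Musaceae", "Myrtaceae",
    "Nelumbonaceae", "Oleaceae", "Pedaliaceae", "Phrymaceae", "Poaceae",
    "Rosaceae", "Rubiaceae", "Rutaceae", "Salicaceae", "Solanaceae",
    "Vitaceae",
}
G2_FAMILIES = {
    "Amaranthaceae", "Brassicaceae", "Cleomaceae", "Cucurbitaceae",
    "Euphorbiaceae", "Fabaceae", "Lythraceae", "Malvaceae", "Moraceae",
    "Myrtaceae", "Nelumbonaceae", "Onagraceae", "Poaceae",
    "Rhizophoraceae", "Rosaceae", "Rutaceae", "Salicaceae", "Solanaceae",
    "Vitaceae",
}

only_g1 = G1_FAMILIES - G2_FAMILIES   # families seen exclusively in G1
only_g2 = G2_FAMILIES - G1_FAMILIES   # families seen exclusively in G2
shared = G1_FAMILIES & G2_FAMILIES    # families seen in both generations

print(len(G1_FAMILIES), len(only_g1))  # richness and exclusive families, G1
print(len(G2_FAMILIES), len(only_g2))  # richness and exclusive families, G2
print(sorted(shared))
```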
Table 1 presents all plant families identified among the contaminant
transcripts in each generation and their proportion in the data set.
These findings are in agreement with previous ecological studies
(Alves-dos-Santos et al., 2006; Menezes et al., 2012), especially
regarding the use of Euphorbiaceae as the main pollen source in
larval feeding. Furthermore, because Amaranthaceae and Euphorbiaceae
are the two main families visited during G1 and Euphorbiaceae is the
main source during G2, the data support the hypothesis that
T. diversipes is not a truly polylectic species but has preferences
for specific families. As an oil source, this bee is known to use
plants from the Malpighiaceae family, but it is not clear whether
other families are also visited (Alves-dos-Santos et al., 2006). In
the present data set, transcripts from the Cucurbitaceae and
Solanaceae families were found in both generations, which suggests
that these families may also be visited for oil collection.
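
As a worked check on the proportions discussed above, the Python sketch below recomputes each family's share from the counts in Table 1. One point worth noting: the percentages in Table 1 appear to be relative to the sum of the family-level counts (798 for G1 and 504 for G2), not to the 857 and 538 plant contaminant transcripts reported in the text. The function names `shares` and `top_families` are illustrative and are not part of the authors' scripts.

```python
# Family-level counts transcribed from Table 1
# (families absent from a generation are omitted).
G1_COUNTS = {
    "Aizoaceae": 1, "Amaranthaceae": 255, "Arecaceae": 6, "Asteraceae": 2,
    "Brassicaceae": 11, "Cactaceae": 1, "Caryophyllaceae": 2,
    "Chenopodiaceae": 3, "Cleomaceae": 2, "Cucurbitaceae": 8,
    "Euphorbiaceae": 329, "Fabaceae": 16, "Lamiaceae": 1,
    "Lentibulariaceae": 2, "Malvaceae": 19, "Musaceae": 3, "Myrtaceae": 4,
    "Nelumbonaceae": 2, "Oleaceae": 2, "Pedaliaceae": 30, "Phrymaceae": 33,
    "Poaceae": 1, "Rosaceae": 14, "Rubiaceae": 6, "Rutaceae": 5,
    "Salicaceae": 18, "Solanaceae": 11, "Vitaceae": 11,
}
G2_COUNTS = {
    "Amaranthaceae": 1, "Brassicaceae": 6, "Cleomaceae": 5,
    "Cucurbitaceae": 4, "Euphorbiaceae": 331, "Fabaceae": 10,
    "Lythraceae": 1, "Malvaceae": 25, "Moraceae": 6, "Myrtaceae": 44,
    "Nelumbonaceae": 6, "Onagraceae": 3, "Poaceae": 1, "Rhizophoraceae": 1,
    "Rosaceae": 17, "Rutaceae": 10, "Salicaceae": 22, "Solanaceae": 8,
    "Vitaceae": 3,
}


def shares(counts):
    """Return each family's percentage of the family-assigned transcripts."""
    total = sum(counts.values())
    return {family: round(100 * n / total, 2) for family, n in counts.items()}


def top_families(counts, k):
    """Return the k families with the most contaminant transcripts."""
    return sorted(counts, key=counts.get, reverse=True)[:k]


g1_shares = shares(G1_COUNTS)
g2_shares = shares(G2_COUNTS)

print(top_families(G1_COUNTS, 2), g1_shares["Euphorbiaceae"], g1_shares["Amaranthaceae"])
print(top_families(G2_COUNTS, 1), g2_shares["Euphorbiaceae"])
```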
Therefore, as described here, contaminant transcripts might be a
useful source of information not only for the study of insect-plant
interactions but also for analyses of other associations such as
parasitism and symbioses. These data are usually neglected in
transcriptomic studies, but the present results indicate that
contaminant transcripts from any transcriptomic data set can be
extremely valuable for answering different biological questions.
Nevertheless, in this type of analysis it is important to keep in
mind that the public databases used for identification through BLAST
are incomplete, so transcript identification may be deficient in
some cases. In addition, when the transcripts come from highly
conserved genes, the identification of a taxonomic group may be
compromised. Thus, we recommend using the reported approach in
association with ecological observations or as a general and
comparative tool.

¹ Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil.
² Departamento de Biologia Vegetal, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, Brazil.

OMICS A Journal of Integrative Biology
Volume 20, Number 0, 2016
© Mary Ann Liebert, Inc.
DOI: 10.1089/omi.2016.0054
Acknowledgments
The authors would like to thank Isabel Alves-dos-Santos for support
during bee collection, Susy Coelho for technical assistance, FAPESP
(São Paulo Research Foundation, process numbers 2013/12530-4 and
2012/18531-0) for financial support, and the reviewers for their
suggestions. This work was developed in the Research Center on
Biodiversity and Computing (BioComp) of the Universidade de São
Paulo (USP), supported by the USP Provost's Office for Research.
Author Disclosure Statement
The authors declare that no conflicting financial interests
exist.
References
Alves-dos-Santos I, Naxara SRC, and Patrício EFLRA. (2006).
Notes on the Morphology of Tetrapedia diversipes Klug 1810
(Tetrapediini, Apidae), an Oil-collecting bee. Braz J Morphol
Sci 23, 425–430.
Menezes GB, Gonçalves-Esteves V, Bastos EMAF, Augusto
SC, and Gaglianone MC. (2012). Nesting and use of pollen
resources by Tetrapedia diversipes Klug (Apidae) in Atlantic
Forest areas (Rio de Janeiro, Brazil) in different stages of
regeneration. Rev Bras Entomol 56, 86–94.
Musacchia F, Basu S, Petrosino G, Salvemini M, and Sanges R.
(2015). Annocript: a flexible pipeline for the annotation of
transcriptomes also able to identify putative long noncoding
RNAs. Bioinformatics 31, 2199–2201.
Address correspondence to:
Natalia de Souza Araujo, MSc
Departamento de Genética e Biologia Evolutiva
Instituto de Biociências
Universidade de São Paulo
Rua do Matão, 277
Cidade Universitária
São Paulo 05508-090
Brazil
E-mail: na.araujo@usp.br
Table 1. Classification, Numbers, and Proportion of the Contaminant Transcripts from Plants Found
in the Transcriptome of Tetrapedia diversipes Foundresses from Generations One (G1) and Two (G2)

                     G1 contaminant transcripts    G2 contaminant transcripts
Plant family         No.        %                  No.        %
Aizoaceae            1          0.13               -          -
Amaranthaceae        255        31.95              1          0.20
Arecaceae            6          0.75               -          -
Asteraceae           2          0.25               -          -
Brassicaceae         11         1.38               6          1.19
Cactaceae            1          0.13               -          -
Caryophyllaceae      2          0.25               -          -
Chenopodiaceae       3          0.38               -          -
Cleomaceae           2          0.25               5          0.99
Cucurbitaceae        8          1.00               4          0.79
Euphorbiaceae        329        41.23              331        65.67
Fabaceae             16         2.00               10         1.98
Lamiaceae            1          0.13               -          -
Lentibulariaceae     2          0.25               -          -
Lythraceae           -          -                  1          0.20
Malvaceae            19         2.38               25         4.96
Moraceae             -          -                  6          1.19
Musaceae             3          0.38               -          -
Myrtaceae            4          0.50               44         8.73
Nelumbonaceae        2          0.25               6          1.19
Oleaceae             2          0.25               -          -
Onagraceae           -          -                  3          0.60
Pedaliaceae          30         3.76               -          -
Phrymaceae           33         4.14               -          -
Poaceae              1          0.13               1          0.20
Rhizophoraceae       -          -                  1          0.20
Rosaceae             14         1.75               17         3.37
Rubiaceae            6          0.75               -          -
Rutaceae             5          0.63               10         1.98
Salicaceae           18         2.26               22         4.37
Solanaceae           11         1.38               8          1.59
Vitaceae             11         1.38               3          0.60

A dash (-) indicates that no contaminant transcript from that family was found in the generation.