Letter to the Editor
Getting Useful Information from RNA-Seq Contaminants:
A Case of Study in the Oil-Collecting Bee
Tetrapedia diversipes Transcriptome
Natalia de Souza Araujo,
Alexandre Rizzo Zuntini,
and Maria Cristina Arias
To the Editor:
RNA-Seq is a straightforward technique widely used
in studies of gene expression, especially for nonmodel species.
This approach yields a comprehensive data set of
expressed genes and their frequencies without the need for
species-specific probes or a reference genome. Because plant
and animal species constantly interact with each other
and RNA-Seq is not a species-specific approach, it is
highly probable that genes from alien species will be found in
a transcriptome data set. Indeed, as reported herein, the
analysis of the Tetrapedia diversipes transcriptome revealed
contaminant transcripts from plants and parasites. In-depth
exploration of the plant contaminant data set proved to be a
useful source of information about the biology and
behavior of this bee.
The oil-collecting bee T. diversipes is a solitary species
native to the Neotropical region. This species is bivoltine,
that is, it has two main reproductive generations per
year: one in the hot and wet season (generation one, G1) and
the other in the cold and dry season (generation two, G2).
The developmental time from egg to adult differs
significantly between the two generations because the prepupal
larvae from G2 enter diapause (Alves-dos-Santos
et al., 2006). To understand the differences between the
reproductive generations, we used RNA-Seq
to sequence the transcriptomes of female foundresses
from G1 and G2.
Nine T. diversipes from each reproductive generation
were collected in front of trap nests in the city of São Paulo
(Brazil), in an area close to a small secondary semideciduous
forest containing many native and ornamental plants
(Alves-dos-Santos et al., 2006). To identify the contaminant
transcripts, complete assembled transcriptomes from G1
and G2 of T. diversipes foundresses were blasted against the
UniRef database (August 2015) using the Annocript program
(v1.2) (Musacchia et al., 2015). Scripts in R (v3.1.3),
bash, Python (v2.7.9), and manual checking were then used
to identify and select contaminant transcripts from plants
(pipeline and scripts available at
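The authors' own scripts are not reproduced here. As a minimal sketch of the kind of filter described above, the Python example below selects transcripts whose best hit falls inside the plant clade. The table layout (columns `transcript_id`, `lineage`, `evalue`), the clade name `Viridiplantae` as the selection key, the e-value cutoff of 1e-5, and the rule that the last lineage rank is the family are all assumptions made for illustration; real Annocript output is laid out differently, and the letter does not state a cutoff.

```python
import csv
import io

# Hypothetical tab-separated annotation table: transcript ID, taxonomic lineage
# of the best hit, and BLAST e-value. This is NOT the real Annocript layout.
EXAMPLE_TABLE = """transcript_id\tlineage\tevalue
tr_0001\tEukaryota;Metazoa;Arthropoda;Insecta;Hymenoptera;Apidae\t1e-80
tr_0002\tEukaryota;Viridiplantae;Streptophyta;Malpighiales;Euphorbiaceae\t3e-42
tr_0003\tEukaryota;Viridiplantae;Streptophyta;Caryophyllales;Amaranthaceae\t2e-03
tr_0004\tBacteria;Proteobacteria\t5e-30
"""


def select_plant_contaminants(handle, max_evalue=1e-5, plant_clade="Viridiplantae"):
    """Return {transcript_id: family} for significant hits inside the plant clade.

    max_evalue is an assumed cutoff, not a value taken from the letter.
    """
    selected = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        lineage = row["lineage"].split(";")
        if plant_clade in lineage and float(row["evalue"]) <= max_evalue:
            # Assumption: the last rank of the lineage string is the family.
            selected[row["transcript_id"]] = lineage[-1]
    return selected


plant_hits = select_plant_contaminants(io.StringIO(EXAMPLE_TABLE))
print(plant_hits)
```

In this toy input, only `tr_0002` is kept: `tr_0003` is a plant hit but fails the assumed e-value cutoff, which is the sort of borderline case that the manual checking mentioned above would be expected to review.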
From the transcriptomes of G1 and G2 female foundresses,
857 and 538 transcripts, respectively, were identified as plant
contaminants. Contaminant transcripts from G1 matched
28 plant families, and almost 50% of these families (13) were
found exclusively in G1. In G2, 19 different families
were identified, and four were found only in G2 (Table 1).
These results indicate that the richness of plants visited by
females from G1 is greater than that of plants visited by
females from G2, which may be related to the floral bloom
during spring. Our data corroborate an earlier study on the
diversity of pollen stored in T. diversipes nests (Menezes et al., 2012).
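The family counts quoted above can be rechecked with simple set operations. The sketch below uses the family names listed in Table 1, with Cucurbitaceae spelled as in the text; the variable names are illustrative only.

```python
# Plant families with at least one contaminant transcript, transcribed from Table 1.
g1_families = {
    "Aizoaceae", "Amaranthaceae", "Arecaceae", "Asteraceae", "Brassicaceae",
    "Cactaceae", "Caryophyllaceae", "Chenopodiaceae", "Cleomaceae",
    "Cucurbitaceae", "Euphorbiaceae", "Fabaceae", "Lamiaceae",
    "Lentibulariaceae", "Malvaceae", "Musaceae", "Myrtaceae", "Nelumbonaceae",
    "Oleaceae", "Pedaliaceae", "Phrymaceae", "Poaceae", "Rosaceae",
    "Rubiaceae", "Rutaceae", "Salicaceae", "Solanaceae", "Vitaceae",
}
g2_families = {
    "Amaranthaceae", "Brassicaceae", "Cleomaceae", "Cucurbitaceae",
    "Euphorbiaceae", "Fabaceae", "Lythraceae", "Malvaceae", "Moraceae",
    "Myrtaceae", "Nelumbonaceae", "Onagraceae", "Poaceae", "Rhizophoraceae",
    "Rosaceae", "Rutaceae", "Salicaceae", "Solanaceae", "Vitaceae",
}

only_g1 = g1_families - g2_families   # families seen in G1 but not in G2
only_g2 = g2_families - g1_families   # families seen in G2 but not in G1
shared = g1_families & g2_families    # families seen in both generations

print(len(g1_families), len(only_g1))  # 28 13
print(len(g2_families), len(only_g2))  # 19 4
print(len(shared))                     # 15
print(sorted(only_g2))
```

The 13 G1-exclusive families are 13/28, or about 46%, of the G1 families, which matches the "almost 50%" stated in the text.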
Table 1 presents all plant families identified among the
contaminant transcripts in each generation and their proportions
in the data set. These findings agree with
previous ecological studies (Alves-dos-Santos et al., 2006;
Menezes et al., 2012), especially regarding the use of
Euphorbiaceae as the main pollen source for larval feeding.
Furthermore, because Amaranthaceae and Euphorbiaceae are
the two main families visited during G1 and Euphorbiaceae is
the main source during G2, the data support the hypothesis that
T. diversipes is not a truly polylectic species but prefers specific
families. As an oil source, this bee is known to
use plants from the Malpighiaceae family, but it is not clear
whether other families are also visited (Alves-dos-Santos
et al., 2006). In the present data set, transcripts from the
Cucurbitaceae and Solanaceae families were found in both
generations, which suggests that these families may also be
visited for oil collection.
Therefore, as described here, contaminant
transcripts might be a useful source of information not only
for the study of insect–plant interactions but also for analyses
of other associations such as parasitism and symbioses. These
data are usually neglected in transcriptomic studies, but the
present results indicate that contaminant transcripts from any
transcriptomic data set can be extremely valuable for answering
different biological questions.
Nevertheless, in this type of analysis, it is important
to keep in mind that the public databases used for identification
through BLAST are incomplete and that transcript identification
may be deficient in some cases. Also, when the transcripts
are from highly conserved genes, the identification of a
taxonomic group may be compromised. Thus, the
reported approach is recommended in association with ecological
observations or as a general and comparative tool.
Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil.
Departamento de Biologia Vegetal, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, Brazil.
OMICS A Journal of Integrative Biology
Volume 20, Number 0, 2016
© Mary Ann Liebert, Inc.
DOI: 10.1089/omi.2016.0054
Acknowledgments
The authors would like to thank Isabel Alves-dos-Santos
for support during bee collection, Susy Coelho for
technical assistance, FAPESP (São Paulo Research Foundation,
process numbers 2013/12530-4 and 2012/18531-0)
for financial support, and the reviewers for their suggestions.
This work was developed in the Research Center on Biodiversity
and Computing (BioComp) of the Universidade de
São Paulo (USP), supported by the USP Provost's Office
for Research.
Author Disclosure Statement
The authors declare that no conflicting financial interests exist.
References
Alves-dos-Santos I, Naxara SRC, and Patrício EFLRA. (2006).
Notes on the Morphology of Tetrapedia diversipes Klug 1810
(Tetrapediini, Apidae), an Oil-collecting bee. Braz J Morphol
Sci 23, 425–430.
Menezes GB, Gonçalves-Esteves V, Bastos EMAF, Augusto
SC, and Gaglianone MC. (2012). Nesting and use of pollen
resources by Tetrapedia diversipes Klug (Apidae) in Atlantic
Forest areas (Rio de Janeiro, Brazil) in different stages of
regeneration. Rev Bras Entomol 56, 86–94.
Musacchia F, Basu S, Petrosino G, Salvemini M, and Sanges R.
(2015). Annocript: a flexible pipeline for the annotation of
transcriptomes also able to identify putative long noncoding
RNAs. Bioinformatics 31, 2199–2201.
Address correspondence to:
Natalia de Souza Araujo, MSc
Departamento de Genética e Biologia Evolutiva
Instituto de Biociências
Universidade de São Paulo
Rua do Matão, 277
Cidade Universitária
São Paulo 05508-090
Table 1. Classification, Numbers, and Proportion of the Contaminant Transcripts from Plants Found
in the Transcriptome of Tetrapedia diversipes Foundresses from Generations One (G1) and Two (G2)
(no. = number of contaminant transcripts; % = percentage of contaminant transcripts; - = family not found)
Plant family        G1 no.    G1 %  G2 no.    G2 %
Aizoaceae                1    0.13       -       -
Amaranthaceae          255   31.95       1    0.20
Arecaceae                6    0.75       -       -
Asteraceae               2    0.25       -       -
Brassicaceae            11    1.38       6    1.19
Cactaceae                1    0.13       -       -
Caryophyllaceae          2    0.25       -       -
Chenopodiaceae           3    0.38       -       -
Cleomaceae               2    0.25       5    0.99
Cucurbitaceae            8    1.00       4    0.79
Euphorbiaceae          329   41.23     331   65.67
Fabaceae                16    2.00      10    1.98
Lamiaceae                1    0.13       -       -
Lentibulariaceae         2    0.25       -       -
Lythraceae               -       -       1    0.20
Malvaceae               19    2.38      25    4.96
Moraceae                 -       -       6    1.19
Musaceae                 3    0.38       -       -
Myrtaceae                4    0.50      44    8.73
Nelumbonaceae            2    0.25       6    1.19
Oleaceae                 2    0.25       -       -
Onagraceae               -       -       3    0.60
Pedaliaceae             30    3.76       -       -
Phrymaceae              33    4.14       -       -
Poaceae                  1    0.13       1    0.20
Rhizophoraceae           -       -       1    0.20
Rosaceae                14    1.75      17    3.37
Rubiaceae                6    0.75       -       -
Rutaceae                 5    0.63      10    1.98
Salicaceae              18    2.26      22    4.37
Solanaceae              11    1.38       8    1.59
Vitaceae                11    1.38       3    0.60
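As a worked check on the proportions in Table 1, the sketch below recomputes each percentage from the per-family counts. It assumes that the denominator is the sum of the per-family counts in each generation, which comes to 798 for G1 and 504 for G2, and not the 857 and 538 plant contaminant transcripts reported in the text. That denominator is an inference from the numbers and is not stated in the letter. The function name `percentages` is illustrative.

```python
# Per-family transcript counts as (G1, G2), transcribed from Table 1; 0 = absent.
TABLE_1 = {
    "Aizoaceae": (1, 0), "Amaranthaceae": (255, 1), "Arecaceae": (6, 0),
    "Asteraceae": (2, 0), "Brassicaceae": (11, 6), "Cactaceae": (1, 0),
    "Caryophyllaceae": (2, 0), "Chenopodiaceae": (3, 0), "Cleomaceae": (2, 5),
    "Cucurbitaceae": (8, 4), "Euphorbiaceae": (329, 331), "Fabaceae": (16, 10),
    "Lamiaceae": (1, 0), "Lentibulariaceae": (2, 0), "Lythraceae": (0, 1),
    "Malvaceae": (19, 25), "Moraceae": (0, 6), "Musaceae": (3, 0),
    "Myrtaceae": (4, 44), "Nelumbonaceae": (2, 6), "Oleaceae": (2, 0),
    "Onagraceae": (0, 3), "Pedaliaceae": (30, 0), "Phrymaceae": (33, 0),
    "Poaceae": (1, 1), "Rhizophoraceae": (0, 1), "Rosaceae": (14, 17),
    "Rubiaceae": (6, 0), "Rutaceae": (5, 10), "Salicaceae": (18, 22),
    "Solanaceae": (11, 8), "Vitaceae": (11, 3),
}


def percentages(table):
    """Return ((total_g1, total_g2), {family: (pct_g1, pct_g2)})."""
    total_g1 = sum(g1 for g1, _ in table.values())
    total_g2 = sum(g2 for _, g2 in table.values())
    shares = {
        family: (round(100 * g1 / total_g1, 2), round(100 * g2 / total_g2, 2))
        for family, (g1, g2) in table.items()
    }
    return (total_g1, total_g2), shares


totals, shares = percentages(TABLE_1)
print(totals)                    # (798, 504)
print(shares["Euphorbiaceae"])   # (41.23, 65.67)
print(shares["Amaranthaceae"])   # (31.95, 0.2)
```

Under this assumption the recomputed values agree with the table, for example 329/798 = 41.23% and 331/504 = 65.67% for Euphorbiaceae.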