Metagenomic study of the oral microbiota by Illumina high-throughput sequencing.
ABSTRACT To date, metagenomic studies have relied on the utilization and analysis of reads obtained using 454 pyrosequencing to replace conventional Sanger sequencing. After extensively scanning the 16S ribosomal RNA (rRNA) gene, we identified the V5 hypervariable region as a short region providing reliable identification of bacterial sequences available in public databases such as the Human Oral Microbiome Database. We amplified samples from the oral cavity of three healthy individuals using primers covering an approximately 82-base segment of the V5 loop, and sequenced using the Illumina technology in a single orientation. We identified 135 genera or higher taxonomic ranks from the resulting 1,373,824 sequences. While the abundances of the most common phyla (Firmicutes, Proteobacteria, Actinobacteria, Fusobacteria and TM7) are largely comparable to previous studies, Bacteroidetes were less present. Potential sources for this difference include classification bias in this region of the 16S rRNA gene, human sample variation, sample preparation and primer bias. Using an Illumina sequencing approach, we achieved a much greater depth of coverage than previous oral microbiota studies, allowing us to identify several taxa not yet discovered in these types of samples, and to assess that at least 30,000 additional reads would be required to identify only one additional phylotype. The evolution of high-throughput sequencing technologies, and their subsequent improvements in read length enable the utilization of different platforms for studying communities of complex flora. Access to large amounts of data is already leading to a better representation of sample diversity at a reasonable cost.
Vladimir Lazarevic a,⁎,1, Katrine Whiteson a,⁎,1, Susan Huse b, David Hernandez a, Laurent Farinelli c,
Magne Østerås c, Jacques Schrenzel a, Patrice François a
a Genomic Research Laboratory, Geneva University Hospitals, Rue Gabrielle-Perret-Gentil 4, CH-1211 Geneva 14, Switzerland
b Josephine Bay Paul Center, Marine Biological Laboratory, Woods Hole, MA 02543, USA
c Fasteris, Ch. du Pont-du-Centenaire 109, Case postale 28, CH-1228 Plan-les-Ouates, Switzerland
Article info
Received 11 September 2009
Accepted 14 September 2009
Available online 29 September 2009
© 2009 Elsevier B.V. All rights reserved.
1. Introduction
Oral health, which is strongly influenced by oral microbiota, has a
significant impact on general health. The bacterial community in the
mouth contains species that promote health, and others that contri-
bute to illness. Recent studies have shown that poor oral hygiene and/
or the presence of particular species in the mouth may be associated
with periodontitis, respiratory infection and intestinal disease (Avila
et al., 2009; Kuehbacher et al., 2008; Raghavendran et al., 2007). In
addition, the salivary microbiota may be used as a diagnostic marker
for cancer (Mager et al., 2005) and periodontal disease (Faveri et al.,
2008) as well as to provide insights into human population studies
(Nasidze et al., 2009a). Understanding which species are present and
how the community is composed in healthy adults is the first step
towards understanding how changes can lead to disease.
Experts have recently raised the hypothesis that in some chronic
diseases, the “pathogen” might be a disturbed microbial community
rather than a single organism (Friedrich, 2008). Understanding the
contribution of “behind-the-scenes” species which influence the
pathogenicity of other species has already led to important changes
in treatment strategies (Sibley et al., 2006). These unexpected
interactions are changing how microbiologists think about causation
of infection and disease (Lipkin, 2009).
[...] cavity was limited to those species that could be cultured in the
laboratory. New sequencing technologies have brought tremendous
improvements in automated sequencing and analysis of genome features. Today
around 900 complete prokaryotic genomes are publicly available
(www.ncbi.nlm.nih.gov/genomes/lproks.cgi) as well as more than a million 16S
rRNA gene sequences, and several hundred metagenomic datasets. The [...]
1000 species which have been found in the mouth, while a
metagenomic-based estimate of the diversity is one order of magnitude higher
(Keijser et al., 2008). The availability of these extensive and varied
sequences has opened the way for comparative genomics techniques
(Fraser et al., 2000) for evaluating relatedness and diversity as well as
studying whole viral or bacterial content of various media (Venter et al.,
2004; Williamson et al., 2008) or bacterial infections (Cox-Foster et al.,
2007; Nakamura et al., 2008; Turnbaugh et al., 2009).
Journal of Microbiological Methods 79 (2009) 266–271
⁎ Corresponding authors. Tel.: +41 22 372 93 38; fax: +41 22 372 98 30.
E-mail addresses: email@example.com (V. Lazarevic),
firstname.lastname@example.org (K. Whiteson).
1Both authors contributed equally to this work.
Here, we evaluate the potential of Illumina high-throughput
sequencing with an unprecedented depth of sequence coverage for
the study of human oral microbiota diversity. We use partial sequences
from the well-characterized and conserved 16S rRNA gene, to enable
classification of bacteria from human oral samples.
2. Materials and methods
2.1. Sampling and DNA extraction
We collected saliva and oropharyngeal samples over a one-week
period from three adult individuals with informed consent. Saliva
samples were collected by expectoration into a sterile plastic 50-ml
tube and kept frozen at −20 °C until processing. We mixed 100 μL of
each saliva sample with the same volume of 2× lysis buffer [Tris
20 mM, EDTA 2 mM (pH 8), Tween 1%, proteinase K (Fermentas)
400 μg/ml] and incubated them for 2 h at 55 °C (Faveri et al., 2008).
Proteinase K was inactivated by a 10 min incubation at 95 °C and the
samples were frozen at −20 °C.
Dry cotton swabs (Copan) were used to gently swab the poste-
rior wall of the oropharynx. They were directly suspended in a
microtube containing 200 μL of lysis buffer and processed in the
same way as the saliva samples. The saliva and oropharyngeal lysates
from all three subjects were mixed in a 1:10 ratio with roughly
equal contributions from the two sampling sites according to PCR [...]
2.2. PCR primers and conditions
We aligned 753 16S rDNA sequences from the Human Oral
Microbiome Database (HOMD, October 2008) using MAFFT (FFT-NS-2,
v6.531b) (Katoh et al., 2002). We chose primers from the conserved
areas of the alignment flanking the V5 region so as to match most [...]
and 880RDEG (5′-CRTACTHCHCAGGYG) sequences produced 740 and
745 hits, respectively, or 732 (97.2%) of the HOMD sequences. Species
coverage was within the 91–100% range for all HOMD bacterial phyla
except Chloroflexi which is very rare in oral microbiota (Keijser et al.,
2008) and has a single representative in the HOMD.
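As an illustration of how the coverage of a degenerate primer can be checked against a set of reference sequences, the sketch below expands the IUPAC codes of 880RDEG (the sequence given above) into a regular expression and counts exact matches on either strand. This is only a minimal stand-in: exact regex matching is an assumption of this example and is not necessarily the search method used in the study, and the function names and toy sequences are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also handles degenerate codes (e.g. R <-> Y).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) nucleotide string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a regex that matches every expansion of a degenerate primer."""
    parts = []
    for base in primer.upper():
        bases = IUPAC[base]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


def primer_coverage(primer, sequences):
    """Fraction of sequences containing the primer site on either strand."""
    forward = primer_regex(primer)
    reverse = primer_regex(reverse_complement(primer))
    hits = sum(
        1 for s in sequences
        if forward.search(s.upper()) or reverse.search(s.upper())
    )
    return hits / len(sequences)


PRIMER_880RDEG = "CRTACTHCHCAGGYG"  # reverse primer sequence quoted in the text

# Toy sequences (hypothetical, for illustration only).
site = "CGTACTACACAGGCG"  # one concrete expansion of 880RDEG
toy_sequences = [
    "AAAA" + site + "TTTT",
    "AAAA" + reverse_complement(site) + "TT",
    "ACGTACGTACGTACGTACGT",
]
toy_coverage = primer_coverage(PRIMER_880RDEG, toy_sequences)
```

On real data the same function would be applied to the full set of aligned reference sequences, ideally with a tolerance for mismatches, which this sketch does not model.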
PCR amplification was carried out in a 50 μL PrimeStar HS Premix
(Takara) containing 5 μL of lysate and 0.5 μM of each forward (784DEG)
and reverse (880RDEG) primer. The samples were run in two separate
PCRs for 15 cycles using the following parameters: 98 °C for 10 s, 46 °C for
15 s, and 72 °C for 1 min. The two PCRs were then pooled and
phosphorylated with polynucleotide kinase and the Illumina paired-end
adapters were ligated with T4 DNA ligase. After PCR amplification with [...]
was quality controlled by cloning an aliquot into a TOPO plasmid and
capillary sequencing 16 clones. The library was sequenced from the
forward end for 76 cycles on the Illumina Genome Analyzer system GAII [...]
E. coli positions 785 to 894 including primer sequence and to positions 798
to 879 excluding primers.
2.3. Sequence analysis
Base-calling was performed with the GAPipeline 1.3.2 using
standard parameters, which include purity filtering with “chastity
0.6”. We removed sequences containing uncalled bases, incorrect
primer sequence or runs of ≥12 identical nucleotides. Seventy-two-
base sequence reads were trimmed to remove the 13-base forward
primer sequence, yielding 59-base sequences.
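The filtering and trimming rules described above can be sketched as follows. The forward primer sequence is not recoverable from this text, so `HYPOTHETICAL_PRIMER` is a placeholder of the stated length (13 bases) and is not the primer used in the study; the real primer is degenerate and would need degeneracy-aware matching. Treating both `N` and `.` as uncalled bases is also an assumption of this example.

```python
import re

# Placeholder only: NOT the study's forward primer, just a 13-base stand-in.
HYPOTHETICAL_PRIMER = "ACGTACGTACGTA"
# A run of 12 or more identical nucleotides.
HOMOPOLYMER = re.compile(r"([ACGT])\1{11,}")


def clean_read(read, primer=HYPOTHETICAL_PRIMER):
    """Return the primer-trimmed read, or None if the read is rejected."""
    read = read.upper()
    if "N" in read or "." in read:      # uncalled bases
        return None
    if HOMOPOLYMER.search(read):        # runs of >= 12 identical bases
        return None
    if not read.startswith(primer):     # incorrect primer sequence
        return None
    return read[len(primer):]           # strip the primer


# A 72-base toy read: 13-base primer followed by a 59-base insert.
toy_insert = ("ACGT" * 15)[:59]
toy_read = HYPOTHETICAL_PRIMER + toy_insert
trimmed = clean_read(toy_read)
```

With a 72-base read and a 13-base primer, the trimmed sequence is 59 bases long, matching the lengths quoted in the text.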
We assigned taxonomy to sequences with GAST (Huse et al.,
2008), using a database of reference V5 rDNA sequences (RefHVR_V5)
from SILVA (version 98) (Pruesse et al., 2007), and taxonomy from
known cultured isolates, Entrez Genome projects, the Ribosomal
Database Project [RDP; (Cole et al., 2005)], Greengenes (DeSantis
et al., 2006) and hand curation. GAST compares each tag to the
RefHVR_V5 and aligns it to its nearest neighbors in the database and
then selects the closest reference(s). The taxonomy for the tag is the
lowest common ancestor for a two-thirds majority of all 16S rDNA
sequences associated with the nearest V5 reference sequences.
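The two-thirds majority rule can be illustrated with a small consensus function. This is a sketch of the general idea only, not the GAST implementation: it walks down the taxonomic ranks and stops at the first rank where no single name is shared by at least two thirds of the lineages. Whether the threshold is inclusive (at least two thirds) is an assumption here, and the function name and example lineages are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def consensus_taxonomy(lineages, threshold=Fraction(2, 3)):
    """Deepest lineage prefix supported by at least `threshold` of the lineages.

    `lineages` is a list of tuples ordered from domain down to genus.
    """
    total = len(lineages)
    candidates = list(lineages)
    consensus = []
    depth = 0
    while candidates:
        # Count the names present at this rank among lineages still in play.
        names = Counter(l[depth] for l in candidates if len(l) > depth)
        if not names:
            break
        name, count = names.most_common(1)[0]
        # Support is measured against all lineages, not just the survivors.
        if Fraction(count, total) < threshold:
            break
        consensus.append(name)
        candidates = [l for l in candidates if len(l) > depth and l[depth] == name]
        depth += 1
    return tuple(consensus)


strep = ("Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
         "Streptococcaceae", "Streptococcus")
lacto = ("Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
         "Streptococcaceae", "Lactococcus")
neiss = ("Bacteria", "Proteobacteria", "Betaproteobacteria", "Neisseriales",
         "Neisseriaceae", "Neisseria")

genus_level = consensus_taxonomy([strep, strep, lacto])
family_level = consensus_taxonomy([strep, lacto, neiss])
```

In the second call only two of three lineages agree down to the family, and no genus reaches two thirds, so the assignment stops at Streptococcaceae.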
Before generating clusters of phylotypes, we filtered out all
sequences that occurred fewer than 3 times. This reduced the number
of unique sequences to a computationally manageable level, and
potentially reduced the number of errors from sequencing and
contamination. We created a multiple sequence alignment of the remaining
data using MUSCLE (Edgar, 2004) with parameters -maxiters 2 and
-diags, and generated phylotype clusters and diversity estimates using
MOTHUR (Schloss and Handelsman, 2005).
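The abundance filter applied before clustering amounts to counting identical reads and keeping those seen at least three times. A minimal sketch, with hypothetical function names and toy reads:

```python
from collections import Counter


def filter_rare_sequences(reads, min_count=3):
    """Map each distinct sequence seen at least `min_count` times to its count."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_count}


def summarize(kept):
    """Return (number of reads retained, number of distinct sequences)."""
    return sum(kept.values()), len(kept)


toy_reads = ["AAC"] * 5 + ["AAG"] * 3 + ["ACC"] * 2 + ["TTT"]
kept = filter_rare_sequences(toy_reads)
n_reads, n_distinct = summarize(kept)
```

The two numbers returned by `summarize` correspond to the kind of read and distinct-sequence totals reported after filtering.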
3. Results and discussion
3.1. Evaluation of the oral microbiota diversity using the V5 region of the
16S rRNA gene
To examine which region of the 16S rRNA gene would be possible
to target with the short Illumina sequencing reads, we extracted
various sections of aligned 16S rDNA sequences available for 753
species in the Human Oral Microbiome Database and submitted them
to the RDP classifier with an 80% confidence cutoff. The entire V5 120-
base region as well as the 59-base segments from its forward end led
to many fewer unclassified sequences than their V6-region counterparts
(Table 1). Therefore, the paired-end data from the ~82-base V5
region we amplified in the current study would provide a means to
capture taxonomic information suitable for studying the microbial
diversity with the Illumina technology, similar to that of the favored
V1–3 and V6 regions which are used when longer sequence reads are [...]
oropharyngeal swab samples from three individuals by targeting the 16S rDNA
hypervariable V5 regions. Of 1,373,824 obtained reads, 1,237,319
[publicly available at the MG-RAST repository (Meyer et al., 2008)
under ID:4444448.3] passed the quality control. They were clustered in
377,275 distinct sequences most of which (330,815) were unique.
3.2. Taxonomic analysis of the oral microbiota
We analyzed the taxonomic composition and abundance of the
oral bacterial community using GAST (Huse et al., 2008), the MG-RAST
server (Meyer et al., 2008) and the RDP classifier (Wang et al., 2007).
RDP's Seqmatch program may also eventually be useful for determining
which sequences in the RDP database are most closely related
to our sequences; it works for sequences as short as 7 bases but only
for 2000 sequences at a time.
The mean RDP confidence level for the six taxonomic levels from
domain to genus was calculated as a function of the sequence abundance [...]
(Fig. 1). This general trend is most likely due to the fact that the
most frequent sequences correspond to known species whose 16S rDNA
sequences are available. Conversely, the rare species include a higher
proportion of 16S rDNA sequences absent or distant from those in the
RDP reference database. In addition, the probability that a sequence
contains an error is expected to be higher in low frequency sequences
(Andersson et al., 2008).
To limit the impact of sequencing errors, we removed all sequences
that occurred fewer than three times. This new dataset contains 865,540
reads representing 25,978 distinct sequences. We discarded 381 reads
(<0.05%) with a GAST distance that diverged more than 30% from their
nearest reference sequence, leaving 865,159 sequences. For the MG-RAST
analysis with a minimum alignment length of 50 and a maximum [...]
843,827 sequences. The phylogenetic assignments using the RDP
classifier were performed after two additional filtering steps. They
included the removal of sequences that were better classified when
considered as reverse complements and those that had <80%
confidence at the domain level. In this way the number of reads was
reduced to 854,968 (24,757 distinct sequences).
The combined saliva and oropharyngeal swab samples were domi-
nated by the phyla Firmicutes, Proteobacteria, Actinobacteria, Fusobac-
teria, TM7 and Spirochaetes (Table 2), that are also abundant in other
oral samples assessed by means of phyloarrays (Huyghe et al., 2008) or
massively parallel pyrosequencing of the 16S rDNA clones or amplicons
(Keijser et al., 2008; Nasidze et al., 2009a). Their proportions, however, [...]
less than 80% cannot be trusted. The MG-RAST server and RDP classifier
returned a higher fraction of unclassified bacteria, likely because they
are not well designed for such short sequences. This may explain the
lower content of major phyla relative to that generated by GAST which
was designed specifically for short tag sequences. Proteobacteria is an
exception since their abundances were similar with the three
classification tools. Therefore, the RDP- and MG-RAST-based classification
of V5 rDNA sequences appeared to be more sensitive for Proteobacteria
than for other major phyla. Indeed, the RDP classification of the
HOMD 16S rDNA sequences showed that the relative abundance of
Proteobacteria, in contrast to those of other major phyla, was not
reduced when using 59-base V5 sequences instead of their full length
counterparts (Table 1).
The ability to identify taxa from class down to the genus level varied
between phyla and was dependent on the classification approach
(Fig. 2). For the six major phyla, GAST generated the highest proportion
of reads placed at these levels of taxonomy. Fusobacteria and
Spirochaetes had the largest proportion of reads that can be confidently
placed at the genus level using all three classification approaches. This
proportion was the lowest for Proteobacteria despite their robust
classification at the phylum level (see above).
Some consider organisms with more than 1.3% sequence differ-
ence in 16S rDNA sequence (based on the full-length) to belong to
different species (Stackebrandt and Ebers, 2006). Since a single
nucleotide difference in a 59-base-long sequence corresponds to a
1.7% resolution, there may be more than 25,000 species-level
phylotypes in our dataset (Fig. 3). For a more conservative estimate
of species-level phylotypes, we used a cutoff of 3% corresponding to a
2-base resolution (Stackebrandt and Goebel, 1994) to create clusters
of sequences. There are at least 8,000 different phylotypes at the 3%
level. This will be an underestimate since we removed all sequences
occurring fewer than three times prior to analysis. These filtered
sequences would include valid but rare organisms as well as many [...]

Fig. 1. Average confidence level for the six taxonomic levels as a function of sequence abundance.

Table 2
Comparison of oral phyla abundance obtained using different classification tools.
[Table body not recovered. Columns: phylum; percentage of total sequences classified with each classification tool, and as reported by Keijser et al. (2008).]
NR, not reported.
a ≤30% divergent from their nearest reference sequence.
b Maximum e-value 10^−10 and minimum alignment length of 50 nucleotides.
c 80% confidence cutoff applied at the phylum level.

Table 1
RDP classification of aligned segments of the 16S rRNA genes from the 753 sequences in the Human Oral Microbiome Database, using 80% confidence level cutoffs with the RDP [...]
[Table body not recovered. Rows: percentage of sequences classified for different 16S rDNA regions; position in 16S rDNA (a); sequence length (nt).]
a Numbering corresponds to the E. coli 16S rRNA gene sequence.
b F, from the forward end; R, from the reverse end.
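The read-length arithmetic behind the species-level estimate can be checked with a short calculation: on a 59-base read, each mismatch corresponds to roughly 1.7% sequence difference, so two mismatches correspond to about 3.4%, close to the 3% cutoff mentioned in the text. The function name below is hypothetical.

```python
def percent_difference(read_length, mismatches=1):
    """Sequence difference, in percent, for a number of mismatches on a read."""
    return 100.0 * mismatches / read_length


one_base = percent_difference(59)      # about 1.7
two_bases = percent_difference(59, 2)  # about 3.4
```

The coarse steps (1.7%, 3.4%, ...) are the reason short reads cannot resolve the 1.3% full-length threshold directly.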
We used rarefaction analysis to determine the microbial diversity
recovery in the filtered dataset. The rarefaction curve is very stable at
~8000 (Fig. 3), suggesting that the sampling completeness is high —
at least 30,000 additional reads would be required to discover a
new unique phylotype, and more than 120,000 additional reads
would be required to discover a new 3% phylotype. The removal of
unique sequences impacts the rarefaction curve, and may underes-
timate the potential for new species detection in human saliva
samples. However, sampling is sufficient among the sequences likely
to be prevalent in human saliva because they were found at least 3 times.
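To make the notion of sampling completeness concrete, the sketch below builds a simple accumulation curve: reads are drawn in random order and the number of distinct phylotypes seen so far is recorded at regular intervals. This is a single-permutation approximation written for illustration, not the averaged rarefaction computed by the software used in the study; the function names and toy labels are hypothetical.

```python
import random


def rarefaction_curve(otu_labels, step, seed=0):
    """Distinct OTUs observed as reads are drawn in one random order.

    `otu_labels` holds one OTU label per read. Returns a list of
    (reads_sampled, otus_observed) pairs, one every `step` reads.
    """
    rng = random.Random(seed)
    shuffled = list(otu_labels)
    rng.shuffle(shuffled)
    seen = set()
    curve = []
    for i, otu in enumerate(shuffled, start=1):
        seen.add(otu)
        if i % step == 0 or i == len(shuffled):
            curve.append((i, len(seen)))
    return curve


def reads_per_new_otu(curve):
    """Reads sampled per newly observed OTU over the last curve interval."""
    (n0, s0), (n1, s1) = curve[-2], curve[-1]
    if s1 == s0:
        return float("inf")  # curve is flat: no new OTU in the last interval
    return (n1 - n0) / (s1 - s0)


toy_labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
toy_curve = rarefaction_curve(toy_labels, step=10)
```

A flat tail of the curve, where many additional reads are needed per new OTU, is what the text describes as high sampling completeness.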
A total of 135 genera or next higher taxonomic ranks were
identified by GAST (Appendix A). The most frequent genera were
Neisseria and Streptococcus, constituting about 70% of the sequences.
Thirty-four taxa have not been identified in previous studies of oral
microbiotas (Keijser et al., 2008; Nasidze et al., 2009a,b) and are not
listed in the Human Oral Microbiome Database. They include some
low-abundance genera as well as putative members of the candidate
divisions BRC1, OP10, OP3. The MG-RAST server also identified BRC1
and OP10 sequences.
The observed relatively low abundance of Bacteroidetes in our data
compared to previous studies may be accounted for by many factors [...]
sample size, as well as potential bias in lysis, amplification and
classification. Indeed, it has been shown that some of the “frequent”
species are absent in some individuals (Aas et al., 2005). Good oral
hygiene is known to decrease the proportion of Gram-negative bacteria
including some Bacteroidetes species. The amplification bias has been [...]
series of fecal samples (Andersson et al., 2008). This is unlikely to be the
case in our study since the PCR primers used cover 104 of 107 (97%)
Bacteroidetes species listed in the HOMD.
To the best of our knowledge this is the first metagenomic study
based on the utilization of the Illumina high-throughput sequencing
allowing for more in-depth coverage than the competing technologies. [...]
time-points and samples, and better assessment of total diversity in [...]
whether a core community of bacteria is common to most humans,
making the less common species important to understanding human
health and disease (Hamady and Knight, 2009).
The advantage of generating and sequencing short 16S rDNA
amplicons for bacterial community analysis is that it reduces the
likelihood of generating chimeras and increases the likelihood of
detecting low-abundance taxa (Huber et al., 2009). Moreover, reads of
100–200 bases obtained with carefully chosen amplification primers
can yield the same clustering as long 16S rDNA sequences (Liu et al.,
2007).
There is a concern that short read length may compromise the [...]
the V5 region of the 16S rDNA, taxonomic assessment at the phylum [...]
based on the Illumina technology will be improved by paired-end reads
(Table 1) which are expected not only to generate longer sequences
but also to increase the sequence quality. Since the probability of a
sequencing error increases with the read length (Qu et al., 2009),
partially overlapping complementary reads of the same amplicon may
help to predict sequencing errors and aid the removal of ambiguous
reads or parts of reads.
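The idea of using overlapping complementary reads to flag likely errors can be sketched as follows. The example assumes that both reads start exactly at the ends of an amplicon of known length and contain no insertions or deletions; these are simplifying assumptions of the sketch, and the function names and toy amplicon are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a nucleotide string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overlap_disagreements(forward, reverse, amplicon_length):
    """Amplicon positions (0-based) where the two reads disagree.

    `forward` starts at amplicon position 0; `reverse` is read from the
    opposite strand and ends at position `amplicon_length - 1` once
    reverse-complemented. Only the overlapping region is compared.
    """
    rev_as_forward = reverse_complement(reverse)
    offset = amplicon_length - len(rev_as_forward)  # position of its first base
    start = max(offset, 0)
    end = min(len(forward), amplicon_length)
    return [
        i for i in range(start, end)
        if forward[i] != rev_as_forward[i - offset]
    ]


toy_amplicon = "ACGTTGCAAGGCTTAGCCAT"                    # 20 bases
toy_forward = toy_amplicon[:14]                           # covers positions 0-13
toy_reverse = reverse_complement(toy_amplicon[8:])        # covers positions 8-19
toy_forward_err = toy_forward[:10] + "A" + toy_forward[11:]  # G>A error at 10

clean = overlap_disagreements(toy_forward, toy_reverse, 20)
flagged = overlap_disagreements(toy_forward_err, toy_reverse, 20)
```

Positions reported by `overlap_disagreements` could then be masked, or the read pair discarded, depending on the stringency required.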
Fig. 2. Proportions of taxonomic assignments below the phylum level. Reads assigned to
each of the four taxonomic levels for each major phylum are represented by bars. Their
heights represent the percentage of reads that can be placed at a given level of taxonomy
using GAST, the MG-RAST server and the RDP Classifier.
Acknowledgments
This work was supported by grants from the Swiss National
Science Foundation 3100A0-112370/1 (JS) and 3100A0-116075 (PF),
and United States National Institutes of Health grant UH2DK083993 [...]
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in
the online version, at 10.1016/j.mimet.2009.09.012.
References
Aas, J.A., Paster, B.J., Stokes, L.N., Olsen, I., Dewhirst, F.E., 2005. Defining the normal
bacterial flora of the oral cavity. J. Clin. Microbiol. 43, 5721–5732.
Andersson, A.F., Lindberg, M., Jakobsson, H., Backhed, F., Nyren, P., Engstrand, L., 2008.
Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS
ONE 3, e2836.
Avila, M., Ojcius, D.M., Yilmaz, O., 2009. The oral microbiota: living with a permanent
guest. DNA Cell Biol. 28, 405–411.
Cole, J.R., Chai, B., Farris, R.J., Wang, Q., Kulam, S.A., McGarrell, D.M., Garrity, G.M., Tiedje,
J.M., 2005. The Ribosomal Database Project (RDP-II): sequences and tools for high-
throughput rRNA analysis. Nucleic Acids Res. 33, D294–D296.
Cox-Foster, D.L., Conlan, S., Holmes, E.C., Palacios, G., Evans, J.D., Moran, N.A., Quan, P.L.,
Briese, T., Hornig, M., Geiser, D.M., Martinson, V., vanEngelsdorp, D., Kalkstein, A.L.,
Lipkin, W.I., 2007. A metagenomic survey of microbes in honey bee colony collapse
disorder. Science 318, 283–287.
DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T.,
Dalevi, D., Hu, P., Andersen, G.L., 2006. Greengenes, a chimera-checked 16S rRNA
gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72,
Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high
throughput. Nucleic Acids Res. 32, 1792–1797.
Faveri, M., Mayer, M.P., Feres, M., de Figueiredo, L.C., Dewhirst, F.E., Paster, B.J., 2008.
Microbiological diversity of generalized aggressive periodontitis by 16S rRNA
clonal analysis. Oral Microbiol. Immunol. 23, 112–118.
Fraser, C.M., Eisen, J., Fleischmann, R.D., Ketchum, K.A., Peterson, S., 2000. Comparative
genomics and understanding of microbial biology. Emerg. Infect. Dis. 6, 505–512.
Friedrich, M.J., 2008. Microbiome project seeks to understand human body's
microscopic residents. JAMA 300, 777–778.
Hamady, M., Knight, R., 2009. Microbial community profiling for human microbiome
projects: tools, techniques, and challenges. Genome Res. 19, 1141–1152.
Huber, J.A., Morrison, H.G., Huse, S.M., Neal, P.R., Sogin, M.L., Mark Welch, D.B., 2009.
Effect of PCR amplicon size on assessments of clone library microbial diversity and
community structure. Environ. Microbiol. 11, 1292–1302.
Huse, S.M., Dethlefsen, L., Huber, J.A., Welch, D.M., Relman, D.A., Sogin, M.L., 2008.
Exploring microbial diversity and taxonomy using SSU rRNA hypervariable tag
sequencing. PLoS Genet. 4, e1000255.
Huyghe, A., Francois, P., Charbonnier, Y., Tangomo-Bento, M., Bonetti, E.J., Paster, B.J.,
Bolivar, I., Baratti-Mayer, D., Pittet, D., Schrenzel, J., 2008. Novel microarray design
strategy to study complex bacterial communities. Appl. Environ. Microbiol. 74,
Katoh, K., Misawa, K., Kuma, K., Miyata, T., 2002. MAFFT: a novel method for rapid
multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30,
Keijser, B.J., Zaura, E., Huse, S.M., van der Vossen, J.M., Schuren, F.H., Montijn, R.C., ten
Cate, J.M., Crielaard, W., 2008. Pyrosequencing analysis of the oral microflora of
healthy adults. J. Dent. Res. 87, 1016–1020.
Kuehbacher, T., Rehman, A., Lepage, P., Hellmig, S., Folsch, U.R., Schreiber, S., Ott, S.J.,
2008. Intestinal TM7 bacterial phylogenies in active inflammatory bowel disease.
J. Med. Microbiol. 57, 1569–1576.
Lipkin, W.I., 2009. Microbe hunting in the 21st century. Proc. Natl. Acad. Sci. USA 106, 6–7.
Liu, Z., Lozupone, C., Hamady, M., Bushman, F.D., Knight, R., 2007. Short pyrosequencing
reads suffice for accurate microbial community analysis. Nucleic Acids Res. 35,
Mager, D.L., Haffajee, A.D., Devlin, P.M., Norris, C.M., Posner, M.R., Goodson, J.M., 2005. The
salivary microbiota as a diagnostic indicator of oral cancer: a descriptive,
non-randomized study of cancer-free and oral squamous cell carcinoma subjects. J. Transl.
Med. 3, 27.
Meyer, F., Paarmann, D., D'Souza, M., Olson, R., Glass, E.M., Kubal, M., Paczian, T.,
Rodriguez, A., Stevens, R., Wilke, A., Wilkening, J., Edwards, R.A., 2008. The
metagenomics RAST server — a public resource for the automatic phylogenetic and
functional analysis of metagenomes. BMC Bioinformatics 9, 386.
Nakamura, S., Maeda, N., Miron, I.M., Yoh, M., Izutsu, K., Kataoka, C., Honda, T.,
Yasunaga, T., Nakaya, T., Kawai, J., Hayashizaki, Y., Horii, T., Iida, T., 2008.
Metagenomic diagnosis of bacterial infections. Emerg. Infect. Dis. 14, 1784–1786.
Nasidze, I., Li, J., Quinque, D., Tang, K., Stoneking, M., 2009a. Global diversity in the
human salivary microbiome. Genome Res. 19, 636–643.
Nasidze, I., Quinque, D., Li, J., Li, M., Tang, K., Stoneking, M., 2009b. Comparative analysis
of human saliva microbiome diversity by barcoded pyrosequencing and cloning
approaches. Anal. Biochem. 391, 64–68.
Pruesse, E., Quast, C., Knittel, K., Fuchs, B.M., Ludwig, W., Peplies, J., Glockner, F.O., 2007.
SILVA: a comprehensive online resource for quality checked and aligned ribosomal
RNA sequence data compatible with ARB. Nucleic Acids Res. 35, 7188–7196.
Qu, W., Hashimoto, S., Morishita, S., 2009. Efficient frequency-based de novo short-read
clustering for error trimming in next-generation sequencing. Genome Res. 19,
Raghavendran, K., Mylotte, J.M., Scannapieco, F.A., 2007. Nursing home-associated
pneumonia, hospital-acquired pneumonia and ventilator-associated pneumonia:
the contribution of dental biofilms and periodontal inflammation. Periodontol.
2000 (44), 164–177.
Schloss, P.D., Handelsman, J., 2005. Introducing DOTUR, a computer program for
defining operational taxonomic units and estimating species richness. Appl.
Environ. Microbiol. 71, 1501–1506.
Sibley, C.D., Rabin, H., Surette, M.G., 2006. Cystic fibrosis: a polymicrobial infectious
disease. Future Microbiol. 1, 53–61.
Fig. 3. Rarefaction analysis of the oral metagenome. The curves include only sequences which occur 3 or more times. The number of OTUs with different cutoff values was plotted as a
function of the number of sequences sampled. OTUs with ≥97%, ≥95% and ≥90% pairwise sequence identity are arbitrarily assumed to form the same species, genus and family, respectively.