Available via license: CC BY 4.0
Laurel D. Crosby and Craig S. Criddle
Stanford University, Stanford, CA, USA
ABSTRACT
Molecular tools based on rRNA (rrn)
genes are valuable techniques for the study
of microbial communities. However, the
presence of operon copy number hetero-
geneity represents a source of systematic er-
ror in community analysis. To understand
the types and magnitude of such bias, four
commonly used rrn-based techniques were
used to perform an in silico analysis of a
hypothetical community composed of organisms
from the Comprehensive Microbial Resource
database. Community profiles were
generated, and diversity indices were calcu-
lated for length heterogeneity PCR, auto-
mated ribosomal intergenic spacer analy-
sis, denaturing gradient gel electrophoresis,
and terminal RFLP (using RsaI, MspI, and
HhaI). The results demonstrate that all
techniques present a quantitative bias to-
ward organisms with higher copy numbers.
In addition, techniques may underestimate
diversity by grouping similar ribotypes or
overestimate diversity by allowing multiple
signals for one organism. The results of this
study suggest a degree of caution should be
used when interpreting rrn-based communi-
ty analysis techniques.
INTRODUCTION
Microbial ecology addresses a vari-
ety of issues that range from the
function of a single population to the
myriad interactions of complex communities.
Researchers in this field have the added
challenge of scope because microbial diversity
and community dynamics must be inferred using
indirect methods. Unfortunately, the
tools for discriminating and measuring
microbial populations are far from
ideal. One of the challenges is that mi-
crobial communities are exceedingly
diverse, with estimates suggesting be-
tween 4000 and 10 000 different micro-
bial genomes per gram of soil or sedi-
ment (1,2). These estimates are based
on DNA-DNA re-annealing curves, yet
traditional isolation techniques have re-
covered only a small fraction of this es-
timated diversity (3–5). Limitations to
culturing include the inability to predict
the proper culture medium to select un-
known organisms, and the propensity
of fast-growing organisms to outgrow
and overshadow the more relevant organisms
that grow more slowly. The development of
molecular approaches to community analysis
has circumvented the need for cultivation because
phylogenetically informative DNA se-
quences can be directly screened from
the environment.
The most widely used techniques
for organism identification and com-
munity analysis include those based on
16S rRNA (rrn) genes because of the
quality of phylogenetic information,
rapid and straightforward procedures,
and large databases of sequence infor-
mation. Despite the advantages of ribo-
somal DNA sequence analysis for stud-
ies of bacterial isolates, limitations ex-
ist for using rRNA genes to analyze
mixed communities (6–9). Problems
arise as a result of organisms that have
variable numbers of copies of the rrn
operon (10,11) and sequence hetero-
geneity between operons (12). While
intracistronic heterogeneity has been
cited as a source of “noise” for deter-
mining the phylogenetic rank of an iso-
late (12), the influence of rrn operon
copy number and sequence heterogene-
ity on community analysis techniques
is much more serious. The problem is
that microbial communities, in almost
all instances, are mixtures of unknown
organisms with unknown numbers of
copies of the rrn operon.
Techniques for the rapid evaluation
of community diversity have several
features in common. First, total genom-
ic DNA is extracted from an environ-
mental sample, and sequences of 16S
genes or intergenic spacer regions are
copied and amplified above the back-
ground of the genome using PCR.
These copies of DNA are then subject-
ed to various discrimination methods,
including electrophoretic separation of
fragments based on length, melting
temperature, or restriction fragment
lengths. The number of different elec-
trophoretic bands or peaks in the analy-
sis serves as a proxy for diversity, as
the different ribosome types (ribotypes)
are considered unique to a group of or-
ganisms. [Note that the term “ribotype”
is used here in terms of a class of RNA,
as opposed to the ribotype theory of the
origin of life (13).] For all techniques,
Research Report
2 BioTechniques Vol. 34, No. 4 (2003)
Understanding Bias in Microbial Community
Analysis Techniques due to rrn Operon Copy
Number Heterogeneity
BioTechniques 34:__-__ (April 2003)
the signal intensity of a particular peak
reflects the number of copies of the
DNA fragment that contribute to that
peak. Unfortunately, the presence of
variable numbers of operons for organ-
isms in diverse communities leads to a
mixture of overlapping signals, multi-
ple signals for single populations, and
distorted estimates of abundance be-
tween organisms. The result is a com-
plicated portrait of community diversi-
ty that is difficult to interpret. To
acknowledge the potential biases in
these techniques, authors prudently ad-
vise readers to “interpret these data
with caution.”
For researchers to develop sound
experimental designs, accurate hy-
potheses, and meaningful conclusions
regarding community structure and
function, sources of systematic error in
community analysis techniques must
be identified and quantified. Biases re-
lated to genomic DNA extraction and
PCR amplification are well document-
ed (14–21). The goal of this paper is to
illustrate how rrn operon copy number
heterogeneity influences the interpreta-
tion of four commonly used 16S
rDNA-based community analysis tech-
niques. A hypothetical community was
constructed using gene sequences re-
trieved from the Comprehensive Mi-
crobial Resource (CMR) database,
which is a collection of completely sequenced
and annotated genomes compiled by The Institute
for Genomic Research (TIGR) (Rockville, MD, USA).
DNA sequences encoding the 16S
rRNA gene and adjacent spacer regions
were used for an in silico comparison
of four major community analysis tech-
niques: length heterogeneity PCR (LH-
PCR) (22), automated ribosomal inter-
genic spacer analysis (ARISA) (8), de-
naturing gradient gel electrophoresis
(DGGE) (23), and terminal RFLP
analysis (T-RFLP; with restriction en-
zymes RsaI, MspI, and HhaI) (9). Se-
quences were analyzed according to the
priming sequences and discrimination
methods for each technique. Diversity
indices were calculated based on the
observed distribution of fragment sizes
(or melting temperatures for DGGE)
and then compared with the ideal or
“true” diversity indices for the hypo-
thetical community. The results demon-
strate that rrn operon copy number het-
erogeneity strongly influences the
interpretation of 16S rDNA-based
community analysis techniques.
MATERIALS AND METHODS
The CMR database, maintained by
TIGR (www.tigr.org), was the source of
the genome sequences analyzed in this
report. The CMR database was selected
over GenBank® because it contains all
of the rrn operons for each organism.
The organisms evaluated had rrn oper-
on copy numbers ranging from 1 to 10,
with 47% of organisms having either
one or two operons and 71% having up
to four (Figure 1). This distribution of
operon frequency in the CMR database
approximates the operon frequency for
organisms in the Ribosomal RNA
Operon Database (rrndb) Release 2.3
reported by Klappenbach (24). The
CMR database provided coordinates for
the sequence location of the rrn genes,
which allowed for
intergenic spacer
regions to be re-
trieved intact with
the 16S rRNA
genes. Sequences
were retrieved us-
ing the segment
retrieval function
on the CMR Web
site, where coordi-
nates for the 5′ end
of the 16S gene
and 5000 bases
downstream were
used to retrieve
the segment. For
operons with the typical configuration
of rRNA genes (16S, 23S, and 5S), this
reflected the entire 16S rRNA gene, the
intergenic spacer region between the
16S and 23S rRNA genes, and a portion
of the 23S rRNA gene. The analysis in-
cluded all microbial species for which
operons were reported but was limited
to those organisms in which PCR
primers matched potential targets by at
least 65% for at least two techniques.
With this constraint, members of the
domain Archaea were excluded from
the analysis. In addition, the hypotheti-
cal community was limited to a single
strain of a given species in the event
that multiple strains were reported. This
reduced the magnitude of the rrn oper-
on copy number bias that could be at-
tributed to the characteristics of any
particular species.
The sequences for forward and re-
verse primers, as previously published
for each method, were used to search
for corresponding sites within the DNA
sequence. When these sites were found,
subsequences were extracted that rep-
resented the fragment between the 5′
end of the forward primer and the 3′
end of the reverse primer. The lengths
of these fragments were recorded for
the LH-PCR and ARISA techniques,
while fragments for DGGE and T-
RFLP underwent further manipulation.
Sequence fragments for DGGE were
exported to the Winmelt software package
(MedProbe AS, Oslo, Norway) to estimate the
melting temperature (Tm) of the lowest melting
domain. For T-RFLP, sequences for a
given restriction enzyme were used to
identify the fragment length between
the 5′ end of the forward primer and the
first instance of the restriction enzyme
cutting site. For all techniques, the frag-
ment lengths (or melting temperatures,
as in the case of DGGE) were plotted as
histograms to simulate the electro-
pherogram type of output that is com-
mon to the automated techniques.
DGGE differs in this regard but was
similarly plotted to facilitate compar-
isons between the techniques.
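The fragment retrieval described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used in the study: the function names (`revcomp`, `best_site`, `amplicon`), the toy template, and the toy primers are all hypothetical, matching is ungapped, and the 65% similarity cutoff is applied as a simple fraction of identical positions.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_site(template: str, primer: str, start: int = 0):
    """Return (position, similarity) of the best ungapped match at or after start."""
    best_pos, best_sim = -1, 0.0
    for pos in range(start, len(template) - len(primer) + 1):
        window = template[pos:pos + len(primer)]
        sim = sum(a == b for a, b in zip(window, primer)) / len(primer)
        if sim > best_sim:  # ties keep the first (most upstream) site
            best_pos, best_sim = pos, sim
    return best_pos, best_sim


def amplicon(template: str, fwd: str, rev: str, min_sim: float = 0.65):
    """Fragment from the 5' end of the forward primer through the reverse
    primer site, or None if either primer falls below the similarity cutoff."""
    f_pos, f_sim = best_site(template, fwd)
    if f_sim < min_sim:
        return None
    rev_site = revcomp(rev)  # the reverse primer anneals to the opposite strand
    r_pos, r_sim = best_site(template, rev_site, f_pos + len(fwd))
    if r_sim < min_sim:
        return None
    return template[f_pos:r_pos + len(rev_site)]


# Toy data (hypothetical, not real 16S sequence or published primers).
template = "TTT" + "ACGTACGT" + "GGGGGCCCCC" + "TTGACCAA" + "AAA"
fwd_primer = "ACGTACGT"
rev_primer = "TTGGTCAA"  # reverse complement of the TTGACCAA site

product = amplicon(template, fwd_primer, rev_primer)
print(len(product))
```

For LH-PCR and ARISA the recorded value would simply be `len(product)`; running each operon of a genome through such a function gives one fragment length per operon copy.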
To reduce bias unrelated to copy number, all members of the
hypothetical community were assumed to be equally abundant, with
an equal ratio of genomes. This eliminates the complication of
variable cell densities or growth rates and allows for meaningful
comparison of diversity indices. A final assumption was that there
was no PCR amplification bias as a result of primer annealing
efficiency. The 65% similarity cutoff between the primer and
potential targets represents a PCR amplification reaction with low
stringency. In reality, the type of systematic error attributed to
primer bias is a serious complication of PCR-based community
analysis techniques (17,21) and only exacerbates the errors
contributed by rrn operon copy number heterogeneity.
Figure 1. Distribution of operon copy number frequency between the
CMR database and rrndb (Release 2.3). At this time, the CMR comprises
45 different species entries, while the rrndb contains 287 entries.
Diversity indices were calculated
for each technique, based on the ob-
served fragment distribution, where
each unique fragment type represented
a particular ribotype. The Shannon-
Weaver index was used as a diversity
index and was calculated as follows:
$H = -\sum_i p_i \ln p_i$ [Eq. 1]
where the summation is over all unique
fragments i, and $p_i$ is the proportion of
an individual “peak height” (i.e., num-
ber of same-sized fragments) relative to
the sum of all peak heights (i.e., total
number of fragments). Richness is the
number of unique fragment sizes (or
melting temperatures) identified by each
technique. No minimum signal intensity
threshold was used to determine peak
richness, although cutoffs are common-
ly applied to the interpretation of real
community analysis data. Evenness, or
the equitability of the observed ribo-
types, was calculated from the Shannon-
Weaver diversity function, where:
E = H/ln (richness). [Eq. 2]
At the same time, true values for
richness, evenness, and the Shannon-
Weaver index were calculated based on
the known species composition of the
hypothetical community, assuming that
each species would contribute only one
ribotype for each technique. For this
hypothetical community, 41 ribotypes
were present at equal abundance. The
observed values for the diversity in-
dices were then compared with the true
values to gain insight into the type and
magnitude of bias as a result of operon
copy number.
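The index calculations described above can be illustrated with a short Python sketch. The function name `diversity` and the dictionary layout (peak label mapped to peak height) are assumptions made for illustration. The natural logarithm is used throughout, which is consistent with Eq. 2 and with the ideal index value reported for 41 equally abundant ribotypes.

```python
import math


def diversity(peaks):
    """Return (Shannon-Weaver index H, richness, evenness E) for a profile.

    peaks maps a fragment size (or melting temperature) to its peak height,
    i.e., the number of fragment copies that share that value.
    """
    total = sum(peaks.values())
    props = [n / total for n in peaks.values() if n > 0]
    h = -sum(p * math.log(p) for p in props)   # Eq. 1
    richness = len(props)
    # Eq. 2; evenness is undefined for a single peak, so 0.0 is returned here
    evenness = h / math.log(richness) if richness > 1 else 0.0
    return h, richness, evenness


# Ideal case: 41 species, each contributing one ribotype at equal abundance.
ideal = {f"species_{k}": 1 for k in range(41)}
h_ideal, s_ideal, e_ideal = diversity(ideal)
print(round(h_ideal, 3), s_ideal, round(e_ideal, 3))
```

With equal abundances the index reduces to ln(41), so the observed profiles can be compared against this ceiling.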
RESULTS
Table 1 presents the amplification
product lengths, melting temperatures,
and restriction fragment lengths for
each organism. The table is organized
such that the discriminating power of
the different techniques can be evaluat-
ed for any given organism, while gen-
eralizations about each technique can
be made by perusing the columns. Entries
highlighted in bold represent fragment values
shared by two or more organisms, while the
number of fragments of each length is presented
in parentheses. The histograms in Figure
2 represent the distribution of ribotypes
for each particular technique. The
horizontal scale of each frequency distribution
is specific to the technique, while
the gridlines for the vertical axis were
set at intervals of 10 units. This allows for a visual
comparison of the scales between the
different techniques.
LH-PCR gave hypothetical amplifi-
cation products with lengths ranging
314–371 bp, with an average product
length of 346 bp and standard deviation
of 13 bp. The technique produced a to-
tal of 26 unique product lengths for the
hypothetical community, where 14
(54%) represented unique peaks, and
12 (46%) peaks contained fragments
from two or more organisms. Of those
peaks with multiple contributors, the
number of organisms within the peak
ranged from two to eight. Fragments of
348 bp were highest in frequency, with
eight organisms contributing 40 copies
of the LH-PCR fragment to this peak.
In addition, there were 11 organisms
that contributed 52 copies of the frag-
ment within a size range of 5 bp
(352–356 bp). The incidence of in-
teroperon heterogeneity (heterogeneity
within the same organism) was relative-
ly low, with only five organisms with
more than one fragment length. Of the
organisms with heterogeneous copies
of length heterogeneity fragments, four
had fragments that differed by only one
base pair. Bacillus subtilis was the ex-
ception, with four unique fragment
lengths of 352, 353, 354, and 355 bp.
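The pooling of signals into the 348-bp peak can be reproduced from the LH-PCR column of Table 1. The sketch below is illustrative only; it uses the eight organisms that contribute to that peak, and the variable names are hypothetical.

```python
from collections import defaultdict

# LH-PCR fragment lengths (bp) and copy counts, taken from Table 1,
# for the eight organisms that contribute to the 348-bp peak.
lh_pcr = {
    "C. perfringens": {347: 9, 348: 1},
    "E. coli": {348: 7},
    "H. influenzae": {348: 6},
    "N. meningitidis": {348: 4},
    "S. enterica": {348: 6, 349: 1},
    "V. cholerae": {348: 8},
    "X. fastidiosa": {348: 2},
    "Y. pestis": {348: 6},
}

copies = defaultdict(int)        # peak height per fragment length
contributors = defaultdict(set)  # organisms hidden behind each peak

for organism, fragments in lh_pcr.items():
    for length, n in fragments.items():
        copies[length] += n
        contributors[length].add(organism)

for length in sorted(copies):
    print(length, copies[length], len(contributors[length]))
```

Eight equally abundant organisms collapse into three peaks here, and the height of the 348-bp peak reflects summed operon copies rather than the number of organisms.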
For ARISA, the hypothetical amplification
products ranged from 308 to 1576 bp, with an
average of 751 bp (SD 239). The
product lengths were typically unique,
with six instances in which a maximum
of two to three organisms contributed to
a peak. Three organisms did not con-
tribute an amplification product because
of the orientation of the 16S and 23S
rRNA genes within the operon or be-
cause they lacked a sufficiently similar
priming site (less than the 65% se-
quence similarity criterion). Those or-
ganisms that did not contribute a prod-
uct tended to have a low operon copy
number, falling in the range from one to
two copies. Of the organisms with mul-
tiple operons, only Deinococcus radio-
durans gave an instance of a missed
product for one of its operons. Despite
the presence of organisms with no
hypothetical product, ARISA yielded
more peaks than the true richness of the
community: the hypothetical community
of 41 organisms
gave a total of 68 peaks. For organisms
with multiple rrn operons, the number
of unique amplification products was
almost equal to the number of operons
(Table 1). For example, Staphylococcus
aureus gave five unique product lengths
for six operons, and B. halodurans gave
a distinctly different product length
for each of its six operons. These length
differences exceeded the 1–2 bp that
would be indicative of a minor insertion
or deletion event. In many instances, the
length differences were tens or hun-
dreds of bases apart, likely correspond-
ing to the presence or absence of vari-
ous tRNA sequences in the intergenic
spacer region (25). This result demon-
strates a combination of two systematic
errors for the ARISA technique: (i) the
underestimation of community diversi-
ty through missing or overlapping se-
quences and (ii) the overestimation of
diversity due to heterogeneous amplifi-
cation product lengths for a single or-
ganism.
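The overestimation error can be made concrete by counting, for each organism, how many distinct ARISA product lengths its operons generate. The sketch below uses three rows of the ARISA column of Table 1; the variable names are illustrative assumptions.

```python
# ARISA product lengths (bp) and copy counts for three organisms from Table 1.
arisa = {
    "S. aureus": {586: 1, 620: 1, 648: 1, 757: 1, 830: 2},
    "B. halodurans": {965: 1, 1010: 1, 1091: 1, 1135: 1, 1254: 1, 1281: 1},
    "P. aeruginosa": {753: 4},
}

summary = {}
for organism, products in arisa.items():
    operons = sum(products.values())  # total rrn operons amplified
    peaks = len(products)             # distinct product lengths (signals)
    summary[organism] = (operons, peaks)
    print(f"{organism}: {operons} operons -> {peaks} peak(s)")

apparent_richness = sum(peaks for _, peaks in summary.values())
print("true richness:", len(arisa), "apparent richness:", apparent_richness)
```

Three organisms would be scored as twelve ribotypes in this subset, which is the same inflation seen for the full community.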
The profile for DGGE showed a
range of melting temperatures from
70.4°C to 79.4°C, with an average tem-
perature of 74.0°C and standard devia-
tion of 1.7°C. The analysis gave a total
of 32 different melting temperatures,
with nine temperatures representing
amplification products from multiple
organisms. Of the peaks with multiple
contributors, seven contained only two
members and the other two contained
four members. The incidence of melt-
ing temperature heterogeneity for a sin-
gle organism was low, with only five
organisms that gave multiple signals.
For those with multiple temperatures,
the differences were frequently limited
to a tenth of a degree.
Table 1. Hypothetical Fragments Retrieved for LH-PCR, ARISA, DGGE, and T-RFLP
Organism | LH-PCR (bp) | ARISA (bp) | DGGE (°C) | T-RFLP RsaI (bp) | T-RFLP HhaI (bp)
--- | --- | --- | --- | --- | ---
Agrobacterium tumefaciens | 314 (4) | 1494 (1), 1575 (1), 1576 (2) | 75.2 (4) | 824 (4) | 339 (4)
Aquifex aeolicus | 371 (2) | 607 (2) | 79.4 (2) | 503 (2) | 22 (2)
B. halodurans | 354 (6) | 965 (1), 1010 (1), 1091 (1), 1135 (1), 1254 (1), 1281 (1) | 75.5 (1), 75.6 (5) | 485 (5), 656 (1) | 240 (6)
B. subtilis | 352 (1), 353 (1), 354 (7), 355 (1) | 448 (1), 449 (4), 452 (3), 629 (2) | 74.8 (7), 75.2 (3) | 454 (1), 455 (1), 456 (5), 457 (1), 475 (2) | 238 (1), 240 (8), 241 (1)
Borrelia burgdorferi | N | N | 67.6 (1) | 28 (1) | 437 (1)
Brucella melitensis | 314 (3) | 1048 (3) | 75.8 (3) | 106 (3) | 61 (3)
Campylobacter jejuni | 346 (3) | 1074 (3) | 74.2 (3) | 453 (3) | 98 (3)
Caulobacter crescentus | 316 (2) | 969 (2) | 74.6 (2) | 422 (2) | 332 (2)
Chlamydia pneumoniae | 356 (1) | 513 (1) | 71.6 (1) | 106 (1) | 734 (1)
C. trachomatis | 357 (2) | 531 (2) | 74.5 (2) | 488 (2) | 735 (2)
Chlorobium tepidum | 342 (2) | 737 (2) | 76.2 (2) | 465 (2) | 91 (2)
C. perfringens | 347 (9), 348 (1) | 466 (1), 468 (4), 469 (1), 700 (2), 702 (2) | 75.2 (9), 75.3 (1) | 453 (9), 454 (1) | 233 (10)
D. radiodurans | 329 (3) | N (1), 308 (1), 1022 (1) | 75.7 (3) | 448 (3) | 82 (3)
Enterococcus faecalis | 366 (4) | 508 (2), 609 (1), 610 (1) | 73.8 (4) | 903 (4) | 218 (4)
E. coli | 348 (7) | 636 (1), 637 (1), 713 (1), 719 (2), 722 (1), 728 (1) | 74.6 (7) | 427 (7) | 373 (7)
Haemophilus influenzae | 348 (6) | 758 (3), 1003 (3) | 71.0 (6) | 463 (6) | 364 (6)
Helicobacter pylori | 333 (1), 334 (1) | N | 71.3 (2) | 846 (2) | 99 (2)
Listeria innocua | 356 (6) | 529 (4), 779 (2) | 72.6 (6) | 435 (6) | 186 (6)
Listeria monocytogenes | 356 (6) | 528 (4), 779 (2) | 72.6 (6) | 435 (6) | 186 (6)
Mesorhizobium loti | 314 (2) | 1197 (2) | 75.2 (2) | 682 (2) | 61 (2)
Mycobacterium leprae | 356 (1) | 569 (1) | 76.7 (1) | 307 (1) | 193 (1)
Mycobacterium tuberculosis | 344 (1) | 559 (1) | 77.6 (1) | 638 (1) | 201 (1)
Mycoplasma genitalium | 343 (1) | 482 (1) | 70.4 (1) | 475 (1) | 226 (1)
Mycoplasma pulmonis | 346 (1) | 569 (1) | 72.5 (1) | 477 (1) | 841 (1)
Neisseria meningitidis | 348 (4) | 946 (4) | 74.4 (4) | 126 (4) | 213 (4)
Nostoc sp. | 315 (4) | 569 (1), 796 (3) | 72.2 (4) | 424 (4) | 228 (4)
Porphyromonas gingivalis | 353 (4) | 1037 (4) | 72.0 (4) | 318 (4) | 102 (4)
Pseudomonas aeruginosa | 342 (4) | 753 (4) | 72.9 (4) | 644 (4) | 155 (4)
Ralstonia solanacearum | 346 (4) | 784 (4) | 74.7 (4) | 477 (4) | 572 (4)
Rickettsia conorii | 330 (1) | N | 73.0 (1) | 132 (1) | 1060 (1)
Salmonella enterica | 348 (6), 349 (1) | 636 (4), 797 (3) | 75.7 (7) | 427 (6), 428 (1) | 373 (6), 374 (1)
S. aureus | 355 (6) | 586 (1), 620 (1), 648 (1), 757 (1), 830 (2) | 72.2 (6) | 486 (6) | 238 (6)
For the T-RFLP analysis, three enzymes
were used to generate terminal
restriction fragments: RsaI, MspI, and
HhaI. Each restriction enzyme was
used to generate an independent T-
RFLP profile of the hypothetical com-
munity. The range of fragment lengths
for RsaI was 28–903 bp; for MspI, the
range was 24–566 bp; and HhaI frag-
ments ranged 22–1113 bp. The average
fragment lengths for RsaI, MspI, and
HhaI were 495 (SD 188), 300 (SD 184),
and 313 (SD 210) bp, respectively. For
brevity, only RsaI and HhaI fragments
were included in Table 1. T-RFLP
analysis with all three enzymes showed
examples of interoperon heterogeneity
within a single organism and overlap-
ping fragment lengths from multiple
organisms. Most examples of in-
teroperon heterogeneity occurred as a
result of fragment lengths that differed
by a single base pair. B. halodurans
was the only example of an organism
that possessed operons that had dis-
tinctly different restriction sites, which
resulted in two widely divergent frag-
ment lengths. For the RsaI enzyme, one
of the six operons of B. halodurans dif-
fered by 171 bases; whereas for MspI,
two operons out of six differed by 389
bases. This is an interesting result be-
cause this pattern emerged from two
different C→T transitions, disrupting
cut sites for two different restriction en-
zymes. Instances of overlapping signals
were also observed for the T-RFLP
analysis, with RsaI giving seven peaks
with multiple contributors, three peaks
for MspI, and five for HhaI.
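The effect of a disrupted cut site on the terminal fragment length can be sketched as follows. The recognition sequences and cut offsets are the standard ones for these enzymes (RsaI GT^AC, MspI C^CGG, HhaI GCG^C). The function name, the toy sequences, and the choice to report an uncut amplicon at its full length are assumptions made for illustration.

```python
# Recognition site and the offset of the cut within the site (top strand).
ENZYMES = {
    "RsaI": ("GTAC", 2),  # GT^AC
    "MspI": ("CCGG", 1),  # C^CGG
    "HhaI": ("GCGC", 3),  # GCG^C
}


def terminal_fragment(amplicon: str, enzyme: str) -> int:
    """Length from the 5' end of the forward primer to the first cut.

    Assumption: an amplicon without a site is reported at its full length.
    """
    site, cut_offset = ENZYMES[enzyme]
    pos = amplicon.find(site)
    return len(amplicon) if pos < 0 else pos + cut_offset


# Toy operons (hypothetical sequences). operon_b carries a C->T transition
# that destroys the first RsaI site, so the next site downstream is used.
operon_a = "A" * 10 + "GTAC" + "T" * 20 + "GTAC" + "A" * 5
operon_b = "A" * 10 + "GTAT" + "T" * 20 + "GTAC" + "A" * 5

frag_a = terminal_fragment(operon_a, "RsaI")
frag_b = terminal_fragment(operon_b, "RsaI")
print(frag_a, frag_b)
```

A single base change therefore moves the signal to a distant peak, which parallels the widely divergent RsaI fragments reported for B. halodurans.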
Richness was estimated as the num-
ber of different ribotypes presented by
each technique. The true value for
species richness for the hypothetical
community was 41, while the observed
richness values for the four techniques
ranged from 26 (LH-PCR) to 68
(ARISA) (Table 2). DGGE gave a rich-
ness estimate of 32 members, while T-
RFLP gave values of 42, 40, and 38 for
RsaI, MspI, and HhaI, respectively. In
this hypothetical community, the true
measure for evenness was equal to 1.0,
as all species were equally abundant.
ARISA ranked highest with a value of
0.951, followed by T-RFLP with RsaI
(0.904), DGGE (0.900), T-RFLP with
MspI (0.885) and HhaI (0.881), and
finally, LH-PCR (0.814). The
Shannon-Weaver diversity index was
used to account for
the abundance and evenness of ribotypes
generated by each technique. A
comparison of values for this index
showed that most techniques underestimated
diversity, with the exception of
ARISA (due to biases noted earlier).
All other values fell below the true value
of 3.714. The diversity indices for
each of the techniques, in rank order,
are as follows: ARISA (4.012), T-RFLP
RsaI (3.378), T-RFLP MspI (3.263),
T-RFLP HhaI (3.206), DGGE
(3.120), and LH-PCR (2.653).
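As a consistency check, the evenness values can be recomputed from the reported Shannon-Weaver indices and richness values using Eq. 2. The sketch below takes its numbers from Table 2; the variable names are illustrative, and a tolerance of 0.001 is assumed because the published indices are rounded to three decimals.

```python
import math

# (Shannon-Weaver index, richness, reported evenness) from Table 2.
table_2 = {
    "LH-PCR": (2.653, 26, 0.814),
    "ARISA": (4.012, 68, 0.951),
    "DGGE": (3.120, 32, 0.900),
    "T-RFLP (RsaI)": (3.378, 42, 0.904),
    "T-RFLP (MspI)": (3.263, 40, 0.885),
    "T-RFLP (HhaI)": (3.206, 38, 0.881),
    "Ideal (species level)": (3.714, 41, 1.000),
}

recomputed = {}
for method, (h, richness, reported) in table_2.items():
    recomputed[method] = h / math.log(richness)  # Eq. 2, natural logarithm
    print(f"{method:22s} E = {recomputed[method]:.3f} (reported {reported:.3f})")

# Rank the techniques by diversity index, highest first.
ranked = sorted(table_2, key=lambda m: table_2[m][0], reverse=True)
print(ranked)
```

The recomputed values agree with the reported ones within rounding, and the sort makes the rank order explicit: only ARISA lies above the ideal value.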
DISCUSSION
Clearly, the rrn operon copy number
has an effect on community analysis
techniques based on 16S rRNA genes.
Some of these techniques tend to com-
bine signals into a single peak, while
others tend to generate multiple signals
for a single organism. However, for all
of these techniques, the fact that an or-
ganism has multiple copies of an oper-
on leads to a quantitative bias for that
organism. The magnitude of this bias
depends on several factors, including
the range of fragment sizes generated
by the primer sets, the region of the rrn
operon amplified, and the discriminat-
ing power of capillary and gel elec-
trophoresis.
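The quantitative bias can be illustrated with four organisms from Table 1 that have 1, 4, 7, and 10 rrn operons. In this sketch the organisms are assumed to be equally abundant, as in the hypothetical community, and every operon copy is assumed to amplify with equal efficiency, so each organism's share of the total signal is proportional to its copy number. The variable names are illustrative.

```python
# rrn operon copy numbers, as given by the fragment counts in Table 1.
copy_number = {
    "M. tuberculosis": 1,
    "P. aeruginosa": 4,
    "E. coli": 7,
    "C. perfringens": 10,
}

total_copies = sum(copy_number.values())
true_share = 1 / len(copy_number)  # equal genome abundance
apparent_share = {org: n / total_copies for org, n in copy_number.items()}

for org, share in apparent_share.items():
    print(f"{org:16s} true {true_share:.2f}  apparent {share:.2f}")
```

Although each organism makes up one quarter of the cells, the signal attributed to the 10-copy organism is ten times that of the single-copy organism.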
LH-PCR suffers most from overlap-
ping signals because it requires the dis-
crimination of small base pair differ-
ences. As an example, the original
reference for LH-PCR illustrates how
amplified products from soils form a
contiguous distribution (22). This tech-
nique has been previously cited as a
tool for quick assessments of changes,
but the meaning of any such change is
difficult to assess. For example, the loss
of a high copy number organism would
result in a more pronounced response
than the loss of a low copy number or-
ganism. Thus, more attention would be
drawn to drastic changes in large peaks,
rather than subtle changes that may be
equally or more relevant.
Table 1 (continued). Hypothetical Fragments Retrieved for LH-PCR, ARISA, DGGE, and T-RFLP
Organism | LH-PCR (bp) | ARISA (bp) | DGGE (°C) | T-RFLP RsaI (bp) | T-RFLP HhaI (bp)
--- | --- | --- | --- | --- | ---
S. pneumoniae | 352 (4) | 529 (4) | 74.2 (4) | 889 (4) | 579 (4)
S. pyogenes | 354 (6) | 704 (6) | 73.6 (1), 73.7 (5) | 629 (5), 630 (1) | 581 (5), 582 (1)
Synechocystis sp. | 317 (2) | 746 (2) | 73.5 (2) | 425 (2) | 1048 (2)
Thermotoga maritima | 351 (1) | 525 (1) | 77.1 (1) | 86 (1) | 1113 (1)
Treponema pallidum | 352 (1), 353 (1) | 578 (1), 588 (1) | 75.6 (2) | 639 (1), 640 (1) | 850 (1), 851 (1)
Ureaplasma urealyticum | 345 (2) | 573 (2) | NA | 283 (2) | 370 (2)
Vibrio cholerae | 348 (8) | 707 (1), 713 (2), 792 (2), 793 (1), 968 (1), 994 (1) | 71.6 (5), 71.7 (1), 72.2 (2) | 427 (8) | 213 (8)
Xylella fastidiosa | 348 (2) | 746 (2) | 72.2 (2) | 479 (2) | 373 (2)
Yersinia pestis | 348 (6) | 746 (3), 806 (3) | 75.5 (6) | 884 (6) | 373 (6)
Values represent amplified PCR fragment lengths for LH-PCR and ARISA, melting temperatures of the lowest melting
domains for DGGE fragments, and restriction fragment lengths for T-RFLP (RsaI and HhaI only, MspI not included).
The number of copies of each fragment is noted in parentheses, and the values in bold represent fragments that
overlap in size with one or more organisms within the technique.
ARISA has also been cited as a
valuable tool due to its simplicity and
rapidity. Use of the intergenic spacer
region has the advantage that ISR frag-
ment banding patterns confer a finer
degree of phylogenetic resolution for
microbial isolates compared to frag-
ment analysis of 16S rRNA genes. Un-
fortunately, heterogeneity in the lengths
of intergenic spacer regions is a serious
complication for studies of mixed com-
munities. The magnitude of the prob-
lem is strongly influenced by the num-
ber of organisms with high copy
number, as opposed to the number of
organisms with fewer copies. Of the 22
organisms that gave a singular response,
the average copy number was 2.45 (SD 1.37),
compared to the copy number average of the
whole population, which was 3.67 (SD 2.49).
For comparison, the 16 organisms that gave
multiple signals had an average of 5.88
copies of the operon (SD 2.25). Another
observation is that the organisms
that did not give a signal (because
of sequence variation relative to the
“universal” PCR priming sites) also
tended to have a low copy number.
Again, this suggests that organisms of
potential relevance to the function of
the community may be overlooked.
Techniques such as DGGE and T-
RFLP also demonstrate copy number
bias, but to a lesser extent than LH-
PCR or ARISA. For DGGE, hetero-
geneity between operons was typically
limited to a single base change between
the DGGE fragments, which corre-
sponded to a temperature difference of
0.1°C. The technique also displayed
examples of overlapping sequences as
a result of two or more organisms shar-
ing the same ribotype. One point to
consider for the hypothetical DGGE
analysis is that melting temperatures
were estimates rather than discrete values.
Thus, in this hypothetical scenario,
the bias due to overlapping fragments
may be greater or less than the bias ob-
served experimentally. Another consid-
eration for the hypothetical analysis of
the DGGE technique is that DNA frag-
ments are normally separated and visu-
alized by gel electrophoresis, rather
than automated capillary electrophore-
sis. Thus, the degree of resolution of a
gel may differ from the electrophero-
gram type of output that is common to
the other techniques.
Table 2. Diversity Indices for the Hypothetical Community
Method | Shannon-Weaver Index | Richness | Evenness
--- | --- | --- | ---
LH-PCR | 2.653 | 26 | 0.814
ARISA | 4.012 | 68 | 0.951
DGGE | 3.120 | 32 | 0.900
T-RFLP (RsaI) | 3.378 | 42 | 0.904
T-RFLP (MspI) | 3.263 | 40 | 0.885
T-RFLP (HhaI) | 3.206 | 38 | 0.881
Ideal (species level) | 3.714 | 41 | 1.000
Calculations for the ideal values were based on a community with all populations at equal abundance.
Figure 2. Frequency distribution of fragments generated by LH-PCR,
ARISA, DGGE, and T-RFLP (RsaI). Scales for the x-axis are relative to
each technique, while the gridlines for the y-axis were set to units of 10.
Figure 3. Plot of Shannon-Weaver index versus richness for communities
of equally abundant populations. The relative positions of LH-PCR,
ARISA, DGGE, and T-RFLP indicate how far the techniques deviate from
the true value for these indices.
T-RFLP analysis with each of the
three restriction enzymes showed
heterogeneity between operons, but this was
typically limited to 1–2 bp. This
corresponds to minor insertion or deletion
events (indels) occurring between the
operons. The disruption of a restriction
enzyme cutting site was also observed
in the hypothetical community, al-
though the incidence of this type of
mutation was much less frequent. Re-
garding the discriminating power of T-
RFLP, the wider range of possible frag-
ment lengths led to a finer level of
separation than for LH-PCR or DGGE.
However, the T-RFLP profile also in-
cluded several small peaks that abutted
larger peaks, which, in practice, may
not be finely differentiated. Another
consideration for T-RFLP profiles and
the other techniques is that interpreta-
tion often involves a cutoff for peaks
that fall below a given intensity thresh-
old. For this hypothetical analysis, no
threshold was set, although several
peaks were present at very low fre-
quency relative to the larger peaks. In-
cluding such a cutoff would have had
the effect of omitting signals from
some of the heterogeneous operons of a
single organism and from low copy
number organisms. This would have in-
fluenced the calculation of the diversi-
ty indices and resulted in a lower esti-
mates of diversity. According to the
Shannon-Weaver diversity index calcu-
lations, T-RFLP comes close to approx-
imating the actual diversity of the hy-
pothetical community. However, it
should be emphasized that, in this case
and for all techniques described here,
the bias of overlapping fragments di-
rectly offsets the bias of multiple sig-
nals by a single organism. Thus, two
compensating errors do not necessarily
yield a correct answer.
A potential limitation of this study is
the fact that the CMR database current-
ly emphasizes medically relevant or-
ganisms. These organisms may be sys-
tematically different in their copy
number compared to environmental
isolates, although comparison with the
less medically oriented rrndb suggests
that the copy number distributions are
quite similar. The current size of the
CMR database and the restrictions for
inclusion in the hypothetical communi-
ty also limit the scope of this analysis.
However, the behavior of the Shannon-
Weaver diversity index with respect to
richness (for communities of uniform
abundance and therefore an evenness
value of 1.0) lends some insight into
the value of this small subset of organ-
isms (Figure 3). At low diversity, the
addition of a single population has a
large impact on the diversity index val-
ue because the proportion of the new
population is relatively large compared
to the total number of populations. As
the community grows, the impact of
each additional population becomes
smaller and smaller. Given the observation
that the diversity index changes
most abruptly for communities of fewer
than 20 organisms, it would appear that
a hypothetical community of 41 organ-
isms is sufficient to make meaningful
conclusions regarding the analysis
techniques described here. (Note that
the various diversity indices used in
this analysis may not be appropriate for
“real” 16S rDNA-based fragment
analyses due to biases inherent in PCR
amplification, fragment discrimination,
and operon copy number issues.) Addi-
tions to the database and updated hypo-
thetical analyses will determine
whether these trends remain consistent.
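The diminishing effect of each additional population follows directly from Eq. 1: for a community of S equally abundant populations the index equals ln(S), so adding one population raises it by ln(S + 1) - ln(S). The sketch below evaluates this increment at a few community sizes; the function names are illustrative assumptions.

```python
import math


def max_shannon(richness: int) -> float:
    """Shannon-Weaver index for `richness` equally abundant populations."""
    return math.log(richness)


def marginal_gain(richness: int) -> float:
    """Increase in the index when one more equally abundant population is added."""
    return math.log(richness + 1) - math.log(richness)


for s in (1, 5, 20, 41):
    print(f"S = {s:2d}  H = {max_shannon(s):.3f}  gain from one more = {marginal_gain(s):.3f}")
```

The increment at S = 41 is roughly half the increment at S = 20 and far smaller than at the low end of the curve.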
It is instructive to observe the his-
tograms generated by each technique
for the hypothetical community. All or-
ganisms were equally represented on
the basis of population densities, but
the histograms show a wide variation in
the ribotype abundance and diversity.
This illustrates how peak amplitude can
be deceptive in community analysis.
Changes in the height of a particular
peak can be caused by the growth or
loss of a single population, while subtle
changes in smaller peaks are over-
looked or discounted. Understanding
the rrn distribution for organisms in a
particular environment would improve
the application and interpretation of
molecular analyses. Recent studies
(11,26) suggest that variation in the
copy number of the rRNA genes is re-
lated to the ecological strategy of an or-
ganism. That is, organisms with multi-
ple copies of the rrn operon are able to
mobilize quickly in response to rich
growth conditions. Organisms with
fewer rrn operons are more limited in
their rate of ribosome synthesis and
respond less quickly to an influx of nu-
trients into an environment. How does
the dynamic nutrient profile of an envi-
ronment shape the composition and, in
turn, function of the microbial commu-
nity? It may be that organisms of low
rrn operon copy number comprise a
significant portion of the microbial di-
versity, while high copy number organ-
isms flourish during nutrient perturba-
tions. These “fast responders” with
high copy number are the same kinds
of organisms that promptly appear on
culture plates under traditional cultur-
ing methods, overshadowing the organ-
isms that grow more slowly. Note that
molecular-based tools were developed
in part to avoid the bias of culture-
based techniques, yet the results of
this study suggest that 16S rDNA-
based molecular techniques may
overemphasize the same organisms.
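The way equal population densities can still produce unequal peaks may be illustrated with a small calculation. This is a sketch under strong simplifying assumptions: every rrn operon copy contributes equally to the signal, each organism yields a single distinct signal, and no other PCR bias is present. The organism names and copy numbers are hypothetical and are not taken from the CMR database.

```python
import math


def shannon_index(proportions):
    """Shannon-Weaver index H = -sum(p * ln p); zero proportions are skipped."""
    return -sum(p * math.log(p) for p in proportions if p > 0)


def apparent_proportions(cell_counts, copy_numbers):
    """Share of total signal per organism when signal = cells x rrn copies."""
    signal = {name: cell_counts[name] * copy_numbers[name] for name in cell_counts}
    total = sum(signal.values())
    return {name: value / total for name, value in signal.items()}


# Hypothetical rrn operon copy numbers (illustrative values only).
copy_numbers = {"org_A": 1, "org_B": 2, "org_C": 7, "org_D": 10}
# Every organism is present at the same population density.
cell_counts = {name: 1000 for name in copy_numbers}

apparent = apparent_proportions(cell_counts, copy_numbers)
true_h = shannon_index([1.0 / len(cell_counts)] * len(cell_counts))
apparent_h = shannon_index(apparent.values())

for name, share in apparent.items():
    print(f"{name}: true share = 0.25, apparent share = {share:.2f}")
print(f"H from cell counts = {true_h:.3f}, H from signal = {apparent_h:.3f}")
```

In this toy community the organism with 10 copies accounts for half of the total signal although it makes up one quarter of the cells, and the diversity index computed from the signal is lower than the index computed from the cell counts.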
Another point to consider is the term
ribotype, which is often meant to con-
vey the sequence similarity of the 16S
rRNA genes between two organisms.
Ribotyping is used to describe the
unique banding patterns of the rRNA
gene using various methods of discrim-
ination (restriction fragments, length
heterogeneity, etc.). Two organisms are
said to have common ribotypes when
they give identical signals for a given
technique. This hypothetical analysis
demonstrates that while restriction
fragment sites and gene fragment
lengths may be shared under one
technique, minute differences in the
DNA sequence may yield divergent re-
sponses under another technique. Thus, it
should be made clear that the concept
of ribotype as a measure for diversity is
entirely technique dependent.
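The technique dependence of ribotype groupings can be made concrete with a short sketch. The organism names and fragment lengths below are hypothetical and serve only to illustrate the idea; they are not results from this analysis.

```python
from collections import defaultdict


def ribotypes(signals):
    """Group organisms that give an identical signal under one technique."""
    groups = defaultdict(list)
    for organism, signal in signals.items():
        groups[signal].append(organism)
    return dict(groups)


# Hypothetical signals in base pairs for three organisms (illustrative only).
lh_pcr_length = {"org_A": 312, "org_B": 312, "org_C": 340}
trflp_fragment = {"org_A": 118, "org_B": 455, "org_C": 455}

for technique, signals in (("LH-PCR", lh_pcr_length), ("T-RFLP", trflp_fragment)):
    groups = ribotypes(signals)
    print(f"{technique}: {len(groups)} ribotypes -> {groups}")
```

Both hypothetical techniques report two ribotypes for three organisms, but they merge different pairs. The same community therefore yields different groupings depending on the method of discrimination.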
This study provides an initial explo-
ration of rrn operon copy number bias,
based on the content of the databases
available to date. Further investigation
may lead to the refinement of existing
methods and/or the development of
correction factors for improved esti-
mates of community diversity. In the
meantime, method development should
be directed toward technologies that
are based on single-copy genes and/or
new discrimination methods. Until
these new techniques are readily avail-
able and broadly applicable, re-
searchers should continue to interpret
rrn-based techniques with caution.
ACKNOWLEDGMENTS
Support for this work was provided
by a National Institutes of Health
Training Grant in Biotechnology (no.
T32GM008412) and a NASA Graduate
Student Researchers Program Fellow-
ship (no. NGT-10-52619) to L.D.C. and
by project no. DE-FG03-00ER63046-
A001 from the U.S. Department of En-
ergy NABIR program.
REFERENCES
1. Torsvik, V., F.L. Daae, R.A. Sandaa, and L. Ovreas. 1998. Novel techniques for analyzing microbial diversity in natural and perturbed environments. J. Biotechnol. 64:53-62.
2. Torsvik, V., J. Goksoyr, and F.L. Daae. 1990. High diversity in DNA of soil bacteria. Appl. Environ. Microbiol. 56:782-787.
3. Amann, R.I., W. Ludwig, and K.H. Schleifer. 1995. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol. Rev. 59:143-169.
4. Giovannoni, S.J., T.B. Britschgi, C.L. Moyer, and K.G. Field. 1990. Genetic diversity in Sargasso Sea bacterioplankton. Nature 345:60-63.
5. Ward, D.M., R. Weller, and M.M. Bateson. 1990. 16S rRNA sequences reveal numerous uncultured microorganisms in a natural community. Nature 345:63-65.
6. Dahllof, I., H. Baillie, and S. Kjelleberg. 2000. rpoB-based microbial community analysis avoids limitations inherent in 16S rRNA gene intraspecies heterogeneity. Appl. Environ. Microbiol. 66:3376-3380.
7. Dahllof, I. 2002. Molecular community analysis of microbial diversity. Curr. Opin. Biotechnol. 13:213-217.
8. Fisher, M.M. and E.W. Triplett. 1999. Automated approach for ribosomal intergenic spacer analysis of microbial diversity and its application to freshwater bacterial communities. Appl. Environ. Microbiol. 65:4630-4636.
9. Liu, W.T., T.L. Marsh, H. Cheng, and L.J. Forney. 1997. Characterization of microbial diversity by determining terminal restriction fragment length polymorphisms of genes encoding 16S rRNA. Appl. Environ. Microbiol. 63:4516-4522.
10. Farrelly, V., F.A. Rainey, and E. Stackebrandt. 1995. Effect of genome size and rrn gene copy number on PCR amplification of 16S rRNA genes from a mixture of bacterial species. Appl. Environ. Microbiol. 61:2798-2801.
11. Klappenbach, J.A., J.M. Dunbar, and T. Schmidt. 2000. rRNA operon copy number reflects ecological strategies of bacteria. Appl. Environ. Microbiol. 66:1328-1333.
12. Stackebrandt, E. 2002. Defining taxonomic ranks. In The Prokaryotes: an Evolving Electronic Resource for the Microbiological Community. (Online reference.) Title No. 10125.
13. Barbieri, M. 1981. The ribotype theory on the origin of life. J. Theor. Biol. 91:545-601.
14. Frostegard, A., S. Courtois, V. Ramisse, S. Clerc, D. Bernillon, F. Le Gall, P. Jeannin, X. Nesme, et al. 1999. Quantification of bias related to the extraction of DNA directly from soils. Appl. Environ. Microbiol. 65:5409-5420.
15. Martin-Laurent, F., L. Philippot, S. Hallet, R. Chaussod, J.C. Germon, G. Soulas, and G. Catroux. 2001. DNA extraction from soils: old bias for new microbial diversity analysis methods. Appl. Environ. Microbiol. 67:2354-2359.
16. Miller, D.N., J.E. Bryant, E.L. Madsen, and W.C. Ghiorse. 1999. Evaluation and optimization of DNA extraction and purification procedures for soil and sediment samples. Appl. Environ. Microbiol. 65:4715-4724.
17. Polz, M.F. and C.M. Cavanaugh. 1998. Bias in template-to-product ratios in multi-template PCR. Appl. Environ. Microbiol. 64:3724-3730.
18. Steffan, R.J., J. Goksoyr, A.K. Bej, and R.M. Atlas. 1988. Recovery of DNA from soils and sediments. Appl. Environ. Microbiol. 54:2908-2915.
19. Wilson, I.G. 1997. Inhibition and facilitation of nucleic acid amplification. Appl. Environ. Microbiol. 63:3741-3751.
20. Suzuki, M., M.S. Rappe, and S.J. Giovannoni. 1998. Kinetic bias in estimates of coastal picoplankton community structure obtained by measurements of small-subunit rRNA gene PCR amplicon length heterogeneity. Appl. Environ. Microbiol. 64:4522-4529.
21. Suzuki, M.T. and S.J. Giovannoni. 1996. Bias caused by template annealing in the amplification of mixtures of 16S rRNA genes by PCR. Appl. Environ. Microbiol. 62:625-630.
22. Ritchie, N.J., M.E. Schutter, R.P. Dick, and D.D. Myrold. 2000. Use of length heterogeneity PCR and fatty acid methyl ester profiles to characterize microbial communities in soil. Appl. Environ. Microbiol. 66:1668-1675.
23. Muyzer, G., E.C. de Waal, and A.G. Uitterlinden. 1993. Profiling of complex microbial populations by denaturing gradient gel electrophoresis analysis of polymerase chain reaction-amplified genes coding for 16S rRNA. Appl. Environ. Microbiol. 59:695-700.
24. Klappenbach, J.A., P.R. Saxman, J.R. Cole, and T.M. Schmidt. 2001. rrndb: the ribosomal RNA operon copy number database. Nucleic Acids Res. 29:181-184.
25. Gurtler, V. and V.A. Stanisich. 1996. New approaches to typing and identification of bacteria using the 16S-23S rDNA spacer region. Microbiology 142:3-16.
26. Fogel, G.B., C.R. Collins, J. Li, and C.F. Brunk. 1999. Prokaryotic genome size and SSU rDNA copy number: estimation of microbial relative abundance from a mixed population. Microb. Ecol. 38:93-113.
Received 7 November 2002; accepted
14 January 2003.
Address correspondence to:
Dr. Craig S. Criddle
Department of Civil and Environmental
Engineering
Terman Engineering Center, Rm B-9
Stanford University
Stanford, CA 94305, USA
e-mail: criddle@stanford.edu
Vol. 34, No. 4 (2003) BioTechniques 9