A core gut microbiome in obese and lean twins
Peter J. Turnbaugh1, Micah Hamady3, Tanya Yatsunenko1, Brandi L. Cantarel5, Alexis Duncan2, Ruth E. Ley1,
Mitchell L. Sogin6, William J. Jones7, Bruce A. Roe8, Jason P. Affourtit9, Michael Egholm9, Bernard Henrissat5,
Andrew C. Heath2, Rob Knight4 & Jeffrey I. Gordon1
The human distal gut harbours a vast ensemble of microbes (the
microbiota) that provide important metabolic capabilities, includ-
polysaccharides1–6. Studies of a few unrelated, healthy adults have
revealed substantial diversity in their gut communities, as mea-
sured by sequencing 16S rRNA genes6–8, yet how this diversity
relates to function and to the rest of the genes in the collective
genomes of the microbiota (the gut microbiome) remains obscure.
Studies of lean and obese mice suggest that the gut microbiota
affects energy balance by influencing the efficiency of calorie har-
vest from the diet, and how this harvested energy is used and
stored3–5. Here we characterize the faecal microbial communities
of adult female monozygotic and dizygotic twin pairs concordant
for leanness or obesity, and their mothers, to address how host
genotype, environmental exposure and host adiposity influence
the gut microbiome. Analysis of 154 individuals yielded 9,920 near
full-length and 1,937,461 partial bacterial 16S rRNA sequences,
the human gut microbiome is shared among family members, but
that each person’s gut microbial community varies in the specific
between adult monozygotic and dizygotic twin pairs. However,
there was a wide array of shared microbial genes among sampled
individuals, comprising an extensive, identifiable ‘core micro-
biome’ at the gene, rather than at the organismal lineage, level.
Obesity is associated with phylum-level changes in the microbiota,
reduced bacterial diversity and altered representation of bacterial
genes and metabolic pathways. These results demonstrate that a
diversity of organismal assemblages can nonetheless yield a core
are associated with different physiological states (obese compared
We characterized gut microbial communities in 31 monozygotic
(n = 46) (Supplementary Tables 1–5). Monozygotic and dizygotic
co-twins and parent–offspring pairs provided an attractive model
for assessing the impact of genotype and shared early environmental
exposures on the gut microbiome. Moreover, genetically ‘identical’9
monozygotic twin pairs gain weight in response to overfeeding in a
more reproducible way than unrelated individuals10 and are more
concordant for body mass index (BMI) than dizygotic twin pairs11.
Twin pairs who had been enrolled in the Missouri Adolescent
Female Twin Study (MOAFTS12) were recruited for this study (mean
period of enrolment in MOAFTS, 11.7 ± 1.2 years; range, 4.4–13.0
years). Twins were 21–32 years old, of European or African ancestry,
and were generally concordant for obesity (BMI > 30 kg m⁻²) or
leanness (BMI = 18.5–24.9 kg m⁻²) (one twin pair was lean/over-
overweight/obese). They had not taken antibiotics for at least
5.49 ± 0.09 months. Each participant completed a detailed medical,
lifestyle and dietary questionnaire: study enrolees were broadly
representative of the overall Missouri population for BMI, parity,
education and marital status (see Supplementary
Although all were born in Missouri, they currently live throughout
of interpersonal differences in gut microbial ecology7, they were col-
lected from each individual and frozen immediately. The collection
procedure was repeated again with an average interval between
sampling of 57 ± 4 days.
To characterize the bacterial lineages present in the faecal micro-
biotas of these 154 individuals, we performed 16S rRNA sequencing,
Additionally, we performed multiplex pyrosequencing with a 454
FLX instrument to survey the gene’s V2 variable region13 and its
V6 hypervariable region14 (Supplementary Tables 1–3).
Complementary phylogenetic and taxon-based methods were
used to compare 16S rRNA sequences among faecal communities
(see Methods). No matter which region of the gene was examined,
individuals from the same family (a twin and her co-twin, or twins
and their mother) had a more similar bacterial community structure
than unrelated individuals (Fig. 1a and Supplementary Fig. 1a, b),
and shared significantly more species-level phylotypes (16S rRNA
(G = 55.2, P < 10⁻¹² (V2); G = 12.3, P < 0.001 (V6); G = 11.3,
P < 0.001 (full-length)). No significant correlation was seen between
the degree of physical separation of family members’ current homes
and the degree of similarity between their microbial communities
(defined by UniFrac15). The observed familial similarity was not due
to an indirect effect of the physiological states of obesity versus lean-
ness; similar results were observed after stratifying twin pairs and
individuals; Supplementary Fig. 2). Surprisingly, there was no sig-
adult monozygotic compared with dizygotic twin pairs (Fig. 1a).
However, we could not assess whether monozygotic and dizygotic
levels of coverage compared with what was feasible using Sanger
sequencing, reaching on average
of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado 80309, USA. ⁵CNRS, UMR6098, Marseille, France. ⁶Josephine Bay Paul Center, Marine Biological
Laboratory, Woods Hole, Massachusetts 02543, USA. ⁷Environmental Genomics Core Facility, University of South Carolina, Columbia, South Carolina 29208, USA. ⁸Department of
Chemistry and Biochemistry and the Advanced Center for Genome Technology, University of Oklahoma, Norman, Oklahoma 73019, USA. ⁹454 Life Sciences, Branford, Connecticut
Vol 457|22 January 2009|doi:10.1038/nature07540
Macmillan Publishers Limited. All rights reserved
in coverage, all analyses were performed on an equal number of
randomly selected sequences (200 full-length, 1,000 V2 and 10,000
V6). At this level of coverage, there was little overlap between the
sampled faecal communities. Moreover, the number of 16S rRNA
gene sequences belonging to each phylotype varied greatly between
faecal microbiotas (Supplementary Tables 6–8).
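The equal-depth comparison described above can be illustrated with a minimal sketch. It assumes each sample is a plain list of phylotype labels, one per read; the sample names, labels and depth are hypothetical toy values, and this is not the authors' pipeline.

```python
import random

def subsample(sequences, depth, seed=0):
    """Draw `depth` reads without replacement; return None if the sample is too shallow."""
    if len(sequences) < depth:
        return None
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(sequences, depth)

# Hypothetical toy data: one phylotype label per read.
samples = {
    "A": ["otu1"] * 6 + ["otu2"] * 3 + ["otu3"],
    "B": ["otu2"] * 5 + ["otu4"] * 5,
}

# Compare every sample at the same depth, then look for shared phylotypes.
even = {name: subsample(reads, depth=8) for name, reads in samples.items()}
shared = set(even["A"]) & set(even["B"])
```

Samples with fewer reads than the chosen depth are dropped rather than padded, so that every retained community is compared on the same sampling effort.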
Because this apparent lack of overlap could reflect the level of
coverage (Supplementary Tables 1–3), we subsequently searched all
hosts for bacterial phylotypes present at high abundance using a
sampling model based on a combination of standard Poisson and
no phylotype was present at more than about 0.5% abundance in all
of the samples in this study (see Supplementary Results). Finally, we
sub-sampled our data set by randomly selecting 50–3,000 sequences
per sample; again, no phylotypes were detectable in all individuals
sampled within this range of coverage (Supplementary Fig. 3).
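As a rough illustration of why coverage matters for this argument, the sketch below uses a plain Poisson approximation to estimate how likely a phylotype is to be missed at a given sequencing depth. It is an assumption-laden simplification, not the sampling model used in the study: it assumes the phylotype has the same relative abundance in every sample and that reads are drawn independently.

```python
import math

def prob_missed(abundance, n_reads):
    """Poisson probability of seeing zero reads of a phylotype at this abundance."""
    return math.exp(-abundance * n_reads)

def prob_detected_in_all(abundance, n_reads, n_samples):
    """Probability that the phylotype is seen at least once in every sample."""
    return (1.0 - prob_missed(abundance, n_reads)) ** n_samples

# Illustrative values: 0.5% abundance, 1,000 reads per sample, 154 samples.
miss = prob_missed(0.005, 1000)
everywhere = prob_detected_in_all(0.005, 1000, 154)
```

Under these assumptions, deeper sampling shrinks the chance of missing a phylotype exponentially, which is why the absence of any universally detected phylotype becomes more informative as the number of reads per sample grows.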
Samples taken from the same individual at the initial collection
phylotypes found (Supplementary Figs 4 and 5), but showed varia-
tions in relative abundance of the major gut bacterial phyla
UniFrac distance and the time between sample collections. Overall,
faecal samples from the same individual were much more similar to
one another than samples from family members or unrelated indi-
community structure within an individual are minor compared with
Analysis of 16S rRNA data sets produced by the three PCR-based
methods, plus shotgun sequencing of community DNA (see below),
of Actinobacteria in obese compared with lean individuals of both
ancestries (Supplementary Table 9). Combining the individual P
values across these independent analyses using Fisher’s method dis-
Actinobacteria (P = 0.002) but no significant
Firmicutes (P = 0.09). These findings agree with previous work
obese humans lost weight after being placed on one of two reduced-
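Fisher’s method, mentioned above for combining P values from independent analyses, can be sketched as follows. The statistic is −2 Σ ln pᵢ, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis; because the degrees of freedom are even, the tail probability has a closed form and no external library is needed. The input values in the example are hypothetical, not the P values from this study.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    Returns (statistic, combined_p), where the statistic is chi-square
    distributed with 2 * len(p_values) degrees of freedom under the null.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Closed-form chi-square survival function for even degrees of freedom.
    tail = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return statistic, tail

# Hypothetical P values from three independent analyses.
statistic, combined = fisher_combined_p([0.01, 0.02, 0.03])
```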
Across all methods, obesity was associated with a significant
decrease in the level of diversity (Fig. 1b and Supplementary Fig.
1c–f). This reduced diversity suggests an analogy: the obese gut
microbiota is not like a rainforest or reef, which are adapted to high
energy flux and are highly diverse; rather, it may be more like a
fertilizer runoff where a reduced-diversity microbial community
blooms with abnormal energy input16.
We subsequently characterized the microbial lineage and gene
content of the faecal microbiomes of 18 individuals representing
six of the families (three lean and three obese European ancestry
monozygotic twin pairs and their mothers) through shotgun pyrosequencing
(Supplementary Tables 4 and 5) and BLASTX comparisons
against several databases (KEGG17 (version 44) and STRING18)
(Supplementary Figs 7–10 and Supplementary Results). Our analysis
parameters were validated using control data sets comprising ran-
domly fragmented microbial genes with annotations in the KEGG
database17 (Supplementary Fig. 11 and Supplementary Methods).
We also tested how technical advances that produce longer reads
might improve these assignments by sequencing faecal community
(average read length of 341 ± 134 nucleotides (s.d.) versus 208 ± 68
nucleotides for the standard FLX method). Supplementary Fig. 12
shows that the frequency and quality of sequence assignments is
improved as read length increases from 200 to 350 nucleotides.
enzymes (CAZymes). Sequences matching 156 total CAZy families
were found within at least one human gut microbiome, including 77
glycoside hydrolase, 21 carbohydrate-binding module, 35 glycosyl-
transferase, 12 polysaccharide lyase and 11 carbohydrate-esterase
families (Supplementary Table 10). On average, 2.62 ± 0.13% of
the sequences in the gut microbiome could be assigned to
CAZymes (a total of 217,615 sequences), a percentage that is greater
than the most abundant KEGG
1.20 ± 0.06% of the filtered sequences generated from each sample)
and indicative of the abundant and diverse set of microbial genes
directed towards accessing a wide range of polysaccharides.
Category-based clustering of the functions from each microbiome
archical clustering19. Two distinct clusters of gut microbiomes were
identified based on metabolic profile, corresponding to samples with
an increased abundance of Firmicutes and Actinobacteria, and sam-
sion of the first principal component (PC1, explaining 20% of the
functional variance) and the relative abundance of the Bacteroidetes
showed a highly significant correlation (R² = 0.96, P < 10⁻¹²;
Fig. 2b). Functional profiles stabilized within each individual’s
microbiome after 20,000 sequences had been accumulated
(Supplementary Fig. 13). Family members had more similar profiles
than unrelated individuals (Fig. 2c), suggesting that shared bacterial
community structure (‘who’s there’ based on 16S rRNA analyses)
also translates into shared community-wide relative abundance of
metabolic pathways. Accordingly, a direct comparison of functional
[Figure 1 graphic: x axis ‘Number of sequences’.]
Figure 1 | 16S rRNA gene surveys reveal familial similarity and reduced
diversity of the gut microbiota in obese individuals. a, Average unweighted
UniFrac distance (a measure of differences in bacterial community
structure) between individuals over time (self), twin pairs, twins and their
mother, and unrelated individuals (1,000 sequences per V2 data set;
Student’s t-test with Monte Carlo; *P < 10⁻⁵; **P < 10⁻¹⁴; ***P < 10⁻⁴¹;
mean ± s.e.m.). b, Phylogenetic diversity curves for the microbiota of lean
and obese individuals (based on 1–10,000 sequences per V6 data set;
mean ± 95% confidence intervals shown).
and taxonomic similarity (see Supplementary Methods) disclosed a
significant association: individuals with similar taxonomic profiles
also share similar metabolic profiles (P < 0.001; Mantel test).
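A Mantel test of the kind cited above asks whether two distance matrices over the same samples are correlated, using permutations of sample labels to build the null distribution. The sketch below is a generic one-sided implementation with the Python standard library; the two toy matrices (labelled taxonomic and functional) are hypothetical values derived from made-up coordinates, and the permutation count is an arbitrary choice.

```python
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mantel(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test: correlation of two distance matrices and its empirical P."""
    n = len(d1)
    pairs = list(combinations(range(n), 2))  # upper triangle only
    x = [d1[i][j] for i, j in pairs]
    observed = pearson(x, [d2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel the samples of the second matrix
        y = [d2[order[i]][order[j]] for i, j in pairs]
        if pearson(x, y) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

def distances(coords):
    """Toy distance matrix from one-dimensional coordinates."""
    return [[abs(a - b) for b in coords] for a in coords]

# Hypothetical toy matrices for five samples.
d_taxonomic = distances([0.0, 1.0, 3.0, 6.0, 10.0])
d_functional = distances([0.0, 0.2, 0.5, 1.1, 2.0])
r, p = mantel(d_taxonomic, d_functional)
```

Rows and columns are permuted together so that the dependence among distances sharing a sample is preserved under the null.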
Functional clustering of phylum-wide sequence bins representing
microbiome reads assigned to 23 human gut Firmicutes and 14
Bacteroidetes reference genomes showed discrete clustering by
phylum (Supplementary Figs 14a and 15). Bootstrap analyses of the
Firmicutes and Bacteroidetes sequence bins disclosed 26 pathways
with a significantly different relative abundance (Supplementary
Fig. 14a). The Bacteroidetes bins were enriched for several carbohyd-
for transport systems. This finding is consistent with our CAZyme
analysis, which revealed a significantly higher relative abundance of
glycoside hydrolases, carbohydrate-binding modules, glycosyltrans-
ferases, polysaccharide lyases and carbohydrate esterases in the
Bacteroidetes sequence bins (Supplementary Fig. 14b).
One of the major goals of the International Human Microbiome
Project(s) is to determine whether there is an identifiable ‘core
microbiome’ of shared organisms, genes or functional capabilities
found in a given body habitat of all or the vast majority of humans1.
Although all of the 18 gut microbiomes surveyed showed a high level
(Fig. 3a), analysis of the relative abundance of broad functional cat-
consistent pattern regardless of the sample surveyed (Fig. 3b and
Supplementary Table 11): the pattern is also consistent with results
biome data sets from nine adults20,21 (Supplementary Fig. 16). This
consistency is not simply due to the broad level of these annotations,
as a similar analysis of Bacteroidetes and Firmicutes reference gen-
omes revealed substantial variation in the relative abundance of each
category (see Supplementary Fig. 17). Furthermore, pairwise com-
parisons of metabolic profiles obtained from the 18 microbiomes in
this study revealed an average value of R² of 0.97 ± 0.002 (Fig. 2a),
indicating a high level of functional similarity.
Overall functional diversity was compared using the Shannon
index22, a measurement that combines diversity (the number of dif-
ferent metabolic pathways) and evenness (the relative abundance of
each pathway). The human gut microbiomes surveyed had a stable
and high Shannon index value (4.63 ± 0.01), close to the maximum
possible level of functional diversity (5.54; see Supplementary
bolic pathways (listed in Supplementary Table 11), the overall func-
pathways are found at a similar level of abundance. Interestingly, the
level of functional diversity in each microbiome was significantly
linked to the relative abundance of the Bacteroidetes (R² = 0.81,
P < 10⁻⁶); microbiomes enriched for Firmicutes/Actinobacteria had
Bacteroidetes and Firmicutes genomes (Supplementary Fig. 18): on
both functional diversity and evenness (Mann–Whitney U-test,
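The Shannon index used in this comparison can be written as H = −Σ pᵢ ln pᵢ over the relative abundances pᵢ of the pathways, and it reaches its maximum, ln S, when all S pathways are equally abundant. The sketch below assumes natural logarithms and uses hypothetical pathway counts.

```python
import math

def shannon_index(counts):
    """Shannon index H = -sum(p * ln p) over categories with non-zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

def max_shannon(n_categories):
    """Upper bound of H, reached when every category is equally abundant."""
    return math.log(n_categories)

# Hypothetical read counts assigned to four pathways.
even_profile = shannon_index([25, 25, 25, 25])
skewed_profile = shannon_index([97, 1, 1, 1])
evenness = skewed_profile / max_shannon(4)  # 1.0 would mean perfectly even
```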
At a finer level, 26–53% of ‘enzyme’-level functional groups
(KEGG/CAZy/STRING) were shared across all 18 microbiomes,
whereas 8–22% of the groups were unique to a single microbiome
(Supplementary Fig. 19a–c). The ‘core’ functional groups present in
all microbiomes were also highly abundant, representing 93–98% of
groups, more than 95% were found after 26.11 ± 2.02 megabases of
sequence were collected from a given microbiome, whereas the ‘vari-
able’ groups continued to increase substantially with each additional
megabase of sequence. Of course, any estimate of the total size of the
core microbiome will depend on sequencing effort, especially for
[Figure 2 graphic: cluster labels ‘High Firmicutes/Actinobacteria’ and ‘High Bacteroidetes’; scale ‘Functional similarity (R²)’; annotation R² = 0.96; sample labels F1T1Le, F1T2Le, F1MOv, F2T1Le, F2T2Le, F2MOb, F3T1Le, F3T2Le, F3MOv, F4T1Ob, F4T2Ob, F4MOb, F5T1Ob, F5T2Ob, F5MOv, F6T1Ob, F6T2Ob, F6MOb.]
Figure 2 | Metabolic-pathway-based clustering and analysis of the human
gut microbiome of monozygotic twins. a, Clustering of functional profiles
based on the relative abundance of KEGG metabolic pathways. All pairwise
comparisons were made of the profiles by calculating each R² value. Sample
identifier nomenclature: family number, twin number or mother, and BMI
category (Le, lean; Ov, overweight; Ob, obese; for example, F1T1Le stands
for family 1, twin 1, lean). b, The relative abundance of Bacteroidetes as a
function of the first principal component derived from an analysis of KEGG
metabolic profiles. c, Comparisons of functional similarity between twin
pairs, between twins and their mother, and between unrelated individuals.
Asterisk indicates significant differences (Student’s t-test with Monte Carlo;
[Figure 3 graphic: axis ‘Relative abundance (%)’; legends ‘Bacterial phylum’ and ‘COG categories’; sample labels F1T1Le through F6MOb.]
Figure 3 | Comparison of taxonomic and functional variations in the human
gut microbiome. a, Relative abundance of major phyla across 18 faecal
comparisons of microbiomes and the National Center for Biotechnology
genes across each sampled gut microbiome (letters correspond to categories
in the COG database).
functional groups found at a low abundance. On average, our survey
achieved more than 450,000 sequences per faecal sample, which,
assuming an even distribution, would allow us to sample groups
found at a relative abundance of 10⁻⁴. To estimate the total size of
the core microbiome based on the 18 individuals, we randomly sub-
sampled each microbiome in 1,000 sequence intervals (Supple-
mentary Fig. 19d). Based on this analysis, the core microbiome is
approaching a total of 2,142 orthologous groups (one site binding
(hyperbola) curve fit, R² = 0.9966), indicating that we identified
(CAZy), 64% (KEGG) and 56% (STRING) were also found in the
nine previously published, but much lower coverage, data sets generated
by capillary sequencing of adult faecal DNA20,21 (average of
78,413 ± 2,044 bidirectional reads per sample; see Supplementary
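The one site binding (hyperbola) curve mentioned above has the form y = Bmax · x / (Kd + x), where the plateau Bmax plays the role of the estimated total number of core groups. The sketch below is a hedged illustration of how such a plateau could be estimated: it scans a grid of candidate Kd values and solves for Bmax in closed form at each one. The grid, the data points and the parameter values are all hypothetical, and the fitting software actually used in the study is not stated in this text.

```python
def hyperbola(x, bmax, kd):
    """One-site binding curve: saturates at bmax as x grows."""
    return bmax * x / (kd + x)

def fit_hyperbola(xs, ys, kd_grid):
    """Least-squares fit: for each candidate kd, the best bmax has a closed form."""
    best = None
    for kd in kd_grid:
        f = [x / (kd + x) for x in xs]
        bmax = sum(y * fi for y, fi in zip(ys, f)) / sum(fi * fi for fi in f)
        sse = sum((y - bmax * fi) ** 2 for y, fi in zip(ys, f))
        if best is None or sse < best[0]:
            best = (sse, bmax, kd)
    return best[1], best[2]

# Hypothetical accumulation curve: groups found versus sequences sampled,
# generated from a known plateau of 2,000 so the fit can be checked.
xs = [1000 * i for i in range(1, 51)]
ys = [hyperbola(x, 2000.0, 5000.0) for x in xs]
bmax, kd = fit_hyperbola(xs, ys, kd_grid=range(500, 20001, 500))
```

A grid scan is crude compared with a nonlinear optimizer, but it keeps the example dependency-free and makes the role of the plateau parameter explicit.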
Metabolic reconstructions of the ‘core’ microbiome revealed sig-
nificant enrichment for several expected functional categories,
including those involved in transcription and translation (Fig. 4).
Metabolic profile-based clustering indicated that the representation
of ‘core’ functional groups was highly consistent across samples
(Supplementary Fig. 20), and included several pathways that are
likely important for life in the gut, such as those for carbohydrate
and amino-acid metabolism (for example, fructose/mannose meta-
bolism, amino-sugar metabolism and N-glycan degradation).
Variably represented pathways and categories include cell motility
(only a subset of Firmicutes produce flagella), secretion systems and
membrane transport (for example, phosphotransferase systems
involved in the import of nutrients, including sugars; Fig. 4 and
Supplementary Fig. 20).
The distribution of CAZy glycoside hydrolase and glycosyltrans-
ferase families was compared between each pair of microbiomes (see
Supplementary Table 10 for CAZy families with a relative abundance
greater than 1%). This analysis revealed that all individuals had a
similar profile of glycosyltransferases (R² = 0.96 ± 0.003), whereas
the profiles of glycoside hydrolases were significantly more variable,
even between family members (R² = 0.80 ± 0.01; P < 10⁻³⁰, paired
Student’s t-test). This suggests that the number and spectrum of
than the glycosyltransferases.
To identify metabolic pathways associated with obesity, only non-
core associated (variable) functional groups were included in a com-
parison of the gut microbiomes of lean versus obese twin pairs. A
bootstrap analysis23 was used to identify metabolic pathways that
were enriched or depleted in the variable obese gut microbiome.
For example, similar to a mouse model of diet-induced obesity4,
ase systems involved in microbial processing of carbohydrates
(Supplementary Table 12). All gut microbiome sequences were compared
with the custom database of 44 human gut genomes: an odds
ratio analysis revealed 383 genes that were significantly different
between the obese and lean gut microbiome (q value < 0.05; 273
Supplementary Tables 13 and 14). By contrast, only 49 genes were
consistently enriched or depleted between all twin pairs (see
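An odds ratio for a single gene can be computed from a 2 × 2 table of read counts: reads matching the gene versus all other reads, in the obese versus the lean pool. The sketch below is a generic illustration with hypothetical counts; the 0.5 pseudocount (to avoid division by zero) is an assumption of this example, and the significance testing and q value correction used in the study are not reproduced here.

```python
def odds_ratio(gene_obese, total_obese, gene_lean, total_lean, pseudo=0.5):
    """Odds ratio of hitting a gene in obese versus lean reads.

    `pseudo` is added to every cell of the 2 x 2 table so that genes with
    zero counts in one group still give a finite ratio (an assumption here).
    """
    a = gene_obese + pseudo                 # obese reads matching the gene
    b = total_obese - gene_obese + pseudo   # obese reads not matching
    c = gene_lean + pseudo                  # lean reads matching the gene
    d = total_lean - gene_lean + pseudo     # lean reads not matching
    return (a * d) / (b * c)

# Hypothetical counts: values above 1 suggest enrichment in the obese pool.
enriched = odds_ratio(30, 100, 10, 100)
depleted = odds_ratio(5, 100, 20, 100)
```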
These obesity-associated genes were representative of the taxo-
nomic differences described above: 75% of the obesity-enriched
genes; the other 25% are from Firmicutes) whereas 42% of the lean-
enriched genes were from Bacteroidetes (compared with 0% of the
obesity-enriched genes). Their functional annotation indicated that
many are involved in carbohydrate, lipid and amino-acid metabol-
ism (Supplementary Tables 13 and 14). Together, they comprise an
initial set of microbial biomarkers of the obese gut microbiome.
Our finding that the gut microbial community structures of adult
monozygotic twin pairs had a degree of similarity that was compar-
able to that of dizygotic twin pairs, and only slightly more similar
study of adult twins24, and with a recent microarray-based analysis,
which revealed that gut community assembly during the first year of
life followed a more similar pattern in a pair of dizygotic twins than
12 unrelated infants25. Intriguingly, another fingerprinting study of
monozygotic and dizygotic twins in childhood showed a slightly
reduced similarity profile in dizygotic twins26. Thus, comprehensive
time-course studies, comparing monozygotic and dizygotic twin
pairs from birth through adulthood, as well as intergenerational
analyses of their families’ microbiotas, will be key to determining
the relative contributions of host genotype and environmental expo-
sures to (gut) microbial ecology.
The hypothesis that there is a core human gut microbiome, defin-
able by a set of abundant microbial organismal lineages that we all
share, may be incorrect: by adulthood, no single bacterial phylotype
was detectable at an abundant frequency in the guts of all 154
sampled humans. Instead, it appears that a core gut microbiome
high degree of redundancy in the gut microbiome and supports an
ecological view of each individual as an ‘island’ inhabited by unique
in the obese microbiome;
[Figure 4 graphic: x axis ‘Relative abundance (percentage of KEGG assignments)’, scale 0–14; category labels recovered from the plot include Replication and repair, other amino acids, Biosynthesis of polyketides, Cell growth and death, Metabolism of cofactors, processing protein families, and Cellular processes and signalling protein families.]
Figure 4 | KEGG categories enriched or depleted in the core versus variable
components of the gut microbiome. Sequences from each of the 18 faecal
microbiomes were binned into the ‘core’ or ‘variable’ microbiome based on
the co-occurrence of KEGG orthologous groups (core groups were found in
all 18 microbiomes whereas variable groups were present in fewer (< 18)
microbiomes; see Supplementary Fig. 19a). Asterisks indicate significant
differences (Student’s t-test, *P < 0.05, **P < 0.001, ***P < 10⁻⁵;
collections of microbial phylotypes: as in actual islands, different
species assemblages converge on shared core functions provided by
distinctive components. Our findings raise the question of how core
functionality is assembled in this body habitat. Understanding the
underlying principles should provide insights about microbial
adaptation to, and mutualistic community assembly within, a wide
range of environments.
Faecal samples were collected from each individual. Community DNA was pre-
pared and used for pyrosequencing (454 Life Sciences), as well as for PCR and
sequencing of bacterial 16S rRNA genes. Shotgun reads were mapped to reference
genomes using the National Center for Biotechnology Information ‘non-redundant’
database, KEGG17, STRING18, CAZy (http://www.cazy.org/) and a
44-member human-gut microbial genome database. Metabolic reconstructions
were performed based on CAZy, KEGG and STRING annotations. The relative
abundance of KEGG metabolic pathways is referred to as a ‘metabolic profile’.
Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.
Received 29 June; accepted 14 October 2008.
Published online 30 November 2008.
1. Turnbaugh, P. J. et al. The human microbiome project. Nature 449, 804–810
2. Ley, R. E. et al. Obesity alters gut microbial ecology. Proc. Natl Acad. Sci. USA 102,
3. Turnbaugh, P. J. et al. An obesity-associated gut microbiome with increased
capacity for energy harvest. Nature 444, 1027–1031 (2006).
4. Turnbaugh, P. J., Bäckhed, F., Fulton, L. & Gordon, J. I. Diet-induced obesity is
linked to marked but reversible alterations in the mouse distal gut microbiome.
Cell Host Microbe 3, 213–223 (2008).
5. Bäckhed, F. et al. The gut microbiota as an environmental factor that regulates fat
storage. Proc. Natl Acad. Sci. USA 101, 15718–15723 (2004).
with obesity. Nature 444, 1022–1023 (2006).
7. Eckburg, P. B. et al. Diversity of the human intestinal microbial flora. Science 308,
8. Frank, D. N. et al. Molecular-phylogenetic characterization of microbial
community imbalances in human inflammatory bowel diseases. Proc. Natl Acad.
Sci. USA 104, 13780–13785 (2007).
9. Bruder, C. E. et al. Phenotypically concordant and discordant monozygotic twins
display different DNA copy-number-variation profiles. Am. J. Hum. Genet. 82,
10. Bouchard, C. et al. The response to long-term overfeeding in identical twins. N.
Engl. J. Med. 322, 1477–1482 (1990).
11. Maes, H. H., Neale, M. C. & Eaves, L. J. Genetic and environmental factors in
relative body weight and human adiposity. Behav. Genet. 27, 325–351 (1997).
12. Heath, A. C. et al. Ascertainment of a mid-western US female adolescent twin
cohort for alcohol studies: assessment of sample representativeness using birth
record data. Twin Res. 5, 107–112 (2002).
13. Hamady, M., Walker, J. J., Harris, J. K., Gold, N. J. & Knight, R. Error-correcting
barcoded primers for pyrosequencing hundreds of samples in multiplex. Nature
Methods 5, 235–237 (2008).
14. Sogin, M. L. et al. Microbial diversity in the deep sea and the underexplored ‘‘rare
biosphere’’. Proc. Natl Acad. Sci. USA 103, 12115–12120 (2006).
15. Lozupone, C., Hamady, M. & Knight, R. UniFrac-an online tool for comparing
microbial community diversity in a phylogenetic context. BMC Bioinformatics 7,
42, 487–495 (1997).
17. Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y. & Hattori, M. The KEGG
resource for deciphering the genome. Nucleic Acids Res. 32, D277–D280 (2004).
18. von Mering, C. et al. STRING 7-recent developments in the integration and
prediction of protein interactions. Nucleic Acids Res. 35, D358–D362 (2007).
19. de Hoon, M. J., Imoto, S., Nolan, J. & Miyano, S. Open source clustering software.
Bioinformatics 20, 1453–1454 (2004).
20. Gill, S. R. et al. Metagenomic analysis of the human distal gut microbiome. Science
312, 1355–1359 (2006).
21. Kurokawa, K. et al. Comparative metagenomics revealed commonly enriched
gene sets in human gut microbiomes. DNA Res. 14, 169–181 (2007).
23. Rodriguez-Brito, B., Rohwer, F. & Edwards, R. A. An application of statistics to
comparative metagenomics. BMC Bioinformatics 7, 162 (2006).
gastrointestinal tract. Microb. Ecol. Health Dis. 13, 129–134 (2001).
25. Palmer, C., Bik, E. M., Digiulio, D. B., Relman, D. A. & Brown, P. O. Development of
the human infant intestinal microbiota. PLoS Biol. 5, e177 (2007).
26. Stewart, J. A., Chadwick, V. S. & Murray, A. Investigations into the influence of
Med. Microbiol. 54, 1239–1242 (2005).
Supplementary Information is linked to the online version of the paper at
Acknowledgements We thank: S. Wagoner and J. Manchester for technical
support; S. Marion and D. Hopper for recruitment of participants and sample
collection; A. Goodman, B. Muegge, and M. Mahowald for suggestions; S. Huse
(Marine Biological Laboratory), F. Niazi and S. Attiya (454 Life Sciences),
C. Markovic, L. Fulton, B. Fulton, E. Mardis and R. Wilson (Washington University
Genome Sequencing Center) and S. Macmil, G. Wiley, C. Qu, and P. Wang
(University of Oklahoma) for their assistance with sequencing; and P.M. Coutinho
(Université de Provence, France) for help with the CAZy analysis. Deep draft
assemblies of reference gut genomes were generated as part of a National Human
Genome Research Institute (NHGRI)-sponsored human gut microbiome initiative
). This work was supported in part by the National Institutes of Health (DK78669/
ES012742/AA09022/HD049024), the National Science Foundation
(OCE0430724), the W.M. Keck Foundation, and the Crohn’s and Colitis
Foundation of America.
T.Y., A.D., R.E.L., M.L.S., W.J.J., B.A.R., J.P.A. and M.E. generated the data. P.J.T.,
M.H., M.L.S., B.L.C., A.D., B.H., A.C.H., R.K. and J.I.G. analysed the data. P.J.T.,
A.C.H., R.K. and J.I.G. wrote the manuscript with input from the other members of
Author Information This Whole Genome Shotgun project is deposited in DDBJ/
EMBL/GenBank under accession number 32089. 454 pyrosequencing reads are
deposited in the NCBI Short Read Archive. Nearly full-length 16S rRNA gene
sequences are deposited in GenBank under accession numbers
FJ362604–FJ372382. Annotated sequences are also available in MG-RAST
(http://metagenomics.nmpdr.org/). 454-generated 16S rRNA sequences with
sample identifiers are also available at http://gordonlab.wustl.edu/
SuppData.html. Correspondence and requests for materials should be addressed
to J.I.G. (firstname.lastname@example.org).
Community DNA preparation. Faecal samples were frozen immediately after
mortar and pestle. An aliquot (approximately 500 mg) of each sample was then
suspended, while frozen, in a solution containing 500 μl of extraction buffer
(200 mM Tris (pH 8.0), 200 mM NaCl, 20 mM EDTA), 210 μl of 20% SDS,
500 μl of a mixture of phenol:chloroform:isoamyl alcohol (25:24:1, pH 7.9),
and 500 μl of a slurry of 0.1-mm diameter zirconia/silica beads (BioSpec
Products). Microbial cells were subsequently lysed by mechanical disruption
with a bead beater (BioSpec Products) set on high for 2 min at room temperature,
followed by extraction with phenol:chloroform:isoamyl alcohol, and precipitation
with isopropanol. DNA obtained from three separate aliquots of each
faecal sample was pooled (≥200 μg DNA) and used for pyrosequencing (see
16S rRNA gene-sequence-based surveys. Complementary phylogenetic- and
taxon-based methods were used to compare 16S rRNA sequences among faecal
communities can be compared in terms of their shared evolutionary history, as
measured by the degree to which they share branch length on a phylogenetic tree.
We complemented this approach with taxon-based methods27, which disregard
but have the advantage that specific taxa unique to, or shared among, groups of
samples can be identified (for example, those from lean or obese individuals).
Before both types of analysis, we grouped 16S rRNA gene sequences into operational
taxonomic units (OTUs/phylotypes) using both cd-hit28 and the
furthest-neighbour-like algorithm, with a sequence identity threshold of 97%, which is
the best-BLAST-hit against Greengenes29 (E value cutoff of 10⁻¹⁰, minimum 88%
coverage, 88% identity) and the Hugenholtz taxonomy (downloaded from
http://greengenes.lbl.gov/Download/Sequence_Data/Greengenes_format/ on 12
May 2008, excluding sequences annotated as chimaeric).
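To make the idea of grouping sequences into OTUs at a 97% identity threshold concrete, the following is a simplified greedy sketch. It is not cd-hit or the furthest-neighbour-like algorithm named above: it assumes the sequences are already aligned to equal length, measures identity as the fraction of matching positions, and assigns each sequence to the first representative it matches. The toy sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs, threshold=0.97):
    """Greedy clustering: each sequence joins the first representative it matches."""
    representatives = []  # one representative sequence per OTU
    otus = []             # member sequences of each OTU
    for seq in seqs:
        for idx, rep in enumerate(representatives):
            if identity(seq, rep) >= threshold:
                otus[idx].append(seq)
                break
        else:
            representatives.append(seq)  # no match: start a new OTU
            otus.append([seq])
    return otus

# Hypothetical 100-nucleotide toy reads.
base = "ACGT" * 25
near = "TT" + base[2:]              # 2 mismatches: 98% identical to base
far = "T" * 10 + base[10:]          # 7 mismatches: 93% identical to base
clusters = greedy_otus([base, near, far])
```

Greedy assignment depends on input order, which is one reason the choice of clustering algorithm can change OTU counts at the same threshold.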
cing data were pre-processed to remove sequences with low-quality scores,
sequences with ambiguous characters or sequences outside the length bounds
(V6 < 50 nucleotides, V2 < 200 nucleotides), and binned according to sample-specific
barcode (see, for example, ref. 13). Similar sequences were identified
using Megablast30 and cd-hit, with the following parameters: E value 10⁻¹⁰
(Megablast only); minimum coverage 99%; minimum pairwise identity 97%.
Candidate OTUs were identified as sets of sequences connected to each other at
this level using a maximum of 4,000 hits per sequence. Each candidate OTU was
wise, it was broken up into smaller connected components27.
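Treating similarity hits as edges of a graph, candidate OTUs correspond to connected components. A minimal sketch of that step, assuming sequences are identified by integer indices and hits are given as index pairs (both hypothetical conventions of this example), uses a union-find structure:

```python
def connected_components(n_items, hits):
    """Group items 0..n_items-1 into components linked by pairwise hits."""
    parent = list(range(n_items))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = {}
    for i in range(n_items):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical hits among five sequences.
candidate_otus = connected_components(5, [(0, 1), (1, 2), (3, 4)])
```

Note that single-linkage components can chain together sequences that are not all within the threshold of one another, which is why a further check on each candidate is useful.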
Tree building and UniFrac clustering for PCA analysis. A relaxed neighbour-
joining tree was built from one representative sequence per OTU using
Clearcut31, employing the Kimura correction (the PH Lane mask was applied
UniFrac15was run using the resulting tree. PCA was performed on the resulting
matrix of distances between each pair of samples. To determine if the UniFrac
distances were on average significantly different for pairs of samples (that is,
between twin pairs, between twins and their mother, or between unrelated indi-
test, regenerating the t-statistic for 1,000 random samples, and using the distri-
bution to obtain an empirical P value.
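The Monte Carlo procedure above (recompute the t-statistic on randomized data and compare with the observed value) can be sketched as follows. This is an illustration only: the source does not say which form of the t-statistic was used or how labels were randomized, so the Welch form, the two-sided comparison and the shuffling of pooled distances are all assumptions, as are the function names and the example distances.

```python
import math
import random


def t_statistic(x, y):
    """Welch's t-statistic for two samples (assumed form of the test)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    var_x = sum((v - mean_x) ** 2 for v in x) / (len(x) - 1)
    var_y = sum((v - mean_y) ** 2 for v in y) / (len(y) - 1)
    return (mean_x - mean_y) / math.sqrt(var_x / len(x) + var_y / len(y))


def empirical_p(x, y, n_iter=1000, seed=0):
    """Two-sided empirical P value from n_iter random relabellings."""
    rng = random.Random(seed)
    observed = abs(t_statistic(x, y))
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        t = t_statistic(pooled[:len(x)], pooled[len(x):])
        if abs(t) >= observed:
            extreme += 1
    # Add one to numerator and denominator so P is never exactly zero.
    return (extreme + 1) / (n_iter + 1)


# Hypothetical UniFrac distances for two categories of sample pairs.
within_family = [0.10, 0.12, 0.11, 0.13, 0.09]
unrelated = [0.50, 0.52, 0.49, 0.51, 0.53]
p_value = empirical_p(within_family, unrelated)
```

An empirical P value is used here because pairwise distances share samples and are therefore not independent, so the parametric t distribution is not a safe reference.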
Rarefaction and phylogenetic diversity measurements. To determine which
individuals had the most diverse communities of gut bacteria, rarefaction plots
each sample. Phylogenetic diversity is the total amount of branch length in a
the sequences in a given sample. To account for differences in sampling effort
between individuals, and to estimate how far we were from sampling the divers-
ity of each individual completely, we plotted the accumulation of phylogenetic
diversity (branch length) with sampling effort, in a manner analogous to
rarefaction curves. We generated the phylogenetic diversity rarefaction curve
for each individual by applying custom Python code
(http://bmf2.colorado.edu/unifrac/about.psp) to the Arb parsimony insertion tree27.
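A phylogenetic diversity rarefaction curve of the kind described above can be sketched as follows. This is not the authors' code. It assumes the tree is given as child-to-parent and child-to-branch-length mappings, that phylogenetic diversity includes the path back to the root, and that subsampling is without replacement; the tree, the sequence assignments and all names are hypothetical.

```python
import random


def phylogenetic_diversity(tips, parent, length):
    """Sum of branch lengths on the paths from the given tips to the root."""
    seen = set()
    total = 0.0
    for node in tips:
        # Walk towards the root, counting each branch only once.
        while node in parent and node not in seen:
            seen.add(node)
            total += length[node]
            node = parent[node]
    return total


def pd_rarefaction(seq_tips, parent, length, depths, n_rep=10, seed=0):
    """Mean phylogenetic diversity at each sampling depth.

    seq_tips lists, for every sequence in a sample, the tip it maps to.
    """
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        reps = [
            phylogenetic_diversity(set(rng.sample(seq_tips, depth)), parent, length)
            for _ in range(n_rep)
        ]
        curve.append((depth, sum(reps) / n_rep))
    return curve


# Hypothetical tree: root R, internal node X, tips A, B and C.
parent = {"A": "X", "B": "X", "X": "R", "C": "R"}
length = {"A": 1.0, "B": 2.0, "X": 1.0, "C": 3.0}
sample = ["A", "A", "B", "C"]  # one entry per sequence
curve = pd_rarefaction(sample, parent, length, depths=[1, 2, 3, 4])
```

A curve that is still rising at the full sampling depth suggests that the diversity of that individual has not been sampled completely.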
Pyrosequencing of total community DNA. Shotgun sequencing runs were per-
formed on the 454 FLX pyrosequencer from total faecal community DNA. Two
samples were also analysed in a single run using Titanium extra-long-read pyr-
osequencing technology (see Supplementary Tables 4 and 5). Sequencing reads
sequences of identical length and content are a common artefact of the pyrose-
quencing methodology. Finally, human sequences were removed by identifying
sequences homologous to the Homo sapiens reference genome (BLASTN
E < 10^-5, %identity > 75, score > 50).
CAZyme analysis. Metagenomic sequence reads were searched against a library
of modules derived from all entries in the carbohydrate-active enzymes (CAZy)
database (www.cazy.org) using FASTY33 (E < 10^-6). This library consists of
approximately 180,000 previously annotated modules (catalytic modules,
carbohydrate-binding modules and other non-catalytic modules or domains
of unknown function) derived from about 80,000 protein sequences. The num-
ber of sequencing reads matching each CAZy family was divided by the number
of total sequences assigned to CAZymes and multiplied by 100 to calculate a
relative abundance. An R^2 value was calculated for each pair of CAZy profiles.
We then compared the distribution of glycoside hydrolase similarity scores with
the distribution of glycosyltransferase similarity scores.
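The relative abundance calculation and the pairwise profile comparison can be illustrated with a short sketch. The source does not state how R^2 was computed, so the squared Pearson correlation over the union of families is an assumption, as are the function names and the read counts; the family labels are used only as example keys.

```python
def relative_abundance(counts):
    """Percentage of CAZyme-assigned reads that fall in each family."""
    total = sum(counts.values())
    return {family: 100.0 * n / total for family, n in counts.items()}


def r_squared(profile_a, profile_b):
    """Squared Pearson correlation between two profiles (assumed definition).

    Families missing from one profile are treated as 0% abundance.
    """
    families = sorted(set(profile_a) | set(profile_b))
    x = [profile_a.get(f, 0.0) for f in families]
    y = [profile_b.get(f, 0.0) for f in families]
    n = len(families)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical read counts per family for two samples.
sample_1 = relative_abundance({"GH13": 50, "GT2": 30, "CBM50": 20})
sample_2 = relative_abundance({"GH13": 45, "GT2": 35, "CBM50": 20})
similarity = r_squared(sample_1, sample_2)
```

Restricting the family list to glycoside hydrolases or to glycosyltransferases before calling `r_squared` would give the two sets of similarity scores that are compared in the text.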
Statistical analyses. Xipe23 (version 2.4) was used for bootstrap analyses of
pathway enrichment and depletion, using the parameters sample size = 10,000
11.0, Microsoft). Mann–Whitney and Student’s t-tests were used to identify
statistically significant differences between two groups (Prism version 4.0,
GraphPad; Excel version 11.0, Microsoft). The Bonferroni correction was used
matrices: the matrix of each pairwise comparison of the abundance of each
reference genome, and the abundance of each metabolic pathway, were com-
represented as mean ± s.e.m. unless otherwise indicated.
Microbiome sequences were compared against the custom database of 44 gut
genomes (BLASTX E < 10^-5, bitscore > 50, and %identity > 50). A gene-by-sample
matrix was then screened to identify genes ‘commonly-enriched’ in
either the obese or lean gut microbiome (defined by an odds ratio greater than
2 or less than 0.5 when comparing the pooled obese twin microbiomes with the
microbiome with the aggregate lean twin microbiome, or vice versa). The stati-
stical significance of enriched or depleted genes was then calculated using a
modified t-test (q value < 0.05; calculated with code supplied by M. Pop and
J.R. White, University of Maryland). We also searched for genes that were con-
sistently enriched or depleted in all six monozygotic twin pairs. A gene-by-
sample matrix was generated based on BLASTX comparisons of each micro-
the frequency of each gene in each twin versus the respective co-twin. The
of taxonomic groups, including Firmicutes, Bacteroidetes and Actinobacteria,
and did not show any clear functional trends.
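The odds ratio screen described in this section (greater than 2 or less than 0.5 between pooled obese and pooled lean microbiomes) can be sketched as follows. The exact odds ratio formula is not given in the text, so the standard two-by-two form over gene-assigned read counts is an assumption, and the function names, gene labels and counts are hypothetical. The subsequent significance test (modified t-test with q values) is not reproduced here.

```python
def odds_ratio(hits_a, total_a, hits_b, total_b):
    """Odds of a read hitting the gene in group A relative to group B."""
    odds_a = hits_a / (total_a - hits_a)
    odds_b = hits_b / (total_b - hits_b)
    return odds_a / odds_b


def screen_genes(counts_obese, counts_lean, high=2.0, low=0.5):
    """Flag genes whose pooled odds ratio is above `high` or below `low`."""
    total_obese = sum(counts_obese.values())
    total_lean = sum(counts_lean.values())
    flagged = {}
    for gene in counts_obese.keys() & counts_lean.keys():
        n_obese, n_lean = counts_obese[gene], counts_lean[gene]
        if n_obese == 0 or n_lean == 0:
            continue  # ratio is zero or undefined; skipped in this sketch
        ratio = odds_ratio(n_obese, total_obese, n_lean, total_lean)
        if ratio > high:
            flagged[gene] = "obese-enriched"
        elif ratio < low:
            flagged[gene] = "lean-enriched"
    return flagged


# Hypothetical pooled read counts per gene.
obese = {"g1": 30, "g2": 10, "g3": 60}
lean = {"g1": 10, "g2": 30, "g3": 60}
flagged = screen_genes(obese, lean)
```

The thresholds are symmetric on a log scale, so a gene flagged as enriched in one group is exactly a gene that would be flagged as depleted if the groups were swapped.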
27. Ley, R. E. et al. Evolution of mammals and their gut microbes. Science 320,
28. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets
of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072
30. Zhang, Z., Schwartz, S., Wagner, L. & Miller, W. A greedy algorithm for aligning
DNA sequences. J. Comput. Biol. 7, 203–214 (2000).
31. Sheneman, L., Evans, J. & Foster, J. A. Clearcut: a fast implementation of relaxed
neighbor joining. Bioinformatics 22, 2823–2824 (2006).
32. Faith, D. P. Conservation evaluation and phylogenetic diversity. Biol. Conserv. 61,
33. Pearson, W. R., Wood, T., Zhang, Z. & Miller, W. Comparison of DNA sequences
with protein sequences. Genomics 46, 24–36 (1997).
34. Knight, R. et al. PyCogent: a toolkit for making sense from sequence. Genome Biol.
8, R171 (2007).