Alterations of the human gut microbiome in liver cirrhosis

Abstract

Liver cirrhosis occurs as a consequence of many chronic liver diseases that are prevalent worldwide. Here we characterize the gut microbiome in liver cirrhosis by comparing 98 patients and 83 healthy control individuals. We build a reference gene set for the cohort containing 2.69 million genes, 36.1% of which are novel. Quantitative metagenomics reveals 75,245 genes that differ in abundance between the patients and healthy individuals (false discovery rate < 0.0001) and can be grouped into 66 clusters representing cognate bacterial species; 28 are enriched in patients and 38 in control individuals. Most (54%) of the patient-enriched, taxonomically assigned species are of buccal origin, suggesting an invasion of the gut from the mouth in liver cirrhosis. Biomarkers specific to liver cirrhosis at gene and function levels are revealed by a comparison with those for type 2 diabetes and inflammatory bowel disease. On the basis of only 15 biomarkers, a highly accurate patient discrimination index is created and validated on an independent cohort. Thus microbiota-targeted biomarkers may be a powerful tool for diagnosis of different diseases.
ARTICLE doi:10.1038/nature13568
Nan Qin(1,2)*, Fengling Yang(1)*, Ang Li(1)*, Edi Prifti(3)*, Yanfei Chen(1)*, Li Shao(1,2)*, Jing Guo(1), Emmanuelle Le Chatelier(3), Jian Yao(1,2), Lingjiao Wu(1), Jiawei Zhou(1), Shujun Ni(1), Lin Liu(1), Nicolas Pons(3), Jean Michel Batto(3), Sean P. Kennedy(3), Pierre Leonard(3), Chunhui Yuan(1), Wenchao Ding(1), Yuanting Chen(1), Xinjun Hu(1), Beiwen Zheng(1,2), Guirong Qian(1), Wei Xu(1), S. Dusko Ehrlich(3,4), Shusen Zheng(2,5) & Lanjuan Li(1,2)
Cirrhosis is an advanced liver disease resulting from acute or chronic liver injury, including alcohol abuse, obesity and hepatitis virus infection. The prognosis for patients with decompensated liver cirrhosis is poor, and they frequently require liver transplantation [1]. The liver interacts directly with the gut through the hepatic portal and bile secretion [2] systems. Enteric dysbiosis, especially the translocation of bacteria [3] and their products [4,5] across the gut epithelial barrier, is involved in the progression of liver cirrhosis. However, the phylogenetic and functional composition changes in the human gut microbiota that are related to this progression remain obscure [5]. Some studies have revealed that alterations in the gut microbiota are important in complications of end-stage liver cirrhosis [6] (such as spontaneous bacterial peritonitis [7] and hepatic encephalopathy [8]) and the induction and promotion of liver damage in early-stage liver disease [9] (such as alcoholic liver disease [10] and non-alcoholic fatty liver disease [11]), but definitive associations of gut microbiota and liver pathology in humans are still lacking [12]. Studies of patients with liver cirrhosis [13] and of mouse models for alcoholic liver disease [10] have revealed a similar and substantial alteration in the gut microbiota, as measured by sequencing of 16S ribosomal RNA genes. How these phylogenetic alterations relate to changes in the functioning of this ecosystem is, however, unclear.
The role of gut microbiota in human health and disease [14] has recently received considerable attention. Chronic diseases, such as obesity [15–18], inflammatory bowel disease (IBD) [19,20], diabetes mellitus [21,22], metabolic syndrome [23], symptomatic atherosclerosis [24] and non-alcoholic fatty liver disease [10], have been associated with gut microbiota. The US National Institutes of Health Human Microbiome Project (HMP) generated a large data set from different anatomical sites among 242 healthy individuals and created a large human microbiome gene resource [25,26]. Quantitative metagenomics analysis [27,28] developed by the MetaHIT consortium revealed a significant loss of gut microbial richness associated with the risk of metabolic syndrome related co-morbidities. Here we apply a similar analysis to contrast microbiota from 123 patients with liver cirrhosis and 114 healthy counterparts of Han Chinese origin.
Gene catalogue of gut microbes
We constructed a gene catalogue from 98 Chinese patients with liver cirrhosis and 83 healthy Chinese control individuals (Supplementary Table 1) using the methodology developed by MetaHIT. The liver cirrhosis catalogue contained 2,688,468 non-redundant open reading frames (ORFs). We compared it with three other gut microbial catalogues: MetaHIT [29], HMP [25] and T2D [22]. To facilitate this comparison, genes were predicted from the original contigs using the same criteria. The MetaHIT catalogue contained 3,452,726 genes, HMP 4,768,112 genes and T2D 2,148,029 genes. In total 674,131 genes were common to all catalogues (Extended Data Fig. 1a). The liver cirrhosis catalogue, MetaHIT, HMP and T2D gene sets contained 794,647, 1,419,517, 2,620,096 and 623,570 unique genes, respectively. Genes from the liver cirrhosis, T2D and MetaHIT catalogues were merged; the HMP was not included, as it contained Sanger, 454 or Illumina-based 16S sequences, in addition to whole metagenomic data. The merged non-redundant catalogue contained 5,382,817 genes (Extended Data Fig. 1b).
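The catalogue comparison above amounts to counting genes shared by all catalogues and genes found in only one. A minimal Python sketch of that bookkeeping follows. It assumes genes have already been matched across catalogues and given shared identifiers; the function name `catalogue_overlap` and the toy gene IDs are hypothetical and are not taken from the paper.

```python
def catalogue_overlap(catalogues):
    """Return (core, unique) for a dict mapping catalogue name -> iterable of gene IDs.

    core   : set of genes present in every catalogue
    unique : dict mapping each catalogue name to genes found in no other catalogue
    """
    gene_sets = {name: set(genes) for name, genes in catalogues.items()}
    core = set.intersection(*gene_sets.values())
    unique = {}
    for name, genes in gene_sets.items():
        # Union of all the other catalogues (empty if there is only one)
        others = set().union(*(g for n, g in gene_sets.items() if n != name))
        unique[name] = genes - others
    return core, unique


# Toy example with hypothetical gene identifiers
toy = {
    "LC": {"g1", "g2", "g3"},
    "MetaHIT": {"g1", "g2", "g4"},
    "T2D": {"g1", "g5"},
}
core, unique = catalogue_overlap(toy)
```

In practice the matching step (deciding when two predicted ORFs are the same gene) is the hard part and is not shown here.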
*These authors contributed equally to this work.
(1) State Key Laboratory for Diagnosis and Treatment of Infectious Disease, The First Affiliated Hospital, College of Medicine, Zhejiang University, 310003 Hangzhou, China.
(2) Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang University, 310003 Hangzhou, China.
(3) Metagenopolis, Institut National de la Recherche Agronomique, 78350 Jouy en Josas, France.
(4) King’s College London, Centre for Host-Microbiome Interactions, Dental Institute Central Office, Guy’s Hospital, London Bridge, London SE1 9RT, UK.
(5) Key Laboratory of Combined Multi-organ Transplantation, Ministry of Public Health, the First Affiliated Hospital, Zhejiang University, 310003 Hangzhou, China.
©2014 Macmillan Publishers Limited. All rights reserved
Phylogenetic profiles of gut microbes
The sequencing reads (36.67%) were aligned against 4,398 reference genomes from the National Center for Biotechnology Information and the HMP (Supplementary Table 2). After correction for population stratification that could be related to non-liver cirrhosis-related factors (see Methods), the relative abundances of phylum, class, order, family, genus and species between liver cirrhosis and control groups were compared (Extended Data Fig. 2). Phylotypes with a median relative abundance larger than 0.01% of the total abundance in either the healthy control group or the liver cirrhosis group were included for comparison. At the phylum level, Bacteroidetes and Firmicutes dominated the faecal microbial communities of both groups (Fig. 1a, b). Compared with healthy controls, patients with liver cirrhosis had fewer Bacteroidetes (Fig. 1a), but higher levels of Proteobacteria and Fusobacteria (Fig. 1b).
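The inclusion rule for phylotypes (median relative abundance above 0.01% in either group) can be illustrated with a short Python sketch. The function name `abundant_phylotypes`, the data layout (taxon name mapped to a list of per-sample relative abundances expressed as fractions) and the toy numbers are assumptions for illustration only.

```python
from statistics import median


def abundant_phylotypes(healthy, cirrhosis, threshold=1e-4):
    """Keep taxa whose median relative abundance exceeds `threshold`
    (0.01% expressed as a fraction) in at least one of the two groups.

    healthy, cirrhosis : dict mapping taxon -> list of per-sample relative abundances
    """
    kept = []
    for taxon in sorted(set(healthy) | set(cirrhosis)):
        # A taxon absent from one group is treated as having zero abundance there
        med_healthy = median(healthy.get(taxon, [0.0]))
        med_cirrhosis = median(cirrhosis.get(taxon, [0.0]))
        if med_healthy > threshold or med_cirrhosis > threshold:
            kept.append(taxon)
    return kept


# Toy abundances (hypothetical, not data from the study)
healthy = {
    "Bacteroides": [0.30, 0.40, 0.35],
    "Veillonella": [0.00001, 0.00002, 0.0],
    "RareTaxon": [0.0, 0.00001, 0.00002],
}
cirrhosis = {
    "Bacteroides": [0.20, 0.10, 0.15],
    "Veillonella": [0.01, 0.05, 0.02],
    "RareTaxon": [0.0, 0.0, 0.00005],
}
kept = abundant_phylotypes(healthy, cirrhosis)
```

Filtering on the median rather than the mean keeps a taxon from passing on the strength of one or two extreme samples.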
At the genus level, Bacteroides was the dominant phylotype in both groups, but was significantly decreased in the liver cirrhosis group. Of the remaining genera, Veillonella, Streptococcus, Clostridium and Prevotella were enriched in the liver cirrhosis group, while Eubacterium and Alistipes were dominant in the healthy controls (Fig. 1a, b). The most abundant species in both liver cirrhosis and the healthy control groups were primarily from the Bacteroides genus. Of the 20 species that increased the most in abundance in the liver cirrhosis group, four were Streptococcus spp. and six were Veillonella spp., suggesting that the two genera might play an important role in liver cirrhosis. Of the species that decreased the most in abundance in the liver cirrhosis group, 12 were Bacteroidetes and seven were Firmicutes, specifically from the order Clostridiales.
Gut microbial species associated with cirrhosis
Our investigation included two phases. The first was discovery, where we compared 98 patients with liver cirrhosis and 83 healthy controls. The second was validation, with an additional 25 patients and 31 controls. In the discovery phase, a Wilcoxon rank-sum test corrected for multiple testing by the Benjamini and Hochberg method was used to identify differentially abundant genes in patients and controls. At a stringent threshold (false discovery rate (FDR) < 0.0001), 75,245 genes were found: 49,830 were more abundant in the patients and 25,415 in the controls (Methods). Patients and controls could be clearly separated by principal component analysis based on the 75,245 genes; this was confirmed with the validation samples (Supplementary Table 3 and Extended Data Fig. 1c).
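The multiple-testing correction named above, the Benjamini and Hochberg method, can be sketched in a few lines of Python. This shows only the adjustment step applied to a list of P values that are assumed to come from per-gene rank-sum tests; the function name `benjamini_hochberg` and the example P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values (q values), in input order.

    The i-th smallest P value is scaled by m / i, and a running minimum taken
    from the largest P value downwards keeps the adjusted values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values for four genes
qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
# Genes passing a chosen FDR threshold (0.03 here, purely for illustration)
significant = [i for i, q in enumerate(qvalues) if q < 0.03]
```

A gene is then called differentially abundant when its q value falls below the chosen FDR threshold.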
To explore further the microbial genes associated with liver cirrhosis we grouped them into clusters, denoted metagenomic species (MGS) here, on the basis of their abundance profiles [27,30]. Of the 66 MGS, 38 and 28 were enriched in healthy individuals and patients, respectively. The significantly different abundance distribution between healthy and liver cirrhosis subjects is shown in Fig. 2 and Supplementary Table 4. A majority (82%) were also differentially abundant in the validation cohort (q < 0.05), in spite of the reduced statistical power due to the smaller cohort size.
Composition of bacterial communities varies considerably as a function of the overall gene richness [27,28] and the loss of richness is associated with obesity and IBD [27,28,31]. A large majority of the 38 MGS enriched in the healthy individuals (33, 86.8%) was correlated with the richness at q < 10^-3 in the Chinese cohort; 26 of these (78.8%) were similarly correlated in a Danish cohort (Extended Data Fig. 3). These observations indicate that gut communities of bacteria in healthy individuals across continents may be largely similar. Furthermore, gene richness was much lower in patients with liver cirrhosis than in healthy individuals (on average 389,000 and 497,000 genes, respectively; Supplementary Table 5 and Extended Data Fig. 4, top left). Interestingly, among the species enriched in healthy Chinese were Faecalibacterium prausnitzii, which has anti-inflammatory properties and was found in a ‘healthy’ gene-rich microbiome [27,28], and Coprococcus comes, which might contribute to gut health through butyrate production. A similar butyrate production role may be played by three Lachnospiraceae and five Ruminococcaceae enriched in healthy individuals. A lower abundance of these species in patients with liver cirrhosis indicates that these individuals have a less healthy gut microbiome.
Most interestingly, a high proportion of MGS enriched in patients belong to taxa such as Veillonella (n = 8) or Streptococcus (n = 6), known to include species of oral origin (Supplementary Table 4). However, the small intestine also harbours such species [32] and small-intestinal bacterial overgrowth is frequently found in patients with liver cirrhosis [33]. To explore the origin of the patient-enriched species, we used information from the HOMD [34] and GOLD [35] databases about the origin of the closely related sequenced isolates. We also constructed a catalogue of 114 publicly available genomes for Streptococcus, Fusobacterium, Lactobacillus, Veillonella and Megasphaera strains, originating mostly from mouth or gut (57 or 28, respectively; Supplementary Table 6) and used it for blastN and blastP analysis (Methods). Thirteen of the species were closest to an oral isolate whereas only six were closest to the gut isolates, a single species being from the ileum (Supplementary Table 4 and Extended Data Fig. 4, top right). Comparison with the three ileum metagenomes failed to reveal identity above that detected by comparison with the sequenced genomes (Methods). We conclude that oral commensals invade the gut in patients with liver cirrhosis. Possibly, an altered bile production in cirrhosis renders the gut more permissive and/or accessible to ‘foreign’ bacteria, as bile resistance may be required for survival in the human gut [36,37]. As patient-enriched MGS include pathogens such as Campylobacter and Haemophilus parainfluenzae, these also might use the oral route to invade the gut, possibly via contaminated food. The invasion by species foreign to the niche may occur not only in the colon but also in the ileum, and contribute to the small-intestinal bacterial overgrowth associated with liver cirrhosis. Among the patient-enriched species were Streptococcus anginosus, Veillonella atypica, Veillonella dispar, Veillonella sp. oral taxon and Clostridium perfringens, which have been reported to cause opportunistic infections [38–40].
Figure 1 | Differentially abundant phyla in patients (n = 98) and healthy individuals (n = 83). The phylotypes decreased (a) and increased (b) in patients with liver cirrhosis at the phylum, genus and species levels. Blue and red represent healthy controls and patients with liver cirrhosis, respectively. Only the 20 most abundant species in each group are shown for clarity. The phylotypes with median relative abundances greater than 0.01% of total abundance in either the healthy control group or the liver cirrhosis group are included (FDR < 0.01, Wilcoxon rank-sum test corrected by the Benjamini and Hochberg method). The boxes represent the interquartile range (IQR), from the first and third quartiles, and the inside line represents the median. The whiskers denote the lowest and highest values within 1.5 IQR from the first and third quartiles. The circles represent outliers beyond the whiskers. The notches show the 95% confidence interval for the medians. If the notches of two boxes do not overlap, it gives evidence of a significant difference between the medians.
[Figure 1 panels: box plots of abundance for the healthy and liver cirrhosis groups.
a, Phylum: Bacteroidetes. Genus: Bacteroides, Eubacterium, Alistipes, Faecalibacterium, Roseburia, Parabacteroides, Odoribacter, Ruminococcus, Dorea, Bilophila, Coprococcus, Null(Lachnospiraceae bacterium 6 1 63FAA), Tannerella, Subdoligranulum, Null(Lachnospiraceae bacterium 5 1 63FAA), Null([Bacteroides] pectinophilus ATCC 43243), Holdemania, Null(Lachnospiraceae bacterium 3 1 46FAA), Phascolarctobacterium, Null(Ruminococcaceae bacterium D16). Species: Faecalibacterium prausnitzii, Bacteroides sp. D20, Alistipes putredinis, Bacteroides sp. 4 1 36, Bacteroides uniformis, Eubacterium eligens, Eubacterium rectale, Parabacteroides merdae, Bacteroides finegoldii, Odoribacter splanchnicus, Roseburia intestinalis, Bacteroides eggerthii, Ruminococcus sp. 5 1 39BFAA, Bacteroides sp. 1 1 30, Roseburia hominis, Bacteroides sp. 9 1 42FAA, Bilophila wadsworthia, Eubacterium hallii, Alistipes finegoldii, Parabacteroides distasonis.
b, Phylum: Proteobacteria, Fusobacteria. Genus: Veillonella, Streptococcus, Prevotella, Haemophilus, Lactobacillus, NULL(Lachnospiraceae bacterium 2 1 58FAA), Fusobacterium, Megasphaera. Species: Streptococcus salivarius, Veillonella parvula, Veillonella atypica, Ruminococcus gnavus, Haemophilus parainfluenzae, Veillonella sp. 6 1 27, Veillonella sp. 3 1 44, Lachnospiraceae bacterium 2 1 58FAA, Veillonella dispar, Streptococcus parasanguinis, Veillonella sp. oral taxon 158, Lactobacillus salivarius, Streptococcus vestibularis, Streptococcus anginosus.]
To analyse the relations between the liver-cirrhosis-associated MGS, we generated networks based on co-abundance, for healthy individuals and patients with liver cirrhosis (Fig. 2b). A striking feature is that taxonomically related species tend to cluster, as reported previously [29]. These observations indicate that the gut environment becomes permissive for the development and maintenance of the related taxa in many individuals. Obviously, taxonomically unrelated species can also thrive in such environments, as observed with Campylobacter concisus, H. parainfluenzae or Fusobacterium, which tend to be associated with Veillonella in patients. The overall abundance of species enriched in patients reached high levels, exceeding 5% in over a quarter of patients and approaching the extreme of 40%, whereas it was very low in healthy individuals (Extended Data Fig. 4, bottom). Interestingly, the severity of the disease was positively correlated with the abundance of a number of MGS enriched in patients and negatively correlated with those of the MGS enriched in controls (and therefore under-represented in patients; Fig. 2a). The disease status of the patients with the highest load of these bacteria was significantly worse than that of the patients with the lowest load (Fig. 2b, top). Such a ‘dose response’ is consistent with an active role of the enriched species in liver cirrhosis.
Figure 2 | Differentially abundant MGS in patients (n = 123) and healthy individuals (n = 114). a, Abundance of 50 ‘tracer’ genes for each species in the discovery (n_patients = 98, n_healthy = 83) and validation cohorts (n_patients = 25, n_healthy = 31); oral species are highlighted in red. Genes are in rows, abundance is indicated by colour gradient (white, not detected; red, most abundant); the enrichment significance is shown (q indicates the Mann–Whitney P values corrected by the Benjamini and Hochberg method). Individuals are shown in columns, ordered by increasing abundance of patient-enriched species. Correlation of the species abundance and patients’ clinical parameters in the discovery cohort are indicated in colour code (red and blue for positive and negative correlations; intensity reflects the level of correlation). MELD, model for end-stage liver disease; CTP, Child–Turcotte–Pugh score; TB, total bilirubin; PT, prothrombin time test; INR, international normalized ratio describing coagulation of the blood in patients with liver cirrhosis; Crea, creatinine level; Alb, albumin level. b, Top, clinical parameters of patients for the lowest and highest patient-enriched species abundance (LPA and HPA, respectively; n = 24 for each). P values indicate the significance of the difference by Mann–Whitney U-test except MELD (Student’s t-test). Middle and bottom, abundance-based species correlation network enriched in patients with liver cirrhosis (n = 25) and healthy individuals (n = 33), respectively. Two nodes are linked if the pooled variance z-test shows an FDR < 10^-9 when accounting for the compositionality effect (see Methods). The edge width is proportional to the correlation strength. The node size is proportional to the mean abundance in the respective population. Nodes with the same colour are classified in the same phylogenetic order level.
[Figure 2 panels.
a, Heat maps for the discovery and validation cohorts (healthy, liver cirrhosis), with q values and correlations to MELD, CTP, TB, PT, INR, Crea and Alb.
b, Top, LPA versus HPA: MELD, P < 10^-5; CTP, P < 2 × 10^-4; TB, P < 2 × 10^-4; PT, P < 0.02; INR, P < 0.06.
Patient-enriched MGS shown (n = 25): L_44 Fusobacterium; L_32 S. oralis; L_12 S. anginosus; L_15 S. parasanguinis; L_24 Streptococcus sp. 2_1_36FAA; L_14 S. vestibularis; L_42 Veillonella; L_18 C. concisus; L_20 A. segnis; L_10 M. micronuciformis; L_55 Veillonella; L_4 V. atypica; L_19 V. dispar; L_6 S. salivarius; L_2 B. dentium; L_7 V. parvula; L_59 Veillonella; L_17 Veillonella sp._oral_taxon_158; L_39 Veillonella; L_8 P. buccae; L_9 H. parainfluenzae; L_1 L. mucosae; L_11 L. fermentum; L_3 L. salivarius; L_40 Lactobacillus.
Healthy-enriched MGS shown (n = 33): H_33 Ruminococcaceae; H_37 Ruminococcaceae; H_24 Bacteroidales; H_16 Oscillibacter; H_9 Clostridiales; H_12 Clostridiales; H_3 Clostridiales; H_2 Clostridiales; H_11 Clostridiales; H_40 Clostridiales; H_29 Ruminococcaceae; H_8 Clostridium; H_10 Eubacterium; H_43 Eggerthella; H_18 Eubacterium; H_22 Eubacterium; H_15 Clostridiales; H_7 Clostridiales; H_5 Alistipes; H_28 Bacteroidales; H_21 Bacteroidales; H_26 Porphyromonadaceae; H_30 Ruminococcaceae; H_6 Subdoligranulum; H_32 C. comes; H_42 A. indistinctus; H_25 NA; H_23 Lachnospiraceae; H_34 Lachnospiraceae; H_14 Lachnospiraceae; H_17 Ruminococcaceae; H_20 F. prausnitzii; H_36 Parabacteroides.]
Microbial functions enriched in liver cirrhosis
To investigate the functional role of the gut microbiota in liver cirrhosis, we identified 4,801 KEGG (Kyoto Encyclopedia of Genes and Genomes database) orthologues and 13,970 eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups database) orthologues associated with the disease (Supplementary Tables 7 and 8). The most abundant KEGG orthologues in patients and controls were enzyme families. The most enriched orthologues in patients were membrane transport, similar to findings for IBDs [19,20], obesity [41] and T2D [22]. In contrast, the most prevalent markers among the controls included those involved in carbohydrate metabolism, amino-acid metabolism, energy metabolism, signal transduction and the metabolism of cofactors and vitamins (Extended Data Fig. 5). At the module or pathway level, the liver-cirrhosis-associated markers included assimilation or dissimilation of nitrate to or from ammonia, denitrification, GABA (γ-aminobutyric acid) biosynthesis, GABA shunt, haem biosynthesis, phosphotransferase systems and some types of membrane transport, such as amino-acid transport. The control-enriched modules included histidine metabolism, ornithine biosynthesis, creatine pathway, carbohydrate metabolism, repair systems and glycosaminoglycan metabolism (Supplementary Table 9).
The enrichment of the modules for ammonia production in patients suggests a potential role of gut microbiota in hepatic encephalopathy, a complication related to liver cirrhosis that is characterized by hyperammonemia. Overproduction of ammonia by gut bacteria might contribute to increased levels of ammonia in blood. Manganese-related transport system modules enriched in patients possibly contribute to the changes in concentrations of manganese. The accumulation of manganese within the basal ganglia in patients with end-stage liver disease may have a role in the pathogenesis of chronic hepatic encephalopathy [42], a main complication of liver cirrhosis. The hydrodynamic venous shunt and liver failure could promote this accumulation, which, in turn, causes metabolic disorders of the nerve cell enzymes, affects transmission function of neural synapses and eventually leads to hepatic encephalopathy [40]. Finally, the modules for GABA biosynthesis were enriched in the patients. The GABA neurotransmitter system is involved in the pathogenesis of hepatic encephalopathy in humans [43]. Because of the hydrodynamic venous shunt and liver failure, GABA levels in the blood are increased [44], and could go through the blood–brain barrier to activate GABA receptor and cause hepatic encephalopathy. Microbiome modulation, aiming at manganese elimination and lowering of GABA levels in the gut, might provide a new therapeutic option for the treatment of hepatic encephalopathy.
Microbial dysbiosis in chronic diseases
It is unclear whether a gut microbial dysbiosis in type 2 diabetes (T2D) [22], IBD [41] and liver cirrhosis [13] is similar or unique for each disease. We compared the differences between the gut microbiota from patients with liver cirrhosis, T2D and IBD, and organized the disease-associated gene, KEGG orthologue group and eggNOG orthologue group markers into patient- and control-enriched groups. We then identified markers common to different disease pairs (T2D and liver cirrhosis, liver cirrhosis and IBD, and IBD and T2D) and to the three diseases (Supplementary Table 10). Different diseases displayed a relatively unique profile, even if some markers were shared (Extended Data Fig. 6a, b). Most liver-cirrhosis-enriched markers had low P values (Extended Data Fig. 6c), implying that patients with liver cirrhosis had more severe dysbiosis than patients with T2D. Functional differences between liver cirrhosis and T2D were also detected at the pathway level, even if there was a significant increase in membrane transport markers in both (Extended Data Figs 7 and 8). Most functional markers in both diseases were from categories of carbohydrate metabolism, metabolism of cofactors and vitamins, amino-acid metabolism and signal transduction. In contrast, most cell motility markers in the KEGG orthologue group were enriched in liver cirrhosis or T2D but not both, possibly indicating a unique role in each disease (Extended Data Fig. 8a, b). However, similar cell motility markers and pathways in the KEGG orthologue group were enriched both in liver cirrhosis and in T2D controls, suggesting a possible role in health (Extended Data Figs 8c, d and 9a, b).
Gene markers that identify patients with liver cirrhosis
We used a pattern recognition technique to identify patients by gut microbiota information in the discovery cohort (n = 181). For this we selected 46,000 genes, half enriched in patients and half in controls (Supplementary Table 11). From this set we selected 15 optimal gene markers by a minimum redundancy–maximum relevance (mRMR) method combined with an incremental feature search, which showed the highest value of Matthews correlation coefficient (Extended Data Fig. 9c). A support vector machine discriminator was constructed using the same samples and 15 gene markers (Supplementary Table 12), with the training and leave-one-out cross-validation AUC (area under the receiver operating characteristic curve) achieving 0.918 (confidence interval: 0.881–0.955) (Fig. 3b) and 0.838, respectively. The validation cohort of 31 healthy controls and 25 patients with liver cirrhosis showed an AUC value of 0.836 (95% confidence interval 0.730–0.943) (Fig. 3c) for these samples, confirming that the gut microbiota information could be applied to identify patients accurately.
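Two of the performance measures named above, the AUC and the Matthews correlation coefficient, can be computed directly from classifier output. The Python sketch below is a generic illustration and not the authors' pipeline: it does not implement mRMR or the support vector machine, and the function names, labels and scores are hypothetical.

```python
import math


def roc_auc(labels, scores):
    """AUC as the fraction of (patient, control) pairs in which the patient
    receives the higher score; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def matthews_corrcoef(labels, predictions):
    """Matthews correlation coefficient for binary labels (1 = patient, 0 = control)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is 0 when any marginal count is zero
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical classifier scores for three patients and three controls
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
predictions = [1 if s > 0.45 else 0 for s in scores]  # arbitrary cut-off
mcc = matthews_corrcoef(labels, predictions)
```

The AUC is independent of any cut-off, whereas the Matthews correlation coefficient depends on the threshold used to turn scores into calls.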
To facilitate the clinical application of the 15 optimal gene markers, we propose a patient discrimination index (PDI). The high correlation coefficient value between the ratio of patients in our cohort and the PDI (Fig. 3a and Supplementary Table 13) indicates that the PDI could be used to identify patients with liver cirrhosis. The discriminatory power of the PDI was then validated using an independent group (Fig. 3d). The average PDI between the control and the patient groups was significantly different (P < 8.18 × 10^-5, Wilcoxon rank-sum test), confirming the potential use of gut microbiota information for identifying patients with liver cirrhosis.
Discussion
To study gut microbiota in liver cirrhosis we first established a novel gut gene catalogue (liver cirrhosis catalogue), including 98 patients with liver cirrhosis and 83 healthy control individuals. Comparison with the previously established MetaHIT and T2D [22] gene catalogues indicated a common core of approximately 800,000 genes and a considerable proportion of catalogue-specific genes (37.01% of MetaHIT, 36.59% of T2D and 18.02% of liver cirrhosis), indicating that the current gene sets are still limited and should be completed by inclusion of more individuals. Interestingly, although the T2D and liver cirrhosis gene sets are both derived from Chinese populations, the number of unique genes in each gene set was large. This might be due to the difference in disease profiles and to the different genotypes, body mass indices, age [45] and dietary habits [46] (Supplementary Table 14 and Extended Data Fig. 10). Nevertheless, there was no significant difference in the abundance of main phyla (P > 0.01); of the top 30 most abundant genera and species, 28 and 26, respectively, were the same in both studies, and there were no significant differences in abundance for most of them. Furthermore, the top four species were exactly the same. These results, and the similarity of controls with the healthy Danish population, point towards overall similarity of the microbiota in healthy individuals.
Use of the liver cirrhosis gene catalogue, in conjunction with the quantitative metagenomics approach, revealed a major change of the gut microbiota in the patients with liver cirrhosis, mainly because of a massive invasion of the gut by oral bacterial species. Correlation of the severity of the disease with the abundance of the invading species suggests that they may play an active role in the pathology. This was not noted in a previous study, where the 16S-based approach probably lacked the required species-level resolution, even if similar trends in taxonomy change between the liver cirrhosis group and the healthy controls at the phylum, class and order levels were observed [13]. Some of the MGS depleted in patients were negatively associated with the severity of the disease (Fig. 2). This opens avenues to the development of novel probiotics, which might help combat the aggravation of liver cirrhosis. More generally, modulation of microbiota to correct the major dysbioses we report might open new avenues to treatment of liver cirrhosis.
A combination of 15 microbial genes discriminates patients with liver
cirrhosis from healthy individuals, with a high specificity. This could
lead to a new way of monitoring and preventing liver cirrhosis. None of
the 15 markers found in the liver cirrhosis study overlapped with the 50
markers found in the T2D study (ref. 22), indicating that diagnosis of different
diseases with microbiota-targeted biomarkers may be a powerful tool
for disease detection.
Online Content Methods, along with any additional Extended Data display items
and Source Data, are available in the online version of the paper; references unique
to these sections appear only in the online paper.
Received 7 April 2013; accepted 9 June 2014.
Published online 23 July 2014.
1. Fouts, D. E., Torralba, M., Nelson, K. E., Brenner, D. A. & Schnabl, B. Bacterial
translocation and changes in the intestinal microbiome in mouse models of liver
disease. J. Hepatol. 56, 1283–1292 (2012).
2. Cesaro, C. et al. Gut microbiota and probiotics in chronic liver diseases. Digest. Liver
Dis. 43, 431–438 (2011).
3. Wiest, R. & Garcia-Tsao, G. Bacterial translocation (BT) in cirrhosis. Hepatology 41,
422–433 (2005).
4. Nolan, J. P. The role of intestinal endotoxin in liver injury: a long and evolving
history. Hepatology 52, 1829–1835 (2010).
5. Gill, S. R. et al. Metagenomic analysis of the human distal gut microbiome. Science
312, 1355–1359 (2006).
6. Garcia-Tsao, G. & Wiest, R. Gut microflora in the pathogenesis of the complications
of cirrhosis. Best Pract. Res. Clin. Gastroenterol. 18, 353–372 (2004).
7. Wiest, R., Krag, A. & Gerbes, A. Spontaneous bacterial peritonitis: recent guidelines
and beyond. Gut 61, 297–310 (2012).
8. Bass, N. M. et al. Rifaximin treatment in hepatic encephalopathy. N. Engl. J. Med.
362, 1071–1081 (2010).
9. Benten, D. & Wiest, R. Gut microbiome and intestinal barrier failure – the “Achilles
heel” in hepatology? J. Hepatol. 56, 1221–1223 (2012).
10. Yan, A. W. et al. Enteric dysbiosis associated with a mouse model of alcoholic liver
disease. Hepatology 53, 96–105 (2011).
11. De Filippo, C. et al. Impact of diet in shaping gut microbiota revealed by a
comparative study in children from Europe and rural Africa. Proc. Natl Acad. Sci.
USA 107, 14691–14696 (2010).
12. Cho, I. & Blaser, M. J. The human microbiome: at the interface of health and
disease. Nature Rev. Genet. 13, 260–270 (2012).
13. Chen, Y. et al. Characterization of fecal microbial communities in patients with liver
cirrhosis. Hepatology 54, 562–572 (2011).
14. Nelson, K. E. et al. A catalog of reference genomes from the human microbiome.
Science 328, 994–999 (2010).
15. Ley, R. E., Turnbaugh, P. J., Klein, S. & Gordon, J. I. Microbial ecology: human gut
microbes associated with obesity. Nature 444, 1022–1023 (2006).
16. Turnbaugh, P. J. et al. An obesity-associated gut microbiome with increased
capacity for energy harvest. Nature 444, 1027–1031 (2006).
17. Turnbaugh, P. J. et al. A core gut microbiome in obese and lean twins. Nature 457,
480–484 (2009).
18. Ley, R. E. et al. Obesity alters gut microbial ecology. Proc. Natl Acad. Sci. USA 102,
11070–11075 (2005).
19. Lepage, P. et al. Twin study indicates loss of interaction between microbiota and
mucosa of patients with ulcerative colitis. Gastroenterology 141, 227–236 (2011).
20. Garrett, W. S. et al. Enterobacteriaceae act in concert with the gut microbiota to
induce spontaneous and maternally transmitted colitis. Cell Host Microbe 8,
292–300 (2010).
21. Wen, L. et al. Innate immunity and intestinal microbiota in the development of type
1 diabetes. Nature 455, 1109–1113 (2008).
22. Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2
diabetes. Nature 490, 55–60 (2012).
23. Vijay-Kumar, M. et al. Metabolic syndrome and altered gut microbiota in mice
lacking Toll-like receptor 5. Science 328, 228–231 (2010).
24. Karlsson, F. H. et al. Symptomatic atherosclerosis is associated with an altered gut
metagenome. Nature Commun. 3, 1245 (2012).
25. The Human Microbiome Project Consortium. A framework for human microbiome
research. Nature 486, 215–221 (2012).
26. The Human Microbiome Project Consortium. Structure, function and diversity of
the healthy human microbiome. Nature 486, 207–214 (2012).
27. Le Chatelier, E. et al. Richness of human gut microbiome correlates with metabolic
markers. Nature 500, 541–546 (2013).
[Figure 3 panels: a, histogram of PDI (x axis) against number of individuals (y axis), with an inset showing the percentage of patients per PDI bin; b, c, ROC curves plotting sensitivity against 1 − specificity, annotated AUC = 91.8% (confidence interval 88.1–95.5%) and AUC = 83.6% (confidence interval 73.0–94.3%); d, PDI distributions with medians for controls and cases.]
Figure 3 | PDI on the basis of gut microbial biomarkers. a, A PDI was
calculated for each individual from 15 gene markers selected using the mRMR
approach to evaluate the risk of liver cirrhosis. The filled blue circles show the
distribution of liver cirrhosis indices for all individuals (bins of 0.5 PDI units
were used; values less than −1.5 and greater than 3.5 were grouped). Inset, the
proportion of patients with liver cirrhosis in the corresponding bins. b, c, The
AUC is shown for the training (b) and validation (c) samples. d, The liver
cirrhosis PDI was computed for an additional 25 liver cirrhosis samples and 31
healthy control samples. The box depicts the interquartile range between the
first and third quartiles (25th and 75th percentiles, respectively); the line inside
denotes the median. Inset, the PDI without the outliers.
28. Cotillard, A. et al. Dietary intervention impact on gut microbial gene richness.
Nature 500, 585–588 (2013).
29. Qin, J. et al. A human gut microbial gene catalogue established by metagenomic
sequencing. Nature 464, 59–65 (2010).
30. Nielsen, H. B. et al. Identification and assembly of genomes and genetic elements
in complex metagenomic samples without using reference genomes. Nature
Biotechnol. http://dx.doi.org/10.1038/nbt.2939 (2014).
31. Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by
differential coverage binning of multiple metagenomes. Nature Biotechnol. 31,
533–538 (2013).
32. Zoetendal, E. G. et al. The human small intestinal microbiota is driven by rapid
uptake and conversion of simple carbohydrates. ISME J. 6, 1415–1426 (2012).
33. Bauer, T. M. et al. Small intestinal bacterial overgrowth in human cirrhosis is
associated with systemic endotoxemia. Am. J. Gastroenterol. 97, 2364–2370
(2002).
34. Chen, T. et al. The Human Oral Microbiome Database: a web accessible resource
for investigating oral microbe taxonomic and genomic information. Database
2010, baq013 (2010).
35. Pagani, I. et al. The Genomes OnLine Database (GOLD) v.4: status of genomic and
metagenomic projects and their associated metadata. Nucleic Acids Res. 40,
D571–D579 (2012).
36. Saarela, M., Mogensen, G., Fonden, R., Matto, J. & Mattila-Sandholm, T. Probiotic
bacteria: safety, functional and technological properties. J. Biotechnol. 84,
197–215 (2000).
37. Merritt, M. E. & Donaldson, J. R. Effect of bile salts on the DNA and membrane
integrity of enteric bacteria. J. Med. Microbiol. 58, 1533–1541 (2009).
38. Marchandin, H. et al. Prosthetic joint infection due to Veillonella dispar. Eur. J. Clin.
Microbiol. Infect. Dis. 20, 340–342 (2001).
39. Hwang, J. J., Lau, Y. J., Hu, B. S., Shi, Z. Y. & Lin, Y. H. Haemophilus parainfluenzae and
Fusobacterium necrophorum liver abscess: a case report. J. Microbiol. Immunol.
Infect. 35, 65–67 (2002).
40. Xu, M. et al. Changes of fecal Bifidobacterium species in adult patients with hepatitis
B virus-induced chronic liver disease. Microb. Ecol. 63, 304–313 (2012).
41. Greenblum, S., Turnbaugh, P. J. & Borenstein, E. Metagenomic systems biology
of the human gut microbiome reveals topological shifts associated with
obesity and inflammatory bowel disease. Proc. Natl Acad. Sci. USA 109, 594–599
(2012).
42. Krieger, D. et al. Manganese and chronic hepatic encephalopathy. Lancet 346,
270–274 (1995).
43. Ferenci, P., Schafer, D. F., Kleinberger, G., Hoofnagle, J. H. & Jones, E. A. Serum levels
of gamma-aminobutyric-acid-like activity in acute and chronic hepatocellular
disease. Lancet ii, 811–814 (1983).
44. Minuk, G. Y., Winder, A., Burgess, E. D. & Sarjeant, E. J. Serum gamma-aminobutyric
acid (GABA) levels in patients with hepatic encephalopathy. Hepatogastroenterology
32, 171–174 (1985).
45. Yatsunenko, T. et al. Human gut microbiome viewed across age and geography.
Nature 486, 222–227 (2012).
46. Wu, G. D. et al. Linking long-term dietary patterns with gut microbial enterotypes.
Science 334, 105–108 (2011).
Supplementary Information is available in the online version of the paper.
Acknowledgements This work was supported by the National Program on Key Basic
Research Project (2013CB531401), the National Natural Science Foundation of China
(81301475 and 81330011), the Science Fund for Creative Research Groups of the
National Natural Science Foundation of China (81121002), the Technology Group
Project for Infectious Disease Control of Zhejiang Province (2009R50041) and the
Metagenopolis grant ANR-11-DPBS-0001. We thank Q. Cao, K. Su, J. Shao and
A. Ghozlane for help with data computation, and H. Zhang, H. Lu, Q. Bao, J. Ge, J. Jiang,
Z. Ren and M. Ye for assistance with sample collection. We are thankful to the MetaHIT
consortium for generating the gut gene set and the Human Microbiome Project for
generating the reference genomes from human gut microbes.
Author Contributions L.J.L., S.D.E., S.S.Z. and N.Q. designed the project. L.J.L., S.P.K. and
N.Q. managed the project. F.L.Y., N.Q., Y.F.C., J.G., G.R.Q., X.J.H. and B.W.Z. collected
samples and performed clinical study. J.G., Y.T.C. and W.X. performed DNA extraction
experiments. Y.J., L.J.W., J.W.Z. and S.J.N. performed library construction and
sequencing. L.J.L. and S.D.E. designed the analysis. N.Q., A.L., E.P., E.L.C., L.L., N.P., P.L.,
J.M.B., C.H.Y. and W.C.D. analysed the data. A.L. and N.Q. did the functional annotation
analyses. L.S., E.P., E.L.C. and A.L. analysed the statistics. N.Q., F.L.Y., L.S. and E.P. wrote
the paper. L.J.L. and S.D.E. revised the paper.
Author Information The raw Illumina read data for all samples have been deposited in
the European Bioinformatics Institute European Nucleotide Archive under accession
number ERP005860. Reprints and permissions information is available at
www.nature.com/reprints. The authors declare no competing financial interests.
Readers are welcome to comment on the online version of the paper. Correspondence
and requests for materials should be addressed to L.J.L. (ljli@zju.edu.cn),
S.S.Z. (zyzsss@zju.edu.cn) or S.D.E. (dusko.ehrlich@jouy.inra.fr).
METHODS
Patient information. Liver cirrhosis was diagnosed according to the international
guidelines by comprehensive consideration of liver biopsy, imaging examination,
clinical symptoms, physical signs, laboratory tests, medical history, progress notes
and cirrhosis-associated complications. Biopsy as the ‘gold standard’ for cirrhosis
diagnosis was used for 46 out of the 123 (37.4%) patients. As biopsy was
counter-indicated for patients with conditions such as refractory ascites and obvious
bleeding tendency, the remaining 77 (62.6%) were diagnosed using all other approaches
combined. To confirm diagnoses, we solicited outside expert opinions for each case.
Borderline or otherwise inconclusive cases were excluded from the study. After
discharge of the patient from the hospital, their case history was further reviewed
for medication history. Cases that progressed to hepatic carcinoma or those found
to suffer from other diseases such as hypertension and diabetes were excluded.
The control group included 114 healthy volunteers who visited the First Affiliated
Hospital of Zhejiang University in China for their annual physical examination.
The liver imaging and liver biochemistry results of all healthy controls were in the
normal range. Physical examination, routine examination of blood, urine and stools,
preoperative serological tests (including the detection of hepatitis B surface antigen,
hepatitis C virus antibody, Treponema pallidum antibody, human immunodeficiency
virus antibody), liver function, renal function, electrolyte, liver ultrasound,
electrocardiogram and chest X-ray results were checked in the healthy controls to
exclude any abnormal samples. Comprehensive clinical information for each enrolled
individual was recorded (Supplementary Table 1). Exclusion criteria for the control
group included hypertension, diabetes, obesity, metabolic syndrome, IBD,
non-alcoholic fatty liver disease, coeliac disease and cancer. Individuals who received
antibiotics and/or probiotics within 8 weeks before enrolment were also excluded.
All participants, or their legally authorized representatives, provided a written informed
consent upon enrolment. The study conformed to the ethical guidelines of the 1975
Declaration of Helsinki and was approved by the Institutional Review Board of the
First Affiliated Hospital of Zhejiang University.
Human faecal sample collection and DNA extraction. Each cirrhotic patient and
healthy individual provided a fresh stool sample that was delivered immediately
from our hospital to the laboratory in an ice bag using insulating polystyrene foam
containers. In the laboratory it was divided into five aliquots of 200 mg and
immediately stored at −80 °C. A frozen aliquot (200 mg) of each faecal sample was
processed by phenol trichloromethane DNA extraction (refs 16, 47) as previously
described. DNA concentration was measured by NanoDrop (Thermo Scientific) and its
molecular size was estimated by agarose gel electrophoresis.
DNA library construction and sequencing. DNA libraries were constructed according
to the manufacturer’s instructions (Illumina). The same workflows from Illumina
were used to perform cluster generation, template hybridization, isothermal
amplification, linearization, blocking, denaturing and hybridization of the sequencing
primers. We performed paired-end sequencing on 2 × 100 base pairs (bp) for all
libraries. The base-calling pipeline (Casava 1.8.2 with parameters ‘-use-bases-mask
y100n, I6n,Y100n, -mismatches1, -adaptor-sequence’) was used to process the raw
fluorescent images and call sequences. The same insert size inferred by Agilent 2100
was used for all libraries (ranging from 275 to 450).
Quality control of reads. Reads that mapped to human genome together with
their mated/paired reads were removed from each sample using BWA (ref. 48) with
parameter ‘-n 0.2’. Then quality control used the following criteria: (1) reads containing
more than 3 N bases were removed; (2) reads containing more than 50 bases with
low quality (Q2) were removed; (3) no more than 10 bases with low quality (Q2) or
assigned as N in the tail of reads were trimmed. Sequences that lost their mated reads
were considered as single reads and were used in the assembly procedure. Resulting
filtered reads were considered for the next step of the analysis.
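The three read-level criteria above can be sketched as a small filter. This is an illustration only, not the pipeline used in the study: the function name, the integer Phred-score input, the reading of "low quality (Q2)" as a score of 2 or below, and the order of operations (discard first, then trim) are all assumptions.

```python
def filter_and_trim(seq, quals, low_q=2, max_n=3, max_low=50, max_trim=10):
    """Apply the three read-level criteria to one read.

    seq:   base string
    quals: list of integer Phred scores, one per base (assumed input format)
    Returns (seq, quals) after tail trimming, or None if the read is discarded.
    """
    if seq.count("N") > max_n:                    # criterion (1): too many N bases
        return None
    if sum(q <= low_q for q in quals) > max_low:  # criterion (2): too many low-quality bases
        return None
    end = len(seq)
    # Criterion (3): trim at most `max_trim` low-quality or N bases from the tail.
    while (end > 0 and len(seq) - end < max_trim
           and (seq[end - 1] == "N" or quals[end - 1] <= low_q)):
        end -= 1
    return seq[:end], quals[:end]
```

A read whose mate is discarded would then be kept as a single read, as described above; that bookkeeping is left out of the sketch.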
De novo assembly of the Illumina short reads. Considering that k-mers with
very low frequencies might arise from sequencing errors, they were not used in
assembly by SOAPdenovo (ref. 49; version 1.05), which is based on de Bruijn graph
construction. SOAPdenovo (version 1.05) was used in Illumina short read assembly
with parameters ‘-d 1 -M 3’. Then we removed ambiguous bases from assembled
scaffolds (this could divide one scaffold into multiple ones) and discarded scaffolds
with lengths less than 500 bp. Finally we tested a series of k-mer values (from 31 to
59), then chose the one with the longest N50 value for the remaining scaffolds. For each
sample, we mapped clean data against scaffolds using SOAPalign version 2.21 (ref. 50)
with parameters ‘-u -2 -m 200’. Unused data from each sample were pooled and split
into four parts (considering memory limit). Unused reads were repeatedly assembled
with the same parameters but only one k-mer value, -K 55, was chosen.
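As an illustration of the k-mer selection step, the sketch below computes N50 from a list of scaffold lengths and picks the k-mer value whose scaffolds give the longest N50. The function names and the dictionary input are hypothetical; the 500 bp cut-off follows the text, and the standard N50 definition is assumed.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def best_kmer(scaffolds_by_k, min_length=500):
    """Pick the k whose scaffolds (>= min_length bp) have the longest N50.

    scaffolds_by_k: {k-mer value: list of scaffold lengths from that assembly}
    """
    def score(k):
        return n50([n for n in scaffolds_by_k[k] if n >= min_length])
    return max(scaffolds_by_k, key=score)
```

For example, `best_kmer({31: [600, 700, 800], 55: [500, 3000, 400], 59: [1000, 1000]})` selects 55, because the 400 bp scaffold is discarded and the remaining scaffolds have an N50 of 3,000 bp.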
Construction of non-redundant human gut gene set. Total DNA was extracted
from the faecal samples of 98 Chinese patients with liver cirrhosis and 83 healthy
Chinese controls (Supplementary Table 1) and sequenced using an Illumina HiSeq
2000 (Illumina). This produced an average of 4.74 gigabases (Gb) of high-quality
sequence for each sample, providing a total of 858 Gb of sequence data (Supplementary
Table 15). The reads were assembled into contigs for all samples using the assembly
software SOAPdenovo (ref. 49). Unassembled reads from 166 samples were pooled and
the de novo assembly process was performed again for these reads (Extended Data
Fig. 9d). Finally, 61.68% of the total reads were used to generate 4.4 million contigs
without ambiguous bases (minimum length of 500 bp). These contigs had a total
length of 11.1 Gb, an average N50 length of 8,644 bp and ranged from 1,673 to
48,822 bp (Supplementary Table 15). To predict microbial genes for each of the 181
samples, we applied the methodology used in the MetaHIT human gut gene
catalogue study (ref. 29). The non-redundant human gut gene set was built by pairwise
comparison of all the predicted ORFs using blat and the redundant ORFs were removed
using a criterion of 95% identity over 90% of the shorter ORF length, which is
consistent with the criterion used for the non-redundant European human gut gene
set (ref. 29) and T2D study (ref. 22).
MetaGeneMark (ref. 51; prokaryotic GeneMark.hmm version 2.8) was used to predict
ORFs in scaffolds without ambiguous bases. The program predicted 13,371,697
ORFs using a 100 bp cut-off for prediction (Supplementary Table 15). The total length
of the predicted ORFs was 9,495,923,532 bp, representing 90.28% of the total length of
the contigs. Among the ORFs, 1,047,885 (54.6%) were complete genes, while 869,808
(45.4%) were incomplete. A non-redundant ‘liver cirrhosis gene set’ was established
by removing redundant ORFs, defined as those sharing 95% identity over 90% of the
shorter ORF length in pairwise alignments. The final non-redundant liver cirrhosis
gut gene set contained 2,688,468 ORFs, with an average length of 750 bp, and 42%
of reads could be aligned to the gene catalogue.
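The redundancy criterion (95% identity over 90% of the shorter ORF) can be written as a simple predicate on one pairwise alignment. This is a minimal sketch: the function and parameter names are hypothetical, treating the thresholds as inclusive is an assumption, and the real workflow relied on blat output rather than on precomputed numbers like these.

```python
def is_redundant(identity_pct, aligned_length, length_a, length_b,
                 min_identity=95.0, min_coverage=0.90):
    """True if two ORFs count as redundant under the identity/coverage rule.

    identity_pct:   percent identity of the pairwise alignment
    aligned_length: number of aligned bases
    length_a/b:     lengths of the two ORFs in bp
    """
    shorter = min(length_a, length_b)
    covers_shorter = aligned_length >= min_coverage * shorter
    return identity_pct >= min_identity and covers_shorter
```

When a pair is redundant, one member of the pair is dropped from the gene set.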
Then genes from the liver cirrhosis, T2D and MetaHIT catalogues were merged
to create a non-redundant gene set for subsequent analyses. We checked the gaps
and frames in the blat results; if there were gaps or the frames were different in the
alignment result of two ORFs, the shorter one would not be removed as a
redundancy. We used MetaGeneMark to predict genes in assembled contigs originally from
MetaHIT and T2D and merged these three gene sets into a single one with the above
method.
Organism abundance profiling. SOAPalign 2.21 was used to align paired-end clean
reads against reference genomes with parameters ‘-r 2 -m 200 -x 1000’. Reads with
alignments on the same reference genomes could be assigned into two types, as
follows. (1) Unique reads (U): reads having alignments with only one genome. These
reads were denoted as unique reads. (2) Multiple reads (M): reads having alignments
with more than one genome. If these genomes came from one species, we denoted
these reads as unique reads. If they were from more than one species, we denoted
these reads as multiple reads.
For species S, if its abundance is Ab(S), and it might have alignments with U
unique reads and M multiple reads, the computation is
$$\mathrm{Ab}(S) = \mathrm{Ab}(U) + \mathrm{Ab}(M)$$
$$\mathrm{Ab}(U) = U / l$$
$$\mathrm{Ab}(M) = \Big( \sum_{i=1}^{M} \mathrm{Co}\,\{M\} \Big) / l$$
Ab(U) and Ab(M) are the abundances of unique and multiple reads, respectively, and l
is the length of the corresponding genome. For each multiple read, there is a
species-specific coefficient Co; let us suppose one read in {M} has alignments with
N different species, then Co was calculated as follows:
$$\mathrm{Co} = U \Big/ \sum_{i=1}^{N} \mathrm{Ab}(U)$$
For these reads, we add the unique abundances of the N species as the denominator. Before
we calculate the abundance of species S, we calculate Ab(U) for all species as
constants; if Ab(U) of species S is 0, then Co will also be 0, and consequently the
abundance of species S is 0. Species abundances were summed to obtain the genus-level
profile table. Species that do not have a genus were each denoted as an unclassified
genus.
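The abundance computation above can be illustrated with a short sketch. It is one reading of the formulas rather than the authors' code: Co is taken here as the species' share of unique-read abundance among the N species hit by a multiple read, which is an assumption about the normalization, and the function name and input structures are hypothetical.

```python
from collections import defaultdict


def species_abundance(unique_counts, multi_reads, genome_length):
    """Estimate per-species abundance from unique and multiple reads.

    unique_counts: {species: number of reads hitting only that species}
    multi_reads:   list of sets, each the species hit by one multiple read
    genome_length: {species: reference genome length in bp}
    """
    # Ab(U) = U / l, computed once for every species and then held constant.
    ab_unique = {s: unique_counts.get(s, 0) / genome_length[s] for s in genome_length}

    # Sum of the coefficients Co over the multiple reads of each species.
    co_sum = defaultdict(float)
    for hits in multi_reads:
        denom = sum(ab_unique[s] for s in hits)
        if denom == 0:
            continue  # no unique evidence for any candidate species
        for s in hits:
            # Assumed reading of Co: share of unique abundance among the N species.
            co_sum[s] += ab_unique[s] / denom

    # Ab(S) = Ab(U) + Ab(M), with Ab(M) = (sum of Co) / l.
    return {s: ab_unique[s] + co_sum[s] / genome_length[s] for s in genome_length}
```

With this reading, a species without unique reads receives Co = 0 for every multiple read and therefore an abundance of 0, in line with the statement above.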
Gene abundance profiling. Reads were aligned against the gene set by using
SOAPalign (ref. 50) with parameters ‘-r -m 200 -x 1000’. We counted a gene’s abundance
if both paired-end reads could be aligned on the same gene. If only one of the
paired-end reads could be aligned on a gene, we aligned both reads against assembled
contigs to check whether the previously non-aligned read was in the non-translated
region or not. If true, both reads were validated for gene count; if not, both reads
were discarded.
When calculating the abundance of genes, we used the same strategy as for
the abundance profiling of the organisms. For a given gene G, its abundance is
Ab(G), and it might have alignments with U unique reads and M multiple reads,
as follows:
$$\mathrm{Ab}(G) = \mathrm{Ab}(U) + \mathrm{Ab}(M)$$
$$\mathrm{Ab}(U) = U / l$$
$$\mathrm{Ab}(M) = \Big( \sum_{i=1}^{M} \mathrm{Co}\,\{M\} \Big) / l$$
Ab(U) and Ab(M) are the abundances of unique and multiple reads, respectively,
and l is the length of gene G. For each multiple read, we calculate a specific coefficient
Co for this gene. Let us suppose one read in {M} has alignments with N
different genes, then Co was calculated as follows:
$$\mathrm{Co} = U \Big/ \sum_{i=1}^{N} \mathrm{Ab}(U)$$
For these reads, we add the unique abundances of the N genes as the denominator.
Population stratification. Population stratification involved in our metagenomic
data was corrected with the modified EIGENSTRAT method as follows. First,
singular value decomposition was carried out to obtain axes of variation, where the
number of significant axes was determined according to a Tracy–Widom test at a
significance level of P < 0.05; each axis was then replaced with the residuals of this
axis from a regression to disease state; the corrected data were finally obtained by
subtracting from the original data set the information associated with the residuals of each axis.
Gene count determination. Gene counts were computed essentially as described
in ref. 27. Briefly, data were downsized to adjust for sequencing depth and technical
variability by randomly selecting 6.2 million reads mapped to the merged gene
catalogue for each sample and then computing the mean number of genes over 30
random drawings (Supplementary Table 4). This was possible for all but two patients
with liver cirrhosis from the validation cohort (with insufficient number of mapped
reads), who were excluded from this analysis. The results are displayed in Extended
Data Fig. 4 top left.
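The downsizing step can be sketched as repeated random subsampling of mapped reads. This is an illustrative version only: the function name, the per-gene read-count input and the fixed seed are assumptions, and a production implementation would avoid expanding millions of reads into a list.

```python
import random


def downsized_gene_count(read_counts, depth, n_draws=30, seed=0):
    """Mean number of genes detected after drawing `depth` mapped reads at random.

    read_counts: list with the number of mapped reads per gene in one sample
    Returns None when the sample has fewer than `depth` mapped reads.
    """
    if sum(read_counts) < depth:
        return None  # insufficient mapped reads: sample excluded
    # Expand to one entry per mapped read, labelled with its gene index.
    reads = [gene for gene, n in enumerate(read_counts) for _ in range(n)]
    rng = random.Random(seed)
    detected = []
    for _ in range(n_draws):
        drawn = rng.sample(reads, depth)   # sampling without replacement
        detected.append(len(set(drawn)))   # genes seen at least once
    return sum(detected) / n_draws
```

In the study the depth was 6.2 million reads and the mean was taken over 30 drawings; the `None` return mirrors the exclusion of samples with too few mapped reads.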
Gene functional classification and orthologue group abundance profiling. Protein
sequences of the predicted genes were searched using National Center for
Biotechnology Information blastP against the eggNOG 3.0 database (ref. 52) and the KEGG gene
database (KEGG FTP release 21 January 2013) with parameters ‘-num_descriptions
100000, -evalue 1e-5’. Genes that had alignments with a bit score higher than 60
were assigned into one or more eggNOG or KEGG orthologue groups. We used the
methods introduced in ref. 29 to calculate abundance of proteins archived in the
eggNOG and KEGG databases. The abundance of each eggNOG or KEGG orthologue
group was calculated as the sum of the abundances of the proteins assigned to that
group, and profiles of eggNOG/KEGG orthologue groups were then generated.
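The aggregation of gene abundances into orthologue group abundances amounts to a grouped sum. The sketch below assumes, since the text allows a gene to belong to one or more groups, that a gene contributes its full abundance to each group it is assigned to; the function name and input dictionaries are hypothetical.

```python
from collections import defaultdict


def orthologue_group_abundance(gene_abundance, gene_to_groups):
    """Sum gene abundances per orthologue group for one sample.

    gene_abundance: {gene id: abundance}
    gene_to_groups: {gene id: iterable of eggNOG or KEGG group ids}
    """
    totals = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        # Genes without any assignment are skipped.
        for group in gene_to_groups.get(gene, ()):
            totals[group] += abundance
    return dict(totals)
```

Running this once per sample and stacking the results gives a group-by-sample profile table.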
Gene biomarker identification. Genes from the gene-profile matrix were used in
an association study aimed at identifying those that were differentially abundant
between the patient and the healthy control groups. Wilcoxon tests were employed
to compute the probability that the observed differences in frequency profiles between
the patient and the healthy control groups arose by chance alone. Benjamini and
Hochberg multiple test correction was applied to the P values. By performing a selection
only based on a threshold of P < 0.01, we found 541,582 genes. For specificity and
computational reasons, we used a very stringent significance threshold of FDR < 0.0001.
This process identified 75,245 genes that were differentially abundant between the groups
(49,830 were more abundant in the patients with liver cirrhosis and 25,415 in the
healthy control group). A similar P value and group enrichment method was applied
to the NOG/KEGG orthologue groups as well.
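The per-gene test and the multiple-testing correction can be sketched in plain Python as follows. The rank-sum P value uses the normal approximation without tie or continuity correction, which is a simplification (abundance data with many zeros would need a tie correction); the Benjamini and Hochberg step-up adjustment is standard. Function names are illustrative, not those of any package used in the study.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum P value (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                          # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values):
    """Benjamini and Hochberg adjusted P values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                 # from largest P value to smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes whose adjusted value falls below the chosen threshold (FDR < 0.0001 in the text) would be retained as differentially abundant.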
MGS. We followed the approach described in refs 27 and 30 to cluster genes from
the current study into MGS. Briefly, in a first step the pairwise Spearman’s
correlation coefficient (r) of different genes was computed, using gene abundances
across all individuals, and the genes correlated over a given threshold were
clustered (single-linkage clustering). To favour clustering specificity (that is, assigning
only the genes of the same species to the same cluster) we used a rather high
threshold (r > 0.7). To correct for the concomitant loss of sensitivity, we performed a
second step whereby the mean abundance signal of each cluster of at least 50 genes
was computed, using the 50 most connected genes of a cluster. The clusters that
had r > 0.85 were fused. This procedure was applied separately to the 49,830 genes
enriched in patients with liver cirrhosis and the 25,415 genes enriched in healthy
controls. Of the 25,415 ‘healthy’ genes, 21,423 fell into 43 clusters composed of
51–2,702 genes after the first clustering step, and 38 clusters of 51–2,970 genes after the
second step. Of the ‘liver cirrhosis’ genes, 31,386 out of 49,830 fell into 60 clusters of
51–3,000 genes after the first clustering step, and 28 clusters of 51–5,755 genes after
the second step.
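The first clustering step can be illustrated as follows: compute Spearman's correlation between gene abundance profiles and join any two genes correlated above the threshold, which is single-linkage clustering. This is a toy sketch with hypothetical names; it compares all gene pairs, so it would not scale to tens of thousands of genes, and it omits both the minimum cluster size of 50 genes and the second fusion step.

```python
import math
from itertools import combinations


def _ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0
    return cov / math.sqrt(vx * vy)


def single_linkage_clusters(profiles, threshold=0.7):
    """Group genes linked, directly or through other genes, by r > threshold.

    profiles: {gene id: list of abundances across all individuals}
    """
    parent = {g: g for g in profiles}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for a, b in combinations(profiles, 2):
        if spearman(profiles[a], profiles[b]) > threshold:
            parent[find(a)] = find(b)      # merge the two clusters

    clusters = {}
    for g in profiles:
        clusters.setdefault(find(g), set()).add(g)
    return list(clusters.values())
```

Single linkage means that two genes end up in the same cluster as soon as a chain of pairwise correlations above the threshold connects them, which is why a high threshold is used to keep clusters species-specific.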
To verify that the genes from a given cluster belonged to the same genome and
to annotate the MGS taxonomically, we performed blastN and blastP analyses using
a collection of 6,006 genomes (the available reference genomes from the National
Center for Biotechnology Information and the set of draft gastrointestinal
genomes from the Data Analysis and Coordination Center of the HMP and MetaHIT
(3 August 2012 version)). MGS were assigned to a given genome when more than
80% of its ‘tracer genes’ (ref. 27) matched the same genome using blastN, at a threshold of
95% identity over 90% of gene length. Six ‘healthy’ and 24 ‘liver cirrhosis’ MGS could
thus be assigned to the strain level (see Extended Data Fig. 9e, f and Supplementary
Table 4). The remaining MGS were annotated using blastP analysis and assigned to
a given taxonomical level from genus to superkingdom level if more than 80% of
their 50 tracer genes had the same level of assignment (ref. 27). All but one of the 36
remaining species could thus be assigned to a given genus, family or order (see
Supplementary Table 4). The quality of the clustering was thus validated by the homogenous
annotation of its marker genes, which also held true for all of the MGS genes (data
not shown). The abundance of the 66 MGS in each individual was computed using
the 50 tracer genes.
To explore the origin of the species-level annotated MGS, we constructed a
reference catalogue, grouping 114 publicly available Streptococcus (57), Fusobacterium
(26), Lactobacillus (16), Veillonella (12) and Megasphaera (3) genomes, mostly of
oral (50) or gut (28) isolates (Supplementary Table 6). The 16 liver cirrhosis MGS
that were assigned to the corresponding genera were compared with the genomes,
using blastN. A score (T) was computed for each MGS, taking into account (1) the
proportion of genes above 95% identity and 90% coverage (Q), (2) the average
identity (R), (3) the average coverage (S) and (4) T = Q × R × S.
A majority of the MGS enriched in patients with liver cirrhosis (15 out of 28)
were of oral origin by this criterion whereas six were from gut or faeces, including
a single species from the ileum (Supplementary Table 4 and Extended Data Fig. 4
top right). To explore further the origin of the liver-cirrhosis-enriched MGS, we
compared them by blastN with the genes from three available ileum metagenomes (ref. 31)
and failed to reveal identity beyond that found with sequenced genomes.
Only a small minority of the 38 MGS enriched in healthy individuals (15.8%)
could be assigned species phylogenetic information by comparison with sequenced
gut genomes using blastN (95% identity and 90% overlap; Supplementary Table 4).
Annotation to comparable taxonomic levels was observed for the 58 gut MGS
analysed in the context of gene richness in a Danish cohort (ref. 27) (Extended Data Fig. 9e, f),
reflecting a paucity of isolated and sequenced gut strains. Furthermore, it is striking
that all 38 MGS enriched in healthy Chinese were found in the Danish cohort
(Extended Data Fig. 3). In sharp contrast with the MGS enriched in healthy
subjects, an overwhelming majority of the MGS enriched in patients (24 out of 28)
could be assigned to a species. Such a difference has a vanishingly low probability
of being caused by chance alone (1.3 × 10⁻²¹ by a χ² test, Extended Data Fig. 9e, f)
and indicates a highly modified composition of gut microbes.
Co-occurrence network of MGS. The 66 marker profiles of the differentially
abundant MGS between patient and healthy individuals were correlated separately
for patients and for healthy individuals, essentially as described in ref. 53. For each
of the 2,112 possible edges [(66 × 66/2) − 66] we computed 1,000 permutations by
renormalizing the data after each step and computed Spearman’s correlation
coefficients to obtain the null distributions due to the compositionality effect (ref. 53). For
each of the edges we also computed the bootstrap distribution of the Spearman’s
correlation coefficients to have the confidence interval and the corresponding
variance. We next applied for each edge a z-test with the pooled variance from both
distributions and computed a significance P value. Multiple testing corrections were
applied to the P values using the Benjamini and Hochberg method, and only those
having FDR < 10⁻⁹ were used to construct the network. This FDR threshold
corresponds approximately to r > 0.4. The network reflects strong correlations that
are not spurious and that are not due to the compositionality effect. The resulting
network is displayed as Fig. 2.
Marker selection by mRMR. Patient discrimination gene markers (23,000 from
healthy controls and 23,000 from patients, selected as most discriminant by the
Wilcoxon rank-sum test upon adjustment for age, performed as described in ref. 54;
Supplementary Table 11) were selected with a two-step scheme (using the side
Channel Attack R package). All markers retained were first filtered by the mRMR
algorithm (ref. 55) (using the side Channel Attack R package), and the top 180 best ones
were selected for further analysis. Then, we performed an incremental search to
select the optimal subset of genes, named as markers. Concisely, genes were
sequentially added into the subset with a step of 5, the performance of which was evaluated
on the basis of linear discriminant analysis and leave-one-out cross-validation.
Here, Matthews correlation coefficient is a balanced measure taking into account
true and false positives and negatives; it is superior to accuracy or error rate when
the classes (healthy and diseased, etc.) are of very different sizes. Matthews
correlation coefficient (MCC) is defined as
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where TP, TN, FP and FN are true positive, true negative, false positive and false
negative, respectively. We finally selected a set of 15 gut microbial gene markers as
the optimal selection for patient discrimination.
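As an illustration of the scoring step, the Matthews correlation coefficient can be computed directly from the four confusion-matrix counts. The sketch below is not the authors' implementation; the function names are hypothetical, returning 0 when the denominator vanishes is a common convention assumed here rather than something stated in the text, and `candidate_sizes` only enumerates the subset sizes implied by adding 5 genes at a time to the 180 retained markers.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined MCC is reported as 0.
        return 0.0
    return numerator / denominator


def candidate_sizes(n_ranked=180, step=5):
    """Subset sizes visited by an incremental search with a fixed step."""
    return list(range(step, n_ranked + 1, step))


def best_subset_size(mcc_by_size):
    """Pick the subset size with the highest MCC (smallest size wins ties)."""
    return min(mcc_by_size, key=lambda size: (-mcc_by_size[size], size))


if __name__ == "__main__":
    print(matthews_corrcoef(10, 10, 0, 0))
    print(len(candidate_sizes()))
```

In an actual search, each value in `mcc_by_size` would come from a leave-one-out cross-validation of the classifier on the corresponding subset.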
RESEARCH ARTICLE
Macmillan Publishers Limited. All rights reserved
©2014
Model construction and validation. On the basis of the 15 metagenomic markers
described above, a support vector machine classifier (radial basis function kernel
and default parameters) was constructed for patient discrimination (implemented
with the e1071 package of R software), the performance of which was assessed by
receiver operating characteristic analysis. The AUC and corresponding 95%
confidence intervals for the training and validation data sets, obtained by using the
pROC package of R software (10,000 bootstrap replicates), were 0.97 (0.95–0.99)
and 0.889 (0.79–0.98), respectively.
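To make the reported quantities concrete, the sketch below computes an AUC from classifier scores and a percentile bootstrap confidence interval. It is written in plain Python and does not reproduce the e1071 or pROC implementations; the function names, the resampling within each class and the percentile interval are all assumptions made for illustration.

```python
import random


def auc(scores_pos, scores_neg):
    """Probability that a random patient scores above a random control.

    Ties contribute 0.5, which matches the rank-based definition of the AUC.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_ci(scores_pos, scores_neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC, resampling within each class."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        pos = [rng.choice(scores_pos) for _ in scores_pos]
        neg = [rng.choice(scores_neg) for _ in scores_neg]
        stats.append(auc(pos, neg))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    patients = [0.9, 0.8, 0.4]   # hypothetical classifier scores
    controls = [0.5, 0.3, 0.2]
    print(auc(patients, controls))
    print(bootstrap_ci(patients, controls, n_boot=200))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of the size described here; a rank-based formula would be preferable for much larger data sets.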
Definition of PDI. To facilitate clinical application of the selected 15 metagenomic
markers, we defined a more straightforward index (PDI) for discrimination of
patients. For each individual sample, the PDI of sample $j$, denoted by $I_j$, was
computed as follows:

$$I_j^{d} = \sum_{i \in N} A_{ij}$$

$$I_j^{n} = \sum_{i \in M} A_{ij}$$

$$I_j = \left( \frac{I_j^{d}}{|N|} - \frac{I_j^{n}}{|M|} \right) \times 10^{6}$$

where $A_{ij}$ is the relative abundance of marker $i$ in sample $j$. $N$ and $M$ are the
subsets of patient- and control-enriched markers among these 15 selected gut
metagenomic markers, respectively, and $|N|$ and $|M|$ are the sizes of these two sets.
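The index is a difference of two mean abundances scaled by 10^6, which the following sketch makes explicit. The function name, the dictionary layout and the marker identifiers are hypothetical and chosen only for illustration.

```python
def patient_discrimination_index(abundance, patient_markers, control_markers):
    """Compute the PDI of one sample.

    abundance: mapping from marker identifier to relative abundance in the sample.
    patient_markers: identifiers of patient-enriched markers (set N).
    control_markers: identifiers of control-enriched markers (set M).
    """
    i_d = sum(abundance[m] for m in patient_markers)   # summed abundance over N
    i_n = sum(abundance[m] for m in control_markers)   # summed abundance over M
    return (i_d / len(patient_markers) - i_n / len(control_markers)) * 1e6


if __name__ == "__main__":
    # Hypothetical relative abundances for three markers in one sample.
    sample = {"g1": 2e-6, "g2": 4e-6, "g3": 1e-6}
    print(patient_discrimination_index(sample, ["g1", "g2"], ["g3"]))
```

A positive value means that the patient-enriched markers are, on average, more abundant in the sample than the control-enriched ones.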
47. Li, M. et al. Symbiotic gut microbes modulate human metabolic phenotypes.
Proc. Natl Acad. Sci. USA 105, 2117–2122 (2008).
48. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler
transform. Bioinformatics 25, 1754–1760 (2009).
49. Li, R. et al. De novo assembly of human genomes with massively parallel short
read sequencing. Genome Res. 20, 265–272 (2010).
50. Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment.
Bioinformatics 25, 1966–1967 (2009).
51. Noguchi, H., Park, J. & Takagi, T. MetaGene: prokaryotic gene finding from
environmental genome shotgun sequences. Nucleic Acids Res. 34, 5623–5630
(2006).
52. Powell, S. et al. eggNOG v3.0: orthologous groups covering 1133 organisms
at 41 different taxonomic ranges. Nucleic Acids Res. 40, D284–D289
(2012).
53. Faust, K. et al. Microbial co-occurrence relationships in the human microbiome.
PLOS Comput. Biol. 8, e1002606 (2012).
54. Price, A. L. et al. Principal components analysis corrects for stratification in
genome-wide association studies. Nature Genet. 38, 904–909 (2006).
55. Ding, C. & Peng, H. Minimum redundancy feature selection from microarray
gene expression data. J. Bioinform. Comput. Biol. 3, 185–205 (2005).
Extended Data Figure 1 | Venn diagram comparing the current major
human microbiome gene set and the results of a principal component
analysis of biomarkers distributed between patients with liver cirrhosis and
healthy controls. a, Venn diagram of the four currently available major human
microbiome gene sets. The total gene number in each gene set and the
overlapping areas are indicated. b, Venn diagram of the three major human gut
gene sets (LC, liver cirrhosis gene set; T2D, type 2 diabetes gene set; MetaHIT,
MetaHIT gene set; HMP, HMP gene set). c, Visualization of the principal
component analysis results for the liver-cirrhosis-associated genes that differed
significantly in the discovery cohort (FDR < 0.0001, Wilcoxon rank-sum test
adjusted for multiple testing). The principal component analysis is built here
using these genes in the validation cohort (25 patients with liver cirrhosis in red,
31 healthy controls in green).
Extended Data Figure 2 | Phylogenetic abundance at the phylum, genus
and species levels from liver cirrhosis and healthy control samples.
a, Phylogenetic abundance variation box plot at the phylum level and the 30
most abundant phylotypes at the genus and species levels in the healthy
controls are shown. Red, green, blue, turquoise and purple represent
Bacteroidetes, Firmicutes, Proteobacteria, Actinobacteria and other phyla,
respectively. The colour of each genus and species corresponds with the colour
of its respective phylum. b, Phylogenetic abundance variation box plot at the
phylum level and the 30 most abundant phylotypes at the genus and species
levels in the patients with liver cirrhosis are shown (see Methods for the
calculations). The boxes represent the interquartile range, between the first and
third quartiles, and the inside line represents the median. The whiskers denote
the lowest and highest values within 1.5× the interquartile range from the first
and third quartiles. The circles represent outliers beyond the whiskers.
Extended Data Figure 3 | MGS enriched in healthy Chinese individuals
(n = 114) are present in Danish individuals (n = 292). Presence and
abundance of 50 'tracer' genes for each species; genes are in rows; abundance
is indicated by colour gradient (white, not detected; red, most abundant).
Individuals, ordered by increasing gene count, are in columns. Significance of
correlation of species abundance (computed as mean abundance of the tracer
genes) and gene count (q value, FDR adjusted) is given. Species in the Chinese
cohort that were identical to those previously found, as correlated with the
gene diversity in the Danish cohort (ref. 27), are highlighted in red. Left, the
Chinese healthy cohort. Right, the Danish obesity cohort.
Extended Data Figure 4 | Massive changes in the gut microbiome in liver
cirrhosis. Top left, healthy individuals have more gut microbial genes than
patients with liver cirrhosis. Gene count was computed after downsizing the
mapped reads to a level of 6.2 million (ref. 27). The significance of the difference
was computed using a Student's t-test. Bottom, abundance of patient-enriched
species (n = 28) in patients with liver cirrhosis (n = 98) and healthy controls
(n = 83). The relative abundance of each patient-enriched species was
computed as a sum of the abundances of all the genes assigned to it divided by
the sum of the abundances of all gut microbial genes in each patient, which is
equal to 1 in the normalized data set. Bar length indicates the relative
abundance of a given species depicted by a different colour. Patients were
ordered by the total patient-enriched species abundance; LPA and HPA
quartiles (n = 24) are separated by red vertical lines. Top right, oral species are
frequent in patients with liver cirrhosis. MGS enriched in healthy controls
are largely not assigned to a species level, while those enriched in patients with
liver cirrhosis are largely assigned to a species level and are mostly of oral origin
(see Methods for species assignment).
Extended Data Figure 5 | The distribution of eggNOG orthologue group
and KEGG functional categories for liver-cirrhosis-related markers.
a, Comparison between the liver-cirrhosis-enriched and control-enriched
eggNOG orthologue group markers for 24 eggNOG orthologue group
functional categories shown by number. b, Comparison between the
liver-cirrhosis-enriched and control-enriched eggNOG orthologue group markers
for 24 eggNOG orthologue group functional categories shown by percentage.
c, Comparison between the liver-cirrhosis-enriched and control-enriched
KEGG orthologue group markers for each KEGG functional category shown by
number. d, Comparison between the liver-cirrhosis-enriched and
control-enriched KEGG orthologue group markers for each KEGG functional
category shown by percentage.
Extended Data Figure 6 | A comparison of the gene markers for the
different groups. a, Venn diagram showing a gene marker comparison of
case-enriched gene markers from the liver cirrhosis, T2D and IBD studies.
b, Venn diagram showing a gene marker comparison of control-enriched gene
markers from the liver cirrhosis, T2D and IBD studies. c, The length of the bar
(y axis) represents the number of genes; the P value in the related range is shown
on the x axis. The pink and light green bars show genes involved in type 2
diabetes and liver cirrhosis, respectively. Inset, the log P value of the gene
markers between the two studies.
Extended Data Figure 7 | The distribution of eggNOG functional categories
for case-enriched and control-enriched gene markers in liver cirrhosis
only, T2D only and the liver cirrhosis/T2D groups. a, Comparison of the
eggNOG orthologue group functional categories for case-enriched gene
markers shown by number. b, Comparison of the eggNOG orthologue group
functional categories for case-enriched gene markers shown by percentage.
c, Comparison of the eggNOG orthologue group functional categories for the
control-enriched gene markers shown by number. d, Comparison of the
eggNOG orthologue group functional categories for the control-enriched gene
markers shown by percentage.
Extended Data Figure 8 | The distribution of the KEGG functional
categories for case-enriched and control-enriched gene markers in liver
cirrhosis only, T2D only or the liver cirrhosis/T2D group. a, Comparison of
the KEGG pathway categories for the case-enriched gene markers shown by
number. b, Comparison of the KEGG pathway categories for the case-enriched
gene markers shown by percentage. c, Comparison of the KEGG pathway
categories for the control-enriched gene markers shown by number.
d, Comparison of the KEGG pathway categories for the control-enriched gene
markers shown by percentage.
Extended Data Figure 9 | Estimating the optimum number of markers and
establishing the taxonomic assignment of MGS. a, Comparison of the
case-enriched gene markers. b, Comparison of the control-enriched gene markers.
c, The mRMR method was used to identify the liver-cirrhosis-associated
markers. Sequential subsets were generated at five-marker intervals. For each
subset, the error rate was estimated using a leave-one-out cross-validation of a
linear discrimination classifier. The optimum (highest value of the Matthews
correlation coefficient) subset contains 15 gene markers. d, The study included
a discovery and a validation phase. Volunteers for both phases were recruited in
the same hospital. Both direct read mapping and de novo assembly were
performed for each sample. A taxonomy profiling table was established for
taxonomy analysis. A novel gut gene set was established, and annotated.
Identification of the MGS, finding markers and validating markers is also
shown. e, MGS enriched in Chinese patients with liver cirrhosis and healthy
individuals. Species-level assignment was deduced from the best BlastN hits of
genes from a given MGS at thresholds of the average of more than 95% identity
and more than 90% overlap with genes from a sequenced genome. For MGS
where these thresholds were not reached, an assignment was attributed at the
lowest taxonomy level where at least 80% of the genes had the same best hit
BlastP taxonomy; in all cases these criteria held true at higher taxonomic levels.
f, Taxonomic assignments of 58 species related to gut gene richness in a
Danish cohort (ref. 27).
Extended Data Figure 10 | Phylogenetic abundance of healthy controls in
the discovery stage in the liver cirrhosis and T2D studies. The relative
abundance of top bacterial phylotypes at the phylum, genus and species levels,
respectively, in the liver cirrhosis study (top three panels) and in the T2D study
(bottom three panels).
... These DNA fragments were subsequently ligated with an adapter for sequencing. Purification was then conducted using an AMPure XP system (Beverly) and Bowtie2 software with the following parameter settings:-end-to-end, -sensitive, -I 200, and -X 400 [16][17][18]. Alternatively, total RNA was extracted from the samples, and ribosomal RNA was removed from the total RNA, followed by random fragmentation of the resulting RNA into short fragments of 250-300 bp in length using divalent cations in NEB fragmentation buffer. ...
... gatech. edu/ GeneM ark/ ) was used to perform ORF prediction of the scaftigs (≥ 500 bp) from each sample [16,[22][23][24][25], and sequences resulting from the prediction results with a length less than 100 nt were filtered out [19,[26][27][28][29]. ...
... bioin forma tics. org/ cd-hit/ ) was used to eliminate redundancy in the ORF prediction results [30,31] with the following parameter settings: -c 0.95, G 0, aS 0.9, g 1, d 0 [14,16]. Clean data from each sample were aligned to the initial gene catalog using Bowtie2. ...
Article
Full-text available
The prevalence of tick‐borne bacterial and viral diseases, which pose a serious threat to human and livestock health, is increasing worldwide. At present, only a limited number of tick‐borne pathogens have been reported, and no analysis of the microbial pathogen community in ticks has been carried out. We sequenced the viral metagenome of Ornithodoros lahorensis species of ticks from the Chinese mainland and identified 390 RNA viruses with unique microbial compositions. A total of 992 assembled viral transcriptomes revealed the breadth and diversity of the genome structure of tick‐borne viruses, reflecting the importance of ticks as RNA viral pools. We analyzed the phylogeny of different virus families to investigate virus evolution and found that the most diverse tick‐associated viruses belonged to the family Siphoviridae, which diverged earlier in evolutionary time than other arboviruses. There were only a few tick‐specific viruses, whereas the number of vertebrate‐infecting viruses in ticks was greater. We hope that our virus sequencing dataset will facilitate future important research on viruses carried by ticks that can infect vertebrates.
... The parameters used were: -T 6 -G 0 -aS 0.9 -g 1 -d 0 -c 0.95 -n 5 -M 8000. The nonredundant genes were defined as contiguous gene coding sequences with the following parameters: -c 0.95, -G 0, -aS 0.9, -g 1, -d 0 54,55 . ...
Article
Full-text available
The impact of dietary microorganisms on host microbiota is recognized, but the underlying mechanisms remain unclear. This study examined the effects of bamboo surface microbiota, including virulence factors, antibiotic resistance genes (ARGs), and mobile genetic elements from different bamboo parts (leaves, shoots, and culms), on giant panda gut microbiota using three pairs of twins. Results showed that bamboo and fecal samples shared 1670 microbial species, with shoot surface microbiota contributing the highest proportion (21%, Bayesian source tracking) of contemporaneous gut microbiota, primarily by increasing abundances of Escherichia coli and ARGs. Klebsiella pneumoniae and Salmonella enterica also showed high co-occurrence in both bamboo and fecal samples, indicating potential colonization. Additionally, Streptococcus suis, Acinetobacter, and Mycobacterium progressively declined in fecal samples as bamboo shoot intake increased, suggesting these microbes are likely transient. The findings emphasize the impact of foodborne microorganisms on the host and the importance of conservation management.
... Moreover, the small sample sizes of several studies further limit the power of the statistical analyses, potentially affecting the accuracy of the pooled effect size. Larger, well-designed RCTs are needed to better assess the efficacy of microbiota-targeted therapies and AI tools in diverse clinical settings [25][26][27][28]. ...
Article
Background. The human microbiota plays a critical role in maintaining health, and its imbalance, known as dysbiosis, is linked to various diseases. Microbiota-targeted therapies, such as probiotics and fecal microbiota transplants (FMT), and artificial intelligence (AI)-driven tools for microbiota analysis are emerging as promising interventions in personalized medicine. However, comprehensive evidence on their effectiveness remains limited. Objective. This systematic review and meta-analysis aim to evaluate the clinical and diagnostic effectiveness of microbiota-targeted therapies and AI-driven tools, synthesizing evidence from diverse studies to provide a pooled effect size and explore potential variations across study designs and patient populations. Methods. A systematic search of PubMed, Scopus, Web of Science, and Google Scholar was conducted for studies published between 2015 and 2023. Studies evaluating microbiota-targeted therapies (probiotics, FMT) and AIdriven tools in the context of personalized medicine were included. A random-effects model was used to calculate the pooled effect size (Standardized Mean Difference, SMD), and heterogeneity was assessed using I² statistics. Subgroup analyses were performed based on intervention type, patient population, and study design. Results. The pooled effect size for the included studies was 0.62 (95% CI: 0.48–0.76), indicating moderate effectiveness of microbiota-targeted therapies and AI-driven tools. AI-driven tools showed the highest diagnostic accuracy (SMD = 0.87), while probiotics and FMT were particularly effective in managing conditions like inflammatory bowel disease (IBD) (SMD = 0.80). Subgroup analyses revealed greater benefits for IBD patients (SMD = 0.79) compared to the general population (SMD = 0.56). Heterogeneity (I² = 67%) was moderate, and publication bias was minimal (Egger’s test p = 0.16). Conclusion. 
Microbiota-targeted therapies and AI-driven tools demonstrate moderate effectiveness in improving clinical outcomes, with AI offering superior diagnostic accuracy. The findings support the potential of these interventions in personalized medicine, particularly for patients with specific conditions like IBD. However, further largescale, randomized controlled trials are necessary to confirm these findings and enhance the evidence base for clinical application.
... The human microbiome has been emerging as an important player in health and disease (Kashyap et al. 2017, Gevers et al. 2014, Jostins et al. 2012, Kostic et al. 2012, Qin et al. 2012, Scher et al. 2013, Qin et al. 2014. For example, immune maturation and modulation (Ahern et al. 2014, Geva-Zatorsky et al. 2017, inflammatory cytokine production (Schirmer et al. 2016), host serum metabolome and insulin level (Pedersen et al. 2016), and host gene regulation (Fellows et al. 2018) have all been shown to be linked to the human microbiome. ...
Preprint
Full-text available
Microbiome sequencing data are inherently sparse and compositional, with excessive zeros arising from biological absence or insufficient sampling. These zeros pose significant challenges for downstream analyses, particularly those that require log-transformation. We introduce BMDD (BiModal Dirichlet Distribution), a novel probabilistic modeling framework for accurate imputation of microbiome sequencing data. Unlike existing imputation approaches that assume unimodal abundance, BMDD captures the bimodal abundance distribution of the taxa via a mixture of Dirichlet priors. It uses variational inference and a scalable expectation-maximization algorithm for efficient imputation. Through simulations and real microbiome datasets, we demonstrate that BMDD outperforms competing methods in reconstructing true abundances and improves the performance of differential abundance analysis. Through multiple posterior samples, BMDD enables robust inference by accounting for uncertainty in zero imputation. Our method offers a principled and computationally efficient solution for analyzing high-dimensional, zero-inflated microbiome sequencing data and is broadly applicable in microbial biomarker discovery and host-microbiome interaction studies. BMDD is available at: https://github.com/zhouhj1994/BMDD.
... org/index.jsp). From the alignment results of each sequence, the best blast hit results were selected for subsequent analysis (J. [50]; N. [45,51]). ...
Chapter
Our bodies are colonized with trillions of microorganisms, comprising bacteria, fungi, micro-eukaryotes, and viruses, which are collectively named “microbiome” [1–3]. Several human body sites are colonized by different and distinct microbiomes, for example, the skin [4], the oral cavity [5], the gastrointestinal tract [6], and the urogenital tract [7]. Microbiomes have been shown to play crucial roles linked with our health and disease states [3, 4, 8, 9]. For example, it has been shown to train the immune systems in newborns [10], to digest important nutrients that otherwise we would not be able to process, and to produce health-promoting metabolites [11]. More in detail, it has been shown that gut bacteria help to break down complex carbohydrates [12] as well as absorb important vitamins and minerals present in the food [13]. For instance, they produce short-chain fatty acids (SCFAs) like butyrate, propionate, and acetate, by fermenting dietary fibers that reach the colon [14–17]. Additionally, commensal bacteria stimulate the production of mucus by the gut lining and compete for nutrients with pathogens and other harmful bacteria by occupying specific ecological niches in our digestive tract, thus preventing or limiting colonization [18, 19]. Certain bacteria can moreover produce antimicrobial compounds that can directly kill or inhibit the growth of pathogenic bacteria [20].
Article
Background Metabolic syndrome (MS) and type 2 diabetes (T2D) are metabolically related diseases with rising global prevalence and increasingly evident links to the intestinal microbiota. Research suggests that imbalances in microbiota composition may play a crucial role in their pathogenesis. Specific population cohorts, such as the one in Galicia, Spain, offer the opportunity to analyze microbiota patterns within a distinct geographical and genetic context. This study was performed to investigate the relationship between the intestinal microbiota and MS and T2D. Methods A cohort of 79 volunteers was analyzed over a 2-year study period. Recruitment posed significant challenges because of strict inclusion criteria (918PTE0540; PCI2018-093284), which required participants to be free from chronic medications and have a moderate to high risk of developing T2D. Volunteers were classified based on their serum glucose levels, body mass index, and the presence or absence of MS. To analyze the microbiota composition, amplicon sequencing of 16S rRNA genes was performed on stool samples. Alpha diversity was assessed using the Chao and Shannon indices, while beta diversity was evaluated using permutational analysis of variance with Bray–Curtis and Chao distances. Differential abundance analysis was conducted using the LinDA method. Results In patients with MS, we observed a higher Firmicutes/Bacteroidetes ratio and an increased prevalence of Blautia compared to healthy patients. than in healthy individuals. Other enriched taxa in patients with MS included Tyzerella, Streptococcus, and Ruminococcus callidus . In patients with T2D, we observed a higher Bacteroidetes/Firmicutes ratio and a decrease in the phylum Actinobacteria compared with healthy individuals. 
Taxa such as Dorea, Prevotella, Dialister invisus , Fusicatenibacter, and Coprococcus were associated with T2D, while beneficial taxa such as Eubacterium, Ligilactobacillus, and Acidaminococcus were more prevalent in healthy or prediabetic individuals. Conclusions This study reveals notable differences in the intestinal microbiota composition among patients with MS and T2D. Changes in microbial composition, particularly the Firmicutes/Bacteroidetes ratio, may serve as indicators of underlying pathology. At more specific taxonomic levels, several enriched taxa were identified in patients with MS, including Blautia, Tyzzerella, Dorea, Streptococcus, and Ruminococcus callidus . Additionally, species such as Dorea longicatena and Dialister invisus were enriched in prediabetic and diabetic patients, whereas beneficial genera (Eubacterium, Acidaminococcus, Bifidobacterium, and Ligilactobacillus) were more prevalent in healthy and prediabetic individuals than in those with T2D.
Article
Background/Objectives:Metabolic dysfunction-associated steatohepatitis (MASH), characterized by liver inflammation, fibrosis, and fat accumulation, can develop into cirrhosis and liver cancer. Despite its increasing prevalence worldwide, there are few established therapies for advanced MASH. We previously demonstrated that stem cells from human exfoliated deciduous teeth-conditioned media (SHED-CM) exerted therapeutic effects in a MASH mouse model. The gut–liver axis is thought to be associated with liver disease progression, and soluble Siglec-9 (sSiglec-9), an immunoinhibitory receptor, is a key protein in SHED-CM that induces anti-inflammatory macrophages and has intestinal epithelial protective effects. Therefore, we evaluated sSiglec-9’s role in intestinal barrier protection in MASH mice. Methods: We evaluated sSiglec-9 effects on intestinal barrier function using in vitro Caco-2 cell monolayers injured by TNF-α and IFN-γ. For the MASH mouse model, male C57BL/6J mice were given a Western diet and high-sugar solution orally; to induce liver injury, CCl4 was intraperitoneally administered for 12 weeks. Mice were treated weekly with 10 ng/g sSiglec-9 or vehicle. Intestinal permeability was assessed by blood 4 kDa FITC-dextran concentration, and intestinal transcriptomes and liver histology were analyzed. Results: sSiglec-9 decreased intestinal permeability and liver inflammation in MASH mice. sSiglec-9 and SHED-CM reduced 4 kDa FITC-dextran permeability in injured Caco-2 cells, and sSiglec-9 significantly reduced intestinal permeability and modulated expression of 34 intestinal genes. The NAFLD Activity Score indicated significantly reduced inflammation following sSiglec-9 treatment. Conclusions: sSiglec-9 may protect intestinal barrier function by mitigating mucosal inflammation. sSiglec-9 treatment may represent a novel therapeutic approach for MASH via gut–liver axis modulation.
Article
Full-text available
The human oral microbiome is the most studied human microflora, but 53% of the species have not yet been validly named and 35% remain uncultivated. The uncultivated taxa are known primarily from 16S rRNA sequence information. Sequence information tied solely to obscure isolate or clone numbers, and usually lacking accurate phylogenetic placement, is a major impediment to working with human oral microbiome data. The goal of creating the Human Oral Microbiome Database (HOMD) is to provide the scientific community with a body site-specific comprehensive database for the more than 600 prokaryote species that are present in the human oral cavity based on a curated 16S rRNA gene-based provisional naming scheme. Currently, two primary types of information are provided in HOMD—taxonomic and genomic. Named oral species and taxa identified from 16S rRNA gene sequence analysis of oral isolates and cloning studies were placed into defined 16S rRNA phylotypes and each given unique Human Oral Taxon (HOT) number. The HOT interlinks phenotypic, phylogenetic, genomic, clinical and bibliographic information for each taxon. A BLAST search tool is provided to match user 16S rRNA gene sequences to a curated, full length, 16S rRNA gene reference data set. For genomic analysis, HOMD provides comprehensive set of analysis tools and maintains frequently updated annotations for all the human oral microbial genomes that have been sequenced and publicly released. Oral bacterial genome sequences, determined as part of the Human Microbiome Project, are being added to the HOMD as they become available. We provide HOMD as a conceptual model for the presentation of microbiome data for other human body sites.
Article
Full-text available
Complex gene-environment interactions are considered important in the development of obesity(1). The composition of the gut microbiota can determine the efficacy of energy harvest from food(2-4) and changes in dietary composition have been associated with changes in the composition of gut microbial populations(5,6). The capacity to explore microbiota composition was markedly improved by the development of metagenomic approaches(7,8), which have already allowed production of the first human gut microbial gene catalogue(9) and stratifying individuals by their gut genomic profile into different enterotypes(10), but the analyses were carried out mainly in nonintervention settings. To investigate the temporal relationships between food intake, gut microbiota and metabolic and inflammatory phenotypes, we conducted diet-induced weight-loss and weight-stabilization interventions in a study sample of 38 obese and 11 overweight individuals. Here we report that individuals with reduced microbial gene richness (40%) present more pronounced dys-metabolism and low-grade inflammation, as observed concomitantly in the accompanying paper(11). Dietary intervention improves low gene richness and clinical phenotypes, but seems to be less efficient for inflammation variables in individuals with lower gene richness. Low gene richness may therefore have predictive potential for the efficacy of intervention.
Article
Full-text available
Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.
Article
Full-text available
We are facing a global metabolic health crisis provoked by an obesity epidemic. Here we report the human gut microbial composition in a population sample of 123 non-obese and 169 obese Danish individuals. We find two groups of individuals that differ by the number of gut microbial genes and thus gut bacterial richness. They contain known and previously unknown bacterial species at different proportions; individuals with a low bacterial richness (23% of the population) are characterized by more marked overall adiposity, insulin resistance and dyslipidaemia and a more pronounced inflammatory phenotype when compared with high bacterial richness individuals. The obese individuals among the lower bacterial richness group also gain more weight over time. Only a few bacterial species are sufficient to distinguish between individuals with high and low bacterial richness, and even between lean and obese participants. Our classifications based on variation in the gut microbiome identify subsets of individuals in the general white adult population who may be at increased risk of progressing to adiposity-associated co-morbidities.
Complex gene–environment interactions are considered important in the development of obesity. The composition of the gut microbiota can determine the efficacy of energy harvest from food and changes in dietary composition have been associated with changes in the composition of gut microbial populations. The capacity to explore microbiota composition was markedly improved by the development of metagenomic approaches, which have already allowed production of the first human gut microbial gene catalogue and stratifying individuals by their gut genomic profile into different enterotypes, but the analyses were carried out mainly in non-intervention settings. To investigate the temporal relationships between food intake, gut microbiota and metabolic and inflammatory phenotypes, we conducted diet-induced weight-loss and weight-stabilization interventions in a study sample of 38 obese and 11 overweight individuals. Here we report that individuals with reduced microbial gene richness (40%) present more pronounced dys-metabolism and low-grade inflammation, as observed concomitantly in the accompanying paper. Dietary intervention improves low gene richness and clinical phenotypes, but seems to be less efficient for inflammation variables in individuals with lower gene richness. Low gene richness may therefore have predictive potential for the efficacy of intervention.
Reference genomes are required to understand the diverse roles of microorganisms in ecology, evolution, human and animal health, but most species remain uncultured. Here we present a sequence composition-independent approach to recover high-quality microbial genomes from deeply sequenced metagenomes. Multiple metagenomes of the same community, which differ in relative population abundances, were used to assemble 31 bacterial genomes, including rare (<1% relative abundance) species, from an activated sludge bioreactor. Twelve genomes were assembled into complete or near-complete chromosomes. Four belong to the candidate bacterial phylum TM7 and represent the most complete genomes for this phylum to date (relative abundances, 0.06-1.58%). Reanalysis of published metagenomes reveals that differential coverage binning facilitates recovery of more complete and higher fidelity genome bins than other currently used methods, which are primarily based on sequence composition. This approach will be an important addition to the standard metagenome toolbox and greatly improve access to genomes of uncultured microorganisms.
OBJECTIVES: Systemic endotoxemia has been implicated in various pathophysiological sequelae of chronic liver disease. One of its potential causes is increased intestinal absorption of endotoxin. We therefore examined the association of small intestinal bacterial overgrowth with systemic endotoxemia in patients with cirrhosis. METHODS: Fifty-three consecutive patients with cirrhosis (Child-Pugh group A, 23; group B, 18; group C, 12) were included. Jejunal secretions were cultivated quantitatively and systemic endotoxemia determined by the chromogenic Limulus amoebocyte assay. Patients were followed up for 1 yr. RESULTS: Small intestinal bacterial overgrowth, defined as ≥10⁵ total colony forming units per milliliter of jejunal secretions, was present in 59% of patients and strongly associated with acid suppressive therapy. The mean plasma endotoxin level was 0.86 ± 0.48 endotoxin units/ml (range = 0.03–1.44) and was significantly associated with small intestinal bacterial overgrowth (0.99 vs 0.60 endotoxin units/ml, p = 0.03). During the 1-yr follow-up, seven patients were lost to follow up or underwent liver transplantation and 12 patients died. Multivariate Cox regression showed Child-Pugh group to be the only predictor for survival. CONCLUSIONS: Small intestinal bacterial overgrowth in cirrhotic patients is common and associated with systemic endotoxemia. The clinical relevance of this association remains to be defined.