All content in this area was uploaded by Sarah Ahannach on Feb 18, 2022
Citizen-science map of the vaginal microbiome
Sarah Lebeer ( sarah.lebeer@uantwerpen.be )
UAntwerpen https://orcid.org/0000-0002-9400-6918
Sarah Ahannach
UAntwerpen
Stijn Wittouck
UAntwerpen
Thies Gehrmann
UAntwerpen
Tom Eilers
UAntwerpen
Eline Oerlemans
UAntwerpen
Sandra Condori
UAntwerpen
Jelle Dillen
University of Antwerp
Irina Spacova
UAntwerpen https://orcid.org/0000-0003-0562-7489
Leonore Vander Donck
UAntwerpen
Caroline Masquillier
UAntwerpen
Peter Bron
University of Antwerp
Wannes Van Beeck
UAntwerpen
Charlotte De Backer
UAntwerpen
Gil Donders
UAntwerpen
Veronique Verhoeven
veronique.verhoeven@uantwerpen.be
Biological Sciences - Article
Keywords: Citizen science, vaginal microbiome, lactobacilli, large-scale remote sampling, population cohort, lifestyle impact
Posted Date: February 14th, 2022
DOI: https://doi.org/10.21203/rs.3.rs-1350465/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Sarah Lebeer1,*,$, Sarah Ahannach1,$, Stijn Wittouck1,$, Thies Gehrmann1,$, Tom Eilers1, Eline Oerlemans1, Sandra Condori1, Jelle Dillen1, Irina Spacova1, Leonore Vander Donck1, Caroline Masquillier2, Peter A. Bron1, Wannes Van Beeck1, Charlotte De Backer3, Gilbert Donders4,5,6,°, Veronique Verhoeven7,°

*corresponding author
$shared first authors
°shared responsible clinicians
Affiliations
1 Department of Bioscience Engineering, Research Group Environmental Ecology and Applied Microbiology, University of Antwerp, Groenenborgerlaan 171, 2020 Antwerp, Belgium
2 Department of Sociology, Center for Population, Family and Health, University of Antwerp, Sint-Jacobstraat 2, 2000 Antwerp, Belgium
3 Department Communication Sciences, University of Antwerp, Sint-Jacobstraat 2, 2000 Antwerp, Belgium
4 Department of Obstetrics and Gynaecology, University Hospital Antwerp, Drie Eikenstraat 655, 2650 Edegem, Belgium
5 Regional Hospital Heilig Hart, Kliniekstraat 45, 3300 Tienen, Belgium
6 Femicare, Clinical Research for Women, Gasthuismolenstraat 33, 3300 Tienen, Belgium
7 Department of Family Medicine and Population health (FAMPOP), University of Antwerp, Doornstraat 331, 2610 Antwerp, Belgium
Keywords
Citizen science / vaginal microbiome / lactobacilli / large-scale remote sampling / population cohort / lifestyle impact
Abstract
The vaginal microbiome is crucial for women’s health and reproduction, but its ecology and determinants in the general population are still unclear. This lack of a reference framework hampers much-needed innovations in diagnostics and therapeutics. Here, we remotely mapped the vaginal microbiome of 3,345 women in Western Europe via a citizen-science approach. More than 75% of the vaginal samples were dominated by Lactobacillus taxa, but these samples did not fall into discrete community state types. Compositional correlation network analysis, validated with public data, pointed to six main modules of interacting microbes: the Lactobacillus crispatus-, Lactobacillus iners-, Gardnerella-, Prevotella-, Anaerococcus- and gut-derived modules. In the first module, Limosilactobacillus taxa were functionally connected to L. crispatus and Lactobacillus jensenii. This module was positively associated with the luteal phase of the menstrual cycle and negatively with the number of vaginal complaints, while the Gardnerella-module was associated with discharge and increasing age. Contraceptives containing estrogen correlated with higher levels of the L. crispatus-module and lower levels of the Gardnerella-module, with the opposite found for a hormonal intrauterine device or having multiple partners. Mothers had a lower relative abundance of the L. crispatus-module and more Bifidobacterium, Lactobacillus gasseri and Streptococcus. Other covariates such as BMI, menstrual pads and cups, smoking and dietary habits were also associated with the composition of the vaginal microbiome. These findings suggest that lifestyle interventions have the potential to improve vaginal health when combined with dedicated therapies.
Introduction
The vaginal microbiome plays a central role in women’s health and reproduction, but detailed knowledge about its general ecology and the host-side determinants of its composition is lacking. For more than a century, the vagina has been considered a rather simple ecosystem characterized by low diversity and a high abundance of lactic acid-producing bacteria1. In 1892, Döderlein and colleagues described a Gram-positive bacterium as the key bacterium in the vagina2. Since then, it has been well established that Lactobacillus taxa are the dominant vaginal bacteria in European and Asian female populations3–5. The dominance of these lactobacilli is linked to health: when it is disrupted by an overgrowth of anaerobic bacteria such as Gardnerella vaginalis during bacterial vaginosis (BV), or by inflammation during aerobic vaginitis (AV) or pelvic inflammatory disease (PID)6,7, an increased susceptibility to sexually transmitted diseases8–10 and adverse reproductive outcomes11,12 is observed. In 2020, the taxonomy of the family Lactobacillaceae was substantially revised13. This was an important update, as it showed that the typical vaginal species all belong to the same genus: Lactobacillus sensu stricto. The update also highlighted the evolutionary distance between these vaginal species and other lactobacilli, such as Lacticaseibacillus rhamnosus, Lactiplantibacillus plantarum and Limosilactobacillus reuteri, that are commonly studied as gut probiotics13.
With the advent of amplicon sequencing, the vaginal microbiome has generally been described in terms of five community state types (CSTs)3. L. crispatus is dominant in CST I, L. gasseri in CST II, L. iners in CST III and L. jensenii in CST V. CST IV is not dominated by Lactobacillus, but by a mix of facultative and strict anaerobes such as Gardnerella, Atopobium, Prevotella and Finegoldia14. CST IV is also found in asymptomatic women, but it is more strongly associated with dysbiosis and conditions such as BV. The recent VALENCIA (VAginaL community state typE Nearest CentroId clAssifier) study proposed thirteen CSTs based on a meta-analysis of 1,976 women from different study cohorts, with most of the additional subdivisions within CST IV15. The CST framework has been very useful for simplifying high-dimensional microbial community datasets and facilitating statistical analyses. However, it is currently unclear how well the vaginal CSTs reflect the underlying biology.
To better understand the ecology and function of vaginal lactobacilli and other microbiome members, and to design better diagnostic and therapeutic options for vagina-associated diseases, more reference datasets are needed. So far, mainly female populations in North America, Scandinavia and South Africa have been characterized3,15–18, and there appears to be a vast knowledge gap on the vaginal microbiome in other populations. Valuable information can also come from human-animal comparisons19. Humans appear to be the only animals with a vagina mostly dominated by Lactobacillus taxa under healthy conditions20,21. This unique phenomenon is not yet well understood, but several factors have been suggested to play a role: the typical hormonal fluctuations throughout the menstrual cycle, particularly of estrogen14; the glycogen accumulated in the vaginal epithelial cells22; the typical human diet since the introduction of agriculture23; and the strong antimicrobial capacity of lactobacilli, which protects the limited offspring of humans from infections20. A detailed mapping of lifestyle and personal characteristics in relation to the vaginal microbiome can help us better understand the unique composition of the human vaginal microbiome.
In this citizen science-based self-sampling study, we mapped the vaginal microbiome in a large cohort in Belgium, with a particular focus on the prevalence and abundance of key lactobacilli taxa and their association with life-course and lifestyle factors. Two self-collected vaginal swabs were donated by 3,345 women aged 18 to 98 years: one for 16S rRNA amplicon sequencing and one for culturing and metabolic analyses. The project was named ‘Isala’ in honor of Isala Van Diest (1842-1916), the first female doctor in Belgium.
Results
Citizen science-based study cohort. The call for participation was launched in Belgium (Western Europe) in March 2020. Within ten days, 6,007 participants had registered on the website (https://isala.be/en/), after which registration was closed. A total of 4,682 of these registrants completed the questionnaire, with an average completion time of 49 minutes, and received the self-sampling kit (Figure 1A-B). The only exclusion criteria were pregnancy and being younger than 18 years. Of the participants who filled in the questionnaire, 3,345 provided two vaginal swabs, allowing microbiome, culturomics and metabolomics analyses (Figure 1C). The mean age and body mass index (BMI) of the included participants were 31.8 ± 9.5 years and 24.3 ± 4.6 kg/m², respectively (Figure 1D).
The call was directed towards the general female population outside a clinical setting. Indeed, based on the questionnaire data, 69.7% of the women did not report a single vaginal health symptom at the time of sampling (Table S3). 18.3% reported one vaginal symptom, such as redness, dryness, odor, increased and/or discolored discharge, pain during intercourse, itching, swelling, burning or urinary infection. Only 7% and 2.6% reported two or three symptoms, respectively. Nevertheless, more than 50% and more than 70% of the participants reported having experienced at least one fungal infection or bladder infection, respectively, prevalences that are in agreement with previous studies24,25.
Figure 1 – Characterization of the Isala study cohort and key physiological, behavioral, lifestyle and environmental factors of the participating women. (A) The self-sampling kit sent to the participants via the national postal service. (B) Geographical overview of the participants who sent in samples for this project, obtained by overlaying their zip codes on a map of the Flemish region and some cities of the Wallonia region of Belgium. Darker colors represent higher numbers of participants with that zip code. (C) Overview of the population cohort that registered within ten days after the first announcement, with their different citizen-science roles in the Isala project: minimal involvement, i.e., expressing online interest as a potential donor via the website and answering five questions on age, pregnancy, contraceptive use, country of residence during the first three years of life and zip code (gray); partial involvement, i.e., filling out the extensive questionnaire (blue); and full involvement as donors, including the 24h follow-up questionnaire (pink). Distribution of a selection of the questionnaire variables: (D) age and BMI, (E) reported contraceptive use of the whole cohort, (F) a subset of the binary variables.
5.2% of the Isala participants were menopausal. 30.2% used a combined oral contraceptive pill, 19.9% a hormonal intrauterine device, 13.1% condoms, 3.7% a copper intrauterine device and 2.5% a progestogen-only pill (Figure 1E). Other forms of contraception (implant, cup, periodic abstinence, sterilization of the participant and/or partner, etc.) were less frequent, at 5.7% combined (Table S3). About four in ten women (39.2%) had ever been pregnant. 16.0% reported sexual intercourse within 24h before sampling. 9.1% identified as smokers, while 8.6% reported drug use (Figure 1F). As expected, age was significantly correlated with BMI, previous pregnancy, having children and menopause (Figure S1). 4.8% of the participants were not born in Belgium, and 10.0% identified with a culture besides the Belgian one. Ethnicity or race, as collected as metadata in previous US vaginal microbiome studies (Caucasian, African-American, Asian, Hispanic)26, was not explicitly asked, since these categories were considered not relevant to the Belgian population with its diverse ethnic composition27. 163 participants (5.4%) reported belonging to a household below the national poverty threshold, calculated based on the total family income and number of dependents28.
Dominance of Lactobacillus taxa. The 3,345 fully involved Isala donors delivered vaginal samples between July and October 2020, of which 3,196 (96.6%) passed quality control based on estimated DNA concentrations. The high-quality samples yielded over 82 million high-quality V4 16S rRNA read pairs in total, ranging from 2,126 to 376,242 read pairs per sample with an average of 25,909. Read pairs were merged and denoised into a total of 4,972 unique amplicon sequence variants (ASVs). Short-read 16S rRNA gene sequencing generally does not allow species-level identification29. This also applies to many vaginal species: for example, L. jensenii and Lactobacillus mulieris both occur in the vagina but cannot be discriminated using 16S rRNA gene regions30. To analyze the data at the functionally interpretable genus level while still discriminating between the “big four” vaginal Lactobacillus species, the Lactobacillus genus was divided into subgenera based on a high-quality core-genome phylogeny (Figure 2C-D and Figure S2). This resulted in nine subgenera, four of which are known to be associated with the vagina: the L. crispatus group, L. iners group, L. jensenii group and L. gasseri group. To validate this subgenus-level classification approach, shotgun metagenomic sequencing was performed on a subset of samples (n = 18, Figure 2C). For the four subgenera containing the four typical vaginal Lactobacillus species, the correlations between the relative abundances estimated by the two methods were high (Figure S3).
For each sample, the dominant (sub)genus was then defined as the (sub)genus with the largest relative abundance, provided that this abundance exceeded 30%. Using this criterion, the L. crispatus group (163 ASVs) dominated the largest number of samples (43.2% of the participants), followed by the L. iners group (120 ASVs; 27.7%) and Gardnerella (49 ASVs; 9.8%). Several less frequently dominant taxa also occurred, namely the L. jensenii group (54 ASVs; 3.5%), Prevotella (421 ASVs; 3.4%), the L. gasseri group (56 ASVs; 3.2%), Bifidobacterium (18 ASVs; 1.8%) and Streptococcus (52 ASVs; 1.2%) (Figure 2A-B).
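The dominance rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ pipeline. Each profile is assumed to be a dictionary of relative abundances per (sub)genus, the taxon names and toy profiles are hypothetical, and the 30% cutoff is assumed to be strict (“exceeded 30%”).

```python
from collections import Counter
from typing import Mapping


def dominant_taxon(profile: Mapping[str, float], threshold: float = 0.30) -> str:
    """Return the most abundant (sub)genus if its relative abundance exceeds
    `threshold`, otherwise the label "no dominance".

    `profile` maps (sub)genus names to relative abundances (fractions of 1).
    """
    if not profile:
        return "no dominance"
    taxon, abundance = max(profile.items(), key=lambda item: item[1])
    return taxon if abundance > threshold else "no dominance"


# Hypothetical toy profiles, for illustration only.
samples = {
    "s1": {"L. crispatus group": 0.85, "L. iners group": 0.10, "Gardnerella": 0.05},
    "s2": {"L. iners group": 0.60, "Gardnerella": 0.40},
    # No taxon exceeds 30% here, so this sample is labeled "no dominance".
    "s3": {"Gardnerella": 0.28, "Prevotella": 0.25, "Atopobium": 0.27, "Sneathia": 0.20},
}
calls = {sample_id: dominant_taxon(p) for sample_id, p in samples.items()}
tally = Counter(calls.values())  # number of samples per dominance label
```

Tallying the labels over all samples gives the kind of counts summarized in Figure 2B.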
Because of the citizen-science nature of the project, the personal vaginal microbiome profiles were communicated to the participants before the submission of this manuscript (Figure S4 and https://isala.be/en/results/). Participants received information about the top eight taxa in the dataset, accompanied by explanations for non-microbiology experts (Figure S5). A feedback questionnaire (n = 2,000) showed that 83% of the participants who received their results perceived them as easy to interpret, and that 99.6% would volunteer again in future Isala endeavors.
Figure 2 - Overview of the most abundant taxa in the vaginal microbiome of the Isala cohort, with particular focus on the Lactobacillus taxa. (A) Stacked bar chart describing the microbiome composition of all participants in the study in terms of the 10 most abundant taxa. (B) Occurrence of the dominant taxa in the vaginal microbiome of the Isala cohort at the highest taxonomic resolution possible with our data. Dominance was defined as the most abundant taxon constituting at least 30% of the profile. “Other” refers to the number of samples in which a (sub)genus other than the seven shown was dominant; “no dominance” refers to the number of samples in which no single (sub)genus reached 30% abundance. (C) Validation of the 16S amplicon sequencing pipeline, including classification to Lactobacillus subgenera, with shotgun sequencing data (n = 18). For the “big four” Lactobacillus subgenera, the Spearman correlations between their relative abundances in the amplicon and shotgun data are shown. (D) Maximum-likelihood phylogeny of species of the genus Lactobacillus inferred from the amino acid sequences of 100 single-copy core genes. Colors indicate the nine custom-defined subgenera used in this study. Bold tip labels indicate representative species of the subgenera. Species names were taken from the Genome Taxonomy Database31, which splits very diverse species, yielding e.g., L. delbrueckii_A and L. jensenii_A, the latter recently identified as L. mulieris32. The size of the circles reflects the genome size of representative genomes of each species (with the average genome size given in brackets).
Vaginal community structure. To map the different constellations of the vaginal microbiota in our cohort in detail, samples were embedded in a two-dimensional t-SNE space33. t-SNE projects a high-dimensional space into a low-dimensional space while aiming to preserve inter-sample distances, placing higher weight on smaller distances to preserve sample neighborhoods. This yields a more detailed overview of the diversity across all samples than other commonly used approaches such as PCoA plots33. The t-SNE plot was annotated with the two most abundant taxa per sample (Figure 3A-B). Several high-density regions were observed in this two-dimensional representation that broadly corresponded to the five previously described CSTs3, but these high-density regions were connected by intermediate regions (Figure 3A). A clear example was provided by the L. crispatus and L. iners high-density regions, which were connected by samples with L. crispatus and L. iners as the two most abundant taxa. This was the case for 454 samples, of which 22% contained L. crispatus and L. iners in near-equal proportions (Figure 3C, gray dashed enclosure). This observation suggests that the previously described CSTs are not discrete states of vaginal community composition. This is especially apparent when the samples are visualized by the second most abundant (sub)genus (Figure 3B) and by the relative abundance of the top (sub)genus (Figure 3C). Intermediate regions can be observed in which at least two subgenera are co-dominant, with the same patterns observed in the datasets aggregated in the VALENCIA study15 (Figure 3D-G). As in the Isala data, samples dominated by L. iners and L. crispatus at near-equal abundances were also observed there.
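Finding the samples that bridge the L. crispatus and L. iners regions comes down to ranking the taxa within each sample. In the sketch below, “near-equal” is taken to mean that the two abundances differ by at most two-fold (|log2 ratio| ≤ 1). The text does not state which cutoff was used, so this threshold, like the toy profiles, is hypothetical.

```python
import math
from typing import List, Mapping, Tuple

CRISPATUS = "L. crispatus group"
INERS = "L. iners group"


def top_two(profile: Mapping[str, float]) -> Tuple[str, str]:
    """Names of the two most abundant taxa in one sample's profile."""
    if len(profile) < 2:
        raise ValueError("profile needs at least two taxa")
    ranked = sorted(profile, key=lambda taxon: profile[taxon], reverse=True)
    return ranked[0], ranked[1]


def is_near_equal(profile: Mapping[str, float], a: str, b: str,
                  max_abs_log2_ratio: float = 1.0) -> bool:
    """True if taxa a and b are both present and within the given fold-change
    (hypothetical default: at most two-fold apart)."""
    x, y = profile.get(a, 0.0), profile.get(b, 0.0)
    if x <= 0 or y <= 0:
        return False
    return abs(math.log2(x / y)) <= max_abs_log2_ratio


# Hypothetical toy profiles.
samples: List[Mapping[str, float]] = [
    {CRISPATUS: 0.55, INERS: 0.40, "Gardnerella": 0.05},
    {CRISPATUS: 0.90, INERS: 0.08, "Gardnerella": 0.02},
    {INERS: 0.70, "Gardnerella": 0.25, CRISPATUS: 0.05},
]
# Samples whose two most abundant taxa are L. crispatus and L. iners.
bridging = [p for p in samples if set(top_two(p)) == {CRISPATUS, INERS}]
near_equal = [p for p in bridging if is_near_equal(p, CRISPATUS, INERS)]
share_near_equal = len(near_equal) / len(bridging)
```

The same top-two ranking also gives the second most abundant (sub)genus used to color the samples in Figure 3B.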
Figure 3 – Vaginal microbiome structure of the Isala cohort. (A) t-SNE plot of the microbiome samples in the Isala study, colored by the most abundant (sub)genus. Broad community state types (CSTs) are delineated with black lines, except CST IV, which is composed of the remaining samples. (B) Samples colored by the second most abundant (sub)genus. (C) Samples colored by the largest relative abundance in each sample. (D) Structure of the vaginal microbiome in the public VALENCIA dataset: t-SNE plot of all microbiome samples of the VALENCIA dataset (multiple time points per participant included), colored by the 13 CSTs presented in that paper. CST I: L. crispatus dominated (A: high relative abundance, B: lower relative abundance); CST II: L. gasseri dominated; CST III: L. iners dominated (A: high relative abundance, B: lower relative abundance); CST V: L. jensenii dominated; CST IV-A: Candidatus Lachnocurva vaginae (BVAB1) with some G. vaginalis; CST IV-B: G. vaginalis with a low relative abundance of Ca. L. vaginae; CST IV-C0: Prevotella; CST IV-C1: Streptococcus dominated; CST IV-C2: Enterococcus dominated; CST IV-C3: Bifidobacterium dominated; CST IV-C4: Staphylococcus dominated. Samples of the VALENCIA dataset colored by (E) the most abundant genus, (F) the second most abundant genus and (G) the largest relative abundance in each sample. The branching point between L. crispatus-dominated and L. iners-dominated samples is indicated with a gray line. Of note, BVAB1 corresponds to genus EU728721_g in the Isala dataset, where it occurred in only 1.4% of the participants (not visualized in panels A-B because it is not in the top 10).
The correlations between taxon abundances were investigated with SparCC, which accounts for the compositionality of relative abundance data34. Six main modules of intercorrelated taxa were identified (Figure 4). The first module contained the L. crispatus group, the L. jensenii group and Limosilactobacillus. Correlations between the taxa in this module were weakly positive (r = 0.18–0.40). A second module comprised a group of taxa that included Gardnerella, Sneathia, Atopobium and Aerococcus (Gardnerella module, r = 0.11–0.5). A third module contained the relatively strongly correlated Anaerococcus, Peptoniphilus and Finegoldia taxa (Anaerococcus module, r = 0.1–0.71), together with some more weakly correlated taxa such as Staphylococcus. A fourth module was composed of Prevotella and Dialister (r = 0.78), which jointly correlated positively with both the Gardnerella and Anaerococcus modules, while the latter two were negatively correlated with each other. A fifth module was composed of taxa associated with the gut, including Ruminococcus, Bacteroides and Subdoligranulum (Gut module, r = 0.16–0.28). Interestingly, the Gut module was positively correlated with the L. crispatus module. Finally, the sixth main module consisted of the L. iners group and the genus Ureaplasma. A few taxa did not show any strong correlations with other taxa, notably Bifidobacterium, Streptococcus and the L. gasseri group. When computing SparCC correlations in the VALENCIA dataset, we identified a striking concordance with the modules identified in the Isala dataset (Figures S6 and S7). In both datasets, the L. crispatus module showed moderately negative correlations with the taxa in the Gardnerella, Anaerococcus and Prevotella modules (-0.22, -0.15 and -0.27, respectively), which is in line with the previously documented inhibitory capacity of L. crispatus-dominated communities against these potential vaginal pathobionts35,36.
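The core idea behind SparCC34 can be illustrated with numpy. The variance of the log-ratio between two taxa does not depend on the unknown total microbial load of a sample. If most taxon-taxon correlations are close to zero, the per-taxon (basis) variances can be recovered from these log-ratio variances and then converted into correlations. The sketch below is a simplified, single-pass version and not the implementation used in the study. It uses one point estimate of the fractions with an assumed pseudocount of 0.5 instead of averaging over Dirichlet draws, and it omits the iterative exclusion of strongly correlated pairs. The function name `sparcc_single_pass` is hypothetical.

```python
import numpy as np


def sparcc_single_pass(counts, pseudocount: float = 0.5) -> np.ndarray:
    """Simplified single-pass SparCC correlation estimate.

    counts: array of shape (n_samples, n_taxa) with read counts.
    Returns an (n_taxa, n_taxa) matrix of estimated correlations between the
    log absolute (basis) abundances of the taxa.
    """
    counts = np.asarray(counts, dtype=float)
    n_taxa = counts.shape[1]
    if n_taxa < 3:
        raise ValueError("SparCC needs at least three taxa")
    # One point estimate of the fractions (full SparCC averages over Dirichlet draws).
    shifted = counts + pseudocount
    log_frac = np.log(shifted / shifted.sum(axis=1, keepdims=True))
    # Variation matrix t_ij = Var[log(x_i / x_j)]: free of each sample's total load.
    t = (log_frac[:, :, None] - log_frac[:, None, :]).var(axis=0)
    # If most correlations are ~0, then t_ij ~= v_i + v_j (v = basis variances),
    # so the row sums are t_i = (d - 2) * v_i + sum_k v_k. Solve this system for v.
    row_sums = t.sum(axis=1)
    sum_v = row_sums.sum() / (2 * (n_taxa - 1))
    v = np.clip((row_sums - sum_v) / (n_taxa - 2), 1e-12, None)
    w = np.sqrt(v)
    # Since t_ij = v_i + v_j - 2 * rho_ij * w_i * w_j, rearrange for rho_ij.
    rho = (v[:, None] + v[None, :] - t) / (2 * np.outer(w, w))
    return np.clip(rho, -1.0, 1.0)


# Hypothetical simulated data: ten independent taxa, except taxa 0 and 1,
# which rise and fall together.
rng = np.random.default_rng(0)
absolute = rng.lognormal(mean=3.0, sigma=1.0, size=(200, 10))
absolute[:, 1] = absolute[:, 0]
counts = np.round(absolute * 10)
rho = sparcc_single_pass(counts)
# rho[0, 1] is 1 after clipping; correlations between independent taxa stay near 0.
```

Modules such as those in Figure 4 would then be identified as groups of taxa connected by strong correlations in such a matrix. The clustering step is not shown here.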
Figure 4 – Six main modules of interacting microbes as defined by a compositional correlation analysis. Modules are enclosed in gray. Positive correlations in blue, negative correlations in red. Thickness of the line indicates the strength of the correlation. Exact correlations are given in Figures S6 and S7.
Our analysis also pointed to a strong correlation between the genus Limosilactobacillus and both the L. crispatus and L. jensenii groups (which were also positively correlated with each other). Limosilactobacillus taxa did not show a high average relative abundance (0.4%) in our dataset, but had a surprisingly high prevalence of 47.8% (Figure 2A and Table S1). Based on a case-by-case comparison of ASV sequences with a 16S reference database, we could assign the ASVs classified as Limosilactobacillus to one of three groups within the genus: a Limosilactobacillus reuteri group, the species Limosilactobacillus coleohominis and the species Limosilactobacillus fermentum. The L. reuteri group contained the species Limosilactobacillus reuteri, Limosilactobacillus vaginalis and five other species that are not known to occur in the human vagina. We found a prevalence of 43.7% for the L. reuteri group, 11.5% for L. coleohominis and 4.1% for L. fermentum. In addition to our large dataset of amplicon-sequenced samples, we also inspected the 264 vaginal metagenomes of the VIRGO metastudy for the presence of Limosilactobacillus species. The most prevalent species were L. coleohominis (25%), L. vaginalis (20%) and L. fermentum (1%)37. L. fermentum was most frequently cultured from a subset of 592 vaginal swabs, with even more isolates obtained than for L. crispatus and L. jensenii under standard growth conditions for lactobacilli (Table S1). Overall, culturing of the vaginal lactobacilli was cumbersome under the standard conditions and remains to be further optimized.
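A taxon can be present in almost half of the samples yet have a low mean relative abundance (here 47.8% versus 0.4%), because the two numbers summarize the same abundance table in different ways. The minimal numpy sketch below assumes that a taxon counts as present whenever its relative abundance is above zero. The detection rule actually used is not stated in this section, and the toy table is hypothetical.

```python
import numpy as np


def prevalence_and_mean(rel_abund, detection_limit: float = 0.0):
    """Per-taxon prevalence (share of samples in which the taxon is detected)
    and mean relative abundance across all samples.

    rel_abund: array of shape (n_samples, n_taxa) with rows summing to 1.
    """
    rel_abund = np.asarray(rel_abund, dtype=float)
    prevalence = (rel_abund > detection_limit).mean(axis=0)
    mean_abundance = rel_abund.mean(axis=0)
    return prevalence, mean_abundance


# Hypothetical table: columns = [dominant Lactobacillus, low-abundance taxon, other].
rel = np.array([
    [0.95, 0.01, 0.04],
    [0.90, 0.00, 0.10],
    [0.97, 0.03, 0.00],
    [0.80, 0.00, 0.20],
])
prevalence, mean_abundance = prevalence_and_mean(rel)
# The second taxon is found in half of the samples but averages only 1% abundance.
```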
Impact of host covariates on the vaginal microbiome. We then analyzed the association of personal data with key features of the vaginal microbiome (Figure 5). As an alternative to reducing the dimensionality of the microbial community data through a classification into CSTs, we selected alpha- and beta-diversity metrics, twelve individual (sub)genera of interest and the eigentaxa (see Methods) of the four largest modules of intercorrelated taxa for association testing. The functional relevance of this module-based approach was supported by the association observed between a change in discharge and an increase in the Gardnerella-module, which was not observed for any of the specific taxa. Similarly, a lower relative abundance of the L. crispatus-module was specifically associated with an increased number of vaginal complaints. Because age had the largest effects, the data were also adjusted for this parameter.
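The eigentaxa are defined in the Methods, which fall outside this section. One plausible definition, assumed here, is analogous to the “eigengene” of weighted correlation network analysis: the first principal component of the module’s standardized, CLR-transformed abundances. Under that assumption, a module summary could be computed as sketched below. The pseudocount, the standardization and the sign convention are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def clr(rel_abund, pseudocount: float = 1e-6) -> np.ndarray:
    """Centered log-ratio transform of each sample (row) over all taxa."""
    log_x = np.log(np.asarray(rel_abund, dtype=float) + pseudocount)
    return log_x - log_x.mean(axis=1, keepdims=True)


def module_eigentaxon(rel_abund, module_idx) -> np.ndarray:
    """One score per sample: first principal component of the standardized
    CLR values of the taxa in one module (columns `module_idx`)."""
    z = clr(rel_abund)[:, module_idx]
    z = (z - z.mean(axis=0)) / z.std(axis=0)  # assumes no constant taxa
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    score = u[:, 0] * s[0]
    # The sign of a principal component is arbitrary; orient the score so that
    # it increases with the module's average standardized CLR value.
    if np.corrcoef(score, z.mean(axis=1))[0, 1] < 0:
        score = -score
    return score


# Hypothetical data: taxa 0-2 share a latent driver and form one module.
rng = np.random.default_rng(1)
latent = rng.normal(size=50)
log_abund = rng.normal(scale=0.3, size=(50, 6))
log_abund[:, :3] += latent[:, None]
rel = np.exp(log_abund)
rel /= rel.sum(axis=1, keepdims=True)
eigentaxon = module_eigentaxon(rel, [0, 1, 2])
```

A single score per sample and module allows each module to be tested against a covariate in the same way as an individual taxon.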
Figure 5 - Statistical analysis of the association of different personal, reproductive, lifestyle, health, hygiene, environmental and dietary factors with the vaginal microbiome space. Each panel displays effects on a different level of the microbiome: (A) the effect on the beta-diversity between the samples (Adonis test), (B) the effect on the alpha-diversity of the samples, (C) the effect on the abundances of specific taxa and on the eigentaxa of the modules discovered in the SparCC correlation analysis. The A and G modules refer to the Anaerococcus and Gardnerella modules, respectively. Asterisks represent significant associations (FDR-adjusted, using a threshold of 0.05; white and black asterisks differ for visualization purposes only). The number of samples for each question covered almost the entire study (n = 3,043 participants). Due to missing data or specific comparisons, this number can deviate; detailed counts are provided in Table S3.
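The caption reports significance after FDR adjustment at a threshold of 0.05 but does not name the procedure. Assuming the common Benjamini-Hochberg method, the adjustment can be sketched as follows. The statsmodels library provides the same correction through `multipletests(..., method="fdr_bh")` in `statsmodels.stats.multitest`. The p-values below are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each adjusted value is the minimum of itself and
    # the adjusted values of all larger p-values.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Hypothetical p-values for four covariate-taxon tests.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = q < 0.05
```

In this toy example, only the first test remains significant after adjustment, even though three of the raw p-values are below 0.05.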
Besides age, having had children showed the strongest association with beta-diversity, explaining 1.4% of the microbiome variation. It was significantly negatively associated with the abundance of L. crispatus, L. jensenii and Limosilactobacillus (the L. crispatus-module), and positively with Bifidobacterium, L. gasseri and Streptococcus. Breastfeeding at the time of sampling was associated with beta-diversity, a lower relative abundance of L. crispatus and Limosilactobacillus, and higher levels of Streptococcus. Being “peri- or post-menopausal” did not show a significant association with beta-diversity, but was correlated with an increased alpha-diversity and higher levels of Streptococcus, Prevotella and the Anaerococcus-module. Having had intercourse in the last 24 hours was associated with a higher alpha-diversity and higher levels of Anaerococcus, Finegoldia and, in particular, Streptococcus. We also investigated the associations of partnership with the vaginal microbiome. Compared to not being sexually active, having a monogamous relationship correlated with beta-diversity and higher levels of Streptococcus, but no associations were noted for alpha-diversity. However, having multiple partners was linked with a higher alpha-diversity and higher levels of the Gardnerella-module, but also with higher levels of the L. crispatus-module and lower levels of the Anaerococcus-module. Having a male partner was associated with lower levels of L. jensenii and the L. crispatus-module, compared to having a female partner. The impact of the stage of the menstrual cycle was evaluated for pre-menopausal participants not taking any related hormonal contraceptives, with the follicular phase starting on the first day of menstruation and the luteal phase starting after ovulation (Figure S8). As expected, the follicular phase was associated with a higher alpha-diversity, together with lower levels of the L. crispatus-module and higher levels of Prevotella and the Gardnerella- and Anaerococcus-modules, compared to the ovulation and luteal phases. The opposite was true for the luteal phase (compared to the ovulation and follicular phases). Combining the data for contraceptives with a high predicted exogenous estrogen level (combination pill, vaginal ring or patch) showed an association with an increase in the L. crispatus-module and a decrease in the Gardnerella-module. The oral combination contraceptive pill, which disrupts the natural cycle and contains estrogen and progestin23, correlated with a lower alpha-diversity and lower relative abundances of Prevotella and Gardnerella, but higher levels of the gut taxa module. Use of a contraceptive ring was linked to a significantly lower alpha-diversity and lower levels of Prevotella. Use of a hormonal intrauterine device (containing only progestin) was associated with higher levels of the Gardnerella-module. Having been vaccinated against HPV was linked to lower levels of the Gardnerella-module. Furthermore, we observed associations for menstrual hygiene products, with a menstrual cup appearing more beneficial for the L. crispatus-module and pads being associated with an increased alpha-diversity. Menstrual pads were also significantly associated with lower levels of the L. crispatus-module and higher levels of the Anaerococcus-module, especially when used in the last 48h. Wiping the vulva from front to back after a bathroom visit was associated with lower levels of the gut taxa module in the vagina.
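Many of the associations above are reported as higher or lower alpha-diversity. This section does not specify which alpha-diversity index was used. The sketch below assumes the Shannon index, one common choice, computed from per-sample counts. A sample dominated by a single Lactobacillus taxon scores low, whereas a more even community scores high. The counts are hypothetical.

```python
import math
from typing import Iterable


def shannon_index(counts: Iterable[float]) -> float:
    """Shannon diversity H = sum_i p_i * ln(1 / p_i) over taxa with nonzero counts."""
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("sample has no reads")
    return sum((c / total) * math.log(total / c) for c in positive)


lactobacillus_dominated = shannon_index([940, 30, 20, 10])  # hypothetical counts
more_even = shannon_index([300, 250, 250, 200])             # hypothetical counts
```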
Among the general health and lifestyle factors that were surveyed, BMI had the largest effect: it was significantly associated with beta-diversity, a higher alpha-diversity and higher levels of bacteria in the Anaerococcus-module. Specific dietary components were also linked with the overall composition and diversity of the vaginal microbiome when adjusting for age. The consumption of sugary beverages was notably associated with beta-diversity and with lower levels of the L. crispatus-module, while the consumption of light beverages (marketed as diet, sugar-free, zero-calorie or low-calorie) in the last 24h was associated with a significantly higher alpha-diversity and higher levels of Bifidobacterium. A high seed consumption was significantly associated with beta-diversity, but not with the specific taxa or modules that we examined. A high frequency of vegetable consumption and its associated fibers, particularly in the last 24h, and being pescatarian were associated with a minor increase in the L. crispatus-module. Ethanol consumption in the past 24h was associated with higher levels of the L. crispatus- and gut taxa modules. Meat consumption was linked to lower levels of the L. crispatus-module and higher levels of Prevotella and the Anaerococcus-module. Significantly lower levels of the Anaerococcus-module taxa occurred when probiotic capsules had been consumed in the last 24 hours. In contrast, consumption of probiotic yoghurts in the last 24h was associated with a lower relative abundance of L. gasseri.
Additional lifestyle factors other than diet were also evaluated. Sleeping less than seven hours per weeknight corresponded to a significantly higher alpha-diversity and higher levels of Anaerococcus and Finegoldia, while sleeping between 7 and 8.5 hours corresponded to a lower alpha-diversity. In addition, smoking was associated with higher alpha-diversity and higher levels of the Gardnerella-module. While drug use was not linked to the diversity of vaginal samples, it was linked to higher levels of L. iners and Limosilactobacillus. Income inequality within couples did not show a significant effect on the vaginal microbiome, but being below the Belgian poverty threshold was linked to a higher alpha-diversity and, in particular, higher levels of Gardnerella. Being born in Belgium and living there for the first 3 years of life was associated with significantly lower levels of the Gardnerella-module. Furthermore, living in a more urbanized/polluted area (i.e., city center, village center, busy road, industrial zone) was associated with lower levels of Streptococcus than living in a suburban/countryside environment (i.e., residential area, rural area, green zone/recreation zone).
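The alpha-diversity index behind these associations is not named in this section. As a hedged illustration only, two commonly used indices, Shannon and inverse Simpson, can be computed from one sample's taxon read counts as below; the choice of index is an assumption and the read counts are invented.

```python
import math
from typing import Sequence


def _proportions(counts: Sequence[float]) -> list:
    """Convert read counts to relative abundances, ignoring absent taxa."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts if c > 0]


def shannon(counts: Sequence[float]) -> float:
    """Shannon index H' = -sum(p * ln p) over the taxa present in the sample."""
    return -sum(p * math.log(p) for p in _proportions(counts))


def inverse_simpson(counts: Sequence[float]) -> float:
    """Inverse Simpson index 1 / sum(p^2); equals n when n taxa are equally abundant."""
    return 1.0 / sum(p * p for p in _proportions(counts))


# Invented read counts per taxon for two samples
lacto_dominated = [9500, 300, 150, 50]
even = [2500, 2500, 2500, 2500]
for name, sample in [("lacto_dominated", lacto_dominated), ("even", even)]:
    print(name, round(shannon(sample), 3), round(inverse_simpson(sample), 3))
```

A sample dominated by a single taxon scores low on both indices, whereas evenly spread reads give the maximum for that number of taxa.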
Together, all significant factors mentioned above explained 8.01% of the variation in the vaginal microbiome, compared to 7.63% of the variation explained by covariates in a related study on the gut microbiome in the Belgian population38.
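This section does not state how the explained variation was computed, and the sketch below does not reproduce the combined multi-covariate estimate. As an assumption-labeled illustration of "variation explained" for a single categorical covariate, it computes the R² of a one-way PERMANOVA on Bray-Curtis dissimilarities from scratch; the profiles and covariate are invented.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def bray_curtis(x: Sequence[float], y: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles over the same taxa."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def permanova_r2(profiles: List[Sequence[float]], groups: List[str]) -> float:
    """R^2 of a one-way PERMANOVA: share of the total sum of squared
    dissimilarities that lies between the levels of one categorical factor."""
    n = len(profiles)
    sq = {(i, j): bray_curtis(profiles[i], profiles[j]) ** 2
          for i, j in combinations(range(n), 2)}
    ss_total = sum(sq.values()) / n
    if ss_total == 0:
        return 0.0  # all profiles identical: nothing to explain
    members: Dict[str, List[int]] = {}
    for idx, level in enumerate(groups):
        members.setdefault(level, []).append(idx)
    # Within-group sum of squares: each group's pairwise sum divided by its size
    ss_within = sum(
        sum(sq[pair] for pair in combinations(idxs, 2)) / len(idxs)
        for idxs in members.values()
    )
    return 1.0 - ss_within / ss_total


# Invented relative abundances (two taxa) and a binary covariate
profiles = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
print(round(permanova_r2(profiles, ["yes", "yes", "no", "no"]), 3))
```

In practice the significance of such an R² is assessed by permuting the group labels; that step is omitted here for brevity.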
Discussion
The Isala citizen-science project on the vaginal microbiome was inspired by a strong need for a better understanding of the vaginal microbiome outside a clinical setting. The enthusiasm of participants willing to donate intimate samples is in line with the current trend of more women taking their health into their own hands. The fully remote design of our study had both advantages and limitations. No blood samples, clinical exams or host genetics data could be obtained, but the fully remote setting and large online questionnaire also provided us with unique opportunities to gain widespread access to samples and intimate data. Other inherent limitations of our study cohort were a slight bias towards high socioeconomic status, as in many other citizen-science studies39,40, and the fact that we had to rely on a single sampling timepoint per participant. On the other hand, the fact that intimate self-sampling could be done in the privacy of the home had a positive impact on the number of women willing to participate, resulting in a large, diverse set of samples with sufficient variation to study key parameters such as age, BMI, menstrual cycle, contraceptive use, menopausal status, obstetrical parameters, sexual and vaginal health, diet, income, and sleeping habits. Because all samples were analyzed in the same lab within a short timeframe, technical variability was minimized. Taken together, this study set-up enabled us to obtain novel insights into the average vaginal microbiome constellation of this self-reported healthy Western European population.
The first key finding of this work was the high number of participants with Lactobacillus dominance in this Western European population cohort: 75% of the women were dominated by Lactobacillus taxa, in particular by taxa belonging to the L. crispatus and L. iners groups, comparable to similar studies3,20. Subgenus- or group-level classification was preferred because it better reflects the ASV diversity than the classifications generally reported. The L. crispatus group (163 ASVs) was detected in 43.2% of the participants. L. iners was dominant in 27.7% of the participants. As we and others have previously reviewed, L. iners has an ambiguous role in the vagina41. The fact that we found L. iners to be so prevalent in complaint-free women suggests that it is more often a friend than a foe in healthy women. At the same time, we observed a high diversity of ASVs for L. iners (120 unique ASVs), in line with previous suggestions of different L. iners clones with distinct functional properties42. Similarly, Gardnerella was dominant in 9.8% of the Isala women, although it is often considered a pathobiont in the vagina. Yet, the association of Gardnerella with symptoms and disease appears to depend on the specific species and strains43,44, the other members of the vaginal community45 and the host46. This context- and taxon-dependent role of vaginal bacteria highlights the importance of capturing the diversity of the vaginal ecosystem in the most biologically relevant way. Previously proposed CST schemes range from five3 to thirteen CSTs15. CSTs often confuse clinicians and researchers, as they have been mainly proposed for statistical and epidemiological purposes15, and should not be interpreted as stable community states. With t-SNE embedding analyses, we clearly showed that the vaginal microbiome space is a continuum, highlighting that CSTs should not be interpreted as fully discrete states of the vaginal microbiome, as is now also increasingly recognized15,47,48. For example, the two most abundant taxa, the L. crispatus and L. iners groups, frequently co-occurred in varying and even equal proportions. As an alternative approach to maximally capture the diversity of the microbial space while still enabling the analysis of associations with as many metadata as possible, we introduced modules of interacting vaginal bacterial taxa (with positive correlations within and mostly negative correlations between modules), for which we computed eigentaxa for correlation analyses. The taxa-taxa correlations likely reflect relevant biological phenomena, including positive or negative microbial dependencies such as cross-feeding49–51 and inhibition via antimicrobial production52, but also different immune or inflammatory states of the host, where different host “states” enrich or restrict different bacteria45. The fact that we could validate the existence of these modules in another large independent dataset (VALENCIA) highlights their biological relevance and existence independent of our dataset, in contrast to CSTs obtained by hierarchical clustering, which are more dataset-dependent.
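This section does not spell out how the eigentaxa were computed. One plausible reading, by analogy with module eigengenes in co-expression analysis, is the first principal component of a module's taxa after a compositional transform. The sketch below assumes a centered log-ratio (CLR) transform with a pseudocount of 1 and per-taxon standardization; these choices and the synthetic data are assumptions, not the authors' procedure.

```python
import numpy as np


def clr(counts: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """Centered log-ratio transform per sample (rows = samples, columns = taxa)."""
    logx = np.log(counts + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def eigentaxon(clr_abund: np.ndarray, module_cols: list) -> np.ndarray:
    """First principal component score of the module's taxa, one value per sample.

    Each taxon is standardized first (a taxon with zero variance would divide by
    zero and should be dropped beforehand). The sign of a principal component is
    arbitrary, so it is flipped to correlate positively with the module mean.
    """
    x = clr_abund[:, module_cols]
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    score = u[:, 0] * s[0]
    if np.corrcoef(score, z.mean(axis=1))[0, 1] < 0:
        score = -score
    return score


# Synthetic data: taxa 0-2 share a latent driver (a "module"), taxa 3-4 do not
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
log_abund = np.column_stack(
    [5 + latent + rng.normal(scale=0.2, size=200) for _ in range(3)]
    + [5 + rng.normal(scale=0.2, size=200) for _ in range(2)]
)
counts = np.round(np.exp(log_abund))
score = eigentaxon(clr(counts), [0, 1, 2])
print(np.corrcoef(score, latent)[0, 1])
```

The resulting one-value-per-sample score can then be correlated with questionnaire covariates in place of the individual taxa of the module.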
The L. crispatus-module probably reflects the most common healthy homeostatic state, based on the known associations of these lactobacilli with vaginal health53 and our own observations that this module was less abundant in women reporting more vaginal complaints and more abundant at higher estrogen levels. Notably, the association between this module and vaginal complaints was not found for the individual taxa, showing the added value of implementing these modules. Another unprecedented finding for this module is the prevalence and possibly stabilizing capacity of Limosilactobacillus. This genus was highly prevalent, occurring in almost 50% of the women sampled, and proved easier to culture than the classic big four (i.e., L. crispatus, L. iners, L. gasseri and L. jensenii). Positive interactions between different taxa of lactic acid bacteria are very common in food fermentations dominated by lactic acid bacteria. In yoghurt, for instance, Streptococcus thermophilus and Lactobacillus delbrueckii subsp. bulgaricus exchange crucial metabolites, a process called protocooperation49. In kefir, it was recently shown that Lactobacillus kefiranofaciens, which dominates the kefir community, uses kefir grains to bind together all other microbes that it needs to survive50. Such mutualistic interactions have also been observed for related Lactobacillus taxa within vertebrate hosts. For example, in the rodent gastrointestinal tract, Lactobacillus johnsonii needs L. reuteri for biofilm formation54. It appears plausible that a similar interaction occurs in the vagina between species of the same two genera, where one or more Limosilactobacillus species support L. crispatus and L. jensenii as keystone taxa. Of note, one of the most widely used vaginal probiotics, L. reuteri RC-14, has been shown to have the capacity to prevent BV in women with HIV55,56 and to improve the BV cure rate with a single dose of tinidazole57. Yet, in these previous studies, it is difficult to differentiate the effect of L. reuteri RC-14 from that of the other probiotic strain applied, Lacticaseibacillus rhamnosus GR-157.
While the L. crispatus-module contains presumed health-associated taxa, three of the modules contain taxa previously associated with dysbiosis: the Gardnerella-module consists mostly of taxa associated with BV58,59, while the Anaerococcus- and Prevotella-modules also contain taxa previously associated with BV45,60, but also with more inflammatory host states such as AV6,7,61, endometriosis62 and PID63. The negative correlation between the Gardnerella- and Anaerococcus-modules is in line with the view that BV and more inflammatory states such as AV are different forms of dysbiosis with different underlying causes7. In this light, the positive correlation of the Prevotella-module with both modules is harder to explain and requires further investigation. Interestingly, the number of different vaginal complaints reported by the participants was not significantly associated with any of the three modules containing taxa known to be dysbiosis-associated, but only with a reduction of L. crispatus-module taxa. This suggests that the presence of these modules in itself is not sufficient for a dysbiotic state to develop; such a development would require an extra host-side factor, such as a lack of immune control (as is sometimes thought for BV45,64) or the development of an inflammatory state (as observed in AV)6,18. For change in discharge, it is noteworthy that we found a clear association with the Gardnerella-module, but not with the individual taxa, highlighting again the relevance of microbe-microbe interactions. Similarly, we interpret our observation of a gut taxa module as reflecting the existence of a gut-vagina axis, which is not only a source of potential urogenital pathogens but also of beneficial colonizers. For the latter, the positive correlation with the L. crispatus-module is of particular interest.
Having established this updated picture of the vaginal microbiome constellation and having collected a large dataset of personal data via questionnaires, we could then perform an in-depth analysis of covariates. We could confirm previously found associations, such as for BMI65, the contraceptive pill66 and smoking67. The fact that in our dataset especially estrogen-containing contraceptives were positively associated with the levels of the L. crispatus-module, and were also linked to lower levels of the Gardnerella-module, is in a way reassuring, given that the combination pill is so widely used in Western Europe and completely abolishes the spontaneous menstrual cycle. A disruption of the vaginal microbiome does not seem to be a major side effect of the combination pill, although we and many Isala participants acknowledge the existence of other side effects, including an impact on mood and libido68–70 and an increased risk of venous thromboembolism71,72, which are important to consider when choosing the most suitable contraceptive method for each individual. Notably, the association found here between a progestin-containing IUD and an increased Gardnerella-module could be included in the information provided to women choosing this contraceptive method. Our data are in line with clinical data showing that insertion of a hormonal IUD temporarily increases BV and, over time, increases Candida spp. colonization in the vagina73, while systemic progestin-only contraceptives appear to have mixed effects on the vaginal microbiome74.
The life event with the most significant impact on the vaginal microbiome was having children or having been pregnant, which correlated with an overall reduction in L. crispatus, L. jensenii and Limosilactobacillus levels and an increase in Streptococcus, Bifidobacterium and L. gasseri levels. A higher taxonomic resolution was not possible, but these three genera contain taxa that are beneficial to babies as initial colonizers of the oral cavity and gut of newborns75. It has previously been shown that most women experience a postdelivery disturbance in their vaginal microbiome, characterized by a decrease in Lactobacillus species and an increase in diverse anaerobes, which persisted for up to one year76. Surprisingly, in our Isala dataset we observed the signature of a reduced L. crispatus-module and increased Streptococcus, Bifidobacterium and L. gasseri across women with biological children, independent of their age. This suggests that the impact of pregnancy could be long-lasting. We have at present no explanation for this phenomenon, although we acknowledge that our cohort is rather young (mean age 31.8 ± 9.5 years). Of note, breastfeeding women (who had recently delivered) showed similar and even stronger associations for a reduction in L. crispatus and an increase in Streptococcus. Hormonal and associated sugar-level changes during pregnancy (including lower estrogen levels during breastfeeding), as well as cervical shortening, could all be involved and provide interesting avenues for further research. Moreover, whether childbirth took place vaginally or abdominally (C-section), the latter with or without preceding labor (i.e., secondary or primary C-section), may have played a major role, which remains to be elucidated in further studies.
Another intriguing finding of our Isala citizen-science study is that dietary choices could have a small but significant impact. For example, intake of vegetable fibers, alcohol consumption and being pescatarian were significantly associated with higher levels of the L. crispatus-module, while drinking sugary beverages was associated with lower levels. These associations should obviously be interpreted with care and not be taken as direct recommendations for lifestyle improvements. Alcohol consumption, for example, was associated with a higher abundance of the L. crispatus-module, but has an established detrimental impact on the gut microbiome77. By contrast, limiting the intake of sugary drinks appears to be a lifestyle intervention that benefits multiple habitats of the human body. Another intriguing finding was the different associations found for probiotic capsules versus yoghurts, possibly because different strains and species are consumed with these products. Consumption of probiotic capsules was associated with lower levels of the Anaerococcus-module, and consumption of probiotics in general in the last 24 hours with higher L. gasseri levels, while probiotic yoghurt was associated with lower L. gasseri levels. Unfortunately, our questionnaires lacked detailed information on the specific species and strains in the probiotic products consumed by the Isala participants. Ultimately, dedicated intervention studies with specific foods or diets, hygiene measures and/or probiotic species and strains should further substantiate the associations found here and help to design dedicated pharmaceutical and microbiome interventions.
Conclusion
In this large-scale remote-sampling study, we showed that the vaginal microbiome of women from Belgium is mainly dominated by lactobacilli. We demonstrated that the vaginal microbiome is a continuum, in which taxon compositions that fall in between the classical community state types are frequently observed. Furthermore, we showed that most vaginal taxa show small to moderate positive or negative abundance correlations with other taxa, and that positively interacting vaginal taxa can be summarized by grouping them into modules of intercorrelated taxa. In addition, we measured 166 participant covariates through questionnaires. Our results showed that some of these factors explain a small but significant part of the vaginal microbiome variation, with “having had children” explaining the largest fraction of the variation after age. Finally, we highlighted that, given well-considered communication tools and style, women are eager to participate in taboo-breaking conversations as well as scientific studies aimed at improving their health. We therefore endorse citizen science as a powerful approach to facilitate large-scale intimate microbiome research and to empower citizens to impact their individual and community-level health by promoting open, science-based communication on taboo subjects.
Acknowledgements
We would like to first and foremost thank all Isala volunteers for their enthusiastic participation and for donating samples. The following colleagues and students contributed substantially to the Isala sampling campaign and sample processing: Ines Tuyaerts, Nele Van Vliet, Leen Van Ham, Annelize Groenwals, Samira El Messaoudi, Jana Hiers, Laura Van Dyck, Lize Delanghe, Caroline Dricot, Lore Leysen, Lara Martin Diaz, Marie Legein, Dieter Vandenheuvel, Ilke De Boeck and Eline Cauwenberghs. Strategic communicative support was provided by Liesbeth Talboom and Liesbeth Haesevoets (Studio Maria), Csaba Varszegi (Little Big Things, website), Ruth Broms and Svenja Vergauwen (Sensoa vzw), Elly Den Hond and Carmen Franken (Provinciaal Instituut voor Hygiëne), Nina Van Eekert, Naomi Biegel and Leen De Kort (Sociology Department, UAntwerpen) and Camille Allonsius (Biology Department, UAntwerpen). We greatly appreciate the guidance provided by Jeroen Raes on logistical lessons learned from the Flemish Gut Flora project, and we thank the Antwerp Biobank (University Hospital Antwerp) for their administrative support with biobanking the large number of samples. The Centre of Medical Genetics (University Hospital Antwerp) and Neuromics Support Facility (VIB-UAntwerpen) provided sequencing support. Lastly, we would like to thank Isala ambassador Evi Hanssen, collaborating influencers on social media and supportive science journalists for spreading the word on Isala and helping us build an online community that openly discusses vaginal health with the aim to break the taboo.
Author contributions
SL, SA, EO, SW, GD, VV and CDB designed the study and worked on the conceptualization of the research project. SL, SA, TG, TE, JD, SC, EO, IS, SW, CM and WVB worked on the questionnaire set-up and cleaned the answers. SA, SL, JD, EO, TE and LVD carried out the experimental and logistical work. SW and TG processed the sequencing data and performed the biostatistical analyses. TG, SW, SA, SC and SL worked on the visualizations. SL, SW, TG, SA, VV, GD, SC, JD, IS, PAB and CM contributed to the interpretation of the results. SL, SA, SW and TG wrote the original manuscript. All authors contributed to reviewing and editing of the final manuscript.
Funding
The authors wish to acknowledge the following funding bodies: the European Research Council (ERC; starting grant Lacto-Be 26850 of SL), the Special Research Fund of the Universiteit Antwerpen (UA BOF; DOCPRO 37054 grant of SA), the Inter-university Special Research Fund of Flanders (iBOF; POSSIBL project) and the Research Foundation - Flanders (FWO; aspirant fundamental research grant 11A0620N of SW, senior postdoc research grant 1277222N of IS and Research project FN701000004 of SL).
Competing interests
SL is a voluntary academic board member of ISAPP (the International Scientific Association on Probiotics and Prebiotics, www.isappscience.org) and chairperson of the scientific advisory board of YUN (yun.be). PAB is an independent consultant for several companies in the food and pharmaceutical industry. GD is the chairperson of Femicare vzw (femicare.net) and has worked as a medical consultant for various industries. However, none of these organizations or companies was involved in the design, communication or data analysis of this Isala study, which was fully funded by university, governmental and European funding, with the largest part funded by the ERC StG project Lacto-Be.
Methods
Study cohort and data collection
The study was approved by the Ethical Committee of the Antwerp University Hospital/University of Antwerp (B300201942076) and registered online at clinicaltrials.gov with the unique identifier NCT04319536. The call for participants was launched on March 24th, 2020, with the only inclusion criteria being not pregnant and at least 18 years old. Within ten days, 6,007 women registered through the Isala website (https://isala.be/en/) by answering five questions on age, postal code, previous pregnancies, country of residence in the first three years of life and contraceptive use. After digital informed consent was obtained, these participants were invited to fill out a large online questionnaire that included 137 relevant and GDPR-compliant questions on the Qualtrics platform (Qualtrics, Provo, UT, USA). The 4,681 participants who completed the entire questionnaire were invited to enter their address on the website to receive an Isala self-sampling kit. Eventually, 4,106 self-sampling kits were sent out and 81.5% of the kits were returned to the University of Antwerp between July and October 2020. Two vaginal swabs were self-collected in a standardized way by non-pregnant participants (n = 3,323), and 3,294 participants filled out a short follow-up questionnaire with 39 questions within 24 hours of sampling.
Each kit contained two vaginal swabs. First, the eNAT™ swab (Copan, Brescia, Italy), intended for microbiome profiling, was collected, and immediately afterwards the ESwab™ (Copan, Brescia, Italy), intended for culturomics and metabolomics. The package insert instructed participants to rotate both swabs 2-3 times to acquire enough biomass. Immediately after sampling, the swabs were to be transferred to a vial containing the commercial eNAT or ESwab transport buffer and stored in the fridge at home. Finally, all samples were transported at room temperature with prepaid services by the national parcel service (Bpost), with an average transport time of 2.9 ± 3.3 days (n = 3,306); 92.8% of the samples arrived within 7 days of sampling. Upon arrival, the eNAT swabs were stored at -20°C until further processing in the lab78. The ESwab was vortexed for 15 seconds and split into two 500 µL aliquots: the first was stored at -80°C in a 96-tube Micronic plate with 500 µL 50% glycerol, and the second was centrifuged for 3 min at 13,000 g, after which its supernatant was stored in a 96-tube Micronic plate at -80°C as well.
16S rRNA amplicon sequencing
Before further processing, all samples were vortexed for 15-30 seconds, and DNA was extracted with the DNeasy PowerSoil Pro Kit, either manually or automatically with the QIAcube (Qiagen, Hilden, Germany), according to the manufacturer's instructions. The DNA concentration of all samples was measured using the Qubit 3.0 Fluorometer (Life Technologies, Ledeberg, Belgium) according to the manufacturer's instructions. At least 2 µL of each bacterial DNA sample was used to amplify the V4 region of the 16S rRNA gene, using standard barcoded forward (515F) and reverse (806R) primers78. These primers were adapted for dual-index paired-end sequencing, as described in Kozich et al. (2013)79. The resulting PCR products were checked on a 1.2% agarose gel. The PCR products were then purified using the Agencourt AMPure XP Magnetic BeadCapture Kit (Beckman Coulter, Suarlee, Belgium) and the concentration of all samples was measured using the Qubit 3.0 Fluorometer. Next, a library was prepared by pooling all PCR products in equimolar concentrations. This library was loaded onto a 0.8% agarose gel and purified using the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel). The final concentration of the library was measured with the Qubit 3.0 Fluorometer. Afterwards, the library was denatured with 0.2 N NaOH (Illumina, San Diego, CA, USA), diluted to 6 pM and spiked with 10-15% PhiX control DNA (Illumina). Finally, dual-index paired-end sequencing was performed on a MiSeq Desktop Sequencer (Illumina). All DNA samples, as well as negative controls of both the PCR (PCR-grade water) and the DNA extraction runs, were included on the sequencing runs. In total, samples were sequenced across nine different MiSeq runs.
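The text states that PCR products were pooled in equimolar concentrations but not how the volumes were derived. A common approach, sketched here under assumptions, converts each Qubit mass concentration to molarity with the approximation of 660 g/mol per base pair of double-stranded DNA and then takes the volume that contributes the same number of femtomoles per sample. The amplicon length, target amount and readings below are placeholders, not values from this study.

```python
from typing import Dict

AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def ng_per_ul_to_nm(conc_ng_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/uL) into molarity (nM)."""
    return conc_ng_ul * 1e6 / (AVG_BP_MASS * length_bp)


def equimolar_volumes(conc_ng_ul: Dict[str, float], length_bp: int,
                      fmol_per_sample: float) -> Dict[str, float]:
    """Volume (uL) of each purified PCR product that adds the same amount of DNA,
    in femtomoles, to the pool (1 nM = 1 fmol/uL)."""
    volumes = {}
    for sample, conc in conc_ng_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: no measurable DNA, cannot be pooled")
        volumes[sample] = fmol_per_sample / ng_per_ul_to_nm(conc, length_bp)
    return volumes


# Hypothetical Qubit readings (ng/uL), placeholder amplicon length and target amount
readings = {"S1": 10.0, "S2": 5.0, "S3": 20.0}
volumes = equimolar_volumes(readings, length_bp=400, fmol_per_sample=50)
for sample, vol in volumes.items():
    print(sample, round(vol, 2), "uL")
```

Because all amplicons share roughly the same length, the pipetted volume simply scales with the inverse of each sample's measured concentration.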
To validate our amplicon sequencing pipeline, including Lactobacillus subgenus classification, we sequenced samples from the Isala pilot study in Ahannach, Delanghe, et al. (2021)78 with both amplicon and shotgun sequencing. These samples were processed in the same way as the Isala samples, except that DNA extraction was performed with the HostZERO Microbial DNA Kit (Zymo Research, California, United States). These samples were sequenced across two different MiSeq sequencing runs.
Metagenomic shotgun sequencing (Isala pilot study samples)
For the metagenomic shotgun sequencing of samples from the Isala pilot study, library preparation was performed using the Nextera™ DNA Flex Library Prep or Nextera™ XT DNA Library Preparation kit (Illumina), according to the manufacturer's instructions. For the Nextera™ DNA Flex Library Prep, 2-30 µL of DNA sample was used to obtain an input DNA amount between 1 and 100 ng. For the Nextera™ XT DNA Library Preparation kit, 1 ng of DNA in 5 µL was used as input. For both protocols, when 1 ng of input DNA could not be obtained for a certain DNA sample, library preparation was continued with the highest available amount of input DNA. Libraries were pooled based on individual concentration measurements with the Qubit 3.0 Fluorometer. During library preparation, library quality was checked using the 5200 Fragment Analyzer System with the Agilent High Sensitivity NGS Fragment Kit (DNF-474): 22 µL of NGS Diluent Marker solution was mixed with 2 µL of library and run on the Fragment Analyzer according to the manufacturer's instructions, with the NGS DNA Ladder as standard. Finally, the library was sequenced on a MiSeq desktop sequencer. In total, shotgun samples were sequenced on two MiSeq runs.
Creation of custom taxonomic reference databases
To increase taxonomic resolution for the genus Lactobacillus, the genus was split into nine subgenera. These subgenera were defined in three steps. First, a maximum-likelihood species phylogeny of the genus was constructed with the software IQ-TREE80, using amino acid sequences of 100 single-copy core genes from representative genomes. Second, the subgenera were manually defined as the minimum number of clades in the species phylogeny needed to discriminate the four major vaginal Lactobacillus species. Finally, the subgenera were checked for monophyly against the species phylogeny of release 05-RS95 of the Genome Taxonomy Database (GTDB)31.
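A subgenus passes the monophyly check when its species form exactly one clade of the rooted reference tree. The sketch below illustrates that test on a toy tree written as nested tuples; it assumes a rooted tree that has already been parsed (a real workflow would read the GTDB tree file with a phylogenetics library), and the species and subgenus names are placeholders, not the real Lactobacillus topology.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# A rooted tree as nested tuples; leaves are species labels (strings)
Tree = Union[str, Tuple["Tree", ...]]


def collect_clades(tree: Tree, clades: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Post-order walk that records the set of leaves below every node."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def is_monophyletic(tree: Tree, group: Set[str]) -> bool:
    """True if one clade of the rooted tree contains exactly the species in group."""
    clades: List[FrozenSet[str]] = []
    collect_clades(tree, clades)
    return frozenset(group) in clades


# Toy topology with placeholder species names (not a real Lactobacillus tree)
toy_tree = ((("sp_A", "sp_B"), "sp_C"), (("sp_D", "sp_E"), "sp_F"))
subgenera = {"subgenus_1": {"sp_A", "sp_B", "sp_C"}, "subgenus_2": {"sp_C", "sp_D"}}
for name, species in subgenera.items():
    print(name, is_monophyletic(toy_tree, species))
```

Here subgenus_1 is a clade, whereas subgenus_2 mixes species from two separate branches and would fail the check.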
To classify amplicon sequences to the Lactobacillus subgenera, a custom 16S rRNA reference database was created. To this end, 16S rRNA sequences extracted from sequenced genomes were downloaded from the GTDB (release 05-RS95), together with the GTDB taxonomy hierarchy. This dataset was subset to sequences of the family Lactobacillaceae only, and the genus Lactobacillus in the taxonomy hierarchy was replaced by the respective subgenus of each species. Finally, these files were converted into a DADA2-compatible reference database.
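DADA2's assignTaxonomy function takes a reference FASTA whose header lines hold the taxonomy as semicolon-separated ranks (for example ">Kingdom;Phylum;Class;Order;Family;Genus;"). The sketch below shows one way the subsetting and subgenus substitution described above could produce such headers. It assumes the 16S sequences are already paired with their GTDB taxonomy strings; the records, the species-to-subgenus mapping and the subgenus label are hypothetical placeholders.

```python
from typing import Dict, List, Tuple


def parse_gtdb_taxonomy(taxonomy: str) -> Dict[str, str]:
    """Split 'd__Bacteria;p__...;s__Genus species' into {'d': 'Bacteria', ...}."""
    ranks = {}
    for field in taxonomy.split(";"):
        prefix, _, name = field.strip().partition("__")
        ranks[prefix] = name
    return ranks


def to_dada2_fasta(records: List[Tuple[str, str]],
                   subgenus_of: Dict[str, str]) -> List[str]:
    """Build FASTA lines whose headers hold the taxonomy, one rank per
    semicolon-separated field, down to genus (or Lactobacillus subgenus)."""
    lines = []
    for taxonomy, sequence in records:
        r = parse_gtdb_taxonomy(taxonomy)
        if r.get("f") != "Lactobacillaceae":
            continue  # keep the family Lactobacillaceae only
        genus = r["g"]
        if genus == "Lactobacillus":
            # Swap in the subgenus; keep the genus name if the species is unmapped
            genus = subgenus_of.get(r["s"], genus)
        levels = [r["d"], r["p"], r["c"], r["o"], r["f"], genus]
        lines.append(">" + ";".join(levels) + ";")
        lines.append(sequence)
    return lines


# Illustrative records (not copied from a GTDB release) and a hypothetical mapping
PREFIX = "d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;"
records = [
    (PREFIX + "f__Lactobacillaceae;g__Lactobacillus;s__Lactobacillus crispatus", "ACGTACGT"),
    (PREFIX + "f__Lactobacillaceae;g__Limosilactobacillus;s__Limosilactobacillus reuteri", "ACGTTTGT"),
    (PREFIX + "f__Streptococcaceae;g__Streptococcus;s__Streptococcus agalactiae", "GGGTACGT"),
]
subgenus_of = {"Lactobacillus crispatus": "Lactobacillus_subgenus_X"}
fasta_lines = to_dada2_fasta(records, subgenus_of)
print("\n".join(fasta_lines))
```

The Streptococcus record is dropped by the family filter, and only the Lactobacillus record receives a subgenus label.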
To validate our amplicon data processing pipeline, including classification to Lactobacillus subgenera, we also created a custom reference database for the classification of metagenomic shotgun sequencing data. This database was created from three sources: (1) representative genomes for all bacterial species, downloaded from release 05-RS95 of the GTDB, (2) the GTDB taxonomy hierarchy updated with the Lactobacillus subgenera, and (3) version GRCh38 of the human genome, downloaded from NCBI RefSeq81. These files were used to build a database in Kraken2-compatible format.
Processing and quality control of amplicon sequencing data
Quality control and processing of amplicon reads were performed with the R package DADA2, version 1.6.082. First, reads with more than two expected errors were removed (no trimming was performed). Next, paired reads were merged; in this process, read pairs with one or more sequence conflicts were removed. Chimeras were then detected and removed with the removeBimeraDenovo function. The merged and denoised reads (amplicon sequence variants or ASVs) were taxonomically annotated from the phylum to the genus level with the assignTaxonomy function using the EzBioCloud 16S rRNA reference database83. Next, three reclassifications were performed. First, ASVs classified to the family Leuconostocaceae were reclassified to the family Lactobacillaceae, in line with the recent taxonomic update13. Second, the Lactobacillaceae ASVs were reclassified at the genus level to the new genera defined by Zheng et al. Finally, ASVs of the updated genus Lactobacillus (previously known as the Lactobacillus delbrueckii group) were reclassified to the nine subgenera that we manually defined based on the phylogeny of the genus.
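DADA2 defines the expected errors of a read as the sum of its per-base error probabilities, 10^(-Q/10), and the maxEE filter discards reads above the threshold. The sketch below mirrors that definition in Python for a read pair with a threshold of two; it is an illustration rather than DADA2's own code, and it assumes Phred+33 quality encoding.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Expected number of errors in a read: the sum of per-base error
    probabilities 10^(-Q/10), with Q decoded from a Phred+33 quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def keep_pair(qual_fwd: str, qual_rev: str, max_ee: float = 2.0) -> bool:
    """Keep a read pair only if neither read exceeds max_ee expected errors."""
    return expected_errors(qual_fwd) <= max_ee and expected_errors(qual_rev) <= max_ee


# 'I' encodes Q40 (error probability 0.0001); '+' encodes Q10 (0.1)
print(expected_errors("I" * 250))           # about 0.025
print(keep_pair("I" * 250, "I" * 250))      # True
print(keep_pair("I" * 250, "+" * 30))       # False: 30 x 0.1 = 3 expected errors
```

Unlike a mean-quality cutoff, this criterion penalizes a few very poor bases more than many slightly imperfect ones, since error probabilities grow exponentially as Q drops.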
Taxon and sample quality control was performed as follows. Non-bacterial ASVs (e.g., mitochondria and chloroplasts) and ASVs longer than 260 bases were removed. Quality control of the samples was based on normalized read concentrations, which were calculated as follows. First, the total read count per sample was divided by the volume of that sample added to the sequencing library of its MiSeq run (nine runs in total). Next, these read concentrations were normalized by dividing them by the median read concentration of their respective run. Samples were then retained only if they met two criteria: (1) a normalized read concentration higher than 0.05 and (2) a read count greater than 2,000.
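The two sample filters above can be sketched as follows. The thresholds come from the text; which read count (raw or after processing) enters the calculation is not stated, so the `reads` field is an assumption, and the sample names, runs and numbers are invented.

```python
from statistics import median
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    name: str
    run: str          # MiSeq run the sample was sequenced on
    reads: int        # total read count of the sample
    volume_ul: float  # volume of the sample added to that run's library


def passing_samples(samples: List[Sample], min_norm_conc: float = 0.05,
                    min_reads: int = 2000) -> List[str]:
    """Names of samples whose run-normalized read concentration exceeds
    min_norm_conc and whose read count exceeds min_reads."""
    conc = {s.name: s.reads / s.volume_ul for s in samples}
    per_run: Dict[str, List[float]] = {}
    for s in samples:
        per_run.setdefault(s.run, []).append(conc[s.name])
    run_median = {run: median(values) for run, values in per_run.items()}
    return [s.name for s in samples
            if conc[s.name] / run_median[s.run] > min_norm_conc and s.reads > min_reads]


# Invented example: C fails both criteria; D passes the concentration
# criterion but has too few reads
samples = [
    Sample("A", "run1", 10_000, 2.0), Sample("B", "run1", 8_000, 2.0),
    Sample("C", "run1", 100, 2.0),
    Sample("D", "run2", 1_500, 1.0), Sample("E", "run2", 3_000, 1.0),
]
print(passing_samples(samples))  # ['A', 'B', 'E']
```

Normalizing by the run median makes the concentration threshold comparable across the nine runs, even if their overall sequencing depth differs.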
The Isala pilot study samples were processed in the same way as described above, with the following exceptions: (1) ASV classification was performed with a 16S rRNA reference database constructed from release 05-RS95 of the GTDB, followed by reclassification of only the Lactobacillus ASVs to the custom Lactobacillus subgenera; (2) sample quality control was based on a minimum read count of 1,000 reads.
Processing and quality control of metagenomic sequencing data (pilot study samples)
Metagenomic shotgun-sequenced samples from the Isala pilot study were processed as follows. First, paired reads were filtered with the DADA2 R package, version 1.20.0⁸², requiring a minimum length of 50 bases, a maximum of two uncalled bases per read and a maximum of two expected errors per read. Next, read pairs were classified from the phylum to the species level with Kraken2⁸⁴, using a custom reference database designed to validate our amplicon sequencing pipeline (including Lactobacillus subgenus classification). Based on these read classifications, a read count table was constructed in which the rows represent samples and the columns represent taxa. Taxa were either species or, for reads that were unclassified at one or more ranks, higher-level taxa. Non-bacterial taxa were removed from the data, as were samples with fewer than 500 bacterial reads.
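The read-level thresholds can be illustrated with a small filter. It assumes Phred+33 quality strings and the usual definition of expected errors as the sum of 10^(-Q/10) over a read's bases. In DADA2 itself these thresholds correspond to the `minLen`, `maxN` and `maxEE` arguments of `filterAndTrim`. The functions below are an illustrative re-implementation, not DADA2's code.

```python
from typing import Tuple


def expected_errors(quality: str, offset: int = 33) -> float:
    """Expected number of errors in a read: sum of 10^(-Q/10) over its bases."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def read_passes(seq: str, quality: str, min_len: int = 50,
                max_n: int = 2, max_ee: float = 2.0) -> bool:
    """Length >= min_len, at most max_n uncalled bases, at most max_ee expected errors."""
    return (len(seq) >= min_len
            and seq.upper().count("N") <= max_n
            and expected_errors(quality) <= max_ee)


def pair_passes(fwd: Tuple[str, str], rev: Tuple[str, str]) -> bool:
    """Keep a read pair only if both mates pass; each mate is (sequence, quality)."""
    return read_passes(*fwd) and read_passes(*rev)
```

A 100-base read at Q40 ("I" in Phred+33) has only 0.01 expected errors. The same read at Q10 ("+") has 10 and is discarded.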
All processing of amplicon and shotgun datasets was performed in R version 4.1.1⁸⁵, using the tidyverse set of packages, version 1.3.0⁸⁶, and the in-house package tidyamplicons, version 0.2.1.
Culture analyses
Based on the questionnaire answers, a selection of self-reported “healthy” women was made. Because this selection took place during the course of the study, it does not include all “healthy” women. It included 592 women with: no known infection at the moment of sampling; no use of vaginal probiotics; no current smoking; good general health; no use of antibiotics/antimycotics in the past three months; no vaginal douching; and no reported vaginal conditions. The 592 samples were located in the detailed inventory and retrieved from the Micronic plates at -80°C. Tubes were retrieved individually to avoid thawing the other samples and to preserve optimal viability of the microorganisms. To obtain single colonies, 10 μL of each sample was inoculated on small Petri dishes (10 mL) with three types of growth media (MRS, MRS + vancomycin, or Columbia blood agar, all BD Difco™) and grown for 24–48 h at 37°C and 5% CO₂. After 24 h, the plates were checked for colonies and, if present, one colony per plate was selected at random, resulting in a maximum of three isolates per participant. Part of this colony was inoculated in 10 mL MRS broth and grown overnight at 37°C and 5% CO₂. Of the overnight culture, 800 μL was mixed with 800 μL 50% glycerol in labelled cryovials (Greiner Bio-One Cryo.s™) and stored at -80°C. At the same time, another part of the colony was used for colony polymerase chain reaction (colony PCR) for taxonomic identification by 16S Sanger sequencing, using the universal primers 27F and 1492R.
Contraceptives, menstrual cycle and hormonal levels
At sampling, participants reported the start date of their last menstruation and the average length of their cycle. Depending on the contraceptive used, we used these data to determine the day of the cycle at sampling and to predict the endogenous and exogenous levels of estrogen and progestin. Peri- and postmenopausal women were excluded from this analysis.
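The text does not specify how cycle days were mapped to phases or to hormone levels. The sketch below therefore covers only the cycle-day estimate, using the "Unknown" and "Not applicable" categories of Supplementary Figure 8. Three rules in it are textbook assumptions rather than the study's definitions: a 5-day menstrual phase, ovulation about 14 days before the next period, and "Unknown" whenever the reported cycle length has already elapsed. `cycle_day` and `cycle_phase` are hypothetical helpers.

```python
from datetime import date
from typing import Optional


def cycle_day(sample_date: date, last_period_start: date) -> int:
    """Day of the cycle on which the swab was taken (day 1 = first day of bleeding)."""
    return (sample_date - last_period_start).days + 1


def cycle_phase(sample_date: date, last_period_start: Optional[date],
                cycle_length: Optional[int], hormonal_contraception: bool,
                peri_or_postmenopausal: bool) -> str:
    if hormonal_contraception or peri_or_postmenopausal:
        return "Not applicable"
    # cycle_length is None for participants reporting irregular cycles.
    if last_period_start is None or cycle_length is None:
        return "Unknown"
    day = cycle_day(sample_date, last_period_start)
    if day < 1 or day > cycle_length:
        return "Unknown"      # inconsistent dates or overdue period (assumed rule)
    if day <= 5:
        return "menstrual"    # assumed 5-day menses
    if day <= cycle_length - 14:
        return "follicular"   # ovulation assumed ~14 days before the next period
    return "luteal"
```

For a 28-day cycle starting on 1 January, a swab taken on 10 January falls on day 10 (follicular) and one taken on 20 January on day 20 (luteal).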
Statistical analyses
t-SNE embeddings were computed from the relative abundances per sample, using the Bray-Curtis distance metric⁸⁷ to calculate distances within the t-SNE³³. Samples were classified into a “primary type” based on their most dominant taxon, unless that taxon was the most dominant taxon in fewer than 200 samples, in which case the sample was classified into the type “other”. To determine correlations between the abundances of taxa across our samples, we used the FastSpar implementation of SparCC with 100,000 permutations. We calculated correlations only between taxa that were present at non-zero abundance in at least 100 samples.
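The primary-type assignment and the prevalence filter applied before SparCC can be sketched as follows. Each sample is a dict mapping taxa to relative abundances. The function names are hypothetical. Ties for the most abundant taxon are broken by dict order, a detail the text does not address.

```python
from collections import Counter
from typing import Dict, List

Profile = Dict[str, float]  # taxon -> relative abundance in one sample


def primary_types(profiles: Dict[str, Profile], min_count: int = 200) -> Dict[str, str]:
    """Label each sample by its most abundant taxon, or "other" if that taxon
    is the most abundant one in fewer than min_count samples."""
    top = {sid: max(p, key=p.get) for sid, p in profiles.items()}
    counts = Counter(top.values())
    return {sid: t if counts[t] >= min_count else "other" for sid, t in top.items()}


def prevalent_taxa(profiles: Dict[str, Profile], min_samples: int = 100) -> List[str]:
    """Taxa with non-zero abundance in at least min_samples samples (SparCC input)."""
    present = Counter(t for p in profiles.values() for t, v in p.items() if v > 0)
    return sorted(t for t, n in present.items() if n >= min_samples)
```

The thresholds are parameters so that the behaviour can be checked on a toy dataset. The defaults match the values given in the text.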
We used the same correlation threshold of 0.3 as in the original SparCC manuscript³⁴. Clusters were identified by hierarchical clustering with single linkage. Eigentaxa, summary scores for the sets of taxa in the modules identified in the taxon-taxon correlation networks, were calculated by first CLR-transforming the relative abundance data and then taking the first principal component of the taxa in each cluster. Each eigentaxon was multiplied by the sign of the correlation coefficient between the eigentaxon and a representative taxon of its cluster: Gardnerella, Prevotella and Limosilactobacillus for the BV, AV and Lactobacillus modules, respectively.
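The module and eigentaxon steps can be sketched with the standard library only. The sketch rests on three assumptions. First, modules are taken as the connected components of the graph linking taxa with correlation ≥ 0.3, which is equivalent to cutting a single-linkage tree on distance 1 − r at 0.7. Second, a pseudocount handles zeros before the CLR. Third, Pearson correlation is used for the sign rule. The first principal component is found by power iteration instead of a library PCA. All function names are hypothetical.

```python
import math
from typing import Dict, List

Profile = Dict[str, float]  # taxon -> relative abundance in one sample


def correlation_modules(taxa: List[str], corr: Dict[str, Dict[str, float]],
                        threshold: float = 0.3) -> List[List[str]]:
    """Connected components of the graph linking taxa with correlation >= threshold."""
    parent = {t: t for t in taxa}

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for i, a in enumerate(taxa):
        for b in taxa[i + 1:]:
            if corr[a][b] >= threshold:
                parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for t in taxa:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


def clr(profile: Profile, pseudocount: float = 1e-6) -> Profile:
    """Centered log-ratio of one sample; the pseudocount for zeros is an assumption."""
    logs = {t: math.log(v + pseudocount) for t, v in profile.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {t: v - mean_log for t, v in logs.items()}


def _pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def eigentaxon(profiles: List[Profile], module: List[str], representative: str,
               iterations: int = 500) -> List[float]:
    """First principal component of the module's CLR values, signed so that it
    correlates positively with the representative taxon."""
    clr_rows = [clr(p) for p in profiles]
    X = [[row[t] for t in module] for row in clr_rows]
    n, k = len(X), len(module)
    means = [sum(r[j] for r in X) / n for j in range(k)]
    Xc = [[r[j] - means[j] for j in range(k)] for r in X]  # column-centered
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(k)]
           for a in range(k)]
    v = [1.0 / math.sqrt(k)] * k  # power iteration for the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(k)) for r in Xc]
    rep = [row[representative] for row in clr_rows]
    sign = 1.0 if _pearson(scores, rep) >= 0 else -1.0
    return [sign * s for s in scores]
```

The sign step matters because a principal component is only defined up to sign. Without it, a high BV eigentaxon could mean either more or less Gardnerella.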
Associations between microbial community composition and the questionnaire were tested with an Adonis test, as implemented in the vegan package in R. For each effect of interest, we tested three models: 1) ~ e_i, 2) ~ e_t + e_i, and 3) ~ e_t + age + e_i, where e_t are the technical effects and e_i is the effect of interest. The technical effects were identical across all tests and consisted of sequencing run, normalized read concentration and library size, which were found to be strongly associated with the principal components of the relative abundances. To limit computation time, 1,000 permutations were initially performed for each effect of interest. A total of 10,000 permutations was performed only for those effects with a p-value of 0.001, approximately the smallest p-value attainable with 1,000 permutations.
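The adaptive permutation strategy can be illustrated with a one-factor PERMANOVA (Anderson's pseudo-F from a distance matrix). This is a simplified stand-in for vegan's Adonis: it does not handle the technical covariates or age terms of models 2 and 3. The p-value uses the common (hits + 1) / (permutations + 1) convention, so 1,000 permutations cannot give a value below 1/1001 ≈ 0.001. The function names are hypothetical.

```python
import random
from typing import List, Sequence


def pseudo_f(dist: List[List[float]], groups: Sequence[str]) -> float:
    """One-way PERMANOVA pseudo-F statistic computed from a distance matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pair_sum = sum(dist[i][j] ** 2 for x, i in enumerate(idx) for j in idx[x + 1:])
        ss_within += pair_sum / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permutation_p(dist, groups, n_perm: int, rng: random.Random) -> float:
    """p = (hits + 1) / (n_perm + 1), counting permuted F >= observed F."""
    observed = pseudo_f(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if pseudo_f(dist, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def adaptive_p(dist, groups, first: int = 1000, second: int = 10000,
               seed: int = 0) -> float:
    """Run `first` permutations; rerun with `second` only if p hit its floor."""
    rng = random.Random(seed)
    p = permutation_p(dist, groups, first, rng)
    if p <= 1 / (first + 1):
        p = permutation_p(dist, groups, second, rng)
    return p
```

Because most effects are far from significant, the expensive 10,000-permutation run is triggered only where the first pass cannot resolve the p-value.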
Associations between Shannon diversity and the variables collected via the questionnaire were tested with multiple linear regression, using three models analogous to those of the Adonis test: 1) Diversity ~ e_i, 2) Diversity ~ e_t + e_i, and 3) Diversity ~ e_t + age + e_i.
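The response variable of these regressions, the Shannon index, can be computed from one sample's abundances as shown below. The natural logarithm is an assumption, since the text does not state the base. The regression fit itself is not shown, and `shannon` is a hypothetical helper name.

```python
import math
from typing import Sequence


def shannon(abundances: Sequence[float]) -> float:
    """Shannon index H = -sum(p_i * ln p_i), with p_i the relative abundances;
    zero-abundance taxa contribute nothing."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)
```

A sample completely dominated by one taxon has H = 0. Four equally abundant taxa give H = ln 4 ≈ 1.39, so the index grows with both richness and evenness.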
Associations between the relative abundance of specific taxa and the questionnaire were tested with multiple linear regression, using the model CLR(RA_i) ~ e_t + e_i, where RA_i is the relative abundance of a taxon of interest and CLR denotes the centered log-ratio transformation⁸⁸.
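Written out, the centered log-ratio of a sample with D taxa and the resulting regression model are given below. Because the logarithm is undefined at zero, zero abundances must be handled in practice, for example with a pseudocount; the text does not say how this was done. The coefficients β are notation introduced here for the model CLR(RA_i) ~ e_t + e_i.

```latex
\mathrm{CLR}(\mathbf{x})_i = \ln\frac{x_i}{g(\mathbf{x})},
\qquad
g(\mathbf{x}) = \Bigl(\prod_{j=1}^{D} x_j\Bigr)^{1/D}

\mathrm{CLR}(RA_i) = \beta_0 + \boldsymbol{\beta}_t^{\top} e_t + \beta_i\, e_i + \varepsilon
```

Dividing by the geometric mean g(x) removes the dependence on the total read count. Each transformed value is therefore a log-ratio relative to the sample's typical taxon rather than a raw proportion.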
Associations between assigned community types and the questionnaire were tested with logistic regression. For each pair of community types T_A and T_B, we tested the following three models: 1) I_T ~ e_i, 2) I_T ~ e_t + e_i, and 3) I_T ~ e_t + age + e_i, where I_T is an indicator variable with I_T = 0 if the sample belongs to T_A and I_T = 1 if it belongs to T_B. Figure 5 shows the results for model 3, except for age, for which the results of model 2 are shown.
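The pairwise design can be sketched as a generator that yields, for each pair of community types, only the samples of those two types together with their I_T values. Ordering the types alphabetically (so T_A is the alphabetically first) is an assumption for illustration. Fitting the three logistic models on each subset is not shown, and `pairwise_indicators` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, Iterator, Tuple


def pairwise_indicators(types: Dict[str, str]) -> Iterator[Tuple[str, str, Dict[str, int]]]:
    """For every pair of community types (T_A, T_B), yield the samples belonging
    to either type, with I_T = 0 for T_A and I_T = 1 for T_B."""
    for t_a, t_b in combinations(sorted(set(types.values())), 2):
        indicator = {sid: 0 if t == t_a else 1
                     for sid, t in types.items() if t in (t_a, t_b)}
        yield t_a, t_b, indicator
```

Samples of any third type are dropped from each comparison, so each logistic model contrasts exactly two community types.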
For the Adonis analysis of total explained variance, we included all significant factors in a single factorial Adonis test (the factors included are shown in Figure 5). To do so, missing answers to the questions were encoded as separate categories.
All data handling and visualization was performed in Python and R version 4.1.0⁸⁵, using the tidyverse set of packages and the in-house package tidyamplicons (github.com/Swittouck/tidyamplicons).
Data availability
Sequencing data are available at the European Nucleotide Archive (ENA) under bioproject PRJEB50407.
References
1. Weinstein, L., Bogin, M., Howard, J. H. & Finkelstone, B. B. A survey of the vaginal flora at various ages, with special reference to the Döderlein bacillus. Am. J. Obstet. Gynecol. 32, 211–218 (1936).
2. Lash, A. F. & Kaplan, B. A Study of Döderlein's Vaginal Bacillus. Oxford Univ. Press 38, 333–340 (2021).
3. Ravel, J. et al. Vaginal microbiome of reproductive-age women. Proc. Natl. Acad. Sci. U. S. A. 108, 4680–4687 (2011).
4. Lopes dos Santos Santiago, G. et al. Longitudinal qPCR Study of the Dynamics of L. crispatus, L. iners, A. vaginae, (Sialidase Positive) G. vaginalis, and P. bivia in the Vagina. PLoS One 7, e45281 (2012).
5. El Aila, N. A. et al. Identification and genotyping of bacteria from paired vaginal and rectal samples from pregnant women indicates similarity between vaginal and rectal microflora. BMC Infect. Dis. 9, 167 (2009).
6. Oerlemans, E. F. M. et al. The Dwindling Microbiota of Aerobic Vaginitis, an Inflammatory State Enriched in Pathobionts with Limited TLR Stimulation. Diagnostics 10, 879 (2020).
7. Donders, G. G. G. et al. Definition of a type of abnormal vaginal flora that is distinct from bacterial vaginosis: Aerobic vaginitis. BJOG An Int. J. Obstet. Gynaecol. 109, 34–43 (2002).
8. Gosmann, C. et al. Lactobacillus-Deficient Cervicovaginal Bacterial Communities Are Associated with Increased HIV Acquisition in Young South African Women. Immunity 46, 29–37 (2017).
9. McClelland, R. S. et al. Evaluation of the association between the concentrations of key vaginal bacteria and the increased risk of HIV acquisition in African women from five cohorts: a nested case-control study. Lancet Infect. Dis. 18, 554–564 (2018).
10. Lewis, F. M. T., Bernstein, K. T. & Aral, S. O. Vaginal microbiome and its relationship to behavior, sexual health, and sexually transmitted diseases. Obstet. Gynecol. 129, 643–654 (2017).
11. Campisciano, G. et al. Subclinical alteration of the cervical–vaginal microbiome in women with idiopathic infertility. J. Cell. Physiol. 232, 1681–1688 (2017).
12. Kroon, S. J., Ravel, J. & Huston, W. M. Cervicovaginal microbiota, women’s health, and reproductive outcomes. Fertil. Steril. 110, 327–336 (2018).
13. Zheng, J. et al. A taxonomic note on the genus Lactobacillus: Description of 23 novel genera, emended description of the genus Lactobacillus Beijerinck 1901, and union of Lactobacillaceae and Leuconostocaceae. Int. J. Syst. Evol. Microbiol. 70, 2782–2858 (2020).
14. Gajer, P. et al. Temporal Dynamics of the Human Vaginal Microbiota. Sci. Transl. Med. 4, 1–21 (2012).
15. France, M. et al. VALENCIA: A Nearest Centroid Classification Method for Vaginal Microbial Communities Based on Composition. 1–15 (2020). doi:10.21203/rs.2.24139/v1.
16. Drell, T. et al. Characterization of the Vaginal Micro- and Mycobiome in Asymptomatic Reproductive-Age Estonian Women. PLoS One 8, (2013).
17. Freitas, A. C. et al. The vaginal microbiome of pregnant women is less rich and diverse, with lower prevalence of Mollicutes, compared to non-pregnant women. Sci. Rep. 7, 1–16 (2017).
18. Lennard, K. et al. Microbial Composition Predicts Genital Tract Inflammation and Persistent Bacterial Vaginosis in South African Adolescent Females. Infect. Immun. 86, (2017).
19. Rhoades, N. S. et al. Longitudinal Profiling of the Macaque Vaginal Microbiome Reveals Similarities to Diverse Human Vaginal Communities. mSystems 6, (2021).
20. Miller, E. A., Beasley, D. A. E., Dunn, R. R. & Archie, E. A. Lactobacilli dominance and vaginal pH: Why is the human vaginal microbiome unique? Front. Microbiol. 7, 1–13 (2016).
21. Yildirim, S. et al. Primate vaginal microbiomes exhibit species specificity without universal Lactobacillus dominance. ISME J. 8, 2431–2444 (2014).
22. Mirmonsef, P. et al. Free glycogen in vaginal fluids is associated with Lactobacillus colonization and low vaginal pH. PLoS One 9, 26–29 (2014).
23. Song, S. D. et al. Daily Vaginal Microbiota Fluctuations Associated with Natural Hormonal Cycle, Contraceptives, Diet, and Exercise. mSphere 5, 1–14 (2020).
24. Foxman, B., Muraglia, R., Dietz, J. P., Sobel, J. D. & Wagner, J. Prevalence of recurrent vulvovaginal candidiasis in 5 European countries and the United States: Results from an internet panel survey. J. Low. Genit. Tract Dis. 17, 340–345 (2013).
25. Medina, M. & Castillo-Pino, E. An introduction to the epidemiology and burden of urinary tract infections. Ther. Adv. Urol. 11, 3–7 (2019).
26. Serrano, M. G. et al. Racioethnic diversity in the dynamics of the vaginal microbiome during pregnancy. Nat. Med. 25, 1001–1011 (2019).
27. Noppe, J. et al. Vlaamse Migratie- en integratiemonitor 2018. Brussel Agentschap Binnenl. Best. 311 (2018).
28. Vlaanderen, S. Bevolking onder de armoededrempel - Statistiek Vlaanderen. https://www.statistiekvlaanderen.be/nl/bevolking-onder-de-armoededrempel.
29. Johnson, J. S. et al. Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nat. Commun. 10, 1–11 (2019).
30. Putonti, C., Shapiro, J. W., Ene, A., Tsibere, O. & Wolfe, A. J. Comparative Genomic Study of Lactobacillus jensenii and the. Am. Soc. Microbiol. 5, 1–5 (2020).
31. Parks, D. H. et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 202, 1–10 (2021).
32. Rocha, J. et al. Lactobacillus mulieris sp. nov., a new species of lactobacillus delbrueckii group. Int. J. Syst. Evol. Microbiol. 70, 1522–1527 (2020).
33. van der Maaten, L. & Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
34. Watts, S. C., Ritchie, S. C., Inouye, M. & Holt, K. E. FastSpar: Rapid and scalable correlation estimation for compositional data. Bioinformatics 35, 1064–1066 (2019).
35. Rizzo, A., Losacco, A. & Carratelli, C. R. Lactobacillus crispatus modulates epithelial cell defense against Candida albicans through Toll-like receptors 2 and 4, interleukin 8 and human β-defensins 2 and 3. Immunol. Lett. 156, 102–109 (2013).
36. Ojala, T. et al. Comparative genomics of Lactobacillus crispatus suggests novel mechanisms for the competitive exclusion of Gardnerella vaginalis. BMC Genomics 15, 1–21 (2014).
37. van der Veer, C. et al. Comparative genomics of human Lactobacillus crispatus isolates reveals genes for glycosylation and glycogen degradation: Implications for in vivo dominance of the vaginal microbiota. Microbiome 7, 1–14 (2019).
38. Falony, G. et al. Population-level analysis of gut microbiome variation. Science 352, 560–564 (2016).
39. Peltola, T. & Arpin, I. Science for everybody?: Bridging the socio-economic gap in urban biodiversity monitoring. Citiz. Sci. Innov. Open Sci. Soc. Policy 367–380 (2018).
40. Law, E. et al. The Science of Citizen Science. (2017). doi:10.1145/3022198.3022652.
41. Petrova, M. I., Reid, G., Vaneechoutte, M. & Lebeer, S. Lactobacillus iners: Friend or Foe? Trends Microbiol. 25, 182–191 (2017).
42. France, M. T. et al. Complete Genome Sequences of Six Lactobacillus iners Strains Isolated from the Human Vagina. Microbiol. Resour. Announc. 9, 17–19 (2020).
43. Castro, J., Machado, D. & Cerca, N. Unveiling the role of Gardnerella vaginalis in polymicrobial Bacterial Vaginosis biofilms: the impact of other vaginal pathogens living as neighbors. ISME J. 13, 1306–1317 (2019).
44. Harwich, M. D. et al. Drawing the line between commensal and pathogenic Gardnerella vaginalis through genome analysis and virulence studies. BMC Genomics 11, (2010).
45. Łaniewski, P. & Herbst-Kralovetz, M. M. Bacterial vaginosis and health-associated bacteria modulate the immunometabolic landscape in 3D model of human cervix. npj Biofilms Microbiomes 7, 1–17 (2021).
46. Castro, J., Jefferson, K. K. & Cerca, N. Genetic Heterogeneity and Taxonomic Diversity among Gardnerella Species. Trends Microbiol. 28, 202–211 (2020).
47. Charbonneau, M. R. et al. A microbial perspective of human developmental biology. doi:10.1038/nature18845.
48. Koren, O. et al. A Guide to Enterotypes across the Human Body: Meta-Analysis of Microbial Community Structures in Human Microbiome Datasets. PLOS Comput. Biol. 9, e1002863 (2013).
49. Canon, F., Nidelet, T., Guédon, E., Thierry, A. & Gagnaire, V. Understanding the Mechanisms of Positive Microbial Interactions That Benefit Lactic Acid Bacteria Co-cultures. Front. Microbiol. 11, 1–16 (2020).
50. Blasche, S. et al. Metabolic cooperation and spatiotemporal niche partitioning in a kefir microbial community. Nat. Microbiol. 6.
51. Agarwal, K. et al. Glycan cross-feeding supports mutualism between Fusobacterium and the vaginal microbiota. PLOS Biol. 18, e3000788 (2020).
52. Mokoena, M. P. Lactic Acid Bacteria and Their Bacteriocins: Classification, Biosynthesis and Applications against Uropathogens: A Mini-Review. Mol. A J. Synth. Chem. Nat. Prod. Chem. 22, (2017).
53. Petrova, M. I., Lievens, E., Malik, S., Imholz, N. & Lebeer, S. Lactobacillus species as biomarkers and agents that can promote various aspects of vaginal health. Front. Physiol. 6, (2015).
54. Lin, X. B. et al. The evolution of ecological facilitation within mixed-species biofilms in the mouse gastrointestinal tract. ISME J. 12, 2770–2784 (2018).
55. Hummelen, R. et al. Lactobacillus rhamnosus GR-1 and L. reuteri RC-14 to prevent or cure bacterial vaginosis among women with HIV. Int. J. Gynecol. Obstet. 111, 245–248 (2010).
56. Liu, J. J., Reid, G., Jiang, Y., Turner, M. S. & Tsai, C. C. Activity of HIV entry and fusion inhibitors expressed by the human vaginal colonizing probiotic Lactobacillus reuteri RC-14. Cell. Microbiol. 9, 120–130 (2007).
57. Martinez, R. C. R. et al. Improved cure of bacterial vaginosis with single dose of tinidazole (2 g), Lactobacillus rhamnosus GR-1, and Lactobacillus reuteri RC-14: A randomized, double-blind, placebo-controlled trial. Can. J. Microbiol. 55, 133–138 (2009).
58. Verhelst, R. et al. Cloning of 16S rRNA genes amplified from normal and disturbed vaginal microflora suggests a strong association between Atopobium vaginae, Gardnerella vaginalis and bacterial vaginosis. http://www.biomedcentral.com/1471-2180/4/16 (2004).
59. Hardy, L. et al. A fruitful alliance: the synergy between Atopobium vaginae and Gardnerella vaginalis in bacterial vaginosis-associated biofilm. Sex. Transm. Infect. 92, 487–491 (2016).
60. Randis, T. M. & Ratner, A. J. Gardnerella and Prevotella: Co-conspirators in the Pathogenesis of Bacterial Vaginosis. J. Infect. Dis. 220, 1085–1088 (2019).
61. Donders, G. G. G., Bellen, G., Grinceviciene, S., Ruban, K. & Vieira-Baptista, P. Aerobic vaginitis: no longer a stranger. Res. Microbiol. 168, 845–858 (2017).
62. Perrotta, A. R. et al. The Vaginal Microbiome as a Tool to Predict rASRM Stage of Disease in Endometriosis: a Pilot Study. doi:10.1007/s43032-019-00113-5.
63. Haggerty, C. L. et al. Presence and concentrations of select bacterial vaginosis-associated bacteria are associated with increased risk of pelvic inflammatory disease. Sex. Transm. Dis. 47, 344 (2020).
64. De Seta, F., Campisciano, G., Zanotta, N., Ricci, G. & Comar, M. The vaginal community state types microbiome-immune network as key factor for bacterial vaginosis and aerobic vaginitis. Front. Microbiol. 10, 2451 (2019).
65. Si, J., You, H. J., Yu, J., Sung, J. & Ko, G. P. Prevotella as a Hub for Vaginal Microbiota under the Influence of Host Genetics and Their Association with Obesity. Cell Host Microbe 21, 97–105 (2017).
66. Vodstrcil, L. A. et al. Combined oral contraceptive pill-exposure alone does not reduce the risk of bacterial vaginosis recurrence in a pilot randomised controlled trial. Sci. Rep. 9, 1–13 (2019).
67. Nelson, T. M. et al. Cigarette smoking is associated with an altered vaginal tract metabolomic profile. Sci. Rep. 8, 852 (2018).
68. Lewis, C. A. et al. Effects of Hormonal Contraceptives on Mood: A Focus on Emotion Recognition and Reactivity, Reward Processing, and Stress Response. Curr. Psychiatry Rep. 21, 1–15 (2019).
69. Lundin, C., Wikman, A., Bixo, M., Gemzell-Danielsson, K. & Sundström Poromaa, I. Towards individualised contraceptive counselling: clinical and reproductive factors associated with self-reported hormonal contraceptive-induced adverse mood symptoms. BMJ Sex. Reprod. Heal. 47, e1–e8 (2021).
70. Burrows, L. J., Basha, M. & Goldstein, A. T. The Effects of Hormonal Contraceptives on Female Sexuality: A Review. J. Sex. Med. 9, 2213–2223 (2012).
71. Khialani, D., Rosendaal, F. & Vlieg, A. V. H. Hormonal Contraceptives and the Risk of Venous Thrombosis. Semin. Thromb. Hemost. 46, 865–871 (2020).
72. Morimont, L., Haguet, H., Dogné, J. M., Gaspard, U. & Douxfils, J. Combined Oral Contraceptives and Venous Thromboembolism: Review and Perspective to Mitigate the Risk. Front. Endocrinol. (Lausanne). 12, 1 (2021).
73. Donders, G. G. G. et al. Screening for abnormal vaginal microflora by self-assessed vaginal pH does not enable detection of sexually transmitted infections in Ugandan women. Diagn. Microbiol. Infect. Dis. 85, 227–230 (2016).
74. Achilles, S. L., Meyn, L. A., Austin, M. N., Avolia, H. A. & Hillier, S. L. A longitudinal evaluation of the impact of contraceptive initiation on vaginal microbiota in US women. Am. J. Obstet. Gynecol. 219, 643–644 (2018).
75. Ferretti, P. et al. Mother-to-Infant Microbial Transmission from Different Body Sites Shapes the Developing Infant Gut Microbiome. Cell Host Microbe 24, 133–145.e5 (2018).
76. DiGiulio, D. B. et al. Temporal and spatial variation of the human microbiota during pregnancy. Proc. Natl. Acad. Sci. U. S. A. 112, 11060–11065 (2015).
77. Lee, E. & Lee, J. E. Impact of drinking alcohol on gut microbiota: recent perspectives on ethanol and alcoholic beverage. Curr. Opin. Food Sci. 37, 91–97 (2021).
78. Ahannach, S. et al. Microbial enrichment and storage for metagenomics of vaginal, skin, and saliva samples. iScience 24, 103306 (2021).
79. Kozich, J. J., Westcott, S. L., Baxter, N. T., Highlander, S. K. & Schloss, P. D. Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Appl. Environ. Microbiol. 79, 5112–20 (2013).
80. Nguyen, L. T., Schmidt, H. A., Von Haeseler, A. & Minh, B. Q. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
81. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
82. Callahan, B. J. et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).
83. Yoon, S. H. et al. Introducing EzBioCloud: A taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. Int. J. Syst. Evol. Microbiol. 67, 1613–1617 (2017).
84. Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014).
85. R Core Team. R: A Language and Environment for Statistical Computing. (2020).
86. Wickham, H. et al. Welcome to the Tidyverse. J. Open Source Softw. 4, 1686 (2019).
87. van der Maaten, L. Barnes-Hut-SNE. 1st Int. Conf. Learn. Represent. ICLR 2013 - Conf. Track Proc. 1–11 (2013).
88. Aitchison, J. A concise guide to compositional data analysis. in 2nd Compositional Data Analysis Workshop (2003).
Supplementary figures
Supplementary Figure 1 - Correlations between a subset of the questionnaire variables. Heatmap of correlations between the questionnaire variables shown in Figure 5. Positive correlations are indicated in green, negative correlations in red. Significant correlations are marked with an asterisk.
Supplementary Figure 2 - Species tree of Lactobacillus from the Genome Taxonomy Database. Maximum-likelihood species phylogeny of the genus Lactobacillus, obtained by taking a subtree of the species phylogeny of the domain Bacteria inferred by the Genome Taxonomy Database (GTDB), release 05-RS95³¹. Colors indicate the nine custom-defined subgenera used in this study. Bold tip labels indicate representative species of the subgenera.
Supplementary Figure 3 - Comparison between amplicon and shotgun sequencing results for 18 samples. Relative abundances of the eleven most abundant taxa overall. Each facet shows a vaginal sample from a single participant, sequenced with 16S rRNA amplicon sequencing (left) or metagenomic shotgun sequencing (right).
Supplementary Figure 4 - Example of a personal vaginal microbiome profile result. The top left panel indicates the dominant type. The bottom left panel shows the percentage (“verdeling”, Dutch for distribution) of the top eight taxa identified. The right panel (pie chart) displays the top six taxa plus the remaining ones (“overige”, Dutch for other).
Supplementary Figure 5 - Information provided for non-microbiology experts. A webpage was dedicated to each of the top eight taxa; the page on Lactobacillus crispatus is shown here as an example. The pages on the other taxa can be accessed via https://isala.be/en/category/vaginal-bacteria/
Supplementary Figure 6 – Full taxon correlation matrix. SparCC correlation network between taxa determined in the Isala data. Positive correlations are indicated in green, negative correlations in red. Each cell gives the correlation (×100) for the corresponding pair of taxa. The modules identified and shown in Figure 4 are indicated with triangles.
Supplementary Figure 7 – Full taxon correlation matrix of the VALENCIA study. SparCC correlation network between taxa determined in the VALENCIA data. Positive correlations are indicated in green, negative correlations in red. Each cell gives the correlation (×100) for the corresponding pair of taxa.
Supplementary Figure 8 – The menstrual cycle. Using information about each participant’s cycle length and last menstruation, we estimated the stage of the cycle at which the swab was taken. Participants with irregular cycle lengths, or who did not report their last menstruation, were classified as “Unknown”, and participants who used hormonal contraceptives or were peri- or postmenopausal were classified as “Not applicable”.
Supplementary tables
Supplementary Table 1 – ASV occurrence and abundance of the top 10 lactobacilli, and percentage of the top 10 isolated lactobacilli from Isala's samples. Occurrence over all Isala samples and mean relative abundance of the top 10 lactobacilli ASVs at the (sub)genus level, together with the percentage of isolates belonging to the top 10 most isolated lactobacilli (determined by 16S amplicon sequencing) relative to the total number of lactobacilli isolates (n = 230), and the number of isolates per species.
| (Sub)genus | Occurrence | Mean relative abundance | Species (16S isolates) | Percentage of total lactobacilli isolates on De Man, Rogosa and Sharpe (MRS) or Columbia blood media (glucose as main sugar) | Number of isolates studied (n = 230) |
|---|---|---|---|---|---|
| Lactobacillus crispatus group | 0.897699005 | 0.399114797 | Limosilactobacillus fermentum | 24.49% | 60 |
| Lactobacillus iners group | 0.719527363 | 0.240823923 | Lactobacillus crispatus | 13.88% | 34 |
| Limosilactobacillus | 0.478544776 | 0.004111909 | Lactobacillus jensenii | 12.24% | 30 |
| Lactobacillus jensenii group | 0.467661692 | 0.04856063 | Lactobacillus paragasseri | 9.80% | 24 |
| Lactobacillus gasseri group | 0.268345771 | 0.029760051 | Lacticaseibacillus rhamnosus | 8.98% | 22 |
| Lactobacillaceae | 0.027052239 | 0.00199087 | Lacticaseibacillus paracasei | 7.35% | 18 |
| Lacticaseibacillus | 0.023942786 | 0.000494929 | Limosilactobacillus reuteri | 6.12% | 15 |
| Lactiplantibacillus | 0.00528607 | 1.65417E-05 | Lactiplantibacillus plantarum | 4.90% | 12 |
| Ligilactobacillus | 0.004975124 | 2.75966E-05 | Lactobacillus gasseri | 3.67% | 9 |
| Apilactobacillus | 0.003109453 | 0.000293372 | Leuconostoc mesenteroides | 2.45% | 6 |
Supplementary Table 2 – Descriptive statistics of taxa. Various descriptive statistics for the subgenera of the genus Lactobacillus and the genera detected in this study: number of ASVs within the (sub)genus (n_asvs), occurrence, average relative abundance (mean_rel_abundance), frequency of being the most abundant taxon and greater than 0% abundant (top_and_gt0p), same as previous but greater than 30% abundant (top_and_gt30p), same as previous but greater than 50% abundant (top_and_gt50p), and the previous three measures expressed as relative frequencies (top_and_gtXp_rel).
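The per-taxon columns of this table can be reproduced on a profile table with a short function. Two definitions are assumptions, since the caption does not spell them out: occurrence is taken as the fraction of samples with non-zero abundance (consistent with the 0 to 1 values in Supplementary Table 1), and the relative frequencies are taken relative to all samples. n_asvs is omitted, and `taxon_summary` is a hypothetical name.

```python
from typing import Dict, Sequence

Profile = Dict[str, float]  # taxon -> relative abundance in one sample


def taxon_summary(profiles: Dict[str, Profile], taxon: str,
                  thresholds: Sequence[int] = (0, 30, 50)) -> Dict[str, float]:
    """Occurrence, mean relative abundance and top_and_gtXp counts for one taxon."""
    n = len(profiles)
    values = [p.get(taxon, 0.0) for p in profiles.values()]
    summary: Dict[str, float] = {
        "occurrence": sum(v > 0 for v in values) / n,
        "mean_rel_abundance": sum(values) / n,
    }
    # Samples in which this taxon is the most abundant one.
    tops = [p for p in profiles.values() if max(p, key=p.get) == taxon]
    for pct in thresholds:
        count = sum(p[taxon] > pct / 100 for p in tops)
        summary[f"top_and_gt{pct}p"] = count
        summary[f"top_and_gt{pct}p_rel"] = count / n
    return summary
```

With thresholds of 0, 30 and 50%, the function returns the same keys as the table columns, for example `top_and_gt30p` and `top_and_gt30p_rel`.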
Supplementary Table 3 – Association tests between participant characteristics and their vaginal microbiome. Results of the statistical tests for each tested questionnaire response. Results are provided for the beta-diversity (Adonis), alpha-diversity, taxon relative abundance and eigentaxon-level tests. In addition to effect sizes, confidence intervals and p-values, the number of participants in each condition is provided.