
Genome-wide estimation of recombination, mutation and positive selection enlightens diversification drivers of Mycobacterium bovis


Genome sequencing has reinvigorated the infectious disease research field, shedding light on disease epidemiology, pathogenesis, host-pathogen interactions and also evolutionary processes exerted upon pathogens. Mycobacterium tuberculosis complex (MTBC), enclosing M. bovis as one of its animal-adapted members causing tuberculosis (TB) in terrestrial mammals, is a paradigmatic model of bacterial evolution. Like other MTBC members, M. bovis is postulated as a strictly clonal, slowly evolving pathogen, with apparently no signs of recombination or horizontal gene transfer. In this work, we applied comparative genomics to a whole genome sequence (WGS) dataset composed of 70 M. bovis genomes from different lineages (European and African) to gain insights into the evolutionary forces that shape genetic diversification in M. bovis. Three distinct approaches were used to estimate signs of recombination. Globally, a small number of recombinant events was identified and confirmed by two independent methods with solid support. Still, recombination has a weaker effect on M. bovis diversity than mutation (overall r/m = 0.037). The differential r/m average values obtained across the clonal complexes of M. bovis in our dataset are consistent with the general notion that the extent of recombination may vary widely among lineages assigned to the same taxonomical species. Based on this work, recombination in M. bovis cannot be excluded and should thus be a topic of further effort in future comparative genomics studies for which WGS of large datasets from different epidemiological scenarios across the world is crucial. A smaller M. bovis dataset (n = 42) from a multi-host TB endemic scenario was then subjected to additional analyses, with the identification of more than 1,800 sites wherein at least one strain showed a single nucleotide polymorphism (SNP). The majority (87.1%) was located in coding regions, with the global ratio of non-synonymous to synonymous alterations (dN/dS) exceeding 1.5, suggesting that positive selection is an important evolutionary force exerted upon M. bovis. A higher percentage of SNPs was detected in genes assigned to the "lipid metabolism", "cell wall and cell processes" and "intermediary metabolism and respiration" functional categories, revealing their underlying importance in M. bovis biology and evolution. A closer look at genes prone to horizontal gene transfer in the MTBC ancestor and included in the 3R (DNA repair, replication and recombination) system revealed a global average negative value for Tajima's D neutrality test, suggesting that past selective sweeps and population expansion after a recent bottleneck remain major evolutionary drivers of the obligate pathogen M. bovis in its struggle with the host.
Content may be subject to copyright.

Scientic Reports | (2021) 11:18789 | 
www.nature.com/scientificreports
Ana C. Reis1,2 & Mónica V. Cunha1,2*
The Mycobacterium tuberculosis complex (MTBC) is one of the most successful taxa of bacterial pathogens
and a paradigmatic case in bacterial evolution, revealing a strikingly high nucleotide identity at the genome
level (> 99%) among its members1,2. The different MTBC ecotypes cause tuberculosis (TB), an infectious
granulomatous disease, in a broad group of host species, ranging from micro-mammals to humans3–5. Currently, the
complex encompasses human [M. tuberculosis (Mtb), M. africanum] and animal-adapted pathogens (M. bovis,
M. caprae, M. pinnipedii, M. microti, M. mungi, M. orygis, M. suricattae, “chimpanzee bacillus” and “dassie
bacillus”)5,6. M. canettii (also known as “smooth tubercle bacilli”) has an average nucleotide identity of 98%
with the aforementioned mycobacteria and comparative genomic works suggest that M. canettii and the rest of
MTBC have diverged very recently from a common ancestor7. Considering this notion, several authors refer to
M. canettii as an MTBC member8.
The MTBC has been systematically described as a strictly clonal complex, with population structure being
apparently dominated by reductions in diversity, bottlenecks, selective sweeps and genetic drift9,10. Assuming the
strictly clonal evolution of the complex, polymorphisms such as deletions cannot be restored by recombination9.
Based on this premise, the successive genomic deletions of the regions of difference (RD) and TbD1
(Mtb specific deletion 1 region) have been proposed as molecular markers of MTBC evolution2,5,11. Comparative
genomics and whole genome sequencing (WGS) studies support the division of human-adapted members
into nine lineages (M. tuberculosis L1 to L4, L7 and L8; and M. africanum L5, L6 and L9), with lineages L2 to L4
sharing the deletion of the TbD1 region2,11–13. Moreover, animal-adapted members have been proposed to share a
common ancestor and are defined by clade-specific deletions in the RD7, RD8, RD9 and RD10 regions2,5,14.
Events of horizontal gene transfer (HGT) and recombination are assumed to be rare and to have occurred in
the ancestors of MTBC, rather than throughout the diverging history of MTBC members15–17. Two early reports
by Hughes and collaborators (2002) and Gutacker and collaborators (2006) suggested that recombination events
might have helped to shape the polymorphisms marking specific loci of M. tuberculosis strains18,19. The apparent
absence of recombination in MTBC has been attributed to: (1) loss of mechanistic processes and ability for
HGT; (2) rareness of HGT events; and (3) no opportunity for recombination events within MTBC ecological
niches14,17. More recently, a few WGS studies applied to MTBC strains20 and M.
bovis21 provided evidence of recombination, with the first suggesting that MTBC strains frequently exchange
small DNA fragments, but that these events remain unnoticed because of the limited nucleotide sequence variation.
Mycobacterium bovis is the MTBC member most frequently recovered from livestock, mainly cattle, although
it can also be isolated from free-ranging and fenced wildlife4,22–24. M. bovis evolved into five main clonal complexes
[European 1 (Eu1), European 2 (Eu2), European 3 (Eu3), African 1 (Af1) and African 2 (Af2)], defined based
on spoligotyping profile, specific deletions and single nucleotide polymorphisms (SNPs) in specific genes25–29.
These clonal complexes reflect the diversity structure of the M. bovis population and its association with geographic
regions. Furthermore, a recent WGS work by Zimpel and collaborators (2020) devised an M. bovis SNP-based
phylogeny with over 1900 genomes, which suggested the existence of at least four distinct lineages in the world
(named Lb1 to Lb4) that are not entirely concordant with the previously defined clonal complexes, although
geographic specificities may also be confirmed30. These authors performed phylogenetic and molecular dating
divergence analyses but did not investigate recombination30.
Previous works employing different molecular techniques such as spoligotyping, MIRU-VNTR (Mycobacterial
Interspersed Repetitive Unit-Variable Number of Tandem Repeat) and, more recently, SNP typing, revealed a
certain level of genetic diversity among M. bovis strains31–35. The differentiation of genetic variants has become
a crucial tool to study disease epidemiology, contributing insights into pathogenesis, virulence and disease
transmission. The arrival of WGS methodologies opened the possibility to shed light on the evolutionary
drivers exerted upon M. bovis genomes during adaptation to, and persistence in, different hosts and epidemiological
scenarios.
In this work, we take advantage of a comparative genomic analysis of a diverse M. bovis dataset (n = 70),
including isolates from different clonal complexes, to gain insights into the evolutionary processes of M. bovis,
specifically addressing phylogenetic relationships and recombination events. Complementary to this analysis,
the sub-dataset of M. bovis isolates (n = 42) obtained from a well characterized multi-host TB endemic region
in Portugal31,36 was further explored to infer the balance between the relative rates of nonsynonymous (dN) and
synonymous (dS) nucleotide substitution, and the evolutionary contribution of specific groups of genes referred
to in the literature as having been acquired through HGT by the MTBC ancestor37,38, as well as genes encoding 3R
(DNA repair, replication and recombination) system components39. The genes proposed to be acquired through
HGT were selected since they may represent ancient polymorphisms, and so it is expected that they might contain
a higher fraction of synonymous alterations. The genes included in the 3R system were selected since previous
work performed with M. tuberculosis strains suggests a general negative/purifying selection acting upon these
genes and that they might play an important role in evolution39. Another objective of the work was to infer the
presence of recombination events. For this purpose, and considering that our dataset from Portugal only had
genomes included in European clonal complex 2 and strains without a clonal complex assigned, we decided to
include publicly available genomic data to end up with representatives from all clonal complexes and to increase
the robustness and breadth of the results.
Methodology
Mycobacterium bovis isolates dataset. Forty-two newly sequenced M. bovis genomes from an endemic
multi-host TB scenario in Portugal (details below), previously characterized from an epidemiological point of
view36, were at the centre of this work. Considering that the dataset from Portugal only has representatives of the
European 2 clonal complex and strains without a complex assigned, publicly available whole genome sequencing
data were added in order to enlarge the dataset with representatives from all M. bovis clonal complexes. Therefore,
three sources of whole genome sequencing data were used in this work: complete/draft genome assemblies with
a maximum of 10 scaffolds deposited at NCBI (National Center for Biotechnology Information) (n = 15 isolates);
Illumina fastq files deposited at SRA (Sequence Read Archive) representative of M. bovis clonal complex
diversity (n = 12 isolates)30; and 42 newly sequenced genomes from Portugal. Mycobacterium bovis BCG (bacillus
Calmette-Guérin) was excluded from the NCBI search. M. bovis AF2122/97, commonly used as reference
genome, was included in the dataset. Due to the public unavailability of whole genome sequences from representatives
of the African 1 clonal complex, and the low numbers of genomes from representative strains of Af2 and
Eu1, raw sequencing data available at SRA were used in those cases. The work of Zimpel and collaborators (2020)
helped in the identification of genomes from the aforementioned clonal complexes and in the selection process
of M. bovis to include in the dataset. For Eu3, only one type genome is described (Branger et al., 2020), thus the
genome that we included is the sole representative of the Eu3 complex.
Globally, the dataset included 70 M. bovis isolated from eight host species across 12 countries between
1985 and 2016. Thirty-six were assigned as Eu2, seven as Eu1, one as Eu3, three as Af1, four as Af2 and 19 were
not attributed to any clonal complex (details below). Detailed information about the M. bovis used in this study
(including accession numbers) can be found in Table 1 and Supplementary Table 1.
Newly sequenced genomes (dataset from Portugal). Forty-two newly sequenced M. bovis whole genomes originating
from animal TB hotspots in Portugal and spanning a period of over 12 years were at the centre of this
study, as the underlying wildlife-livestock disease system has been monitored regularly31,36 (Supplementary
Fig. 1). These strains were isolated from cattle (n = 14), red deer (n = 16) and wild boar (n = 12) from 2003 to
2015, according to the following procedure: animal tissue samples were pooled and processed following the protocol
guidelines recommended in the OIE Manual for Terrestrial Animals and inoculated onto Stonebrink and
Löwenstein-Jensen pyruvate solid media and liquid medium. Cultures were incubated at 37 °C and inspected
weekly for growth for a minimum period of 12 weeks. Colonies were directly stored in glycerol solution at -80 °C.
The DNA for the WGS procedure was obtained after a single in vitro passage of original archived samples in
mycobacteria selective medium (Middlebrook 7H9, BD Diagnostics). For that purpose, frozen culture stocks
were re-cultured on Middlebrook 7H9 supplemented with 5% sodium pyruvate and 10% ADS enrichment (50 g
albumin, 20 g glucose, 8.5 g sodium chloride in 1 L water) at 37 °C. After four weeks’ growth, the culture medium
was renewed, and the cultures were monitored regularly until growth was observed. Cells were harvested by
centrifugation, the pellet was resuspended in 500 µL phosphate buffer saline (PBS), heat-killed at 99 °C for
30 min, centrifuged, and the supernatant stored at -20 °C until WGS. All procedures were performed in a level
3 biosecurity facility.
WGS paired-end genomic libraries were prepared with unique indexing of each DNA sample and sequenced
using Illumina MiSeq (2 × 250 bp) (40 samples) and HiSeq (2 × 150 bp) (two isolates) technology (Eurofins
Genomics, Germany). The genomic DNA was sequenced using the Illumina Genome Analyser with the paired-end
module attachment and libraries were constructed with Nextera XT DNA Library Prep Kit from Illumina,
according to the manufacturer’s specifications.
Clonal complex assignment. For the data recovered from SRA (n = 12), the clonal complex identification
was available as metadata of the corresponding publications30,41,43. For complete genomes,
with the exception of M. bovis AF2122/97 and M. bovis 3601 that are recognized members of the Eu1 and Eu3
clonal complexes, respectively25,29, whole genome alignment with M. tuberculosis H37Rv (NCBI accession
NC_000962.3) was performed using MAFFT (Multiple alignment program for amino acid or nucleotide sequences,
version 7.458) with parameter --addfragments48. Then, the presence of the deletions and/or SNPs characteristic of
the different clonal complexes was searched.
The newly sequenced M. bovis (n = 42) and raw reads from draft assembly genomes (n = 3) were aligned
with reference genome M. tuberculosis H37Rv via the vSNP pipeline and the presence of the deletions and/or SNPs
characteristic of the different clonal complexes was searched.
Information on the presence/absence of characteristic deletions and/or SNPs and the spoligotyping profile was
gathered to assign the genomic data to the corresponding clonal complex. For four draft assemblies it was not
possible to infer the spoligotyping profile, and so they were included in the “without complex” group.
Bioinformatics analysis. The bioinformatics workflow followed in this work started from de novo assembly
and map-to-reference strategies, with the purpose of exploring recombination events and the polymorphisms of
specific gene groups. Figure 1 provides a flowchart of the steps followed. For the recombination analysis, all the
genomes were used to increase the robustness of the inferences and the associated metrics.
De novo genome assembly. In order to mitigate errors in the generation of genome consensus sequences, we
first obtained de novo assemblies and, then, the core multi-alignment. The Unicycler pipeline, currently available
at https://github.com/rrwick/Unicycler49, was implemented to perform de novo assembly for 54 sequenced
genomes (42 newly sequenced and 12 fastq files recovered from SRA). Briefly, before de novo assembly, read
quality analysis was performed in FastQC version 0.11.7 (https://github.com/s-andrews/FastQC), and reads were, whenever
necessary, cleaned with Trimmomatic version 0.36 (options “cut adapter and other illumina-specific sequences
from the read” and “cut bases off the end of a read, if below a threshold quality of 20” were applied)
(http://www.usadellab.org/cms/?page=trimmomatic)50. Then, SPAdes optimiser49 was used for genome assembly and Pilon
version 1.1851 for post-assembly optimization. A conservative bridging mode was selected to avoid misassemblies
and the k-mer size was searched and selected between 20 and 95% of read length. Following SPAdes guidelines
and considering read length, contigs shorter than 300 bp were removed and a read depth coverage cut-off of 20
was established52. In the de novo assembly strategy, no genome regions, such as the highly repetitive
Proline-Glutamate (PE) and Proline-Proline-Glutamate (PPE) paralogous genes, were removed.
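The trailing-quality rule quoted above (“cut bases off the end of a read, if below a threshold quality of 20”) can be illustrated with a minimal sketch. This is not Trimmomatic code; it assumes Phred+33 encoded quality strings, and the function name trim_trailing is hypothetical.

```python
def trim_trailing(seq: str, qual: str, threshold: int = 20, offset: int = 33):
    """Remove bases from the 3' end of a read while their Phred score is below `threshold`.

    Assumption: `qual` is a Phred+33 encoded string of the same length as `seq`.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    end = len(seq)
    # Walk backwards from the last base; stop at the first base that meets the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'I' encodes Q40, '5' encodes Q20 and '#' encodes Q2 under Phred+33.
    print(trim_trailing("ACGTAC", "IIII5#"))
```

Only the 3' end is inspected, so a low-quality base in the middle of a read is left in place; that behaviour matches the wording of the quoted option, not necessarily every trimming step that may have been used.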
Table 1. Characteristics of Mycobacterium bovis genomes used in this work. Eu1: European 1, Eu2: European
2, Eu3: European 3, Af1: African 1, Af2: African 2, and w/o CC: without clonal complex. NA: non-available
information.
M. bovis ID Clonal complex(a)  Country   Year  Host species       References  Type of sequence
Mb0220      w/o CC             Portugal  2003  Cattle             40          Newly sequenced
Mb0261      Eu2                Portugal  2006  Red deer           40          Newly sequenced
Mb0601      Eu2                Portugal  2007  Cattle             40          Newly sequenced
Mb0769      Eu2                Portugal  2008  Cattle             40          Newly sequenced
Mb0783      Eu2                Portugal  2008  Wild boar          40          Newly sequenced
Mb0865      Eu2                Portugal  2008  Cattle             40          Newly sequenced
Mb0891      Eu2                Portugal  2009  Red deer           40          Newly sequenced
Mb0893      Eu2                Portugal  2008  Wild boar          40          Newly sequenced
Mb1317      Eu2                Portugal  2010  Cattle             40          Newly sequenced
Mb1339      Eu2                Portugal  2010  Cattle             40          Newly sequenced
Mb1458      w/o CC             Portugal  2010  Wild boar          40          Newly sequenced
Mb1480      w/o CC             Portugal  2010  Cattle             40          Newly sequenced
Mb1654      Eu2                Portugal  2011  Cattle             40          Newly sequenced
Mb1670      w/o CC             Portugal  2011  Red deer           40          Newly sequenced
Mb1711      Eu2                Portugal  2011  Red deer           40          Newly sequenced
Mb1712      Eu2                Portugal  2011  Red deer           40          Newly sequenced
Mb1714      Eu2                Portugal  2011  Cattle             40          Newly sequenced
Mb1744      w/o CC             Portugal  2012  Wild boar          40          Newly sequenced
Mb1746      Eu2                Portugal  2012  Red deer           40          Newly sequenced
Mb1758      Eu2                Portugal  2012  Cattle             40          Newly sequenced
Mb1769      Eu2                Portugal  2012  Wild boar          40          Newly sequenced
Mb1785      Eu2                Portugal  2012  Red deer           40          Newly sequenced
Mb1789      Eu2                Portugal  2012  Cattle             40          Newly sequenced
Mb1841      Eu2                Portugal  2012  Cattle             40          Newly sequenced
Mb1870      Eu2                Portugal  2012  Wild boar          40          Newly sequenced
Mb1915      Eu2                Portugal  2013  Red deer           40          Newly sequenced
Mb1948      w/o CC             Portugal  2013  Red deer           40          Newly sequenced
Mb1960      Eu2                Portugal  2013  Red deer           40          Newly sequenced
Mb2026      Eu2                Portugal  2013  Cattle             40          Newly sequenced
Mb2043      Eu2                Portugal  2013  Red deer           40          Newly sequenced
Mb2067      Eu2                Portugal  2013  Wild boar          40          Newly sequenced
Mb2206      Eu2                Portugal  2014  Cattle             40          Newly sequenced
Mb2235      w/o CC             Portugal  2014  Red deer           40          Newly sequenced
Mb2277      w/o CC             Portugal  2014  Red deer           40          Newly sequenced
Mb2300      Eu2                Portugal  2014  Wild boar          40          Newly sequenced
Mb2310      Eu2                Portugal  2015  Red deer           40          Newly sequenced
Mb2313      Eu2                Portugal  2015  Wild boar          40          Newly sequenced
Mb2325      Eu2                Portugal  2015  Red deer           40          Newly sequenced
Mb2328      Eu2                Portugal  2015  Red deer           40          Newly sequenced
Mb2347      w/o CC             Portugal  2015  Wild boar          40          Newly sequenced
Mb2395      Eu2                Portugal  2015  Wild boar          40          Newly sequenced
Mb2397      Eu2                Portugal  2015  Wild boar          40          Newly sequenced
Mb502499    Af1                Ghana     NA    Human              30,41       SRA deposited
Mb502526    Af1                Ghana     NA    Human              30,41       SRA deposited
Mb1203064   Af1                Ghana     NA    Human              30,41       SRA deposited
Mb4117155   Af2                France    NA    Wild boar          30,42       SRA deposited
Mb1791710   Af2                Tanzania  NA    Chimpanzee         30,43       SRA deposited
Mb1791712   Af2                Tanzania  NA    Chimpanzee         30,43       SRA deposited
Mb1792006   Eu1                USA       2006  Cattle             43          SRA deposited
Mb1792127   Eu1                USA       2008  Cattle             43          SRA deposited
Mb1792361   Eu1                USA       2013  Cattle             43          SRA deposited
Mb7240242   Eu1                USA       2016  Cattle             43          SRA deposited
Mb7240415   Eu1                USA       2014  Cattle             43          SRA deposited
Mb1791984   Eu1                USA       2005  Cattle             43          SRA deposited
MBE1        w/o CC             Egypt     2014  Cattle             NA          Assembled/draft genome (NCBI)
MBE3        w/o CC             Egypt     2014  Cattle             NA          Assembled/draft genome (NCBI)
MBE4        w/o CC             Egypt     2014  Cattle             NA          Assembled/draft genome (NCBI)
MBE10       w/o CC             Egypt     2015  Cattle             NA          Assembled/draft genome (NCBI)
Mb0077      w/o CC             Canada    2006  Elk                NA          Assembled/draft genome (NCBI)
Mb0565      w/o CC             Canada    2011  Cattle             NA          Assembled/draft genome (NCBI)
BMR25       w/o CC             Canada    1985  Bison              NA          Assembled/draft genome (NCBI)
Mb3601      Eu3                France    2014  Cattle             29          Assembled/draft genome (NCBI)
Mb0476      Eu2                Canada    2002  Cattle             NA          Assembled/draft genome (NCBI)
MbSP38      Eu2                Brazil    2010  Cattle             44          Assembled/draft genome (NCBI)
Mb1595      w/o CC             Korea     2012  Cattle             45          Assembled/draft genome (NCBI)
Mb0030      w/o CC             China     NA    NA                 46          Assembled/draft genome (NCBI)
Mb0001      Eu2                Brazil    2015  Tapirus terrestris NA          Assembled/draft genome (NCBI)
Mb0003      w/o CC             India     1986  Cattle             NA          Assembled/draft genome (NCBI)
Mb31150     Af2                Uganda    NA    Chimpanzee         30,47       Assembled/draft genome (NCBI)
Figure1. Bioinformatics workow followed in this study.
The quality of de novo assemblies was assessed by the QUAST pipeline (http://quast.sourceforge.net/quast.html),
which promotes the remapping of contigs to the M. bovis AF2122/97 reference genome (NCBI accession number
LT708304.1) (quality parameters presented in Supplementary Table 1).
Genome map to reference. The FASTQ files from the newly sequenced M. bovis obtained from Illumina
sequencing were aligned with the M. bovis AF2122/97 reference genome (LT708304.1) with the help of the vSNP
pipeline (https://github.com/USDA-VS/vSNP). The standard filtering parameters or variant quality score
recalibration were applied according to Genome Analysis Toolkit (GATK)’s Best Practices recommendations53–55. Results
were filtered using a minimum SAMtools quality score of 150 and AC = 2. Reads were also examined using
Kraken (http://ccb.jhu.edu/software/kraken/) to exclude contamination. The vSNP pipeline used for the map
to sequence strategy in our work examines a series of defining SNPs and also aims to exclude mixed infection
scenarios. Genome coverage by reads was superior to 99% (Supplementary Table 1).
To avoid mapping errors and false SNPs, a variant was filtered out if: (1) it was supported by fewer than 20 reads,
(2) it was found at a frequency below 0.9, or (3) it was registered in at least one strain while a gap was present in
at least one other strain. SNPs and positions with mapping issues or alignment problems were visually validated
with Integrative Genomics Viewer (IGV) version 2.4.19 (http://software.broadinstitute.org/software/igv/)56. Since
Proline-Glutamate (PE) and Proline-Proline-Glutamate (PPE) genes are highly repetitive and part of multigene
families, they are prone to misreading by Illumina sequencing and to mis-mapping, and so are preferentially
removed from the bioinformatics workflow of Mycobacterium tuberculosis complex members when a strategy of
map to sequence is used to confirm SNPs. We thus filtered PE/PPE genes out of the analysis, as well as indels.
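The three variant-level criteria listed above can be sketched as a small filter. This is an illustration under stated assumptions, not the code of the vSNP pipeline: the StrainCall record and the function confident_variants are hypothetical names, and "frequency" is taken to mean the fraction of reads at the position that support the variant allele.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class StrainCall:
    """Hypothetical per-strain summary of one genome position."""
    alt_reads: int         # reads supporting the variant allele (0 = reference call)
    total_reads: int       # all reads covering the position
    has_gap: bool = False  # True if the position is a gap in this strain


def confident_variants(calls: Dict[str, StrainCall],
                       min_reads: int = 20,
                       min_freq: float = 0.9) -> Set[str]:
    """Return the strains whose variant call at this position passes all three criteria."""
    # Criterion 3: a gap in any strain discards the whole position.
    if any(call.has_gap for call in calls.values()):
        return set()
    kept = set()
    for strain, call in calls.items():
        if call.alt_reads == 0:
            continue  # reference allele, nothing to keep
        if call.alt_reads < min_reads:
            continue  # criterion 1: fewer than 20 supporting reads
        if call.alt_reads / call.total_reads < min_freq:
            continue  # criterion 2: variant frequency below 0.9
        kept.add(strain)
    return kept


if __name__ == "__main__":
    site = {
        "Mb0220": StrainCall(alt_reads=45, total_reads=46),
        "Mb0261": StrainCall(alt_reads=12, total_reads=12),   # too few reads
        "Mb0601": StrainCall(alt_reads=30, total_reads=50),   # frequency 0.6
        "Mb0769": StrainCall(alt_reads=0, total_reads=40),    # reference call
    }
    print(confident_variants(site))
```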
All SNPs were grouped into functional categories according to Bovilist (http://genolist.pasteur.fr/BoviList/).
The SnpEff pipeline (https://pcingola.github.io/SnpEff/) was employed to infer SNP consequences (synonymous
or non-synonymous alterations). A new database for the M. bovis AF2122/97 genome (LT708304.1) was created.
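To make the synonymous versus non-synonymous distinction concrete, the sketch below classifies a coding SNP by translating the reference and altered codons. It is not SnpEff; it assumes the standard codon assignments apply, and the names classify_snp and count_ratio are hypothetical. The ratio it returns is a raw count ratio, which is not the same as a site-normalised dN/dS estimate.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(ref_codon: str, codon_pos: int, alt_base: str) -> str:
    """Classify a SNP at 1-based position `codon_pos` of `ref_codon`."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:codon_pos - 1] + alt_base.upper() + ref_codon[codon_pos:]
    same_residue = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same_residue else "non-synonymous"


def count_ratio(snps) -> float:
    """Raw ratio of non-synonymous to synonymous SNP counts (not site-normalised)."""
    labels = [classify_snp(*snp) for snp in snps]
    n_syn = labels.count("synonymous")
    n_non = labels.count("non-synonymous")
    return n_non / n_syn if n_syn else float("inf")


if __name__ == "__main__":
    # GCT->GCC keeps alanine; ATG->ACG changes Met to Thr; TGG->TGA creates a stop.
    examples = [("GCT", 3, "C"), ("ATG", 2, "C"), ("TGG", 3, "A")]
    for snp in examples:
        print(snp, classify_snp(*snp))
    print(count_ratio(examples))
```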
Global core genome multi-alignment. The core genome multi-alignment was performed with Parsnp v1.2, currently
available at https://github.com/marbl/parsnp57, using the 69 complete genomes/draft assemblies (with
option -c) and M. bovis AF2122/97 (LT708304.1) as reference. Four core multi-alignments were performed:
including only members of the Eu2 clonal complex (n = 37), including all members of European clonal complexes
(n = 44), including a junction of European and African clonal complexes (n = 51), and including all M. bovis from
this study (n = 70).
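As a consistency check, the four alignment sizes can be recovered from the per-complex counts given in the dataset description (36 Eu2, seven Eu1, one Eu3, three Af1, four Af2 and 19 without complex). The arithmetic below rests on one assumption that is not stated explicitly in the text: the reference genome AF2122/97 is counted among the seven Eu1 genomes and is added to the Eu2-only subset as alignment reference. The function name alignment_sizes is hypothetical.

```python
# Per-complex genome counts as reported in the dataset description.
COUNTS = {"Eu2": 36, "Eu1": 7, "Eu3": 1, "Af1": 3, "Af2": 4, "w/o CC": 19}


def alignment_sizes(counts):
    """Sizes of the four nested core alignments.

    Assumption: the AF2122/97 reference (an Eu1 member) is already included in
    the Eu1 count, so it only needs to be added to the Eu2-only subset.
    """
    reference = 1
    european = counts["Eu1"] + counts["Eu2"] + counts["Eu3"]
    african = counts["Af1"] + counts["Af2"]
    return {
        "Eu2": counts["Eu2"] + reference,
        "European": european,
        "European+African": european + african,
        "All": sum(counts.values()),
    }


if __name__ == "__main__":
    print(alignment_sizes(COUNTS))
```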
The core alignments generated by Parsnp were used to infer maximum-likelihood (ML) phylogenetic trees
using RAxML, via CIPRES Science Gateway v3.3 (http://www.phylo.org/)58, with 1000 bootstrap replications.
Estimation of recombination events. The presence of recombination events was examined using three different
algorithms and bioinformatics tools in parallel: SplitsTree4 software, Gubbins (Genealogies Unbiased By
recomBinations In Nucleotide Sequences) pipeline and RDP4 (Recombination Detection Program, version beta
4.101) software.
The split decomposition method implemented in SplitsTree4 v4.15.1 (http://www.splitstree.org/)59 was used
to compute unrooted phylogenetic networks, which were validated statistically using the Phi test, with
a significance threshold of p = 0.05. The core multi-alignments from the Parsnp analysis were used as input and
split decomposition was selected as the network criterion.
The Gubbins pipeline v2.3.1 (https://github.com/sanger-pathogens/gubbins)60 was run using default parameters,
as another way to assess the impact of recombination on M. bovis. The algorithm implemented in the pipeline
reconstructs the clonal genealogy relating the complete genomes/draft assemblies of our dataset and the reference
genome (M. bovis AF2122/97, LT708304.1) to each other, and scans the positions of SNPs across each branch
of the tree in order to detect clusters of SNPs that would indicate recombination events. The null hypothesis for
each branch assumes the absence of any recombination events, therefore implying that the SNPs occurring on the
branch should be evenly distributed. The core multi-alignments from Parsnp and the best scoring ML tree from
RAxML were used as input files.
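The idea behind the null hypothesis described above, that SNPs on a branch are evenly spread along the genome, can be illustrated with a simple window scan. This is not the Gubbins algorithm; it is a minimal sketch that assumes non-overlapping 500 bp windows, 0-based SNP positions and a binomial model with a uniform per-site SNP probability. The function names binom_tail and scan_windows are hypothetical.

```python
from math import comb


def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed directly for small n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def scan_windows(snp_positions, genome_length: int, window: int = 500):
    """Rank windows by how unlikely their SNP count is under an even distribution.

    Returns a list of (p_value, window_start, snp_count), most extreme first.
    Assumption: every window is treated as `window` bp long, including the last.
    """
    per_site = len(snp_positions) / genome_length  # uniform SNP probability per site
    counts = {}
    for pos in snp_positions:
        start = (pos // window) * window
        counts[start] = counts.get(start, 0) + 1
    return sorted(
        (binom_tail(count, window, per_site), start, count)
        for start, count in counts.items()
    )


if __name__ == "__main__":
    # Toy branch: nine SNPs, five of them packed into 40 bp.
    snps = [100, 2000, 4000, 4010, 4020, 4030, 4040, 6000, 9000]
    for p_value, start, count in scan_windows(snps, genome_length=10_000)[:3]:
        print(start, count, p_value)
```

A window holding many more SNPs than the genome-wide density predicts receives a small tail probability and would be a candidate recombinant segment; a real analysis would also correct for multiple testing and work per branch of the tree.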
Finally, to confirm the recombination events suggested by the Gubbins pipeline, six algorithms (RDP61,
GENECONV62, Bootscan63, Maxchi64, Chimaera65, and SiScan66) implemented in RDP467 were applied to the
core multi-alignments from Parsnp under default settings. We established that at least three of the algorithms
implemented in RDP4 had to concordantly evidence a significant signal to validate each recombination event.
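The validation rule above (at least three concordant algorithms) amounts to a simple vote. The sketch below assumes a per-method p-value is available for each candidate event and uses a significance level of 0.05; that level is an assumption for illustration, since the text does not state the threshold applied within RDP4. The function name is_validated is hypothetical.

```python
from typing import Dict, Optional


def is_validated(p_values: Dict[str, Optional[float]],
                 alpha: float = 0.05,
                 min_methods: int = 3) -> bool:
    """True if at least `min_methods` detection methods report p < alpha.

    A value of None means the method gave no signal for this event.
    """
    supporting = [m for m, p in p_values.items() if p is not None and p < alpha]
    return len(supporting) >= min_methods


if __name__ == "__main__":
    # Invented p-values for one candidate event, for illustration only.
    event = {
        "RDP": 0.002, "GENECONV": None, "Bootscan": 0.03,
        "Maxchi": 0.01, "Chimaera": 0.20, "SiScan": None,
    }
    print(is_validated(event))
```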
Considering that both Gubbins and RDP software seek recombination signals by inspecting the core multi-alignment
in windows of 500 bp maximum, and to confirm that the inclusion of PE/PPE genes in the de novo
assembly process did not interfere with the recombination signals found, the neighbourhood of the genes in which
recombination events were identified was further inspected through a synteny analysis. Synteny maps, using
complete genomes, were constructed with MAUVE multi-genome alignment
(http://darlinglab.org/mauve/mauve.html) to exclude local genome translocations or inversions. Furthermore, a synteny
analysis with amino acid sequences was performed via the SyntTax webserver (https://archaea.i2bc.paris-saclay.fr/SyntTax/) using
complete genomes.
Gene diversity analyses. The genome dataset obtained from a multi-host TB system in Portugal was subjected
to deeper analyses with the objective of examining the polymorphisms in the genes referred to in the literature as
having been acquired through HGT by the MTBC ancestor37,38 and in the genes encoding 3R (DNA repair,
replication and recombination) system components39. Gene sequences of the 42 M. bovis, together with the gene
sequence from the reference genome (M. bovis AF2122/97, NC_002945.4), were aligned using ClustalX v2.1
(http://www.clustal.org/clustal2/) and used as an input for the calculation of gene diversity, nucleotide diversity
(π) and Tajima’s D neutrality test parameters via DnaSP v6.12.03 (http://www.ub.edu/dnasp/).
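For readers unfamiliar with the statistics named above, the following sketch computes the mean number of pairwise differences (the unnormalised form of π) and Tajima's D from a gene alignment, using the standard formulas of Tajima (1989). It is not DnaSP; it assumes equal-length aligned sequences without gaps or ambiguity codes, and the function names are hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_differences(seqs) -> float:
    """Average number of differing sites over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def tajimas_d(seqs):
    """Tajima's D; returns None when it is undefined (n < 4 or no segregating sites)."""
    n = len(seqs)
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if n < 4 or seg_sites == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    numerator = mean_pairwise_differences(seqs) - seg_sites / a1
    return numerator / math.sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


if __name__ == "__main__":
    rare_variants = ["AAAA", "TAAA", "ATAA", "AATA"]   # every SNP is a singleton
    balanced = ["AA", "AA", "TT", "TT"]                # SNPs at intermediate frequency
    print(tajimas_d(rare_variants), tajimas_d(balanced))
```

An excess of rare variants, as in the first toy alignment, gives a negative D, the pattern usually read as compatible with a selective sweep or population expansion; variants at intermediate frequency give a positive D.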
Results and discussion
Global phylogenetic analysis. A Maximum Likelihood (ML) phylogenetic tree based on the 69 M. bovis
isolates and the reference genome was obtained (Fig. 2A). This strategy generates a more robust tree
than single-gene or multi-locus trees, which do not capture the variability
across the entire genome and consequently present low inter-specific discriminatory power [68,69]. The resulting
topology of the ML tree generally agrees with clonal complex classification, with genomes of Eu2 clustering in
one tree branch and genomes of Af1 also clustering together (Fig. 2A). Results are also in agreement with the
known M. bovis evolutionary relationships, which present a large division between Eu1 members and a group
composed of all the other clonal complexes and genomes without an assigned clonal complex [30]. Small inconsistencies
between clonal complex and the relationships observed in the phylogenetic tree can be explained by the fact that
clonal complexes are described based on specific genomic regions, while the phylogenetic tree is based on a core
genome multi-alignment representing the whole genomes.
Evidence of recombination in Mycobacterium bovis. The Mycobacterium tuberculosis complex is
described to have evolved clonally, and most evidence accumulated over the years supports the idea that
ongoing HGT and recombination events do not occur at detectable levels in the MTBC [15,17,18].
Previous works have suggested that there might be limited recombination among MTBC strains [20,21], while
others did not succeed in identifying measurable recombination events [70,71]. To revisit this issue with a focus on
M. bovis, and unlike previous works that only accounted for M. tuberculosis [70,71], or that considered the MTBC as a
whole, with few M. bovis representatives [20], or that only considered a restricted M. bovis dataset [21], in this work a
total of 70 strains, with representatives from all clonal complexes, was used to screen for recombination. The
dataset was scaled in four cumulative levels: (1) Eu2 members, (2) all European clonal complex members (i.e.
European), (3) both European and African clonal complexes (Eu + Af) and (4) the entire dataset (encompassing
the genomes that are not included in any of the clonal complexes already described).
To investigate this postulate further, a split decomposition network was built to assess the absence
of recombination events between genomes, since this method enables the visualization of ancestral relationships
between individuals and displays conflicting phylogenetic signals. The presence of cycles in the network (i.e.
regions that do not converge into a single tree) was confirmed in all four datasets under analysis; however, none
was supported statistically by the Phi test (Eu2, p = 0.0956; European, p = 0.1637; Eu + Af, p = 0.2774; entire dataset,
p = 0.2451), providing poor evidence for the presence of recombination events (Fig. 3A-D).
Following this analysis, and considering the observation of cycles in all networks, the reconstruction
algorithm implemented in the Gubbins pipeline was applied to reconstruct the clonal genealogy and to perform
a complementary estimation of the impact of recombination in M. bovis genomes. A cumulative number of
recombination events was inferred, with the majority occurring in terminal branches (i.e. occurring in a single
genome) (Table 2). The metrics showed consistency across the datasets and revealed that recombination events
occurred two hundred to three hundred times less frequently than mutations, since the rho/theta parameter
Figure2. Maximum likelihood phylogenetic tree (GTR) built based on core-genome alignment of M. bovis
genomes before (A) and aer (B) the removal of recombination sites. Branch colors represent M. bovis clonal
complexes: purple for European 1, red for European 2, blue for European 3, orange for African 1 and green for
African 2. e tree is rooted and drawn to scale with branch lengths measured as the number of substitutions
per site.
that represents the relative rates of recombination and point mutation on a branch presented an average value
between 0.0037 and 0.0056 (Table 3). A recently published work with 38 M. bovis strains reported a higher rho/theta
value (rho/theta = 0.1) than the one obtained for this dataset [21]; however, the work by Patané and co-workers
used reference-based assemblies to infer recombination parameters, a procedural detail that has already been
associated with enrichment of putative recombination events at terminal branches due to the assembly procedure [70].
Next, the r/m parameter, which represents the ratio of diversity introduced by recombination and
mutation, revealed an average value between 0.025 and 0.037, indicating that recombination has a lower overall
Figure3. Visualization of conicting phylogenetic signals at unrooted phylogenetic trees by the split
decomposition method in European 2 genomes (n = 37) (A), in European genomes (n = 44) (B), in a
combination of European and African genomes (n = 51) (C) and in the entire dataset (n = 70) (D).
Table 2. Number of recombination events inferred by the Gubbins pipeline and RDP4.
Dataset | No. Gubbins events (% in terminal branches) | No. RDP4 events (% in terminal branches)
European 2 (n = 37) | 4 (50%) | 1 (0%)
European (n = 44) | 5 (60%) | 2 (0%)
European and African (n = 51) | 6 (66.7%) | 2 (0%)
Entire dataset (n = 70) | 8 (75%) | 3 (33.3%)
Table 3. Recombination metrics obtained through the Gubbins pipeline analysis.
Dataset | r/m | Rho/theta
European 2 (n = 37) | 0.025 | 0.0037
European (n = 44) | 0.034 | 0.0046
European and African (n = 51) | 0.037 | 0.0056
Entire dataset (n = 70) | 0.037 | 0.0044
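As a rough reading aid for Table 3, the sketch below converts the averaged Gubbins metrics into two derived quantities: the number of point mutations per recombination event (the inverse of rho/theta) and, assuming that r/m counts substitutions imported by recombination per substitution introduced by point mutation, the approximate number of substitutions imported per event (r/m divided by rho/theta). The values are copied from Table 3; the names `TABLE3` and `derived_metrics` are illustrative only, and ratios of branch-averaged values should be read as approximations rather than as quantities reported in this work.

```python
# Averaged Gubbins metrics copied from Table 3: dataset -> (r/m, rho/theta).
TABLE3 = {
    "European 2 (n = 37)": (0.025, 0.0037),
    "European (n = 44)": (0.034, 0.0046),
    "European and African (n = 51)": (0.037, 0.0056),
    "Entire dataset (n = 70)": (0.037, 0.0044),
}


def derived_metrics(r_over_m, rho_over_theta):
    """Return (mutations per recombination event, substitutions imported per event).

    Both values are simple ratios of the averaged metrics, so they are
    approximations (an assumption of this sketch, not a Gubbins output).
    """
    mutations_per_event = 1.0 / rho_over_theta
    substitutions_per_event = r_over_m / rho_over_theta
    return mutations_per_event, substitutions_per_event


if __name__ == "__main__":
    for name, (r_over_m, rho_over_theta) in TABLE3.items():
        per_event, imported = derived_metrics(r_over_m, rho_over_theta)
        print(f"{name}: ~{per_event:.0f} mutations per recombination event, "
              f"~{imported:.1f} substitutions imported per event")
```

Under these assumptions, each dataset yields on the order of a couple of hundred point mutations per recombination event, in line with the frequency statement in the text.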
eect in M. bovis genetic diversity when comparing with mutation (Table3). To make a broad comparison, the
r/m parameter was estimated using a similar methodology for an MTBC dataset composed by 23 genomes,
revealing a mean value of 0.48620, while for the 38M. bovis dataset of Patané and co-workers21 it evidenced a
mean value of 0.98. In the rst study there were only two M. bovis (M. bovis BCG and reference strain) within
the 23 genomes included in the work, so the obtained value might be biased by the overrepresentation of M.
tuberculosis genomes. In the second report, the M. bovis population under analysis was mainly recovered from
American countries and livestock hosts. In contrast, in our dataset, a higher number of geographic locations
and host species is represented, and genomes grouped into dierent clonal complexes with distinct population
genetic signatures were also used, enabling a deeper and wider population knowledge. e dierential r/m
average values obtained with our dataset are consistent with the notion that the extent of recombination vary
widely among lineages assigned to the same taxonomical species, so these results suggest that M. bovis clonal
complexes might exhibit a dierential impact of recombination, as also suggested by Didelot & Maiden72. Nev-
ertheless, enlarging signicantly this dataset with the inclusion of a higher number of M. bovis genomes would
allow further clarication of this point. Both r/m and rho/theta parameters present variability among the tree
branches, a result that is in agreement with reports concerning other bacterial species72,73.
Finally, to conrm the recombination events identied by Gubbins pipeline, the dierent core multi-align-
ments were also independently tested in RDP4 soware with six dierent algorithms. Globally, less than half of
the events identied by Gubbins were conrmed by RDP4 (Tables4, 5). Considering the entire dataset, three
recombination events were conrmed, two involving internal nodes and another one involving a single genome
in a terminal branch and for which a clonal complex could not be assigned (Tables4, 5). e identication of
events in terminal branches might be a sign that recombination is still ongoing in contemporary M. bovis strains
or the result of misalignment70. In this putative recombination region, circa 20% of positions have an undened
nucleotide (N), which can therefore inuence the recombination signal (Supplementary Fig.2). Moreover, this
region aects the rrs gene, encoding the 16S ribosomal RNA that is expected to be highly conserved, so this puta-
tive recombination signal could be the result of a sequencing error or wrong alignment. Whole genome alignment
between Mb0003 and M. bovis AF2122/97 was thus then performed and the presence of undened nucleotides
and of SNPs was conrmed, so the likely issues related to wrong alignment did not arrive as a consequence of
the bioinformatics procedure implemented in this work.
No gaps or undened nucleotides were identied in the recombination regions of internal nodes (Figs.4, 5).
With respect to these events, one encompasses exclusively Eu2 genomes, aecting the pks12 gene that encodes
a probable polyketide synthase; while the other one is registered across Eu1 genomes and aects narX gene
Table 4. Detailed information concerning the recombination events identified by Gubbins and RDP4 in the
entire dataset. (a) Genome positions according to M. bovis AF2122/97.
Recombination event | Identification | Core-alignment positions | Genome positions (a) | Gene name | Mb gene name | Classification of gene function | M. bovis isolate ID
#1 | Gubbins | 945,923–945,950 | 1,220,297–1,220,324 | PE PGRS22 | Mb1121 | PE-PGRS family protein | Mb2026
#2 | Gubbins; RDP4 | 1,176,674–1,177,221 | 1,475,305–1,475,975 | rrs | Mb5019 | Ribosomal RNA 16S | Mb0003
#3 | Gubbins; RDP4 | 1,532,736–1,532,787 | 1,953,495–1,953,548 | narX | Mb1765c | Probable nitrate reductase NarX | Mb1792361, Mb7240415
#4 | Gubbins | 1,532,751–1,532,781 | 1,953,840–1,953,870 | narX | Mb1765c | Probable nitrate reductase NarX | Mb1792361
#5 | Gubbins; RDP4 | 1,794,609–1,794,714 | 2,283,200–2,283,315 | pks12 | Mb2074c | Probable polyketide synthase pks12 | Mb0891, Mb1711, Mb1789, Mb1870, Mb1758, Mb2043, Mb1960
#6 | Gubbins | 1,794,627–1,794,780 | 2,283,713–2,285,136 | pks12 | Mb2074c | Probable polyketide synthase pks12 | Mb0003
#7 | Gubbins | 2,242,002–2,242,098 | 2,839,474–2,839,570 | tatA | Mb2121 | Probable Sec-independent protein translocase membrane-bound protein tatA | Mb0565
#8 | Gubbins | 3,244,551–3,244,556 | 4,003,420–4,003,425 | espa | Mb3646c | Conserved hypothetical alanine and glycine rich protein | Mb2043
Table 5. Statistical values associated with the different algorithms implemented in RDP4 for the confirmed
recombination events.
Recombination event | Alignment positions | RDP (p-value) | GENECONV (p-value) | Bootscan (p-value) | MaxChi (p-value) | Chimaera (p-value)
#2 | 1,176,674–1,177,221 | 7.524 × 10^-22 | 1.871 × 10^-20 | 1.004 × 10^-15 | 9.926 × 10^-05 | 9.753 × 10^-05
#3 | 1,532,736–1,532,787 | 3.771 × 10^-09 | 5.216 × 10^-08 | 5.634 × 10^-03 | – | –
#5 | 1,794,609–1,794,714 | 1.338 × 10^-11 | 2.324 × 10^-10 | 6.200 × 10^-12 | – | –
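The validation rule stated in the methods (a signal must be concordantly significant in at least three RDP4 algorithms) can be illustrated with the p-values of Table 5. The following is a minimal sketch, not the RDP4 implementation: the significance threshold `alpha = 0.05` is an assumption made for illustration, and `TABLE5` and `is_validated` are hypothetical names. Dashes in the table are encoded as `None`.

```python
# p-values copied from Table 5; None marks algorithms with no reported value.
TABLE5 = {
    "#2": {"RDP": 7.524e-22, "GENECONV": 1.871e-20, "Bootscan": 1.004e-15,
           "MaxChi": 9.926e-05, "Chimaera": 9.753e-05},
    "#3": {"RDP": 3.771e-09, "GENECONV": 5.216e-08, "Bootscan": 5.634e-03,
           "MaxChi": None, "Chimaera": None},
    "#5": {"RDP": 1.338e-11, "GENECONV": 2.324e-10, "Bootscan": 6.200e-12,
           "MaxChi": None, "Chimaera": None},
}


def is_validated(p_values, alpha=0.05, min_methods=3):
    """Return True when at least `min_methods` algorithms report p < alpha.

    `alpha` is an assumed threshold; the text only states that three
    algorithms had to show a significant signal.
    """
    significant = [name for name, p in p_values.items()
                   if p is not None and p < alpha]
    return len(significant) >= min_methods


if __name__ == "__main__":
    for event, p_values in TABLE5.items():
        print(event, "validated" if is_validated(p_values) else "not validated")
```

With this threshold, events #3 and #5 pass with exactly three supporting algorithms, which is the minimum the rule allows.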
encoding a probable nitrate reductase (Table 4). Overall, the recombination analysis suggested the presence of
a limited number of recombination segments with statistical support, and the inferred metrics indicate a lower
effect of recombination on M. bovis genealogy. The recombination signal was expected to be low; however, it is
important to distinguish true evolutionary signals from background noise, which is a challenging task. In order to
decrease the noise signal proposed to be introduced by reference-based assemblies and misalignment issues [70,71],
Figure4. Detailed visualization of alignment in the recombination region of M. bovis dataset aecting the
narX gene encoding a probable nitrate reductase. No gaps or undened nucleotides were identied in the
recombination region of internal nodes. is specic event was registered across Eu1 genomes. e quality
of sequencing of narX gene was evaluated by read mapping against M. bovis AF2122/97. e SNP positions
suggested in the recombination region were conrmed by applying the criteria referred to in the methods
section (at least 20 reads and 0.9 frequency of alteration). e polymorphisms at narX gene were fully conrmed
in genomes Mb1792361 and Mb7240415 (2.3%).
Figure5. Detailed visualization of alignment in the recombination region of M. bovis dataset aecting the
pks12 gene. No gaps or undened nucleotides were identied in the recombination region of internal nodes.
With respect to this event aecting the pks12 gene that encodes a probable polyketide synthase, it encompasses
exclusively Eu2 genomes. e quality of sequencing of pks12 was evaluated by read mapping against M. bovis
AF2122/97. e SNP positions suggested in the recombination region were conrmed by applying the criteria
referred to in the methods section (at least 20 reads and 0.9 frequency of alteration). e polymorphisms were
fully conrmed for genomes Mb0891, Mb1711, Mb1789, Mb1870, Mb1758, Mb2043, Mb1960.
with the exception of complete genomes, all the remaining ones were de novo assembled and the quality of
assemblies was checked and secured via QUAST pipeline analysis (Supplementary Table 1). Moreover, a series of
complementary analyses was performed to provide robustness and accuracy to the overall investigation. Thus,
the quality of sequencing of the narX and pks12 genes was evaluated by read mapping against M. bovis AF2122/97.
The SNP positions suggested in the recombination region were confirmed by applying the criteria referred to in
the methods section (at least 20 reads and 0.9 frequency of alteration). The polymorphisms at the narX gene were
fully confirmed in two genomes (Mb1792361 and Mb7240415; 2.3%), as well as in the case of the pks12 gene for
genomes Mb0891, Mb1711, Mb1789, Mb1870, Mb1758, Mb2043, Mb1960. However, for genome Mb2043, six
out of eight positions did not meet the read depth criterion because the SNPs were supported by a maximum of 17
reads, which was below the established cut-off of 20. Recombination at this genome spot could thus be confirmed
for six genomes (8.6%) (Figs. 4, 5).
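The confirmation criteria quoted above (at least 20 reads and 0.9 frequency of alteration) can be written as a simple filter. The sketch below is one possible reading of those criteria, assuming that the read threshold applies to the total depth at the position and that the frequency is the fraction of reads carrying the alternative allele; the function name `snp_is_confirmed` and its parameters are hypothetical.

```python
def snp_is_confirmed(total_reads, alt_reads, min_reads=20, min_alt_freq=0.9):
    """Apply the depth and allele-frequency criteria to one SNP position.

    total_reads: reads covering the position (assumed meaning of "20 reads").
    alt_reads:   reads supporting the alternative allele.
    """
    if total_reads < min_reads:
        # Too shallow: rejected regardless of allele frequency.
        return False
    return alt_reads / total_reads >= min_alt_freq


if __name__ == "__main__":
    # A position covered by 17 reads fails the depth cut-off, as described
    # for six of the eight positions in genome Mb2043.
    print(snp_is_confirmed(17, 17))   # False
    print(snp_is_confirmed(25, 24))   # True
```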
PE and PPE genes have repetitive regions prone to misreading by Illumina sequencing and to mis-mapping,
and so are commonly removed from the bioinformatics workflow of Mycobacterium tuberculosis members, but only
when a read-mapping (map-to-sequence) strategy is used. The inference of recombination events applied in this work was based
on de novo assemblies for which PE/PPE were not filtered out. We believe that the strategy applied, with the
implementation of three different, complementary approaches and algorithms by SplitsTree, the Gubbins pipeline
and RDP4 software, is robust enough to handle and filter recombination regions arising from false signals. Nevertheless, to
exclude the interference of PE/PPE genes on the identification of SNP clusters by Gubbins and RDP4 software,
and consequently on the identification of the recombination regions proposed to affect the narX and pks12 genes,
the neighbourhood of these genes was inspected (Supplementary Fig. 3–5). In M. bovis AF2122/97, the narX
gene is delimited by narK2 and Mb1764c, while pks12 is surrounded by Mb2075c and Mb2073c (Supplementary
Fig. 3–5). Synteny maps with MAUVE using complete genomes yielded plots providing information about gene
order conservation and rearrangements, showing four colinear blocks, without signs of genome translocations
or inversions. Furthermore, a complementary analysis with amino acid sequences evidenced synteny in all
complete genomes and no PE/PPE genes were identified in the neighbourhood regions of narX or pks12. For narX, one
genome (Mb0030) had a lower synteny score, since the narX gene is identified in two segments (segments 1891 and
1890). For pks12, Mb0030 and Mb003 present lower synteny scores due to a similar situation, in which pks12 is
identified in two and three segments, respectively, representing different domains of the protein (Supplementary
Fig. 3–5). Considering this information and that both Gubbins and RDP4 software perform an analysis
inspecting the core multi-alignment in windows with a maximum of 500 bp, we confirmed that the PE/PPE genes did
not interfere with the recombination signals affecting narX and pks12.
Although the recombination signals detected in this dataset may be considered residual, recombination in
M. bovis cannot be excluded and should thus continue to be the subject of further analyses, for which
sequencing of whole genomes from different epidemiological scenarios is crucial.
Comparing the obtained ML phylogenetic trees before and after the recombination correction (Fig. 2A,B) did
not reveal significant changes in the inferred phylogenetic relationships, with M. bovis strains being gathered
within the same groups.
An evolutionary scenario for M. bovis from a multi-host TB system in Portugal. A SNP alignment
containing 1816 polymorphic positions was obtained after mapping reads of 42 newly sequenced M.
bovis against the reference genome of M. bovis AF2122/97. The majority of SNPs (87.1%) were located in
coding regions and the affected genes were characterized according to the functional categories displayed in Bovilist
(Fig. 6A,B). After accounting for the total number of genes per functional category, the genes encompassed in the
“Lipid metabolism” category presented the highest number of SNPs, followed by “Cell wall and cell process” and
“Intermediary metabolism and respiration”, revealing their underlying importance in M. bovis evolution.
Globally, the average dN/dS ratio is above 1.5, which suggests a global evolutionary pressure to escape
from the ancestral state, representing positive (diversifying or directional) and/or relaxed purifying selection
scenarios. In the categories “Virulence, detoxification, adaptation”, “Insertion seqs and phages” and “Regulatory
proteins”, over two-thirds of SNPs were non-synonymous (Fig. 6B).
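To make the synonymous versus non-synonymous distinction concrete, the sketch below classifies a single-nucleotide change within a codon using the standard amino acid assignments of the genetic code. It is an illustration only and not the pipeline used in this work: the names `classify_snp` and `ns_fraction` are hypothetical, and the simple fraction of non-synonymous SNPs it reports is not a dN/dS estimate, which additionally normalises by the number of synonymous and non-synonymous sites.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(ref_codon, alt_codon):
    """Classify a one-base codon change as 'synonymous' or 'non-synonymous'."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must have exactly three bases")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")
    same_aa = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same_aa else "non-synonymous"


def ns_fraction(codon_pairs):
    """Fraction of SNPs (given as (ref_codon, alt_codon) pairs) that are NS."""
    labels = [classify_snp(ref, alt) for ref, alt in codon_pairs]
    return labels.count("non-synonymous") / len(labels)


if __name__ == "__main__":
    print(classify_snp("GAA", "GAG"))  # Glu -> Glu
    print(classify_snp("GAA", "GCA"))  # Glu -> Ala
```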
In all categories, there were genes with more than one SNP, leading to an average rate of mutation (i.e. the
mean value of SNPs per gene) greater than one (Fig. 6A). The highest mutation values were harboured by pks12
(Mb2074c) with 15 SNPs and fas (Mb2553c) with 8 SNPs. Both genes are involved in fatty acid metabolism. The
pks genes encode polyketide synthases (PKS), which are multifunctional enzymes involved in the biosynthesis of
mycobacterial cell wall lipids [74,75]. The pks12 gene encodes a multifunctional polypeptide that is involved in the synthesis
of mycoketides [74,76]. The fas gene is involved in the synthesis of mycolic acids. Both genes play an important role in
the biosynthesis of the cell wall, which is at the interface with the host.
SNP-detailed analysis of HGT and 3R genes. To further study the evolutionary processes within
M. bovis, two specific groups of genes were analysed. Previously published works using sequence composition
and phylogenetic methods identified genes that were acquired through HGT by the MTBC ancestor before
diversification [37,38]. Those genes are listed in Supplementary Table 2. The SNP distribution was analysed in a total
of 77 genes presumably involved in HGT, and 26 polymorphic sites were identified, leading, in the majority of
cases (78%), to a non-synonymous (NS) change (Supplementary Table 2). Previous work conducted with MTBC
genomes showed that putative HGT regions present a higher ratio of NS SNPs when compared with the rest
of the genome [20]. If one considers that these recombination tracts were acquired by the MTBC ancestor and, thus,
over-represent ancient polymorphisms, then a higher fraction of synonymous alterations would be expected,
since NS substitutions are expected to be eliminated by negative selection, as the changes in amino acid
might modify protein function. So, our results suggest that functional consequences may arise from
substitutions in HGT-like genes, which points to their importance for valuable adaptive genetic diversity.
In parallel with this analysis, the genes encoding 3R (DNA repair, replication and recombination) system
components were thoroughly examined, following the list previously published by dos Vultos and collaborators
(2008) [39]. The exchange of identical DNA fragments cannot be directly observed, although it might be a frequent
process when involving closely related bacteria, such as in the case of this dataset; moreover, this process might be
crucial as a DNA repair method [72] and thus play a role in homologous recombination. A total of 26 polymorphic
positions distributed across 54 genes were identified (Supplementary Table 3). In this group of genes, NS changes
account for about 65% of the consequences, which is in agreement with a previous report for M. tuberculosis
strains [39].
Gene and nucleotide diversity (π) were evaluated for the genes presenting polymorphisms. Gene diversity
is a measure of the uniqueness of a particular gene sequence in a population. Average values of 0.256 and 0.226
were obtained for HGT and 3R group genes, respectively. When the value of the gene diversity index is zero, all the
sequences under analysis are equal. Therefore, the values obtained in this work reveal that there is limited genetic
diversity within the selected panel of genes. The nucleotide diversity (π) measures the average number of differences
per site between two nucleotide sequences. When π is above 0.003 it can be considered that the group of sequences under
analysis is highly diverse. In our analysis, both gene groups reveal an average value below 0.003, with HGT
registering 0.00034 and the 3R circa 0.00021. No gene had a π value higher than 0.003, thus also confirming
limited nucleotide diversity within the selected gene panels.
The Tajima's D test of neutrality was also evaluated, and in both groups there were genes with positive and
negative values, with an average value below zero. Selection against deleterious mutations, past
selective sweeps and population expansion after a recent bottleneck have been pointed out as possible causes of
a decrease in Tajima's D.
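The three statistics discussed above can be sketched for a small gap-free alignment as follows. This is a simplified illustration of the textbook definitions (gene diversity as haplotype diversity, π as the mean number of pairwise differences per site, and Tajima's D from the 1989 formula), not a reimplementation of DnaSP; the function names are hypothetical, and gaps or ambiguous bases are not handled.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return (gene diversity, pi per site, mean pairwise differences k, segregating sites S)."""
    n, length = len(seqs), len(seqs[0])
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    # Gene (haplotype) diversity: n/(n-1) * (1 - sum of squared haplotype frequencies).
    counts = Counter(seqs)
    gene_div = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    # Mean number of pairwise differences (k) and nucleotide diversity per site (pi).
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Segregating sites: alignment columns with more than one state.
    seg_sites = sum(len(set(column)) > 1 for column in zip(*seqs))
    return gene_div, k / length, k, seg_sites


def tajimas_d(n, k, seg_sites):
    """Tajima's D for n sequences; returns None when it is undefined."""
    if n < 4 or seg_sites == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (k - seg_sites / a1) / sqrt(variance)


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "AAAA"]  # toy alignment, not study data
    gene_div, pi, k, s = diversity_stats(toy)
    print(gene_div, pi, tajimas_d(len(toy), k, s))
```

A negative D, as reported on average for both gene groups, arises when the mean pairwise difference k is smaller than S/a1, that is, when low-frequency variants are in excess.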
Balance of forces in M. bovis evolution. Natural selection is a mechanism of evolution and has been
associated with MTBC evolution [9]. Selective sweeps (i.e. positive selection that leads to the fixation of a new
Figure6. Stratied analysis for the M. bovis dataset from Portugal (n = 42). Total number of SNPs and aected
genes registered per functional category (A). Total number of synonymous and non-synonymous alterations
registered by functional category (B).
benecial mutation) and background selection (i.e. selection against a deleterious mutation that leads to the
elimination of any mutation linked to the target of selection) are both linked to the action of natural selection.
In this work, several evidences support the importance of natural selection: (1) SNP distribution is not ran-
dom, with genes included in the “lipid metabolism, “cell wall and cell processes” and “intermediary metabolism
and respiration” categories presenting a higher SNP rate; (2) regions proposed to be transferred from MTBC
ancestor also accumulate an excess of SNPs; and (3) the HGT and 3R groups evidenced a global average value
inferior to zero in the neutrality tests, indicating a past selective sweep or expansion aer bottleneck. Further-
more, the high proportion of low-frequency genetic variants, particularly singletons, is one of the features associ-
ated with MTBC population genetics, and proposed to reect the inuence of background selection10,77, an eect
that is also conrmed in this work, as 372 (20.5%) of the 1816 considered SNPs are strain-specic.
The globally elevated dN/dS ratio is commonly associated with a positive selection force, likely due
to diversifying selection and local selective sweeps. However, a reduction in effective population size might
have contributed, partially, to this unusual rate of NS per synonymous mutations, since mutations that might
have been deleterious in a population with a large effective population size can drift to a high frequency in a
small population, thereby reflecting a reduction in the efficacy of purifying selection as a consequence of
increased genetic drift [9,10].
e aected genes could confer important adaptive advantages through NS substitutions, however functional
studies would be necessary to understand the consequences arising from those SNPs and to infer what would be
the benets for mycobacteria. Recent work performed by Yang and collaborators78 with M. tuberculosis strains
suggested that this evolutionary pressure could allow accessory genes (i.e. genes that are not present in all strains
or strain-specic genes) to gradually dominate and eventually become core genes (i.e. present in all strains)79.
is could provide important adaptive and resistance capacities, if considering that accessory genes might be
involved in virulence, immune system evasion or antibiotic resistance.
Therefore, a deeper understanding of the role of these evolutionary forces is required to determine which
genes have contributed significantly to M. bovis evolution in its trajectory of interaction with different hosts in
specific disease systems.
Final conclusions and future work
The study of the genetic relatedness and structure of obligate pathogen populations might provide important
insights into their intraspecific genomic diversity and the evolution arising from the interaction with the host. In
recent years, many technological advances have shed light on the biology of M. bovis; however, the use of
high-throughput technologies such as WGS to understand evolutionary steps is still infrequent, with most works in
the TB field being focused on M. tuberculosis or on the molecular epidemiology of M. bovis.
In the current work, a diverse M. bovis dataset, with representatives of all described clonal complexes, was
used to assess how different evolutionary forces impact and shape the genetic diversity of a population.
Altogether, we ended up with a dataset composed of 70 M. bovis strains, representing the most diverse dataset
available to infer recombination, when compared with other publicly available works. Furthermore, we used
isolates obtained from multiple hosts, including humans. Although we may speculate that the inclusion of more
genomes might have an impact on the identification of recombination events and recombination metrics, this
pilot work is already significant in the context of present knowledge. More complete analyses may be conducted
in the future with larger M. bovis datasets to confirm our findings.
The impact of recombination in our dataset was assessed through three complementary strategies.
Moreover, efforts to avoid unreliable alignments and to guarantee data quality were made, so that the assessment of
recombination signals would be as accurate as possible. Although residual, a number of recombination events in
the examined dataset are supported by two approaches, which argues against the paradigm that MTBC is strictly clonal.
Despite the limited effects on M. bovis diversity when compared with mutation, recombination events need to
be considered in future evolutionary research works in order to further understand their true impact on
biological processes, since they may be an important force generating diversification that may translate into virulence,
immune evasion and/or antibiotic resistance phenotypes.
Indeed, previous WGS works support recombination in M. canettii [7], showing that strains are highly
recombinogenic and evolutionarily early-branching, with larger genome sizes and 25-fold more SNPs relative to MTBC
members. Those works also provide experimental evidence of how pks5-recombination-mediated bacterial surface
remodelling in M. canettii increased virulence, driving evolution from smooth to rough morphology and from
generalist mycobacteria (M. canettii) towards professional pathogens of mammalian hosts (MTBC) [80]. Moreover,
a recent work performed by Chiner-Oms and collaborators (2019) found evidence of recombination between
the MTBC ancestor and the M. canettii ancestor (before diverging to M. canettii), thus proposing the existence of
recombination potential before the diversification of MTBC into different ecotypes [71]. So, efforts to expand
this topic across all MTBC ecotypes should continue in the future. In this work, we excluded recombination
in genomes from the African clonal complexes; nevertheless, a broader sample dataset would be necessary to
accurately address the differences amongst clonal complex members.
Next, the comparative genomic analyses performed in a smaller group of genomes representative of
the M. bovis population from an endemic TB scenario in Portugal suggested that genes included in the “lipid
metabolism”, “cell wall and cell processes” and “intermediary metabolism and respiration” categories have a
greater importance in M. bovis evolution, and a global positive selection force was suggested to be acting upon
this population, as indicated by the elevated dN/dS ratio [9,10].
Finally, this work reinforces the value of WGS as a high-resolution tool for the analysis of M. bovis genomic
diversity and provides insights into the role of recombination and positive selection as evolutionary driving forces
in a pathogen affecting a large range of host species, with economic and biodiversity impacts across the world.
Data availability
The newly generated sequencing data included in this work are deposited under the following Biosample accession numbers:
SAMN17004141-SAMN17004143, SAMN17004145-SAMN17004174, SAMN17004176-SAMN17004184 and
under the Bioproject accession number PRJNA682618 at a public domain server in the National Centre for
Biotechnology Information (NCBI) SRA database.
Received: 20 May 2021; Accepted: 27 August 2021
References
1. Brosch, R. et al. Comparative genomics of the mycobacteria. Int. J. Med. Microbiol. 290, 143–152 (2000).
2. Brosch, R. et al. A new evolutionary scenario for the Mycobacterium tuberculosis complex. Proc. Natl. Acad. Sci. U. S. A. 99,
3684–3689 (2002).
3. Reis, A. C., Ramos, B., Pereira, A. C. & Cunha, M. V. Global trends of epidemiological research in livestock tuberculosis for the
last four decades. Transbound. Emerg. Dis. https://doi.org/10.1111/tbed.13763 (2020).
4. Reis, A. C., Ramos, B., Pereira, A. C. & Cunha, M. V. The hard numbers of tuberculosis epidemiology in wildlife: A meta-regression
and systematic review. Transbound. Emerg. Dis. 9, 1–20 (2020).
5. Brites, D. et al. A new phylogenetic framework for the animal-adapted mycobacterium tuberculosis complex. Front. Microbiol. 9,
2820 (2018).
6. Gagneux, S. Ecology and evolution of Mycobacterium tuberculosis. Nat. Rev. Microbiol. 16, 202–213 (2018).
7. Supply, P. et al. Genome analysis of smooth tubercle bacilli provides insights into ancestry and pathoadaptation of the etiologic
agent of tuberculosis. Nat. Genet. 45, 172–179 (2013).
8. Brites, D. & Gagneux, S. The Nature and Evolution of Genomic Diversity in the Mycobacterium tuberculosis Complex. In Strain
Variation in the Mycobacterium tuberculosis Complex: Its Role in Biology, Epidemiology and Control, Advances in Experimental
Medicine and Biology (ed. Gagneux, S.) 1–26 (Springer, New York, 2017). https://doi.org/10.1007/978-3-319-64371-7_1.
9. Smith, N. H., Gordon, S. V., de la Rua-Domenech, R., Clifton-Hadley, R. S. & Hewinson, R. G. Bottlenecks and broomsticks: The
molecular evolution of Mycobacterium bovis. Nat. Rev. Microbiol. 4, 670–681 (2006).
10. Hershberg, R. et al. High functional diversity in Mycobacterium tuberculosis driven by genetic drift and human demography. PLoS
Biol. 6, e311 (2008).
11. Bottai, D. et al. TbD1 deletion as a driver of the evolutionary success of modern epidemic Mycobacterium tuberculosis lineages.
Nat. Commun. 11, 1–14 (2020).
12. Cos colla, M. et al. Phylogenomics of Mycobacterium africanum reveals a new lineage and a complex evolutionary history. Microb.
Genomics 7, 1–14 (2021).
13. Ngabonziza, J. C. S. et al. A sister lineage of the Mycobacterium tuberculosis complex discovered in the African Great Lakes region.
Nat. Commun. 11, 1–11 (2020).
14. Smith, N. H. et al. Ecotypes of the Mycobacterium tuberculosis complex. J. eor. Biol. 239, 220–225 (2006).
15. Liu, X., Gutacker, M. M., Musser, J. M. & Fu, Y. X. Evidence for recombination in Mycobacterium tuberculosis. J. Bacteriol. 188,
8169–8177 (2006).
16. Ros as-Magallanes, V. et al. Horizontal transfer of a virulence operon to the ancestor of Mycobacterium tuberculosis. Mol. Biol. Evol.
23, 1129–1135 (2006).
17. Gutierrez, M. C. et al. Ancient origin and gene mosaicism of the progenitor of Mycobacterium tuberculosis. PLoS Pathog. 1, e5
(2005).
18. Hughes, A. L., Friedman, R. & Murray, M. Genomewide pattern of synonymous nucleotide substitution in two complete genomes
of Mycobacterium tuberculosis. Emerg. Infect. Dis. 8, 1342–1346 (2002).
19. Gutacker, M. M. et al. Single-nucleotide polymorphism-based population genetic analysis of Mycobacterium tuberculosis strains
from 4 geographic sites. J. Infect. Dis. 193, 121–128 (2006).
20. Namouchi, A., Didelot, X., Schöck, U., Gicquel, B. & Rocha, E. Aer the bottleneck: Genome-wide diversication of the Mycobac-
terium tuberculosis complex by mutation, recombination, and natural selection. Genome Res. 22, 721–734 (2012).
21. Patané, J. S. L. et al. Patterns and processes of Mycobacterium bovis evolution revealed by phylogenomic analyses. Genome Biol.
Evol. 9, 521–535 (2017).
22. Naranjo, V., Gortázar, C., Vicentea, J. & de la Fuente, J. Evidence of the role of European wild boar as a reservoir of Mycobacterium
tuberculosis complex. Vet. Microbiol. 127, 1–9 (2008).
23. Palmer, M. V., acker, T. C., Waters, W. R., Gortázar, C. & Corner, L. A. L. Mycobacterium bovis : A model pathogen at the interface
of livestock, wildlife, and humans. Vet. Med. Int. 2012, 236205 (2012).
24. Corner, L. A. L. e role of wild animal populations in the epidemiology of tuberculosis in domestic animals: How to assess the
risk. Vet. Microbiol. 112, 303–312 (2006).
25. Smith, N. H. et al. European 1: A globally important clonal complex of Mycobacterium bovis. Infect. Genet. Evol. 11, 1340–1351
(2011).
26. Rodriguez-Campos, S. et al. European 2—A clonal complex of Mycobacterium bovis dominant in the Iberian Peninsula. Infect.
Genet. Evol. 12, 866–872 (2012).
27. Berg, S. et al. African 2, a clonal complex of Mycobacterium bovis epidemiologically important in East Africa. J. Bacteriol. 193,
670–678 (2011).
28. Muller, B. et al. African 1, an epidemiologically important clonal complex of Mycobacterium bovis Dominant in Mali, Nigeria,
Cameroon, and Chad. J. Bacteriol. 191, 1951–1960 (2009).
29. Branger, M. et al. e complete genome sequence of Mycobacterium bovis Mb3601, a SB0120 spoligotype strain representative of
a new clonal group. Infect. Genet. Evol. 82, 104309 (2020).
30. Zimpel, C. K. et al. Global distribution and evolution of Mycobacterium bovis lineages. Front. Microbiol. 11, 843 (2020).
31. Reis, A. C., Tenreiro, R., Albuquerque, T., Botelho, A. & Cunha, M. V. Long-term molecular surveillance provides clues on a cattle
origin for Mycobacterium bovis in Portugal. Sci. Rep. 10, 1–18 (2020).
32. Duarte, E. L., Domingos, M., Amado, A., Cunha, M. V. & Botelho, A. MIRU-VNTR typing adds discriminatory value to groups
of Mycobacterium bovis and Mycobacterium caprae strains dened by spoligotyping. Vet. Microbiol. 143, 299–306 (2010).
33. Hauer, A. et al. Genetic evolution of mycobacterium bovis causing tuberculosis in livestock and wildlife in France since 1978. PLoS
One 10, e0117103 (2015).
34. Conceição, E. C. et al. Genetic diversity of Mycobacterium tuberculosis from Pará, Brazil, reveals a higher frequency of ancestral
strains than previously reported in South America. Infect. Genet. Evol. 56, 62–72 (2017).
35. Chihota, V. N. et al. Geospatial distribution of Mycobacterium tuberculosis genotypes in Africa. PLoS ONE 13, 1–18 (2018).
36. Reis, A. C. et al. Whole genome sequencing renes knowledge on the population structure of Mycobacterium bovis from a multi-
host tuberculosis system. Microorganisms. 9, 1585 (2021).
Content courtesy of Springer Nature, terms of use apply. Rights reserved

Vol.:(0123456789)
Scientic Reports | (2021) 11:18789 | 
www.nature.com/scientificreports/
37. Be cq, J. et al. Contribution of horizontally acquired genomic islands to the evolution of the Tubercle Bacilli. Mol. Biol. Evol. 24,
1861–1871 (2007).
38. Veyrier, F., Pletzer, D., Turenne, C. & Behr, M. A. Phylogentic detection of horizontal gene transfer during the step-wise genesis
of Mycobacterium tuberculosis. BMC Evol. Biol. 9, 196 (2009).
39. dos Vultos, T. et al. Evolution and diversity of clonal bacteria: e paradigm of Mycobacterium tuberculosis. PLoS Negl. Trop. Dis.
3, e1538 (2008).
40. Reis, A. C. et al. Phylogenomics Sheds Light on the population structure of Mycobacterium bovis from a multi-host tuberculosis
system. bioRxiv 04.26.441523 (2021). https:// doi. org/ 10. 1101/ 2021. 04. 26. 441523
41. Otchere, I. D. et al. Molecular epidemiology and whole genome sequencing analysis of clinical Mycobacterium bov is from Ghana.
PLoS One 14, e0209395 (2019).
42. Branger, M. et al. Dra genome sequence of Mycobacterium bovis strain D-10-02315 isolated from wild boar. Genome Announc.
4, e01268-e1316 (2016).
43. Orloski, K., Robbe-Austerman, S., Stuber, T., Hench, B. & Schoenbaum, M. Whole genome sequencing of Mycobacterium bovis
isolated from livestock in the United States, 1989–2018. Front. Vet. Sci. 5, 253 (2018).
44. Guimarães, A. M. S. et al. Dra genome sequence of Mycobacterium bovis strain SP38, a pathogenic bacterium isolated from a
bovine in Brazil. Genome Announc. 3 (2015).
45. Kim, N. et al. Complete genome sequence of Mycobacterium bov is clinical strain 1595, isolated from the laryngopharyngeal lymph
node of South Korean cattle. Genome Announc. 3, e01124-e1215 (2015).
46. Zhu, L. et al. Precision methylome characterization of Mycobacterium tuberculosis complex (MTBC) using PacBio single-molecule
real-time (SMRT) technology. Nucleic Acids Res. 44, 730–743 (2016).
47. Wanzala, S. I. et al. Dra genome sequences of Mycobacterium bovis BZ 31150 and Mycobacterium bovis B2 7505, pathogenic
bacteria isolated from archived captive animal bronchial washes and human sputum samples in Uganda. Genome Announc. 3,
e01102-15 (2015). https:// doi. org/ 10. 1128/ genom eA. 01102- 15.
48. Katoh, K., Asimenos, G. & Toh, H. Multiple alignment of DNA sequences with MAFFT. Methods Mol. Biol. 537, 39–64 (2009).
49. Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Unicycler: Resolving bacterial genome assemblies from short and long sequenc-
ing reads. PLoS Comput. Biol. 13, 1005595 (2017).
50. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: A exible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120
(2014).
51. Walker, B. et al. Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS
One 9, 112963 (2014).
52. Bankevich, A. et al. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19,
455–477 (2012).
53. Mckenna, A. et al. e genome analysis toolkit : A MapReduce framework for analyzing next-generation DNA sequencing data
sequencing data. Genome Res. 20, 1297–1303 (2010).
54. Depristo, M. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet.
43, 491–498 (2011).
55. Van der Auwera, G. et al. From FastQ data to high condence variant calls: e Genome Analysis Toolkit best practices pipeline.
Curr. Protoc. Bioinforma. 43, 11.10.1-11.10.33 (2014).
56. orvaldsdóttir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): High-performance genomics data visu-
alization and exploration. Brief. Bioinform. 14, 178–192 (2012).
57. Treangen, T. J., Ondov, B. D., Koren, S. & Phillippy, A. M. e Harvest suite for rapid core-genome alignment and visualization of
thousands of intraspecic microbial genomes. Genome Biol. 15, 524 (2014).
58. Miller, M. A., Pfeier, W. & Schwartz, T. Creating the CIPRES science gateway for inference of large phylogenetic trees. In Confer-
ence paper (2010). https:// doi. org/ 10. 1109/ GCE. 2010. 56761 29
59. Huson, D. H. & Bryant, D. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol. 23, 254–267 (2006).
60. Croucher, N. J. et al. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gub-
bins. Nucleic Acids Res. 43, e15 (2015).
61. Martin, D. & Rybicki, E. RDP: Detection of recombination amongst aligned sequences. Bioinformatics 16, 562–563 (2000).
62. Padidam, M., Sawyer, S. & Fauquet, C. M. Possible emergence of new geminiviruses by frequent recombination. Virology 265,
218–225 (1999).
63. Martin, D. P., Posada, D., Crandall, K. A. & Williamson, C. A modied bootscan algorithm for automated identication of recom-
binant sequences and recombination breakpoints. AIDS Res. Hum. Retroviruses 21, 98–102 (2005).
64. Smith, J. M. Analyzing the mosaic structure of genes. J. Mol. Evol. 34, 126–129 (1992).
65. Posada, D. & Crandall, K. A. Evaluation of methods for detecting recombination from DNA sequences: Computer simulations.
Proc. Natl. Acad. Sci. U. S. A. 98, 13757–13762 (2001).
66. Gibbs, M. J., Armstrong, J. S. & Gibbs, A. J. Sister-scanning: A Monte Carlo procedure for assessing signals in recombinant
sequences. Bioinformatics 16, 573–582 (2000).
67. Martin, D. P., Murrell, B., Golden, M., Khoosal, A. & Muhire, B. RDP4: Detection and analysis of recombination patterns in virus
genomes. Virus Evol. 1, vev003 (2015).
68. Devulder, G., de Montclos, M. P. & Flandrois, J. P. A multigene approach to phylogenetic analysis using the genus Mycobacterium
as a model. Int. J. Syst. Evol. Microbiol. 55, 293–302 (2005).
69. Mestre, O. et al. Phylogeny of Mycobacterium tuberculosis Beijing strains constructed from polymorphisms in genes involved in
DNA replication. Recombination and Repair. PLoS One 6, e16020 (2011).
70. Godfroid, M., Dagan, T. & Kupczok, A. Recombination signal in Mycobacterium tuberculosis stems from reference-guided assem-
blies and alignment artefacts. Genome Biol. Evol. 10, 1920–1926 (2018).
71. Chiner-Oms, et al. Genomic determinants of speciation and spread of the Mycobacterium tuberculosis complex. Sci. Adv. 5,
eaaw3307 (2019).
72. Didelot, X. & Maiden, M. C. J. Impact of recombination on bacterial evolution. Trends Microbiol. 18, 315–322 (2010).
73. Hadeld, J. et al. Comprehensive global genome dynamics of Chlamydia trachomatis show ancient diversication followed by
contemporary mixing and recent lineage expansion. Genome Res. 27, 1220–1229 (2017).
74. Matsunaga, I. et al. Mycobacterium tuberculosis pks12 produces a novel polyketide presented by CD1c to T cells. J. Exp. Med. 200,
1559–1569 (2004).
75. R ousseau, C. et al. Virulence attenuation of two Mas-like polyketide synthase mutants of Mycobacterium tuberculosis. Microbiology
149, 1837–1847 (2003).
76. Matsunaga, I. & Sugita, M. Mycoketide: A CD1c-presented antigen with important implications in mycobacterial infection. Clin.
Dev. Immunol. 2012, 981821 (2012).
77. Pepperell, C. et al. Bacterial genetic signatures of human social phenomena among M. tuberculosis from an aboriginal Canadian
population. Mol. Biol. Evol. 27, 427–440 (2010). https:// doi. org/ 10. 1093/ molbev/ msp261.
78. Yang, T. et al. Pan-genomic study of Mycobacterium tuberculosis reecting the primary/ secondary genes, generality/ individuality,
and the interconversion through copy number variations. Front. Microbiol. 9, 1886 (2018).
79. Vernikos, G., Medini, D., Riley, D. R. & Tettelin, H. T. years of pan-genome analyses. Curr. Opin. Microbiol. 23, 148–154 (2015).
Content courtesy of Springer Nature, terms of use apply. Rights reserved

Vol:.(1234567890)
Scientic Reports | (2021) 11:18789 | 
www.nature.com/scientificreports/
80. Boritsch, E. C. et al. pks5-recombination-mediated surface remodelling in Mycobacterium tuberculosis emergence. Nat. Microbiol.
1, 15019 (2016). https:// doi. org/ 10. 1038/ nmicr obiol. 2015. 19.
Acknowledgements
This work was funded by Fundação para a Ciência e a Tecnologia, IP (FCT) / MCTES through national funds
(PIDDAC) and co-funded by the European Regional Development Fund (FEDER) of the European Union,
through the Lisbon Regional Operational Program and the Competitiveness and Internationalization Operational
Program for Portugal 2020 or other programs that may succeed (project ‘Colossus: Control Of tubercuLOsiS at
the wildlife/livestock interface uSing innovative natUre-based Solutions’, ref. PTDC/CVT-CVT/29783/2017,
LISBOA-01-0145-FEDER-029783, POCI-01-0145-FEDER-029783). Strategic funding to cE3c and BioISI Research
Units (UIDB/00329/2020 and UIDB/04046/2020) from FCT is acknowledged. A.C.R. was supported by FCT
through a doctoral grant (PD/BD/128031/2016).
Author contributions
M.V.C. conceived this work and secured resources and funding. A.C.R. performed the bioinformatic analyses
and explored the data under the supervision of M.V.C. A.C.R. wrote the first draft of the
manuscript and M.V.C. critically revised all versions. Both authors approved the final version.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary Information The online version contains supplementary material available at
https://doi.org/10.1038/s41598-021-98226-y.
Correspondence and requests for materials should be addressed to M.V.C.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. The images or other third party material in this
article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2021
... Experimental evidence has shown that while the closely related free-living Mycobacterium canettii could accept donor DNA via homologous recombination, MTBC could not [2]. However, despite this lack of experimental evidence, computational methods sometimes identify recombination in MTBC genomes [4,7,8]. Some of the recombination in MTBC species identified through computational methods stems from poor quality data or unreliable alignments. ...
... Raw M. bovis sequence files from Almaw et al. [10], Loiseau et al. [11], Orloski et al. [12], Reis and Cunha [7], Rodrigues et al. [13], Zwyer et al. [14] and from BioProject PRJNA769553 were downloaded from the NCBI Sequence Read Archive. A full list of downloaded samples can be found in Supplemental File 1. Sequences were assembled using SPAdes genome assembler v3.14.0 [15] in careful mode, and polished using Pilon v1.23 [16]. ...
... Since this resulting alignment is conservative, some potential recombination events may be excised from the alignment before it is fed to Gubbins. Therefore, in addition to the recombination analysis on the core genome alignment, individual gene alignments that had previously been implicated in recombination events in M. bovis [7] were also assessed for recombination using fastGEAR [32]. The individual gene alignments were created using the built-in MAFFT aligner included in Panaroo [22,33]. ...
Article
The increased accessibility of next generation sequencing has allowed enough genomes from a given bacterial species to be sequenced to describe the distribution of genes in the pangenome, without limiting analyses to genes present in reference strains. Although some taxa have thousands of whole genome sequences available on public databases, most genomes were sequenced with short read technology, resulting in incomplete assemblies. Studying pangenomes could lead to important insights into adaptation, pathogenicity, or molecular epidemiology, however given the known information loss inherent in analyzing contig-level assemblies, these inferences may be biased or inaccurate. In this study we describe the pangenome of a clonally evolving pathogen, Mycobacterium bovis , and examine the utility of gene content variation in M. bovis outbreak investigation. We constructed the M. bovis pangenome using 1463 de novo assembled genomes. We tested the assumption of strict clonal evolution by studying evidence of recombination in core genes and analyzing the distribution of accessory genes among core monophyletic groups. To determine if gene content variation could be utilized in outbreak investigation, we carefully examined accessory genes detected in a well described M. bovis outbreak in Minnesota. We found significant errors in accessory gene classification. After accounting for these errors, we show that M. bovis has a much smaller accessory genome than previously described and provide evidence supporting ongoing clonal evolution and a closed pangenome, with little gene content variation generated over outbreaks. We also identified frameshift mutations in multiple genes, including a mutation in glpK , which has recently been associated with antibiotic tolerance in Mycobacterium tuberculosis . 
A pangenomic approach enables a more comprehensive analysis of genome dynamics than is possible with reference-based approaches; however, without critical evaluation of accessory gene content, inferences of transmission patterns employing these loci could be misguided.
Article
Full-text available
Mycobacterium bovis, a bacterial zoonotic pathogen responsible for the economically and agriculturally important livestock disease bovine tuberculosis (bTB), infects a broad mammalian host range worldwide. This characteristic has led to bidirectional transmission events between livestock and wildlife species as well as the formation of wildlife reservoirs, impacting the success of bTB control measures. Next Generation Sequencing (NGS) has transformed our ability to understand disease transmission events by tracking variant sites, however the genomic signatures related to host adaptation following spillover, alongside the role of other genomic factors in the M. bovis transmission process are understudied problems. We analyzed publicly available M. bovis datasets collected from 700 hosts across three countries with bTB endemic regions (United Kingdom, United States, and New Zealand) to investigate if genomic regions with high SNP density and/or selective sweep sites play a role in Mycobacterium bovis adaptation to new environments (e.g., at the host-species, geographical, and/or sub-population levels). A simulated M. bovis alignment was created to generate null distributions for defining genomic regions with high SNP counts and regions with selective sweeps evidence. Random Forest (RF) models were used to investigate evolutionary metrics within the genomic regions of interest to determine which genomic processes were the best for classifying M. bovis across ecological scales. We identified in the M. bovis genomes 14 and 132 high SNP density and selective sweep regions, respectively. Selective sweep regions were ranked as the most important in classifying M. bovis across the different scales in all RF models. SNP dense regions were found to have high importance in the badger and cattle specific RF models in classifying badger derived isolates from livestock derived ones. 
Additionally, the genes detected within these genomic regions harbor various pathogenic functions such as virulence and immunogenicity, membrane structure, host survival, and mycobactin production. The results of this study demonstrate how comparative genomics alongside machine learning approaches are useful to investigate further the nature of M. bovis host-pathogen interactions.
Article
Full-text available
Classical molecular analyses of Mycobacterium bovis based on spoligotyping and Variable Number Tandem Repeat (MIRU-VNTR) brought the first insights into the epidemiology of animal tuberculosis (TB) in Portugal, showing high genotypic diversity of circulating strains that mostly cluster within the European 2 clonal complex. Previous surveillance provided valuable information on the prevalence and spatial occurrence of TB and highlighted prevalent genotypes in areas where livestock and wild ungulates are sympatric. However, links at the wildlife–livestock interfaces were established mainly via classical genotype associations. Here, we apply whole genome sequencing (WGS) to cattle, red deer and wild boar isolates to reconstruct the M. bovis population structure in a multi-host, multi-region disease system and to explore links at a fine genomic scale between M. bovis from wildlife hosts and cattle. Whole genome sequences of 44 representative M. bovis isolates, obtained between 2003 and 2015 from three TB hotspots, were compared through single nucleotide polymorphism (SNP) variant calling analyses. Consistent with previous results combining classical genotyping with Bayesian population admixture modelling, SNP-based phylogenies support the branching of this M. bovis population into five genetic clades, three with apparent geographic specificities, as well as the establishment of an SNP catalogue specific to each clade, which may be explored in the future as phylogenetic markers. The core genome alignment of SNPs was integrated within a spatiotemporal metadata framework to further structure this M. bovis population by host species and TB hotspots, providing a baseline for network analyses in different epidemiological and disease control contexts. WGS of M. 
bovis isolates from Portugal is reported for the first time in this pilot study, refining the spatiotemporal context of TB at the wildlife–livestock interface and providing further support to the key role of red deer and wild boar on disease maintenance. The SNP diversity observed within this dataset supports the natural circulation of M. bovis for a long time period, as well as multiple introduction events of the pathogen in this Iberian multi-host system.
Preprint
Full-text available
Molecular analyses of Mycobacterium bovis based on spoligotyping and Variable Number Tandem Repeat (MIRU-VNTR) brought insights into the epidemiology of animal tuberculosis (TB) in Portugal, showing high genotypic diversity of circulating strains that mostly cluster within the European 2 clonal complex. The genetic relatedness of M. bovis isolates from cattle and wildlife have also suggested sustained transmission within this multi-host system. However, while previous surveillance highlighted prevalent genotypes in areas where livestock and wild ungulates are sympatric and provided valuable information on the prevalence and spatial occurrence of TB, links at the wildlife-livestock interfaces were established mainly via genotype associations. Therefore, evidence at a local fine scale of transmission events linking wildlife hosts and cattle remains lacking. Here, we explore the advantages of whole genome sequencing (WGS) applied to cattle, red deer and wild boar isolates to reconstruct the evolutionary dynamics of M. bovis and to identify putative pathogen transmission events. Whole genome sequences of 44 representative M. bovis isolates, obtained between 2003 and 2015 from three TB hotspots, were compared through single nucleotide polymorphism (SNP) variant calling analyses. Consistent with previous results combining classical genotyping with Bayesian population admixture modelling, SNP-based phylogenies support the branching of this M. bovis population into five genetic clades, three with geographic specificities, as well as the establishment of a SNPs catalogue specific to each clade, which may be explored in the future as phylogenetic markers. The core genome alignment of SNPs was integrated within a spatiotemporal metadata framework to reconstruct transmission networks, which together with inferred secondary cases, further structured this M. bovis population by host species and geographic location. WGS of M. 
bovis isolates from Portugal is reported for the first time, refining the spatiotemporal context of transmission events and providing further support to the key role of red deer and wild boar on the persistence of animal TB in this Iberian multi-host system.
Article
Full-text available
Human tuberculosis (TB) is caused by members of the Mycobacterium tuberculosis complex (MTBC). The MTBC comprises several human-adapted lineages known as M. tuberculosis sensu stricto , as well as two lineages (L5 and L6) traditionally referred to as Mycobacterium africanum . Strains of L5 and L6 are largely limited to West Africa for reasons unknown, and little is known of their genomic diversity, phylogeography and evolution. Here, we analysed the genomes of 350 L5 and 320 L6 strains, isolated from patients from 21 African countries, plus 5 related genomes that had not been classified into any of the known MTBC lineages. Our population genomic and phylogeographical analyses showed that the unclassified genomes belonged to a new group that we propose to name MTBC lineage 9 (L9). While the most likely ancestral distribution of L9 was predicted to be East Africa, the most likely ancestral distribution for both L5 and L6 was the Eastern part of West Africa. Moreover, we found important differences between L5 and L6 strains with respect to their phylogeographical substructure and genetic diversity. Finally, we could not confirm the previous association of drug-resistance markers with lineage and sublineages. Instead, our results indicate that the association of drug resistance with lineage is most likely driven by sample bias or geography. In conclusion, our study sheds new light onto the genomic diversity and evolutionary history of M. africanum , and highlights the need to consider the particularities of each MTBC lineage for understanding the ecology and epidemiology of TB in Africa and globally.
Article
Full-text available
Animal tuberculosis (TB), caused by Mycobacterium bovis, is maintained in Portugal in a multi-host system, with cattle, red deer and wild boar playing a central role. However, the ecological processes driving transmission are not understood. The main aim of this study was thus to contribute to the reconstruction of the spatiotemporal history of animal TB and to refine knowledge on M. bovis population structure in order to inform novel intervention strategies. A collection of 948 M. bovis isolates obtained during long-term surveillance (2002-2016, 15 years) of cattle (n = 384), red deer (n = 303) and wild boar (n = 261), from the main TB hotspot areas, was characterized by spoligotyping and 8 to 12-loci MIRU-VNTR. Spoligotyping identified 64 profiles and MIRU-VNTR distinguished 2 to 36 subtypes within each spoligotype, enabling differentiation of mixed or clonal populations. Common genotypic profiles within and among livestock and wildlife in the same spatiotemporal context highlighted epidemiological links across hosts and regions, for example the SB0119-M205 genotype shared by cattle in Beja district or SB0121-M34 shared by the three hosts in Castelo Branco and Beja districts. These genomic data, together with metadata, were integrated in a Bayesian inference framework, identifying five ancestral M. bovis populations. The phylogeographic segregation of M. bovis in specific areas of Portugal where the disease persists locally is postulated. Concurrently, robust statistics indicate an association of the most probable ancient population with cattle and Beja, providing a clue on the origin of animal TB epidemics. This relationship was further confirmed through a multinomial probability model that assessed the influence of host species on spatiotemporal clustering. Two significant clusters were identified: one that persisted between 2004 and 2010 in Beja district, with Barrancos county at the centre, overlapping the central TB core area of the Iberian Peninsula and highlighting a significantly higher risk associated with cattle. The second cluster was predominant in the 2012-2016 period, with Rosmaninhal county at the centre, in Castelo Branco district, for which wild boar contributed the most in relative risk. These results provide novel quantitative insights beyond empirical perceptions that may inform adaptive TB control choices in different regions.
Article
The human- and animal-adapted lineages of the Mycobacterium tuberculosis complex (MTBC) are thought to have expanded from a common progenitor in Africa. However, the molecular events that accompanied this emergence remain largely unknown. Here, we describe two MTBC strains isolated from patients with multidrug-resistant tuberculosis, representing an as-yet-unknown lineage, named Lineage 8 (L8), seemingly restricted to the African Great Lakes region. Using genome-based phylogenetic reconstruction, we show that L8 is a sister clade to the known MTBC lineages. Comparison with other complete mycobacterial genomes indicates that the divergence of L8 preceded the loss of the cobF genome region - involved in the cobalamin/vitamin B12 synthesis - and gene interruptions in a subsequent common ancestor shared by all other known MTBC lineages. This discovery further supports an East African origin for the MTBC and provides additional molecular clues on the ancestral genome reduction associated with adaptation to a pathogenic lifestyle. The human- and animal-adapted lineages of the Mycobacterium tuberculosis complex (MTBC) are thought to have evolved from a common progenitor in Africa. Here, the authors identify two MTBC strains isolated from patients with multidrug-resistant tuberculosis, representing an as-yet-unknown lineage further supporting an East African origin for the MTBC.
Article
Mycobacterium bovis is the main causative agent of zoonotic tuberculosis in humans and frequently devastates livestock and wildlife worldwide. Previous studies suggested the existence of genetic groups of M. bovis strains based on limited DNA markers (a.k.a. clonal complexes), and the evolution and ecology of this pathogen have been only marginally explored at the global level. We have screened over 2,600 publicly available M. bovis genomes and newly sequenced four wildlife M. bovis strains, gathering 1,969 genomes from 23 countries and at least 24 host species, including humans, to complete a phylogenomic analysis. We propose the existence of four distinct global lineages of M. bovis (Lb1, Lb2, Lb3, and Lb4) underlying the current disease distribution. These lineages are not fully represented by clonal complexes and are dispersed based on geographic location rather than host species. Our data divergence analysis agreed with previous studies reporting independent archeological data of ancient M. bovis (South Siberian infected skeletons at ∼2,000 years before present) and indicates that extant M. bovis originated between 715 and 3,556 years BP, with later emergence in the New World and Oceania, likely influenced by trade among countries.
Article
Mycobacterium tuberculosis (Mtb) strains are classified into different phylogenetic lineages (L), three of which (L2/L3/L4) emerged from a common progenitor after the loss of the MmpS6/MmpL6-encoding Mtb-specific deletion 1 region (TbD1). These TbD1-deleted “modern” lineages are responsible for globally-spread tuberculosis epidemics, whereas TbD1-intact “ancestral” lineages tend to be restricted to specific geographical areas, such as South India and South East Asia (L1) or East Africa (L7). By constructing and characterizing a panel of recombinant TbD1-knock-in and knock-out strains and comparison with clinical isolates, here we show that deletion of TbD1 confers to Mtb a significant increase in resistance to oxidative stress and hypoxia, which correlates with enhanced virulence in selected cellular, guinea pig and C3HeB/FeJ mouse infection models, the latter two mirroring in part the development of hypoxic granulomas in human disease progression. Our results suggest that loss of TbD1 at the origin of the L2/L3/L4 Mtb lineages was a key driver for their global epidemic spread and outstanding evolutionary success. Mycobacterium tuberculosis (Mtb) modern strains emerged from a common progenitor after the loss of Mtb-specific deletion 1 region (TbD1). Here, the authors show that deletion of TbD1 correlates with enhanced Mtb virulence in animal models, mirroring the development of hypoxic granulomas in human disease progression.
Article
Tuberculosis (TB) is a widespread disease that crosses the human and animal health boundaries, with infection being reported in wildlife, from temperate and subtropical to arctic regions. Often, TB in wild species is closely associated with disease occurrence in livestock, but the TB burden in wildlife remains poorly quantified on a global level. Through meta‐regression and systematic review, this study aimed to summarize global information on TB prevalence in commonly infected wildlife species and to draw a global picture of the scientific knowledge accumulated in wildlife TB. For these purposes, a literature search was conducted through the Web of Science and Google Scholar. The 223 articles retrieved, concerning a 39‐year period, were submitted to bibliometric analysis and 54 publications, regarding three wildlife hosts, fulfilled the criteria for meta‐regression. Using a random‐effects model, the worldwide pooled TB prevalence in wild boar is higher than for any other species and estimated as 21.98%, peaking in Spain (31.68%), Italy (23.84%), and Hungary (18.12%). The pooled prevalence of TB in red deer is estimated at 13.71%, with Austria (31.58%), Portugal (27.75%), New Zealand (19.26%), and Spain (12.08%) ranking highest, while for European badger it was computed as 11.75%, peaking in the UK (16.43%) and Ireland (22.87%). Despite these hard numbers, a declining trend in wildlife TB prevalence is observed over the last decades. The overall heterogeneity calculated by multivariable regression ranged from 28.61% (wild boar) to 60.92% (red deer), indicating that other unexplored moderators could explain disease burden. The systematic review shows that the most prolific countries contributing to knowledge related to wildlife TB are located in Europe and that Mycobacterium bovis is the most reported pathogen (89.5%). This study provides insight into the global epidemiology of wildlife TB, ascertaining research gaps that need to be explored and informing how surveillance should be refined.
Article
Animal tuberculosis (TB) caused by Mycobacterium tuberculosis complex (MTC) bacteria remains one of the most significant infectious diseases of livestock, despite decades of eradication programs and research efforts, in an era where the livestock sector is amongst the most important and rapidly expanding commercial agricultural segments worldwide. This work provides a global overview of the spatial and temporal trends of reported scientific knowledge of TB in livestock, aiming to gain insights into research subtopics within the animal TB epidemiology domain and to highlight territorial inequalities regarding data reporting and research outputs over the years. To deliver such information, peer‐reviewed reports of TB studies in livestock were retrieved from the Web of Science and Google Scholar, systematized, and dissected. The validated dataset contained 443 occurrence observations, covering the 1981‐2020 period (39 years). We highlight a clear move towards transdisciplinary areas and the One Health approach, with a global temporal increase of publications combining livestock with wildlife and/or human components, which reflects the importance of non‐prototypical hosts as key to understanding animal TB. It becomes evident that cattle is the main host across works from all continents; however, many regions remain poorly surveyed. TB research in livestock in low‐/middle‐income countries is markedly growing, reflecting changes in animal husbandry, but also mirroring the globalization era, with a marked increase in international collaboration and capacity-building programs for scientific and technological development. This review gives an overview of the most prolific continents, countries, and research fields in animal TB epidemiology, clearly outlining knowledge gaps and key priority topics. The estimated growth trend of livestock production until 2050, particularly in Asia and Africa, in response to human population growth and animal‐protein demand, will require further investment in early surveillance and adaptive research to accommodate the higher diversity of livestock species and MTC members, raising the possibility of fine-tuning funding schemes.
Article
Mycobacterium bovis strain Mb3601 was isolated from the lymph node of an infected bovine in a highly enzootic bovine tuberculosis area of Burgundy, France. It was selected to obtain a complete genome for a new clonal complex, mainly constituted by SB120-spoligotype strains, that we propose to name “European 3”. It was recently described as “clonal group I” based on whole-genome SNP analysis of 87 French strains. Here we describe the 4,365,068 bp complete genome obtained by the combination of PacBio and Illumina technologies. This genome of 65.64% G + C content includes 4024 predicted protein-coding genes, 52 tRNA, 3 rRNA and 11 copies of IS6110.