Article

Two Methods for Mapping and Visualizing Associated Data on Phylogeny Using Ggtree


Abstract

Ggtree is a comprehensive R package for visualizing and annotating phylogenetic trees with associated data. It can also map and visualize associated external data on phylogenies with two general methods. Method 1 allows external data to be mapped on the tree structure and used as visual characteristics in tree and data visualization. Method 2 plots the data with the tree side by side using different geometric functions after re-ordering the data based on the tree structure. These two methods integrate data with phylogeny for further exploration and comparison in the evolutionary biology context. Ggtree is available from http://www.bioconductor.org/packages/ggtree.
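
The two methods described in the abstract can be illustrated with a small, language-neutral sketch. The block below is plain Python and does not use or reproduce the ggtree API; the nested-tuple tree, the tip labels, the `traits` values, and all function names are hypothetical and exist only for illustration. It assumes that "mapping" means joining external values onto tree tips by label (Method 1) and that "re-ordering" means sorting data rows into the order in which the tips are drawn (Method 2).

```python
# Conceptual sketch only: plain Python, NOT the ggtree API.
# The tree, labels, values and function names are hypothetical.

# A tiny rooted tree as nested tuples; strings are tip labels.
tree = (("A", "B"), ("C", ("D", "E")))

# External data keyed by tip label, deliberately stored in a different order.
traits = {"E": 2.1, "A": 0.4, "C": 1.3, "B": 0.9, "D": 1.8}


def tip_order(node):
    """Return tip labels in depth-first order, i.e. the order tips are drawn."""
    if isinstance(node, str):
        return [node]
    tips = []
    for child in node:
        tips.extend(tip_order(child))
    return tips


def map_to_tips(node, data):
    """Method 1 idea: join external values onto tips by label.

    Tips without a matching record get None, so the value can still be
    used as a visual attribute (e.g. colour) with a 'missing' category.
    """
    return [{"label": tip, "value": data.get(tip)} for tip in tip_order(node)]


def reorder_for_side_panel(node, data):
    """Method 2 idea: re-order data rows so that row i lines up with tip i."""
    return [(tip, data[tip]) for tip in tip_order(node) if tip in data]


mapped = map_to_tips(tree, traits)
aligned = reorder_for_side_panel(tree, traits)
print(aligned)
```

In this sketch, `mapped` carries one record per tip that a plotting layer could use for colour or size, while `aligned` is the same data sorted into tip order so that a side-by-side panel shares the tree's vertical axis.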


... The tree, which was rooted using the outgroup species S. paradoxus, was visualized and annotated using the ggtree (Yu et al., 2018) package in R. FastSTRUCTURE (Version 1.0) (Raj et al., 2014) was used to quantify the number of populations and the degree of admixture in the genomes examined in this study. Owing to the high degree of sequence similarity between the Guinness samples, a single representative sample (IDS1) was used in this analysis; consequently, admixture in 161 genomes was assessed. ...
... Bootstrap branch support was assessed by performing 1,000 pseudo-replicates. Trees were visualized using ggtree (v 3.6.2) (Yu et al., 2018). Sporulation percentage was determined using the ASBC sporulation method (Bilinski and Casey, 1989). ...
Article
Full-text available
Most commercial beers are made using water, malted barley, and hops as the principal ingredients and Saccharomyces yeast as the transforming microorganism. The yeast is used in a semi-conservative process in which crops are collected from one fermentation, stored, and a proportion recycled into a subsequent fermentation. This process differs from wine, cider, and spirit manufacturing where the yeast culture is only used once. The serial fermentation process is continued approximately 8–12 times after which a new culture of verified purity and identity is introduced. This increases the likelihood that the yeast remains true to type. Many commercial brewers use proprietary strains the origins of which are usually unknown. Advances in genetic analyses provide a means for probing the origins of brewing yeast strains, and in this study, six historical Irish brewing yeasts from five breweries located within Ireland were assessed. Using Illumina sequencing technology, whole-genome sequencing data were generated. Single nucleotide polymorphism analysis of these data established that the historical Irish brewing yeast group falls within the previously described “Britain” subpopulation Beer 1 clade. Further analysis established that the six historical Irish brewing yeasts separate into two subgroupings, which associated with specific regional locations. Furthermore, the assessment of the six historical Irish brewing yeast phenotypic attributes relevant to brewing correlated within the same regional location groupings. Our data provide further evidence of how brewing requirements associated with specific beer styles have influenced yeast strain selection.
... This high-throughput automated pipeline employs a combined kmer approach and alignment to define clusters, with a fixed cut-off of 50 SNPs that cannot be customized. Subsequently, we extracted the trees for selected clusters from the portal and visualized them with the ggtree package (v3.8.2) in R [52]. ...
... [58] on the Panaroo output and the SNPs identified via post-read mapping to the reference genome, respectively. Specifically, we focused on pairwise p values (both worst and best p values) that adjusted for the confounding effect of population structure (lineage effect). The associations were assessed using Scoary (accessible at www.github.com/AdmiralenOla/Scoary). To visualize the tree and associated metadata, we utilized the ggtree package (v3.8.2) in R [52]. ...
Article
Full-text available
Klebsiella pneumoniae is a Gram-negative bacterium associated with a wide range of community- and hospital-acquired infections. The emergence of clonal hypervirulent strains resistant to last-resort antimicrobial agents has become a global concern. The Kingdom of Saudi Arabia (KSA), with its diverse population and high tourism traffic, serves as a platform where the spread of multidrug-resistant (MDR) strains is facilitated. However, the knowledge of epidemiology and population diversity of MDR K. pneumoniae in KSA is scarce. We conducted a comprehensive genomic survey on 352 MDR K. pneumoniae isolates systematically collected from bloodstream and urinary tract infections in 34 hospitals across 15 major cities in KSA during 2022 and 2023. Whole-genome sequencing on the isolates was performed, followed by genomic epidemiology and phylodynamic analysis. Our study revealed a dynamic population characterized by the rapid expansion of several dominant clones, including ST2096, ST147, and ST231, which were estimated to have emerged within the past decade. These clones exhibited widespread dissemination across hospitals and were genetically linked to global strains, particularly from the Middle East and South Asia. All major clones harboured plasmid-borne ESBLs and carbapenemase genes, with plasmidome analysis identifying multiple IncH, IncA/C and IncL plasmids underlying the MDR-hypervirulent phenotype. These plasmids were shared between major clones and became acquired on the same time scales as the expansion of the dominant clones. Our results report ST2096 as an emerging MDR-hypervirulent clone, emphasizing the need for monitoring of the circulating clones and their plasmid content in the KSA and broader West Asia.
... IQ-TREE v1.6.12 [32] with the TVM+F+ASC+R2 substitution model [33,34] was used to create a maximum likelihood (ML) phylogeny. The phylogenetic tree was visualized and annotated using ggTree [35][36][37] run in RStudio v.4.1 [38]. Statistical analysis of clonal complex (CC) and MLST prevalence was performed using a Monte Carlo estimation of log odds ratio (https://github.com/sayfaldeen/BioinformaticsScripts/blob/main/MC-LOR-comp.py). ...
Article
Full-text available
Staphylococcus aureus (SA) is an opportunistic pathogen and human commensal that is frequently present in the upper respiratory tract, gastrointestinal tract and skin. While SA can cause diseases ranging from minor skin infections to life-threatening bacteraemia, it can also be carried asymptomatically. Indigenous individuals in the Southwest USA experience high rates of invasive SA disease. As carriage is the most significant risk factor for disease, understanding the dynamics of SA carriage, and in particular co-carriage of multiple strains, is important to develop strategies to prevent transmission in vulnerable communities. Here, we investigated SA co-carriage and intrahost evolution by sampling several colonies from multiple anatomical sites and whole-genome sequencing (WGS) on 310 SA isolates collected from 60 Indigenous adults participating in a cross-sectional carriage study. We assessed the richness and diversity of SA isolates via differences in multilocus sequence type, core-genome SNPs and genome content. Using WGS data, we identified 95 distinct SA intra-subject lineages (ISLs) among 60 participants; co-carriage was detected in 42% (25/60). Notably, two participants each carried four distinct SA ISLs. Variation in antibiotic resistance determinants among carried strains was identified among 42% (25/60) of participants. Lastly, we found unequal distribution of clonal complex by body site, suggesting that certain lineages may be adapted to specific anatomical sites. Together, these findings suggest that co-carriage may occur more frequently than previously appreciated and further our understanding of SA intrahost diversity during carriage, which has implications for surveillance activities and epidemiological investigations.
... Gene distribution was visualized using the ComplexHeatmap package in R (Gu et al., 2016). The single-copy core gene phylogenetic tree was generated using Roary v3.11.2 (Page et al., 2015) and then visualized with ggtree v3.2.0 (Yu et al., 2018). ...
Article
Full-text available
Background Multidrug-resistant strains of the genus Aeromonas can produce various β-lactamases that confer resistance to a broad spectrum of β-lactams, which poses a significant public health threat due to their emergence and spread in clinical settings and natural environments. Therefore, a comprehensive investigation into the antibiotic resistance mechanisms of Aeromonas is scientifically significant. Methods Between 2018 and 2021, 78 clinical Aeromonas isolates were collected from human clinical specimens. The MicroScan WalkAway system and average nucleotide identity (ANI) analyses were used to classify the bacterial species. Antibiotic susceptibility was determined through the minimum inhibitory concentration (MIC) test via the agar dilution method. To determine the resistance mechanism and the structure of the resistance gene-related sequences, molecular cloning, whole-genome sequencing and bioinformatic analysis were performed. Results Among the 78 Aeromonas isolates studied in this work, obtained from various specimens from different clinical departments, 77 were classified into seven known species by ANI analysis. Most of the isolates were A. caviae (34.6%, 27/78), followed by A. hydrophila (25.6%, 20/78). Multilocus sequence typing (MLST) revealed that they belonged to 72 sequence types (STs), including 52 new STs. A total of 334 resistance genes of 30 antibiotic resistance genotypes were identified from the genomes, more than half (55.99%, 187/334) of which were β-lactamase genes. The isolates showed much higher rates of resistance to penicillins (penicillin G, 98.7%) and first-generation cephalosporins (cefazolin, 96.2%), but lower resistance rates to fourth-generation cephalosporins (cefepime, 6.4%), monobactams (aztreonam, 5.1%), and carbapenems (imipenem, 1.3% and meropenem, 5.1%). Structural analyses of some β-lactamase genes (such as blaNDM-1 and blaPER-3) related sequences revealed that they were generally associated with mobile genetic elements. 
Conclusion The investigation of the correlation between the distribution of β-lactamase genes and Aeromonas resistance phenotypes in this study suggested an urgent need for rigorous monitoring and control to counteract the escalating public health threat posed by the increase in Aeromonas strains harboring extended-spectrum β-lactamase and metallo-β-lactamase genes.
... The summarization of 500 stochastic maps was generated through the make.simmap function as implemented in the R packages phytools v.0.7-70 (Revell, 2012) and ggtree (Yu & al., 2017, 2018; Yu, 2020). Phylogeny figures were plotted using the R package ggtree. ...
Article
Full-text available
Spirotropis (Leguminosae, Papilionoideae) is a Neotropical genus of trees that has long remained circumscribed to just one species, S. longifolia. Evidence from previous molecular phylogenetic analyses of nuclear and plastid loci and morphological features supports expanding its circumscription to encompass species from the polyphyletic Clathrotropis s.l. that are widely distributed in the Amazonian forests. Here, we reassess the evolutionary relationships of Spirotropis and Clathrotropis s.l. based on a new plastome-wide phylogenomic analysis of the genistoid legumes. The evolutionary histories of selected morphological characters were estimated through Bayesian stochastic mapping over a robust phylogeny of the Ormosieae clade in order to investigate synapomorphies that define an expanded concept of Spirotropis. The newly circumscribed genus Spirotropis was recovered as a well-supported monophyletic group comprising five species: the type S. longifolia, three species added from Clathrotropis s.str. (S. nitida, S. paradoxa, S. rosea) and the newly described S. fusca endemic to Venezuela. These species are represented by trees with fragrant flowers and spirally twisted keel petals that grow from tropical lowland rain forests, including seasonally flooded forests, to montane forests. In this study, we present identification keys for the Ormosieae genera and the Spirotropis species, in addition to full taxonomic descriptions, illustrations, maps of geographic distribution, and comments on the morphological distinctiveness, nomenclature, and ecology of each species.
... We plotted all data in R 4.4.1 [20], with the tidyverse [31], ggprism [32], and ggtree suite of packages [33][34][35][36][37]. We arranged final figures in Adobe Illustrator. ...
Preprint
Full-text available
Respiratory syncytial virus (RSV) is one of the main seasonal respiratory pathogens in the United States and causes up to 240,000 hospitalizations per year among children under five and adults over 60. RSV is classified into two subtypes, RSV-A and RSV-B. Although several RSV vaccines and a preventative monoclonal antibody, nirsevimab, were recently approved by the FDA, vaccination and prevention measures remain low even among age groups at higher risk for hospitalization. To better understand the epidemiology of RSV, we analyzed RSV-positive nasopharyngeal swabs from Boston Medical Center and its satellite clinics from January to June 2024 using amplicon-based whole-genome sequencing. Of the 59 samples collected, 19 were from children under five years of age, and 17 were from adults over the age of 60. Fifty-four samples sequenced successfully, with over 90% of the genome at a minimum of 20-fold coverage. We found that over 80% of the samples were RSV-B; 48 RSV-B samples and 6 RSV-A samples. This represents a major switch from 2022, when Boston RSV samples were ~90% RSV-A. 45 of 48 RSV-B samples mapped into a single clade (B.D.E.1). These samples do not cluster to a single source within the B.D.E.1 clade, suggesting that the predominance of RSV-B is multifactorial, not the selective expansion of a single variant. We also found examples of identical/highly related genomes among our samples, suggesting clustered transmission. One infant had documented nirsevimab therapy forty days prior to RSV isolation. None of the adults had documented RSV vaccination, and mutations associated with nirsevimab resistance or vaccine escape were not observed. 
Our work highlights the importance of genomic surveillance for respiratory pathogens as a means to monitor transmission dynamics, such as the unexpected switch from RSV-A to RSV-B subtype dominance, to identify examples of superspreading events, and to understand the epidemiological changes that may be associated with nirsevimab and RSV vaccines.
... Snp-dists software was used to count the number of SNPs between the two described isolates (unpublished) (https://github.com/tseemann/snp-dists). Phylogenetic trees were generated with iqtree and visualized with ggtree [25][26][27][28]. The blast web portal (https://blast.ncbi.nlm.nih. ...
Article
Full-text available
Rationale Hypervirulent Klebsiella pneumoniae (hvKp) infections have principally been identified in Asia. Within a two-month period, two patients between the ages of 30 and 50 years presented to a tertiary referral hospital in Texas with septic shock, hepatic abscess, and septic thrombophlebitis. Blood cultures were positive for Klebsiella pneumoniae (isolates 2020CK-00441 and 2021CK-00720 respectively). The first patient survived after a prolonged hospital course while the second patient expired. Objectives Describe the clinical presentation of these two patients. Perform whole genome sequencing and bioinformatic analysis to evaluate potential outbreak of specific hvKp bacteria isolates. Methods Whole genome sequencing was performed using both paired-end Illumina MiSeq and nanopore sequencing to obtain a completed genome for both isolates. Main results 2020CK-00441 belonged to ST23 type while 2021CK-00720 was a ST65 type isolate. Kleborate analyses predicted with high confidence both isolates were hvKp. Phylogenetic analyses showed the two strains are not closely related to each other nor to any known hvKp isolates reported. Both isolates had yersiniabactin, colibactin, aerobactin and salmochelin producing loci which likely confer these isolates hvKp phenotype. 2020CK-00441 and 2021CK-00720 had a unique pK2044 like plasmid. Conclusions HvKp strains capable of causing devastating metastatic septic infections have emerged in Texas. These isolates are unique compared to other hvKp strains globally. Country-wide surveillance and whole genome sequencing of these strains is essential to prevent a major public health emergency in the USA.
... We considered nodes with an ultrafast bootstrap value (UF-Boot) ≥ 95 and an SH-aLRT statistic ≥ 80 as well-supported. Phylogenetic trees were visualised through ggtree v3.6.2 [111,112] in R v4.3.2 [113]. ...
Article
Full-text available
Background Stress responses are key to the survival of parasites and, consequently, also the evolutionary success of these organisms. Despite this importance, our understanding of the evolution of molecular pathways dealing with environmental stressors in parasitic animals remains limited. Here, we tested the link between adaptive evolution of parasite stress response genes and their ecological diversity and species richness. We comparatively investigated antioxidant, heat shock, osmoregulatory, and behaviour-related genes (foraging) in two model parasitic flatworm lineages with contrasting ecological diversity, Cichlidogyrus and Kapentagyrus (Platyhelminthes: Monopisthocotyla), through whole-genome sequencing of 11 species followed by in silico exon bait capture as well as phylogenetic and codon analyses. Results We assembled the sequences of 48 stress-related genes and report the first foraging (For) gene orthologs in flatworms. We found duplications of heat shock (Hsp) and oxidative stress genes in Cichlidogyrus compared to Kapentagyrus. We also observed positive selection patterns in genes related to mitochondrial protein import (Hsp) and behaviour (For) in species of Cichlidogyrus infecting East African cichlids, a host lineage under adaptive radiation. These patterns are consistent with a potential adaptation linked to a co-radiation of these parasites and their hosts. Additionally, the absence of cytochrome P450 and kappa and sigma-class glutathione S-transferases in monogenean flatworms is reported, genes considered essential for metazoan life. Conclusions This study potentially identifies the first molecular function linked to a flatworm radiation. Furthermore, the observed gene duplications and positive selection indicate the potentially important role of stress responses for the ecological adaptation of parasite species.
... To facilitate tree data management and visualization in R, we have developed a suite of packages, including ggtree, treeio, tidytree, and ggtreeExtra [19][20][21][22][23]. These packages support the analysis and visualization of phylogenetic placement data. ...
Article
Full-text available
In metabarcoding research, such as taxon identification, phylogenetic placement plays a critical role. However, many existing phylogenetic placement methods lack comprehensive features for downstream analysis and visualization. Visualization tools often ignore placement uncertainty, making it difficult to explore and interpret placement data effectively. To overcome these limitations, we introduce a scalable approach using treeio and ggtree for parsing and visualizing phylogenetic placement data. The treeio‐ggtree method supports placement filtration, uncertainty exploration, and customized visualization. It enhances scalability for large analyses by enabling users to extract subtrees from the full reference tree, focusing on specific samples within a clade. Additionally, this approach provides a clearer representation of phylogenetic placement uncertainty by visualizing associated placement information on the final placement tree.
... [50] for maximum-likelihood inference using traditional bootstrapping with 1000 replicates, and the automated amino acid substitution best-fit model estimator '-m MFP' which selected LG+R10 as the best model. We visualized the resulting tree using Ggtree v3.2.1 and Treeio v1.18.1 R packages [51][52][53], rooted at the midpoint, and nodes were ordered in increasing order. ...
Preprint
Full-text available
Marine SAR116 bacterioplankton are ubiquitous in surface waters across global oceans and form their own order, Puniceispirillales, within the Alphaproteobacteria. To date no comparative physiology among diverse SAR116 isolates has been performed to capture the functional diversity within the clade, and further, diversity through the lens of metabolic potential and environmental preferences via clade-wide pangenomics remains poorly constrained. Using high-throughput dilution-to-extinction cultivation, we isolated and genome sequenced five new and diverse SAR116 isolates from the northern Gulf of Mexico. Here we present a comparative physiological analysis of these SAR116 isolates, along with a pangenomic investigation of the SAR116 clade using a combination of metagenome-assembled genomes (MAGs, n=258), single-amplified genomes (SAGs, n=84), previously existing (n=2), and new isolate genomes (n=5), totaling 349 SAR116 genomes. Phylogenomic investigation supported the division of SAR116 into three distinct subclades, each with additional structure totalling 15 monophyletic groups. Our SAR116 isolates belonged to three groups within subclade I representing distinct genera with different morphologies and varied phenotypic responses to salinity and temperature. Overall, SAR116 genomes encoded differences in vitamin and amino acid synthesis, trace metal transport, and osmolyte synthesis and transport. They also had genetic potential for diverse sulfur oxidation metabolisms, placing SAR116 at the confluence of the organic and inorganic sulfur pools. SAR116 subclades showed distinct patterns in habitat preferences across open ocean, coastal, and estuarine environments, and three of our isolates represented the most abundant coastal and estuarine subclade. This investigation provides the most comprehensive exploration of SAR116 to date anchored by new culture genomes and physiology.
... The fit of each model was compared to a null model of constant rates using Bayes factors from a stepping-stone sampler with one thousand steps each run for fifty thousand generations. In addition to calculating Bayes factors, we identified the location, magnitude and support for rate shifts, and visualized branch-specific rates across the phylogeny for the best-supported models, using the R packages BTprocessR (Ferguson-Gow 2020) and ggtree (Yu et al. 2018). ...
Preprint
Full-text available
An ongoing challenge in macroevolutionary research is identifying common drivers of diversification amid the complex interplay of many potentially relevant traits, ecological contexts, and intrinsic characteristics of clades. In this study, we used geometric morphometric and phylogenetic comparative methods to evaluate the tempo and mode of trait evolution in the adaptive radiation of Malagasy vangas and their mainland relatives. The Malagasy radiation is more diverse in both skull and foot shape. However, rather than following the classic “early burst” of diversification, trait evolution accelerated well after their arrival in Madagascar, likely driven by the evolution of new modes of foraging and especially of a few species with highly divergent morphologies. Each anatomical region showed differing evolutionary patterns, and the presence of morphological outliers impacted the results of some analyses, particularly of integration and modularity. Our results demonstrate that the adaptive radiation of Malagasy vangas has evolved exceptional ecomorphological diversity along multiple, independent trait axes, mainly driven by a late expansion in niche space due to key innovations. Our findings highlight the evolution of extreme forms as an overlooked feature of adaptive radiation warranting further study.
... The up-to-date bacterial core gene set (UBCG) consisting of 92 core genes, was used to reconstruct a whole-genome-based phylogenetic tree as described by Na et al. (2018) after the collection of all the whole-genome sequences of closely related strains from NCBI GenBank. All phylogenetic trees were visualized using ggtree in the R platform (Yu et al. 2018). ...
Article
Full-text available
A thermophilic cellulase-producing bacterium, strain HSW-8T, isolated from hot spring waters in South Korea, was subjected to a taxonomic analysis. Cells of strain HSW-8T were gram-stain-negative, facultatively anaerobic, rod-shaped, with optimum growth at 45 °C, pH 7.0, in the presence of 0% (w/v) NaCl. Strain HSW-8T showed the highest 16S rRNA gene sequence similarity to Sinimarinibacterium flocculans NH6-24T (97.52%), followed by Fontimonas thermophila DSM 23609T (96.97%), Solimonas flava CW-KD 4T (95.24%), and Solimonas variicoloris DSM 15731T (95.18%). Based on 16S rRNA phylogeny, strain HSW-8T is phylogenetically closely related to Fontimonas thermophila DSM 23609T and Sinimarinibacterium flocculans DSM 104150T and could be distinguished from the type species based on their phenotypic properties. The genome length of strain HSW-8T was 3.32 Mbp with a 67.33% G + C content. The average nucleotide identity and digital DNA–DNA hybridization values between strain HSW-8T and its closely related type strains were 75.4–83.2 and 20.2–26.2%, respectively. Summed feature 8 (C18:1ω7c and/or C18:1ω6c), C16:0, and iso-C16:0 were identified as the major fatty acids (> 10%). Phosphatidylglycerol and phosphatidylethanolamine were demonstrated as the major polar lipids while the respiratory quinone is ubiquinone-8. Strain HSW-8T exhibited multiple adaptations for survival at high temperatures, including diverse potential motility mechanisms and toxin-antitoxin systems, as evidenced by both phenotypic characteristics and genomic analysis. Based on genotypic and phenotypic features, strain HSW-8T (= KCTC 92765T = GDMCC 1.4313T) represents a novel Sinimarinibacterium species, for which the name Sinimarinibacterium thermocellulolyticum sp. nov. is proposed.
... To interpret the results of the regression analyses, we plotted the genealogical and functional trees of passerines (Yu et al., 2017, 2018; Yu, 2020) to observe the distribution of cuckoo hosts on the tree map. To verify whether parasitism by cuckoos is phylogenetically conserved, for instance, whether cuckoos prefer to parasitize birds with similar lineages on a certain evolutionary branch, we computed the phylogenetic signal D in a binary trait (Fritz et al., 2010) using the function phylo.d in R package caper and tested whether D differed significantly from random phylogenetic pattern (D = 1) and Brownian phylogenetic pattern (D = 0) (Orme et al., 2018). ...
Article
Full-text available
In the Anthropocene, monitoring and assessing biodiversity and taking conservation measures due to declining biodiversity is an urgent task. However, resource and time constraints make it unfeasible for biodiversity surveys to cover all taxonomic groups; hence, finding alternative shortcuts, such as biodiversity indicators, is necessary. We compared the effectiveness of obligate brood parasitic cuckoos as biological indicators with raptors, a previously recognized indicator species, across ecogeographic regions of China based on different components of biodiversity (e.g. taxonomic, phylogenetic, and functional diversity). The results showed that the number of cuckoo species had significant positive correlations with taxonomic diversity (TD), phylogenetic diversity (PD), and functional richness (FRic) of passerines but no significant correlation with evolutionary distinctness (ED) or evolutionary distinct and globally endangered (EDGE). In contrast, raptors showed a significant positive correlation only with EDGE. The greater number of factors associated with cuckoos suggests that they exhibit superior performance as biodiversity indicators compared to raptors. However, this does not undermine the significance of raptors as indicators. Selecting the indicator is context-dependent, with cuckoos being suitable for routine surveys, monitoring overall avian diversity, and raptors being suitable for assessing the status of endangered birds, conducting conservation measures, and measuring their effectiveness. We anticipate cuckoos to gain more public attention as a paradigm of biodiversity indicators and be put into practice.
... An ML phylogeny with 1,000 bootstraps was constructed using RAxML v8.1.20 in a GTR + GAMMA model [52] with P. breweriana serving as the outgroup. Finally, we visualized the phylogenetic tree using R package ggtree v3.6.2 [53]. ...
Article
Full-text available
Background The visual similarities observed across various plant groups often conceal underlying genetic distinctions. This occurrence, known as cryptic diversity, underscores the key importance of identifying and understanding cryptic intraspecific evolutionary lineages in evolutionary ecology and conservation biology. Results In this study, we conducted transcriptome analysis of 81 individuals from 18 natural populations of a northern lineage of Picea brachytyla sensu stricto that is endemic to the Qinghai-Tibet Plateau. Our analysis revealed the presence of two distinct local lineages, emerging approximately 444.8 thousand years ago (kya), within this endangered species. The divergence event aligns well with the geographic and climatic oscillations that occurred across the distributional range during the Mid-Pleistocene epoch. Additionally, we identified numerous environmentally correlated gene variants, as well as many other genes showing signals of positive selection across the genome. These factors likely contributed to the persistence and adaptation of the two distinct local lineages. Conclusions Our findings shed light on the highly dynamic evolutionary processes underlying the remarkably similar phenotypes of the two lineages of this endangered species. Importantly, these results enhance our understanding of the evolutionary past for this and for other endangered species with similar histories, and also provide guidance for the development of conservation plans.
... ML analyses were performed using IQTree v.1.5.5, with automatic model selection by the software (Nguyen et al. 2015). Visualization of phylogenetic trees was accomplished using ggtree (Yu et al. 2018) and iTOL (Letunic and Bork 2019). ...
Article
Full-text available
Understanding metabolic plasticity of animal evolution is a fundamental challenge in evolutionary biology. Owing to the diversification of insect wing morphology and dynamic energy requirements, the molecular adaptation mechanisms underlying the metabolic pathways in wing evolution remain largely unknown. This study reveals the pivotal role of the duplicated Apolipoprotein D (ApoD) gene in lipid and energy homeostasis in the lepidopteran wing. ApoD underwent significant expansion in insects, with gene duplication and consistent retention observed in Lepidoptera. Notably, duplicated ApoD2 was highly expressed in lepidopteran wings and encoded a unique C-terminal tail, conferring distinct ligand-binding properties. Using Bombyx mori as a model organism, we integrated evolutionary analysis, multiomics, and in vivo functional experiments to elucidate the way duplicated ApoD2 mediates lipid trafficking and homeostasis via the AMP-activated protein kinase pathway in wings. Moreover, we revealed the specific expression and functional divergence of duplicated ApoD as a key mechanism regulating lipid homeostasis in the lepidopteran wing. These findings highlight an evolutionary scenario in which neofunctionalization conferred a novel role of ApoD in shaping adaptive lipid metabolic regulatory networks during wing phenotypic evolution. Overall, we provide in vivo evidence for the functional differentiation of duplicate genes in shaping adaptive metabolic regulatory networks during phenotypic evolution.
... Originally, an NJ tree based on the distance matrix was constructed using the PHYLIP package [17] and visualized with the ggtree package [18]. After that, we pruned the SNPs in high levels of pair-wise LD using PLINK v1.90 [16] with the parameter (-indep-pair-wise 50 10 0.2) to perform PCA and ADMIXTURE analysis. ...
Article
Full-text available
Zhaotong pig (ZTP) is a Chinese indigenous pig breed in Yunnan Province, known for its unique body shape and appearance, good meat quality, strong foraging ability, and adaptability. However, there is still a lack of research on its genome. In order to investigate the genetic diversity, population structure, and selection signatures of the breed, we conducted a comprehensive analysis by resequencing on 30 ZTPs and comparing them with genomic data from 10 Asian wild boars (AWBs). A total of 45,514,452 autosomal SNPs were detected in the 40 pigs, and 23,649,650 SNPs were retained for further analysis after filtering. The HE, HO, PN, MAF, π, and Fis values were calculated to evaluate the genetic diversity, and the results showed that ZTPs had higher genetic diversity and lower inbreeding coefficient compared with AWBs. Population structure was analyzed using NJ tree, PCA, ADMIXTURE, and LD methods. It was found that ZTPs were population independent of AWBs and had a lower LD decay compared to AWBs. Moreover, the results of the IBS genetic distance and G matrix showed that most of the individuals had large genetic distances and distant genetic relationships in ZTPs. Selection signatures were detected between ZTPs and AWBs by using two methods, FST and π ratio. Totals of 1104 selected regions and 275 candidate genes were identified. Finally, functional enrichment analysis identified some annotated genes that might affect fat deposition (NPY1R, NPY5R, and NMU), reproduction (COL3A1, COL5A2, GLRB, TAC3, and MAP3K12), growth (STAT6 and SQOR), tooth development (AMBN, ENAM, and ODAM), and immune response (MBL2, IL1A, and DNAJA3). Our results will provide a valuable basis for the future effective protection, breeding, and utilization of ZTPs.
... Trees were midpoint rooted using the R package phytools v.0.7 [28] and rendered with increasing node order using the R package ape v.5.4 [29]. Tree visualisations were created using R packages ggplot2 [22] and ggtree [30,31]. ...
Article
Full-text available
Background Human respiratory syncytial virus (HRSV) is worldwide one of the leading causes of acute respiratory tract infections in young children and the elderly population. Two distinct subtypes of HRSV (A and B) and a multitude of genotypes have been described. The laboratory of Clinical and Epidemiological Virology (KU Leuven/University Hospitals Leuven) has a long-standing history of HRSV surveillance in Belgium. Methods In this study, the seasonal circulation of HRSV in Belgium was monitored during 8 consecutive seasons prior to the SARS-CoV-2 pandemic (2011–2012 until 2018–2019). By use of a multiplex quantitative real time PCR panel, 27,386 respiratory samples were tested for HRSV. Further subtyping and sequencing of the HRSV positive samples was performed by PCR and Sanger sequencing. The prevalence and positivity rate were estimated in 4 distinct age groups and the circulating strains of each subtype were situated in a global context and in reference to the described genotypes in literature. Results HRSV circulated in Belgium in a yearly re-occurring pattern during the winter months and both HRSV subtypes co-circulated simultaneously. All HRSV-B strains contained the 60 nt duplication in the HVR2 region of the G gene. Strains of subtype HRSV-A with a 72 nt duplication in the HVR2 region were first observed during the 2011–2012 season and replaced all other circulating strains from 2014 to 2015 onwards.
... For large hawk-cuckoo call type 2, we only took the acoustic parameters of element 2 into hierarchical clustering analysis, because some blackbird individuals only mimicked this element. Dendrograms were drawn using ggtree function (packages 'ggplot2', 'ape', 'ggtree', 'treeio', and 'ggdendro') in R [31][32][33][34]. Shapiro-Wilk tests and Levene's tests were then conducted for each acoustic parameter. ...
Article
Full-text available
Some oscine passerines incorporate heterospecific sounds into their repertoires, including vocalizations of other bird species, sounds of other fauna, and even anthropogenic sounds, through vocal mimicry. However, few studies have investigated whether mimics learn heterospecific sounds from model species or from conspecific tutors. Here, we investigate mimicry acquisition using innovation in Cuculidae calls imitated by the Chinese blackbird (Turdus mandarinus). If the mimicry innovation arises and spreads among several neighbors and is not produced by model species, the mimicry must be acquired partially from conspecifics. We found that: (1) Cuculidae calls imitated by blackbirds were reasonably accurate, but with some differences between mimetic and real calls in acoustic structures. (2) We identified four unique mimetic units (mimicry innovation or copy errors), and these units only occurred at certain sites and were shared by several neighbors. In aggregate, frequency parameters (the first principal component) of unique mimetic units were higher than usual mimetic units (p < 0.001). Our findings provide further evidence that mimetic units can be partially learnt from conspecifics based on four cases of unique mimetic units. Our study and approach provide a reference and theoretical basis for the future understanding of social learning and development of vocal mimicry.
... We analyzed the concurrent amino acid mutation motifs existing in each clinical sequence of a single cluster, however distinct from parental strains, aiming to characterize the specific molecular marker for clinical vaccine-homogeneous strains using R v4.1.3 (ggtree and ggmsa packages) 65. Specifically, concurrent amino acid mutation sites were defined following the criteria: over half of the clinical sequences showed identical mutation with vaccine strain but distinct from vaccine-derived parental strain. ...
Article
Full-text available
Despite a rapid expansion of Porcine reproductive and respiratory syndrome virus (PRRSV) sublineage 8.7 over recent years, very little is known about the patterns of virus evolution, dispersal, and the factors influencing this dispersal. Relying on a national PRRSV surveillance project established over 20 years ago, we expand the available genomic data of sublineage 8.7 from China. We perform independent interlineage and intralineage recombination analyses for the entire study period, which showed a heterogeneous recombination pattern. A series of Bayesian phylogeographic analyses uncover the role of Guangdong as an important infection hub within Asia. The spatial spread of PRRSV is highly linked with a composite of human activities and the heterogeneous provincial distribution of the swine industry, largely propelled by the smaller-scale Chinese rural farming systems in the past years. We sequence all four available modified live vaccines (MLVs) and perform genomic analyses with publicly available data, of which our results suggest a key “leaky” period spanning 2011–2017 with two concurrent amino acid mutations in ORF1a 957 and ORF2 250. Overall, our study provides an in-depth overview of the evolution, transmission dynamics, and potential leaky status of HP-PRRS MLVs, providing critical insights into new MLV development.
... A phylogenetic tree based on the amino acid sequences of the large terminase subunit of various phages was constructed with MEGA 11 (Tamura et al., 2021), using the maximum likelihood method with 1000 bootstrap replicates. The resulting trees were plotted, annotated, and visualized with ggtree (Yu et al., 2018). Domain prediction of phage genes was performed using PfamScan (Madeira et al., 2022). ...
Article
Full-text available
Klebsiella pneumoniae is a common, conditionally pathogenic bacterium that often has a multidrug-resistant phenotype, leading to failure of antibiotic therapies. It can therefore induce serious diseases, including community-acquired pneumonia and bloodstream infections. As an emerging alternative to antibiotics, phages are considered key to solving the problem of drug-resistant bacterial infections. Here, we report a novel phage, pK3–24, that mainly targets ST447 K. pneumoniae. Phage pK3–24 is a T7-like short-tailed phage with a fast adsorption capacity that forms translucent plaques with halos on bacterial lawns. The optimal multiplicity of infection (MOI) is 0.01, and the average burst size is 50 PFU/mL. Phage pK3–24 shows environmental stability, surviving at below 50 °C and at pH values of 6–10. It has a double-stranded DNA genome of 40,327 bp and carries no antibiotic-resistance, virulence, or lysogeny genes. Phylogenetic analysis assigned phage pK3–24 to the genus Przondovirus as a new species. Phage pK3–24 inhibited the production of biofilm. Moreover, treatment with pK3–24 at doses with an MOI > 1 effectively reduced the mortality of Galleria mellonella larvae infected with ST447 K. pneumoniae.
... The bootstrap method (1000 repetitions) was used to check the support rate of each branch 50. The R package ggtree was used for visualization of evolutionary trees 51. ...
Article
Full-text available
To determine the role of internal transcribed spacer 2 (ITS2) in the identification of Spatholobus suberectus and explore the genetic diversity of S. suberectus. A total of 292 ITS2s from S. suberectus and 17 other plant species were analysed. S. suberectus was clustered separately in the phylogenetic tree. The genetic distance between species was greater than that within S. suberectus. Synonymous substitution rate (Ks) analysis revealed that ITS2 diverged the most recently within S. suberectus (Ks = 0.0022). These findings suggested that ITS2 is suitable for the identification of S. suberectus. The ITS2s were divided into 8 haplotypes and 4 evolutionary branches on the basis of secondary structure, indicating that there was variation within S. suberectus. Evolutionary analysis revealed that the GC content of paired regions (pGC) was greater than that of unpaired regions (upGC), and the pGC showed a decreasing trend, whereas the upGC remained unchanged. Single-base mutation was the main cause of base pair substitution. In both the initial state and the equilibrium state, the substitution rate of GC was higher than that of AU. The increase in the GC content was partly attributed to GC-biased gene conversion (gBGC). High GC content reflected the high recombination and mutation rates of ITS2, which is the basis for species identification and genetic diversity. We characterized the sequence and structural characteristics of S. suberectus ITS2 in detail, providing a reference and basis for the identification of S. suberectus and its products, as well as the protection and utilization of wild resources.
... For instance, the creation of clusterProfiler initially aimed to compare enrichment analysis results from various cell cycle proteomics data. 53,54 At that time, no tools supported the aggregation and comparison of results under multiple conditions. Similarly, the development of ggtree aimed to integrate and visualize phylogenetic trees and related data for joint presentation from an evolutionary perspective. ...
Article
Full-text available
The bioinformatics software for analyzing biomedical data is essential for converting raw data into meaningful biological insights. In this review, we outline the key stages and considerations in the development of bioinformatics software, using clusterProfiler and CIRCexplorer2 as illustrative examples. Furthermore, we examine some established large-scale life sciences platforms and summarize the design principles in the era of big data and Artificial Intelligence (AI) for open science. Future large-scale platforms are expected to offer graphical programming languages and transition from the sharing of data and codes to that of physical resources. The AI revolution will alter the landscape of bioinformatics software development and redefine the research paradigm of life sciences.
... To analyze ortholog groups among Asteraceae genomes, we first constructed a distance-based phylogenetic tree using JolyTree with default parameters (29). The tree was saved in Newick format and visualized with the ggtree R package (30). We selected genomes from 43 Asteraceae species to establish a robust pan-genome based on two criteria: (i) BUSCO completeness scores > 80% (31) and (ii) the inclusion of RNA-seq data for gene structure prediction. ...
Article
Full-text available
As the largest family of dicotyledon, the Asteraceae family comprises a variety of economically important crops, ornamental plants and numerous medicinal herbs. Advancements in genomics and transcriptomic have revolutionized research in Asteraceae species, generating extensive omics data that necessitate an efficient platform for data integration and analysis. However, existing databases face challenges in mining genes with specific functions and supporting cross-species studies. To address these gaps, we introduce the Asteraceae Multi-omics Information Resource (AMIR; https://yanglab.hzau.edu.cn/AMIR/), a multi-omics hub for the Asteraceae plant community. AMIR integrates diverse omics data from 74 species, encompassing 132 genomes, 4 408 432 genes annotated across seven different perspectives, 3897 transcriptome sequencing samples spanning 131 organs, tissues and stimuli, 42 765 290 unique variants and 15 662 metabolites genes. Leveraging these data, AMIR establishes the first pan-genome, comparative genomics and transcriptome system for the Asteraceae family. Furthermore, AMIR offers user-friendly tools designed to facilitate extensive customized bioinformatics analyses. Two case studies demonstrate AMIR’s capability to provide rapid, reproducible and reliable analysis results. In summary, by integrating multi-omics data of Asteraceae species and developing powerful analytical tools, AMIR significantly advances functional genomics research and contributes to breeding practices of Asteraceae.
... with 10% burn-in removal. Finally, the resulting MCMC phylogenetic tree was visualized using the ggtree R package [100,101]. ...
Article
Full-text available
Citation: Vargas-Bermudez, D.S.; Prandi, B.A.; Souza, U.J.B.d.; Durães-Carvalho, R.; Mogollón, J.D.; Campos, F.S.; Roehe, P.M.; Jaime, J. Molecular Epidemiology and Phyloevolutionary Analysis of Porcine Parvoviruses (PPV1 through PPV7) Detected in Replacement Gilts from Colombia. Int. J. Mol. Sci. 2024, 25, 10354. https://doi. Abstract: Eight porcine parvovirus (PPV) species, designated as PPV1 through PPV8, have been identified in swine. Despite their similarities, knowledge about their distribution and genetic differences remains limited, resulting in a gap in the genetic classification of these viruses. In this study, we conducted a comprehensive analysis using PPV1 to PPV7 genome sequences from Colombia and others available in the GenBank database to propose a classification scheme for all PPVs. Sera from 234 gilts aged 180 to 200 days were collected from 40 herds in Colombia. Individual detection of each PPV (PPV1 through PPV7) was performed using end-point PCR. Complete nucleotide (nt) sequencing was performed on the PPV1 viral protein (VP), and near-complete genome (NCG) sequencing was carried out for novel porcine parvoviruses (nPPVs) (PPV2 through PPV7). Phylogenetic analyses were conducted by comparing PPV1-VP sequences to 94 available sequences and nPPVs with 565 NCG, 846 nPPV-VP, and 667 nPPV-nonstructural protein (NS) sequences. Bayesian phylogenetic analysis was used to estimate substitution rates and the time to the most recent common ancestor for each PPV. The highest prevalence was detected for PPV3 (40.1%), followed by PPV5 (20.5%), PPV6 (17%), PPV1 (14.5%), PPV2 (9.8%), PPV4 (4.2%), and PPV7 (1.3%). Notably, all tested sera were negative for PPV8 genomes. An analysis of the PPV1-VP sequences revealed two main clades (PPV1-I and PPV1-II), with the sequences recovered in this study grouped in the PPV1-II clade. 
Comparative analysis showed significant genetic distances for PPV2 to PPV7 at the NCG (>6.5%), NS (>6.3%), and VP (>7.5%) regions, particularly when compared to equivalent regions of PPV genomes recovered worldwide. This study highlights the endemic circulation of nPPVs in Colombian pig herds, specifically among gilts. Additionally, it contributes to the phylogenetic classification and evolutionary studies of these viruses. The proposed method aims to categorize and divide subtypes based on current knowledge and the genomes available in databanks.
... The ggtree package in R 4.1.0 was used for visualization and annotation of ML trees (Yu et al. 2018). ...
Article
Full-text available
The classification of severe fever with thrombocytopenia syndrome virus (SFTSV) lacked consistency due to limited virus sequences used across previous studies, and the origin and transmission dynamics of the SFTSV remains not fully understood. In this study, we analyzed the diversity and phylodynamics of SFTSV using the most comprehensive and largest dataset publicly available for a better understanding of SFTSV classification and transmission. A total of 1267 L segments, 1289 M segments, and 1438 S segments collected from China, South Korea, and Japan were included in this study. Maximum likelihood trees were reconstructed to classify the lineages. Discrete phylogeographic analysis was conducted to infer the phylodynamics of SFTSV. We found that the L, M, and S segments were highly conserved, with mean pairwise nucleotide distances of 2.80, 3.36, and 3.35% and could be separated into 16, 13, and 15 lineages, respectively. The evolutionary rate for L, M, and the S segment was 0.61 × 10−4 (95% HPD: 0.48–0.73 × 10−4), 1.31 × 10−4 (95% HPD: 0.77–1.77 × 10−4) and 1.27 × 10−4 (95% HPD: 0.65–1.85 × 10−4) subs/site/year. The SFTSV most likely originated from South Korea around the year of 1617.6 (95% HPD: 1513.1–1724.3), 1700.4 (95% HPD: 1493.7–1814.0), and 1790.1 (95% HPD: 1605.4–1887.2) for L, M, and S segments, respectively. Hubei Province in China played a critical role in the geographical expansion of the SFTSV. The effective population size of SFTSV peaked around 2010 to 2013. We also identified several codons under positive selection in the RdRp, Gn–Gc, and NS genes. By leveraging the largest dataset of SFTSV, our analysis could provide new insights into the evolution and dispersal of SFTSV, which may be beneficial for the control and prevention of severe fever with thrombocytopenia syndrome.
... The R package ggtree (version 3.2.1) and universalmotif (version 1.12.4) were used for tree and motif visualization (Yu et al., 2018). ...
Article
Full-text available
Ethylamine (EA), the precursor of theanine biosynthesis, is synthesized from alanine decarboxylation by alanine decarboxylase (AlaDC) in tea plants. AlaDC evolves from serine decarboxylase (SerDC) through neofunctionalization and has lower catalytic activity. However, lacking structure information hinders the understanding of the evolution of substrate specificity and catalytic activity. In this study, we solved the X-ray crystal structures of AlaDC from Camellia sinensis (CsAlaDC) and SerDC from Arabidopsis thaliana (AtSerDC). Tyr³⁴¹ of AtSerDC or the corresponding Tyr³³⁶ of CsAlaDC is essential for their enzymatic activity. Tyr¹¹¹ of AtSerDC and the corresponding Phe¹⁰⁶ of CsAlaDC determine their substrate specificity. Both CsAlaDC and AtSerDC have a distinctive zinc finger and have not been identified in any other Group II PLP-dependent amino acid decarboxylases. Based on the structural comparisons, we conducted a mutation screen of CsAlaDC. The results indicated that the mutation of L110F or P114A in the CsAlaDC dimerization interface significantly improved the catalytic activity by 110% and 59%, respectively. Combining a double mutant of CsAlaDC L110F/P114A with theanine synthetase increased theanine production 672% in an in vitro system. This study provides the structural basis for the substrate selectivity and catalytic activity of CsAlaDC and AtSerDC and provides a route to more efficient biosynthesis of theanine.
... We separately analyzed an alignment of the mitochondrial genome (Fig. S5). We visualized results using the R packages ape, tidyverse, ggtree and associated packages (Paradis & Schliep, 2019; Slowikowski et al., 2024; Vu et al., 2024; Wang et al., 2020; Wickham et al., 2019; Wilke, 2024a, 2024b; Yu et al., 2018). ...
Article
Across the tree of life, species have repeatedly evolved similar phenotypes. While well-studied for ecological traits, there is also evidence for recurrent evolution of sexually selected traits. Swordtail fish (Xiphophorus) are a classic model system for studying sexual selection, and female Xiphophorus exhibit strong mate preferences for large male body size and a range of sexually dimorphic ornaments. Interestingly, sexually selected traits have also been lost multiple times in the genus. However, there has been uncertainty over the number of losses of ornamentation and large body size because phylogenetic relationships between species in this group have historically been controversial, partially due to prevalent gene flow. Here, we use whole-genome sequencing approaches to re-examine phylogenetic relationships within a Xiphophorus clade that varies in the presence and absence of sexually selected traits. Using wild-caught individuals, we determine the phylogenetic placement of a small, unornamented species, X. continens, confirming an additional loss of ornamentation and large body size in the clade. With these revised phylogenetic relationships, we analyze evidence for coevolution between body size and other sexually selected traits using phylogenetic comparative methods. These results provide insights into the evolutionary pressures driving the recurrent loss of suites of sexually selected traits.
... Core genome alignments were used in the phylogenetic analysis to focus on the impacts of genomic variation at the species or clade level. Phylogenetic tree visualization and annotation was performed using R v4.2.2 and the ggtree package v3.8.0 [57,58]. Clustering based on phylogenetic distance was performed with TreeCluster v1.0.3 using the length-based clustering method with a threshold branch length of 0.002 [59]. ...
Article
Full-text available
Background Lacticaseibacillus (formerly Lactobacillus) rhamnosus is widely used in probiotics or food supplements to promote microbiome health and may also be part of the normal microbiota of the human gastrointestinal tract. However, it rarely also causes invasive or severe infections in patients. It has been postulated that these infections may originate from probiotics or from endogenous commensal reservoirs. In this report, we examine the population structure of Lacticaseibacillus rhamnosus and investigate the utility of using bacterial genomics to identify the source of invasive Lacticaseibacillus infections. Methods Core genome phylogenetic analysis was performed on 602 L. rhamnosus genome sequences from the National Center for Biotechnology public database. This information was then used along with newly generated sequences of L. rhamnosus isolates from yogurt to investigate a fatal case of L. rhamnosus endocarditis. Results Phylogenetic analysis demonstrated substantial genetic overlap of L. rhamnosus isolates cultured from food, probiotics, infected patients, and colonized individuals. This was applied to a patient who had both consumed yogurt and developed L. rhamnosus endocarditis to attempt to identify the source of his infection. The sequence of the isolate from the patient’s bloodstream differed at only one nucleotide position from one of the yogurt isolates. Both isolates belonged to a clade, identified here as clade YC, composed of mostly gastrointestinal isolates from healthy individuals, some of which also differed by only a single nucleotide change from the patient’s isolate. Conclusions As illustrated by this case, whole genome sequencing may be insufficient to reliably determine the source of invasive infections caused by L. rhamnosus.
... The versions of the scripts (including R scripts) used can be found on Zenodo (The DOI will be provided in the peer-reviewed publication version). For R, the following packages were (Wickham, 2016), ggtree v3.6.2 (Yu, 2020, 2022; Yu et al., 2017, 2018), grid v4.2.2 (R Core Team, 2022), gridExtra v2.3 (Auguie, 2017), phytools v1.9-16 (Revell, 2012), plyr v1.8.8 (Wickham, 2011), plotly v4.10.1 (Sievert, 2020), readxl v1.4.2 (Wickham and Bryan, 2023), Rsamtools v2.14.0 (Morgan et al., 2022), reshape2 v1.4.4 (Wickham, 2007), svglite v2.1.1 (Wickham et al., 2023b), tidytree v0.4.2 (Yu, 2022), treeio v1.22.0 (Yu, 2022), viridisLite v0.4.1 (Garnier et al., 2022), writexl v1.4.2 (Ooms, 2023). ...
Preprint
Full-text available
Virus discovery in mass-reared insects is a growing topic of interest due to outbreak risks and for insect welfare concerns. In the case of black soldier flies (BSF), pioneering bioinformatic studies have uncovered exogenous viruses from the orders Ghabrivirales and Bunyavirales, as well as endogenous viral elements from five virus families. This prompted further virome investigation of BSF metagenomes and metatranscriptomes, including from BSF individuals displaying signs and symptoms of disease. In this study, we describe five newly discovered viruses from the families Dicistroviridae, Iflaviridae, Rhabdoviridae, Solinviviridae, and Inseviridae. These viruses were detected in BSF from multiple origins, outlining a diversity of naturally occurring viruses associated with BSF. This viral community may also include BSF pathogens. The growing list of viruses found in BSF allowed the development of molecular detection tools which could be used for viral surveillance, both in mass-reared and wild populations of BSF.
... Phylogenetic analyses based on the five conserved genes were conducted respectively by using the maximum likelihood method in IQ-Tree v2.2.0.3. All phylogenetic trees were visualized using the ggtree package in R software [32]. purple, proliferating significantly and causing cellular pathology, eventually releasing from the cytoplasm (Figure 1B). ...
Article
Full-text available
The continual emergence of tick-borne rickettsioses has garnered widespread global attention. Candidatus Rickettsia barbariae (Candidatus R. barbariae), which emerged in Italy in 2008, has been detected in humans from northwestern China. However, the lack of Candidatus R. barbariae genome and isolated strains limits the understanding of its biological characteristics and genomic features. Here, we isolated the Rickettsia for the first time from eggs of Rhipicephalus turanicus in northwestern China, and assembled its whole genome after next-generation sequencing, so we modified the proposed name to Rickettsia barbariae (R. barbariae) to conform to the International Code of Nomenclature of Prokaryotes. Phylogenetic analysis based on the whole genome revealed that it was most closely related to the pathogenic Rickettsia parkeri and Rickettsia africae. All virulence factors, present in the pathogenic spotted fever group rickettsiae, were identified in the R. barbariae isolate. These findings highlight the pathogenic potential of R. barbariae and the necessity for enhanced surveillance of the emerging Rickettsia in the human population.
... The reliability was evaluated by 1000 bootstrap replications. Thereafter, visualization and beautification of the tree files were performed on the R-project platform using the "ggplot2", "treeio" and "ggtree" packages [60,61]. ...
Article
Full-text available
Background SET domain-containing histone lysine methyltransferases (HKMTs) and JmjC domain-containing histone demethylases (JHDMs) are essential for maintaining dynamic changes in histone methylation across parasite development and infection. However, information on the HKMTs and JHDMs in human pathogenic piroplasms, such as Babesia duncani and Babesia microti, and in veterinary important pathogens, including Babesia bigemina, Babesia bovis, Theileria annulata and Theileria parva, is limited. Results A total of 38 putative KMTs and eight JHDMs were identified using a comparative genomics approach. Phylogenetic analysis revealed that the putative KMTs can be divided into eight subgroups, while the JHDMs belong to the JARID subfamily, except for BdJmjC1 (BdWA1_000016) and TpJmjC1 (Tp Muguga_02g00471) which cluster with JmjC domain only subfamily members. The motifs of SET and JmjC domains are highly conserved among piroplasm species. Interspecies collinearity analysis provided insight into the evolutionary duplication events of some SET domain and JmjC domain gene families. Moreover, relative gene expression analysis by RT‒qPCR demonstrated that the putative KMT and JHDM gene families were differentially expressed in different intraerythrocytic developmental stages of B. duncani, suggesting their role in Apicomplexa parasite development. Conclusions Our study provides a theoretical foundation and guidance for understanding the basic characteristics of several important piroplasm KMT and JHDM families and their biological roles in parasite differentiation.
Article
Somatic structural variations (SVs) represent a critical category of genomic mutations in hepatocellular carcinoma (HCC). However, the accurate identification of somatic SVs using short-read high-throughput sequencing (HTS) is challenging. Here, we applied long-read nanopore sequencing and multisite sampling in a cohort of 42 samples from five patients. We discovered a prominent presence of somatic SVs in adjacent nontumor tissues, which significantly differed from somatic single nucleotide variants (SNVs) and copy number variations (CNVs). The types of SVs were markedly different between adjacent nontumor and tumor tissues, with somatic insertions (INSs) and deletions (DELs) serving as early genomic alterations associated with HCC. Notably, hepatitis B virus (HBV) DNA integration frequently resulted in the generation of somatic SVs, particularly inducing interchromosomal translocations. While HBV DNA integration into the liver genome occurs randomly, multisite shared HBV-induced SVs are implicated as early driving events in the pathogenesis of HCC. Long-read RNA sequencing revealed that some HBV-induced SVs impact cancer-associated genes, with translocations being capable of inducing the formation of fusion genes. These findings enhance our understanding of somatic SVs in HCC and their role in early tumorigenesis.
Article
Full-text available
Background The emergence of tick-borne pathogens poses a serious threat to both human and animal health. There remains controversy about virome diversity in relation to tick genus and ecogeographical factors. Results We conducted the meta‑transcriptomic sequencing of 155 pools of ticks encompassing 7 species of 3 genera collected from diverse geographical fauna of Ningxia Province, China. Two species of Dermacentor genus were distributed in the predominantly grassland areas of the central and eastern regions, with the lowest viral diversity. Two species of Hyalomma ticks were found in the predominantly desert areas of the northern regions, with intermediate viral diversity. Three species of Haemaphysalis ticks were concentrated in the predominantly forested areas of the southern regions, exhibiting the highest viral diversity. We assembled 348 viral genomes of 63 species in 14 orders, including 26 novel viruses. The identified viruses were clearly specific to tick genus: 22 virus species were exclusive to Dermacentor, 12 to Hyalomma, and 27 to Haemaphysalis. Conclusions The associations between tick genera and geographical distribution, viral richness, and composition provide new insights into tick-virus interactions, offering clues to identify high-risk regions for different tick-borne viruses.
Preprint
Full-text available
Plasmodium falciparum Pfs230 and Pfs48/45, part of a core fertilization complex, are leading malaria transmission-blocking vaccine candidates. However, how the two proteins interact is unknown. Here we report a 3.36 Å resolution cryo-electron microscopy structure of the endogenous Pfs230-Pfs48/45 complex. We show that Pfs48/45 interacts with Pfs230 domains 13 and 14, domains that are not included in current Pfs230 vaccine immunogens. Using a transgenic parasite line with a domain 13 to 14 deletion, we show that these domains are essential for Pfs230 localization on the gamete surface. Nanobodies against domains 13 and 14 inhibit Pfs230-Pfs48/45 complex formation and reduce transmission, and structural analyses reveal their binding epitopes. Furthermore, domains 13 and 14 are targets of naturally acquired immunity and, when delivered as mRNA-LNP vaccines, induce potent immune responses. Our comprehensive structural insights into a core P. falciparum fertilization complex will guide the design of novel transmission-blocking vaccine candidates against malaria.
Article
Full-text available
Salmonella enterica serovar Minnesota (S. Minnesota) is an emerging serovar that persists within poultry supply chains, potentially causing outbreaks in humans. Understanding its population genomics is crucial for designing preventive measures. We performed a genomic surveillance study of S. Minnesota by analyzing 259 isolates from poultry in Saudi Arabia. Whole-genome sequencing data for these isolates were analyzed to characterize emerging clones and the genetic factors underlying antimicrobial resistance and virulence. We compared the isolates to all available global genomes of S. Minnesota. Our results revealed the emergence of four clones, three of which were mixed with global strains. These clones exhibited higher levels of antimicrobial resistance and virulence due to the acquisition of multiple plasmids, particularly IncC plasmids, carrying resistance and virulence genes. IncC plasmids underwent genomic rearrangements, presenting diverse configurations of resistance genes. Our findings demonstrate the emergence and persistence of pathogenic and multidrug-resistant S. Minnesota clones.
Article
Humans have a long history of fermenting food and beverages that led to domestication of the baker's yeast, Saccharomyces cerevisiae. Despite their tight companionship with humans, yeast species that are domesticated or pathogenic can also live on trees. Here we used over 300 genomes of S. cerevisiae from oaks and other trees to determine whether tree‐associated populations are genetically distinct from domesticated lineages and estimate the timing of forest lineage divergence. We found populations on trees are highly structured within Europe, Japan, and North America. Approximate estimates of when forest lineages diverged out of Asia and into North America and Europe coincide with the end of the last ice age, the spread of agriculture, and the onset of fermentation by humans. It appears that migration from human‐associated environments to trees is ongoing. Indeed, patterns of ancestry in the genomes of three recent migrants from the trees of North America to Europe could be explained by the human response to the Great French Wine Blight. Our results suggest that human‐assisted migration affects forest populations, albeit rarely. Such migration events may even have shaped the global distribution of S. cerevisiae. Given the potential for lasting impacts due to yeast migration between human and natural environments, it seems important to understand the evolution of human commensals and pathogens in wild niches.
Preprint
Full-text available
Objectives: Staphylococcus aureus is a leading cause of hospital-acquired infections worldwide. Over recent decades, methicillin-resistant Staphylococcus aureus (MRSA), which is resistant to multiple antimicrobials, has emerged as a significant pathogenic strain in both hospital and community settings. The rapid emergence and dissemination of MRSA clones are driven by a dynamic and evolving population, spreading swiftly across regions on epidemiological time scales. Despite the vast geographical expanse and diverse demographics of the Kingdom of Saudi Arabia (KSA) and the broader West Asia region, the population diversity of MRSA in hospitals in these areas remains underexplored. Methods: We conducted a large-scale genomic analysis of a systematic Staphylococcus aureus collection obtained from 34 hospitals across all provinces of KSA, from diverse infection sites between 2022 and 2024. The dataset comprised 582 MRSA and 30 methicillin-susceptible Staphylococcus aureus (MSSA) isolates, all subjected to whole-genome sequencing. A combination of phylogenetic and population genomics approaches was utilized to analyze the genomic data. A hybrid sequencing approach was employed to retrieve the complete plasmid content. Results: The population displayed remarkable diversity, comprising 35 distinct sequence types (STs), with the majority harboring community-associated SCCmec loci (types IVa, V VII, and VI). Virulence factors associated with community-acquired MRSA (CA-MRSA), including Panton-Valentine Leukocidin (PVL) genes, were identified in 12 distinct STs. Dominant clones, including ST8 t008 (USA300), ST88 t690, ST672 t3841, ST6 t304, and ST5 t311, were associated with infections at various body sites and were widely disseminated across the country. Linezolid and vancomycin resistance were mediated by cfr-carrying plasmids and mutations in the vraR gene (involved in cell wall stress response) and the murF gene (peptidoglycan biosynthesis) in five isolates, respectively. Phylodynamic analysis revealed the rapid expansion of the dominant clones, with their emergence estimated to have occurred 10 to 20 years ago. Plasmidome analysis uncovered a diverse repertoire of blaZ-containing plasmids and the sharing of erm(C)-encoding plasmids among major clades. The acquisition of plasmids coincided with clonal expansion. Conclusions: Our results highlight the recent concurrent expansion and geographical dissemination of CA-MRSA clones across hospitals. These findings also underscore the interplay between clonal spread and horizontal gene transfer in shaping the resistance landscape of MRSA.
Article
Bacteria and archaea acquire resistance to genetic parasites by preferentially integrating short fragments of foreign DNA at one end of a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR). "Leader" DNA upstream of CRISPR loci regulates transcription and foreign DNA integration into the CRISPR. Here, we analyze 37,477 CRISPRs from 39,277 bacterial and 556 archaeal genomes to identify conserved sequence motifs in CRISPR leaders. A global analysis of all leader sequences fails to identify universally conserved motifs. However, an analysis of leader sequences that have been grouped by 16S rRNA-based taxonomy and CRISPR subtype reveals 87 specific motifs in type I, II, III, and V CRISPR leaders. Fourteen of these leader motifs have biochemically demonstrated roles in CRISPR biology including integration, transcription, and CRISPR RNA processing. Another 28 motifs are related to DNA binding sites for proteins with functions that are consistent with regulating CRISPR activity. In addition, we show that these leader motifs can be used to improve existing CRISPR detection methods and enhance the accuracy of CRISPR classification.
Article
The Apicomplexa are a phylum of single-celled eukaryotes that can infect humans and include the mosquito-borne parasite Plasmodium, the cause of malaria. Viruses that infect non-Plasmodium spp. disease-causing protozoa affect pathogen life cycle and disease outcomes. However, only one RNA virus (Matryoshka RNA virus 1) has been identified in Plasmodium, and none have been identified in zoonotic Plasmodium species. The rapid expansion of the known RNA virosphere via metagenomic sequencing suggests that this dearth is due to the divergent nature of RNA viruses that infect protozoa. We leveraged newly uncovered data sets to explore the virome of human-infecting Plasmodium species collected in Sabah, east (Borneo) Malaysia. From this, we identified a highly divergent RNA virus in two human-infecting P. knowlesi isolates that is related to the unclassified group ‘ormycoviruses’. By characterising 15 additional ormycoviruses identified in the transcriptomes of arthropods, we show that this group of viruses exhibits a complex ecology as non-infecting passengers at the arthropod-mammal interface. With the addition of viral diversity discovered using the artificial intelligence-based analysis of metagenomic data, we also demonstrate that the ormycoviruses are part of a diverse and unclassified viral taxon. This is the first observation of an RNA virus in a zoonotic Plasmodium species. By linking small-scale experimental data to advances in large-scale virus discovery, we characterise the diversity and confirm the putative genomic architecture of an unclassified viral taxon. This approach can be used to further explore the virome of disease-causing Apicomplexa and better understand how protozoa-infecting viruses may affect parasite fitness, pathobiology, and treatment outcomes.
Article
Clostridioides difficile infection (CDI) is an urgent public health threat with limited preventative options. In this work, we developed a messenger RNA (mRNA)–lipid nanoparticle (LNP) vaccine targeting C. difficile toxins and virulence factors. This multivalent vaccine elicited robust and long-lived systemic and mucosal antigen-specific humoral and cellular immune responses across animal models, independent of changes to the intestinal microbiota. Vaccination protected mice from lethal CDI in both primary and recurrent infection models, and inclusion of non-toxin cellular and spore antigens improved decolonization of toxigenic C. difficile from the gastrointestinal tract. Our studies demonstrate mRNA-LNP vaccine technology as a promising platform for the development of novel C. difficile therapeutics with potential for limiting acute disease and promoting bacterial decolonization.
Article
Full-text available
Ergasilidae is a family of globally distributed copepods parasitizing freshwater fishes. Despite their widespread occurrence, their phylogeographic patterns are poorly understood, specifically in the African Great Lakes. Here, we aim to provide an update on the distribution of Ergasilus kandti, a copepod species infecting Tylochromis polylepis, an endemic cichlid fish species in Lake Tanganyika, and the phylogenetic relationship of African ergasilids. We present the first record of E. kandti parasitizing the gills of T. polylepis in Lake Tanganyika proper, identified through light microscopy and, for the first time for any ergasilid, confocal laser scanning microscopy. We suggest that this technique adds spatial context to characters that are hardly visible while using light microscopy. Phylogenetic analyses based on ribosomal DNA fragments suggest two monophyletic groups of African ergasilids. However, the phylogenetic relationships of Ergasilus remain unresolved, possibly because of the insufficient resolution of these widely used phylogenetic markers and low taxonomic coverage. A comparison of ergasilid mitochondrial genomes highlights traits found in other parasite lineages including genome shrinkage and low evolutionary rates of the cox1 gene. This study presents the most extensive molecular characterization of any ergasilid species to date.
Preprint
Full-text available
Apicomplexa are single-celled eukaryotes that can infect humans and include the mosquito-borne parasite Plasmodium, the cause of malaria. Increasing rates of drug resistance in human-only Plasmodium species are reducing the efficacy of control efforts and antimalarial treatments. There are also rising cases of P. knowlesi, the only zoonotic Plasmodium species that causes severe disease and death in humans. Thus, there is a need to develop additional innovative strategies to combat malaria. Viruses that infect non-Plasmodium spp. disease-causing protozoa have been shown to affect pathogen life cycle and disease outcomes. However, only one virus (Matryoshka RNA virus 1) has been identified in Plasmodium, and none have been identified in zoonotic Plasmodium species. The rapid expansion of the known RNA virosphere using structure- and artificial intelligence-based methods suggests that this dearth is due to the divergent nature of RNA viruses that infect protozoa. We leveraged these newly uncovered data sets to explore the virome of human-infecting Plasmodium species collected in Sabah, east (Borneo) Malaysia. We identified a highly divergent RNA virus in two human-infecting P. knowlesi isolates that is related to the unclassified group "ormycoviruses". By characterising fifteen additional ormycoviruses identified in the transcriptomes of arthropods we show that this group of viruses exhibits a complex ecology at the arthropod-mammal interface. Through the application of artificial intelligence methods, we then demonstrate that the ormycoviruses are part of a diverse and unclassified viral taxon. This is the first observation of an RNA virus in a zoonotic Plasmodium species. By linking small-scale experimental data to large-scale virus discovery advances, we characterise the diversity and genomic architecture of an unclassified viral taxon. 
This approach should be used to further explore the virome of disease-causing Apicomplexa and better understand how protozoa-infecting viruses may affect parasite fitness, pathobiology, and treatment outcomes.
Article
Full-text available
Burkholderia pseudomallei is the causative agent of melioidosis, a disease highly endemic to Southeast Asia and northern Australia, though the area of endemicity is expanding. Cases may occur in returning travelers or, rarely, from imported contaminated products. Identification of B. pseudomallei is challenging for laboratories that do not see this organism frequently, and misidentifications by matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF MS) and automated biochemical testing have been reported. The in vitro diagnostic database for use with the Vitek MS has recently been updated to include B. pseudomallei and we aimed to validate the performance for identification in comparison to automated biochemical testing with the Vitek 2 GN card, quantitative real-time polymerase chain reaction (qPCR) targeting the type III secretion system, and capsular polysaccharide antigen detection using a lateral flow immunoassay (LFA). We tested a “derivation” cohort including geographically diverse B. pseudomallei and a range of closely related Burkholderia species, and a prospective “validation” cohort of B. pseudomallei and B. cepacia complex clinical isolates. MALDI-TOF MS had a sensitivity of 1.0 and specificity of 1.0 for the identification and differentiation of B. pseudomallei from related Burkholderia species when a certainty cutoff of 99.9% was used. In contrast, automated biochemical testing for B. pseudomallei identification had a sensitivity of 0.83 and specificity of 0.88. Both qPCR and LFA correctly identified all B. pseudomallei isolates with no false positives. Due to the high level of accuracy, we have now incorporated MALDI-TOF MS into our laboratory’s B. pseudomallei identification workflow. IMPORTANCE Burkholderia pseudomallei causes melioidosis, a disease associated with high morbidity and mortality that disproportionately affects rural areas in Southeast Asia and northern Australia. 
The known area of endemicity is expanding and now includes the continental United States. Laboratory identification can be challenging which may result in missed or delayed diagnoses and poor patient outcomes. In this study, we compared mass spectrometry using an updated spectral database with multiple other methods for B. pseudomallei identification and found mass spectrometry highly accurate. We have therefore incorporated this fast and cost-effective method into our laboratory’s workflow for B. pseudomallei identification.
Article
Full-text available
The remarkable pace of genomic data generation is rapidly transforming our understanding of life at the micron scale. Yet this data stream also creates challenges for team science. A single microbe can have multiple versions of genome architecture, functional gene annotations, and gene identifiers; additionally, the lack of mechanisms for collating and preserving advances in this knowledge raises barriers to community coalescence around shared datasets. “Digital Microbes” are frameworks for interoperable and reproducible collaborative science through open source, community-curated data packages built on a (pan)genomic foundation. Housed within an integrative software environment, Digital Microbes ensure real-time alignment of research efforts for collaborative teams and facilitate novel scientific insights as new layers of data are added. Here we describe two Digital Microbes: 1) the heterotrophic marine bacterium Ruegeria pomeroyi DSS-3 with > 100 transcriptomic datasets from lab and field studies, and 2) the pangenome of the cosmopolitan marine heterotroph Alteromonas containing 339 genomes. Examples demonstrate how an integrated framework collating public (pan)genome-informed data can generate novel and reproducible findings.
Article
Full-text available
We present an R package, ggtree, which provides programmable visualization and annotation of phylogenetic trees. ggtree can read more tree file formats than other software, including newick, nexus, NHX, phylip and jplace formats, and supports visualization of phylo, multiphylo, phylo4, phylo4d, obkdata and phyloseq tree objects defined in other R packages. It can also extract the tree/branch/node‐specific and other data from the analysis outputs of beast, epa, hyphy, paml, phylodog, pplacer, r8s, raxml and revbayes software, and allows using these data to annotate the tree. The package allows colouring and annotation of a tree by numerical/categorical node attributes, manipulating a tree by rotating, collapsing and zooming out clades, highlighting user selected clades or operational taxonomic units and exploration of a large tree by zooming into a selected portion. A two‐dimensional tree can be drawn by scaling the tree width based on an attribute of the nodes. A tree can be annotated with an associated numerical matrix (as a heat map), multiple sequence alignment, subplots or silhouette images. The package ggtree is released under the artistic‐2.0 license. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/ggtree).
Article
Full-text available
Evolview is an online visualization and management tool for customized and annotated phylogenetic trees. It allows users to visualize phylogenetic trees in various formats, customize the trees through built-in functions and user-supplied datasets and export the customization results to publication-ready figures. Its ‘dataset system’ contains not only the data to be visualized on the tree, but also ‘modifiers’ that control various aspects of the graphical annotation. Evolview is a single-page application (like Gmail); its carefully designed interface allows users to upload, visualize, manipulate and manage trees and datasets all in a single webpage. Developments since the last public release include a modern dataset editor with keyword highlighting functionality, seven newly added types of annotation datasets, collaboration support that allows users to share their trees and datasets and various improvements of the web interface and performance. In addition, we included eleven new ‘Demo’ trees to demonstrate the basic functionalities of Evolview, and five new ‘Showcase’ trees inspired by publications to showcase the power of Evolview in producing publication-ready figures. Evolview is freely available at: http://www.evolgenius.info/evolview/.
Article
Interactive Tree Of Life (iTOL) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. Trees can be interactively pruned and re-rooted. Various types of data such as genome sizes or protein domain repertoires can be mapped onto the tree. Export to several bitmap and vector graphics formats is supported. Availability: iTOL is available at http://itol.embl.de Contact: bork{at}embl.de
Yu G, Smith DK, Zhu H, Guan Y, Lam TT-Y. 2017. ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol Evol. 8(1):28-36.