Nikos C Kyrpides’s research while affiliated with Lawrence Berkeley National Laboratory and other places

Publications (977)


Genomes OnLine Database (GOLD) v.10: new features and updates
  • Article

November 2024 · 16 Reads · Nucleic Acids Research · Dimitri Stamatis · Cindy Tianqing Li · [...] · T B K Reddy

The Genomes OnLine Database (GOLD; https://gold.jgi.doe.gov/) at the Department of Energy Joint Genome Institute is a comprehensive online metadata repository designed to catalog and manage information related to (meta)genomic sequence projects. GOLD provides a centralized platform where researchers can access a wide array of metadata from its four organization levels, namely Study, Organism/Biosample, Sequencing Project and Analysis Project. GOLD continues to serve as a valuable resource and has seen significant growth and expansion since its inception in 1997. With its expanded role as a collaborative platform, it not only actively imports data from other primary repositories like the National Center for Biotechnology Information but also supports contributions from researchers worldwide. This collaborative approach has enriched the database with diverse datasets, creating a more integrated resource to enhance scientific insights. As genomic research becomes increasingly integral to various scientific disciplines, more researchers and institutions are turning to GOLD for their metadata needs. To meet this growing demand, GOLD has expanded by adding diverse metadata fields, intuitive features, advanced search capabilities and enhanced data visualization tools, making it easier for users to find and interpret relevant information. This manuscript provides an update and highlights the new features introduced over the last 2 years.


Lake Mendota sample collection. (A) Lake Mendota is located in Madison, Wisconsin, as indicated by the red dot in the lower right inset. All samples part of this study were collected from the NTL-LTER site located at the center of Lake Mendota (latitude = 43.0995, longitude = −89.4045). (B) Time-series of the 471 samples collected from Lake Mendota between 2000 – 2019. Sampling time points are indicated by black dots by month (x-axis) and year (y-axis), while the total number of samples collected per year is indicated by the horizontal bar plots.
Phylogenetic tree of the bacterial MAGs. Concentric rings moving outward from the tree show the inferred phylum-level taxonomy and estimated level of genome completeness. Red branches indicate MAGs from the coassembly and branches in black represent family-level representative genomes from the GTDB database (release 95). Phyla are named based on IMG/M taxonomic assignment followed by phylogenetic affiliation according to the Genome Taxonomy Database (GTDB) release 95. Branch lengths are shown simplified and not to true scale.
Phylum-level taxonomy and assembly size of the twenty largest MAGs. MAGs are separated by (A) prokaryote and (B) eukaryote taxonomic affiliations.
Viral genome size distribution. Viruses were taxonomically classified at the phylum level and total length per phylum is shown for genome length less than 20,000 kb (A) and genome length greater than 20,000 kb (B).
Coassembly and binning of a twenty-year metagenomic time-series from Lake Mendota
  • Article
  • Full-text available

September 2024 · 128 Reads · Scientific Data

The North Temperate Lakes Long-Term Ecological Research (NTL-LTER) program has been extensively used to improve understanding of how aquatic ecosystems respond to environmental stressors, climate fluctuations, and human activities. Here, we report on the metagenomes of samples collected between 2000 and 2019 from Lake Mendota, a freshwater eutrophic lake within the NTL-LTER site. We utilized the distributed metagenome assembler MetaHipMer to coassemble over 10 terabases (Tbp) of data from 471 individual Illumina-sequenced metagenomes. A total of 95,523,664 contigs were assembled and binned to generate 1,894 non-redundant metagenome-assembled genomes (MAGs) with ≥50% completeness and ≤10% contamination. Phylogenomic analysis revealed that the MAGs were nearly exclusively bacterial, dominated by Pseudomonadota (Proteobacteria, N = 623) and Bacteroidota (N = 321). Nine eukaryotic MAGs were identified by eukCC with six assigned to the phylum Chlorophyta. Additionally, 6,350 high-quality viral sequences were identified by geNomad with the majority classified in the phylum Uroviricota. This expansive coassembled metagenomic dataset provides an unprecedented foundation to advance understanding of microbial communities in freshwater ecosystems and explore temporal ecosystem dynamics.
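The quality cutoff quoted in the abstract above (≥50% completeness and ≤10% contamination) can be illustrated with a minimal Python sketch. The `Bin` class, the `passes_quality` function, the bin names and all numeric values are hypothetical, invented for illustration only; this is not the authors' pipeline, and it assumes completeness and contamination have already been estimated by some external tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    """Hypothetical record for one genome bin (values in percent)."""
    name: str
    completeness: float
    contamination: float


def passes_quality(b: Bin,
                   min_completeness: float = 50.0,
                   max_contamination: float = 10.0) -> bool:
    """Return True if the bin meets both thresholds (inclusive bounds)."""
    return (b.completeness >= min_completeness
            and b.contamination <= max_contamination)


# Made-up example bins; not data from the study.
bins = [
    Bin("bin_001", 92.3, 1.8),   # passes both thresholds
    Bin("bin_002", 48.9, 0.5),   # fails: completeness below 50%
    Bin("bin_003", 75.0, 12.4),  # fails: contamination above 10%
    Bin("bin_004", 50.0, 10.0),  # passes: both bounds are inclusive
]

kept = [b.name for b in bins if passes_quality(b)]
print(kept)
```

Both bounds are treated as inclusive here because the abstract states them with ≥ and ≤.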


Fig. 2 | Data description. a, The count of each category across the full dataset. b, Top left: the proportion of samples obtained from IMG/VR versus the number of new samples contributed. Middle: within new samples, we identified 3,613 vOTUs of which 317 clustered with sequences already in IMG/VR. Bottom left:
Fig. 4 | Metabolic potential encoded by the soil virosphere. A cellular diagram depicting portions of the F-type ATPase (map00190), lipopolysaccharide (LPS) biosynthesis pathway (map00540), pentose phosphate pathway (map00030) and vitamin B-and amino acid-related KEGG pathways in the soil virosphere.
A global atlas of soil viruses reveals unexplored biodiversity and potential biogeochemical impacts

June 2024 · 637 Reads · 7 Citations · Nature Microbiology

Historically neglected by microbial ecologists, soil viruses are now thought to be critical to global biogeochemical cycles. However, our understanding of their global distribution, activities and interactions with the soil microbiome remains limited. Here we present the Global Soil Virus Atlas, a comprehensive dataset compiled from 2,953 previously sequenced soil metagenomes and composed of 616,935 uncultivated viral genomes and 38,508 unique viral operational taxonomic units. Rarefaction curves from the Global Soil Virus Atlas indicate that most soil viral diversity remains unexplored, further underscored by high spatial turnover and low rates of shared viral operational taxonomic units across samples. By examining genes associated with biogeochemical functions, we also demonstrate the viral potential to impact soil carbon and nutrient cycling. This study represents an extensive characterization of soil viral diversity and provides a foundation for developing testable hypotheses regarding the role of the virosphere in the soil microbiome and global biogeochemistry.


Visualizing metagenomic and metatranscriptomic data: A comprehensive review

Computational and Structural Biotechnology Journal

The fields of metagenomics and metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and highlights their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.


Novel spore-forming species exhibiting intrinsic resistance to third- and fourth-generation cephalosporins and description of Tigheibacillus jepli gen. nov., sp. nov

March 2024 · 66 Reads · 2 Citations

A comprehensive microbial surveillance was conducted at NASA’s Mars 2020 spacecraft assembly facility (SAF), where whole-genome sequencing (WGS) of 110 bacterial strains was performed. One isolate, designated 179-BFC-A-HSᵀ, exhibited less than 80% average nucleotide identity (ANI) to known species, suggesting a novel organism. This strain demonstrated high-level resistance [minimum inhibitory concentration (MIC) >256 mg/L] to third-generation cephalosporins, including ceftazidime, cefpodoxime, combination ceftazidime/avibactam, and the fourth-generation cephalosporin cefepime. The results of a comparative genomic analysis revealed that 179-BFC-A-HSᵀ is most closely related to Virgibacillus halophilus 5B73Cᵀ, sharing an ANI of 78.7% and a digital DNA-DNA hybridization (dDDH) value of 23.5%, while their 16S rRNA gene sequences shared 97.7% nucleotide identity. Based on these results and the recent recognition that the genus Virgibacillus is polyphyletic, strain 179-BFC-A-HSᵀ is proposed as a novel species of a novel genus, Tigheibacillus jepli gen. nov., sp. nov. (type strain 179-BFC-A-HSᵀ = DSM 115946ᵀ = NRRL B-65666ᵀ), and its closest neighbor, V. halophilus, is proposed to be reassigned to this genus as Tigheibacillus halophilus comb. nov. (type strain 5B73Cᵀ = DSM 21623ᵀ = JCM 21758ᵀ = KCTC 13935ᵀ). It was also necessary to reclassify its second closest neighbor, Virgibacillus soli, as a member of a novel genus, Paracerasibacillus, reflecting its phylogenetic position relative to the genus Cerasibacillus, for which we propose Paracerasibacillus soli comb. nov. (type strain CC-YMP-6ᵀ = DSM 22952ᵀ = CCM 7714ᵀ). Within Amphibacillaceae (n = 64), P. soli exhibited 11 antibiotic resistance genes (ARG), while T. jepli encoded 3, lacking any known β-lactamases, suggesting resistance from variant penicillin-binding proteins, disrupting cephalosporin efficacy. P. soli was highly resistant to azithromycin (MIC >64 mg/L) yet susceptible to cephalosporins and penicillins.
IMPORTANCE The significance of this research extends to understanding microbial survival and adaptation in oligotrophic environments, such as those found in SAF. Whole-genome sequencing of several strains isolated from Mars 2020 mission assembly cleanroom facilities, including the discovery of the novel species Tigheibacillus jepli, highlights the resilience and antimicrobial resistance (AMR) in clinically relevant antibiotic classes of microbes in nutrient-scarce settings. The study also redefines the taxonomic classifications within the Amphibacillaceae family, aligning genetic identities with phylogenetic data. Investigating ARG and virulence factors (VF) across these strains illuminates the microbial capability for resistance under resource-limited conditions while emphasizing the role of human-associated VF in microbial survival, informing sterilization practices and microbial management in similar oligotrophic settings beyond spacecraft assembly cleanrooms such as pharmaceutical and medical industry cleanrooms.


Complete genome sequence of the type strain bacterium Sphaerochaeta associata GLS2ᵀ (VKM B-2742ᵀ)

February 2024 · 36 Reads

This study reports the complete genome sequence of Sphaerochaeta associata GLS2ᵀ (=VKM B-2742ᵀ =DSM 26261ᵀ), which was isolated from a consortium with the methanogenic archaeon Methanosarcina mazei JL01. The consortium was collected from permafrost of the Kolyma lowland in Russia. A hybrid approach, combining paired-end Illumina reads with Oxford Nanopore Technologies MinION reads, was used to assemble the genome. The final assembly resulted in a circular chromosome that is 3,554,903 bp long. This high-quality genome assembly serves as a basis for algorithmic pathway reconstruction and postgenomic analysis. To further this research, the genome was imported into research portals for the algorithmic reconstruction of metabolic pathways, both in general (KEGG) and with special attention to carbohydrate metabolism (CAZy). These portals offer high-quality workplaces for in-depth studies.


Whole community shotgun metagenomes of two biological soil crust types from the Mojave Desert

February 2024 · 68 Reads · Microbiology Resource Announcements

We present six whole community shotgun metagenomic sequencing data sets of two types of biological soil crusts sampled at the ecotone of the Mojave Desert and Colorado Desert in California. These data will help us understand the diversity and function of biocrust microbial communities, which are essential for desert ecosystems.


Metatranscriptomes of two biological soil crust types from the Mojave desert in response to wetting

January 2024 · 70 Reads · Microbiology Resource Announcements

We present eight metatranscriptomic datasets of light algal and cyanolichen biological soil crusts from the Mojave Desert in response to wetting. These data will help us understand gene expression patterns in desert biocrust microbial communities after they have been reactivated by the addition of water.


Antiquaquibacter oligotrophicus gen. nov., sp. nov., a novel oligotrophic bacterium from groundwater

December 2023 · 38 Reads · International Journal of Systematic and Evolutionary Microbiology

In this study, a Gram-stain-positive, non-motile, oxidase- and catalase-negative, rod-shaped, bacterial strain (SG_E_30_P1ᵀ) that formed light yellow colonies was isolated from a groundwater sample of Sztaravoda spring, Hungary. Based on 16S rRNA phylogenetic and phylogenomic analyses, the strain was found to form a distinct lineage within the family Microbacteriaceae. Its closest relatives in terms of near full-length 16S rRNA gene sequences are Salinibacterium hongtaonis MH299814 (97.72 % sequence similarity) and Leifsonia psychrotolerans GQ406810 (97.57 %). The novel strain grows optimally at 20–28 °C, at neutral pH and in the presence of NaCl (1–2 % w/v). Strain SG_E_30_P1ᵀ contains MK-7 and B-type peptidoglycan with diaminobutyrate as the diagnostic amino acid. The major cellular fatty acids are anteiso-C15:0, iso-C16:0 and iso-C14:0, and the polar lipid profile is composed of diphosphatidylglycerol and phosphatidylglycerol, as well as an unidentified aminoglycolipid, aminophospholipid and some unidentified phospholipids. The assembled draft genome is a contig with a total length of 2 897 968 bp and a DNA G+C content of 65.5 mol%. Amino acid identity values of <62.54 % with its closest relatives with sequenced genomes, as well as other genome distance results, indicate that this bacterium represents a novel genus within the family Microbacteriaceae. We suggest that SG_E_30_P1ᵀ (=DSM 111415ᵀ =NCAIM B.02656ᵀ) represents the type strain of a novel genus and species for which the name Antiquaquibacter oligotrophicus gen. nov., sp. nov. is proposed.


Citations (73)


... Recent total metagenome-based studies have advanced our understanding of soil viruses, revealing the vast diversity of viral communities and their functional potentials across various soil ecosystems (Graham et al., 2024; Ma et al., 2024). Although viral size fraction metagenomes (viromes) have demonstrated effectiveness over total metagenomes in exploring the virosphere, their application has been limited to small-scale studies (Santos-Medellin et al., 2021). ...

Reference:

Influence of climate on soil viral communities in Australia on a regional scale
A global atlas of soil viruses reveals unexplored biodiversity and potential biogeochemical impacts

Nature Microbiology

... Several studies revealed habitat-specific differences in MGE distribution. For example, soil encodes much more plasmid taxonomic units than humans [8]. Similarly, phages show habitat specificity, with some restricted to particular environments [9]. ...

IMG/PR: a database of plasmids from genomes and metagenomes with rich annotations and metadata
  • Citing Article
  • November 2023

Nucleic Acids Research

... By bypassing traditional methods, it has uncovered the vast majority of unculturable microorganisms and revealed new metabolic pathways and bioactive compounds [14], as well as the microbial "dark matter" present in diverse environments across the globe [15][16][17] (Figure 1). Pavlopoulos et al. [18] revealed an immense global diversity of previously uncharacterized proteins in global metagenomes by generating reference-free protein families, identifying over 106,000 novel protein clusters with no similarity to known sequences, thereby doubling the number of known protein families and highlighting vastly untapped functional and structural diversity within microbial "dark matter". Yan et al. [19] established a comprehensive global rumen virome database (RVD), identifying 397,180 viral operational taxonomic units (vOTUs) from 975 rumen metagenomes, revealing the previously unexplored viral "dark matter" of the rumen. ...

Unraveling the functional dark matter through global metagenomics

Nature

... In our catalog, only a small fraction are homologous to reference small protein datasets, with the vast majority of the novel small proteins being found in non-human-associated habitats (Supplementary Fig. 5b). On the other hand, it encompasses most of the known small proteins in either the RefSeq database or in families discovered recently (NMPfamsDB 61 and FesNov families 28 ). When comparing with small protein databases that focus on eukaryotic organisms, such as smProt2 62 , OpenProt2.0 ...

NMPFamsDB: a database of novel protein families from microbial metagenomes and metatranscriptomes

Nucleic Acids Research

... Latescibacterota and Desulfobacterota are commonly found in low abundance in soils and, notably, both groups happen to harbor bacteria with anaerobic metabolism: anaerobic fermentation in Latescibacterota (Youssef et al., 2015) and Fe(III)-reduction in Geobacteraceae (Megonigal et al., 2003), the most represented family in our study. Crop soils, on the other hand, had a higher relative abundance of Deinococcota (mostly Deinococcaceae), which were more abundant in biocrusts (Wang et al., 2024) and are characterized by their high resistance to UV radiation (Seshadri et al., 2023), consistent with the lower vegetation cover and higher exposure of crop soils to direct sunlight. Prairie restoration also increased the relative abundance of the fungal phylum Glomeromycota [arbuscular mycorrhizal (AM) fungi] (Figure 4), in agreement with AM fungal spore density measurements in these sites and PLFA abundance in other tallgrass prairie restoration studies (Allison et al., 2005; Baer et al., 2015). ...

Comparative Genomics Using the Integrated Microbial Genomes and Microbiomes (IMG/M) System: A Deinococcus Use Case
  • Citing Article
  • June 2023

Journal of the Indian Institute of Science

... CheckV (Table S4) was used for quality control assessment of the phage genomes [32]. Each phage genome was annotated using MetaCerberus (v1.4) [33] using all databases option with Pyrodigal-gv [34] [35]. ...

Identification of mobile genetic elements with geNomad

Nature Biotechnology

... Many studies on the microbiome of diverse environments and hosts are being published, describing how microbial composition varies under different conditions, especially those that cause stress to the hosts [2, 29, 30]. The amount of information on microbial diversity obtained from metagenomic studies is so large that it would be impossible to analyze without bioinformatics tools [31]. ...

Bioinformatics Analysis Tools for Studying Microbiomes at the DOE Joint Genome Institute
  • Citing Article
  • May 2023

Journal of the Indian Institute of Science

... We used geNomad (70) with default parameters to classify contigs as plasmids and viruses. Prophages are noted in geNomad output, and these were excluded from the viral contig counts. ...

You can move, but you can't hide: identification of mobile genetic elements with geNomad

... Tools such as Kraken2 and Centrifuge recognized for processing longer reads provide enhanced capabilities for microbial sequence analysis. The final stage of metatranscriptomic analysis involves coding transcript annotation by tools like InterProScan, BLAST, RefSeq, Diamond, and HMMER (HMM-based), which were further refined by KEGG, COG, and tools like antiSMASH for detailed annotation (Baltoumas et al. 2023). The set of tools collectively advances the understanding of microbial functional dynamics in diverse environments. ...

Exploring microbial functional biodiversity at the protein family level—From metagenomic sequence reads to annotated protein clusters

Frontiers in Bioinformatics

... Similarity search of nodes on information networks serves as the foundation for numerous data analytics techniques [1][2][3][4][5] and has wide-ranging applications, including online advertising [6], recommendation systems [7], biomedical analysis [8][9][10], spatial-temporal systems [11,12]. Most real-world information networks are heterogeneous information networks (HINs) [13,14], characterized by the coexistence of edges connecting nodes of various types showcases objects and relations within an academic network, with "PAP" (Paper-Author-Paper) and "PVP" (Paper-Venue-Paper) as two example metapaths tailored to specific querying needs, are not predefined but are instead designed to capture user search intentions more precisely, thereby meeting user expectations more effectively. ...

Extreme-Scale Many-against-Many Protein Similarity Search
  • Citing Conference Paper
  • November 2022