About
55
Publications
62,527
Reads
2,484
Citations
Introduction
My work examines how soil bacteria/archaea participate in and contribute to soil nutrient cycling and greenhouse gas turnover. Our studies rely heavily on "omics" level characterization of soils, primarily utilizing genome resolved metagenomics to track and predict the functional roles of microbes in the soil environment. We leverage genomic, proteomic, and metabolic data to track the turnover of simple and complex carbon compounds, nitrogen cycling, and sulfur cycling in these environments.
Additional affiliations
Education
August 2009 - June 2015
September 2004 - December 2008
September 2004 - December 2008
Publications (55)
Significance
Cyanobacteria are increasingly being considered for use in large-scale outdoor production of fuels and industrial chemicals. Cyanobacteria can anticipate daily changes in light availability using an internal circadian clock and rapidly alter their metabolic processes in response to changes in light availability. Understanding how signals...
Significance
Cyanobacteria are model organisms for photosynthesis in the laboratory, are key producers of the chemical energy that drives life, and are being developed as biofuel and chemical producers for industry. Despite the importance of these organisms for environmental and biotechnological applications, only a small percentage of cyanobacteri...
Significance
The evolution of photosynthetic cyanobacteria under 24-h cycles of light and darkness selected for a robust circadian clock. Understanding how cyanobacteria integrate circadian clock signals with natural light–dark cycles to control metabolism is critical, because these organisms are central to global carbon cycling and hold promise fo...
Soil microbial activity drives the carbon and nitrogen cycles and is an important determinant of atmospheric trace gas turnover, yet most soils are dominated by microorganisms with unknown metabolic capacities. Even Acidobacteria, among the most abundant bacteria in soil, remain poorly characterized, and functions across groups such as Verrucomicro...
Understanding microbial gene functions relies on the application of experimental genetics in cultured microorganisms. However, the vast majority of bacteria and archaea remain uncultured, precluding the application of traditional genetic methods to these organisms and their interactions. Here, we characterize and validate a generalizable strategy f...
Cyanobacteria are central to biogeochemical cycling, climate change, and eutrophication. While they readily develop associations with environmental microorganisms, the question of whether they consistently recruit specific microbiomes remains unresolved. Here, we established in vitro cyanobacterial consortia by inoculating five different cyanobacte...
Metagenomic or metabarcoding data are often used to predict microbial interactions in complex communities, but these predictions are rarely explored experimentally. Here, we use an organism abundance correlation network to investigate factors that control community organization in mine tailings-derived laboratory microbial consortia grown under doz...
For a deeper and comprehensive understanding of the composition and function of rhizosphere microbiomes, we need to focus at the scale of individual roots in standardized growth containers. Root exudation patterns are known to vary along distinct parts of the root even in juvenile plants giving rise to spatially distinct microbial niches. To addres...
For a deeper and comprehensive understanding of the diversity, composition and function of rhizosphere microbiomes, we need to focus at the scale of individual roots in standardized growth containers. Root exudation patterns are known to vary across distinct parts of the root giving rise to spatially distinct microbial niches. To address this, we a...
Bacteria of the phylum Acidobacteria are one of the most abundant groups across soil ecosystems, yet they are represented by comparatively few sequenced genomes, leaving gaps in our understanding of their metabolic diversity. Recently, genomes of Acidobacteria species with unusually large repertoires of biosynthetic gene clusters (BGCs) were recons...
Metagenomics analysis can be negatively impacted by DNA contamination. While external sources of contamination such as DNA extraction kits have been widely reported and investigated, contamination originating within the study itself remains underreported. Here we applied high-resolution strain-resolved analyses to identify contamination in two larg...
Copper membrane monooxygenases (CuMMOs) play critical roles in the global carbon and nitrogen cycles. Organisms harboring these enzymes perform the first, and rate limiting, step in aerobic oxidation of ammonia, methane, or other simple hydrocarbons. Within archaea, only organisms in the order Nitrososphaerales (Thaumarchaeota) encode CuMMOs, which...
Gut microbiome succession affects infant development. However, it remains unclear what factors promote persistence of initial bacterial colonizers in the developing gut. Here, we perform strain-resolved analyses to compare gut colonization of preterm and full-term infants throughout the first year of life and evaluate associations between strain pe...
Recent studies have demonstrated that drought leads to dramatic, highly conserved shifts in the root microbiome. At present, the molecular mechanisms underlying these responses remain largely uncharacterized. Here we employ genome-resolved metagenomics and comparative genomics to demonstrate that carbohydrate and secondary metabolite transport func...
Background
Biogeochemical exports from watersheds are modulated by the activity of microorganisms that function over micron scales. Here, we tested the hypothesis that meander-bound regions share a core microbiome and exhibit patterns of metabolic potential that broadly predict biogeochemical processes in floodplain soils along a river corridor.
R...
Bacteria of the phylum Acidobacteria are one of the most abundant bacterial groups across soil ecosystems, yet they are represented by comparatively few sequenced genomes, leaving gaps in our understanding of their metabolic diversity. Recently, genomes of Acidobacteria species with unusually large repertoires of biosynthetic gene clusters (BGCs) were rec...
Gut microbiome succession impacts infant development. However, it remains unclear what factors promote persistence of initial bacterial colonists in the developing gut. Here, we performed strain-resolved metagenomic analyses to compare gut colonization of preterm and full-term infants throughout the first year of life and evaluated links between st...
Knowledge of microbial gene functions comes from manipulating the DNA of individual species in isolation from their natural communities. While this approach to microbial genetics has been foundational, its requirement for culturable microorganisms has left the majority of microbes and their interactions genetically unexplored. Here we describe a ge...
Microbes produce specialized compounds to compete or communicate with one another and their environment. Some of these compounds, such as antibiotics, are also useful in medicine and biotechnology. Historically, most antibiotics have come from soil bacteria which can be isolated and grown in the lab. Though the vast majority of soil bacteria cannot...
Biogeochemical exports of C, N, S and H2 from watersheds are modulated by the activity of microorganisms that function over micron scales. This disparity of scales presents a substantial challenge for development of predictive models describing watershed function. Here, we tested the hypothesis that meander-bound regions exhibit patterns of microb...
Soil microbial diversity is often studied from the perspective of community composition, but less is known about genetic heterogeneity within species. The relative impacts of clonal interference, gene-specific selection, and recombination in many abundant but rarely cultivated soil microbes remain unknown. Here we track genome-wide population genet...
There is controversy about whether bacterial diversity is clustered into distinct species groups or exists as a continuum. To address this issue, we analyzed bacterial genome databases and reports from several previous large-scale environment studies and identified clear discrete groups of species-level bacterial diversity in all cases. Genetic ana...
Bacterial autotrophs often rely on CO2 concentrating mechanisms (CCMs) to assimilate carbon. Although many CCM proteins have been identified, a systematic screen of the components of CCMs is lacking. Here, we performed a genome-wide barcoded transposon screen to identify essential and CCM-related genes in the γ-proteobacterium Halothiobacillus neap...
Bacteria isolated from soils are major sources of specialized metabolites, including antibiotics and other compounds with clinical value that likely shape interactions among microbial community members and impact biogeochemical cycles. Yet, isolated lineages represent a small fraction of all soil bacterial diversity. It remains unclear how the prod...
Soil microbial diversity is often studied from the perspective of community composition, but less is known about genetic heterogeneity within species and how population structures are affected by dispersal, recombination, and selection. Genomic inferences about population structure can be made using the millions of sequencing reads that are assembl...
Longstanding questions relate to the existence of naturally distinct bacterial species and genetic approaches to distinguish them. Bacterial genomes in public databases form distinct groups, but these databases are subject to isolation and deposition biases. We compared 5,203 bacterial genomes from 1,457 environmental metagenomic samples to test fo...
Cyanobacteria are photosynthetic prokaryotes that are influential in global geochemistry and are promising candidates for industrial applications. Because the livelihood of cyanobacteria is directly dependent upon light, a comprehensive understanding of metabolism in these organisms requires taking into account the effects of day–night transitions...
Cyanobacterial 2-Cys peroxiredoxin (thioredoxin peroxidase, TPX) comprises a family of thiol antioxidant enzymes critically involved in cell survival under oxidative stress. In our previous study, a putative TPX was identified using a proteomics analysis of rice (Oryza sativa L. japonica, OsTPX) seedlings exposed to oxidative stress. This OsTPX gen...
Soil microbial activity drives the carbon and nitrogen cycles and is an important determinant of atmospheric trace gas turnover, yet most soils are dominated by organisms with unknown metabolic capacities. Even Acidobacteria, among the most abundant bacteria in soil, remain poorly characterized, and functions across groups such as Verrucomicrobia,...
Significance
Understanding how photosynthetic bacteria respond to and anticipate natural light–dark cycles is necessary for predictive modeling, bioengineering, and elucidating metabolic strategies for diurnal growth. Here, we identify the genetic components that are important specifically under light–dark cycling conditions and determine how a pro...
In soil ecosystems, microorganisms produce diverse secondary metabolites such as antibiotics, antifungals and siderophores that mediate communication, competition and interactions with other organisms and the environment [1,2]. Most known antibiotics are derived from a few culturable microbial taxa [3], and the biosynthetic potential of the vast majori...
The broadly conserved signaling nucleotide cyclic di-adenosine monophosphate (c-di-AMP) is essential for viability in most bacteria where it has been studied. However, characterization of the cellular functions and metabolism of c-di-AMP has largely been confined to the class Bacilli, limiting our functional understanding of the molecule among dive...
Mass spectrometry chromatograms of WT and cdaA mutant extracts.
Each sample was mixed with an internal standard (heavy labeled c-di-AMP), detected as m/z 689 → 146 transition. Biological c-di-AMP was detected through four m/z transitions: 659 → 136 (as a qualifier and quantifier), 659 → 312 (as a qualifier), 659 → 330 (as a qualifier). (A) In WT ex...
Expression of S. elongatus cdaA in E. coli.
The S. elongatus cdaA gene was expressed in a modified version of the IPTG inducible vector pMAL-c2X in DH5α (AM5466). Fold change is shown relative to uninduced vector. Error bars represent SE of two replicates.
(PDF)
Absorbance spectrum of cdaA mutant during a LDC.
Mean absorbance values of the cdaA transposon mutant (8S16-L9) and WT at (A) 0 h, (B) 30 h, and (C) 36 h into an LDC. Absorbance is normalized to OD750 and each value represents the average of four replicates.
(PDF)
Genetic interaction with flv1.
(A) Relative survival of the cdaA single mutant and the cdaA-flv1 double mutant when grown competitively against each other. (B) Relative survival of WT and the flv1 single mutant when grown competitively against each other. In all figure parts survival is determined by spot plates (see Materials and Methods) and erro...
Binding of c-di-AMP by KdpD.
Binding of KdpD (Synpcc7942_1729) expressed in E. coli to c-di-AMP, determined by DRaCALA on cell lysate (see Materials and Methods). Error bars indicate SE of two replicates.
(PDF)
cdaA mutant sensitized genetic interactions.
(XLSX)
Sensitized interaction screen R script.
This file contains the annotated R script for determining sensitized genetic interactions, where two genetic backgrounds within the library are compared under two environmental conditions. Also included are the files on which the script was run to produce sensitized interaction scores for the cdaA mutant: 1)...
Genotyping of the cdaA insertion mutant.
On the left, the location of the transposon insertion mutation conferring Km resistance is shown over a schematic drawn to scale of the gene. On the right, the genotyping gel containing lanes: 1, amplification of cdaA mutant allele (8S16-L9), in which a 1.3 kb insertion is present, with primers surrounding t...
Complementation of the cdaA mutant.
The top panel shows the phenotype, measured by spot plate, of the cdaA mutant (8S16-L9) under constant light and LDCs. The bottom panel shows the phenotype of the cdaA mutant when a WT allele of the cdaA gene is added in trans to neutral site two (using vector AM5253). ***P<10−3.
(PDF)
Assaying rpoD2 for sensitized genetic interaction.
(A) Relative survival of the cdaA single mutant and the cdaA-rpoD2 double mutant when grown competitively against each other in constant light and LDCs. (B) Relative survival of WT and the rpoD2 single mutant when grown competitively against each other in constant light and LDCs. Survival in all fi...
cdaA mutant genetic interactions.
(XLSX)
Interaction screen R script.
This file contains the annotated R script for determining genetic interactions, where two genetic backgrounds within the library are compared. Also included are the files on which the script was run to produce interaction scores for the cdaA mutant: 1) “all.poolcount.txt”, the location and reads of barcoded transposon m...
The recurrent pattern of light and darkness generated by Earth’s axial rotation has profoundly influenced the evolution of organisms, selecting for both biological mechanisms that respond acutely to environmental changes and circadian clocks that program physiology in anticipation of daily variations. The necessity to integrate environmental respon...
Cyanobacteria are important primary producers of organic matter in diverse environments on a global scale. While mechanisms of CO2 fixation are well understood, the distribution of the flow of fixed organic carbon within individual cells and complex microbial communities is less well characterized. To obtain a general overview of metabolism, we des...
Significance
Genome-scale models of metabolism are important tools for metabolic engineering and production strain development. We present an experimentally validated and manually curated model of metabolism in Synechococcus elongatus PCC 7942 that (i) leads to discovery of unique metabolic characteristics, such as the importance of a truncated,...
The networks that govern carbon metabolism and control intracellular carbon partitioning in photosynthetic cells are poorly understood. Target of rapamycin (TOR) kinase is a conserved growth regulator that integrates nutrient signals and modulates cell growth in eukaryotes, though the TOR signaling pathway in plants and algae has yet to be complete...
Early research on the cyanobacterial clock focused on characterizing the genes needed to keep, entrain, and convey time within the cell. As the scope of assays used in molecular genetics has expanded to capture systems-level properties (e.g., RNA-seq, ChIP-seq, metabolomics, high-throughput screening of genetic variants), so has our understanding o...
Hemicellulose, the second most abundant plant biomass fraction after cellulose, is widely viewed as a potential substrate for the production of liquid fuels and other value-added materials. Degradation of hemicellulose by filamentous fungi requires production of many different enzymes, which are induced by biopolymers or their derivatives and regulat...
Cell-cell fusion is essential for a variety of developmental steps in many eukaryotic organisms, during both fertilization and vegetative cell growth. Although the molecular mechanisms associated with intracellular membrane fusion are well characterized, the molecular mechanisms of plasma membrane merger between cells are poorly understood. In the...
Questions
Questions (4)
In a metagenomics study I have identified a group of 570 genomes that show one of two properties: they are more abundant at the surface of a soil or they are more abundant deeper in the soil. These genomes are therefore assigned to one of two categories, either Surface (n = 179) or Depth (n = 391).
All genomes were annotated using KEGG, which resulted in ~5000 KO functions identified across all of the genomes.
I have been using a random forest based method (Boruta) in R to try to identify KO features that may be highly predictive of the two classes of genomes I identified, and also to establish an order of importance of the predictive features.
My input for Boruta is a 570 x 5000 table with the first column indicating whether the genome is Surface or Depth and the rest of the columns indicating how many times each of the KEGG KO features was identified in the genome.
The following is the code I have used to run Boruta:
library(Boruta)  # provides Boruta()
set.seed(123)    # make the random forest runs reproducible
depth_feature.bu <- Boruta(Depth_Change ~ ., data = Depth_Bortua, doTrace = 2, maxRuns = 500)
What I have noticed is that there are many KO features that are present in > 30 Depth genomes and zero Surface genomes, but none of these features are identified as important predictors by this method. Alternatively I see one feature that is present in only 9 Surface genomes and none of the Depth genomes being identified as a significantly important feature, but then a feature that is present in 27 Surface genomes and zero Depth genomes is not significant.
It does not seem to make sense that a feature present only in genomes belonging to one class is not identified as at least somewhat significant for predicting that class. It is also very confusing that some features specific to a genome category are identified as significant while other features that may be even more abundant within that category are not.
I have also tried running this data as binary data where instead of counts of KOs within a genome it is just presence or absence of a KO within a genome.
Does anybody have an explanation for why I am observing this? Or is random forest/Boruta perhaps not the best way to explore KO features that may be strongly associated with either genome class?
-Spencer
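One way to put numbers on the class-exclusive features described in this question is a per-feature enrichment test, which is a different technique from Boruta and is offered only as a complementary check. The sketch below uses Python's standard library to compute a one-sided Fisher (hypergeometric) p-value from the class sizes given above (179 Surface, 391 Depth). The function name `enrichment_p` is made up for illustration, and "30 Depth genomes" stands in for the "> 30" mentioned in the question. It does not explain Boruta's behavior.

```python
from math import comb


def enrichment_p(k_in_class, n_class, k_total, n_total):
    """One-sided Fisher (hypergeometric) p-value.

    Probability that at least k_in_class of the k_total genomes carrying a
    KO fall in a class of size n_class, if the carriers were spread at
    random over all n_total genomes.
    """
    upper = min(k_total, n_class)
    favourable = sum(
        comb(n_class, x) * comb(n_total - n_class, k_total - x)
        for x in range(k_in_class, upper + 1)
    )
    return favourable / comb(n_total, k_total)


N_SURFACE, N_DEPTH = 179, 391
N_TOTAL = N_SURFACE + N_DEPTH  # 570 genomes in total

# Class-exclusive KOs, with carrier counts taken from the question.
cases = {
    "30 Depth, 0 Surface": enrichment_p(30, N_DEPTH, 30, N_TOTAL),
    "9 Surface, 0 Depth": enrichment_p(9, N_SURFACE, 9, N_TOTAL),
    "27 Surface, 0 Depth": enrichment_p(27, N_SURFACE, 27, N_TOTAL),
}
for label, p in cases.items():
    print(f"{label}: p = {p:.2e}")
```

Because Depth genomes make up about 69% of the set, each Depth-only carrier is weaker evidence than each Surface-only carrier, so raw carrier counts are not directly comparable between the two classes. Under this test the KO exclusive to 27 Surface genomes is by far the most extreme of the three. Testing ~5000 KOs this way would also require a multiple-testing correction.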
I would like to construct an off-the-shelf photobioreactor vessel made out of a polycarbonate tube. This way the vessel would be low cost and autoclavable, and any ports needed on the sides for sample collection etc. could easily be drilled in. However, I am having trouble coming up with a good way to cap the vessel. To permanently seal the bottom of the tube it would be easy to just bond on a circular piece of polycarbonate; at the top, however, I would like a removable solution. Threading polycarbonate for a cap would require access to more serious machining, and I would like this to be accessible to any scientist with basic tools.
One interesting example was something I saw here: http://2012.igem.org/Team:Cornell/project/drylab/components
Does anybody else have other suggestions?
A number of corrections exist for p-values in multiple hypothesis testing (e.g., in transcriptomics datasets), such as FDR or Bonferroni correction. What is your preferred method to use and why?
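To make the difference between the two corrections named in this question concrete, here is a minimal sketch in Python (standard library only). It compares Bonferroni adjustment with the Benjamini-Hochberg procedure, taken here as the usual meaning of "FDR". The function names and the example p-values are made up for illustration.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, purely for illustration.
example = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:        ", bonferroni(example))
print("Benjamini-Hochberg:", benjamini_hochberg(example))
```

Bonferroni controls the chance of any false positive and gets very strict as the number of tests grows. Benjamini-Hochberg controls the expected fraction of false positives among the calls, and its adjusted values are never larger than Bonferroni's. This is one reason it is often chosen for datasets with thousands of genes.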
I have no problem accessing and navigating the KEGG database; however, I have never been able to figure out how to pull out a spreadsheet with all the KEGG IDs mapped to any kind of gene identifier in my organism (Synechococcus elongatus PCC7942). This would be very helpful for subsequent pathway analysis.
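One possible route to such a table is the KEGG REST interface, whose `link` operation returns two tab-separated columns pairing entries of one database with entries of another. The sketch below, in Python with only the standard library, rests on assumptions that should be checked against the KEGG organism list: that `syf` is the KEGG organism code for Synechococcus elongatus PCC 7942, and that gene entries use locus tags of the form `Synpcc7942_NNNN`. The helper names are hypothetical and the KO number in the sample is a placeholder, not a real annotation.

```python
from urllib.request import urlopen

# KEGG REST "link" operation: /link/<target_db>/<source_db>
KEGG_LINK_URL = "https://rest.kegg.jp/link/{target}/{organism}"


def parse_kegg_link(text):
    """Turn tab-separated 'link' output into (gene, target_id) pairs."""
    pairs = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        gene, target = fields[0], fields[1]
        # Drop the database prefixes, e.g. 'syf:' and 'ko:'.
        pairs.append((gene.split(":", 1)[-1], target.split(":", 1)[-1]))
    return pairs


def fetch_gene_links(organism="syf", target="ko"):
    """Download gene-to-target links (target='ko' or 'pathway'). Needs network."""
    url = KEGG_LINK_URL.format(target=target, organism=organism)
    with urlopen(url) as response:
        return parse_kegg_link(response.read().decode("utf-8"))


# Offline demonstration on a made-up line in the expected format.
sample = "syf:Synpcc7942_1729\tko:K00000\n"
print(parse_kegg_link(sample))
```

Calling `fetch_gene_links(target="pathway")` would, under the same assumptions, give gene-to-pathway pairs. Either result can be written out with the `csv` module and opened as a spreadsheet.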