Wellington Muchero, Jianjun Guo, Stephen P DiFazio, Jin-Gui Chen, Priya Ranjan, Gancho T Slavov, Lee E Gunter, Sara Jawdy, Anthony C Bryan, Robert Sykes, [......], Oleksandr Skyba, Faride Unda, Yousry A El-Kassaby, Carl J Douglas, Shawn D Mansfield, Joel Martin, Wendy Schackwitz, Luke M Evans, Olaf Czarnecki, Gerald A Tuskan
ABSTRACT: Background: QTL cloning for the discovery of genes underlying polygenic traits has historically been cumbersome in long-lived perennial plants like Populus. Linkage disequilibrium-based association mapping has been proposed as a cloning tool, and recent advances in high-throughput genotyping and whole-genome resequencing enable marker saturation to levels sufficient for association mapping with no a priori candidate gene selection. Here, multiyear and multienvironment evaluation of cell wall phenotypes was conducted in an interspecific P. trichocarpa × P. deltoides pseudo-backcross mapping pedigree and two partially overlapping populations of unrelated P. trichocarpa genotypes using pyrolysis molecular beam mass spectrometry, saccharification, and/or traditional wet chemistry. QTL mapping was conducted using a high-density genetic map with 3,568 SNP markers. As a fine-mapping approach, chromosome-wide association mapping targeting a QTL hot spot on linkage group XIV was performed in the two P. trichocarpa populations. Both populations were genotyped using the 34K Populus Infinium SNP array, and whole-genome resequencing of one of the populations facilitated marker saturation of candidate intervals for gene identification.
Results: Five QTLs ranging in size from 0.6 to 1.8 Mb were mapped on linkage group XIV for lignin content, syringyl-to-guaiacyl (S/G) ratio, and 5- and 6-carbon sugars using the mapping pedigree. Six candidate loci exhibiting significant associations with phenotypes were identified within QTL intervals. These associations were reproducible across multiple environments, two independent genotyping platforms, and different plant growth stages.
cDNA sequencing of allelic variants at three of the six loci identified three kinds of polymorphism: a variable-length polyglutamine (polyQ) stretch in a transcription factor annotated as an ANGUSTIFOLIA C-terminus Binding Protein (CtBP), and premature stop codons in a KANADI transcription factor and in a protein kinase. Results from protoplast transient expression assays suggested that each of the polymorphisms conferred allelic differences in the activation of cellulose, hemicellulose, and lignin pathway marker genes.
Conclusion: This study illustrates the utility of complementary QTL and association mapping as tools for gene discovery with no a priori candidate gene selection. This proof of concept in a perennial organism opens up opportunities for discovery of novel genetic determinants of economically important but complex traits in plants.
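The polyQ polymorphism described above is a difference in the length of an uninterrupted glutamine run. The sketch below shows one way to compare that length between allelic protein sequences. The function name and the example sequences are hypothetical illustrations. They are not the authors' pipeline or data.

```python
import re

def longest_polyq(protein: str) -> int:
    """Return the length of the longest uninterrupted run of glutamine (Q)
    residues in a one-letter-code protein sequence (0 if there is none)."""
    runs = re.findall(r"Q+", protein.upper())
    return max((len(run) for run in runs), default=0)

# Hypothetical allele fragments, for illustration only (not from the study).
allele_short = "MST" + "Q" * 6 + "APLK"
allele_long = "MST" + "Q" * 10 + "APLK"
print(longest_polyq(allele_short), longest_polyq(allele_long))  # 6 10
```

The function reports only the longest run. Real polyQ tracts are often interrupted by other residues, so a fuller analysis would need an explicit rule for how many interruptions a stretch may contain.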
ABSTRACT: In most bacterial species, as well as in plant plastids, terpene synthesis takes place via the 1-deoxy-D-xylulose 5-phosphate (DXP) pathway. The first step of this pathway is the condensation of pyruvate and glyceraldehyde 3-phosphate by DXP synthase (Dxs), with one sixth of the carbon lost as CO2. A hypothetical route (nDXP) from a pentose phosphate to DXP could enable a more direct pathway from C5 sugars to terpenes and also circumvent the regulatory mechanisms that control Dxs, but no enzyme is known that can convert a sugar into its 1-deoxy equivalent. Using a selection for complementation of a dxs deletion in E. coli grown on xylose as the sole carbon source, we uncovered two candidate nDXP genes. Complementation was achieved either by overexpression of the wild-type E. coli yajO gene, annotated as a putative xylose reductase, or by various mutations in the native ribB gene. In vitro analysis with purified YajO and mutant RibB proteins revealed that in both cases DXP was synthesized from ribulose 5-phosphate (Ru5P). We demonstrate the utility of these genes for microbial terpene biosynthesis by engineering the DXP pathway in E. coli for production of the sesquiterpene bisabolene, a candidate biodiesel. To further improve flux into the pathway from Ru5P, nDXP enzymes were expressed as fusions to DXP reductase (Dxr), the second enzyme in the DXP pathway. Expression of a Dxr-RibB(G108S) fusion improved bisabolene titers more than 4-fold and alleviated accumulation of intracellular DXP.
Applied and Environmental Microbiology 10/2014; · 3.95 Impact Factor
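The "one sixth" figure follows from counting carbon atoms. Pyruvate and glyceraldehyde 3-phosphate each contribute three carbons, DXP keeps five, and one carbon leaves as CO2. A direct Ru5P-to-DXP route converts one C5 compound into another, so it loses no carbon. The balance below counts carbon only. It is not a full atom or redox balance of either reaction.

```latex
\begin{aligned}
\underbrace{\mathrm{pyruvate}}_{\mathrm{C}_3}
  + \underbrace{\text{glyceraldehyde 3-phosphate}}_{\mathrm{C}_3}
  &\xrightarrow{\ \mathrm{Dxs}\ }
  \underbrace{\mathrm{DXP}}_{\mathrm{C}_5} + \underbrace{\mathrm{CO}_2}_{\mathrm{C}_1},
  &\qquad \frac{\text{C lost}}{\text{C in}} &= \frac{1}{3 + 3} = \frac{1}{6} \\[4pt]
\underbrace{\mathrm{Ru5P}}_{\mathrm{C}_5}
  &\xrightarrow{\ \mathrm{nDXP}\ }
  \underbrace{\mathrm{DXP}}_{\mathrm{C}_5},
  &\qquad \frac{\text{C lost}}{\text{C in}} &= \frac{5 - 5}{5} = 0
\end{aligned}
```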
ABSTRACT: Lignocellulosic plant material is a viable source of biomass for producing alternative energy, including ethanol and other biofuels. However, several factors currently limit the efficiency of microbial biofuel production, including toxic byproducts from biomass pretreatment and poor fermentation of xylose and other pentose sugars. To begin to understand the genetic basis of desirable traits, we used genome and transcriptome sequencing to characterize three strains of Saccharomyces cerevisiae that grow robustly in a pretreated lignocellulosic hydrolysate or tolerate stress conditions relevant to industrial biofuel production. All stress-resistant strains were highly mosaic, suggesting that genetic admixture may contribute to novel allele combinations underlying these phenotypes. Strain-specific gene sets not found in the lab strain were functionally linked to the tolerances of particular strains. Furthermore, genes with signatures of evolutionary selection were enriched for functional categories important for stress resistance and included stress-responsive signaling factors. We compared the strains' transcriptomic responses to heat and ethanol treatment, two stresses relevant to industrial bioethanol production, and this comparison pointed to physiological processes related to particular stress resistance profiles. Many of the genotype-by-environment expression responses occurred at targets of transcription factors with signatures of positive selection, suggesting that these strains have undergone positive selection for stress tolerance. Our results generate new insights into potential mechanisms of tolerance to stresses relevant to biofuel production, including ethanol and heat, present a backdrop for further engineering, and provide glimpses into the natural variation of stress tolerance in wild yeast strains.
Genome Biology and Evolution 09/2014; · 4.53 Impact Factor
ABSTRACT: Forest trees are dominant components of terrestrial ecosystems that have global ecological and economic importance. Despite distributions that span wide environmental gradients, many tree populations are locally adapted, and mechanisms underlying this adaptation are poorly understood. Here we use a combination of whole-genome selection scans and association analyses of 544 Populus trichocarpa trees to reveal genomic bases of adaptive variation across a wide latitudinal range. Three hundred ninety-seven genomic regions showed evidence of recent positive and/or divergent selection and enrichment for associations with adaptive traits that also displayed patterns consistent with natural selection. These regions also provide unexpected insights into the evolutionary dynamics of duplicated genes and their roles in adaptive trait variation.
ABSTRACT: Brachypodium distachyon is a small annual grass that has been adopted as a model for the grasses. Its small genome, high-quality reference genome, large germplasm collection, and selfing nature make it an excellent subject for studies of natural variation. We sequenced six divergent lines to identify a comprehensive set of polymorphisms and to analyze their distribution and concordance with gene expression. Multiple methods and controls were used to identify polymorphisms and validate their quality. mRNA-Seq experiments under control and simulated drought-stress conditions identified 300 genes with a genotype-dependent treatment response. We showed that large-scale sequence variants had extremely high concordance with altered expression of hundreds of genes, including many with genotype-dependent treatment responses. We generated a deep mRNA-Seq dataset for the most divergent line and created a de novo transcriptome assembly. This led to the discovery of >2,400 previously unannotated transcripts and hundreds of genes not present in the reference genome. We built a public database for visualization and investigation of sequence variants among these widely used inbred lines.
ABSTRACT: By directed evolution in the laboratory, we previously generated populations of Escherichia coli that exhibit a complex new phenotype, extreme resistance to ionizing radiation (IR). We now address the molecular basis of this extremophile phenotype, in which strain isolates show a 3 to 4 order-of-magnitude increase in IR resistance at 3,000 Gy. We identified 69 mutations in one of our most highly adapted isolates, and functional experiments demonstrate that only three of these nucleotide changes, in the DNA metabolism genes recA, dnaB, and yfjK, account for almost the entire IR resistance phenotype. Four additional genetic changes make small but measurable contributions. Whereas multiple contributions to IR resistance are evident in this study, our results highlight a particular adaptation mechanism not adequately considered in studies to date: genetic innovations involving pre-existing DNA repair functions can play a predominant role in the acquisition of an IR resistance phenotype. DOI: http://dx.doi.org/10.7554/eLife.01322.001.
ABSTRACT: The complexity of plant cell walls creates many challenges for microbial decomposition. Clostridium phytofermentans, an anaerobic bacterium isolated from forest soil, directly breaks down and utilizes many plant cell wall carbohydrates. The objective of this research is to understand constraints on rates of plant decomposition by Clostridium phytofermentans and identify molecular mechanisms that may overcome these limitations.
Experimental evolution via repeated serial transfers during exponential growth was used to select for C. phytofermentans genotypes that grow more rapidly on cellobiose, cellulose, and xylan. To identify the underlying mutations, an average of 13,600,000 paired-end reads were generated per population, resulting in ∼300-fold coverage of each site in the genome. Mutations with allele frequencies of 5% or greater could be identified with statistical confidence. Many mutations are in carbohydrate-related genes. Some lie in the promoter regions of glycoside hydrolases. Others cause amino acid substitutions in ABC transport proteins involved in carbohydrate uptake, in signal transduction sensors that detect specific carbohydrates, in proteins that affect the export of extracellular enzymes, and in regulators of unknown specificity. Structural modeling of the ABC transporter complex proteins suggests that mutations in these genes may alter the recognition of carbohydrates by substrate-binding proteins and the communication between the intracellular face of the transmembrane proteins and the ATPase binding proteins.
Experimental evolution was effective in identifying molecular constraints on the rate of hemicellulose and cellulose fermentation and selected for putative gain-of-function mutations that do not typically appear in traditional molecular genetic screens. The results reveal new strategies for evolving and engineering microorganisms for faster growth on plant carbohydrates.
PLoS ONE 01/2014; 9(1):e86731. · 3.53 Impact Factor
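Some simple arithmetic connects the two numbers in this abstract: ∼300-fold coverage and a 5% allele-frequency threshold. The sketch below makes that connection explicit. It assumes each read samples the variant independently, so the number of supporting reads follows a binomial distribution, and it ignores sequencing error, mapping bias, and uneven coverage. The function names are hypothetical. This is a back-of-the-envelope model, not the authors' variant-calling method.

```python
from math import comb

def expected_alt_reads(depth: int, allele_freq: float) -> float:
    """Expected number of reads carrying a variant at a site of given depth."""
    return depth * allele_freq

def prob_at_least_k(depth: int, allele_freq: float, k: int) -> float:
    """Binomial P(X >= k) for X ~ Binomial(depth, allele_freq): the chance
    that at least k of `depth` reads carry the variant allele."""
    return sum(
        comb(depth, i) * allele_freq**i * (1 - allele_freq) ** (depth - i)
        for i in range(k, depth + 1)
    )

# At ~300x depth, a variant at 5% frequency is expected on about 15 reads.
print(expected_alt_reads(300, 0.05))
# Chance that such a variant is supported by at least 5 reads.
print(prob_at_least_k(300, 0.05, 5))
```

In practice, the minimum read support a caller requires must also be high enough to exclude sequencing errors. That requirement, not sampling alone, usually sets the lowest detectable frequency.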
ABSTRACT: The Salton Sea is the largest inland body of water in California, with salinities ranging from brackish freshwater to hypersaline. The lake experiences high nutrient input, and its surface water is exposed to temperatures up to 40°C. Here, we report the community profiles associated with surface water from the Salton Sea.
ABSTRACT: Enterobacter cloacae strain JD6301 was isolated from a mixed culture with wastewater collected from a municipal treatment facility and oleaginous microorganisms. A draft genome sequence of this organism indicates that it has a genome size of 4,772,910 bp, an average G+C content of 53%, and 4,509 protein-coding genes.
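G+C content, such as the 53% reported here, is the fraction of G and C bases in the sequence. The minimal sketch below counts only unambiguous A/C/G/T bases and skips ambiguity codes such as N. That convention is an assumption: annotation pipelines differ in how they treat gaps and ambiguous bases. The function name is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T) in a DNA sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / acgt

print(gc_content("ATGCGCNNAT"))  # 0.5 (the two N bases are ignored)
```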
ABSTRACT: The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and the closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes, with a view towards pinpointing structural elements and gene content associated with the specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of C. heterostrophus are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP-encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence.
ABSTRACT: Genetic mapping of quantitative traits requires genotypic data for large numbers of markers in many individuals. For such studies, large single nucleotide polymorphism (SNP) genotyping arrays still offer the most cost-effective solution. Herein we report on the design and performance of a SNP genotyping array for Populus trichocarpa (black cottonwood). This genotyping array was designed with SNPs pre-ascertained in 34 wild accessions covering most of the species' latitudinal range. We adopted a candidate gene approach to the array design that resulted in the selection of 34 131 SNPs, the majority of which are located in, or within 2 kb of, 3543 candidate genes. A subset of the SNPs on the array (539) was selected based on patterns of variation among the SNP discovery accessions. We show that more than 95% of the loci produce high-quality genotypes and that the genotyping error rate for these is likely below 2%. We demonstrate that even among small numbers of samples (n = 10) from local populations, over 84% of loci are polymorphic. We also tested the applicability of the array to other species in the genus and found that the number of polymorphic loci decreases rapidly with genetic distance, with the largest numbers detected in other species in section Tacamahaca. Finally, we provide evidence for the utility of the array in addressing evolutionary questions such as intraspecific studies of genetic differentiation, species assignment, and the detection of natural hybrids.
ABSTRACT: Pyrenophora tritici-repentis is a necrotrophic fungus that causes tan spot of wheat, a disease whose contribution to crop loss has increased significantly during the last few decades. Pathogenicity by this fungus is attributed to the production of host-selective toxins (HSTs), which are recognized by their host in a genotype-specific manner. To better understand the mechanisms that have led to the increase in disease incidence related to this pathogen, we sequenced the genomes of three P. tritici-repentis isolates. A pathogenic isolate that produces two known HSTs was used to assemble a reference nuclear genome of approximately 40 Mb, composed of 11 chromosomes that encode 12,141 predicted genes. Comparison of the reference genome with those of a pathogenic isolate that produces a third HST and a nonpathogenic isolate showed the nonpathogen genome to be more diverged than those of the two pathogens. Examination of gene-coding regions has provided candidate pathogen-specific proteins and revealed gene families that may play a role in a necrotrophic lifestyle. Analysis of transposable elements suggests that their presence in the genomes of pathogenic isolates contributes to the creation of novel genes, effector diversification, possible horizontal gene transfer events, identified copy number variation, and the first example of transduplication by DNA transposable elements in fungi. Overall, comparative analysis of these genomes provides evidence that pathogenicity in this species arose through an influx of transposable elements, which created a genetically flexible landscape that can easily respond to environmental changes.
ABSTRACT: The DNA sequences of chromosomes I and II of Rhodobacter sphaeroides strain 2.4.1 have been revised, and the annotation of the entire genomic sequence, including both chromosomes and the five plasmids, has been updated. Errors in the originally published sequence have been corrected, and ∼11% of the coding regions in the original sequence have been affected by the revised annotation.
Journal of Bacteriology 12/2012; 194(24):7016-7. · 2.69 Impact Factor
ABSTRACT: Although cellulosic ethanol is seen as a way to help meet global energy demand, several molecular bottlenecks currently prevent the efficient bioconversion of lignocellulose into ethanol. For example, it is well known that: 1) native Saccharomyces cerevisiae yeast strains cannot sufficiently ferment xylose, and 2) side products generated by pretreatment of plant biomass, including Ammonia Fiber Expansion (AFEX™) and alkaline hydrogen peroxide (AHP) pretreatment, elicit a cellular stress response, which further limits fermentation productivity. At the Great Lakes Bioenergy Research Center, we have taken a multi-comparative approach to facilitate the discovery and understanding of molecular bottlenecks in the fermentation of lignocellulosic hydrolysates by yeast. Through multi-phenotypic and bioinformatic analysis of 111 natural and domesticated isolates, we have identified a wild S. cerevisiae strain that maintains rapid growth and cell viability in a variety of distinctly prepared hydrolysates. Following directed engineering of a xylose metabolism pathway, we performed directed evolution that yielded mutants able to ferment 2- to 3-fold more xylose from AFEX™ corn stover hydrolysate (ACSH) than their unevolved parents. Furthermore, temporal profiling of gene expression levels during ACSH fermentations identified differences in cell physiology between evolved and unevolved strains. Analysis of extracellular metabolite, amino acid, and metal concentrations additionally identified limiting nutrients during fermentation. Coupled with comparative genome resequencing of parental and evolved strains, this suite of omics data is being integrated into metabolic network models to identify and understand genetic differences that impact xylose fermentation in stress-inducing lignocellulosic hydrolysates.
ABSTRACT:
• Plant population genomics informs evolutionary biology, breeding, conservation, and bioenergy feedstock development. For example, the detection of reliable phenotype-genotype associations and molecular signatures of selection requires detailed knowledge of genome-wide patterns of allele frequency variation, linkage disequilibrium, and recombination.
• We resequenced 16 genomes of the model tree Populus trichocarpa and genotyped 120 trees from 10 subpopulations using 29 213 single-nucleotide polymorphisms.
• Significant geographic differentiation was present at multiple spatial scales, and range-wide latitudinal allele frequency gradients were strikingly common across the genome. The decay of linkage disequilibrium with physical distance was slower than expected from previous studies in Populus, with r² dropping below 0.2 within 3-6 kb. Consistent with this, estimates of recent effective population size from linkage disequilibrium (Nₑ ≈ 4000-6000) were remarkably low relative to the large census sizes of P. trichocarpa stands. Fine-scale rates of recombination varied widely across the genome, but were largely predictable on the basis of DNA sequence and methylation features.
• Our results suggest that genetic drift has played a significant role in the recent evolutionary history of P. trichocarpa. Most importantly, the extensive linkage disequilibrium detected suggests that genome-wide association studies and genomic selection in undomesticated populations may be more feasible in Populus than previously assumed.
New Phytologist 08/2012; 196(3):713-725. · 6.37 Impact Factor
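The LD decay result above is stated in terms of r², the squared correlation between alleles at two loci. The sketch below computes r² for two biallelic loci from phased haplotypes using the standard definition D = p₁₁ − p₁q₁ and r² = D² / (p₁(1 − p₁)q₁(1 − q₁)). It assumes phased data coded 0/1 and is only an illustration. With unphased genotypes, as in array data, r² is usually estimated by other methods, and the study's own estimator is not described in this abstract. The function name is hypothetical.

```python
def ld_r2(haplotypes: list[tuple[int, int]]) -> float:
    """Squared correlation r^2 between two biallelic loci.

    Each phased haplotype is (allele at locus 1, allele at locus 2), coded 0/1.
    D = p11 - p1 * q1 and r^2 = D^2 / (p1 (1 - p1) q1 (1 - q1)), where p1 and q1
    are the frequencies of allele 1 at each locus and p11 is the frequency of
    the haplotype carrying allele 1 at both loci.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p1 = sum(a for a, _ in haplotypes) / n
    q1 = sum(b for _, b in haplotypes) / n
    p11 = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    if denom == 0:
        raise ValueError("a locus is monomorphic, so r^2 is undefined")
    d = p11 - p1 * q1
    return d * d / denom

complete = [(0, 0), (1, 1)] * 5                  # alleles always co-occur
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]   # all haplotypes equally common
print(ld_r2(complete), ld_r2(independent))       # 1.0 0.0
```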
ABSTRACT: The ND18 strain of Barley stripe mosaic virus (BSMV) infects several lines of Brachypodium distachyon, a recently developed model system for genomics research in cereals. Among the inbred lines tested, Bd3-1 is highly resistant at 20 to 25 °C, whereas Bd21 is susceptible, and infection results in an intense mosaic phenotype accompanied by high levels of replicating virus. We generated an F6:7 recombinant inbred line (RIL) population from a cross between Bd3-1 and Bd21 and used the RILs, together with an F2 population from a second Bd21 × Bd3-1 cross, to evaluate the inheritance of resistance. The results indicate that resistance segregates as expected for a single dominant gene, which we have designated Barley stripe mosaic virus resistance 1 (Bsr1). We constructed a genetic linkage map of the RIL population using SNP markers and mapped this gene to within 705 kb of the distal end of the top of chromosome 3. Additional CAPS and indel markers were used to fine-map Bsr1 to a 23 kb interval containing five putative genes. Our study demonstrates the power of using RILs to rapidly map the genetic determinants of BSMV resistance in Brachypodium. Moreover, the RILs and their associated genetic map, when combined with the complete genomic sequence of Brachypodium, provide new resources for genetic analyses of many other traits.
PLoS ONE 06/2012; 7(6):e38333. · 3.53 Impact Factor
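The claim that resistance "segregates as expected for a single dominant gene" is normally checked with a chi-square goodness-of-fit test. A single dominant gene predicts 3 resistant : 1 susceptible in an F2 and about 1 : 1 among largely homozygous RILs (residual heterozygosity is ignored here). The sketch below computes the Pearson statistic and compares it with the 5% critical value for 1 degree of freedom (two classes minus one). The counts are hypothetical, and the abstract does not say which test the authors used.

```python
def chi_square_gof(observed: list[int], ratio: list[float]) -> float:
    """Pearson chi-square goodness-of-fit statistic for observed class counts
    against an expected segregation ratio (e.g. [3, 1])."""
    if len(observed) != len(ratio):
        raise ValueError("observed and ratio must have the same length")
    total = sum(observed)
    weight = sum(ratio)
    expected = [total * r / weight for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_DF1_ALPHA05 = 3.841

# Hypothetical counts, for illustration only (not data from the study).
f2_stat = chi_square_gof([148, 52], [3, 1])   # F2: 3 resistant : 1 susceptible
ril_stat = chi_square_gof([55, 45], [1, 1])   # RILs: 1 resistant : 1 susceptible

for label, stat in (("F2 3:1", f2_stat), ("RIL 1:1", ril_stat)):
    verdict = "consistent with" if stat < CRITICAL_DF1_ALPHA05 else "rejects"
    print(f"{label}: chi2 = {stat:.3f} ({verdict} the single-gene ratio)")
```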
ABSTRACT: Motivation: The sequencing of over a thousand natural strains of the model plant Arabidopsis thaliana is producing unparalleled information at the genetic level for plant researchers. To enable the rapid exploitation of these data for functional proteomics studies, we have created a resource for the visualization of protein information and proteomic datasets for sequenced natural strains of A. thaliana. Results: The 1001 Proteomes portal can be used to visualize amino acid substitutions or non-synonymous single-nucleotide polymorphisms in individual proteins of A. thaliana based on the reference genome Col-0. We have used the available processed sequence information to analyze the conservation of known residues subject to protein phosphorylation among these natural strains. The substitution of amino acids in A. thaliana natural strains is heavily constrained and is likely a result of the conservation of functional attributes within proteins. At a practical level, we demonstrate that this information can be used to clarify ambiguously defined phosphorylation sites from phosphoproteomic studies. Protein sets of available natural variants are available for download to enable proteomic studies on these accessions. Together this information can be used to uncover the possible roles of specific amino acids in determining the structure and function of proteins in the model plant A. thaliana. An online portal to enable the community to exploit these data can be accessed at http://1001proteomes.masc-proteomics.org/ Supplementary information: Supplementary data are available at Bioinformatics online.
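The portal maps non-synonymous substitutions onto Col-0 reference proteins and asks whether known phosphorylation residues are conserved. A minimal sketch of that core check follows: does a substitution at a given position remove a phosphorylatable residue? It assumes a simplified model in which only S, T, and Y are phosphoacceptors. It ignores kinase motifs and phosphomimetic substitutions such as S to D or E, and it is not the portal's implementation. The function name and example sequence are hypothetical.

```python
# Residues that can carry a phosphate in the simplified model used here.
PHOSPHOACCEPTORS = frozenset("STY")

def phosphosite_lost(reference: str, position: int, alt_residue: str) -> bool:
    """Return True if replacing the residue at 1-based `position` of the
    reference protein with `alt_residue` removes an S/T/Y phosphoacceptor."""
    if not 1 <= position <= len(reference):
        raise ValueError("position is outside the reference sequence")
    ref_residue = reference[position - 1].upper()
    return ref_residue in PHOSPHOACCEPTORS and alt_residue.upper() not in PHOSPHOACCEPTORS

# Hypothetical reference fragment and substitutions, for illustration only.
ref = "MKSPLTR"
print(phosphosite_lost(ref, 3, "A"))  # True: S3A removes a serine
print(phosphosite_lost(ref, 3, "T"))  # False: S3T keeps a phosphoacceptor
print(phosphosite_lost(ref, 2, "S"))  # False: K2 was not a phosphosite
```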
ABSTRACT: Classical forward genetics has been foundational to modern biology and has been the paradigm for characterizing the role of genes in shaping phenotypes for decades. In recent years, reverse genetics has been used to identify the functions of genes, via the intentional introduction of variation and subsequent evaluation in physiological, molecular, and even population contexts. These approaches are complementary, and whole-genome analysis serves as a bridge between the two. In this article we report the whole-genome sequencing of eighteen classical mutant strains of Neurospora crassa and the putative identification of the mutations associated with the corresponding mutant phenotypes. Although some strains carry multiple unique nonsynonymous, nonsense, or frameshift mutations, the combined power of limiting the scope of the search based on genetic markers and of using a comparative analysis among the eighteen genomes provides strong support for the association between mutation and phenotype. For ten of the mutants, the mutant phenotype is recapitulated in classical or gene deletion mutants in Neurospora or other filamentous fungi. Each strain carries between 13 and 137 nonsense mutations, and indel sizes are shown to be highly skewed in gene coding sequences. Significant additional genetic variation was found in the eighteen mutant strains, and this variability defines multiple alleles of many genes. These alleles may be useful in further genetic and molecular analysis of known and yet-to-be-discovered functions, and they invite new interpretations of molecular and genetic interactions in classical mutant strains.
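Indel length matters in coding sequence because only a net length change that is a multiple of three preserves the downstream reading frame. This is what separates frameshift mutations from in-frame indels. The minimal sketch below classifies a variant by comparing reference and alternative allele lengths. It assumes the variant lies entirely within a coding exon and does not consider splice sites or stop codons created at the junction. The function name is hypothetical.

```python
def classify_coding_indel(ref_allele: str, alt_allele: str) -> str:
    """Classify a variant lying wholly inside a coding sequence by the net
    change in length between the reference and alternative alleles."""
    delta = len(alt_allele) - len(ref_allele)
    if delta == 0:
        return "not an indel"
    # A net change that is a multiple of 3 keeps downstream codons in frame.
    return "in-frame" if delta % 3 == 0 else "frameshift"

print(classify_coding_indel("A", "ATTG"))  # in-frame (3-bp insertion)
print(classify_coding_indel("ATG", "A"))   # frameshift (2-bp deletion)
```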
ABSTRACT: Agenesis of the corpus callosum (AgCC) is a congenital brain malformation that occurs in approximately 1 in 1,000 to 1 in 6,000 births. Several syndromes associated with AgCC have been traced to single gene mutations; however, the majority of AgCC causes remain unidentified. We investigated a mother and two children who all shared complete AgCC and a chromosomal deletion at 1q42. We fine-mapped this deletion and show that it includes Disrupted-in-Schizophrenia 1 (DISC1), a gene implicated in schizophrenia and other psychiatric disorders. Furthermore, we report a de novo chromosomal deletion at 1q42.13 to q44, which includes DISC1, in another individual with AgCC. We resequenced DISC1 in a cohort of 144 well-characterized AgCC individuals and identified 20 sequence changes, of which 4 are rare, potentially pathogenic variants. Two of these variants were absent from 768 control chromosomes. One of these is a splice site mutation at the 5' boundary of exon 11 that dramatically reduces full-length mRNA expression of DISC1, but not of shorter forms. We investigated the developmental expression of mouse DISC1 and found that it is highly expressed in the embryonic corpus callosum at a critical time for callosal formation. Taken together, our results suggest a significant role for DISC1 in corpus callosum development.
American Journal of Medical Genetics Part A 08/2011; 155A(8):1865-76. · 2.30 Impact Factor