Zhong Wang

DOE Joint Genome Institute, Walnut Creek, California, United States

Publications (25) · 281.6 Total impact

  •
    ABSTRACT: Ruminant livestock represent the single largest anthropogenic source of the potent greenhouse gas methane, which is generated by methanogenic archaea residing in ruminant digestive tracts. While differences between individual animals of the same breed in the amount of methane produced have been observed, the basis for this variation remains to be elucidated. To explore the mechanistic basis of this methane production, we measured methane yields from 22 sheep, which revealed that methane yields are a reproducible quantitative trait. Deep metagenomic and metatranscriptomic sequencing demonstrated the presence of methanogens both in the highest and lowest methane-producing sheep, with a similar abundance of methanogens and methanogenesis pathway genes in high and low methane emitters. However, transcription of methanogenesis pathway genes was substantially increased in sheep with high methane yields. These results identify a discrete set of rumen methanogens whose methanogenesis pathway transcription profiles correlate with methane yields and provide new targets for CH4 mitigation at the levels of microbiota composition and transcriptional regulation.
    Genome Research 06/2014; 24(9):1517-25. · 14.40 Impact Factor
  •
    ABSTRACT: Although recent nucleotide sequencing technologies have significantly enhanced our understanding of microbial genomes, the function of ∼35% of genes identified in a genome currently remains unknown. To improve the understanding of microbial genomes and consequently of microbial processes it will be crucial to assign a function to this “genomic dark matter.” Due to the urgent need for additional carbohydrate-active enzymes for improved production of transportation fuels from lignocellulosic biomass, we screened the genomes of more than 5,500 microorganisms for hypothetical proteins that are located in the proximity of already known cellulases. We identified, synthesized and expressed a total of 17 putative cellulase genes with insufficient sequence similarity to currently known cellulases to be identified as such using traditional sequence annotation techniques that rely on significant sequence similarity. The recombinant proteins of the newly identified putative cellulases were subjected to enzymatic activity assays to verify their hydrolytic activity towards cellulose and lignocellulosic biomass. Eleven (65%) of the tested enzymes had significant activity towards at least one of the substrates. This high success rate highlights that a gene-context based approach can be used to assign function to genes that are otherwise categorized as “genomic dark matter” and to identify biomass-degrading enzymes that have little sequence similarity to already known cellulases. The ability to assign function to genes that have no related sequence representatives with functional annotation will be important to enhance our understanding of microbial processes and to identify microbial proteins for a wide range of applications.
    Biotechnology and Bioengineering 04/2014; · 4.16 Impact Factor
  •
    ABSTRACT: By directed evolution in the laboratory, we previously generated populations of Escherichia coli that exhibit a complex new phenotype, extreme resistance to ionizing radiation (IR). The molecular basis of this extremophile phenotype, involving strain isolates with a 3-4 order of magnitude increase in IR resistance at 3000 Gy, is now addressed. Of 69 mutations identified in one of our most highly adapted isolates, functional experiments demonstrate that the IR resistance phenotype is almost entirely accounted for by only three of these nucleotide changes, in the DNA metabolism genes recA, dnaB, and yfjK. Four additional genetic changes make small but measurable contributions. Whereas multiple contributions to IR resistance are evident in this study, our results highlight a particular adaptation mechanism not adequately considered in studies to date: Genetic innovations involving pre-existing DNA repair functions can play a predominant role in the acquisition of an IR resistance phenotype. DOI: http://dx.doi.org/10.7554/eLife.01322.001.
    eLife Sciences 03/2014; 3:e01322. · 8.52 Impact Factor
  •
    ABSTRACT: Improved crop water-use efficiency (WUE) is critical for the long-term sustainability of agricultural production systems in the face of predicted future warmer and drier climates. Crassulacean acid metabolism (CAM) is a specialized mode of photosynthesis that enhances WUE through an inverse day/night pattern of stomatal closure/opening and improves photosynthetic efficiency by concentrating CO2 around RUBISCO. CAM has evolved multiple times from C3 photosynthesis and ~6.5% of higher plant species in more than 35 families have acquired CAM via parallel or convergent evolution. There are two fundamental questions to be answered to understand the molecular basis and evolutionary mechanism of CAM: 1) what are the genetic differences between CAM and non-CAM species and 2) what are the common molecular features shared among CAM plants from diverse origins? To address these questions, comparative genomics analysis was performed using multiple plant species including CAM (e.g., Agave, Kalanchoe, Mesembryanthemum), C3 (e.g., Arabidopsis, Oryza, Populus), C4 (e.g., Setaria, Sorghum, and Zea), and non-vascular plant species (e.g., Physcomitrella, Selaginella). Our analysis not only revealed orthologous gene groups shared between CAM and non-CAM species, but also identified genes specific to the CAM species. Also, expanded gene families were identified in CAM species compared with non-CAM species. Gene ontology and gene expression profiles were used to build hypotheses related to divergent gene functions that likely arose during CAM evolution. This research establishes a framework for CAM comparative genomics studies and provides new knowledge to inform genetic improvement in WUE and photosynthetic efficiency in crop plants under water-limiting conditions.
    International Plant and Animal Genome Conference XXII 2014; 01/2014
  •
    ABSTRACT: RNA-sequencing (RNA-seq) enables in-depth exploration of transcriptomes, but typical sequencing depth often limits its comprehensiveness. In this study, we generated nearly 3 billion RNA-Seq reads, totaling 341 Gb of sequence, from a Zea mays seedling sample. At this depth, a near complete snapshot of the transcriptome was observed consisting of over 90% of the annotated transcripts, including lowly expressed transcription factors. A novel hybrid strategy combining de novo and reference-based assemblies yielded a transcriptome consisting of 126,708 transcripts with 88% of expressed known genes assembled to full-length. We improved current annotations by adding 4,842 previously unannotated transcript variants and many new features, including 212 maize transcripts, 201 genes, 10 genes with undocumented potential roles in seedlings as well as maize lineage specific gene fusion events. We demonstrated the power of deep sequencing for large transcriptome studies by generating a high quality transcriptome, which provides a rich resource for the research community.
    Scientific Reports 01/2014; 4:4519. · 5.08 Impact Factor
  •
    ABSTRACT: PIWI proteins play essential and conserved roles in germline development, including germline stem cell maintenance and meiosis. Because germline regulators such as OCT4, NANOG, and SOX2 are known to be potent factors that reprogram differentiated somatic cells into induced pluripotent stem cells (iPSCs), we investigated whether the PIWI protein family is involved in iPSC production. We find that all three mouse Piwi genes, Miwi, Mili, and Miwi2, are expressed in embryonic stem cells (ESCs) at higher levels than in fibroblasts, with Mili being the highest. However, mice lacking all three Piwi genes are viable and female fertile, and are only male sterile. Furthermore, embryonic fibroblasts derived from Miwi/Mili/Miwi2 triple knockout embryos can be efficiently reprogrammed into iPS cells. These iPS cells expressed pluripotency markers and were capable of differentiating into all three germ layers in teratoma assays. Genome-wide expression profiling reveals that the triple knockout iPS cells are very similar to littermate control iPS cells. These results indicate that PIWI proteins are dispensable for direct reprogramming of mouse fibroblasts.
    PLoS ONE 01/2014; 9(9):e97821. · 3.53 Impact Factor
  •
    ABSTRACT: Agaves are succulent monocotyledonous plants native to xeric environments of North America. Because of their adaptations to their environment, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis), and existing technologies for ethanol production, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits. Here, we present comprehensive, high quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, built from short-read RNA-seq data. Our analyses support completeness and accuracy of the de novo transcriptome assemblies, with each species having a minimum of approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies biological functions of gene families displaying sequence divergence in agave species. Additionally, a focus on the transcriptomics of the A. deserti juvenile leaf confirms evolutionary conservation of monocotyledonous leaf physiology and development along the proximal-distal axis. Our work presents a comprehensive transcriptome resource for two Agave species and provides insight into their biology and physiology. These resources are a foundation for further investigation of agave biology and their improvement for bioenergy development.
    BMC Genomics 08/2013; 14(1):563. · 4.40 Impact Factor
  •
    ABSTRACT: MOTIVATION: Researchers need general-purpose methods for objectively evaluating the accuracy of single and metagenome assemblies, and for automatically detecting any errors they may contain. Current methods do not fully meet this need because they require a reference, only consider one of the many aspects of assembly quality, or lack statistical justification, and none are designed to evaluate metagenome assemblies. RESULTS: In this paper we present an Assembly Likelihood Evaluation (ALE) framework that overcomes these limitations, systematically evaluating the accuracy of an assembly in a reference-independent manner using rigorous statistical methods. This framework is comprehensive, and integrates read quality, mate pair orientation and insert length (for paired end reads), sequencing coverage, read alignment, and k-mer frequency. ALE pinpoints synthetic errors in both single and metagenomic assemblies, including single-base errors, insertions/deletions, genome rearrangements and chimeric assemblies presented in metagenomes. At the genome level with real-world data ALE identifies three large misassemblies from the Spirochaeta smaragdinae finished genome, which were all independently validated by PacBio sequencing. At the single-base level with Illumina data, ALE recovers 215 of 222 (97%) single nucleotide variants in a training set from a GC-rich Rhodobacter sphaeroides genome. Using real PacBio data, ALE identifies 12 of 12 synthetic errors in a Lambda Phage genome, surpassing even Pacific Biosciences' own variant caller, EviCons. In summary, the ALE framework provides a comprehensive, reference-independent and statistically rigorous measure of single genome and metagenome assembly accuracy, which can be used to identify misassemblies or to optimize the assembly process. AVAILABILITY: ALE is released as open source software under the UoI/NCSA license at http://www.alescore.org. It is implemented in C and Python. 
    CONTACT: pf98@cornell.edu or ZhongWang@lbl.gov.
    Bioinformatics 01/2013; · 5.47 Impact Factor
  •
    ABSTRACT: The West Nile virus (WNV) is an emerging infection of biodefense concern and there are no available treatments or vaccines. Here we used a high-throughput method based on a novel gene expression analysis, RNA-Seq, to give a global picture of differential gene expression by primary human macrophages of 10 healthy donors infected in vitro with WNV. From a total of 28 million reads per sample, we identified 1,514 transcripts that were differentially expressed after infection. Both predicted and novel gene changes were detected, as were gene isoforms, and while many of the genes were expressed by all donors, some were unique. Knock-down of genes not previously known to be associated with WNV resistance identified their critical role in control of viral infection. Our study distinguishes both common gene pathways as well as novel cellular responses. Such analyses will be valuable for translational studies of susceptible and resistant individuals, and for targeting therapeutics, in multiple biological settings.
    Viruses 01/2013; 5(7):1664-81. · 2.51 Impact Factor
  •
    Zhong Wang, Huntington F Willard
    ABSTRACT: BACKGROUND: Combinations of histone variants and modifications, conceptually representing a histone code, have been proposed to play a significant role in gene regulation and developmental processes in complex organisms. While various mechanisms have been implicated in establishing and maintaining epigenetic patterns at specific locations in the genome, they are generally believed to be independent of primary DNA sequence on a more global scale. RESULTS: To address this systematically in the case of the human genome, we have analyzed primary DNA sequences underlying 19 different methylated histones in human primary T-cells. We report that sequence alone can accurately predict the location of most of these histone marks genome-wide in this cell type. Furthermore, the sequence features responsible for such predictions are distinct for different groups of histone marks. CONCLUSIONS: These findings support the existence of a genomic code for histone modification associated with gene expression and chromatin programming, and they suggest that the mechanisms responsible for global histone modifications may interpret genomic sequence in various ways.
    BMC Genomics 08/2012; 13(1):367. · 4.40 Impact Factor
  •
    ABSTRACT: RNA sequencing (RNA-Seq) is rapidly replacing microarrays for profiling gene expression with much improved accuracy and sensitivity. One of the most common questions in a typical gene profiling experiment is how to identify a set of transcripts that are differentially expressed between different experimental conditions. Some of the statistical methods developed for microarray data analysis can be applied to RNA-Seq data with or without modifications. Recently several additional methods have been developed specifically for RNA-Seq data sets. This review attempts to give an in-depth review of these statistical methods, with the goal of providing a comprehensive guide when choosing appropriate metrics for RNA-Seq statistical analyses.
    Cell & Bioscience 07/2012; 2(1):26.
  •
    ABSTRACT: Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
    Nature Biotechnology 07/2012; 30(7):693-700. · 32.44 Impact Factor
  •
    Jeffrey A Martin, Zhong Wang
    ABSTRACT: Transcriptomics studies often rely on partial reference transcriptomes that fail to capture the full catalogue of transcripts and their variations. Recent advances in sequencing technologies and assembly algorithms have facilitated the reconstruction of the entire transcriptome by deep RNA sequencing (RNA-seq), even without a reference genome. However, transcriptome assembly from billions of RNA-seq reads, which are often very short, poses a significant informatics challenge. This Review summarizes the recent developments in transcriptome assembly approaches - reference-based, de novo and combined strategies - along with some perspectives on transcriptome assembly in the near future.
    Nature Reviews Genetics 09/2011; 12(10):671-82. · 41.06 Impact Factor
  •
    ABSTRACT: De novo assembly of the transcriptome is crucial for functional genomics studies in bioenergy research, since many of the organisms lack high quality reference genomes. In a previous study we successfully de novo assembled simple eukaryote transcriptomes exclusively from short Illumina RNA-Seq reads [1]. However, extensive alternative splicing, present in most of the higher eukaryotes, poses a significant challenge for current short read assembly processes. Furthermore, the size of next-generation datasets, often large for plant genomes, presents an informatics challenge. To tackle these challenges we present a combined experimental and informatics strategy for de novo assembly in higher eukaryotes. Using maize as a test case, preliminary results suggest our approach can resolve transcript variants and improve gene annotations.
    05/2011
  •
    ABSTRACT: The paucity of enzymes that efficiently deconstruct plant polysaccharides represents a major bottleneck for industrial-scale conversion of cellulosic biomass into biofuels. Cow rumen microbes specialize in degradation of cellulosic plant material, but most members of this complex community resist cultivation. To characterize biomass-degrading genes and genomes, we sequenced and analyzed 268 gigabases of metagenomic DNA from microbes adherent to plant fiber incubated in cow rumen. From these data, we identified 27,755 putative carbohydrate-active genes and expressed 90 candidate proteins, of which 57% were enzymatically active against cellulosic substrates. We also assembled 15 uncultured microbial genomes, which were validated by complementary methods including single-cell genome sequencing. These data sets provide a substantially expanded catalog of genes and genomes participating in the deconstruction of cellulosic biomass.
    Science 01/2011; 331(6016):463-7. · 31.20 Impact Factor
  •
    ABSTRACT: Single cell genomics, the amplification and sequencing of genomes from single cells, can provide a glimpse into the genetic make-up and thus life style of the vast majority of uncultured microbial cells, making it an immensely powerful and increasingly popular tool. This is accomplished by use of multiple displacement amplification (MDA), which can generate billions of copies of a single bacterial genome producing microgram-range DNA required for shotgun sequencing. Here, we address a key challenge inherent to this approach and propose a solution for the improved recovery of single cell genomes. While DNA-free reagents for the amplification of a single cell genome are a prerequisite for successful single cell sequencing and analysis, DNA contamination has been detected in various reagents, which poses a considerable challenge. Our study demonstrates the effect of UV irradiation in efficient elimination of exogenous contaminant DNA found in MDA reagents, while maintaining Phi29 activity. Consequently, we also find that increased UV exposure to Phi29 does not adversely affect genome coverage of MDA amplified single cells. While additional challenges in single cell genomics remain to be resolved, the proposed methodology is relatively quick and simple and we believe that its application will be of high value for future single cell sequencing projects.
    01/2011;
  •
    ABSTRACT: Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor, based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.
    01/2011;
  •
    ABSTRACT: Candida albicans is the major invasive fungal pathogen of humans, causing diseases ranging from superficial mucosal infections to disseminated, systemic infections that are often life-threatening. We have used massively parallel high-throughput sequencing of cDNA (RNA-seq) to generate a high-resolution map of the C. albicans transcriptome under several different environmental conditions. We have quantitatively determined all of the regions that are transcribed under these different conditions, and have identified 602 novel transcriptionally active regions (TARs) and numerous novel introns that are not represented in the current genome annotation. Interestingly, the expression of many of these TARs is regulated in a condition-specific manner. This comprehensive transcriptome analysis significantly enhances the current genome annotation of C. albicans, a necessary framework for a complete understanding of the molecular mechanisms of pathogenesis for this important eukaryotic pathogen.
    Genome Research 10/2010; 20(10):1451-8. · 14.40 Impact Factor

Publication Stats

4k Citations
281.60 Total Impact Points

Institutions

  • 2010–2014
    • DOE Joint Genome Institute
      Walnut Creek, California, United States
    • Lawrence Berkeley National Laboratory
      • Genomics Division
      Berkeley, California, United States
  • 2013
    • Cornell University
      • Center for Applied Mathematics
      Ithaca, NY, United States
  • 2012
    • Duke University
      Durham, North Carolina, United States
  • 2008–2010
    • Yale University
      • Department of Molecular, Cellular and Developmental Biology
      New Haven, CT, United States