ABSTRACT: Background
Clone libraries provide researchers with a powerful resource for studying nucleic acids from diverse sources. Metagenomic clone libraries in particular have aided studies of microbial biodiversity and function, and have enabled the mining of novel enzymes. Libraries are often constructed by cloning large inserts into cosmid or fosmid vectors. Recently, GC bias has been reported in fosmid metagenomic libraries, and it has been speculated that this bias results from fragmentation and loss of AT-rich sequences during cloning. However, evidence in the literature suggests that transcriptional activity or gene product toxicity may play a role.
To explore possible mechanisms responsible for sequence bias in clone libraries, we constructed a cosmid library from a human microbiome sample and sequenced DNA from different steps during library construction: crude extract DNA, size-selected DNA, and cosmid library DNA. We confirmed a GC bias in the final cosmid library, and we provide evidence that the bias is not due to fragmentation and loss of AT-rich sequences but is likely occurring after DNA is introduced into Escherichia coli. To investigate the influence of strong constitutive transcription, we searched the sequence data for promoters and found that rpoD/σ70 promoter sequences were underrepresented in the cosmid library. Furthermore, when we examined the genomes of taxa that were differentially abundant in the cosmid library relative to the original sample, we found the bias to be more correlated with the number of rpoD/σ70 consensus sequences in the genome than with simple GC content.
The GC bias of metagenomic libraries does not appear to be due to DNA fragmentation. Rather, analysis of promoter sequences supports the hypothesis that strong constitutive transcription from sequences that E. coli recognizes as rpoD/σ70 consensus-like promoters may lead to clone instability, causing loss of the plasmid or loss of the insert DNA that gives rise to the transcription. Despite the widespread use of E. coli to propagate foreign DNA in metagenomic libraries, the effects of in vivo transcriptional activity on clone stability are not well understood. Further work is required to tease apart the effects of transcription from those of gene product toxicity.
Electronic supplementary material
The online version of this article (doi:10.1186/s40168-015-0086-5) contains supplementary material, which is available to authorized users.
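The first abstract compares two genome-level quantities: simple GC content and the number of rpoD/σ70 consensus-like promoter sequences. The sketch below shows one way both could be computed from a sequence. It assumes the canonical E. coli σ70 boxes TTGACA (-35) and TATAAT (-10) separated by a 15 to 19 bp spacer, and uses a hypothetical per-box mismatch tolerance `max_mm`. The authors' actual promoter-search tool and thresholds are not stated in the abstract, so treat this as an illustration only.

```python
from typing import List, Tuple

# Canonical E. coli sigma70 consensus boxes (assumption: exact consensus,
# scored by simple per-box Hamming distance).
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"
SPACER_RANGE = range(15, 20)  # 15-19 bp spacer; 17 bp is the optimum


def gc_content(seq: str) -> float:
    """Fraction of G + C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGTN characters kept as-is)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_sigma70_sites(seq: str, max_mm: int = 1) -> List[Tuple[str, int, int]]:
    """Return (strand, start_of_-35_box, spacer_length) for every sigma70-like hit.

    Both strands are scanned; minus-strand positions are given in
    reverse-complement coordinates. max_mm is a hypothetical tolerance
    applied separately to each six-base box.
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq))):
        for i in range(len(s) - 5):
            if mismatches(s[i:i + 6], MINUS35) > max_mm:
                continue
            for spacer in SPACER_RANGE:
                j = i + 6 + spacer
                box10 = s[j:j + 6]
                # Skip -10 windows that run off the end of the sequence.
                if len(box10) == 6 and mismatches(box10, MINUS10) <= max_mm:
                    hits.append((strand, i, spacer))
    return hits


if __name__ == "__main__":
    demo = "TTGACA" + "A" * 17 + "TATAAT"  # a perfect consensus promoter
    print(gc_content(demo), find_sigma70_sites(demo))
```

To compare taxa of different genome sizes, one would likely normalize the hit count (for example, hits per kilobase) before correlating it with abundance changes in the library; that normalization step is likewise an assumption, not a detail taken from the abstract.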
ABSTRACT: Functional metagenomics is a powerful experimental approach for studying gene function, starting from the extracted DNA of mixed microbial populations. A functional approach relies on the construction and screening of metagenomic libraries—physical libraries that contain DNA cloned from environmental metagenomes. The information obtained from functional metagenomics can help in future annotations of gene function and serve as a complement to sequence-based metagenomics. In this Perspective, we begin by summarizing the technical challenges of constructing metagenomic libraries and emphasize their value as resources. We then discuss libraries constructed using the popular cloning vector, pCC1FOS, and highlight the strengths and shortcomings of this system, alongside possible strategies to maximize existing pCC1FOS-based libraries by screening in diverse hosts. Finally, we discuss the known bias of libraries constructed from human gut and marine water samples, present results that suggest bias may also occur for soil libraries, and consider factors that bias metagenomic libraries in general. We anticipate that discussion of current resources and limitations will advance tools and technologies for functional metagenomics research.
Frontiers in Microbiology 10/2015; 6. DOI:10.3389/fmicb.2015.01196
ABSTRACT: Cys2-His2 zinc finger (C2H2-ZF) proteins represent the largest class of putative human transcription factors. However, for most C2H2-ZF proteins it is unknown whether they even bind DNA or, if they do, to which sequences. Here, by combining data from a modified bacterial one-hybrid system with protein-binding microarray and chromatin immunoprecipitation analyses, we show that natural C2H2-ZFs encoded in the human genome bind DNA both in vitro and in vivo, and we infer the DNA recognition code using DNA-binding data for thousands of natural C2H2-ZF domains. In vivo binding data are generally consistent with our recognition code and indicate that C2H2-ZF proteins recognize more motifs than all other human transcription factors combined. We provide direct evidence that most KRAB-containing C2H2-ZF proteins bind specific endogenous retroelements (EREs), ranging from currently active to ancient families. The majority of C2H2-ZF proteins, including KRAB proteins, also show widespread binding to regulatory regions, indicating that the human genome contains an extensive and largely unstudied adaptive C2H2-ZF regulatory network that targets a diverse range of genes and pathways.
ABSTRACT: High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones.
PLoS ONE 06/2014; 9(6):e98968. DOI:10.1371/journal.pone.0098968
ABSTRACT: C2H2 zinc fingers (C2H2-ZFs) are the most prevalent type of vertebrate DNA-binding domain, and typically appear in tandem arrays (ZFAs), with sequential C2H2-ZFs each contacting three (or more) sequential bases. C2H2-ZFs can be assembled in a modular fashion, providing one explanation for their remarkable evolutionary success. Given a set of modules with defined three-base specificities, modular assembly also presents a way to construct artificial proteins with specific DNA-binding preferences. However, a recent survey of a large number of three-finger ZFAs engineered by modular assembly reported high failure rates (∼70%), casting doubt on the generality of modular assembly. Here, we used protein-binding microarrays to analyze 28 ZFAs that failed in the aforementioned study. Most (17) preferred specific sequences, which in all but one case resembled the intended target sequence. Like natural ZFAs, the engineered ZFAs typically yielded degenerate motifs, binding dozens to hundreds of related individual sequences. Thus, the failure of these proteins in previous assays is not due to lack of sequence-specific DNA-binding activity. Our findings underscore the relevance of individual C2H2-ZF sequence specificities within tandem arrays, and support the general ability of modular assembly to produce ZFAs with sequence-specific DNA-binding activity.
Nucleic Acids Research 02/2011; 39(11):4680-90. DOI:10.1093/nar/gkq1303
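The zinc-finger abstract above rests on two ideas: modular assembly of three-base modules into a target site, and degenerate motifs that cover many related sequences. The Python sketch below illustrates both under simplifying assumptions. It treats each finger as contacting exactly three bases with no overlap between neighbours (the abstract notes fingers may contact "three (or more)" bases). It also relies on the standard antiparallel arrangement in which the N-terminal finger contacts the 3'-most triplet of the primary strand. The function names and the example triplets are hypothetical and are not taken from the study.

```python
from math import prod
from typing import List

# Number of bases represented by each IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def assemble_target(finger_triplets: List[str]) -> str:
    """Predict a ZFA target site (5'->3') from per-finger triplet specificities.

    finger_triplets is ordered N- to C-terminal, each triplet written 5'->3'.
    Because the N-terminal finger binds the 3'-most triplet, the triplets are
    concatenated in reverse order. Overlapping contacts are ignored (simplification).
    """
    for t in finger_triplets:
        if len(t) != 3:
            raise ValueError(f"expected a three-base module, got {t!r}")
    return "".join(reversed([t.upper() for t in finger_triplets]))


def motif_size(iupac_motif: str) -> int:
    """Count the distinct DNA sequences matched by a degenerate IUPAC motif."""
    return prod(IUPAC_DEGENERACY[b] for b in iupac_motif.upper())


if __name__ == "__main__":
    # Zif268-like fingers: F1 = GCG, F2 = TGG, F3 = GCG -> GCGTGGGCG
    print(assemble_target(["GCG", "TGG", "GCG"]))
    # A hypothetical motif with three degenerate positions covers 32 sequences.
    print(motif_size("GMGTGGNNG"))
```

The `motif_size` helper shows why a single degenerate motif can correspond to "dozens to hundreds" of individual sequences: each two-fold or four-fold degenerate position multiplies the count.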
ABSTRACT: S. meliloti forms indeterminate nodules on the roots of its host plant, alfalfa (Medicago sativa). Bacteroids of indeterminate nodules are terminally differentiated and, unlike their non-terminally differentiated counterparts in determinate nodules, do not accumulate large quantities of poly-3-hydroxybutyrate (PHB) during symbiosis. PhaZ is an intracellular PHB depolymerase; it represents the first enzyme in the degradative arm of the PHB cycle in S. meliloti and is the only enzyme in this half of the PHB cycle that remains uncharacterized.
The S. meliloti phaZ gene was identified by in silico analysis, the ORF was cloned, and an S. meliloti phaZ mutant was constructed. This mutant exhibited increased PHB accumulation during free-living growth, even when grown under non-PHB-inducing conditions. The phaZ mutant demonstrated no reduction in symbiotic capacity; interestingly, analysis of the bacteroids showed that this mutant also accumulated PHB during symbiosis. This mutant also exhibited a decreased capacity to tolerate long-term carbon starvation, comparable to that of other PHB cycle mutants. In contrast to other PHB cycle mutants, the S. meliloti phaZ mutant did not exhibit any decrease in rhizosphere competitiveness; however, this mutant did exhibit a significant increase in succinoglycan biosynthesis.
S. meliloti bacteroids retain the capacity to synthesize PHB during symbiosis; interestingly, accumulation does not occur at the expense of symbiotic performance. phaZ mutants are not compromised in their capacity to compete for nodulation in the rhizosphere, perhaps due to increased succinoglycan production resulting from upregulation of the succinoglycan biosynthetic pathway. The reduced survival capacity of free-living cells unable to access their accumulated stores of PHB suggests that PHB is a crucial metabolite under adverse conditions.