ABSTRACT: Although retroviruses are relatively promiscuous in choice of integration sites, retrotransposons can display marked integration specificity. In yeast and slime mold, some retrotransposons are associated with tRNA genes (tDNAs). In the Saccharomyces cerevisiae genome, the long terminal repeat retrotransposon Ty3 is found at RNA polymerase III (Pol III) transcription start sites of tDNAs. Ty1, 2, and 4 elements also cluster in the upstream regions of these genes. To determine the extent to which other Pol III-transcribed genes serve as genomic targets for Ty3, a set of 10,000 Ty3 genomic retrotranspositions were mapped using high-throughput DNA sequencing. Integrations occurred at all known tDNAs, two tDNA relics (iYGR033c and ZOD1), and six non-tDNA, Pol III-transcribed types of genes (RDN5, SNR6, SNR52, RPR1, RNA170, and SCR1). Previous work in vitro demonstrated that the Pol III transcription factor (TF) IIIB is important for Ty3 targeting. However, seven loci that bind the TFIIIB loader, TFIIIC, were not targeted, underscoring the unexplained absence of TFIIIB at those sites. Ty3 integrations also occurred in two open reading frames not previously associated with Pol III transcription, suggesting the existence of a small number of additional sites in the yeast genome that interact with Pol III transcription complexes.
Genome Research 02/2012; 22(4):681-92. · 14.40 Impact Factor
ABSTRACT: The ability to chronicle transcription-factor binding events throughout the development of an organism would facilitate mapping of transcriptional networks that control cell-fate decisions. We describe a method for permanently recording protein-DNA interactions in mammalian cells. We endow transcription factors with the ability to deposit a transposon into the genome near where they bind. The transposon becomes a "calling card" that the transcription factor leaves behind to record its visit to the genome. The locations of the calling cards can be determined by massively parallel DNA sequencing. We show that the transcription factor SP1 fused to the piggyBac transposase directs insertion of the piggyBac transposon near SP1 binding sites. The locations of transposon insertions are highly reproducible and agree with sites of SP1-binding determined by ChIP-seq. Genes bound by SP1 are more likely to be expressed in the HCT116 cell line we used, and SP1-bound CpG islands show a strong preference to be unmethylated. This method has the potential to trace transcription-factor binding throughout cellular and organismal development in a way that has heretofore not been possible.
ABSTRACT: Domestication of plants and animals promoted humanity's transition from nomadic to sedentary lifestyles, demographic expansion, and the emergence of civilizations. In contrast to the well-documented successes of crop and livestock breeding, processes of microbe domestication remain obscure, despite the importance of microbes to the production of food, beverages, and biofuels. Lager beer, first brewed in the 15th century, employs an allotetraploid hybrid yeast, Saccharomyces pastorianus (syn. Saccharomyces carlsbergensis), a domesticated species created by the fusion of a Saccharomyces cerevisiae ale-yeast with an unknown cryotolerant Saccharomyces species. We report the isolation of that species and designate it Saccharomyces eubayanus sp. nov. because of its resemblance to Saccharomyces bayanus (a complex hybrid of S. eubayanus, Saccharomyces uvarum, and S. cerevisiae found only in the brewing environment). Individuals from populations of S. eubayanus and its sister species, S. uvarum, exist in apparent sympatry in Nothofagus (Southern beech) forests in Patagonia, but are isolated genetically through intrinsic postzygotic barriers, and ecologically through host-preference. The draft genome sequence of S. eubayanus is 99.5% identical to the non-S. cerevisiae portion of the S. pastorianus genome sequence and suggests specific changes in sugar and sulfite metabolism that were crucial for domestication in the lager-brewing environment. This study shows that combining microbial ecology with comparative genomics facilitates the discovery and preservation of wild genetic stocks of domesticated microbes to trace their history, identify genetic changes, and suggest paths to further industrial improvement.
Proceedings of the National Academy of Sciences 08/2011; 108(35):14539-44. · 9.74 Impact Factor
ABSTRACT: Transcription factors (TFs) direct gene expression, so there is much interest in mapping their genome-wide binding locations. Current methods do not allow for the multiplexed analysis of TF binding, and this limits their throughput. We describe a novel method for determining the genomic target genes of multiple transcription factors simultaneously. DNA-binding proteins are endowed with the ability to direct transposon insertions into the genome near where they bind. The transposon becomes a "Calling Card" marking the visit of the DNA-binding protein to that location. A unique sequence "barcode" in the transposon matches it to the DNA-binding protein that directed its insertion. The sequences of the DNA flanking the transposon (which reveal where in the genome the transposon landed) and the barcode within the transposon (which identifies the TF that put it there) are determined by massively parallel DNA sequencing. To demonstrate the method's feasibility, we determined the genomic targets of eight transcription factors in a single experiment. The Calling Card method promises to significantly reduce the cost and labor needed to determine the genomic targets of many transcription factors in different environmental conditions and genetic backgrounds.
Genome Research 04/2011; 21(5):748-55. · 14.40 Impact Factor
ABSTRACT: The ability to design and engineer organisms demands the ability to predict kinetic responses of novel regulatory networks built from well-characterized biological components. Surprisingly, few validated kinetic models of complex regulatory networks have been derived by combining models of the network components. A major bottleneck in producing such models is the difficulty of measuring in vivo rate constants for components of complex networks. We demonstrate that a simple, genetic approach to measuring rate constants in vivo produces an accurate kinetic model of the complex network that Saccharomyces cerevisiae employs to regulate the expression of genes encoding glucose transporters. The model predicts a transient pulse of transcription of HXT4 (but not HXT2 or HXT3) in response to addition of a small amount of glucose to cells, an outcome we observed experimentally. Our model also provides a mechanistic explanation for this result: HXT2-4 are governed by a type 2 incoherent feed-forward regulatory loop involving the Rgt1 and Mig2 transcriptional repressors. The efficiency with which Rgt1 and Mig2 repress expression of each HXT gene determines which of them have a pulse of transcription in response to glucose. Finally, the model correctly predicts how lesions in the feed-forward loop change the kinetics of induction of HXT4 expression.
Proceedings of the National Academy of Sciences 09/2010; 107(38):16743-8. · 9.74 Impact Factor
ABSTRACT: Local adaptations within species are often governed by several interacting genes scattered throughout the genome. Single-locus models of selection cannot explain the maintenance of such complex variation because recombination separates co-adapted alleles. Here we report a previously unrecognized type of intraspecific multi-locus genetic variation that has been maintained over a vast period. The galactose (GAL) utilization gene network of Saccharomyces kudriavzevii, a relative of brewer's yeast, exists in two distinct states: a functional gene network in Portuguese strains and, in Japanese strains, a non-functional gene network of allelic pseudogenes. Genome sequencing of all available S. kudriavzevii strains revealed that none of the functional GAL genes were acquired from other species. Rather, these polymorphisms have been maintained for nearly the entire history of the species, despite more recent gene flow genome-wide. Experimental evidence suggests that inactivation of the GAL3 and GAL80 regulatory genes facilitated the origin and long-term maintenance of the two gene network states. This striking example of a balanced unlinked gene network polymorphism introduces a remarkable type of intraspecific variation that may be widespread.
ABSTRACT: Assembling the tree of life is a major goal of biology, but progress has been hindered by the difficulty and expense of obtaining the orthologous DNA required for accurate and fully resolved phylogenies. Next-generation DNA sequencing technologies promise to accelerate progress, but sequencing the genomes of hundreds of thousands of eukaryotic species remains impractical. Eukaryotic transcriptomes, which are smaller than genomes and biased toward highly expressed genes that tend to be conserved, could potentially provide a rich set of phylogenetic characters. We sampled the transcriptomes of 10 mosquito species by assembling 36-bp sequence reads into phylogenomic data matrices containing hundreds of thousands of orthologous nucleotides from hundreds of genes. Analysis of these data matrices yielded robust phylogenetic inferences, even with data matrices constructed from surprisingly few sequence reads. This approach is more efficient, data-rich, and economical than traditional PCR-based and EST-based methods and provides a scalable strategy for generating phylogenomic data matrices to infer the branches and twigs of the tree of life.
Proceedings of the National Academy of Sciences 01/2010; 107(4):1476-81. · 9.74 Impact Factor
ABSTRACT: Next-generation sequencing has opened the door to genomic analysis of nonmodel organisms. Technologies generating long sequence reads (200-400 bp) are increasingly used in evolutionary studies of nonmodel organisms, but the short sequence reads (30-50 bp) that can be produced at lower cost are thought to be of limited utility for de novo sequencing applications. Here, we tested this assumption by short-read sequencing the transcriptomes of the tropical disease vectors Aedes aegypti and Anopheles gambiae, for which complete genome sequences are available. Comparison of our results to the reference genomes allowed us to accurately evaluate the quantity, quality, and functional and evolutionary information content of our "test" data. We produced more than 0.7 billion nucleotides of sequence data per species that assembled into more than 21,000 test contigs larger than 100 bp per species and covered approximately 27% of the Aedes reference transcriptome. Remarkably, the substitution error rate in the test contigs was approximately 0.25% per site, with very few indels or assembly errors. Test contigs of both species were enriched for genes involved in energy production and protein synthesis and underrepresented in genes involved in transcription and differentiation. Ortholog prediction using the test contigs was accurate across hundreds of millions of years of evolution. Our results demonstrate the considerable utility of short-read transcriptome sequencing for genomic studies of nonmodel organisms and suggest an approach for assessing the information content of next-generation data for evolutionary studies.
Molecular Biology and Evolution 09/2009; 26(12):2731-44. · 10.35 Impact Factor
ABSTRACT: Efficient uptake of glucose is especially critical to Saccharomyces cerevisiae because its preference to ferment this carbon source demands high flux through glycolysis. Glucose induces expression of HXT genes encoding hexose transporters through a signal generated by the Snf3 and Rgt2 glucose sensors that leads to depletion of the transcriptional regulators Mth1 and Std1. These paralogous proteins bind to Rgt1 and enable it to repress expression of HXT genes. Here we show that Mth1 and Std1 can substitute for one another and provide nearly normal regulation of their targets. However, their roles in the glucose signal transduction cascade have diverged significantly. Mth1 is the prominent effector of Rgt1 function because it is the more abundant of the two paralogs under conditions in which both are active (in the absence of glucose). Moreover, the cellular level of Mth1 is quite sensitive to the amount of available glucose. The abundance of Std1 protein, on the other hand, remains essentially constant over a similar range of glucose concentrations. The signal generated by low levels of glucose is amplified by rapid depletion of Mth1; the velocity of this depletion is dependent on both its rate of degradation and swift repression of MTH1 transcription by the Snf1-Mig1 glucose repression pathway. Quantitation of the contributions of Mth1 and Std1 to regulation of HXT expression reveals the unique roles played by each paralog in integrating nutrient availability with metabolic capacity: Mth1 is the primary regulator; Std1 serves to buffer the response to glucose.
Journal of Biological Chemistry 09/2009; 284(43):29635-43. · 4.65 Impact Factor
ABSTRACT: The 11.3-Mb genome of the yeast Lachancea (Saccharomyces) kluyveri displays an intriguing compositional heterogeneity: a region of approximately 1 Mb, covering almost the whole left arm of chromosome C (C-left), has an average GC content of 52.9%, which is significantly higher than the 40.4% global GC content of the rest of the genome. This region contains the MAT locus, which remains normal in composition. The excess of GC base pairs affects both coding and noncoding sequences, and thus is not due to selective pressure acting on protein sequences. It leads to a strong codon usage bias and alters the amino acid composition of the 457 proteins encoded on C-left. These proteins show no obvious bias toward particular functional categories, nor toward the presence of paralogs or orthologs of essential Saccharomyces cerevisiae genes. The C-left genes share significant synteny conservation with other species of the Saccharomycetaceae, and phylogenetic analysis indicates that C-left originates from a Lachancea species. In contrast, transposable elements are completely absent from C-left, whereas 18 elements per megabase are distributed across the rest of the genome. Comparative hybridization of synchronized cells using high-density genome arrays reveals that C-left is replicated later during S phase than the rest of the genome. Two possible primary causes of this major compositional heterogeneity are discussed: an ancient hybridization of two related species with very distinct GC composition, or an intrinsic mechanism, possibly associated with the loss of the silent cassettes from C-left, that progressively increased the GC content and generated the delayed replication of this chromosomal arm.
Genome Research 08/2009; 19(10):1710-21. · 14.40 Impact Factor
ABSTRACT: Our knowledge of yeast genomes remains largely dominated by the extensive studies on Saccharomyces cerevisiae and the consequences of its ancestral duplication, leaving the evolution of the entire class of hemiascomycetes only partly explored. We concentrate here on five species of Saccharomycetaceae, a large subdivision of hemiascomycetes, that we call "protoploid" because they diverged from the S. cerevisiae lineage prior to its genome duplication. We determined the complete genome sequences of three of these species: Kluyveromyces (Lachancea) thermotolerans and Saccharomyces (Lachancea) kluyveri (two members of the newly described Lachancea clade), and Zygosaccharomyces rouxii. We included in our comparisons the previously available sequences of Kluyveromyces lactis and Ashbya (Eremothecium) gossypii. Despite their broad evolutionary range and significant individual variations in each lineage, the five protoploid Saccharomycetaceae share a core repertoire of approximately 3300 protein families and a high degree of conserved synteny. Synteny blocks were used to define gene orthology and to infer ancestors. Far from representing minimal genomes without redundancy, the five protoploid yeasts contain numerous copies of paralogous genes, either dispersed or in tandem arrays, that, altogether, constitute a third of each genome. Ancient, conserved paralogs as well as novel, lineage-specific paralogs were identified.
Genome Research 07/2009; 19(10):1696-709. · 14.40 Impact Factor
ABSTRACT: S. cerevisiae senses glucose and galactose differently. Glucose is detected through sensors that reside in the cellular plasma membrane. When activated, the sensors initiate a signal-transduction cascade that ultimately inactivates the Rgt1 transcriptional repressor by causing degradation of its corepressors Mth1 and Std1. This results in the expression of many HXT genes encoding glucose transporters. The ensuing flood of glucose into the cell activates Mig1, a transcriptional repressor that mediates "glucose repression" of many genes, including the GAL genes; hence, glucose sensing hinders galactose utilization. Galactose is sensed in the cytoplasm via Gal3. Upon binding galactose (and ATP), Gal3 sequesters the Gal80 protein, thereby emancipating the Gal4 transcriptional activator of the GAL genes. Gal4 also activates expression of MTH1, encoding a corepressor critical for Rgt1 function. Thus, galactose inhibits glucose assimilation by encouraging repression of HXT genes. C. albicans senses glucose similarly to S. cerevisiae but does not sense galactose through Gal3-Gal80-Gal4. Its genome harbors no GAL80 ortholog, and the severely truncated CaGal4 does not regulate CaGAL genes. We present evidence that C. albicans senses galactose with its Hgt4 glucose sensor, a capability that is enabled by transcriptional "rewiring" of its sugar-sensing signal-transduction pathways. We suggest that galactose sensing through Hgt4 is ancestral in fungi.
Current Biology 03/2009; 19(5):436-41. · 10.99 Impact Factor
ABSTRACT: We present a protocol for a novel method for identifying the targets of DNA-binding proteins in the genome of the yeast Saccharomyces cerevisiae. This is accomplished by engineering a DNA-binding protein so that it leaves behind in the genome a permanent mark, a 'calling card', that provides a record of that protein's visit to that region of the genome. The calling card is the yeast Ty5 retrotransposon, whose integrase interacts with the Sir4 protein. If Sir4 is fused to a DNA-binding protein, it recruits the Ty5 integrase, which directs insertion of a Ty5 calling card into the genome. The calling card along with the flanking genomic DNA is harvested by inverse PCR and its genomic location is determined by hybridization of the product to a DNA microarray. This method provides a straightforward alternative to the 'ChIP-chip' method for determining the targets of DNA-binding proteins. This protocol takes approximately 2 weeks to complete.
ABSTRACT: The ability of the fungal pathogen Candida albicans to cause systemic infections depends in part on the function of Hgt4, a cell surface sugar sensor. The orthologues of Hgt4 in Saccharomyces cerevisiae, Snf3 and Rgt2, initiate a signalling cascade that inactivates Rgt1, a transcriptional repressor of genes encoding hexose transporters. To determine whether Hgt4 functions similarly through the C. albicans orthologue of Rgt1, we analysed Cargt1 deletion mutants. We found that Cargt1 mutants are sensitive to the glucose analogue 2-deoxyglucose, a phenotype probably due to uncontrolled expression of genes encoding glucose transporters. Indeed, transcriptional profiling revealed that expression of about two dozen genes, including multiple HGT genes encoding hexose transporters, is increased in the Cargt1 mutant in the absence of sugars, suggesting that CaRgt1 represses expression of several HGT genes under this condition. Some of the HGT genes (probably encoding high-affinity transporters) are also repressed by high levels of glucose, and we show that this repression is mediated by CaMig1, the orthologue of the major glucose-activated repressor in S. cerevisiae, but not by its paralogue CaMig2. Therefore, CaRgt1 and CaMig1 collaborate to control expression of C. albicans hexose transporters in response to different levels of sugars. We were surprised to find that CaRgt1 also regulates expression of GAL1, suggesting that regulation of galactose metabolism in C. albicans is unconventional. Finally, Cargt1 mutations cause cells to hyperfilament, and suppress the hypofilamented phenotype of an hgt4 mutant, indicating that the Hgt4 glucose sensor may affect filamentation by modulating sugar import and metabolism via CaRgt1.
ABSTRACT: Identifying genomic targets of transcription factors is fundamental for understanding transcriptional regulatory networks. Current technology enables identification of all targets of a single transcription factor, but there is no realistic way to achieve the converse: identification of all proteins that bind to a promoter of interest. We have developed a method that promises to fill this void. It employs the yeast retrotransposon Ty5, whose integrase interacts with the Sir4 protein. A DNA-binding protein fused to Sir4 directs insertion of Ty5 into the genome near where it binds; the Ty5 becomes a "calling card" the DNA-binding protein leaves behind in the genome. We constructed customized calling cards for seven transcription factors of yeast by including in each Ty5 a unique DNA sequence that serves as a "molecular bar code." Ty5 transposition was induced in a population of yeast cells, each expressing a different transcription factor-Sir4 fusion and its matched, bar-coded Ty5, and the calling cards deposited into selected regions of the genome were identified, revealing the transcription factors that visited that region of the genome. In each region we analyzed, we found calling cards for only the proteins known to bind there: in the GAL1-10 promoter we found only calling cards for Gal4; in the HIS4 promoter we found only Gcn4 calling cards; in the PHO5 promoter we found only Pho4 and Pho2 calling cards. We discuss how Ty5 calling cards might be implemented for mapping all targets of all transcription factors in a single experiment.
Genome Research 09/2007; 17(8):1202-9. · 14.40 Impact Factor
ABSTRACT: The Hgt4 protein of Candida albicans (orf19.5962) is orthologous to the Snf3 and Rgt2 glucose sensors of Saccharomyces cerevisiae that govern sugar acquisition by regulating the expression of genes encoding hexose transporters. We found that HGT4 is required for glucose induction of the expression of HGT12, HXT10, and HGT7, which encode apparent hexose transporters in C. albicans. An hgt4Δ mutant is defective for growth on fermentable sugars, which is consistent with the idea that Hgt4 is a sensor of glucose and similar sugars. Hgt4 appears to be sensitive to glucose levels similar to those in human serum (approximately 5 mM). HGT4 expression is repressed by high levels of glucose, which is consistent with the idea that it encodes a high-affinity sugar sensor. Glucose sensing through Hgt4 affects the yeast-to-hyphal morphological switch of C. albicans cells: hgt4Δ mutants are hypofilamented, and a constitutively signaling form of Hgt4 confers hyperfilamentation of cells. The hgt4Δ mutant is less virulent than wild-type cells in a mouse model of disseminated candidiasis. These results suggest that Hgt4 is a high-affinity glucose sensor that contributes to the virulence of C. albicans.
ABSTRACT: The yeast Saccharomyces cerevisiae deploys two different types of glucose sensors on its cell surface that operate in distinct glucose signaling pathways: the glucose transporter-like Snf3 and Rgt2 proteins and the Gpr1 receptor that is coupled to Gpa2, a G-protein alpha subunit. The ultimate target of the Snf3/Rgt2 pathway is Rgt1, a transcription factor that regulates expression of HXT genes encoding glucose transporters. We have found that the cAMP-dependent protein kinase A (PKA), which is activated by the Gpr1/Gpa2 glucose-sensing pathway and by a glucose-sensing pathway that works through Ras1 and Ras2, catalyzes phosphorylation of Rgt1 and regulates its function. Rgt1 is phosphorylated in vitro by all three isoforms of PKA, and this requires several serine residues located in PKA consensus sequences within Rgt1. PKA and the consensus serine residues of Rgt1 are required for glucose-induced removal of Rgt1 from the HXT promoters and for induction of HXT expression. Conversely, overexpression of the TPK genes led to constitutive expression of the HXT genes. The PKA consensus phosphorylation sites of Rgt1 are required for an intramolecular interaction that is thought to regulate its DNA binding activity. Thus, two different glucose signal transduction pathways converge on Rgt1 to regulate expression of glucose transporters.
Journal of Biological Chemistry 10/2006; 281(36):26144-9. · 4.65 Impact Factor
ABSTRACT: Analyses of whole-genome sequences and experimental data sets have revealed a large number of DNA sequence motifs that are conserved in many species and may be functional. However, methods of sufficient scale to explore the roles of these elements are lacking. We describe the use of protein arrays to identify proteins that bind to DNA sequences of interest. A microarray of 282 known and potential yeast transcription factors was produced and probed with oligonucleotides of evolutionarily conserved sequences that are potentially functional. Transcription factors that bound to specific DNA sequences were identified. One previously uncharacterized DNA-binding protein, Yjl103, was characterized in detail. We defined the binding site for this protein and identified a number of its target genes, many of which are involved in stress response and oxidative phosphorylation. Protein microarrays offer a high-throughput method for determining DNA-protein interactions.
Proceedings of the National Academy of Sciences 07/2006; 103(26):9940-5. · 9.74 Impact Factor
ABSTRACT: Fundamental biological knowledge and the technology to acquire it have been immeasurably advanced by past efforts to understand and manipulate the genomes of model organisms. Has the utility of bacteria, yeast, worms, flies, mice, plants, and other models now peaked, and are humans poised to become the model organism of the future? The Genetics Society of America recently convened its 2006 meeting entitled "Genetic Analysis: Model Organisms to Human Biology" to examine the future role of genetic research. (Because of time limitations, the meeting was unable to cover the substantial contributions and future potential of research on model prokaryotic organisms.) In fact, the potential of model-organism-based studies has grown substantially in recent years. The genomics revolution has revealed an underlying unity between the cells and tissues of eukaryotic organisms from yeast to humans. No uniquely human biological mechanisms have yet come to light. This common evolutionary heritage makes it possible to use genetically tractable organisms to model important aspects of human medical disorders such as cancer, birth defects, neurological dysfunction, reproductive failure, malnutrition, and aging in systems amenable to rapid and powerful experimentation. Applying model systems in this way will allow us to identify common genes, proteins, and processes that underlie human medical conditions. It will allow us to systematically decipher the gene-gene and gene-environment interactions that influence complex multigenic disorders. Above all, disease models have the potential to address a growing gap between our ability to collect human genetic data and to productively interpret and apply it. If model organism research is supported with these goals in mind, we can look forward to diagnosing and treating human disease using information from multiple systems and to a medical science built on the unified history of life on earth.