Article · Literature Review

Life with 6000 Genes


Abstract

The genome of the yeast Saccharomyces cerevisiae has been completely sequenced through a worldwide collaboration. The sequence of 12,068 kilobases defines 5885 potential protein-encoding genes, approximately 140 genes specifying ribosomal RNA, 40 genes for small nuclear RNA molecules, and 275 transfer RNA genes. In addition, the complete sequence provides information about the higher order organization of yeast's 16 chromosomes and allows some insight into their evolutionary history. The genome shows a considerable amount of apparent genetic redundancy, and one of the major problems to be tackled during the next stage of the yeast genome project is to elucidate the biological functions of all of these genes.
... Yeast remains by far the most complete interaction dataset, with data covering almost all of its ~6600 genes [57]. Data coverage in yeast has levelled off since our initial study [8]; in particular, the fluctuations in protein coverage seen prior to 2010 are no longer observed, reflecting BioGRID's efforts to improve and standardise curation methods [9,58]. ...
Article
Background: Probabilistic functional integrated networks (PFINs) are designed to aid our understanding of cellular biology and can be used to generate testable hypotheses about protein function. PFINs are generally created by scoring the quality of interaction datasets against a Gold Standard dataset, usually chosen from a separate high-quality data source, before the datasets are integrated. Use of an external Gold Standard has several drawbacks, including data redundancy, data loss and the need for identifier mapping, which can complicate the network build and affect PFIN performance. Additionally, there are typically no Gold Standard data for non-model organisms. Results: We describe the development of an integration technique, ssNet, that scores and integrates both high-throughput and low-throughput data from a single source database in a consistent manner without the need for an external Gold Standard dataset. Using data from Saccharomyces cerevisiae, we show that ssNet is easier and faster to apply, overcoming the challenges of data redundancy, Gold Standard bias and ID mapping. In addition, ssNet results in less data loss and produces a more complete network. Conclusions: The ssNet method allows PFINs to be built successfully from a single database, with network performance comparable to that of networks scored using an external Gold Standard source and with reduced data loss.
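For context, the conventional external-Gold-Standard scoring that ssNet is designed to avoid can be sketched as a log-likelihood score (LLS): the ratio of gold-standard positives to negatives recovered by a dataset, normalised by the same ratio in the Gold Standard itself. This is a minimal sketch of that widely used scheme, not ssNet's own scoring; the function name and the toy gene pairs are hypothetical.

```python
import math

def log_likelihood_score(dataset_pairs, gold_pos, gold_neg):
    """Log-likelihood score (LLS) of one interaction dataset against a gold standard.

    LLS = ln[(TP / FP) / (|gold positives| / |gold negatives|)], where TP and FP
    count dataset pairs found among gold-standard positive and negative pairs.
    """
    def norm(pair):
        # Treat interactions as undirected: (A, B) is the same pair as (B, A)
        return tuple(sorted(pair))

    pos = {norm(p) for p in gold_pos}
    neg = {norm(p) for p in gold_neg}
    data = {norm(p) for p in dataset_pairs}
    tp, fp = len(data & pos), len(data & neg)
    if tp == 0 or fp == 0:
        raise ValueError("dataset must overlap both gold-standard positives and negatives")
    return math.log((tp / fp) / (len(pos) / len(neg)))

# Hypothetical toy data: gene names and pairs are invented for illustration.
gold_pos = [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]
gold_neg = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
            ("E", "G"), ("E", "H"), ("F", "G"), ("F", "H")]
dataset = [("B", "A"), ("C", "D"), ("E", "F"), ("A", "C"), ("X", "Y")]

print(round(log_likelihood_score(dataset, gold_pos, gold_neg), 2))  # 1.79
```

Pairs that are absent from the Gold Standard, such as X–Y here, have no influence on the score. This is one way an external Gold Standard can leave part of a dataset unused.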
... Since its development in the late 1970s, DNA sequencing has become one of the most pivotal tools in biomedical research [1]. It initially facilitated the sequencing of whole genomes of phages in the late 1980s [2], followed by several prokaryotic organisms and the first eukaryote, the yeast Saccharomyces cerevisiae, in the mid-1990s [3]. Sequencing of multicellular eukaryotic genomes followed soon after, starting with the roundworm Caenorhabditis elegans [4] and the plant Arabidopsis thaliana [5]. ...
Article
Applications in omics research, such as comparative transcriptomics and proteomics, require knowledge of the species-specific gene sequences and benefit from a comprehensive, high-quality annotation of the coding genes to achieve high coverage. While protein-coding genes can in simple cases be detected by scanning the genome for open reading frames, in more complex genomes exonic sequences are separated by introns. Despite advances in sequencing technologies that allow for ever-growing numbers of genomes, many of the available genome assemblies do not reach reference quality. These non-contiguous assemblies with gaps, together with the need to predict splice sites, limit accurate gene annotation from genomic data alone. In contrast, the transcriptome contains only transcribed gene regions, is devoid of introns and thus provides an optimal basis for the identification of open reading frames. Integrating proteomics data to validate predicted protein-coding genes further enriches for accurate gene models. This review outlines the principles of the proteotranscriptomics approach, discusses common challenges and suggests methods for improvement.
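The proteomic validation step can be illustrated with a minimal sketch: given protein sequences translated from transcript ORFs and a list of peptides identified by mass spectrometry, keep the ORFs that are supported by at least one observed peptide. The function name, ORF identifiers and sequences are hypothetical. Real pipelines rely on database search engines and false-discovery-rate control, which are not modelled here.

```python
def supported_orfs(predicted, peptides, min_peptides=1):
    """Map each predicted protein ID to the observed peptides found in its sequence.

    Leucine and isoleucine are isobaric and are not distinguished by standard MS/MS,
    so sequences and peptides are compared with I replaced by L.
    Only proteins with at least `min_peptides` distinct matching peptides are kept.
    """
    def norm(s):
        return s.upper().replace("I", "L")

    peps = {norm(p) for p in peptides}
    supported = {}
    for orf_id, protein in predicted.items():
        seq = norm(protein)
        hits = sorted(p for p in peps if p in seq)
        if len(hits) >= min_peptides:
            supported[orf_id] = hits
    return supported

# Hypothetical ORF translations and peptide identifications
predicted = {"orf1": "MKLVEIGSTR", "orf2": "MPQRSTW", "orf3": "MAAGKLLNER"}
peptides = ["LVELGSTR", "AAGK", "QQQQ"]
print(supported_orfs(predicted, peptides))
# {'orf1': ['LVELGSTR'], 'orf3': ['AAGK']}
```

Raising `min_peptides` trades sensitivity for confidence: it demands more independent peptide evidence before a gene model is accepted.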
... In the following decade, estimates based on the consensus human genome revised this number to ~26,000 protein-coding genes, which has since shrunk to ~20,000 upon refined analysis of the complete genome [30,31]. At the time, this low number of human genes was unexpected when compared with the ~6000 genes that had been predicted from the genome of Saccharomyces cerevisiae (yeast) [32]. The question then was how this relatively small difference in the number of protein-coding genes could account for the enormous gap in complexity between yeast and humans. ...
Article
In Eukarya, immature mRNA transcripts (pre-mRNA) often contain coding sequences, or exons, interleaved with non-coding sequences, or introns. Introns are removed by splicing, and further regulation of the retained exons leads to alternatively spliced mRNA. The splicing reaction requires the stepwise assembly of the spliceosome, a macromolecular machine composed of small nuclear ribonucleoproteins (snRNPs). This review focuses on the early stage of spliceosome assembly, when U1 snRNP defines each intron 5′ splice site (5′ss) in the pre-mRNA. We first introduce the splicing reaction and the impact of alternative splicing on the regulation of gene expression. Thereafter, we extensively discuss splicing descriptors that influence 5′ss selection by U1 snRNP, such as sequence determinants and interactions mediated by U1-specific proteins or U1 small nuclear RNA (U1 snRNA). We also include examples of diseases that affect 5′ss selection by U1 snRNP, and discuss recent therapeutic advances that manipulate U1 snRNP 5′ss selectivity with antisense oligonucleotides and small-molecule splicing switches.
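As a crude illustration of 5′ss sequence determinants, a 9-nt site (exon positions −3 to −1, intron positions +1 to +6) can be compared with the consensus CAG|GUAAGU, which is the sequence fully complementary to the 5′ end of U1 snRNA. This is a minimal sketch. A plain match count is only a proxy for U1 base-pairing potential and ignores the alternative registers, bulges and protein contributions the review discusses. It is not a published scoring model, and the function names are hypothetical.

```python
# Consensus 5' splice site from exon position -3 to intron position +6;
# CAG|GUAAGU is fully complementary to the 5' end of U1 snRNA.
CONSENSUS_5SS = "CAGGUAAGU"

def consensus_matches(site):
    """Count positions of a 9-nt 5'ss (-3..+6) that match the consensus (DNA or RNA input)."""
    site = site.upper().replace("T", "U")
    if len(site) != len(CONSENSUS_5SS):
        raise ValueError("expected 9 nt spanning positions -3 to +6")
    return sum(a == b for a, b in zip(site, CONSENSUS_5SS))

def has_gu_donor(site):
    """True if intron positions +1 and +2 carry the canonical GU dinucleotide."""
    return site.upper().replace("T", "U")[3:5] == "GU"

print(consensus_matches("CAGGUAAGU"))  # 9
print(consensus_matches("AAGGTGAGT"), has_gu_donor("AAGGTGAGT"))  # 7 True
```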
Article
Next-generation sequencing (NGS) technologies have greatly expanded the size of the known transcriptome. Many newly discovered transcripts are classified as long noncoding RNAs (lncRNAs), which are assumed to affect phenotype through sequence and structure rather than via translated protein products, even though the vast majority of them harbor short open reading frames (sORFs). Recent advances have demonstrated that the noncoding designation is incorrect in many cases and that sORF-encoded peptides (SEPs) translated from these transcripts are important contributors to diverse biological processes. Interest in SEPs is at an early stage, and there is evidence for the existence of thousands of SEPs that remain unstudied. We hope to pique interest in investigating this unexplored proteome by providing a discussion of SEP characterization generally and describing specific discoveries in innate immunity.
Article
The manufacturing of sherry wines is a unique, carefully regulated process, from harvesting to quality control of the finished product, involving dynamic biological aging in a “criadera-solera” system or other techniques. Specialized “flor” strains of the yeast Saccharomyces cerevisiae play the central role in the sherry manufacturing process. As a result, sherry wines have a characteristic and unique chemical composition that determines their organoleptic properties (such as color, odor, and taste) and distinguishes them from all other types of wine. The use of modern methods of genetics and biotechnology contributes to a deep understanding of the microbiology of sherry production and allows us to define a new methodology for breeding valuable flor strains. This review discusses the main sherry-producing regions and the chemical composition of sherry wines, as well as genetic, oenological, and other selective markers for flor strains that can be used to screen environmental isolates for novel candidates promising for sherry production.
Article
Tetrahydropapaverine (THP) and papaverine are plant natural products with clinically significant roles. THP is a precursor in the production of the drugs atracurium and cisatracurium, and papaverine is used as an antispasmodic during vascular surgery. In recent years, metabolic engineering advances have enabled the production of natural products through heterologous expression of pathway enzymes in yeast. Heterologous biosynthesis of THP and papaverine could play a role in ensuring a stable supply of these clinically significant products. Biosynthesis of THP and papaverine has not been achieved to date, in part because multiple pathway enzymes have not been elucidated. Here, we describe the development of an engineered yeast strain for de novo biosynthesis of THP. The production of THP is achieved through heterologous expression of two enzyme variants with activity on nonnative substrates. Through protein engineering, we developed a variant of N-methylcoclaurine hydroxylase with activity on coclaurine, enabling de novo norreticuline biosynthesis. Similarly, we developed a variant of scoulerine 9-O-methyltransferase capable of O-methylating 1-benzylisoquinoline alkaloids at the 3′ position, enabling de novo THP biosynthesis. Flux through the heterologous pathway was improved by knocking out yeast multidrug resistance transporters and optimizing media conditions. Overall, strain engineering increased the concentration of biosynthesized THP 600-fold to 121 µg/L. Finally, we demonstrate a strategy for papaverine semisynthesis using hydrogen peroxide as an oxidizing agent. By optimizing pH, temperature, reaction time, and oxidizing agent concentration, we demonstrated the ability to produce semisynthesized papaverine through oxidation of biosynthesized THP.
Article
Leukodystrophies are a broad spectrum of neurological disorders that are characterized primarily by deficiencies in myelin formation. Clinical manifestations of leukodystrophies usually appear during childhood, and common symptoms include lack of motor coordination, difficulty with or loss of ambulation, issues with vision and/or hearing, cognitive decline, regression in speech skills, and even seizures. Many cases of leukodystrophy can be attributed to genetic mutations, but they have diverse inheritance patterns (e.g., autosomal recessive, autosomal dominant, or X-linked), and some arise from de novo mutations. In this review, we provide an updated overview of 35 types of leukodystrophies and focus on cellular mechanisms that may underlie these disorders. We find common themes in specialized functions of oligodendrocytes, which are dedicated producers of membranes and myelin lipids. These mechanisms include myelin protein defects, lipid processing and peroxisome dysfunction, transcriptional and translational dysregulation, disruptions in cytoskeletal organization, and cell junction defects. In addition, non-cell-autonomous factors in astrocytes and microglia, such as autoimmune reactivity and intercellular communication, may also play a role in leukodystrophy onset. We hope that highlighting these themes in cellular dysfunction in leukodystrophies may yield conceptual insights for future therapeutic approaches.
Article
Data-independent acquisition (DIA) methods have become increasingly attractive in mass spectrometry (MS)-based proteomics because they enable high data completeness and a wide dynamic range. Recently, we combined DIA with parallel accumulation – serial fragmentation (dia-PASEF) on a Bruker trapped ion mobility separated (TIMS) quadrupole time-of-flight (TOF) mass spectrometer. This requires alignment of the ion mobility separation with the downstream mass-selective quadrupole, leading to a more complex scheme for dia-PASEF window placement compared to DIA. To achieve high data completeness and deep proteome coverage, here we employ variable isolation windows that are placed optimally according to precursor density in the m/z and ion mobility plane. This Automatic Isolation Design procedure is implemented in the freely available py_diAID package. In combination with in-depth project-specific proteomics libraries and the Evosep LC system, we reproducibly identified over 7,700 proteins in a human cancer cell line in 44 minutes with quadruplicate single-shot injections at high sensitivity. Even at a throughput of 100 samples per day (11-minute LC gradients), we consistently quantified more than 6,000 proteins in mammalian cell lysates by injecting four replicates. We found that optimal dia-PASEF window placement facilitates in-depth phosphoproteomics with very high sensitivity, quantifying more than 35,000 phosphosites in a human cancer cell line stimulated with epidermal growth factor (EGF) in triplicate 21-minute runs. This covers a substantial part of the regulated phosphoproteome with high sensitivity, opening the way to extensive systems-biological studies.
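The idea of density-dependent window placement can be shown in one dimension. Window edges are set at quantiles of the precursor m/z distribution, so each isolation window holds roughly the same number of precursors, and windows narrow where precursors are dense. This is a simplified sketch along m/z only. py_diAID optimises placement jointly in m/z and ion mobility, and this is neither its algorithm nor its API. The function name and precursor list are hypothetical.

```python
import statistics

def equal_density_windows(precursor_mz, n_windows):
    """Split the precursor m/z range into n_windows contiguous isolation windows
    whose edges are quantiles, so each window holds ~len(precursor_mz)/n_windows precursors."""
    if n_windows < 1 or len(precursor_mz) < 2:
        raise ValueError("need n_windows >= 1 and at least two precursors")
    mz = sorted(precursor_mz)
    cuts = statistics.quantiles(mz, n=n_windows, method="inclusive")
    edges = [mz[0], *cuts, mz[-1]]
    return list(zip(edges[:-1], edges[1:]))

# Hypothetical precursor list: dense between m/z 400 and 600, sparse above 600
precursors = [400 + i for i in range(200)] + [600 + 5 * i for i in range(120)]
for lo, hi in equal_density_windows(precursors, 4):
    print(f"{lo:8.2f} - {hi:8.2f}  width {hi - lo:7.2f}")
```

Windows in the dense low-m/z region come out narrower than those in the sparse high-m/z region, which is the behaviour variable-window schemes aim for.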
Article
In the yeast Saccharomyces cerevisiae, trehalose-6-phosphate synthase (Tps1) and trehalose-6-phosphate phosphatase (Tps2) are the main proteins catalyzing intracellular trehalose production. In addition to Tps1 and Tps2, two putative regulatory proteins with less clearly defined roles, Tps3 and Tsl1, also appear to be involved in trehalose production. While this pathway has been extensively studied in laboratory strains of S. cerevisiae, we sought to examine the phenotypic consequences of disrupting these genes in wild strains. Here we deleted the TPS1, TPS2, TPS3 and TSL1 genes in four wild strains and one laboratory strain for comparison. Although some tested phenotypes were not shared by all strains, deletion of TPS1 abolished intracellular trehalose, caused an inability to grow on fermentable carbon sources and resulted in severe sporulation deficiency in all five strains. Analysis of tps1 mutant strains expressing catalytically inactive variants of Tps1 indicates that Tps1, independent of trehalose production, is a key component for yeast survival in response to heat stress, for regulating sporulation, and for growth on fermentable sugars. All tps2Δ mutants exhibited growth impairment on non-fermentable carbon sources, whereas variations were observed in trehalose synthesis, thermosensitivity and sporulation efficiency. tps3Δ and tsl1Δ mutants exhibited mild or no phenotypic disparity from their isogenic wild type, although tps3Δ tsl1Δ double mutants showed a 17% to 45% decrease in intracellular trehalose production across all five strains. Altogether, we evaluated, confirmed, and expanded the phenotypic characteristics associated with trehalose biosynthesis mutants. We also identified natural phenotypic variants in multiple strains that could be used to genetically dissect the basis of these traits and then develop mechanistic models connecting trehalose metabolism to diverse cellular processes.
Article
Eukaryotic genomes vary in terms of size, chromosome number, and genetic complexity. Their temporal organization is complex, reflecting coordination between DNA folding and function. Here, we used fused karyotypes of budding yeast to characterize the effects of chromosome length on nuclear architecture. We found that size-matched megachromosomes expand to occupy a larger fraction of the enlarged nucleus. Hi-C maps reveal changes in the three-dimensional structure corresponding to inactivated centromeres and telomeres. De-clustering of inactive centromeres results in their loss of early replication, highlighting a functional correlation between genome organization and replication timing. Repositioning of former telomere-proximal regions on chromosome arms exposed a subset of contacts between flocculin genes. Chromatin reorganization of megachromosomes during cell division remained unperturbed, and it revealed that centromere-rDNA contacts in anaphase, which extend over 0.3 Mb on wild-type chromosomes, cannot exceed ∼1.7 Mb. Our results highlight the relevance of engineered karyotypes to unveiling relationships between genome organization and function.
Article
The entire DNA sequence of chromosome III of the yeast Saccharomyces cerevisiae has been determined. This is the first complete sequence analysis of an entire chromosome from any organism. The 315-kilobase sequence reveals 182 open reading frames for proteins longer than 100 amino acids, of which 37 correspond to known genes and 29 more show some similarity to sequences in databases. Of 55 new open reading frames analysed by gene disruption, three are essential genes; of 42 non-essential genes that were tested, 14 show some discernible effect on phenotype and the remaining 28 have no overt function.
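The ORF count above rests on a simple criterion: open reading frames encoding proteins longer than 100 amino acids. A minimal six-frame scan under simplifying assumptions is sketched below. It assumes ATG starts and the standard TAA/TAG/TGA stops, reports only the longest ORF ending at each stop, and ignores introns. It is not the pipeline used for chromosome III, and the function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}

def revcomp(seq):
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq, min_aa=100):
    """Return (strand, frame, start, end) for ATG-initiated ORFs encoding more than
    min_aa residues. Coordinates are 0-based on that strand's sequence (the reverse
    complement for '-'); end is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG gives the longest ORF for this stop
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 > min_aa:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs

# Hypothetical test sequence: ATG + 150 Ala codons + stop (151 residues)
seq = "ATG" + "GCT" * 150 + "TAA"
print(find_orfs(seq))  # [('+', 0, 0, 456)]
```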

Article
A physical map of the Saccharomyces cerevisiae genome is presented. It was derived by mapping the sites for two restriction endonucleases, SfiI and NotI, each of which recognizes an 8-bp sequence. DNA-DNA hybridization probes for genetically mapped genes and probes that span particular SfiI and NotI sites were used to construct a map that contains 131 physical landmarks: 32 chromosome ends, 61 SfiI sites and 38 NotI sites. These landmarks are distributed throughout the non-rDNA component of the yeast genome, which comprises 12.5 Mbp of DNA. The physical map suggests that those genes that can be detected and mapped by standard genetic methods are distributed rather uniformly over the full physical extent of the yeast genome. The map has immediate applications to the mapping of genes for which single-copy DNA-DNA hybridization probes are available.
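The map above was built experimentally, by hybridization to separated restriction fragments. Given a sequence, the same landmarks can be located in silico. The sketch below finds NotI (GC^GGCCGC) and SfiI (GGCCNNNN^NGGCC) sites, whose eight specified bases match the "8-bp" description, and returns fragment lengths for a linear molecule. Both sites are palindromic, so scanning one strand suffices. The function name and test sequence are hypothetical.

```python
import re

# Recognition patterns (lookahead allows overlapping matches) and top-strand cut offsets
ENZYMES = {
    "NotI": (re.compile(r"(?=GCGGCCGC)"), 2),            # GC^GGCCGC
    "SfiI": (re.compile(r"(?=GGCC[ACGT]{5}GGCC)"), 8),   # GGCCNNNN^NGGCC
}

def digest(seq, enzymes=("NotI", "SfiI")):
    """Fragment lengths (bp) of a linear DNA molecule cut by the given enzymes."""
    seq = seq.upper()
    cuts = sorted({m.start() + ENZYMES[e][1]
                   for e in enzymes
                   for m in ENZYMES[e][0].finditer(seq)})
    bounds = [0, *cuts, len(seq)]
    return [b - a for a, b in zip(bounds[:-1], bounds[1:])]

# Hypothetical 371-bp test molecule with one NotI and one SfiI site
seq = "A" * 100 + "GCGGCCGC" + "T" * 200 + "GGCCAAAAAGGCC" + "C" * 50
print(digest(seq))            # [102, 214, 55]
print(digest(seq, ["NotI"]))  # [102, 269]
```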
Article
We have constructed a genealogy of strain S288C, from which many of the mutant and segregant strains currently used in studies on the genetics and molecular biology of Saccharomyces cerevisiae have been derived. We have determined that its six progenitor strains were EM93, EM126, NRRL YB-210 and the three baking strains Yeast Foam, FLD and LK. We have estimated that approximately 88% of the gene pool of S288C is contributed by strain EM93. The principal ancestral genotypes were those of segregant strains EM93-1C and EM93-3B, initially distributed by C. C. Lindegren to several laboratories. We have analyzed an isolate of a lyophilized culture of strain EM93 and determined its genotype as MATa/MATα SUC2/SUC2 GAL2/gal2 MAL/MAL mel/mel CUP1/cup1 FLO1/flo1. Strain EM93 is therefore the probable origin of genes SUC2, gal2, CUP1 and flo1 of S288C. We give details of the current availability of several of the progenitor strains and propose that this genealogy should be of assistance in elucidating the origins of several types of genetic and molecular heterogeneities in Saccharomyces.
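The source does not say how the 88% figure was derived. One simple way to estimate an ancestor's expected share of a strain's gene pool is to walk the pedigree, halving the contribution at each cross and leaving it unchanged through segregation (a single spore of a diploid inherits, in expectation, that diploid's ancestry). The sketch below implements that assumption. It ignores selection and linkage, and the pedigree and strain names are invented.

```python
def ancestral_fraction(strain, ancestor, parents, _memo=None):
    """Expected fraction of `strain`'s genome derived from `ancestor`.

    `parents` maps each derived strain to a tuple of parents: two for a cross,
    one for a segregant. Strains absent from `parents` are founders.
    """
    memo = {} if _memo is None else _memo
    if strain == ancestor:
        return 1.0
    if strain not in parents:
        return 0.0  # a founder other than the ancestor contributes nothing
    if strain not in memo:
        ps = parents[strain]
        memo[strain] = sum(ancestral_fraction(p, ancestor, parents, memo) for p in ps) / len(ps)
    return memo[strain]

# Hypothetical pedigree (not the actual S288C genealogy)
pedigree = {
    "HYB1": ("FOUNDER_A", "FOUNDER_B"),   # cross
    "HYB1-seg": ("HYB1",),                # segregant of HYB1
    "HYB2": ("HYB1-seg", "FOUNDER_A"),    # backcross to FOUNDER_A
    "LAB": ("HYB2", "FOUNDER_C"),         # outcross
}
print(ancestral_fraction("LAB", "FOUNDER_A", pedigree))  # 0.375
```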
Article
The opportunistic fungal pathogen, Candida albicans, is diploid as usually isolated and has no apparent sexual cycle. Genetic analysis has therefore been very difficult. Molecular genetics has yielded important information in the past few years, but it too is hampered by the lack of a good genetic map. Using the well-characterized strain 1006 and strain WO-1, which undergoes the white-opaque phenotypic transition, we have developed a genomic restriction map of C. albicans with the enzyme SfiI. There are approximately 34 SfiI restriction sites in the C. albicans genome. Restriction fragments were separated by pulsed-field electrophoresis and were assigned to chromosomes by hybridization of complete and partial digests with known chromosome-specific probes as well as by digestion of isolated chromosomes. Telomeric fragments were identified by hybridization with a telomere-specific probe (C. Sadhu, M.J. McEachern, E.P. Rustchenko-Bulgac, J. Schmid, D.R. Soll, and J.B. Hicks, J. Bacteriol. 173:842-850, 1991). WO-1 differs from 1006 in that it has undergone three reciprocal chromosomal translocations. Analysis of the translocation products indicates that each translocation has occurred at or near an SfiI site; thus, the SfiI fragments from the two strains are similar or identical. The tendency for translocation to occur at or near SfiI sites may be related to the repeated sequence RPS 1, which contains four such sites and could provide homology for ectopic pairing and crossing over. The genome size of both strains is about 16 to 17 megabases, in good agreement with previous determinations.
Article
Separation and identification of proteins by two-dimensional (2-D) electrophoresis can be used for protein-based gene expression analysis. In this report, single protein spots from polyvinylidene difluoride blots of micropreparative E. coli 2-D gels were rapidly and economically identified by matching their amino acid composition, estimated pI and molecular weight against all E. coli entries in the SWISS-PROT database. Thirty proteins from an E. coli 2-D map were analyzed and identities assigned. Three of the proteins were unknown. As verified by protein sequencing, 20 of the 27 proteins were correctly identified. Importantly, correct identifications showed unambiguous "correct" score patterns. Incorrect identifications also showed distinctive score patterns, indicating that those proteins must be identified by other means. These techniques allow large-scale screening of the protein complement of simple organisms, or of tissues in normal and disease states. The computer program described here is accessible via the World Wide Web at URL address (http://expasy.hcuge.ch/).
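The matching step can be sketched as follows. Database entries whose pI and molecular weight fall within tolerances of the gel estimates are ranked by the sum of squared differences between the measured and theoretical amino acid compositions (in percent). The scoring function, tolerances and entries are assumptions for illustration. The exact score used by the ExPASy program is not specified here, and the sketch ignores residues that acid hydrolysis destroys or converts.

```python
from collections import Counter

def composition(seq, residues):
    """Percent composition of `seq` over the given residue types."""
    counts = Counter(aa for aa in seq.upper() if aa in residues)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains none of the scored residues")
    return {aa: 100.0 * counts[aa] / total for aa in residues}

def rank_candidates(measured, candidates, pi, mw, pi_tol=0.5, mw_tol=0.2):
    """Rank entries by squared-difference composition score (lower is better),
    keeping only those within pi_tol pH units and mw_tol (fraction) of the measured MW."""
    residues = sorted(measured)
    scored = []
    for name, (seq, c_pi, c_mw) in candidates.items():
        if abs(c_pi - pi) > pi_tol or abs(c_mw - mw) > mw_tol * mw:
            continue
        comp = composition(seq, residues)
        score = sum((measured[aa] - comp[aa]) ** 2 for aa in residues)
        scored.append((score, name))
    return sorted(scored)

# Hypothetical entries: (sequence, theoretical pI, MW in Da); sequences are toy fragments
candidates = {
    "protA": ("AAAAGGGGLL", 6.0, 25000),
    "protB": ("GGGGGGLLLL", 6.2, 26000),
    "protC": ("AAAAGGGGLL", 9.5, 25000),   # excluded by the pI filter
}
measured = {"A": 40.0, "G": 40.0, "L": 20.0}
print(rank_candidates(measured, candidates, pi=6.1, mw=25500))
# [(0.0, 'protA'), (2400.0, 'protB')]
```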
Article
Yeast chromosome ends are similar in structure and function to chromosome ends in most, if not all, eukaryotic organisms. There is a G-rich terminal repeat at the ends which is maintained by telomerase. In addition to the classical functions of protecting the end from degradation and end-to-end fusions, and completing replication, yeast telomeres have several interesting properties including: non-nucleosomal chromatin structure; transcriptional position effect variegation for genes adjacent to telomeres; nuclear peripheral localization; apparent physical clustering; non-random recombinational interactions. A number of genes have been identified that are involved in modifying one or more of these properties. These include genes involved in general DNA metabolism, chromatin structure and telomere maintenance. Adjacent to the terminal repeat is a mosaic of middle repetitive elements that exhibit a great deal of polymorphism both between individual strains and among different chromosome ends. Much of the sequence redundancy in the yeast genome is found in the sub-telomeric regions (within the last 25 kb of each end). The sub-telomeric regions are generally low in gene density, low in transcription, low in recombination, and they are late replicating. The only element which appears to be shared by all chromosome ends is part of the previously defined X element containing an ARS consensus. Most of the ‘core’ X elements also contain an Abf1p binding site and a URS1-like element, which may have consequences for the chromatin structure, nuclear architecture and transcription of native telomeres. Possible functions of sub-telomeric repeats include: fillers for increasing chromosome size to some minimum threshold level necessary for chromosome stability; barrier against transcriptional silencing; a suitable region for adaptive amplification of genes; secondary mechanism of telomere maintenance via recombination when telomerase activity is absent.
Article
We have tested the clones used in the European Yeast Chromosome III Sequencing Programme for possible artefacts that might have been introduced during cloning or passage through Escherichia coli. Southern analysis was performed to compare the BamHI, EcoRI, HindIII and PstI restriction pattern for each clone with that of the corresponding locus on chromosome III in the parental yeast strain. In addition, further enzymes were used to compare the restriction maps of most clones with the map predicted by the nucleotide sequence (Oliver et al., 1992). Only four of 506 6-bp restriction sites predicted by the sequence were not observed experimentally. No significant cloning artefacts appear to disrupt the published sequence of chromosome III. The restriction patterns of six yeast strains have also been compared. In addition to two previously identified sites of Ty integration on chromosome III (Warmington et al., 1986; Stucka et al., 1989; Newlon et al., 1991), a new polymorphic site involving Ty retrotransposition (the Far Right-Arm transposition Hot-Spot, FRAHS) has been identified close to CRY1. On the basis of simple restriction polymorphisms, the strains S288C, AB972 and W303-1b are closely related, while XJ24-24a and J178 are more distant relatives of S288C. A polyploid distillery yeast is heterozygous for many polymorphisms, particularly on the right arm of the chromosome.
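Relatedness based on "simple restriction polymorphisms" can be illustrated by scoring each strain for the presence (1) or absence (0) of the same set of restriction sites and computing the fraction of sites at which two strains differ. This is a minimal sketch with invented profiles and hypothetical strain names, not the authors' procedure. A binary profile also cannot represent heterozygosity, such as that seen in the polyploid distillery strain.

```python
from itertools import combinations

def mismatch_distance(a, b):
    """Fraction of scored restriction sites whose presence/absence differs between two strains."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same sites")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def pairwise_distances(profiles):
    """Mismatch distance for every pair of strains, keyed by (strain, strain)."""
    return {(s, t): mismatch_distance(profiles[s], profiles[t])
            for s, t in combinations(sorted(profiles), 2)}

# Hypothetical presence/absence profiles over eight restriction sites
profiles = {
    "strain_1": [1, 1, 0, 1, 0, 1, 1, 0],
    "strain_2": [1, 1, 0, 1, 0, 1, 1, 1],
    "strain_3": [0, 1, 1, 0, 0, 1, 0, 1],
}
for pair, d in pairwise_distances(profiles).items():
    print(pair, d)
# ('strain_1', 'strain_2') 0.125
# ('strain_1', 'strain_3') 0.625
# ('strain_2', 'strain_3') 0.5
```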