ABSTRACT: Phenotypes that might otherwise reveal a gene's function can be obscured by genes with overlapping function. This phenomenon is best known within gene families, in which an important shared function may only be revealed by mutating all family members. Here we describe the 'green monster' technology that enables precise deletion of many genes. In this method, a population of deletion strains, with each deletion marked by an inducible green fluorescent protein reporter gene, is subjected to repeated rounds of mating, meiosis and flow-cytometric enrichment. This results in the aggregation of multiple deletion loci in single cells. The green monster strategy is potentially applicable to assembling other engineered alterations in any species with sex or alternative means of allelic assortment. To test the technology, we generated a single broadly drug-sensitive strain of Saccharomyces cerevisiae bearing precise deletions of all 16 ATP-binding cassette transporters within clades associated with multidrug resistance.
ABSTRACT: Learning the function of genes is a major goal of computational genomics. Methods for inferring gene function have typically fallen into two categories: 'guilt-by-profiling', which exploits correlation between function and other gene characteristics; and 'guilt-by-association', which transfers function from one gene to another via biological relationships.
We have developed a strategy ('Funckenstein') that performs guilt-by-profiling and guilt-by-association and combines the results. Using a benchmark set of functional categories and input data for protein-coding genes in Saccharomyces cerevisiae, Funckenstein was compared with a previous combined strategy. Subsequently, we applied Funckenstein to 2,455 Gene Ontology terms. In the process, we developed 2,455 guilt-by-profiling classifiers based on 8,848 gene characteristics and 12 functional linkage graphs based on 23 biological relationships.
Funckenstein outperforms a previous combined strategy using a common benchmark dataset. The combination of 'guilt-by-profiling' and 'guilt-by-association' gave significant improvement over the component classifiers, showing the greatest synergy for the most specific functions. Performance was evaluated by cross-validation and by literature examination of the top-scoring novel predictions. These quantitative predictions should help prioritize experimental study of yeast gene functions.
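As a minimal illustration of the 'guilt-by-association' idea (not the Funckenstein implementation), the Python sketch below scores each gene by the weighted fraction of its linked neighbors already annotated with a function, then blends that score with a guilt-by-profiling score. The functions `guilt_by_association` and `combine`, the weighted edge-list input, and the linear blend with weight `alpha` are hypothetical choices made for illustration only.

```python
from collections import defaultdict


def guilt_by_association(edges, annotated):
    """Score each gene by the weighted fraction of its neighbors that are
    already annotated with the function (simple neighbor voting).

    edges: iterable of (gene_a, gene_b, weight) functional linkages.
    annotated: set of genes known to have the function.
    """
    total = defaultdict(float)     # sum of linkage weights per gene
    positive = defaultdict(float)  # weight contributed by annotated neighbors
    for a, b, w in edges:
        for gene, neighbor in ((a, b), (b, a)):
            total[gene] += w
            if neighbor in annotated:
                positive[gene] += w
    return {g: positive[g] / total[g] for g in total if total[g] > 0}


def combine(profile_score, assoc_score, alpha=0.5):
    """Hypothetical linear blend of a guilt-by-profiling score and a
    guilt-by-association score; alpha weights the profiling score."""
    return alpha * profile_score + (1 - alpha) * assoc_score
```

For example, with linkages `("YFG1", "YFG2", 1.0)`, `("YFG1", "YFG3", 0.5)` and `("YFG2", "YFG3", 1.0)` and only `YFG2` annotated, `YFG1` and `YFG3` each score 2/3. A learned combiner would normally replace the fixed `alpha`.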
ABSTRACT: Understanding how individual proteins are organized into complexes and pathways is a significant current challenge. We introduce new algorithms to infer protein complexes by combining seed proteins with a confidence-weighted network. Two new stochastic methods average over a probabilistic ensemble of networks, and a new deterministic method ranks prospective complex members. We compare the performance of these algorithms with that of three existing algorithms. We test algorithm performance using three weighted graphs: a naïve Bayes estimate of the probability of a direct and stable protein-protein interaction; a logistic regression estimate of the probability of a direct or indirect interaction; and a decision tree estimate of whether two proteins exist within a common protein complex. The best-performing algorithms in these trials are the new stochastic methods. The deterministic algorithm is significantly faster, whereas the stochastic algorithms are less sensitive to the weighting scheme.
Systems Biology and Computational Proteomics, Joint RECOMB 2006 Satellite Workshops on Systems Biology and on Computational Proteomics, San Diego, CA, USA, December 1-3, 2006, Revised Selected Papers; 01/2006
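One plausible reading of "averaging over a probabilistic ensemble of networks" is a Monte Carlo estimate: treat each edge's confidence weight as the probability that the edge exists, sample many networks, and record how often each candidate protein is connected to the seed set. The Python sketch below implements that idea; it is an illustration under this assumption, not the authors' algorithm, and the function name `ensemble_membership` and its parameters are hypothetical.

```python
import random


def ensemble_membership(weighted_edges, seeds, n_samples=1000, rng_seed=0):
    """Estimate, for each non-seed protein, the probability that it is
    connected to the seed set when every edge is kept independently with
    probability equal to its confidence weight."""
    rng = random.Random(rng_seed)
    nodes = {u for u, _, _ in weighted_edges} | {v for _, v, _ in weighted_edges}
    hits = {n: 0 for n in nodes if n not in seeds}
    for _ in range(n_samples):
        # Sample one network from the probabilistic ensemble.
        adj = {n: [] for n in nodes}
        for u, v, p in weighted_edges:
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
        # Graph traversal outward from the seeds in the sampled network.
        reached = {s for s in seeds if s in adj}
        stack = list(reached)
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in reached:
                    reached.add(nb)
                    stack.append(nb)
        for n in hits:
            if n in reached:
                hits[n] += 1
    # Fraction of sampled networks in which each protein joined the seeds.
    return {n: c / n_samples for n, c in hits.items()}
```

Proteins are then ranked by the returned fraction. A deterministic alternative would score candidates directly from the weights without sampling, trading this sensitivity analysis for speed.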
ABSTRACT: Systematic mapping of protein-protein interactions, or 'interactome' mapping, was initiated in model organisms, starting with defined biological processes and then expanding to the scale of the proteome. Although far from complete, such maps have revealed global topological and dynamic features of interactome networks that relate to known biological properties, suggesting that a human interactome map will provide insight into development and disease mechanisms at a systems level. Here we describe an initial version of a proteome-scale map of human binary protein-protein interactions. Using a stringent, high-throughput yeast two-hybrid system, we tested pairwise interactions among the products of approximately 8,100 currently available Gateway-cloned open reading frames and detected approximately 2,800 interactions. This data set, called CCSB-HI1, has a verification rate of approximately 78% as revealed by an independent co-affinity purification assay, and correlates significantly with other biological attributes. The CCSB-HI1 data set increases by approximately 70% the set of available binary interactions within the tested space and reveals more than 300 new connections to over 100 disease-associated proteins. This work represents an important step towards a systematic and comprehensive human interactome project.
ABSTRACT: Biochemists and geneticists, represented by Doug and Bill in classic essays, have long debated the merits of their methods. We revisited this issue using genomic data from the budding yeast, Saccharomyces cerevisiae, and found that genetic interactions outperformed protein interactions in predicting functional relationships between genes. However, when combined, these interaction types yielded superior performance, convincing Doug and Bill to call a truce.
Trends in Genetics 09/2005; 21(8):424-7. · 9.77 Impact Factor
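A standard way to combine two evidence types, assuming they are conditionally independent, is naïve Bayes: multiply the prior odds that two genes are functionally related by one likelihood ratio per evidence type. The Python sketch below shows that arithmetic; it is not necessarily the combination method used in this study, and the prior and likelihood-ratio values in the example are hypothetical.

```python
def posterior_probability(prior, likelihood_ratios):
    """Combine independent evidence naive-Bayes style: multiply the prior
    odds by each likelihood ratio, then convert odds back to a probability."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)
```

With a hypothetical prior of 0.01 and likelihood ratios of 20 (genetic interaction) and 5 (protein interaction), `posterior_probability(0.01, [20, 5])` gives about 0.50, whereas the genetic evidence alone gives about 0.17. This illustrates how two individually modest signals can reinforce each other.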
ABSTRACT: Large-scale studies have revealed networks of various biological interaction types, such as protein-protein interaction, genetic interaction, transcriptional regulation, sequence homology, and expression correlation. Recurring patterns of interconnection, or 'network motifs', have revealed biological insights for networks containing either one or two types of interaction.
To study more complex relationships involving multiple biological interaction types, we assembled an integrated Saccharomyces cerevisiae network in which nodes represent genes (or their protein products) and differently colored links represent the aforementioned five biological interaction types. We examined three- and four-node interconnection patterns containing multiple interaction types and found many enriched multi-color network motifs. Furthermore, we showed that most of the motifs form 'network themes' -- classes of higher-order recurring interconnection patterns that encompass multiple occurrences of network motifs. Network themes can be tied to specific biological phenomena and may represent more fundamental network design principles. Examples of network themes include a pair of protein complexes with many inter-complex genetic interactions -- the 'compensatory complexes' theme. Thematic maps -- networks rendered in terms of such themes -- can simplify an otherwise confusing tangle of biological relationships. We show this by mapping the S. cerevisiae network in terms of two specific network themes.
Significantly enriched motifs in an integrated S. cerevisiae interaction network are often signatures of network themes, higher-order network structures that correspond to biological phenomena. Representing networks in terms of network themes provides a useful simplification of complex biological relationships.
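The first step in finding multi-color motifs is counting each colored pattern. The Python sketch below counts three-node triangles in a network whose links carry an interaction-type label, keyed by the multiset of the three link colors. It assumes a single color per gene pair, which is a simplification, and it omits the comparison against randomized networks that an enrichment test requires. The function name and color labels are illustrative.

```python
from collections import Counter
from itertools import combinations


def colored_triangle_counts(colored_edges):
    """Count triangles in a multi-color network, keyed by the sorted tuple
    of their three link colors (one color per node pair assumed)."""
    color = {}
    adj = {}
    for u, v, c in colored_edges:
        color[frozenset((u, v))] = c
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = Counter()
    for a in adj:
        for b, c in combinations(sorted(adj[a]), 2):
            # Count each triangle exactly once, from its smallest node.
            if a < b and c in adj[b]:
                key = tuple(sorted((color[frozenset((a, b))],
                                    color[frozenset((a, c))],
                                    color[frozenset((b, c))])))
                counts[key] += 1
    return counts
```

For example, two physically interacting proteins that both have synthetic-lethal links to a third gene appear under the key `("PPI", "PPI", "SSL")`. Repeated patterns like this one are the building blocks of themes such as 'compensatory complexes'.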
ABSTRACT: Genetic interactions define overlapping functions and compensatory pathways. In particular, synthetic sick or lethal (SSL) genetic interactions are important for understanding how an organism tolerates random mutation, i.e., genetic robustness. Comprehensive identification of SSL relationships remains far from complete in any organism, because mapping these networks is highly labor intensive. The ability to predict SSL interactions, however, could efficiently guide further SSL discovery. Toward this end, we predicted pairs of SSL genes in Saccharomyces cerevisiae by using probabilistic decision trees to integrate multiple types of data, including localization, mRNA expression, physical interaction, protein function, and characteristics of network topology. Experimental evidence demonstrated the reliability of this strategy, which, when extended to human SSL interactions, may prove valuable in discovering drug targets for cancer therapy and in identifying genes responsible for multigenic diseases.
Proceedings of the National Academy of Sciences 12/2004; 101(44):15682-7. · 9.81 Impact Factor
ABSTRACT: In apparently scale-free protein-protein interaction networks, or 'interactome' networks, most proteins interact with few partners, whereas a small but significant proportion of proteins, the 'hubs', interact with many partners. Both biological and non-biological scale-free networks are particularly resistant to random node removal but are extremely sensitive to the targeted removal of hubs. A link between the potential scale-free topology of interactome networks and genetic robustness seems to exist, because knockouts of yeast genes encoding hubs are approximately threefold more likely to confer lethality than those of non-hubs. Here we investigate how hubs might contribute to robustness and other cellular properties for protein-protein interactions dynamically regulated both in time and in space. We uncovered two types of hub: 'party' hubs, which interact with most of their partners simultaneously, and 'date' hubs, which bind their different partners at different times or locations. Both in silico studies of network connectivity and genetic interactions described in vivo support a model of organized modularity in which date hubs organize the proteome, connecting biological processes--or modules--to each other, whereas party hubs function inside modules.
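Whether a hub meets its partners "simultaneously" cannot be observed directly in an interaction map, so a proxy is needed. The Python sketch below assumes that mRNA co-expression across conditions serves as that proxy: a hub whose average Pearson correlation with its partners exceeds a cutoff is labeled 'party', otherwise 'date'. The proxy, the 0.5 cutoff, and the function names are assumptions for illustration, not values taken from the abstract.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify_hub(hub, partners, expression, threshold=0.5):
    """Label a hub 'party' if it is, on average, co-expressed with its
    partners (a proxy for simultaneous binding), otherwise 'date'.

    expression: dict mapping protein name to its expression profile.
    threshold: hypothetical cutoff on the mean correlation.
    Returns (label, mean correlation).
    """
    avg = sum(pearson(expression[hub], expression[p]) for p in partners) / len(partners)
    return ("party" if avg >= threshold else "date"), avg
```

In practice a cutoff would be chosen from the observed distribution of hub-partner correlations rather than fixed in advance.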
ABSTRACT: Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to know which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS methods have substantial false-positive rates. Aside from high-throughput interaction screens, other gene- or protein-pair characteristics may also be informative of physical interaction. Therefore, it is desirable to integrate multiple datasets and exploit their differing predictive values for more accurate prediction of co-complexed relationships.
Using a supervised machine learning approach, the probabilistic decision tree, we integrated high-throughput protein interaction datasets and other gene- and protein-pair characteristics to predict co-complexed pairs (CCP) of proteins. Our predictions proved more sensitive and specific than predictions based on Y2H or APMS methods alone or in combination. Among the top predictions not annotated as CCPs in our reference set (obtained from the MIPS complex catalogue), a significant fraction was found to physically interact according to a separate database (YPD, Yeast Proteome Database), and the remaining predictions may potentially represent unknown CCPs.
We demonstrated that the probabilistic decision tree approach can be successfully used to predict co-complexed protein (CCP) pairs from other characteristics. Our top-scoring CCP predictions provide testable hypotheses for experimental validation.
ABSTRACT: A genetic interaction network containing ~1000 genes and ~4000 interactions was mapped by crossing mutations in 132 different query genes into a set of ~4700 viable yeast gene-deletion mutants and scoring the double-mutant progeny for fitness defects. Network connectivity was predictive of function because interactions often occurred among functionally related genes, and similar patterns of interactions tended to identify components of the same pathway. The genetic network exhibited dense local neighborhoods; therefore, the position of a gene on a partially mapped network is predictive of other genetic interactions. Because digenic interactions are common in yeast, similar networks may underlie the complex genetics associated with inherited phenotypes in other organisms.
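The observation that "similar patterns of interactions" flag components of the same pathway can be made concrete by comparing each gene's set of genetic-interaction partners. The Python sketch below uses Jaccard similarity as the comparison. This is a simple stand-in chosen for illustration; the study may have used a different measure or clustering procedure, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def profile_similarity(interactions):
    """Pairwise Jaccard similarity of genes' genetic-interaction partner
    sets, given an iterable of (gene_a, gene_b) interaction pairs."""
    partners = {}
    for u, v in interactions:
        partners.setdefault(u, set()).add(v)
        partners.setdefault(v, set()).add(u)
    genes = sorted(partners)
    return {(g, h): jaccard(partners[g], partners[h])
            for i, g in enumerate(genes) for h in genes[i + 1:]}
```

Two query genes that share all of their interaction partners score 1.0 and would be candidate members of the same pathway. Genes with disjoint partner sets score 0.0.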
ABSTRACT: To initiate studies on how protein-protein interaction (or "interactome") networks relate to multicellular functions, we have mapped a large fraction of the Caenorhabditis elegans interactome network. Starting with a subset of metazoan-specific proteins, more than 4000 interactions were identified from high-throughput, yeast two-hybrid (HT-Y2H) screens. Independent coaffinity purification assays experimentally validated the overall quality of this Y2H data set. Together with already described Y2H interactions and interologs predicted in silico, the current version of the Worm Interactome (WI5) map contains approximately 5500 interactions. Topological and biological features of this interactome network, as well as its integration with phenome and transcriptome data sets, lead to numerous biological hypotheses.