Michael Riffle

University of Washington Seattle, Seattle, Washington, United States

Publications (20), 133.28 total impact

  • Daniel Jaschob, Trisha N Davis, Michael Riffle
    ABSTRACT: Sequence feature annotations (e.g., protein domain boundaries, binding sites, and secondary structure predictions) are an essential part of biological research. Annotations are widely used by scientists during research and experimental design, and are frequently the result of biological studies. A generalized and simple means of disseminating and visualizing these data via the web would be of value to the research community. Mason is a web site widget designed to visualize and compare annotated features of one or more nucleotide or protein sequences. Annotated features may be of virtually any type, ranging from annotating transcription binding sites or exons and introns in DNA to secondary structure or domain boundaries in proteins. Mason is simple to use and easy to integrate into web sites. Mason has a highly dynamic and configurable interface supporting multiple sets of annotations per sequence, overlapping regions, customization of interface and user-driven events (e.g., clicks and text to appear for tooltips). It is written purely in JavaScript and SVG, requiring no third-party plugins or browser customization. Mason is a solution for dissemination of sequence annotation data on the web. It is highly flexible, customizable, simple to use, and is designed to be easily integrated into web sites. Mason is open source and freely available at https://github.com/yeastrc/mason .
    BMC Research Notes 12/2015; 8(1):1009. DOI:10.1186/s13104-015-1009-z
  •
    ABSTRACT: Protein chemical cross-linking and mass spectrometry enable the analysis of protein-protein interactions and protein topologies; however, complicated cross-linked peptide spectra require specialized algorithms to identify interacting sites. The Kojak cross-linking software application is a new, efficient approach to identify cross-linked peptides, enabling large-scale analysis of protein-protein interactions by chemical cross-linking techniques. The algorithm integrates spectral processing and scoring schemes adopted from traditional database search algorithms, and can identify cross-linked peptides using many different chemical cross-linkers, with or without heavy isotope labels. Kojak was used to analyze both novel and existing datasets, and was compared with existing cross-linking algorithms. The algorithm provided increased cross-link identifications over existing algorithms and, equally importantly, produced the results in a fraction of the computational time. The Kojak algorithm is open source, cross-platform, and freely available. This software provides both existing and new cross-linking researchers alike an effective way to derive additional cross-link identifications from new or existing datasets. For new users, it provides a simple analytical resource resulting in more cross-link identifications than other methods.
    Journal of Proteome Research 03/2015; 14(5). DOI:10.1021/pr501321h
  •
    ABSTRACT: Diploid budding yeast undergoes rapid mitosis when it ferments glucose, and in the presence of a non-fermentable carbon source and the absence of a nitrogen source it triggers sporulation. Rich medium with acetate is a commonly used pre-sporulation medium, but our understanding of the molecular events underlying the acetate-driven transition from mitosis to meiosis is still incomplete. We identified 263 proteins for which mRNA and protein synthesis are linked or uncoupled in fermenting and respiring cells. Using motif predictions, interaction data and RNA profiling we find among them 28 likely targets for Ume6, a subunit of the conserved Rpd3/Sin3 histone deacetylase complex regulating genes involved in metabolism, stress response and meiosis. Finally, we identify 14 genes for which both RNA and proteins are detected exclusively in respiring cells but not in fermenting cells in our sample set, including CSM4, SPR1, SPS4 and RIM4, which were thought to be meiosis-specific. Our work reveals intertwined transcriptional and post-transcriptional control mechanisms acting when a MATa/α strain responds to nutritional signals, and provides molecular clues as to how the carbon source primes yeast cells for entering meiosis.
    Journal of Proteomics 02/2015; 119. DOI:10.1016/j.jprot.2015.01.015
  •
    ABSTRACT: Accurate transmission of genetic material relies on the coupling of chromosomes to spindle microtubules by kinetochores. These linkages are regulated by the conserved Aurora B/Ipl1 kinase to ensure that sister chromatids are properly attached to spindle microtubules. Kinetochore-microtubule attachments require the essential Ndc80 complex, which contains two globular ends linked by large coiled-coil domains. In this study, we isolated a novel ndc80 mutant in Saccharomyces cerevisiae that contains mutations in the coiled-coil domain. This ndc80 mutant accumulates erroneous kinetochore-microtubule attachments, resulting in misalignment of kinetochores on the mitotic spindle. Genetic analyses with suppressors of the ndc80 mutant and in vitro cross-linking experiments suggest that the kinetochore misalignment in vivo stems from a defect in the ability of the Ndc80 complex to stably fold at a hinge in the coiled coil. Previous studies proposed that the Ndc80 complex can exist in multiple conformations: elongated during metaphase and bent during anaphase. However, the distinct functions of individual conformations in vivo are unknown. Here, our analysis revealed a tightly folded conformation of the Ndc80 complex that is likely required early in mitosis. This conformation is mediated by a direct, intra-complex interaction and involves a greater degree of folding than the bent form of the complex at anaphase. Furthermore, our results suggest that this conformation is functionally important in vivo for efficient error correction by Aurora B/Ipl1 and, consequently, to ensure proper kinetochore alignment early in mitosis.
    Genetics 09/2014; DOI:10.1534/genetics.114.167775
  •
    ABSTRACT: The use of in vivo Förster resonance energy transfer (FRET) data to determine the molecular architecture of a protein complex in living cells is challenging due to data sparseness, sample heterogeneity, signal contributions from multiple donors and acceptors, unequal fluorophore brightness, photobleaching, flexibility of the linker connecting the fluorophore to the tagged protein, and spectral cross-talk. We address these challenges by using a Bayesian approach that produces the posterior probability of a model, given the input data. The posterior probability is defined as a function of the dependence of our FRET metric FRETR on a structure (forward model), a model of noise in the data, as well as prior information about the structure, relative populations of distinct states in the sample, forward model parameters, and data noise. The forward model was validated against Kinetic Monte Carlo simulations and in vivo experimental data collected on 9 systems of known structure. In addition, our Bayesian approach was validated by a benchmark of 16 protein complexes of known structure. Given the structures of each subunit of the complexes, models were computed from synthetic FRETR data with a distance RMSD error of 14-17 Å. The approach is implemented in the open source Integrative Modeling Platform (http://integrativemodeling.org), allowing us to determine macromolecular structures by a combination of in vivo FRETR data with data from other sources, such as electron microscopy and chemical cross-linking.
    Molecular & Cellular Proteomics 08/2014; 13(11). DOI:10.1074/mcp.M114.040824
  • Daniel Jaschob, Trisha N Davis, Michael Riffle
    ABSTRACT: As high throughput sequencing continues to grow more commonplace, the need to disseminate the resulting data via web applications continues to grow. Particularly, there is a need to disseminate multiple versions of related gene and protein sequences simultaneously--whether they represent alleles present in a single species, variations of the same gene among different strains, or homologs among separate species. Often this is accomplished by displaying all versions of the sequence at once in a manner that is not intuitive or space-efficient and does not facilitate human understanding of the data. Web-based applications needing to disseminate multiple versions of sequences would benefit from a drop-in module designed to effectively disseminate these data.
    BMC Research Notes 07/2014; 7(1):468. DOI:10.1186/1756-0500-7-468
  •
    ABSTRACT: To better understand the quantitative characteristics and structure of phenotypic diversity, we measured over 14,000 transcript, protein, metabolite, and morphological traits in 22 genetically diverse strains of Saccharomyces cerevisiae. Over 50% of all measured traits varied significantly across strains (FDR = 5%). The structure of phenotypic correlations is complex, with 85% of all traits significantly correlated with at least one other phenotype (median = 6, maximum = 328). We show how high-dimensional molecular phenomics datasets can be leveraged to accurately predict phenotypic variation between strains, often with greater precision than afforded by DNA sequence information alone. These results provide new insights into the spectrum and structure of phenotypic diversity and the characteristics influencing the ability to accurately predict phenotypes.
    Genome Research 05/2013; DOI:10.1101/gr.155762.113
  • Daniel Jaschob, Michael Riffle
    ABSTRACT: BACKGROUND: Laboratories engaged in computational biology or bioinformatics frequently need to run lengthy, multistep, and user-driven computational jobs. Each job can tie up a computer for a few minutes to several days, and many laboratories lack the expertise or resources to build and maintain a dedicated computer cluster. RESULTS: JobCenter is a client-server application and framework for job management and distributed job execution. The client and server components are both written in Java and are cross-platform and relatively easy to install. All communication with the server is client-driven, which allows worker nodes to run anywhere (even behind external firewalls) and provides inherent load balancing. Adding a worker node to the worker pool is as simple as dropping the JobCenter client files onto any computer and performing basic configuration, providing tremendous ease-of-use, flexibility, and limitless horizontal scalability. Each worker installation may be independently configured, including the types of jobs it is able to run. Executed jobs may be written in any language and may include multiple execution steps. CONCLUSIONS: JobCenter is a versatile and scalable distributed job management system that allows laboratories to very efficiently distribute all computational work among all available resources. JobCenter is freely available at http://code.google.com/p/jobcenter/.
    Source Code for Biology and Medicine 07/2012; 7(1):8. DOI:10.1186/1751-0473-7-8
  •
    ABSTRACT: Relocalization of proteins is a hallmark of the DNA damage response. We use high-throughput microscopic screening of the yeast GFP fusion collection to develop a systems-level view of protein reorganization following drug-induced DNA replication stress. Changes in protein localization and abundance reveal drug-specific patterns of functional enrichments. Classification of proteins by subcellular destination enables the identification of pathways that respond to replication stress. We analysed pairwise combinations of GFP fusions and gene deletion mutants to define and order two previously unknown DNA damage responses. In the first, Cmr1 forms subnuclear foci that are regulated by the histone deacetylase Hos2 and are distinct from the typical Rad52 repair foci. In a second example, we find that the checkpoint kinases Mec1/Tel1 and the translation regulator Asc1 regulate P-body formation. This method identifies response pathways that were not detected in genetic and protein interaction screens, and can be readily applied to any form of chemical or genetic stress to reveal cellular response pathways.
    Nature Cell Biology 07/2012; 14(9):966-76. DOI:10.1038/ncb2549
  •
    ABSTRACT: Mass spectrometry-based proteomics is increasingly being used in biomedical research. These experiments typically generate a large volume of highly complex data, and the volume and complexity are only increasing with time. There exist many software pipelines for analyzing these data (each typically with its own file formats), and as technology improves, these file formats change and new formats are developed. Files produced from these myriad software programs may accumulate on hard disks or tape drives over time, with older files being rendered progressively more obsolete and unusable with each successive technical advancement and data format change. Although initiatives exist to standardize the file formats used in proteomics, they do not address the core failings of a file-based data management system: (1) files are typically poorly annotated experimentally, (2) files are "organically" distributed across laboratory file systems in an ad hoc manner, (3) file formats become obsolete, and (4) searching the data and comparing and contrasting results across separate experiments is very inefficient (if possible at all). Here we present a relational database architecture and accompanying web application dubbed Mass Spectrometry Data Platform that is designed to address the failings of the file-based mass spectrometry data management approach. The database is designed such that the output of disparate software pipelines may be imported into a core set of unified tables, with these core tables being extended to support data generated by specific pipelines. Because the data are unified, they may be queried, viewed, and compared across multiple experiments using a common web interface. Mass Spectrometry Data Platform is open source and freely available at http://code.google.com/p/msdapl/.
    Molecular & Cellular Proteomics 05/2012; 11(9):824-31. DOI:10.1074/mcp.O111.015149
  •
    ABSTRACT: The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions.
    Genome Research 08/2011; 21(11):1981-94. DOI:10.1101/gr.121475.111
  • Michael Riffle, Trisha N Davis
    ABSTRACT: There is increasing interest in the development of computational methods to analyze fluorescent microscopy images and enable automated large-scale analysis of the subcellular localization of proteins. Determining the subcellular localization is an integral part of identifying a protein's function, and the application of bioinformatics to this problem provides a valuable tool for the annotation of proteomes. Training and validating algorithms used in image analysis research typically rely on large sets of image data, and would benefit from a large, well-annotated and highly-available database of images and associated metadata. The Yeast Resource Center Public Image Repository (YRC PIR) is a large database of images depicting the subcellular localization and colocalization of proteins. Designed especially for computational biologists who need large numbers of images, the YRC PIR contains 532,182 TIFF images from nearly 85,000 separate experiments and their associated experimental data. All images and associated data are searchable, and the results browsable, through an intuitive web interface. Search results, experiments, individual images or the entire dataset may be downloaded as standards-compliant OME-TIFF data. The YRC PIR is a powerful resource for researchers to find, view, and download many images and associated metadata depicting the subcellular localization and colocalization of proteins, or classes of proteins, in a standards-compliant format. The YRC PIR is freely available at http://images.yeastrc.org/.
    BMC Bioinformatics 05/2010; 11:263. DOI:10.1186/1471-2105-11-263
  •
    ABSTRACT: The prediction of protein-protein interactions is an important step toward the elucidation of protein functions and the understanding of the molecular mechanisms inside the cell. While experimental methods for identifying these interactions remain costly and often noisy, the increasing quantity of solved 3D protein structures suggests that in silico methods to predict interactions between two protein structures will play an increasingly important role in screening candidate interacting pairs. Approaches using the knowledge of the structure are presumably more accurate than those based on sequence only. Approaches based on docking protein structures solve a variant of this problem, but these methods remain very computationally intensive and will not scale in the near future to the detection of interactions at the level of an interactome, involving millions of candidate pairs of proteins. Here, we describe a computational method to predict efficiently in silico whether two protein structures interact. This yes/no question is presumably easier to answer than the standard protein docking question, "How do these two protein structures interact?" Our approach is to discriminate between interacting and non-interacting protein pairs using a statistical pattern recognition method known as a support vector machine (SVM). We demonstrate that our structure-based method performs well on this task and scales well to the size of an interactome. The use of structure information for the prediction of protein interaction yields significantly better performance than other sequence-based methods. Among structure-based classifiers, the SVM algorithm, combined with the metric learning pairwise kernel and the MAMMOTH kernel, performs best in our experiments.
    BMC Bioinformatics 03/2010; 11:144. DOI:10.1186/1471-2105-11-144
  • Michael Riffle, Jimmy K Eng
    ABSTRACT: The field of proteomics, particularly the application of MS analysis to protein samples, is well established and growing rapidly. Proteomic studies generate large volumes of raw experimental data and inferred biological results. To facilitate the dissemination of these data, centralized data repositories have been developed that make the data and results accessible to proteomic researchers and biologists alike. This review of proteomics data repositories focuses exclusively on freely available, centralized data resources that disseminate or store experimental MS data and results. The resources chosen reflect a current "snapshot" of the state of resources available with an emphasis placed on resources that may be of particular interest to yeast researchers. Resources are described in terms of their intended purpose and the features and functionality provided to users.
    Proteomics 10/2009; 9(20):4653-63. DOI:10.1002/pmic.200900216
  •
    ABSTRACT: The Saccharomyces cerevisiae chromosomal passenger proteins Ipl1 (Aurora B) and Sli15 (INCENP) are required for the tension checkpoint, but the role of the third passenger, Bir1, is controversial. We have isolated a temperature-sensitive mutant (bir1-107) in the essential C-terminal region of Bir1 known to be required for binding to Sli15. This allele reveals a checkpoint function for Bir1. The mutant displays a biorientation defect, a defective checkpoint response to lack of tension, and an inability to detach mutant kinetochores. Ipl1 localizes to aberrant foci when Bir1 localization is disrupted in the bir1-107 mutant. Thus, one checkpoint role of Bir1 is to properly localize Ipl1 and allow detachment of kinetochores. Quantitative analysis indicates that the chromosomal passengers colocalize with kinetochores in G1 but localize between kinetochores that are under tension. Bir1 localization to kinetochores is maintained in an mcd1-1 mutant in the absence of tension. Our results suggest that the establishment of tension removes Ipl1, Bir1, and Sli15, and their kinetochore detachment activity, from the vicinity of kinetochores and allows cells to proceed through the tension checkpoint.
    Molecular Biology of the Cell 01/2009; 20(3):915-23. DOI:10.1091/mbc.E08-07-0723
  •
    ABSTRACT: Hidden Markov models (HMMs) have been successfully applied to the tasks of transmembrane protein topology prediction and signal peptide prediction. In this paper we expand upon this work by making use of the more powerful class of dynamic Bayesian networks (DBNs). Our model, Philius, is inspired by a previously published HMM, Phobius, and combines a signal peptide submodel with a transmembrane submodel. We introduce a two-stage DBN decoder that combines the power of posterior decoding with the grammar constraints of Viterbi-style decoding. Philius also provides protein type, segment, and topology confidence metrics to aid in the interpretation of the predictions. We report a relative improvement of 13% over Phobius in full-topology prediction accuracy on transmembrane proteins, and a sensitivity and specificity of 0.96 in detecting signal peptides. We also show that our confidence metrics correlate well with the observed precision. In addition, we have made predictions on all 6.3 million proteins in the Yeast Resource Center (YRC) database. This large-scale study provides an overall picture of the relative numbers of proteins that include a signal-peptide and/or one or more transmembrane segments as well as a valuable resource for the scientific community. All DBNs are implemented using the Graphical Models Toolkit. Source code for the models described here is available at http://noble.gs.washington.edu/proj/philius. A Philius Web server is available at http://www.yeastrc.org/philius, and the predictions on the YRC database are available at http://www.yeastrc.org/pdr.
    PLoS Computational Biology 12/2008; 4(11):e1000213. DOI:10.1371/journal.pcbi.1000213
  •
    ABSTRACT: Author Summary: The three-dimensional structure of a protein can reveal much about that protein's evolutionary relationships and functions. Such information about all the proteins in an organism (the proteome) would offer a more global view of these relationships, but solving each structure individually would be a formidable task. In this study, we have parsed all Saccharomyces cerevisiae proteins into nearly 15,000 distinct domains and then used de novo structure prediction methods together with worldwide distributed computing to predict structures for all domains lacking sequence similarity to proteins of known structure. To overcome the uncertainties in de novo structure prediction, we combined these predictions with data on the biological process, function, and localization of the proteins from previous experimental studies to assign the domains to families of evolutionarily related proteins. Our genome-wide domain predictions and superfamily assignments provide the basis for the generation of experimentally testable hypotheses about the mechanism of action for a large number of yeast proteins.
    PLoS Biology 05/2007; 5(4):e76. DOI:10.1371/journal.pbio.0050076
  • Michael Riffle, Lars Malmström, Trisha N Davis
    ABSTRACT: The Yeast Resource Center Public Data Repository (YRC PDR) serves as a single point of access for the experimental data produced from many collaborations typically studying Saccharomyces cerevisiae (baker's yeast). The experimental data include large amounts of mass spectrometry results from protein co-purification experiments, yeast two-hybrid interaction experiments, fluorescence microscopy images and protein structure predictions. All of the data are accessible via searching by gene or protein name, and are available on the Web at http://www.yeastrc.org/pdr/.
    Nucleic Acids Research 02/2005; 33(Database issue):D378-82. DOI:10.1093/nar/gki073
  •
    ABSTRACT: The localization of proteins can give important clues about their function and help sort data from large-scale proteomic screens. Forty-five proteins were tagged with the GFP variant YFP. These proteins were chosen because they are encoded by genes that display strong cell cycle-dependent expression that peaks in G1. Most of these proteins localize to either the nucleus or to sites of cell growth. We are able to assign new cellular component GO terms to ASF2, TOS4, RTT109, YBR070C, YKR090W, YOL007C, YOL019W and YPR174C. We also have localization data for 21 other proteins. Noteworthy localizations were found for Rfa1p, a member of the DNA replication A complex, and Pri2p and Pol12p, subunits of the alpha-DNA polymerase:primase complex. In addition to its nuclear localization, Rfa1p assembled into cytoplasmic foci adjacent to the nucleus in cells during the G1-S phase transition of the cell cycle. Pri2 and Pol12 took on a beaded appearance at the G1-S transition and later in the cell cycle were enriched in the nuclear envelope. A new spindle pole body/nuclear envelope component encoded by YPR174 was identified. The cell cycle-dependent abundance of Tos4p mirrored Yox1p and these two proteins were the only proteins that were found exclusively at the G1-S phase of the cell cycle. A complete list of localizations, along with images, can be found at our website (http://www.yeastrc.org/cln2/).
    Yeast 07/2004; 21(9):793-800. DOI:10.1002/yea.1133
  •
    ABSTRACT: Interpreting genome sequences requires the functional analysis of thousands of predicted proteins, many of which are uncharacterized and without obvious homologs. To assess whether the roles of large sets of uncharacterized genes can be assigned by targeted application of a suite of technologies, we used four complementary protein-based methods to analyze a set of 100 uncharacterized but essential open reading frames (ORFs) of the yeast Saccharomyces cerevisiae. These proteins were subjected to affinity purification and mass spectrometry analysis to identify copurifying proteins, two-hybrid analysis to identify interacting proteins, fluorescence microscopy to localize the proteins, and structure prediction methodology to predict structural domains or identify remote homologies. Integration of the data assigned function to 48 ORFs using at least two of the Gene Ontology (GO) categories of biological process, molecular function, and cellular component; 77 ORFs were annotated by at least one method. This combination of technologies, coupled with annotation using GO, is a powerful approach to classifying genes.
    Molecular Cell 01/2004; 12(6):1353-65. DOI:10.1016/S1097-2765(03)00476-3