Walter L Ruzzo

University of Washington Seattle | UW · School of Computer Science and Engineering

Professor

About

195
Publications
28,396
Reads
13,118
Citations
Citations since 2016
22 Research Items
3839 Citations
[Chart: citations per year, 2016 to 2022; vertical axis 0 to 600]
Additional affiliations
September 2008 - present
Fred Hutchinson Cancer Research Center
Position
  • Joint Member
September 2001 - present
University of Washington Seattle
Position
  • Professor (Associate)
September 1977 - present
University of Washington Seattle
Position
  • Professor
Education
September 1973 - June 1978
University of California, Berkeley
Field of study
  • Computer Science
September 1964 - June 1968
California Institute of Technology
Field of study
  • Mathematics


Publications (195)
Article
Full-text available
Accelerated evolution of any portion of the genome is of significant interest, potentially signaling positive selection of phenotypic traits and adaptation. Accelerated evolution remains understudied for structured RNAs, despite the fact that an RNA's structure is often key to its function. RNA structures are typically characterized by compensatory...
Article
Full-text available
The analysis of mRNA transcript abundance with RNA-Seq is a central tool in molecular biology research, but often analyses fail to account for the uncertainty in these estimates, which can be significant, especially when trying to disentangle isoforms or duplicated genes. Preserving uncertainty necessitates a full probabilistic model of all the...
Preprint
Full-text available
The analysis of mRNA transcript abundance with RNA-Seq is a central tool in molecular biology research, but often analyses fail to account for the uncertainty in these estimates, which can be significant, especially when trying to disentangle isoforms or duplicated genes. Preserving uncertainty necessitates a full probabilistic model of all th...
Article
Full-text available
Spatial heterogeneity is a fundamental feature of the tumor microenvironment (TME), and tackling spatial heterogeneity in neoplastic metabolic aberrations is critical for tumor treatment. Genome-scale metabolic network models have been used successfully to simulate cancer metabolic networks. However, most models use bulk gene expression data of ent...
Article
Regulation of embryonic diapause, dormancy that interrupts the tight connection between developmental stage and time, is still poorly understood. Here, we characterize the transcriptional and metabolite profiles of mouse diapause embryos and identify unique gene expression and metabolic signatures with activated lipolysis, glycolysis, and metabolic...
Preprint
Full-text available
Metabolic reprogramming is a hallmark of cancer, and there is an urgent need to exploit metabolic aberrations in cancer to identify perturbations that may selectively kill cancer cells. Spatial heterogeneity is a fundamental feature of the tumor microenvironment (TME), and tackling spatial heterogeneity is critical for understanding tumor progressi...
Article
Full-text available
Background: Comparative genomics approaches have facilitated the discovery of many novel non-coding and structured RNAs (ncRNAs). The increasing availability of related genomes now makes it possible to systematically search for compensatory base changes – and thus for conserved secondary structures – even in genomic regions that are poorly alignable...
Article
Full-text available
Sexual reproduction roots the eukaryotic tree of life, although its loss occurs across diverse taxa. Asexual reproduction and clonal lineages persist in these taxa despite theoretical arguments suggesting that individual clones should be evolutionarily short-lived due to limited phenotypic diversity. Here, we present quantitative evidence that an o...
Preprint
Full-text available
Aligning millions of short DNA or RNA reads, of 75 to 250 base pairs each, to a reference genome is a significant computation problem in bioinformatics. We present a flexible and fast FPGA-based short read alignment tool. Our aligner makes use of the processing power of FPGAs in conjunction with the greater host memory bandwidth and flexibility of...
Article
Full-text available
Anatomical subdivisions of the human brain can be associated with different neuronal functions. This functional diversification is reflected by differences in gene expression. By analyzing post-mortem gene expression data from the Allen Brain Atlas, we investigated the impact of transcription factors (TF) and RNA secondary structures on the regulat...
Article
Full-text available
We analyzed chromatin dynamics and transcriptional activity of human embryonic stem cell (hESC)-derived cardiac progenitor cells (CPCs) and KDR⁺/CD34⁺ endothelial cells generated from different mesodermal origins. Using an unbiased algorithm to hierarchically rank genes modulated at the level of chromatin and transcription, we identified candidate...
Article
Full-text available
Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for Conserved RNA Structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After c...
Preprint
Full-text available
While RNA-Seq has enabled great progress towards the goal of wide-scale isoform-level mRNA quantification, short reads have limitations when resolving complex or similar sets of isoforms. As a result, estimates of isoform abundance carry far more uncertainty than those made at the gene level. When confronted with this uncertainty, commonly used met...
Article
Full-text available
During vertebrate development, mesodermal fate choices are regulated by interactions between morphogens such as activin/nodal, BMPs and Wnt/β-catenin that define anterior-posterior patterning and specify downstream derivatives including cardiomyocyte, endothelial and hematopoietic cells. We used human embryonic stem cells to explore how these pathw...
Article
Full-text available
Significance: The adult human heart is incapable of significant regeneration after injury. Human embryonic stem cells (hESCs) have the capacity to generate an unlimited number of cardiomyocytes (CMs). However, hESC-derived CMs (hESC-CMs) are at a fetal state with respect to their functional and physiological characteristics, diminishing their utilit...
Article
Full-text available
Background: The genome annotations of rhesus (Macaca mulatta) and cynomolgus (Macaca fascicularis) macaques, two of the most common non-human primate animal models, are limited. Methods: We analyzed large-scale macaque RNA-based next-generation sequencing (RNAseq) data to identify un-annotated macaque transcripts. Results: For both macaque species, we un...
Article
RNA bioinformatics and computational RNA biology have emerged from implementing methods for predicting the secondary structure of single sequences. The field has evolved to exploit multiple sequences to take evolutionary information into account, such as compensating (and structure preserving) base changes. These methods have been developed further...
Article
De novo discovery of “motifs” capturing the commonalities among related noncoding structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis on an approach based on the CMfinder program as a case...
Book
The existence of genes for RNA molecules not coding for proteins (ncRNAs) has been recognized since the 1950's, but until recently, aside from the critically important ribosomal and transfer RNA genes, most focus has been on protein coding genes. However, a long series of striking discoveries, from RNA's ability to carry out catalytic function, to...
Article
Full-text available
High throughput ChIP-seq studies typically identify thousands of peaks for a single transcription factor. It is common for traditional motif discovery tools to predict motifs that are statistically significant against a naïve background distribution but are of questionable biological relevance. We describe a simple yet effective algorithm for disco...
Article
Full-text available
Study of the human microbiota in relation to human health and disease is a rapidly expanding field. To fully understand the complex relationship between the human gut microbiota and disease risks, study designs that capture the variation within and between human subjects at the population level are required, but this has been hampered by the lack o...
Article
Full-text available
Background: Transcription factor overexpression is common in biological experiments and transcription factor amplification is associated with many cancers, yet few studies have directly compared the DNA-binding profiles of endogenous versus overexpressed transcription factors. Methods: We analyzed MyoD ChIP-seq data from C2C12 mouse myotubes, primar...
Conference Paper
Over the last decade, the number of known biologically important non-coding RNAs (ncRNAs) has increased by orders of magnitude. The function performed by a specific ncRNA is partially determined by its structure, defined by which nucleotides of the molecule form pairs. These correlations may span large and variable distances in the linear RNA molec...
Article
We analyzed 198 datasets of chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq) and developed a methodology for identification of high-confidence enhancer and promoter regions from transcription factor ChIP-seq data alone. We identify 32,467 genomic regions marked with ChIP-seq binding peaks in 15 or more experi...
Article
Full-text available
We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. In addition to implementing reference-based compression, we have developed, to our knowledge, the first assembly-based compressor, using a novel de novo assembly algorithm. A probabilistic data structure is used to dr...
Article
Full-text available
Post-transcriptional control of gene expression is mostly conducted by specific elements in untranslated regions (UTRs) of mRNAs, in collaboration with specific binding proteins and RNAs. In several well characterized cases, these RNA elements are known to form stable secondary structures. RNA secondary structures also may have major functional imp...
Data
Full-text available
Tables and figures. This file contains lists of correlated expressed structured riboprobes, and additional tables and figures.
Data
GO analysis of structured UTR probes. 4115 structured UTR probes with known gene symbols are examined for GO term enrichment.
Data
Annotation of predicted structured probes. CSV-file listing the features of all structured probes from Table 1 and their CMfinder predicted RNA secondary structures.
Data
Predicted significant RNA-RNA interactions. CSV-file listing 585 significant (p-value<1e-05) interactions between structured putative ncRNAs and UTRs. The interaction sites are predicted by RNAplfold and RNAplex to be larger than 9 nt and with a MFE smaller than -40 kcal/mol.
Data
GO analysis of non-structured UTR probes. 3407 non-structured UTR probes with known gene symbols are examined for GO term enrichment.
Conference Paper
Full-text available
Bioinformatics is an emerging field with seemingly limitless possibilities for advances in numerous areas of research and applications. We propose a scalable FPGA-based solution to the short read mapping problem in DNA sequencing, which greatly accelerates the task of aligning short length reads to a known reference genome. We compare the runtime,...
Article
The regulatory networks of differentiation programs have been partly characterized; however, the molecular mechanisms of lineage-specific gene regulation by highly similar transcription factors remain largely unknown. Here we compare the genome-wide binding and transcription profiles of NEUROD2-mediated neurogenesis with MYOD-mediated myogenesis. W...
Article
Full-text available
Quantification of sequence abundance in RNA-Seq experiments is often conflated by protocol-specific sequence bias. The exact sources of the bias are unknown, but may be influenced by polymerase chain reaction amplification, or differing primer affinities and mixtures, for example. The result is decreased accuracy in many applications, such as de no...
Article
Facioscapulohumeral dystrophy (FSHD) is one of the most common inherited muscular dystrophies. The causative gene remains controversial and the mechanism of pathophysiology unknown. Here we identify genes associated with germline and early stem cell development as targets of the DUX4 transcription factor, a leading candidate gene for FSHD. The gene...
Article
Full-text available
Although microRNAs (miRNAs) are important regulators of gene expression, the transcriptional regulation of miRNAs themselves is not well understood. We employed an integrative computational pipeline to dissect the transcription factors (TFs) responsible for altered miRNA expression in ovarian carcinoma. Using experimental data and computational pre...
Article
Recent studies have demonstrated that MyoD initiates a feed-forward regulation of skeletal muscle gene expression, predicting that MyoD binds directly to many genes expressed during differentiation. We have used chromatin immunoprecipitation and high-throughput sequencing to identify genome-wide binding of MyoD in several skeletal muscle cell types...
Article
Growing recognition of the numerous, diverse and important roles played by non-coding RNA in all organisms motivates better elucidation of these cellular components. Comparative genomics is a powerful tool for this task and is arguably preferable to any high-throughput experimental technology currently available, because evolutionary conservation h...
Article
Full-text available
Non-coding RNAs (ncRNAs) are transcripts that do not code for proteins. Recent findings have shown that RNA-mediated regulatory mechanisms influence a substantial portion of typical microbial genomes. We present an efficient method for finding potential ncRNAs in bacteria by clustering genomic sequences based on homology inferred from both primary...
Article
Full-text available
Assessing the statistical significance of structured RNA predicted from multiple sequence alignments relies on the existence of a good null model. We present here a random shuffling algorithm, Multiperm, that preserves not only the gap and local conservation structure in alignments of arbitrarily many sequences, but also the approximate dinucleotid...
Article
Full-text available
We used massively parallel pyrosequencing to discover and characterize microRNAs (miRNAs) expressed in human embryonic stem cells (hESC). Sequencing of small RNA cDNA libraries derived from undifferentiated hESC and from isogenic differentiating cultures yielded a total of 425,505 high-quality sequence reads. A custom data analysis pipeline delinea...
Article
Full-text available
A novel family of riboswitches, called SAM-IV, is the fourth distinct set of mRNA elements to be reported that regulate gene expression via direct sensing of S-adenosylmethionine (SAM or AdoMet). SAM-IV riboswitches share conserved nucleotide positions with the previously described SAM-I riboswitches, despite rearranged structures and nucleotide po...
Article
We have identified a highly conserved RNA motif located upstream of genes encoding molybdate transporters, molybdenum cofactor (Moco) biosynthesis enzymes, and proteins that utilize Moco as a coenzyme. Bioinformatics searches have identified 176 representatives in gamma-Proteobacteria, delta-Proteobacteria, Clostridia, Actinobacteria, Deinococcus-T...
Article
Full-text available
Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment methods to misalign, or even refuse to align, homologou...