Srinivas Aluru

Georgia Institute of Technology | GT · School of Computational Science & Engineering

About

Publications: 348
Reads: 29,937
Citations: 10,188
Additional affiliations
January 1999 - July 2013
Iowa State University
Position
  • Professor (Full)

Publications (348)
Preprint
Motivation: Gene regulatory network (GRN) reconstruction from gene expression profiles is a compute- and data-intensive problem. Numerous methods based on diverse approaches including mutual information, random forests, Bayesian networks, correlation measures, as well as their transforms and filters such as data processing inequality, have been prop...
Preprint
Motivation: A pan-genome graph represents a collection of genomes and encodes sequence variations between them. It is a powerful data structure for studying multiple similar genomes. Sequence-to-graph alignment is an essential step for the construction and the analysis of pan-genome graphs. However, existing algorithms incur runtime proportional to...
Preprint
Full-text available
Aligning a sequence to a walk in a labeled graph is a problem of fundamental importance to Computational Biology. For finding a walk in an arbitrary graph with $|E|$ edges that exactly matches a pattern of length $m$, a lower bound based on the Strong Exponential Time Hypothesis (SETH) implies an algorithm significantly faster than $O(|E|m)$ time i...
Article
We propose GRNUlar, a novel deep learning framework for supervised learning of gene regulatory networks (GRNs) from single-cell RNA-Sequencing (scRNA-Seq) data. Our framework incorporates two intertwined models. First, we leverage the expressive ability of neural networks to capture complex dependencies between transcription factors and the corresp...
Article
Motivation: Reconstruction of genome-scale networks from gene expression data is an actively studied problem. A wide range of methods that differ in the types of interactions they uncover, with varying trade-offs between sensitivity and specificity, have been proposed. To leverage benefits of multiple such methods, ensemble network methods that c...
Article
Full-text available
As the sequencing depth of chromatin studies continues to grow for sensitive profiling of regulatory elements or chromatin spatial structures, aligning and preprocessing these sequencing data has become the bottleneck for analysis. Here we present Chromap, an ultrafast method for aligning and preprocessing high throughput chromatin profiles....
Article
Full-text available
Motivation: Oxford Nanopore Technologies sequencing devices support adaptive sequencing, in which undesired reads can be ejected from a pore in real time. This feature allows targeted sequencing aided by computational methods for mapping partial reads, rather than complex library preparation protocols. However, existing mapping methods either requir...
Article
Full-text available
Motivation: Variation graph representations are projected to either replace or supplement conventional single genome references due to their ability to capture population genetic diversity and reduce reference bias. Vast catalogues of genetic variants for many species now exist, and it is natural to ask which among these are crucial to circumvent re...
Preprint
Full-text available
We present Chromap, an ultrafast method for aligning and preprocessing high throughput chromatin profiles. Chromap is comparable to BWA-MEM and Bowtie2 in alignment accuracy and is over 10 times faster than both traditional workflows on bulk ChIP-seq / Hi-C profiles and 10x Genomics' CellRanger v2.0.0 pipeline on single-cell ATAC-seq profiles.
Preprint
Full-text available
Variation graph representations are projected to either replace or supplement conventional single genome references due to their ability to capture population genetic diversity and reduce reference bias. Vast catalogues of genetic variants for many species now exist, and it is natural to ask which among these are crucial to circumvent reference bia...
Article
Full-text available
Background: Third-generation single molecule sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem thr...
Article
Full-text available
Background: Alignment-free methods for sequence comparisons have become popular in many bioinformatics applications, specifically in the estimation of sequence similarity measures to construct phylogenetic trees. Recently, the average common substring measure, ACS, and its k-mismatch counterpart, ACSk, have been shown to produce results as effective...
Article
Identifying long pairwise maximal common substrings among a large set of sequences is a frequently used construct in computational biology, with applications in DNA sequence clustering and assembly. Due to errors made by sequencers, algorithms that can accommodate a small number of differences are of particular interest. Formally, let D be a collec...
Conference Paper
Suffix arrays and trees are important and fundamental string data structures which lie at the foundation of many string algorithms, with important applications in computational biology, text processing, and information retrieval. Recent work enables the efficient parallel construction of suffix arrays and trees requiring at most O(n/p) memory per p...
Conference Paper
Transmission, storage, and archival of high-throughput sequencing (HTS) short-read datasets pose significant challenges due to the large size of such datasets. Constant improvements to HTS technology, in the form of increasing throughput and decreasing cost, and its increasing adoption amplify the problem. General-purpose compression algorithms hav...
Preprint
Full-text available
Innovations in Next-Generation Sequencing are enabling generation of DNA sequence data at ever faster rates and at very low cost. Large sequencing centers typically employ hundreds of such systems. Such high-throughput and low-cost generation of data underscores the need for commensurate acceleration in downstream computational analysis of the sequ...
Preprint
Full-text available
Graph based non-linear reference structures such as variation graphs and colored de Bruijn graphs enable incorporation of full genomic diversity within a population. However, transitioning from a simple string-based reference to graphs requires addressing many computational challenges, one of which concerns accurately mapping sequencing read sets t...
Preprint
We propose a new approach, called cooperative neural networks (CoNN), which uses a set of cooperatively trained neural networks to capture latent representations that exploit prior given independence structure. The model is more flexible than traditional graphical models based on exponential family distributions, but incorporates more domain specif...
Preprint
Full-text available
Aligning DNA sequences to an annotated reference is a key step for genotyping in biology. Recent scientific studies have demonstrated improved inference by aligning reads to a variation graph, i.e., a reference sequence augmented with known genetic variations. Given a variation graph in the form of a directed acyclic string graph, the sequence to g...
Chapter
Availability of extensive genetics data across multiple individuals and populations is driving the growing importance of graph based reference representations. Aligning sequences to graphs is a fundamental operation on several types of sequence graphs (variation graphs, assembly graphs, pan-genomes, etc.) and their biological applications. Though r...
Article
In this paper, we study the acceleration of applications that identify all the occurrences of thousands of string-patterns in an input data-stream using the Automata Processor (AP). For this evaluation, we use two applications from two fields, namely, cybersecurity and bioinformatics. The first application, called Fast-SNAP, scans network data for...
Preprint
Full-text available
Availability of extensive genetics data across multiple individuals and populations is driving the growing importance of graph based reference representations. Aligning sequences to graphs is a fundamental operation on several types of sequence graphs (variation graphs, assembly graphs, pan-genomes, etc.) and their biological applications. Though r...
Preprint
Full-text available
Motivation: Third-generation sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem through sampling r...
Article
Full-text available
A fundamental question in microbiology is whether there is a continuum of genetic diversity among genomes, or clear species boundaries prevail instead. Whole-genome similarity metrics such as Average Nucleotide Identity (ANI) help address this question by facilitating high resolution taxonomic analysis of thousands of genomes from diverse phylogeneti...
Conference Paper
Full-text available
High-throughput next generation sequencers (NGS) can rapidly read billions of short DNA fragments, called reads, at low cost. Moreover, their throughput is increasing and cost is decreasing at rates much faster than Moore's law. This demands commensurate acceleration for NGS secondary analysis that processes the reads to identify variations betwe...
Article
Full-text available
Motivation: Whole-genome alignment is an important problem in genomics for comparing different species, mapping draft assemblies to reference genomes and identifying repeats. However, for large plant and animal genomes, this task remains compute and memory intensive. In addition, current practical methods lack any guarantee on the characteristics of...
Conference Paper
Full-text available
Alignment-free methods for sequence comparisons have become popular in many bioinformatics applications, specifically in the estimation of sequence similarity measures to construct phylogenetic trees. Recently, the average common substring measure, ACS, and its k-mismatch counterpart, ACS_k, have been shown to produce results as effective as multip...
Article
De Bruijn graph based genome assembly has gained popularity as short read sequencers become ubiquitous. A core assembly operation is the generation of unitigs, which are sequences corresponding to chains in the graph. Unitigs are used as building blocks for generating longer sequences in many assemblers, and can facilitate graph compression. Chain...
Preprint
Full-text available
Rapid advances in next-generation sequencing technologies are improving the throughput and cost of sequencing at a rate significantly faster than Moore’s law. This necessitates an equivalent rate of acceleration of NGS secondary analysis that assembles reads into full genomes and identifies variants between genomes. Conventional improvement in har...
Article
Emerging single-molecule sequencing technologies from Pacific Biosciences and Oxford Nanopore have revived interest in long-read mapping algorithms. Alignment-based seed-and-extend methods demonstrate good accuracy, but face limited scalability, while faster alignment-free methods typically trade decreased precision for efficiency. In this article,...
Chapter
We present a novel algorithmic framework for solving approximate sequence matching problems that permit a bounded total number k of mismatches, insertions, and deletions. The core of the framework relies on transforming an approximate matching problem into a corresponding exact matching problem on suitably edited string suffixes, while carefully co...
Preprint
Full-text available
Motivation: Whole-genome alignment is an important problem in genomics for comparing different species, mapping draft assemblies to reference genomes, and identifying repeats. However, for large plant and animal genomes, this task remains compute and memory intensive. Results: We introduce an approximate algorithm for computing local alignment bound...
Article
The Automata Processor (AP) was designed for string-pattern matching. In this paper, we showcase its use to execute integer and floating-point comparisons and apply the same to accelerate interval stabbing queries. An interval stabbing query determines which of the intervals in a set overlap a query point. Such queries are often used in computation...
Article
Dramatic advances in DNA sequencing technology have made it possible to study microbial environments by direct sequencing of environmental DNA samples. Yet, due to the huge volume and high data complexity, current de novo assemblers cannot handle large metagenomic datasets or fail to perform assembly with acceptable quality. This paper presents the...
Preprint
Full-text available
A fundamental question in microbiology is whether there is a continuum of genetic diversity among genomes or clear species boundaries prevail instead. Answering this question requires robust measurement of whole-genome relatedness among thousands of genomes and from diverse phylogenetic lineages. Whole-genome similarity metrics such as Average Nucl...
Article
Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformati...
Conference Paper
State-of-the-art high-throughput sequencing instruments decipher in excess of a billion short genomic fragments per run. The output sequences are referred to as 'reads'. These read datasets facilitate a wide variety of analyses with applications in areas such as genomics, metagenomics, and transcriptomics. Owing to the large size of the read datase...
Conference Paper
Full-text available
Motivation: Reverse engineering gene networks from expression data is a widely studied problem, for which numerous mathematical models have been developed. Network reconstruction methods can be used to study specific pathways, or can be applied at the whole-genome scale to analyze large compendiums of expression datasets to uncover genome-wide inte...
Article
Full-text available
Background: Alignment-free sequence comparison approaches have been garnering increasing interest in various data- and compute-intensive applications such as phylogenetic inference for large-scale sequences. While k-mer based methods are predominantly used in real applications, the average common substring (ACS) approach is emerging as one of the pr...
Article
Full-text available
Background: Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Molecular analysis has been frequently used in the study of HCV outbreaks and tr...
Conference Paper
Emerging single-molecule sequencing technologies from Pacific Biosciences and Oxford Nanopore have revived interest in long read mapping algorithms. Alignment-based seed-and-extend methods demonstrate good accuracy, but face limited scalability, while faster alignment-free methods typically trade decreased precision for efficiency. In this paper, w...
Preprint
Full-text available
Emerging single-molecule sequencing technologies from Pacific Biosciences and Oxford Nanopore have revived interest in long read mapping algorithms. Alignment-based seed-and-extend methods demonstrate good accuracy, but face limited scalability, while faster alignment-free methods typically trade decreased precision for efficiency. In this paper, w...
Article
Full-text available
Brassinosteroids (BRs) regulate plant growth and stress responses via the BES1/BZR1 family of transcription factors, which regulate the expression of thousands of downstream genes. BRs are involved in the response to drought; however, a mechanistic understanding of the interactions between BR signalling and drought response remains to be established....
Data
List of BR-repressed genes revealed by RNA-seq. The BR-repressed genes were identified in this study by RNA-seq with 4-week-old adult plants treated with or without 1 μM BL.
Data
List of BR-repressed genes revealed by microarrays. The BR-repressed genes in seedling or adult plants were derived from previous microarray studies. The original data were also reanalyzed as described in the Methods section.
Data
List of genes up-regulated in RD26OX transgenic plants. The genes up-regulated in RD26OX plants compared to WT were identified in this study by RNA-seq with 4-week-old adult plants without BL treatment (Fig. 2).
Data
List of BR-induced genes revealed by RNA-seq. The BR-induced genes were identified by RNA-seq with 4-week-old adult plants treated with or without 1 μM BL.
Data
List of genes up-regulated by drought stress. Drought-induced genes (combination of 2-day and 3-day dehydration treatment data) were taken from a previous microarray study.
Data
List of genes down-regulated by drought stress. Drought-repressed genes (combination of 2-day and 3-day dehydration treatment data) were taken from a previous microarray study.
Data
Supplementary Figures and Supplementary Tables
Data
List of BR-induced genes revealed by microarrays. The BR-induced genes in seedling or adult plants were derived from previous microarray studies. The original data were also reanalyzed as described in the Methods section.
Data
List of genes down-regulated in RD26OX transgenic plants. The genes down-regulated in RD26OX plants compared to WT were identified in this study by RNA-seq with 4-week-old adult plants without BL treatment (Fig. 2).
Data
List of genes up-regulated in rd26 anac019 anac055 anac102 quadruple mutants. The genes up-regulated in rd26 anac019 anac055 anac102 quadruple mutant plants compared to WT were identified in this study by RNA-seq with 4-week-old adult plants without BL treatment (Supplementary Figure 4).
Data
List of genes down-regulated in rd26 anac019 anac055 anac102 quadruple mutants. The genes down-regulated in rd26 anac019 anac055 anac102 quadruple mutant plants compared to WT were identified in this study by RNA-seq with 4-week-old adult plants without BL treatment (Supplementary Figure 4).
Conference Paper
The Automata Processor was designed for string-pattern matching. In this paper, we showcase its use to execute integer and floating-point comparisons and apply the same to accelerate interval stabbing queries. An interval stabbing query determines which of the intervals in a set overlap a query point. Such queries are often used in computational ge...