About
28
Publications
5,277
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
470
Citations
Publications
Publications (28)
By analyzing two large atlases of almost 4 million cells, we show that immune‐senescence involves a gradual loss of cellular identity, reflecting increased cellular heterogeneity, for effector, and cytotoxic immune cells. The effects are largely similar in both males and females and were robustly reproduced in two atlases, one assembled from 35 div...
Motivation
Integrative analysis of large-scale single cell data collected from diverse cell populations promises an improved understanding of complex biological systems. While several algorithms have been developed for single cell RNA-sequencing data integration, many lack scalability to handle large numbers of datasets and/or millions of cells due...
Motivation:
Gene network reconstruction from gene expression profiles is a compute- and data-intensive problem. Numerous methods based on diverse approaches including mutual information, random forests, Bayesian networks, correlation measures, as well as their transforms and filters such as data processing inequality, have been proposed. However,...
Bayesian networks (BNs) are a widely used graphical model in machine learning. As learning the structure of BNs is NP-hard, high-performance computing methods are necessary for constructing large-scale networks. In this paper, we present a parallel framework to scale BN structure learning algorithms to tens of thousands of variables. Our framework...
Motivation
Reconstruction of genome-scale networks from gene expression data is an actively studied problem. A wide range of methods that differ between the types of interactions they uncover with varying trade-offs between sensitivity and specificity have been proposed. To leverage benefits of multiple such methods, ensemble network methods that c...
Background
Alignment-free methods for sequence comparisons have become popular in many bioinformatics applications, specifically in the estimation of sequence similarity measures to construct phylogenetic trees. Recently, the average common substring measure, ACS, and its k-mismatch counterpart, ACSk, have been shown to produce results as effective...
Identifying long pairwise maximal common substrings among a large set of sequences is a frequently used construct in computational biology, with applications in DNA sequence clustering and assembly. Due to errors made by sequencers, algorithms that can accommodate a small number of differences are of particular interest. Formally, let D be a collec...
Alignment-free methods for sequence comparisons have become popular in many bioinformatics applications, specifically in the estimation of sequence similarity measures to construct phylogenetic trees. Recently, the average common substring measure, ACS, and its k-mismatch counterpart, ACS_k, have been shown to produce results as effective as multip...
We present a novel algorithmic framework for solving approximate sequence matching problems that permit a bounded total number k of mismatches, insertions, and deletions. The core of the framework relies on transforming an approximate matching problem into a corresponding exact matching problem on suitably edited string suffixes, while carefully co...
State-of-the-art high-throughput sequencing instruments decipher in excess of a billion short genomic fragments per run. The output sequences are referred to as 'reads'. These read datasets facilitate a wide variety of analyses with applications in areas such as genomics, metagenomics, and transcriptomics. Owing to the large size of the read datase...
Motivation: Reverse engineering gene networks from expression data is a widelymstudied problem, for which numerous mathematical models have been developed. Network reconstruction methods can be used to study specific pathways, or can be applied at the whole-genome scale to analyze large compendiums of expression datasets to uncover genome-wide inte...
Background
Alignment-free sequence comparison approaches have been garnering increasing interest in various data- and compute-intensive applications such as phylogenetic inference for large-scale sequences. While k-mer based methods are predominantly used in real applications, the average common substring (ACS) approach is emerging as one of the pr...
Background
Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Molecular analysis has been frequently used in the study of HCV outbreaks and tr...
Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This i...
Identifying long pairwise maximal common substrings among a large set of sequences is a frequently used construct in computational biology, with applications in DNA sequence clustering and assembly. Due to errors made by sequencers, algorithms that can accommodate a small number of differences are of particular interest, but obtaining provably effi...
Reverse-engineering genome-scale gene networks from gene expression data is a principal challenge in systems biology. Mutual information (MI) based methods are favored because of their ability to recover non-linear relationships, low algorithmic complexity, and their successful use in various biological applications such as gene function prediction...
Alignment-free approaches are gaining persistent interest in many sequence analysis applications such as phylogenetic inference and metagenomic classification/clustering, especially for large-scale sequence datasets. Besides the widely used k-mer methods, the average common substring (ACS) approach has emerged to be one of the well-known alignment-...
Alignment-free sequence comparison approaches have been garnering increasing interests in various data- and compute-intensive applications such as phylogenetic inference for large-scale sequences. While k-mer based methods are predominantly used in real applications, average common substring (ACS) approach is emerging as one of the prominent alignm...
Learning Bayesian networks is NP-hard. Even with recent progress in heuristic and parallel algorithms, modeling capabilities still fall short of the scale of the problems encountered. In this paper, we present a massively parallel method for Bayesian network structure learning, and demonstrate its capability by constructing genome-scale gene networ...
Correcting sequence errors in high-throughput DNA sequencing by taking advantage of redundant sampling and low error rates is often an important first step in applications of this technology. Consequently, a number of error correction methods have been developed in the recent years. Due to an order of magnitude throughput gain per year, some of the...
Unlabelled:
Error Correction is important for most next-generation sequencing applications because highly accurate sequenced reads will likely lead to higher quality results. Many techniques for error correction of sequencing data from next-gen platforms have been developed in the recent years. However, compared with the fast development of sequen...