John Chong Mu
Bina Technologies

PhD

About

18 Publications
5,142 Reads
624 Citations
Additional affiliations
Stanford University: Student, August 2009 - September 2014

Publications (18)
Article
Full-text available
Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines wi...
Article
Full-text available
LongISLND is a software package designed to simulate sequencing data according to the characteristics of third-generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multipass sequencing with P5 and P6 chemistries, producing data i...
Preprint
Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines w...
Article
Full-text available
A high-confidence, comprehensive human variant set is critical in assessing the accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus...
Article
Full-text available
SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genom...
Article
Identifying somatic mutations is a key analysis in cancer research. The challenge lies in the impure and heterogeneous nature of the tumor samples. Oftentimes, an algorithm works well for one tumor but poorly for another. Here, we present an ensemble approach that integrates multiple algorithms and demonstrate its performance and high accuracy with...
Article
Full-text available
Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous wo...
Article
Full-text available
VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously repo...
Conference Paper
Background / Purpose: Currently there is a lack of a comprehensive simulation validation framework for next-generation sequencing (NGS) analysis. Multiple agreed-upon validation datasets are critical for the development of new secondary analysis methods, and read simulation is a bottleneck when simulating high-coverage data. The Genome in a Bottle cons...
Conference Paper
Background / Purpose: Structural variations (SVs) are large genomic rearrangements, including deletion, insertion, inversion, duplication and translocation. SV detection is a key challenge with next-generation sequencing reads since SVs are generally much larger than read length. Accuracy of SV detection varies significantly by type, region and s...
Article
Full-text available
Our previous finding of a fractal pattern for gastric pH and esophageal pH plus the statistical association of sequential pH values for up to 2 h led to our hypothesis that the fractal pattern encodes information regarding gastric acidity and that depending on the value of gastric acidity, the esophagus can signal the stomach to alter gastric acidi...
Article
Full-text available
Optional Pólya Tree (OPT) is a flexible non-parametric Bayesian model for density estimation. Despite its merits, the computation for OPT inference is challenging. In this paper we present time complexity analysis for OPT inference and propose two algorithmic improvements. The first improvement, named Limited-Lookahead Optional Pólya Tree (...
Article
Full-text available
Next-generation sequence analysis has become an important task in both laboratory and clinical settings. A key stage in the majority of sequence analysis workflows, such as resequencing, is the alignment of genomic reads to a reference genome. The accurate alignment of reads with large indels is a computationally challenging task for researchers. We i...
Article
Although uridine-rich small nuclear RNAs (U-snRNAs) are essential for pre-mRNA splicing, little is known regarding their function in the regulation of alternative splicing or of the biological consequences of their dysfunction in mammals. Here, we demonstrate that mutation of Rnu2-8, one of the mouse multicopy U2 snRNA genes, causes ataxia and neur...
Conference Paper
Full-text available
We propose a novel design of rate-compatible protograph-based low-density parity-check code families that can cover a wide range of code rates. In contrast to the traditional method of lifting, our lifting method uses a combination of different circulant matrix sizes for the base protograph and accumulator to increase the number of achievable rates. We...