Niranjan Nagarajan

Genome Institute of Singapore, Singapore

Publications (56) · 406.77 Total impact

  • ABSTRACT: Colorectal cancer with metastases limited to the liver (liver-limited mCRC) is a distinct clinical subset characterized by possible cure with surgery. We performed high-depth sequencing of over 750 cancer-associated genes and copy number profiling in matched primary, metastasis and normal tissues to characterize genomic progression in 18 patients with liver-limited mCRC. High-depth Illumina sequencing and the use of three different variant callers enable comprehensive and accurate identification of somatic variants down to 2.5% variant allele frequency. We identify a median of 11 somatic single nucleotide variants (SNVs) per tumor. Across patients, a median of 79.3% of somatic SNVs present in the primary are present in the metastasis, and 81.7% of all alterations present in the metastasis are present in the primary. Private alterations are found at lower allele frequencies; different mutational signatures characterize shared and private variants, suggesting distinct mutational processes. Using B-allele frequencies of heterozygous germline SNPs and copy number profiling, we find that broad regions of allelic imbalance and focal copy number changes, respectively, are generally shared between the primary tumor and metastasis. Our analyses point to high genomic concordance of primary tumor and metastasis, with a thick common trunk and smaller genomic branches, in general support of the linear progression model in most patients with liver-limited mCRC. More extensive studies are warranted to further characterize genomic progression in this important clinical population.
    Genome Biology 12/2015; 16(1). DOI:10.1186/s13059-015-0589-1 · 10.47 Impact Factor
  • ABSTRACT: Genome rearrangements, a hallmark of cancer, can result in gene fusions with oncogenic properties. Using DNA paired-end-tag (DNA-PET) whole-genome sequencing, we analyzed 15 gastric cancers (GCs) from Southeast Asians. Rearrangements were enriched in open chromatin and shaped by chromatin structure. We identified seven rearrangement hot spots and 136 gene fusions. In three out of 100 GC cases, we found recurrent fusions between CLDN18, a tight junction gene, and ARHGAP26, a gene encoding a RHOA inhibitor. Epithelial cell lines expressing CLDN18-ARHGAP26 displayed a dramatic loss of epithelial phenotype and long protrusions indicative of epithelial-mesenchymal transition (EMT). Fusion-positive cell lines showed impaired barrier properties, reduced cell-cell and cell-extracellular matrix adhesion, retarded wound healing, and inhibition of RHOA. Gain of invasion was seen in cancer cell lines expressing the fusion. Thus, CLDN18-ARHGAP26 mediates epithelial disintegration, possibly leading to stomach H+ leakage, and the fusion might contribute to invasiveness once a cell is transformed.
    Cell Reports 07/2015; 20. DOI:10.1016/j.celrep.2015.06.020 · 8.36 Impact Factor
  • Davide Verzotto · Audrey S M Teo · Axel M Hillmer · Niranjan Nagarajan
    ABSTRACT: Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and mapping technologies (e.g. optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kbp–2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging due to the lack of efficient and freely available software for robustly aligning maps to sequences. Here we introduce two new map-to-sequence alignment algorithms that efficiently and accurately align high-throughput mapping datasets to large, eukaryotic genomes while accounting for high error rates. In order to do so, these methods (OPTIMA for glocal and OPTIMA-Overlap for overlap alignment) exploit the ability to create efficient data structures that index continuous-valued mapping data while accounting for errors. We also introduce an approach for evaluating the significance of alignments that avoids expensive permutation-based tests while being agnostic to technology-dependent error rates. Our benchmarking results suggest that OPTIMA and OPTIMA-Overlap outperform state-of-the-art approaches in sensitivity (1.6–2× improvement) while simultaneously being more efficient (170–200%) and precise in their alignments (99% precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust and provide improved sensitivity while guaranteeing high precision.
    Fifth RECOMB Satellite Workshop on Massively Parallel Sequencing (RECOMB-Seq 2015), Warsaw, Poland; 04/2015
  • ABSTRACT: Extensive and multi-dimensional data sets generated from recent cancer omics profiling projects have presented new challenges and opportunities for unraveling the complexity of cancer genome landscapes. In particular, distinguishing the unique complement of genes that drive tumorigenesis in each patient from a sea of passenger mutations is necessary for translating the full benefit of cancer genome sequencing into the clinic. We address this need by presenting a data integration framework (OncoIMPACT) to nominate patient-specific driver genes based on their phenotypic impact. Extensive in silico and in vitro validation helped establish OncoIMPACT's robustness, improved precision over competing approaches and verifiable patient and cell line specific predictions (2/2 and 6/7 true positives and negatives, respectively). In particular, we computationally predicted and experimentally validated the gene TRIM24 as a putative novel amplified driver in a melanoma patient. Applying OncoIMPACT to more than 1000 tumor samples, we generated patient-specific driver gene lists in five different cancer types to identify modes of synergistic action. We also provide the first demonstration that computationally derived driver mutation signatures can be overall superior to single gene and gene expression based signatures in enabling patient stratification and prognostication. Source code and executables for OncoIMPACT are freely available from http://sourceforge.net/projects/oncoimpact.
    Nucleic Acids Research 01/2015; 43(7). DOI:10.1093/nar/gku1393 · 9.11 Impact Factor
  • Huaien Luo · Juntao Li · Burton Kuan Hui Chia · Paul Robson · Niranjan Nagarajan
    ABSTRACT: High-throughput assays for detecting differential abundance, such as RNA-seq, are widely used. Variable performance across statistical tests, normalizations, and conditions leads to resource wastage and reduced sensitivity. EDDA is a first general-purpose design tool for RNA-seq, Nanostring, and metagenomic analysis that rationally selects tests, predicts performance, and plans experiments to minimize resource wastage. Case studies highlight EDDA's ability to model single-cell RNA-seq, suggesting ways to reduce sequencing costs up to five-fold, and to improve metagenomic biomarker detection through better test selection. EDDA's novel mode-based normalization for detecting differential abundance improves robustness by 10% to 20% and precision by up to 140%.
    Genome Biology 12/2014; 15(12):527. DOI:10.1186/s13059-014-0527-7 · 10.47 Impact Factor
  • ABSTRACT: We present Barcode-directed Assembly for Extra-long Sequences (BAsE-Seq), a method for obtaining long haplotypes of over 3 kb in length using a short-read sequencer. BAsE-Seq relies on transposing a template-specific barcode onto random segments of the template molecule and assembling the barcoded short reads into complete haplotypes. We applied BAsE-Seq to mixed clones of hepatitis B virus and accurately identified haplotypes occurring at frequencies greater than or equal to 0.4%, with >99.9% specificity. Applying BAsE-Seq to a clinical sample, we obtained over 9,000 viral haplotypes, which provided an unprecedented view of hepatitis B virus population structure during chronic infection. BAsE-Seq is readily applicable for monitoring quasispecies evolution in viral diseases.
    Genome Biology 11/2014; 15(11):517. DOI:10.1186/s13059-014-0517-9 · 10.47 Impact Factor
  • Shanquan Wang · Kern Rei Chng · Chen Wu · Andreas Wilm · Niranjan Nagarajan · Jianzhong He
    ABSTRACT: Dehalococcoides mccartyi strain SG1, isolated from digester sludge, dechlorinates polychlorinated biphenyls (PCBs) to lower congeners. Here we report the draft genome sequence of SG1, which carries a 22.65 kbp circular putative plasmid.
    Genome Announcements 09/2014; 2(5). DOI:10.1128/genomeA.00901-14
  • ABSTRACT: Chromosomal structural variations play an important role in determining the transcriptional landscape of human breast cancers. To assess the nature of these structural variations, we analyzed eight breast tumor samples with a focus on regions of gene amplification using mate-pair sequencing of long-insert genomic DNA with matched transcriptome profiling. We found that tandem duplications appear to be early events in tumor evolution, especially in the genesis of amplicons. In a detailed reconstruction of events on chromosome 17, we found that large unpaired inversions and deletions connect a tandemly duplicated ERBB2 with neighboring 17q21.3 amplicons while simultaneously deleting the intervening BRCA1 tumor suppressor locus. This series of events appeared to be unusually common when examined in larger genomic data sets of breast cancers, albeit using approaches with lesser resolution. Using siRNAs in breast cancer cell lines, we showed that the 17q21.3 amplicon harbored a significant number of weak oncogenes that appeared consistently coamplified in primary tumors. Down-regulation of BRCA1 expression augmented cell proliferation in ERBB2-transfected human normal mammary epithelial cells. Coamplification of other functionally tested oncogenic elements in other breast tumors examined, such as RIPK2 and MYC on chromosome 8, also parallels these findings. Our analyses suggest that structural variations efficiently orchestrate the gain and loss of cancer gene cassettes that engage many oncogenic pathways simultaneously and that such oncogenic cassettes are favored during the evolution of a cancer.
    Genome Research 09/2014; 24(10). DOI:10.1101/gr.164871.113 · 13.85 Impact Factor
  • ABSTRACT: Fastidious anaerobic bacteria play critical roles in environmental bioremediation of halogenated compounds. However, their characterization and application have been largely impeded by difficulties in growing them in pure culture. Thus far, no pure culture has been reported to respire on the notorious polychlorinated biphenyls (PCBs), and functional genes responsible for PCB detoxification remain unknown due to the extremely slow growth of PCB-respiring bacteria. Here we report the successful isolation and characterization of three Dehalococcoides mccartyi strains that respire on commercial PCBs. Using high-throughput metagenomic analysis combined with traditional culture techniques, we identified tetrachloroethene (PCE) as a feasible alternative to PCBs for isolating PCB-respiring Dehalococcoides from PCB-enriched cultures. With PCE as an alternative electron acceptor, the PCB-respiring Dehalococcoides were boosted to a higher cell density (1.2 × 10⁸ to 1.3 × 10⁸ cells per mL on PCE vs. 5.9 × 10⁶ to 10.4 × 10⁶ cells per mL on PCBs) with a shorter culturing time (30 d on PCE vs. 150 d on PCBs). The transcriptomic profiles illustrated that the distinct PCB dechlorination profile of each strain was predominantly mediated by a single, novel reductive dehalogenase (RDase) catalyzing chlorine removal from both PCBs and PCE. The transcription levels of PCB-RDase genes are 5–60 times higher than the genome-wide average. The cultivation of PCB-respiring Dehalococcoides in pure culture and the identification of PCB-RDase genes deepen our understanding of organohalide respiration of PCBs and shed light on in situ PCB bioremediation.
    Proceedings of the National Academy of Sciences 07/2014; 111(33). DOI:10.1073/pnas.1404845111 · 9.81 Impact Factor
  • ABSTRACT: Opisthorchiasis is a neglected tropical disease caused by the carcinogenic Asian liver fluke, Opisthorchis viverrini. This hepatobiliary disease is linked to a malignant cancer, cholangiocarcinoma (CCA), and affects millions of people in Asia. No vaccine is available, and only one drug (praziquantel) is used against the parasite. Little is known about O. viverrini biology and the diseases that it causes. Here we characterize the draft genome (634.5 Mb) and transcriptomes of O. viverrini, elucidate how this fluke survives in the hostile environment within the bile duct and show that metabolic pathways in the parasite are highly adapted to a lipid-rich diet from bile and/or cholangiocytes. We also provide additional evidence that O. viverrini and other flukes secrete proteins that directly modulate host cell proliferation. Our molecular resources now underpin in-depth explorations of opisthorchiasis/CCA and the design of new interventions.
    Nature Communications 07/2014; 5:4378. DOI:10.1038/ncomms5378 · 10.74 Impact Factor
  • ABSTRACT: RNA viruses are notorious for their ability to quickly adapt to selective pressure from the host immune system and/or antivirals. This adaptability is likely due to the error-prone characteristics of their RNA-dependent RNA polymerase [1, 2]. Dengue virus, a member of the Flaviviridae family of positive-strand RNA viruses, is also known to share these error-prone characteristics [3]. Using high-throughput, massively parallel sequencing methodologies, or next-generation sequencing (NGS), we can now accurately quantify these viral populations and track how they change over the course of a single infection. The aim of this chapter is twofold: to describe the methodologies required for sample preparation prior to sequencing and to describe the bioinformatics analyses required for the resulting data.
    Methods in Molecular Biology (Clifton, N.J.) 01/2014; 1138:175-195. DOI:10.1007/978-1-4939-0348-1_12 · 1.29 Impact Factor
  • Niranjan Nagarajan · Rafael Navajas-Pérez
    ABSTRACT: In this chapter, we report a detailed analysis of repetitive elements in the papaya genome, including transposable elements (TEs), tandemly arrayed sequences, and high copy number genes. These repetitive sequences account for ~56% of the papaya genome, with TEs being the most abundant at 52%, tandem repeats at 1.3%, and high copy number genes at 3%. Most common types of TEs are represented in the papaya genome, with retrotransposons being the dominant class, accounting for 40% of the genome. The most prevalent retrotransposons are Ty3–gypsy (27.8%) and Ty1–copia (5.5%). Among the tandem repeats, microsatellites are the most abundant in number but represent only 0.19% of the genome. Minisatellites and satellites are less abundant but represent 0.68% and 0.43% of the genome, respectively, due to their greater repeat length. Despite an overall smaller gene repertoire in papaya than in many other angiosperms, a significant fraction of genes (>2%) are present in large gene families with copy number greater than 20. Papaya sex chromosomes are significantly enriched in repetitive sequences: the male-specific region expanded through massive accumulation of repeated DNA, which makes up 83% of it (mostly TEs), while the corresponding X region contains 70% such repeats. To integrate all of this information, we provide a pipeline to gather and process data related to repetitive elements in papaya.
    Genetics and Genomics of Papaya, 01/2014: pages 225-240; ISBN: 978-1-4614-8086-0
  • Andreas Wilm · Denis Bertrand · Li Juntao · Niranjan Nagarajan
    ABSTRACT: Background / Purpose: High-throughput sequencing datasets, in principle, enable the detection of extremely low frequency variants seen in a given cell population, to study their evolution and impact on phenotypes of interest. The use of ad hoc filters and statistics can however limit the sensitivity and specificity of detection, particularly when multiple samples are compared, as is the case for somatic variant calling and time course studies. Main conclusion: We demonstrate the utility of a systematic framework for variant calling (LoFreq) that simultaneously incorporates sequence quality, mapping quality, alignment quality and source quality information, allowing for single nucleotide variants and indels to be jointly called. Our benchmarking and validation results on real and in silico datasets demonstrate that this approach provides a significant boost in sensitivity over existing variant callers (accurately calling variants at less than 1% frequency), while retaining very high specificity.
    Joint 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and 12th European Conference on Computational Biology (ECCB) 2013; 08/2013
  • ABSTRACT: Rickettsia prowazekii is a notable intracellular pathogen, the agent of epidemic typhus, and a potential biothreat agent. We present here whole-genome sequence data for four strains of R. prowazekii, including one from a flying squirrel.
    Genome Announcements 05/2013; 1(3). DOI:10.1128/genomeA.00399-13
  • ABSTRACT: The high throughput and cost-effectiveness afforded by short-read sequencing technologies, in principle, enable researchers to perform 16S rRNA profiling of complex microbial communities at unprecedented depth and resolution. Existing Illumina sequencing protocols are, however, limited by the fraction of the 16S rRNA gene that is interrogated and therefore limit the resolution and quality of the profiling. To address this, we present the design of a novel protocol for shotgun Illumina sequencing of the bacterial 16S rRNA gene, optimized to amplify more than 90% of sequences in the Greengenes database and with the ability to distinguish nearly twice as many species-level OTUs compared to existing protocols. Using several in silico and experimental datasets, we demonstrate that despite the presence of multiple variable and conserved regions, the resulting shotgun sequences can be used to accurately quantify the constituents of complex microbial communities. The reconstruction of a significant fraction of the 16S rRNA gene also enabled high precision (>90%) in species-level identification thereby opening up potential application of this approach for clinical microbial characterization.
    PLoS ONE 04/2013; 8(4):e60811. DOI:10.1371/journal.pone.0060811 · 3.23 Impact Factor
  • Niranjan Nagarajan · Mihai Pop
    ABSTRACT: Advances in sequencing technologies and increased access to sequencing services have led to renewed interest in sequence and genome assembly. Concurrently, new applications for sequencing have emerged, including gene expression analysis, discovery of genomic variants and metagenomics, and each of these has different needs and challenges in terms of assembly. We survey the theoretical foundations that underlie modern assembly and highlight the options and practical trade-offs that need to be considered, focusing on how individual features address the needs of specific applications. We also review key software and the interplay between experimental design and efficacy of assembly.
    Nature Reviews Genetics 01/2013; 14(3). DOI:10.1038/nrg3367 · 39.79 Impact Factor
  • ABSTRACT: BACKGROUND: Gastric cancer is the second highest cause of global cancer mortality. To explore the complete repertoire of somatic alterations in gastric cancer, we combined massively-parallel short read and DNA-PET sequencing to present the first whole-genome analysis of two gastric adenocarcinomas, one with chromosomal instability and the other with microsatellite instability. RESULTS: Integrative analysis and de novo assemblies revealed the architecture of a wild-type KRAS amplification, a common driver event in gastric cancer. We discovered three distinct mutational signatures in gastric cancer - against a genome-wide backdrop of oxidative and microsatellite instability-related mutational signatures, we identified the first exome-specific mutational signature. Further characterization of the impact of these signatures by combining sequencing data from 40 complete gastric cancer exomes and targeted screening of an additional 94 independent gastric tumours uncovered ACVR2A, RPL22 and LMAN1 as recurrently mutated genes in microsatellite instability-positive gastric cancer and PAPPA as a recurrently mutated gene in TP53 wild-type gastric cancer. CONCLUSIONS: These results highlight how whole-genome cancer sequencing can uncover information relevant to tissue-specific carcinogenesis that would otherwise be missed from exome-sequencing data.
    Genome Biology 12/2012; 13(12):R115. DOI:10.1186/gb-2012-13-12-r115 · 10.47 Impact Factor
  • ABSTRACT: Oranges are an important nutritional source for human health and have immense economic value. Here we present a comprehensive analysis of the draft genome of sweet orange (Citrus sinensis). The assembled sequence covers 87.3% of the estimated orange genome, which is relatively compact, as 20% is composed of repetitive elements. We predicted 29,445 protein-coding genes, half of which are in the heterozygous state. With additional sequencing of two more citrus species and comparative analyses of seven citrus genomes, we present evidence to suggest that sweet orange originated from a backcross hybrid between pummelo and mandarin. Focused analysis on genes involved in vitamin C metabolism showed that GalUR, encoding the rate-limiting enzyme of the galacturonate pathway, is significantly upregulated in orange fruit, and the recent expansion of this gene family may provide a genomic basis. This draft genome represents a valuable resource for understanding and improving many important citrus traits in the future.
    Nature Genetics 11/2012; 45(1). DOI:10.1038/ng.2472 · 29.65 Impact Factor
  • ABSTRACT: The study of cell-population heterogeneity in a range of biological systems, from viruses to bacterial isolates to tumor samples, has been transformed by recent advances in sequencing throughput. While the high-coverage afforded can be used, in principle, to identify very rare variants in a population, existing ad hoc approaches frequently fail to distinguish true variants from sequencing errors. We report a method (LoFreq) that models sequencing run-specific error rates to accurately call variants occurring in <0.05% of a population. Using simulated and real datasets (viral, bacterial and human), we show that LoFreq has near-perfect specificity, with significantly improved sensitivity compared with existing methods and can efficiently analyze deep Illumina sequencing datasets without resorting to approximations or heuristics. We also present experimental validation for LoFreq on two different platforms (Fluidigm and Sequenom) and its application to call rare somatic variants from exome sequencing datasets for gastric cancer. Source code and executables for LoFreq are freely available at http://sourceforge.net/projects/lofreq/.
    Nucleic Acids Research 10/2012; 40(22). DOI:10.1093/nar/gks918 · 9.11 Impact Factor
  • ABSTRACT: Structural variations (SVs) contribute significantly to the variability of the human genome and extensive genomic rearrangements are a hallmark of cancer. While genomic DNA paired-end-tag (DNA-PET) sequencing is an attractive approach to identify genomic SVs, the current application of PET sequencing with short insert size DNA can be insufficient for the comprehensive mapping of SVs in low complexity and repeat-rich genomic regions. We employed a recently developed procedure to generate PET sequencing data using large DNA inserts of 10-20 kb and compared their characteristics with short insert (1 kb) libraries for their ability to identify SVs. Our results suggest that although short insert libraries bear an advantage in identifying small deletions, they do not provide significantly better breakpoint resolution. In contrast, large inserts are superior to short inserts in providing higher physical genome coverage for the same sequencing cost and achieve greater sensitivity, in practice, for the identification of several classes of SVs, such as copy number neutral and complex events. Furthermore, our results confirm that large insert libraries allow for the identification of SVs within repetitive sequences, which cannot be spanned by short inserts. This provides a key advantage in studying rearrangements in cancer, and we show how it can be used in a fusion-point-guided-concatenation algorithm to study focally amplified regions in cancer.
    PLoS ONE 09/2012; 7(9):e46152. DOI:10.1371/journal.pone.0046152 · 3.23 Impact Factor

Publication Stats

2k Citations
406.77 Total Impact Points

Institutions

  • 2010–2015
    • Genome Institute of Singapore
      • Computational and Systems Biology Group
      Singapore
  • 2008–2013
    • University of Maryland, College Park
      • Department of Computer Science
      • Center for Bioinformatics and Computational Biology
      College Park, Maryland, United States
    • Loyola University Maryland
      Baltimore, Maryland, United States
  • 2003–2008
    • Cornell University
      • Department of Computer Science
      Ithaca, New York, United States