Niranjan Nagarajan

Genome Institute of Singapore, Tumasik, Singapore

Publications (47) · 340.88 Total impact

  • ABSTRACT: Chromosomal structural variations play an important role in determining the transcriptional landscape of human breast cancers. To assess the nature of these structural variations, we analyzed eight breast tumor samples with a focus on regions of gene amplification using mate-pair sequencing of long-insert genomic DNA with matched transcriptome profiling. We found that tandem duplications appear to be early events in tumor evolution, especially in the genesis of amplicons. In a detailed reconstruction of events on chromosome 17, we found large unpaired inversions and deletions connect a tandemly duplicated ERBB2 with neighboring 17q21.3 amplicons while simultaneously deleting the intervening BRCA1 tumor suppressor locus. This series of events appeared to be unusually common when examined in larger genomic data sets of breast cancers albeit using approaches with lesser resolution. Using siRNAs in breast cancer cell lines, we showed that the 17q21.3 amplicon harbored a significant number of weak oncogenes that appeared consistently coamplified in primary tumors. Down-regulation of BRCA1 expression augmented the cell proliferation in ERBB2-transfected human normal mammary epithelial cells. Coamplification of other functionally tested oncogenic elements in other breast tumors examined, such as RIPK2 and MYC on chromosome 8, also parallels these findings. Our analyses suggest that structural variations efficiently orchestrate the gain and loss of cancer gene cassettes that engage many oncogenic pathways simultaneously and that such oncogenic cassettes are favored during the evolution of a cancer.
    Genome research. 09/2014;
  • ABSTRACT: Fastidious anaerobic bacteria play critical roles in environmental bioremediation of halogenated compounds. However, their characterization and application have been largely impeded by difficulties in growing them in pure culture. Thus far, no pure culture has been reported to respire on the notorious polychlorinated biphenyls (PCBs), and functional genes responsible for PCB detoxification remain unknown due to the extremely slow growth of PCB-respiring bacteria. Here we report the successful isolation and characterization of three Dehalococcoides mccartyi strains that respire on commercial PCBs. Using high-throughput metagenomic analysis, combined with traditional culture techniques, tetrachloroethene (PCE) was identified as a feasible alternative to PCBs to isolate PCB-respiring Dehalococcoides from PCB-enriched cultures. With PCE as an alternative electron acceptor, the PCB-respiring Dehalococcoides were boosted to a higher cell density (1.2 × 10^8 to 1.3 × 10^8 cells per mL on PCE vs. 5.9 × 10^6 to 10.4 × 10^6 cells per mL on PCBs) with a shorter culturing time (30 d on PCE vs. 150 d on PCBs). The transcriptomic profiles illustrated that the distinct PCB dechlorination profile of each strain was predominantly mediated by a single, novel reductive dehalogenase (RDase) catalyzing chlorine removal from both PCBs and PCE. The transcription levels of PCB-RDase genes are 5-60 times higher than the genome-wide average. The cultivation of PCB-respiring Dehalococcoides in pure culture and the identification of PCB-RDase genes deepen our understanding of organohalide respiration of PCBs and shed light on in situ PCB bioremediation.
    Proceedings of the National Academy of Sciences of the United States of America. 07/2014;
  • ABSTRACT: Opisthorchiasis is a neglected tropical disease caused by the carcinogenic Asian liver fluke, Opisthorchis viverrini. This hepatobiliary disease is linked to malignant cancer (cholangiocarcinoma, CCA) and affects millions of people in Asia. No vaccine is available, and only one drug (praziquantel) is used against the parasite. Little is known about O. viverrini biology and the diseases that it causes. Here we characterize the draft genome (634.5 Mb) and transcriptomes of O. viverrini, elucidate how this fluke survives in the hostile environment within the bile duct and show that metabolic pathways in the parasite are highly adapted to a lipid-rich diet from bile and/or cholangiocytes. We also provide additional evidence that O. viverrini and other flukes secrete proteins that directly modulate host cell proliferation. Our molecular resources now underpin profound explorations of opisthorchiasis/CCA and the design of new interventions.
    Nature Communications 01/2014; 5:4378. · 10.74 Impact Factor
  • ABSTRACT: RNA viruses are notorious for their ability to quickly adapt to selective pressure from the host immune system and/or antivirals. This adaptability is likely due to the error-prone characteristics of their RNA-dependent RNA polymerase [1, 2]. Dengue virus, a member of the Flaviviridae family of positive-strand RNA viruses, is also known to share these error-prone characteristics [3]. Utilizing high-throughput, massively parallel sequencing methodologies, or next-generation sequencing (NGS), we can now accurately quantify these populations of viruses and track the changes to these populations over the course of a single infection. The aim of this chapter is twofold: to describe the methodologies required for sample preparation prior to sequencing and to describe the bioinformatics analyses required for the resulting data.
    Methods in molecular biology (Clifton, N.J.) 01/2014; 1138:175-95. · 1.29 Impact Factor
  • ABSTRACT: Dehalococcoides mccartyi strain SG1, isolated from digester sludge, dechlorinates polychlorinated biphenyls (PCBs) to lower congeners. Here we report the draft genome sequence of SG1, which carries a 22.65 kbp circular putative plasmid.
    Genome announcements. 01/2014; 2(5).
  • Andreas Wilm, Denis Bertrand, Li Juntao, Niranjan Nagarajan
    ABSTRACT: Background / Purpose: High-throughput sequencing datasets, in principle, enable the detection of extremely low frequency variants seen in a given cell population, to study their evolution and impact on phenotypes of interest. The use of ad hoc filters and statistics can however limit the sensitivity and specificity of detection, particularly when multiple samples are compared, as is the case for somatic variant calling and time course studies. Main conclusion: We demonstrate the utility of a systematic framework for variant calling (LoFreq) that simultaneously incorporates sequence quality, mapping quality, alignment quality and source quality information, allowing for single nucleotide variants and indels to be jointly called. Our benchmarking and validation results on real and in silico datasets demonstrate that this approach provides a significant boost in sensitivity over existing variant callers (accurately calling variants at less than 1% frequency), while retaining very high specificity.
    Joint 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and 12th European Conference on Computational Biology (ECCB) 2013; 08/2013
  • Niranjan Nagarajan, Mihai Pop
    ABSTRACT: Advances in sequencing technologies and increased access to sequencing services have led to renewed interest in sequence and genome assembly. Concurrently, new applications for sequencing have emerged, including gene expression analysis, discovery of genomic variants and metagenomics, and each of these has different needs and challenges in terms of assembly. We survey the theoretical foundations that underlie modern assembly and highlight the options and practical trade-offs that need to be considered, focusing on how individual features address the needs of specific applications. We also review key software and the interplay between experimental design and efficacy of assembly.
    Nature Reviews Genetics 01/2013; · 41.06 Impact Factor
  • ABSTRACT: The high throughput and cost-effectiveness afforded by short-read sequencing technologies, in principle, enable researchers to perform 16S rRNA profiling of complex microbial communities at unprecedented depth and resolution. Existing Illumina sequencing protocols are, however, limited by the fraction of the 16S rRNA gene that is interrogated and therefore limit the resolution and quality of the profiling. To address this, we present the design of a novel protocol for shotgun Illumina sequencing of the bacterial 16S rRNA gene, optimized to amplify more than 90% of sequences in the Greengenes database and with the ability to distinguish nearly twice as many species-level OTUs compared to existing protocols. Using several in silico and experimental datasets, we demonstrate that despite the presence of multiple variable and conserved regions, the resulting shotgun sequences can be used to accurately quantify the constituents of complex microbial communities. The reconstruction of a significant fraction of the 16S rRNA gene also enabled high precision (>90%) in species-level identification thereby opening up potential application of this approach for clinical microbial characterization.
    PLoS ONE 01/2013; 8(4):e60811. · 3.53 Impact Factor
  • ABSTRACT: Rickettsia prowazekii is a notable intracellular pathogen, the agent of epidemic typhus, and a potential biothreat agent. We present here whole-genome sequence data for four strains of R. prowazekii, including one from a flying squirrel.
    Genome announcements. 01/2013; 1(3).
  • ABSTRACT: BACKGROUND: Gastric cancer is the second highest cause of global cancer mortality. To explore the complete repertoire of somatic alterations in gastric cancer, we combined massively-parallel short read and DNA-PET sequencing to present the first whole-genome analysis of two gastric adenocarcinomas, one with chromosomal instability and the other with microsatellite instability. RESULTS: Integrative analysis and de novo assemblies revealed the architecture of a wild-type KRAS amplification, a common driver event in gastric cancer. We discovered three distinct mutational signatures in gastric cancer - against a genome-wide backdrop of oxidative and microsatellite instability-related mutational signatures, we identified the first exome-specific mutational signature. Further characterization of the impact of these signatures by combining sequencing data from 40 complete gastric cancer exomes and targeted screening of an additional 94 independent gastric tumours uncovered ACVR2A, RPL22 and LMAN1 as recurrently mutated genes in microsatellite instability-positive gastric cancer and PAPPA as a recurrently mutated gene in TP53 wild-type gastric cancer. CONCLUSIONS: These results highlight how whole-genome cancer sequencing can uncover information relevant to tissue-specific carcinogenesis that would otherwise be missed from exome-sequencing data.
    Genome biology 12/2012; 13(12):R115. · 10.30 Impact Factor
  • ABSTRACT: Oranges are an important nutritional source for human health and have immense economic value. Here we present a comprehensive analysis of the draft genome of sweet orange (Citrus sinensis). The assembled sequence covers 87.3% of the estimated orange genome, which is relatively compact, as 20% is composed of repetitive elements. We predicted 29,445 protein-coding genes, half of which are in the heterozygous state. With additional sequencing of two more citrus species and comparative analyses of seven citrus genomes, we present evidence to suggest that sweet orange originated from a backcross hybrid between pummelo and mandarin. Focused analysis on genes involved in vitamin C metabolism showed that GalUR, encoding the rate-limiting enzyme of the galacturonate pathway, is significantly upregulated in orange fruit, and the recent expansion of this gene family may provide a genomic basis. This draft genome represents a valuable resource for understanding and improving many important citrus traits in the future.
    Nature Genetics 11/2012; · 35.21 Impact Factor
  • ABSTRACT: The study of cell-population heterogeneity in a range of biological systems, from viruses to bacterial isolates to tumor samples, has been transformed by recent advances in sequencing throughput. While the high coverage afforded can be used, in principle, to identify very rare variants in a population, existing ad hoc approaches frequently fail to distinguish true variants from sequencing errors. We report a method (LoFreq) that models sequencing run-specific error rates to accurately call variants occurring in <0.05% of a population. Using simulated and real datasets (viral, bacterial and human), we show that LoFreq has near-perfect specificity, with significantly improved sensitivity compared with existing methods and can efficiently analyze deep Illumina sequencing datasets without resorting to approximations or heuristics. We also present experimental validation for LoFreq on two different platforms (Fluidigm and Sequenom) and its application to call rare somatic variants from exome sequencing datasets for gastric cancer. Source code and executables for LoFreq are freely available at http://sourceforge.net/projects/lofreq/.
    Nucleic Acids Research 10/2012; · 8.81 Impact Factor
  • Song Gao, Denis Bertrand, Niranjan Nagarajan
    ABSTRACT: With the increased democratization of sequencing, the reliance of sequence assembly programs on heuristics is at odds with the need for black-box assembly solutions that can be used reliably by non-specialists. In this work, we present a formal definition for in silico assembly validation and finishing and explore the feasibility of an exact solution for this problem using quadratic programming (FinIS). Based on results for several real and simulated datasets, we demonstrate that FinIS validates the correctness of a larger fraction of the assembly than existing ad hoc tools. Using a test for unique optimal solutions, we show that FinIS can improve on both precision and recall values for the correctness of assembled sequences, when compared to competing programs. Source code and executables for FinIS are freely available at http://sourceforge.net/projects/finis/.
    Proceedings of the 12th international conference on Algorithms in Bioinformatics; 09/2012
  • ABSTRACT: Background. Dengue is the most common arboviral infection of humans. There are currently no specific treatments for dengue. Balapiravir is a prodrug of a nucleoside analogue (called R1479) and an inhibitor of hepatitis C virus replication in vivo. Methods. We conducted in vitro experiments to determine the potency of balapiravir against dengue viruses and then an exploratory, dose-escalating, randomized placebo-controlled trial in adult male patients with dengue with <48 hours of fever. Results. The clinical and laboratory adverse event profile in patients receiving balapiravir at doses of 1500 mg (n = 10) or 3000 mg (n = 22) orally for 5 days was similar to that of patients receiving placebo (n = 32), indicating balapiravir was well tolerated. However, twice daily assessment of viremia and daily assessment of NS1 antigenemia indicated balapiravir did not measurably alter the kinetics of these virological markers, nor did it reduce the fever clearance time. The kinetics of plasma cytokine concentrations and the whole blood transcriptional profile were also not attenuated by balapiravir treatment. Conclusions. Although this trial, the first of its kind in dengue, does not support balapiravir as a candidate drug, it does establish a framework for antiviral treatment trials in dengue and provides the field with a clinically evaluated benchmark molecule. Clinical Trials Registration. NCT01096576.
    The Journal of Infectious Diseases 07/2012; · 5.85 Impact Factor
  • ABSTRACT: We describe genome mapping on nanochannel arrays. In this approach, specific sequence motifs in single DNA molecules are fluorescently labeled, and the DNA molecules are uniformly stretched in thousands of silicon channels on a nanofluidic device. Fluorescence imaging allows the construction of maps of the physical distances between occurrences of the sequence motifs. We demonstrate the analysis, individually and as mixtures, of 95 bacterial artificial chromosome (BAC) clones that cover the 4.7-Mb human major histocompatibility complex region. We obtain accurate, haplotype-resolved, sequence motif maps hundreds of kilobases in length, resulting in a median coverage of 114× for the BACs. The final sequence motif map assembly contains three contigs. With an average distance of 9 kb between labels, we detect 22 haplotype differences. We also use the sequence motif maps to provide scaffolds for de novo assembly of sequencing data. Nanochannel genome mapping should facilitate de novo assembly of sequencing reads from complex regions in diploid organisms, haplotype and structural variation analysis and comparative genomics.
    Nature Biotechnology 07/2012; 30(8):771-6. · 32.44 Impact Factor
  • ABSTRACT: Gastric cancer is a major cause of global cancer mortality. We surveyed the spectrum of somatic alterations in gastric cancer by sequencing the exomes of 15 gastric adenocarcinomas and their matched normal DNAs. Frequently mutated genes in the adenocarcinomas included TP53 (11/15 tumors), PIK3CA (3/15) and ARID1A (3/15). Cell adhesion was the most enriched biological pathway among the frequently mutated genes. A prevalence screening confirmed mutations in FAT4, a cadherin family gene, in 5% of gastric cancers (6/110) and FAT4 genomic deletions in 4% (3/83) of gastric tumors. Frequent mutations in chromatin remodeling genes (ARID1A, MLL3 and MLL) also occurred in 47% of the gastric cancers. We detected ARID1A mutations in 8% of tumors (9/110), which were associated with concurrent PIK3CA mutations and microsatellite instability. In functional assays, we observed both FAT4 and ARID1A to exert tumor-suppressor activity. Somatic inactivation of FAT4 and ARID1A may thus be key tumorigenic events in a subset of gastric cancers.
    Nature Genetics 04/2012; 44(5):570-4. · 35.21 Impact Factor
  • ABSTRACT: Tyrosine kinase inhibitors (TKIs) elicit high response rates among individuals with kinase-driven malignancies, including chronic myeloid leukemia (CML) and epidermal growth factor receptor-mutated non-small-cell lung cancer (EGFR NSCLC). However, the extent and duration of these responses are heterogeneous, suggesting the existence of genetic modifiers affecting an individual's response to TKIs. Using paired-end DNA sequencing, we discovered a common intronic deletion polymorphism in the gene encoding BCL2-like 11 (BIM). BIM is a pro-apoptotic member of the B-cell CLL/lymphoma 2 (BCL2) family of proteins, and its upregulation is required for TKIs to induce apoptosis in kinase-driven cancers. The polymorphism switched BIM splicing from exon 4 to exon 3, which resulted in expression of BIM isoforms lacking the pro-apoptotic BCL2-homology domain 3 (BH3). The polymorphism was sufficient to confer intrinsic TKI resistance in CML and EGFR NSCLC cell lines, but this resistance could be overcome with BH3-mimetic drugs. Notably, individuals with CML and EGFR NSCLC harboring the polymorphism experienced significantly inferior responses to TKIs than did individuals without the polymorphism (P = 0.02 for CML and P = 0.027 for EGFR NSCLC). Our results offer an explanation for the heterogeneity of TKI responses across individuals and suggest the possibility of personalizing therapy with BH3 mimetics to overcome BIM-polymorphism-associated TKI resistance.
    Nature medicine 03/2012; 18(4):521-8. · 27.14 Impact Factor
  • ABSTRACT: Structural variations (SVs) contribute significantly to the variability of the human genome and extensive genomic rearrangements are a hallmark of cancer. While genomic DNA paired-end-tag (DNA-PET) sequencing is an attractive approach to identify genomic SVs, the current application of PET sequencing with short insert size DNA can be insufficient for the comprehensive mapping of SVs in low complexity and repeat-rich genomic regions. We employed a recently developed procedure to generate PET sequencing data using large DNA inserts of 10-20 kb and compared their characteristics with short insert (1 kb) libraries for their ability to identify SVs. Our results suggest that although short insert libraries bear an advantage in identifying small deletions, they do not provide significantly better breakpoint resolution. In contrast, large inserts are superior to short inserts in providing higher physical genome coverage for the same sequencing cost and achieve greater sensitivity, in practice, for the identification of several classes of SVs, such as copy number neutral and complex events. Furthermore, our results confirm that large insert libraries allow for the identification of SVs within repetitive sequences, which cannot be spanned by short inserts. This provides a key advantage in studying rearrangements in cancer, and we show how it can be used in a fusion-point-guided-concatenation algorithm to study focally amplified regions in cancer.
    PLoS ONE 01/2012; 7(9):e46152. · 3.53 Impact Factor
  • ABSTRACT: CHO cells, the workhorses of the biopharmaceutical industry, are derived from the Chinese hamster, arguably making it the most economically important industrial organism. The synergistic application of high-throughput sequencing technologies, along with the existing CHO EST collection as backbone, enabled the efficient assembly of the Chinese hamster genome. The current assembly (~2.5 Gb), constituting over two billion sequence reads, includes more than 25,000 annotated genes across a range of functional classes. This has allowed a global comparative analysis with the mouse, rat and human genomes. Furthermore, the investigation of regulatory features including promoters, CpG islands and microRNAs has opened up new avenues for manipulating individual gene expression as well as genome level interventions. In addition, this work aims to study the genetic variation underlying economically important productivity traits in CHO cells, by a comparative genomics approach, with diploid hamster DNA as reference. Further, cell line-specific functional polymorphisms have been identified utilizing RNA-Seq data from several different recombinant lines. The availability of a well-annotated Chinese hamster genome will open up many new opportunities for cell engineering and metabolic intervention for process enhancement.
    2011 AIChE Annual Meeting; 10/2011
  • Song Gao, Wing-Kin Sung, Niranjan Nagarajan
    ABSTRACT: Scaffolding, the problem of ordering and orienting contigs, typically using paired-end reads, is a crucial step in the assembly of high-quality draft genomes. Even as sequencing technologies and mate-pair protocols have improved significantly, scaffolding programs still rely on heuristics, with no guarantees on the quality of the solution. In this work, we explored the feasibility of an exact solution for scaffolding and present a first tractable solution for this problem (Opera). We also describe a graph contraction procedure that allows the solution to scale to large scaffolding problems and demonstrate this by scaffolding several large real and synthetic datasets. In comparisons with existing scaffolders, Opera simultaneously produced longer and more accurate scaffolds demonstrating the utility of an exact approach. Opera also incorporates an exact quadratic programming formulation to precisely compute gap sizes (Availability: http://sourceforge.net/projects/operasf/ ).
    Journal of computational biology: a journal of computational molecular cell biology 09/2011; 18(11):1681-91. · 1.69 Impact Factor

Publication Stats

1k Citations
340.88 Total Impact Points

Institutions

  • 2010–2014
    • Genome Institute of Singapore
      • Computational and Systems Biology Group
      Tumasik, Singapore
  • 2008–2013
    • University of Maryland, College Park
      • Department of Computer Science
      • Center for Bioinformatics and Computational Biology
      Maryland, United States
  • 2012
    • Huazhong Agricultural University
      Wu-han-shih, Hubei, China
  • 2011
    • National University of Singapore
      • NUS Graduate School for Integrative Sciences and Engineering
      Singapore, Singapore
  • 2003–2009
    • Cornell University
      • Computer Science
      Ithaca, NY, United States