ABSTRACT: Neuroblastoma is a malignancy of the developing sympathetic nervous system that often presents with widespread metastatic disease, resulting in survival rates of less than 50%. To determine the spectrum of somatic mutation in high-risk neuroblastoma, we studied 240 affected individuals (cases) using a combination of whole-exome, genome and transcriptome sequencing as part of the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. Here we report a low median exonic mutation frequency of 0.60 per Mb (0.48 nonsilent) and notably few recurrently mutated genes in these tumors. Genes with significant somatic mutation frequencies included ALK (9.2% of cases), PTPN11 (2.9%), ATRX (2.5%, and an additional 7.1% had focal deletions), MYCN (1.7%, causing a recurrent p.Pro44Leu alteration) and NRAS (0.83%). Rare, potentially pathogenic germline variants were significantly enriched in ALK, CHEK2, PINK1 and BARD1. The relative paucity of recurrent somatic mutations in neuroblastoma challenges current therapeutic strategies that rely on frequently altered oncogenic drivers.
ABSTRACT: An estimated 15% or more of the cancer burden worldwide is attributable to known infectious agents. We screened colorectal carcinoma and matched normal tissue specimens using RNA-seq followed by host sequence subtraction and found marked over-representation of Fusobacterium nucleatum sequences in tumors relative to control specimens. F. nucleatum is an invasive anaerobe that has been linked previously to periodontitis and appendicitis, but not to cancer. Fusobacteria are rare constituents of the fecal microbiota, but have been cultured previously from biopsies of inflamed gut mucosa. We obtained a Fusobacterium isolate from a frozen tumor specimen; this showed highest sequence similarity to a known gut mucosa isolate and was confirmed to be invasive. We verified overabundance of Fusobacterium sequences in tumor versus matched normal control tissue by quantitative PCR analysis from a total of 99 subjects (p = 2.5 × 10⁻⁶), and we observed a positive association with lymph node metastasis.
Genome Research 02/2012; 22(2):299-306. · 14.40 Impact Factor
ABSTRACT: Trastuzumab (Herceptin) resistance is a major obstacle in the treatment of patients with HER2-positive breast cancers. We recently reported that the transcription factor Y-box binding protein-1 (YB-1) leads to acquisition of resistance to trastuzumab in a phosphorylation-dependent manner that relies on p90 ribosomal S6 kinase (RSK). To explore how this may occur we compared YB-1 target genes between trastuzumab-sensitive cells (BT474) and those with acquired resistance (HR5 and HR6) using genome-wide chromatin immunoprecipitation sequencing (ChIP-sequencing), which identified 1391 genes uniquely bound by YB-1 in the resistant cell lines. We then examined differences in protein expression and phosphorylation between these cell lines using the Kinexus Kinex antibody microarrays. Cross-referencing these two data sets identified the mitogen-activated protein kinase-interacting kinase (MNK) family as potentially being involved in acquired resistance downstream from YB-1. MNK1 and MNK2 were subsequently shown to be overexpressed in the resistant cell lines; however, only the former was a YB-1 target based on ChIP-PCR and small interfering RNA (siRNA) studies. Importantly, loss of MNK1 expression using siRNA enhanced sensitivity to trastuzumab. Further, MNK1 overexpression was sufficient to confer resistance to trastuzumab in cells that were previously sensitive. We then developed a de novo model of acquired resistance by exposing BT474 cells to trastuzumab for 60 days (BT474LT). Similar to the HR5/HR6 cells, the BT474LT cells had elevated MNK1 levels and were dependent on it for survival. In addition, we demonstrated that RSK phosphorylated MNK1, and that this phosphorylation was required for the ability of MNK1 to mediate resistance to trastuzumab. Furthermore, inhibition of RSK with the small molecule BI-D1870 repressed the MNK1-mediated trastuzumab resistance.
In conclusion, this unbiased integrated approach identified MNK1 as a player in mediating trastuzumab resistance as a consequence of YB-1 activation, and demonstrated RSK inhibition as a means to overcome recalcitrance to trastuzumab.
ABSTRACT: Networks are typically visualized with force-based or spectral layouts. These algorithms lack reproducibility and perceptual uniformity because they do not use a node coordinate system. The layouts can be difficult to interpret and are unsuitable for assessing differences in networks. To address these issues, we introduce hive plots (http://www.hiveplot.com) for generating informative, quantitative and comparable network layouts. Hive plots depict network structure transparently, are simple to understand and can be easily tuned to identify patterns of interest. The method is computationally straightforward, scales well and is amenable to a plugin for existing tools.
Briefings in Bioinformatics 12/2011; 13(5):627-44. · 5.30 Impact Factor
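The idea of a node coordinate system in the hive plot abstract above can be illustrated with a minimal sketch. The function name `hive_layout`, the axis-assignment callback, and the choice of node degree as the position along an axis are assumptions made for illustration only; the abstract does not specify which node properties are used. The point is that each node's position is computed from its own properties, so the same network always yields the same layout.

```python
import math
from collections import defaultdict


def hive_layout(edges, axis_of, n_axes=3, inner=1.0, outer=10.0):
    """Return {node: (x, y)} for a deterministic, axis-based layout.

    edges   -- iterable of (u, v) pairs
    axis_of -- hypothetical rule mapping a node to an axis index 0..n_axes-1
    Radius along the axis is scaled linearly by node degree (an assumption).
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    min_deg = min(degree.values())
    span = max(max(degree.values()) - min_deg, 1)  # avoid division by zero

    coords = {}
    for node in sorted(degree):
        # Axes are spread evenly around the origin.
        angle = 2 * math.pi * axis_of(node) / n_axes
        # Position along the axis depends only on the node's own degree.
        r = inner + (outer - inner) * (degree[node] - min_deg) / span
        coords[node] = (r * math.cos(angle), r * math.sin(angle))
    return coords


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
axes = {"a": 0, "b": 1, "c": 2, "d": 1}
layout = hive_layout(edges, axes.get)
```

Unlike a force-based layout, nothing here depends on random initial positions or iteration, which is what makes two such layouts directly comparable.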
ABSTRACT: Follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL) are the two most common non-Hodgkin lymphomas (NHLs). Here we sequenced tumour and matched normal DNA from 13 DLBCL cases and one FL case to identify genes with mutations in B-cell NHL. We analysed RNA-seq data from these and another 113 NHLs to identify genes with candidate mutations, and then re-sequenced tumour and matched normal DNA from these cases to confirm 109 genes with multiple somatic mutations. Genes with roles in histone modification were frequent targets of somatic mutation. For example, 32% of DLBCL and 89% of FL cases had somatic mutations in MLL2, which encodes a histone methyltransferase, and 11.4% and 13.4% of DLBCL and FL cases, respectively, had mutations in MEF2B, a calcium-regulated gene that cooperates with CREBBP and EP300 in acetylating histones. Our analysis suggests a previously unappreciated disruption of chromatin biology in lymphomagenesis.
ABSTRACT: ChIP-seq combines chromatin immunoprecipitation with massively parallel short-read sequencing. While it can profile genome-wide in vivo transcription factor-DNA association with higher sensitivity, specificity, and spatial resolution than ChIP-chip, it poses new challenges for statistical analysis that derive from the complexity of the biological systems characterized and from variability and biases in its sequence data. We propose a method called PICS (Probabilistic Inference for ChIP-seq) for identifying regions bound by transcription factors from aligned reads. PICS identifies binding event locations by modeling local concentrations of directional reads, and uses DNA fragment length prior information to discriminate closely adjacent binding events via a Bayesian hierarchical t-mixture model. It uses precalculated, whole-genome read mappability profiles and a truncated t-distribution to adjust binding event models for reads that are missing due to local genome repetitiveness. It estimates uncertainties in model parameters that can be used to define confidence regions on binding event locations and to filter estimates. Finally, PICS calculates a per-event enrichment score relative to a control sample, and can use a control sample to estimate a false discovery rate. Using published GABP and FOXA1 data from human cell lines, we show that PICS' predicted binding sites were more consistent with computationally predicted binding motifs than the alternative methods MACS, QuEST, CisGenome, and USeq. We then use a simulation study to confirm that PICS compares favorably to these methods and is robust to model misspecification.
ABSTRACT: Cryptococcus gattii recently emerged as the causative agent of cryptococcosis in healthy individuals in western North America, despite previous characterization of the fungus as a pathogen in tropical or subtropical regions. As a foundation to study the genetics of virulence in this pathogen, we sequenced the genomes of a strain (WM276) representing the predominant global molecular type (VGI) and a clinical strain (R265) of the major genotype (VGIIa) causing disease in North America. We compared these C. gattii genomes with each other and with the genomes of representative strains of the two varieties of Cryptococcus neoformans that generally cause disease in immunocompromised people. Our comparisons included chromosome alignments, analysis of gene content and gene family evolution, and comparative genome hybridization (CGH). These studies revealed that the genomes of the two representative C. gattii strains (genotypes VGI and VGIIa) are colinear for the majority of chromosomes, with some minor rearrangements. However, multiortholog phylogenetic analysis and an evaluation of gene/sequence conservation support the existence of speciation within the C. gattii complex. More extensive chromosome rearrangements were observed upon comparison of the C. gattii and the C. neoformans genomes. Finally, CGH revealed considerable variation in clinical and environmental isolates as well as changes in chromosome copy numbers in C. gattii isolates displaying fluconazole heteroresistance.
ABSTRACT: Mesopolyploid whole-genome duplication (WGD) was revealed in the ancestry of Australian Brassicaceae species with diploid-like chromosome numbers (n = 4 to 6). Multicolor comparative chromosome painting was used to reconstruct complete cytogenetic maps of the cryptic ancient polyploids. Cytogenetic analysis showed that the karyotype of the Australian Camelineae species descended from the eight ancestral chromosomes (n = 8) through allopolyploid WGD followed by the extensive reduction of chromosome number. Nuclear and maternal gene phylogenies corroborated the hybrid origin of the mesotetraploid ancestor and suggest that the hybridization event occurred approximately 6 to 9 million years ago. The four, five, and six fusion chromosome pairs of the analyzed close relatives of Arabidopsis thaliana represent complex mosaics of duplicated ancestral genomic blocks reshuffled by numerous chromosome rearrangements. Unequal reciprocal translocations with or without preceding pericentric inversions and purported end-to-end chromosome fusions accompanied by inactivation and/or loss of centromeres are hypothesized to be the main pathways for the observed chromosome number reduction. Our results underline the significance of multiple rounds of WGD in angiosperm genome evolution and demonstrate that chromosome number per se is not a reliable indicator of ploidy level.
The Plant Cell 07/2010; 22(7):2277-90. · 9.25 Impact Factor
ABSTRACT: The presence of intrinsic radiosensitivity within prostate cancer patients may be an important factor contributing to development of radiation toxicity. We investigated whether variants in genes responsible for detecting and repairing DNA damage independently contribute to toxicity following prostate brachytherapy.
Genomic DNA was extracted from blood samples of 41 prostate brachytherapy patients, 21 with high and 20 with low late toxicity scores. For each patient, 242 PCR amplicons were generated containing 173 exons of eight candidate genes: ATM, BRCA1, ERCC2, H2AFX, LIG4, MDC1, MRE11A, and RAD50. These amplicons were sequenced and all sequence variants were subjected to statistical analysis to identify those associated with late radiation toxicity.
Across 41 patients, 239 sites differed from the human genome reference sequence; 170 of these corresponded to known polymorphisms. Sixty variants, 14 of them novel, affected protein coding regions and 43 of these were missense mutations. In our patient population, the high toxicity group was enriched for individuals with at least one LIG4 coding variant (P = 0.028). One synonymous variant in MDC1, rs28986317, was associated with increased radiosensitivity (P = 0.048). A missense variant in ATM, rs1800057, associated with increased prostate cancer risk, was found exclusively in two high toxicity patients but did not reach statistical significance for association with radiosensitivity (P = 0.488).
Our data revealed new germ-line sequence variants, indicating that existing sequence databases do not capture the full extent of sequence variation. Variants in three DNA repair genes were linked to increased radiosensitivity but require validation in larger populations.
Clinical Cancer Research 09/2009; 15(15):5008-16. · 7.84 Impact Factor
ABSTRACT: We created a visualization tool called Circos to facilitate the identification and analysis of similarities and differences arising from comparisons of genomes. Our tool is effective in displaying variation in genome structure and, generally, any other kind of positional relationships between genomic intervals. Such data are routinely produced by sequence alignments, hybridization arrays, genome mapping, and genotyping studies. Circos uses a circular ideogram layout to facilitate the display of relationships between pairs of positions by the use of ribbons, which encode the position, size, and orientation of related genomic elements. Circos is capable of displaying data as scatter, line, and histogram plots, heat maps, tiles, connectors, and text. Bitmap or vector images can be created from GFF-style data inputs and hierarchical configuration files, which can be easily generated by automated tools, making Circos suitable for rapid deployment in data analysis and reporting pipelines.
Genome Research 07/2009; 19(9):1639-45. · 14.40 Impact Factor
ABSTRACT: We characterized the relationship of H3K4me1 and H3K4me3 at distal and proximal regulatory elements by comparing ChIP-seq profiles for these histone modifications and for two functionally different transcription factors: STAT1 in the immortalized HeLa S3 cell line, with and without interferon-gamma (IFNG) stimulation; and FOXA2 in mouse adult liver tissue. In unstimulated and stimulated HeLa cells, respectively, we determined approximately 270,000 and approximately 301,000 H3K4me1-enriched regions, and approximately 54,500 and approximately 76,100 H3K4me3-enriched regions. In mouse adult liver, we determined approximately 227,000 and approximately 34,800 H3K4me1 and H3K4me3 regions. Seventy-five percent of the approximately 70,300 STAT1 binding sites in stimulated HeLa cells and 87% of the approximately 11,000 FOXA2 sites in mouse liver were distal to known gene TSS; in both cell types, approximately 83% of these distal sites were associated with at least one of the two histone modifications, and H3K4me1 was associated with over 96% of marked distal sites. After filtering against predicted transcription start sites, 50% of approximately 26,800 marked distal IFNG-stimulated STAT1 binding sites, but 95% of approximately 5800 marked distal FOXA2 sites, were associated with H3K4me1 only. Results for HeLa cells generated additional insights into transcriptional regulation involving STAT1. STAT1 binding was associated with 25% of all H3K4me1 regions in stimulated HeLa cells, suggesting that a single transcription factor can interact with an unexpectedly large fraction of regulatory regions. Strikingly, for a large majority of the locations of stimulated STAT1 binding, the dominant H3K4me1/me3 combinations were established before activation, suggesting mechanisms independent of IFNG stimulation and high-affinity STAT1 binding.
Genome Research 10/2008; 18(12):1906-17. · 14.40 Impact Factor
ABSTRACT: Sequence-based methods for transcriptome characterization have typically relied on generation of either serial analysis of gene expression tags or expressed sequence tags. Although such approaches have the potential to enumerate transcripts by counting sequence tags derived from them, they typically do not robustly survey the majority of transcripts along their entire length. Here we show that massively parallel sequencing of randomly primed cDNAs, using a next-generation sequencing-by-synthesis technology, offers the potential to generate relative measures of mRNA and individual exon abundance while simultaneously profiling the prevalence of both annotated and novel exons and exon-splicing events. This technique identifies known single nucleotide polymorphisms (SNPs) as well as novel single-base variants. Analysis of these variants, and previously unannotated splicing events in the HeLa S3 cell line, reveals an overrepresentation of gene categories including those previously implicated in cancer.
ABSTRACT: As part of a larger project to sequence the Populus genome and generate genomic resources for this emerging model tree, we constructed a physical map of the Populus genome, representing one of the few such maps of an undomesticated, highly heterozygous plant species. The physical map, consisting of 2802 contigs, was constructed from fingerprinted bacterial artificial chromosome (BAC) clones. The map represents approximately 9.4-fold coverage of the Populus genome, which has been estimated from the genome sequence assembly to be 485 ± 10 Mb in size. BAC ends were sequenced to assist long-range assembly of whole-genome shotgun sequence scaffolds and to anchor the physical map to the genome sequence. Simple sequence repeat-based markers were derived from the end sequences and used to initiate integration of the BAC and genetic maps. A total of 2411 physical map contigs, representing 97% of all clones assigned to contigs, were aligned to the sequence assembly (JGI Populus trichocarpa, version 1.0). These alignments represent a total coverage of 384 Mb (79%) of the entire poplar sequence assembly and 295 Mb (96%) of linkage group sequence assemblies. A striking result of the physical map contig alignments to the sequence assembly was the co-localization of multiple contigs across numerous regions of the 19 linkage groups. Targeted sequencing of BAC clones and genetic analysis in a small number of representative regions showed that these co-aligning contigs represent distinct haplotypes in the heterozygous individual sequenced, and revealed the nature of these haplotype sequence differences.
The Plant Journal 07/2007; 50(6):1063-78. · 6.58 Impact Factor
ABSTRACT: We present a method, called fingerprint profiling (FPP), that uses restriction digest fingerprints of bacterial artificial chromosome clones to detect and classify rearrangements in the human genome. The approach uses alignment of experimental fingerprint patterns to in silico digests of the sequence assembly and is capable of detecting micro-deletions (1-5 kb) and balanced rearrangements. Our method has compelling potential for use as a whole-genome method for the identification and characterization of human genome rearrangements.
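As a rough illustration of what an in silico digest of a sequence involves, the sketch below cuts a linear sequence at every occurrence of a restriction site and returns the resulting fragment sizes. It is not the FPP implementation: the function name `in_silico_digest` is hypothetical, the toy sequence is made up, and HindIII (recognition site AAGCTT, cutting after the first A) is used purely as an example enzyme because the abstract does not name one.

```python
def in_silico_digest(sequence, site, cut_offset):
    """Return sorted fragment lengths from cutting a linear sequence.

    sequence   -- DNA string (treated as linear, an assumption)
    site       -- recognition site, e.g. "AAGCTT" for HindIII
    cut_offset -- cut position within the site (1 for HindIII: A^AGCTT)
    """
    seq = sequence.upper()
    site = site.upper()

    # Collect every cut coordinate, allowing overlapping site matches.
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)

    # Fragment lengths are the gaps between consecutive boundaries.
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)


toy = "GGGAAGCTTCCCCAAGCTTTT"  # made-up sequence with two HindIII sites
fragments = in_silico_digest(toy, "AAGCTT", 1)
```

A predicted size list like this is the kind of pattern an experimental fingerprint could be compared against; fragments that are missing, extra, or shifted in size would then point to a candidate rearrangement.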
ABSTRACT: The cause of mental retardation in one-third to one-half of all affected individuals is unknown. Microscopically detectable chromosomal abnormalities are the most frequently recognized cause, but gain or loss of chromosomal segments that are too small to be seen by conventional cytogenetic analysis has been found to be another important cause. Array-based methods offer a practical means of performing a high-resolution survey of the entire genome for submicroscopic copy-number variants. We studied 100 children with idiopathic mental retardation and normal results of standard chromosomal analysis, by use of whole-genome sampling analysis with Affymetrix GeneChip Human Mapping 100K arrays. We found de novo deletions as small as 178 kb in eight cases, de novo duplications as small as 1.1 Mb in two cases, and unsuspected mosaic trisomy 9 in another case. This technology can detect at least twice as many potentially pathogenic de novo copy-number variants as conventional cytogenetic analysis can in people with mental retardation.
The American Journal of Human Genetics 10/2006; 79(3):500-13. · 11.20 Impact Factor
ABSTRACT: We describe a targeted approach to improve the contiguity of whole-genome shotgun sequence (WGS) assemblies at run-time, using information from Bacterial Artificial Chromosome (BAC)-based physical maps. Clone sizes and overlaps derived from clone fingerprints are used for the calculation of length constraints between any two BAC neighbors sharing 40% of their size. These constraints are used to promote the linkage and guide the arrangement of sequence contigs within a sequence scaffold at the layout phase of WGS assemblies. This process is facilitated by FASSI, a stand-alone application that calculates BAC end and BAC overlap length constraints from clone fingerprint map contigs created by the FPC package. FASSI is designed to work with the assembly tool PCAP, but its output can be formatted to work with other WGS assembly algorithms able to use length constraints for individual clones. The FASSI method is simple to implement, potentially cost-effective, and has resulted in the increase of scaffold contiguity for both the Drosophila melanogaster and Cryptococcus gattii genomes when compared to a control assembly without map-derived constraints. A 6.5-fold coverage draft DNA sequence of the Pan troglodytes (chimpanzee) genome was assembled using map-derived constraints and resulted in a 26.1% increase in scaffold contiguity.
Genome Research 07/2006; 16(6):768-75. · 14.40 Impact Factor
ABSTRACT: A physical map of the Atlantic salmon (Salmo salar) genome was generated based on HindIII fingerprints of a publicly available BAC (bacterial artificial chromosome) library constructed from DNA isolated from a Norwegian male. Approximately 11.5 haploid genome equivalents (185,938 clones) were successfully fingerprinted. Contigs were first assembled via FPC using high-stringency (1e-16), and then end-to-end joins yielded 4354 contigs and 37,285 singletons. The accuracy of the contig assembly was verified by hybridization and PCR analysis using genetic markers. A subset of the BACs in the library contained few or no HindIII recognition sites in their insert DNA. BglI digestion fragment patterns of these BACs allowed us to identify three classes: (1) BACs containing histone genes, (2) BACs containing rDNA-repeating units, and (3) those that do not have BglI recognition sites. End-sequence analysis of selected BACs representing these three classes confirmed the identification of the first two classes and suggested that the third class contained highly repetitive DNA corresponding to tRNAs and related sequences.
ABSTRACT: Cryptococcus neoformans is a basidiomycetous yeast ubiquitous in the environment, a model for fungal pathogenesis, and an opportunistic human pathogen of global importance. We have sequenced its approximately 20-megabase genome, which contains approximately 6500 intron-rich gene structures and encodes a transcriptome abundant in alternatively spliced and antisense messages. The genome is rich in transposons, many of which cluster at candidate centromeric regions. The presence of these transposons may drive karyotype instability and phenotypic variation. C. neoformans encodes unique genes that may contribute to its unusual virulence properties, and comparison of two phenotypically distinct strains reveals variation in gene content in addition to sequence polymorphisms between the genomes.