ABSTRACT: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
ABSTRACT: Medulloblastoma is a highly malignant paediatric brain tumour currently treated with a combination of surgery, radiation and chemotherapy, posing a considerable burden of toxicity to the developing child. Genomics has illuminated the extensive intertumoral heterogeneity of medulloblastoma, identifying four distinct molecular subgroups. Group 3 and group 4 subgroup medulloblastomas account for most paediatric cases; yet, oncogenic drivers for these subtypes remain largely unidentified. Here we describe a series of prevalent, highly disparate genomic structural variants, restricted to groups 3 and 4, resulting in specific and mutually exclusive activation of the growth factor independent 1 family proto-oncogenes, GFI1 and GFI1B. Somatic structural variants juxtapose GFI1 or GFI1B coding sequences proximal to active enhancer elements, including super-enhancers, instigating oncogenic activity. Our results, supported by evidence from mouse models, identify GFI1 and GFI1B as prominent medulloblastoma oncogenes and implicate 'enhancer hijacking' as an efficient mechanism driving oncogene activation in a childhood cancer.
ABSTRACT: The computational prediction of alternative splicing from high-throughput sequencing data is inherently difficult and necessitates robust statistical measures because the differential splicing signal is overlaid by influencing factors such as gene expression differences and simultaneous expression of multiple isoforms amongst others. In this work we describe ARH-seq, a discovery tool for differential splicing in case–control studies that is based on the information-theoretic concept of entropy. ARH-seq works on high-throughput sequencing data and is an extension of the ARH method that was originally developed for exon microarrays. We show that the method has inherent features, such as independence of transcript exon number and independence of differential expression, which make it particularly suited for detecting alternative splicing events from sequencing data. In order to test and validate our workflow we challenged it with publicly available sequencing data derived from human tissues and conducted a comparison with eight alternative computational methods. In order to judge the performance of the different methods we constructed a benchmark data set of true positive splicing events across different tissues agglomerated from public databases and show that ARH-seq is an accurate, computationally fast and high-performing method for detecting differential splicing events.
Nucleic Acids Research 06/2014; 42(14):e110. DOI:10.1093/nar/gku495 · 9.11 Impact Factor
ABSTRACT: The interactions of Meis, Prep, and Pbx1 TALE homeoproteins with Hox proteins are essential for development and disease. Although Meis and Prep behave similarly in vitro, their in vivo activities remain largely unexplored. We show that Prep and Meis interact with largely independent sets of genomic sites and select different DNA-binding sequences, Prep associating mostly with promoters and housekeeping genes and Meis with promoter-remote regions and developmental genes. Hox target sequences associate strongly with Meis but not with Prep binding sites, while Pbx1 cooperates with both Prep and Meis. Accordingly, Meis1 shows strong genetic interaction with Pbx1 but not with Prep1. Meis1 and Prep1 nonetheless coregulate a subset of genes, predominantly through opposing effects. Notably, the TALE homeoprotein binding profile subdivides Hox clusters into two domains differentially regulated by Meis1 and Prep1. During evolution, Meis and Prep thus specialized their interactions but maintained significant regulatory coordination.
ABSTRACT: The elucidation of genetic components of human diseases at the molecular level provides crucial information for developing future causal therapeutic intervention. High-throughput genome sequencing and systematic experimental approaches are fuelling strategic programs designed to investigate gene function at the biochemical, cellular and organism levels. Bioinformatics is one important tool in functional genomics, although it shows clear limitations in predicting ab initio gene structures, gene function and protein folds from raw sequence data. Systematic large-scale data-set generation, using the same type of experiments that are used to decipher the function of single genes, is being applied to entire genomes. Comparative genomics, establishment of gene catalogues, and investigation of cellular and tissue molecular profiles are providing essential tools for understanding gene function in complex biological networks.
Trends in Molecular Medicine 12/2001; 7(11):494-501. DOI:10.1016/S1471-4914(01)02181-5 · 9.45 Impact Factor
ABSTRACT: The testis-expressed human TPTE is a putative transmembrane tyrosine phosphatase, probably involved in signal transduction pathways of the endocrine and/or the spermatogenetic function of the testis. TPTE was mapped to the pericentromeric region of human chromosomes 21 and 13, and to chromosomes 15, 22, and Y. It is unknown which of the TPTE copies are transcribed, contain intronic sequences, and/or have open reading frames. Here, in silico analysis of the genomic sequence of human chromosome 21 allowed the determination of the genomic structure of a copy of the TPTE gene. This copy consists of 24 exons and spans approximately 87 kb. The mapping position of this copy of TPTE on the short arm of chromosome 21 was confirmed by FISH using the BAC 15L0C0 clone as a probe that contains almost the entire TPTE gene. This is the first description of the genomic sequence of a non-RNR gene on the short arm of human acrocentric chromosomes.
Human Genetics 09/2000; 107(2):127-31. DOI:10.1007/s004390000343 · 4.82 Impact Factor
ABSTRACT: Chromosome 21 is the smallest human autosome. An extra copy of chromosome 21 causes Down syndrome, the most frequent genetic cause of significant mental retardation, which affects up to 1 in 700 live births. Several anonymous loci for monogenic disorders and predispositions for common complex disorders have also been mapped to this chromosome, and loss of heterozygosity has been observed in regions associated with solid tumours. Here we report the sequence and gene catalogue of the long arm of chromosome 21. We have sequenced 33,546,361 base pairs (bp) of DNA with very high accuracy, the largest contig being 25,491,867 bp. Only three small clone gaps and seven sequencing gaps remain, comprising about 100 kilobases. Thus, we achieved 99.7% coverage of 21q. We also sequenced 281,116 bp from the short arm. The structural features identified include duplications that are probably involved in chromosomal abnormalities and repeat structures in the telomeric and pericentromeric regions. Analysis of the chromosome revealed 127 known genes, 98 predicted genes and 59 pseudogenes.
ABSTRACT: To establish criteria for and the limitations of novel gene identification, to identify novel genes of potential relevance to Down Syndrome and to investigate features of genome organization, 6. 550kb. In total, 41 novel gene models were predicted, and for a subset of these, RT-PCR experiments helped to verify and refine the models, and were used to assess expression in early development and in adult brain regions of potential relevance to Down syndrome. Results suggest generally low and/or restricted patterns of expression, and also reveal examples of complex alternative processing, especially in brain, that may have important implications for regulation of protein function. Analysis of complete gene structures of the known genes identified a number of very large introns, a number of very short intergenic distances, and at least one potentially bi-directional promoter. At least 3/4 of known genes and 1/2 of predicted genes are associated with CpG islands. For novel genes, three cases of overlapping genes are predicted. Results of these analyses illustrate some of the complexities inherent in mammalian genome organization and some of the limitations of current sequence analysis technologies. They also doubled the number of potential genes within the region.
ABSTRACT: Phenotypic and molecular analyses of patients with partial chromosome 21 monosomy enabled us to define a region, spanning 2.4 Mb between D21S190 and D21S226, associated with arthrogryposis, mental retardation, hypertonia, and several facial anomalies. The markers of the region were used to screen a total human PAC library (Ioannou, RZPD). We isolated 57 PACs, which formed primary contigs. EST clusters (UNIGENE collection) located in a 6-Mb interval, between D21S260 and D21S263, were mapped in individual bacterial clones. We mapped the WI-17843 cluster to the PAC clone J12100, which contains the two anchor markers LB10T and LA329. The open reading frame extends over 960 bp, with three putative start codons. The 1695-bp cDNA containing a polyadenylation signal should correspond to the full-length cDNA. From the genomic sequence, we deduced that the gene contained five exons and that there was a putative promoter sequence upstream from exon 1. In silico screening of DNA databases revealed similarity with a murine EST. The corresponding cDNA (1757 bp) sequence was very similar (>85%) to the human cDNA and had an open reading frame of 876 nucleotides. Somatic hybrid mapping localized the cDNA to mouse chromosome 16. EST analyses and RT-PCR indicated that the third exon in the human gene (exon 2 in the mouse) undergoes alternative splicing. Northern blot hybridization showed that the gene was ubiquitously expressed in humans and mice. The longest mouse clone was used to generate riboprobes, which were hybridized to murine embryos at stages E-9.5, E-10.5, E-12.5, E-13.5, and E-14.5-15, to study the pattern of expression during development. Ubiquitous labeling was observed, with strong signals restricted to limited areas of the telencephalon, the mesencephalon, and the interrhombomeric regions in the central nervous system, and other regions of the body such as the limb buds, branchial arches, and somites.
ABSTRACT: Progress in complete genomic sequencing of human chromosome 21 relies on the construction of high-quality bacterial clone maps spanning large chromosomal regions. To achieve this goal, we have applied a strategy based on nonradioactive hybridizations to contig building. A contiguous sequence-ready map was constructed in the Down syndrome congenital heart disease (DS-CHD) region in 21q22.2, as a framework for large-scale genomic sequencing and positional candidate gene approach. Contig assembly was performed essentially by high throughput nonisotopic screenings of genomic libraries, prior to clone validation by (1) restriction digest fingerprinting, (2) STS analysis, (3) Southern hybridizations, and (4) FISH analysis. The contig contains a total of 50 STSs, of which 13 were newly isolated. A minimum tiling path (MTP) was subsequently defined that consists of 20 PACs, 2 BACs, and 5 cosmids covering 3 Mb between D21S3 and MX1. Gene distribution in the region includes 9 known genes (c21-LRP, WRB, SH3BGR, HMG14, PCP4, DSCAM, MX2, MX1, and TMPRSS2) and 14 new additional gene signatures consisting of cDNA selection products and ESTs. Forthcoming genomic sequence information will unravel the structural organization of potential candidate genes involved in specific features of Down syndrome pathogenesis.
Genome Research 05/1999; 9(4):360-72. · 14.63 Impact Factor
ABSTRACT: Mutations in the human AIRE gene (hAIRE) result in the development of an autoimmune disease named APECED (autoimmune polyendocrinopathy candidiasis ectodermal dystrophy; OMIM 240300). Previously, we have cloned hAIRE and shown that it codes for a putative transcription-associated factor. Here we report the cloning and characterization of Aire, the murine ortholog of hAIRE. Comparative genomic sequencing revealed that the structure of the AIRE gene is highly conserved between human and mouse. The conceptual proteins share 73% homology and feature the same typical functional domains in both species. RT-PCR analysis detected three splice variant isoforms in various mouse tissues, and interestingly one isoform was conserved in human, suggesting potential biological relevance of this product. In situ hybridization on mouse and human histological sections showed that the AIRE expression pattern was mainly restricted to a few cells in the thymus, calling for a tissue-specific function of the gene product.
Genome Research 03/1999; 9(2):158-66. · 14.63 Impact Factor