ABSTRACT: The computational prediction of alternative splicing from high-throughput sequencing data is inherently difficult and necessitates
robust statistical measures because the differential splicing signal is overlaid by influencing factors such as gene expression
differences and simultaneous expression of multiple isoforms amongst others. In this work we describe ARH-seq, a discovery
tool for differential splicing in case–control studies that is based on the information-theoretic concept of entropy. ARH-seq
works on high-throughput sequencing data and is an extension of the ARH method that was originally developed for exon microarrays.
We show that the method has inherent features, such as independence of transcript exon number and independence of differential
expression, which make it particularly suited for detecting alternative splicing events from sequencing data. In order to
test and validate our workflow we challenged it with publicly available sequencing data derived from human tissues and conducted
a comparison with eight alternative computational methods. In order to judge the performance of the different methods we constructed
a benchmark data set of true positive splicing events across different tissues agglomerated from public databases and show
that ARH-seq is an accurate, computationally fast and high-performing method for detecting differential splicing events.
Nucleic Acids Research 06/2014; 42(14):e110. DOI:10.1093/nar/gku495 · 9.11 Impact Factor
ABSTRACT: The elucidation of genetic components of human diseases at the molecular level provides crucial information for developing future causal therapeutic intervention. High-throughput genome sequencing and systematic experimental approaches are fuelling strategic programs designed to investigate gene function at the biochemical, cellular and organism levels. Bioinformatics is one important tool in functional genomics, although it shows clear limitations in predicting ab initio gene structures, gene function and protein folds from raw sequence data. Systematic large-scale data-set generation, using the same type of experiments that are used to decipher the function of single genes, is being applied to entire genomes. Comparative genomics, establishment of gene catalogues, and investigation of cellular and tissue molecular profiles are providing essential tools for understanding gene function in complex biological networks.
Trends in Molecular Medicine 12/2001; 7(11):494-501. DOI:10.1016/S1471-4914(01)02181-5 · 9.45 Impact Factor
ABSTRACT: The testis-expressed human TPTE is a putative transmembrane tyrosine phosphatase, probably involved in signal transduction pathways of the endocrine and/or the spermatogenetic function of the testis. TPTE was mapped to the pericentromeric region of human chromosomes 21 and 13, and to chromosomes 15, 22, and Y. It is unknown which of the TPTE copies are transcribed, contain intronic sequences, and/or have open reading frames. Here, in silico analysis of the genomic sequence of human chromosome 21 allowed the determination of the genomic structure of a copy of the TPTE gene. This copy consists of 24 exons and spans approximately 87 kb. The mapping position of this copy of TPTE on the short arm of chromosome 21 was confirmed by FISH using the BAC 15L0C0 clone as a probe that contains almost the entire TPTE gene. This is the first description of the genomic sequence of a non-RNR gene on the short arm of human acrocentric chromosomes.
Human Genetics 09/2000; 107(2):127-31. DOI:10.1007/s004390000343 · 4.82 Impact Factor
ABSTRACT: Chromosome 21 is the smallest human autosome. An extra copy of chromosome 21 causes Down syndrome, the most frequent genetic cause of significant mental retardation, which affects up to 1 in 700 live births. Several anonymous loci for monogenic disorders and predispositions for common complex disorders have also been mapped to this chromosome, and loss of heterozygosity has been observed in regions associated with solid tumours. Here we report the sequence and gene catalogue of the long arm of chromosome 21. We have sequenced 33,546,361 base pairs (bp) of DNA with very high accuracy, the largest contig being 25,491,867 bp. Only three small clone gaps and seven sequencing gaps remain, comprising about 100 kilobases. Thus, we achieved 99.7% coverage of 21q. We also sequenced 281,116 bp from the short arm. The structural features identified include duplications that are probably involved in chromosomal abnormalities and repeat structures in the telomeric and pericentromeric regions. Analysis of the chromosome revealed 127 known genes, 98 predicted genes and 59 pseudogenes.
ABSTRACT: To establish criteria for and the limitations of novel gene identification, to identify novel genes of potential relevance to Down syndrome and to investigate features of genome organization, 6. 550kb. In total, 41 novel gene models were predicted, and for a subset of these, RT-PCR experiments helped to verify and refine the models, and were used to assess expression in early development and in adult brain regions of potential relevance to Down syndrome. Results suggest generally low and/or restricted patterns of expression, and also reveal examples of complex alternative processing, especially in brain, that may have important implications for regulation of protein function. Analysis of complete gene structures of the known genes identified a number of very large introns, a number of very short intergenic distances, and at least one potentially bi-directional promoter. At least 3/4 of known genes and 1/2 of predicted genes are associated with CpG islands. For novel genes, three cases of overlapping genes are predicted. Results of these analyses illustrate some of the complexities inherent in mammalian genome organization and some of the limitations of current sequence analysis technologies. They also doubled the number of potential genes within the region.
ABSTRACT: The chromosomal abnormality represented by an isodicentric X chromosome [idic(X)(q13)] is associated with a subset of acute myeloid leukemia (AML) and preleukemia observed in elderly females. A previous study localized the breakpoints of two acquired isodicentric X chromosomes associated with myelodysplasia to a 450-kb region proximal to the XIST gene. Here we report the construction and extensive characterization of a reliable 1-Mb P1 artificial chromosome and bacterial artificial chromosome contig covering a highly problematic region in Xq13 that includes the previously described isodicentric breakpoint region. In addition to mapping of the brain-specific gene (NAP1L2) and the phosphorylase kinase alpha subunit 1 gene (PHKA1) and generation and mapping of a large number of STSs throughout the contig, we have mapped a putative transcriptional regulatory protein (HDACL1), and 35 ESTs. Sequencing data, Southern blot analysis, and fiber-FISH analysis have permitted characterization of extensive region-specific duplications and triplications in addition to an unusually high concentration of long interspersed repeat elements, both of which could be implicated in isodicentric chromosome formation and other Xq13 chromosome aberrations. FISH analysis of metaphase chromosomes from two previously unpublished AML patients and one preleukemic patient using cosmid clones and selected subclones allowed mapping of the idic(X)(q13) breakpoints to a 100-kb interval, consistent with the involvement of an X-linked gene in the genesis of this form of preleukemia, disruption of which may represent a preliminary step in progression to AML. Assembly and physical mapping of this complex 1-Mb contig establish a foundation for ongoing sequencing and gene identification projects in the region.
ABSTRACT: Phenotypic and molecular analyses of patients with partial chromosome 21 monosomy enabled us to define a region, spanning 2.4 Mb between D21S190 and D21S226, associated with arthrogryposis, mental retardation, hypertonia, and several facial anomalies. The markers of the region were used to screen a total human PAC library (Ioannou, RZPD). We isolated 57 PACs, which formed primary contigs. EST clusters (UNIGENE collection) located in a 6-Mb interval, between D21S260 and D21S263, were mapped in individual bacterial clones. We mapped the WI-17843 cluster to the PAC clone J12100, which contains the two anchor markers LB10T and LA329. The open reading frame extends over 960 bp, with three putative start codons. The 1695-bp cDNA containing a polyadenylation signal should correspond to the full-length cDNA. From the genomic sequence, we deduced that the gene contained five exons and that there was a putative promoter sequence upstream from exon 1. In silico screening of DNA databases revealed similarity with a murine EST. The corresponding cDNA (1757 bp) sequence was very similar (>85%) to the human cDNA and had an open reading frame of 876 nucleotides. Somatic hybrid mapping localized the cDNA to mouse chromosome 16. EST analyses and RT-PCR indicated that the third exon in the human gene (exon 2 in the mouse) undergoes alternative splicing. Northern blot hybridization showed that the gene was ubiquitously expressed in humans and mice. The longest mouse clone was used to generate riboprobes, which were hybridized to murine embryos at stages E-9.5, E-10.5, E-12.5, E-13.5, and E-14.5-15, to study the pattern of expression during development. Ubiquitous labeling was observed, with strong signals restricted to limited areas of the telencephalon, the mesencephalon, and the interrhombomeric regions in the central nervous system, and other regions of the body such as the limb buds, branchial arches, and somites.
ABSTRACT: Progress in complete genomic sequencing of human chromosome 21 relies on the construction of high-quality bacterial clone maps spanning large chromosomal regions. To achieve this goal, we have applied a strategy based on nonradioactive hybridizations to contig building. A contiguous sequence-ready map was constructed in the Down syndrome congenital heart disease (DS-CHD) region in 21q22.2, as a framework for large-scale genomic sequencing and a positional candidate gene approach. Contig assembly was performed essentially by high-throughput nonisotopic screenings of genomic libraries, prior to clone validation by (1) restriction digest fingerprinting, (2) STS analysis, (3) Southern hybridizations, and (4) FISH analysis. The contig contains a total of 50 STSs, of which 13 were newly isolated. A minimum tiling path (MTP) was subsequently defined that consists of 20 PACs, 2 BACs, and 5 cosmids covering 3 Mb between D21S3 and MX1. Gene distribution in the region includes 9 known genes (c21-LRP, WRB, SH3BGR, HMG14, PCP4, DSCAM, MX2, MX1, and TMPRSS2) and 14 new additional gene signatures consisting of cDNA selection products and ESTs. Forthcoming genomic sequence information will unravel the structural organization of potential candidate genes involved in specific features of Down syndrome pathogenesis.
Genome Research 05/1999; 9(4):360-72. · 14.63 Impact Factor
ABSTRACT: Mutations in the human AIRE gene (hAIRE) result in the development of an autoimmune disease named APECED (autoimmune polyendocrinopathy candidiasis ectodermal dystrophy; OMIM 240300). Previously, we have cloned hAIRE and shown that it codes for a putative transcription-associated factor. Here we report the cloning and characterization of Aire, the murine ortholog of hAIRE. Comparative genomic sequencing revealed that the structure of the AIRE gene is highly conserved between human and mouse. The conceptual proteins share 73% homology and feature the same typical functional domains in both species. RT-PCR analysis detected three splice variant isoforms in various mouse tissues, and interestingly one isoform was conserved in human, suggesting potential biological relevance of this product. In situ hybridization on mouse and human histological sections showed that the AIRE expression pattern was mainly restricted to a few cells in the thymus, pointing to a tissue-specific function of the gene product.
Genome Research 03/1999; 9(2):158-66. · 14.63 Impact Factor
ABSTRACT: A visual transcript map of six genes was constructed on chromosome 21q22.3 by high-resolution fluorescence in situ hybridization (FISH). Expressed sequence tags (ESTs) from six genes (PWP2, KNP1, AIRE, C21orf3, SMT3A, and C21orf1) were successfully localized by fiber-FISH by use of sensitive tyramide-based detection. The sizes of the ESTs varied between 315 and 956 bp and most of them map within the 3'-untranslated region. The ESTs were assigned to and subsequently ordered within cosmid, PAC, and BAC clones hybridized on DNA fibers. Physical distances between ESTs and known markers were determined. Our results demonstrate the feasibility and accuracy of visually mapping EST sequences in relation to known markers. The main advantage of this approach is that it can be applied to finely map any of the database ESTs for positional cloning efforts. The sensitivity, specificity, and reproducibility of this high-resolution EST mapping technique are evaluated.
Genome Research 02/1999; 9(1):62-71. · 14.63 Impact Factor