Detection of differentially expressed segments in tiling array data

Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany.
Bioinformatics (Impact Factor: 4.98). 04/2012; 28(11):1471-9. DOI: 10.1093/bioinformatics/bts142
Source: PubMed


Tiling arrays have been a mainstay of unbiased genome-wide transcriptomics over the last decade. Currently available approaches to identify expressed or differentially expressed segments in tiling array data are limited in the recovery of the underlying gene structures and require several parameters that are intensity-related or partly dataset-specific.
We have developed TileShuffle, a statistical approach that identifies transcribed and differentially expressed segments as significant differences from the background distribution while considering sequence-specific affinity biases and cross-hybridization. It avoids dataset-specific parameters in order to provide better comparability of different tiling array datasets, based on different technologies or array designs. TileShuffle detects highly and differentially expressed segments in biological data with significantly lower false discovery rates under equal sensitivities than commonly used methods. Also, it is clearly superior in the recovery of exon-intron structures. It further provides window z-scores as a normalized and robust measure for visual inspection.
The R package including documentation and examples is freely available at
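
The general idea in the abstract, scoring windows of probes against a background distribution that respects probe affinity, can be illustrated with a small permutation sketch. This is not the TileShuffle implementation and does not reproduce its statistics. The function names, the window width, the use of a discrete affinity bin label (for example a GC-content class) and the pooled-background z-score are all assumptions made for illustration.

```python
import bisect
import random
import statistics


def window_means(values, width):
    """Mean intensity of every contiguous window of `width` probes."""
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


def shuffle_within_bins(values, bins, rng):
    """Permute probe values only among probes that share a bin label.

    The bin label is an assumed stand-in for probe affinity, so the
    shuffled array keeps the same intensity distribution per bin.
    """
    shuffled = list(values)
    by_bin = {}
    for idx, label in enumerate(bins):
        by_bin.setdefault(label, []).append(idx)
    for indices in by_bin.values():
        pool = [values[i] for i in indices]
        rng.shuffle(pool)
        for i, v in zip(indices, pool):
            shuffled[i] = v
    return shuffled


def window_z_scores(values, bins, width=5, n_perm=200, seed=0):
    """Return (z-scores, empirical p-values) for each observed window.

    Background windows come from `n_perm` bin-preserving shuffles.
    """
    rng = random.Random(seed)
    observed = window_means(values, width)
    background = []
    for _ in range(n_perm):
        permuted = shuffle_within_bins(values, bins, rng)
        background.extend(window_means(permuted, width))
    mu = statistics.fmean(background)
    sigma = statistics.pstdev(background)
    background.sort()
    n = len(background)
    z = [(w - mu) / sigma for w in observed]
    # One-sided empirical p-value: share of background windows that score
    # at least as high as the observed one, with a +1 correction.
    p = [(n - bisect.bisect_left(background, w) + 1) / (n + 1)
         for w in observed]
    return z, p


# Toy data (hypothetical): 60 probes of noise, with probes 20..29 elevated.
noise = random.Random(1)
signal = [noise.gauss(0.0, 1.0) for _ in range(60)]
for i in range(20, 30):
    signal[i] += 6.0
gc_bin = [i % 3 for i in range(60)]  # assumed affinity classes

z, p = window_z_scores(signal, gc_bin)
best = max(range(len(z)), key=z.__getitem__)
print(best, round(z[best], 2), p[best])
```

In this toy setting the highest-scoring window should overlap the elevated probes. Shuffling within bins rather than across the whole array is the design choice that keeps affinity-driven intensity differences out of the significance call.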



Available from: Christian Otto, Dec 02, 2014
    • "Global RNA expression was analyzed using Affymetrix whole genome tiling arrays, which interrogate the non-repetitive part, i.e. approximately 40%, of the human genome. Transcriptionally active regions in the genome (TARs) were identified using TileShuffle [41]. Briefly, TileShuffle identifies segments in the tiling array data that are expressed significantly higher than an affinity controlled background distribution."
    ABSTRACT: The genome is pervasively transcribed but most transcripts do not code for proteins, constituting non-protein coding RNAs. Despite increasing numbers of functional reports of individual long noncoding RNAs (lncRNAs), assessing the extent of functionality among the non-coding transcriptional output of mammalian cells remains intricate. In the protein coding world, transcripts differentially expressed in the context of processes essential for the survival of multicellular organisms have been instrumental for the discovery of functionally relevant proteins and their deregulation is frequently associated with diseases. We therefore systematically identify lncRNAs expressed differentially in response to oncologically relevant processes, cell-cycle, p53-, and STAT3 pathway, using tiling arrays. We find that up to 80% of the pathway-triggered transcriptional response can be non-coding. Among these we identify very large macroRNAs with pathway-specific expression patterns and demonstrate that these are likely continuous transcripts. MacroRNAs contain elements conserved in mammals and sauropsids, which in part exhibit conserved RNA secondary structure. Comparing evolutionary rates of a macroRNA to adjacent protein coding genes suggests a local action of the transcript. Finally, in different grades of astrocytoma, a tumor disease unrelated to the initially used cell lines, macroRNAs are differentially expressed. It has been shown previously that the majority of expressed non-ribosomal transcripts are non-coding. We now conclude that differential expression triggered by signaling pathways gives rise to a similar abundance of non-coding content. It is thus unlikely that the prevalence of non-coding transcripts in the cell is a trivial consequence of leaky or random transcription events.
    Genome biology 03/2014; 15(3):R48. DOI:10.1186/gb-2014-15-3-r48 · 10.81 Impact Factor
  • ABSTRACT: Background: Existing statistical methods for tiling array transcriptome data either focus on transcript discovery in one biological or experimental condition or on the detection of differential expression between two conditions. Increasingly often, however, biologists are interested in time-course studies, studies with more than two conditions or even multiple-factor studies. As these studies are currently analyzed with the traditional microarray analysis techniques, they do not exploit the genome-wide nature of tiling array data to its full potential. Results: We present an R Bioconductor package, waveTiling, which implements a wavelet-based model for analyzing transcriptome data and extends it towards more complex experimental designs. With waveTiling the user is able to discover (1) group-wise expressed regions, (2) differentially expressed regions between any two groups in single-factor studies and (3) differentially expressed regions in multifactorial designs. Moreover, for time-course experiments it is also possible to detect (4) linear time effects and (5) a circadian rhythm of transcripts. By considering the expression values of the individual tiling probes as a function of genomic position, effect regions can be detected regardless of existing annotation. Three case studies with different experimental set-ups illustrate the use and the flexibility of the model-based transcriptome analysis. Conclusions: The waveTiling package provides the user with a convenient tool for the analysis of tiling array transcriptome data for a multitude of experimental set-ups. Regardless of the study design, the probe-wise analysis allows for the detection of transcriptional effects in exonic, intronic and intergenic regions, without prior consultation of existing annotation.
    BMC Bioinformatics 09/2012; 13(1):234. DOI:10.1186/1471-2105-13-234 · 2.58 Impact Factor
  • ABSTRACT: Technological achievements have always contributed to the advancement of biomedical research. It has never been more so than in recent times, when the development and application of innovative cutting-edge technologies have transformed biology into a data-rich quantitative science. This stunning revolution in biology primarily ensued from the emergence of microarrays over two decades ago. The completion of whole-genome sequencing projects and the advance in microarray manufacturing technologies enabled the development of tiling microarrays, which gave unprecedented genomic coverage. Since their first description, several types of application of tiling arrays have emerged, each aiming to tackle a different biological problem. Although numerous algorithms have already been developed to analyze microarray data, new method development is still needed not only for better performance but also for integration of available microarray data sets, which without doubt constitute one of the largest collections of biological data ever generated. In this chapter we first introduce the principles behind the emergence and the development of tiling microarrays, and then discuss with some examples how they are used to investigate different biological problems.
    Methods in molecular biology (Clifton, N.J.) 08/2013; 1067:3-19. DOI:10.1007/978-1-62703-607-8_1 · 1.29 Impact Factor