Detection of differentially expressed segments in tiling array data

Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany.
Bioinformatics (Impact Factor: 4.98). 04/2012; 28(11):1471-9. DOI: 10.1093/bioinformatics/bts142
Source: PubMed

ABSTRACT: Tiling arrays have been a mainstay of unbiased genome-wide transcriptomics over the last decade. Current approaches for identifying expressed or differentially expressed segments in tiling array data are limited in how well they recover the underlying gene structures, and they require several parameters that are intensity-related or partly dataset-specific.
We have developed TileShuffle, a statistical approach that identifies transcribed and differentially expressed segments as significant deviations from the background distribution while accounting for sequence-specific affinity biases and cross-hybridization. It avoids dataset-specific parameters, which improves comparability between tiling array datasets based on different technologies or array designs. In biological data, TileShuffle detects highly and differentially expressed segments with significantly lower false discovery rates than commonly used methods at equal sensitivity. It is also clearly superior at recovering exon-intron structures. Finally, it provides window z-scores as a normalized and robust measure for visual inspection.
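The idea of scoring windows against an affinity-aware background can be illustrated with a small Python sketch. This is not the TileShuffle algorithm or the R package's API; it is a hypothetical illustration under stated assumptions: each probe already has an affinity score (the `affinities` list here is made up), probes are grouped into affinity quantile bins, a null distribution is built by permuting intensities only within each bin, and each window's mean intensity is reported as a z-score against that null. Cross-hybridization is not modeled, and all function names, the window width, bin count and permutation count are illustrative choices.

```python
# Hypothetical sketch, NOT the TileShuffle implementation.
# Assumes per-probe affinity scores are given; cross-hybridization is ignored.
import random
import statistics


def affinity_bins(affinities, n_bins):
    """Assign each probe to one of n_bins quantile bins of its affinity score."""
    order = sorted(range(len(affinities)), key=lambda i: affinities[i])
    bins = [0] * len(affinities)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(affinities)
    return bins


def shuffle_within_bins(values, bins, rng):
    """Permute intensities only among probes that share an affinity bin."""
    groups = {}
    for i, b in enumerate(bins):
        groups.setdefault(b, []).append(i)
    shuffled = list(values)
    for idx in groups.values():
        vals = [values[i] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            shuffled[i] = v
    return shuffled


def sliding_means(values, width):
    """Mean intensity of every window of `width` consecutive probes."""
    return [statistics.fmean(values[s:s + width])
            for s in range(len(values) - width + 1)]


def window_zscores(intensities, affinities, width=5, n_bins=4, n_perm=200, seed=0):
    """z-score of each observed window mean against an affinity-matched shuffled null."""
    rng = random.Random(seed)
    bins = affinity_bins(affinities, n_bins)
    observed = sliding_means(intensities, width)
    null = [sliding_means(shuffle_within_bins(intensities, bins, rng), width)
            for _ in range(n_perm)]
    scores = []
    for w, obs in enumerate(observed):
        draws = [perm[w] for perm in null]
        mu, sd = statistics.fmean(draws), statistics.stdev(draws)
        scores.append((obs - mu) / sd if sd > 0 else 0.0)
    return scores


# Toy data: 40 probes at background level 1.0, with probes 18-23 elevated to 4.0.
# The affinities are an arbitrary deterministic permutation of 0..39.
intensities = [4.0 if 18 <= i <= 23 else 1.0 for i in range(40)]
affinities = [(7 * i) % 40 for i in range(40)]
scores = window_zscores(intensities, affinities)
print(max(range(len(scores)), key=scores.__getitem__))  # start of the top-scoring window
```

Shuffling within affinity bins keeps the affinity composition of every window fixed, so a high z-score in this sketch reflects intensity beyond what probes of similar affinity show elsewhere on the array, rather than an affinity bias alone.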
The R package including documentation and examples is freely available at

Available from: Christian Otto, Dec 02, 2014
  • "Global RNA expression was analyzed using Affymetrix whole genome tiling arrays, which interrogate the non-repetitive part, i.e. approximately 40%, of the human genome. Transcriptionally active regions in the genome (TARs) were identified using TileShuffle [41]. Briefly, TileShuffle identifies segments in the tiling array data that are expressed significantly higher than an affinity controlled background distribution."
    ABSTRACT: The genome is pervasively transcribed but most transcripts do not code for proteins, constituting non-protein coding RNAs. Despite increasing numbers of functional reports of individual long noncoding RNAs (lncRNAs), assessing the extent of functionality among the non-coding transcriptional output of mammalian cells remains intricate. In the protein coding world, transcripts differentially expressed in the context of processes essential for the survival of multicellular organisms have been instrumental for the discovery of functionally relevant proteins and their deregulation is frequently associated with diseases. We therefore systematically identify lncRNAs expressed differentially in response to oncologically relevant processes, cell-cycle, p53-, and STAT3 pathway, using tiling arrays. We find that up to 80% of the pathway-triggered transcriptional response can be non-coding. Among these we identify very large macroRNAs with pathway-specific expression patterns and demonstrate that these are likely continuous transcripts. MacroRNAs contain elements conserved in mammals and sauropsids, which in part exhibit conserved RNA secondary structure. Comparing evolutionary rates of a macroRNA to adjacent protein coding genes suggests a local action of the transcript. Finally, in different grades of astrocytoma, a tumor disease unrelated to the initially used cell lines, macroRNAs are differentially expressed. It has been shown previously that the majority of expressed non-ribosomal transcripts are non-coding. We now conclude that differential expression triggered by signaling pathways gives rise to a similar abundance of non-coding content. It is thus unlikely that the prevalence of non-coding transcripts in the cell is a trivial consequence of leaky or random transcription events.
    Genome Biology 03/2014; 15(3):R48. DOI:10.1186/gb-2014-15-3-r48 · 10.81 Impact Factor
  • ABSTRACT: Background: Existing statistical methods for tiling array transcriptome data focus either on transcript discovery in one biological or experimental condition or on the detection of differential expression between two conditions. Increasingly often, however, biologists are interested in time-course studies, studies with more than two conditions, or even multifactorial studies. Because these studies are currently analyzed with traditional microarray analysis techniques, they do not exploit the genome-wide nature of tiling array data to its full potential. Results: We present an R Bioconductor package, waveTiling, which implements a wavelet-based model for analyzing transcriptome data and extends it to more complex experimental designs. With waveTiling the user can discover (1) group-wise expressed regions, (2) differentially expressed regions between any two groups in single-factor studies, and (3) differentially expressed regions in multifactorial designs. Moreover, for time-course experiments it is also possible to detect (4) linear time effects and (5) a circadian rhythm of transcripts. By treating the expression values of the individual tiling probes as a function of genomic position, effect regions can be detected regardless of existing annotation. Three case studies with different experimental set-ups illustrate the use and flexibility of the model-based transcriptome analysis. Conclusions: The waveTiling package provides a convenient tool for the analysis of tiling array transcriptome data for a multitude of experimental set-ups. Regardless of the study design, the probe-wise analysis allows the detection of transcriptional effects in exonic, intronic and intergenic regions without prior consultation of existing annotation.
    BMC Bioinformatics 09/2012; 13(1):234. DOI:10.1186/1471-2105-13-234 · 2.58 Impact Factor
  • ABSTRACT: Long non-coding ribonucleic acids (lncRNAs) have been proposed as biomarkers in prostate cancer. This paper proposes a selection method that uses data from tiled microarrays to identify relatively long regions of moderate expression, independent of the microarray platform and probe design. The method is used to search for candidate lncRNAs at locus 8q24 and is run on three independent experiments, all of which use samples from prostate cancer patients. The robustness of the method is tested using repeated copies of tiled probes. The method shows high consistency between experiments that used the same samples but different probe layouts. There is also statistically significant consistency when comparing experiments with different samples. The method selected the lncRNA PCNCR1 in all three experiments.
    PLoS ONE 06/2014; 9(6):e99899. DOI:10.1371/journal.pone.0099899 · 3.23 Impact Factor