ABSTRACT: We have used large surveys of Affymetrix GeneChip data in the public domain to conduct a study of antisense expression across diverse conditions. We derive correlations between groups of probes which map uniquely to the same exon in the antisense direction. When there are no probes assigned to an exon in the sense direction, we find that many of the antisense groups fail to detect a coherent block of transcription. Only a minority of these groups contain coherent blocks of antisense expression, which would suggest transcription. We also derive correlations between groups of probes which map uniquely to the same exon in both the sense and antisense directions. In some of these cases the locations of sense probes overlap with the antisense probes, and the sense and antisense probe intensities are correlated with each other. This configuration suggests the existence of a Natural Antisense Transcript (NAT) pair. We find that the majority of such NAT pairs detected by GeneChips are formed by a transcript of an established gene and either an EST or an mRNA. To determine the exact antisense regulatory mechanism indicated by the correlation of sense probes with antisense probes, further investigation is necessary for each particular case of interest. However, the analysis of microarray data has proved to be a good method to reconfirm known NATs, to discover new ones, and to identify possible problems in the annotation of antisense transcripts.
Journal of integrative bioinformatics 01/2010; 7(2). DOI:10.2390/biecoll-jib-2010-114
ABSTRACT: A chimeric transcript is a single RNA sequence which results from the transcription of two adjacent genes. Recent studies estimate that at least 4% of tandem human gene pairs may form chimeric transcripts. Affymetrix GeneChip data are used to study the expression patterns of tens of thousands of genes and the probe sequences used in these microarrays can potentially map to exotic RNA sequences such as chimeras.
We have studied human chimeras and investigated their expression patterns using large surveys of Affymetrix microarray data obtained from the Gene Expression Omnibus. We show that for six probe sets, a unique probe mapping to a transcript produced by one of the adjacent genes can be used to identify the expression patterns of readthrough transcripts. Furthermore, unique probes mapping to an intergenic exon present only in the MASK-BP3 chimera can be used directly to study the expression levels of this transcript.
We have attempted to implement a new method for identifying tandem chimerism. In this analysis unambiguous probes are needed to measure run-off transcription and probes that map to intergenic exons are particularly valuable for identifying the expression of chimeras.
Journal of integrative bioinformatics 01/2010; 7(3). DOI:10.2390/biecoll-jib-2010-137
ABSTRACT: We describe various types of outliers seen in Affymetrix GeneChip data. We have been able to utilise the data in the Gene Expression Omnibus to screen GeneChips across a range of scales, from single probes, to spatially adjacent fractions of arrays, to whole arrays, to whole experiments. In this review we describe a number of reasons why some reported intensities might be misleading on GeneChips.
Briefings in Functional Genomics and Proteomics 06/2009; 8(3):199-212. DOI:10.1093/bfgp/elp027
ABSTRACT: We are developing a computational pipeline to use surveys of Affymetrix GeneChips as a discovery tool for unravelling some of the biology associated with post-transcriptional processing of RNA. This work involves the integration of a number of bioinformatics resources, from comparing annotations to processing images to determining the structure of transcripts. The rapidly growing datasets of GeneChips available to the community put us in a strong position to discover novel biology about post-transcriptional processing, and should enable us to determine the mechanisms by which some groups of genes make co-ordinated changes in their production of isoforms.
ABSTRACT: We have developed a computational pipeline to analyse large surveys of Affymetrix GeneChips, for example NCBI's Gene Expression Omnibus. GEO samples data for many organisms, tissues and phenotypes. Because of this experimental diversity, any observed correlations between probe intensities can be associated either with biology that is robust, such as common co-expression, or with systematic biases associated with the GeneChip technology. Our bioinformatics pipeline integrates the mapping of probes to exons, quality control checks on each GeneChip which identify flaws in hybridization quality, and the mining of correlations in intensities between groups of probes. The output from our pipeline has enabled us to identify systematic biases in GeneChip data. We are also able to use the pipeline as a discovery tool for biology. We have discovered that in the majority of cases, Affymetrix probesets on Human GeneChips do not measure one unique block of transcription. Instead we see numerous examples of outlier probes. Our study has also identified that in a number of probesets the mismatch probes are an informative diagnostic of expression, rather than providing a measure of background contamination. We report evidence for systematic biases in GeneChip technology associated with probe-probe interactions. We also see signatures associated with post-transcriptional processing of RNA, such as alternative polyadenylation.
Journal of integrative bioinformatics 01/2008; 5(2). DOI:10.2390/biecoll-jib-2008-98
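The correlation-mining step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the use of Pearson correlation on log2 intensities, the 0.5 threshold and the toy data are all assumptions chosen for illustration. It computes pairwise correlations between the probes of one group across many GeneChips and flags probes that track the rest of the group poorly.

```python
import numpy as np

def probe_correlations(intensities):
    """Pearson correlation between probes (rows = probes, columns = GeneChips).

    Assumption: correlations are taken on log2 intensities.
    """
    return np.corrcoef(np.log2(intensities))

def flag_outlier_probes(corr, min_mean_r=0.5):
    """Return indices of probes whose mean correlation with the other
    probes in the group falls below min_mean_r (hypothetical threshold)."""
    n = corr.shape[0]
    # Subtract the self-correlation on the diagonal before averaging.
    mean_r = (corr.sum(axis=1) - 1.0) / (n - 1)
    return [i for i in range(n) if mean_r[i] < min_mean_r]

# Toy data (not real GeneChip measurements): four probes follow a shared
# expression trend across 20 arrays; the fifth alternates independently.
signal = np.linspace(6.0, 12.0, 20)
intensities = np.vstack(
    [2.0 ** (signal + k) for k in range(4)]
    + [2.0 ** (8.0 + (-1.0) ** np.arange(20))]
)
flagged = flag_outlier_probes(probe_correlations(intensities))
print(flagged)
```

In a real survey the rows would come from probes mapped to the same exon, and the threshold would need to be chosen against the background distribution of correlations rather than fixed in advance.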
ABSTRACT: Affymetrix GeneChip technology enables the parallel observation of tens of thousands of genes. It is important that the probe set annotations are reliable so that biological inferences can be made about genes which undergo differential expression. Probe sets representing the same gene might be expected to show similar fold changes/z-scores; however, this is in fact not the case.
We have made a case study of the mouse Surf4, chosen because it is a gene that was reported to be represented by the same eight probe sets on the MOE430A array by both Affymetrix and Bioconductor in early 2004. Only five of the probe sets actually detect Surf4 transcripts. Two of the probe sets detect splice variants of Surf2. We have also studied the expression changes of the eight probe sets in a public-domain microarray experiment. The transcripts for Surf4 are correlated in time, and similarly the transcripts for Surf2 are also correlated in time. However, the transcripts for Surf4 and Surf2 are not correlated. This proof of principle shows that observations of expression can be used to confirm, or otherwise, annotation discrepancies. We have also investigated groups of probe sets on the RAE230A array that are assigned to the same LocusID, but which show large variances in differential expression in any one of three different experiments on rat. The probe set groups with high variances are found to represent cases of alternative splicing, use of alternative poly(A) signals, or incorrect annotations.
Our results indicate that some probe sets should not be considered as unique measures of transcription, because the individual probes map to more than one transcript dependent upon the biological condition. Our results highlight the need for care when assessing whether groups of probe sets all measure the same transcript.
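The screen for probe set groups that share a LocusID but disagree in differential expression can be sketched as follows. This is a hedged illustration, not the published method: the function name, the use of sample variance of log2 fold changes, the threshold of 1.0, and the probe set and locus labels are all hypothetical.

```python
from collections import defaultdict
from statistics import variance

def high_variance_loci(locus_of, log_fold_change, min_variance=1.0):
    """Group probe sets by locus and return loci whose log fold changes
    have a sample variance of at least min_variance (assumed threshold)."""
    groups = defaultdict(list)
    for probe_set, locus in locus_of.items():
        groups[locus].append(log_fold_change[probe_set])
    flagged = {}
    for locus, values in groups.items():
        # Variance is undefined for a locus with a single probe set.
        if len(values) >= 2:
            v = variance(values)
            if v >= min_variance:
                flagged[locus] = v
    return flagged

# Toy input with made-up labels: locus_A has discordant probe sets,
# locus_B is concordant, locus_C has only one probe set.
locus_of = {"ps1": "locus_A", "ps2": "locus_A", "ps3": "locus_A",
            "ps4": "locus_B", "ps5": "locus_B", "ps6": "locus_C"}
log_fold_change = {"ps1": 2.0, "ps2": -2.0, "ps3": 0.0,
                   "ps4": 1.0, "ps5": 1.2, "ps6": 3.0}
print(high_variance_loci(locus_of, log_fold_change))
```

Loci flagged this way are only candidates; as the abstract notes, the underlying cause may be alternative splicing, alternative poly(A) signals, or an incorrect annotation, and each case needs inspection.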
ABSTRACT: We have compared Affymetrix and Bioconductor annotations for the MOE430A (mouse) GeneChip array. The mappings of probe sets to LocusLink identifiers (LocusIDs) were found to be dynamic, with many changes between successive releases of annotation for both Affymetrix and Bioconductor. There are 49 probe sets that are assigned to one LocusID by Affymetrix and to a different LocusID by Bioconductor from mid-2004 onwards. For virtually all of these examples, the Affymetrix annotation was found to be the one that is in agreement with the current gene prediction. Reference sequence (RefSeq) identifiers are considered to be the gold standard of annotations. However, we could not use these identifiers to discriminate between the accuracy of Bioconductor and Affymetrix because not all of the probes map to the RefSeq transcript to which the probe set is assigned. Moreover, in some cases, probes align to regions downstream of the 3' end of a RefSeq transcript. Adjacent genes were found to be a major cause of discrepancies between the Bioconductor and Affymetrix assignments. Case studies of several probe sets indicated that incorrect assignments are caused by the UniGene cluster assignments of expressed sequence tags representing the probe sets, and by errors in GenBank sequences. Our results indicate that there are a number of errors remaining in the annotation sources used by the microarray community.