GATExplorer: Genomic and Transcriptomic Explorer; mapping expression probes to gene loci, transcripts, exons and ncRNAs

Bioinformatics and Functional Genomics Research Group, Cancer Research Center (CiC-IBMCC, CSIC/USAL), Salamanca, Spain.
BMC Bioinformatics (Impact Factor: 2.67). 04/2010; 11(1):221. DOI: 10.1186/1471-2105-11-221
Source: PubMed

ABSTRACT Genome-wide expression studies have developed exponentially in recent years as a result of extensive use of microarray technology. However, expression signals are typically calculated using the assignment of "probesets" to genes, without addressing the problem of "gene" definition or proper consideration of the location of the measuring probes in the context of the currently known genomes and transcriptomes. Moreover, as our knowledge of metazoan genomes improves, the number of both protein-coding and noncoding genes, as well as their associated isoforms, continues to increase. Consequently, there is a need for new databases that combine genomic and transcriptomic information and provide updated mapping of expression probes to current genomic annotations.
GATExplorer (Genomic and Transcriptomic Explorer) is a database and web platform that integrates a gene loci browser with nucleotide level mappings of oligo probes from expression microarrays. It allows interactive exploration of gene loci, transcripts and exons of human, mouse and rat genomes, and shows the specific location of all mappable Affymetrix microarray probes and their respective expression levels in a broad set of biological samples. The web site allows visualization of probes in their genomic context together with any associated protein-coding or noncoding transcripts. In the case of all-exon arrays, this provides a means by which the expression of the individual exons within a gene can be compared, thereby facilitating the identification and analysis of alternatively spliced exons. The application integrates data from four major source databases: Ensembl, RNAdb, Affymetrix and GeneAtlas; and it provides the users with a series of files and packages (R CDFs) to analyze particular query expression datasets. The maps cover both the widely used Affymetrix GeneChip microarrays based on 3' expression (e.g. human HG U133 series) and the all-exon expression microarrays (Gene 1.0 and Exon 1.0).
GATExplorer is an integrated database that combines genomic/transcriptomic visualization with nucleotide-level probe mapping. By considering expression at the nucleotide level rather than the gene level, it shows that the arrays detect expression signals from entities that most researchers do not contemplate or discriminate. This approach provides the means to undertake a higher resolution analysis of microarray data and potentially extract considerably more detailed and biologically accurate information from existing and future microarray experiments.
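
As a rough illustration of what nucleotide-level probe mapping involves, the sketch below assigns probes to exons by genomic coordinates. It is not GATExplorer's actual procedure: the `Interval` type, the function names, the coordinates, and the rule that a probe must lie entirely inside an exon are all assumptions made for the example, and strand is ignored.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    """A named genomic interval (hypothetical type; 1-based, inclusive coordinates)."""
    name: str
    chrom: str
    start: int
    end: int


def is_contained(probe: Interval, exon: Interval) -> bool:
    """Return True if the probe lies entirely within the exon on the same chromosome."""
    return (
        probe.chrom == exon.chrom
        and exon.start <= probe.start
        and probe.end <= exon.end
    )


def map_probes_to_exons(probes, exons):
    """Group probe names by the exon that fully contains them.

    Probes that span an exon boundary or fall outside every exon are left
    unmapped (an assumption of this sketch, not a rule stated in the abstract).
    """
    mapping = defaultdict(list)
    for probe in probes:
        for exon in exons:
            if is_contained(probe, exon):
                mapping[exon.name].append(probe.name)
    return dict(mapping)


# Made-up coordinates for illustration only.
exons = [
    Interval("exon1", "chr1", 100, 200),
    Interval("exon2", "chr1", 300, 400),
]
probes = [
    Interval("p1", "chr1", 110, 134),  # 25-mer inside exon1
    Interval("p2", "chr1", 190, 214),  # crosses the exon1 boundary: unmapped
    Interval("p3", "chr1", 350, 374),  # 25-mer inside exon2
    Interval("p4", "chr2", 110, 134),  # different chromosome: unmapped
]

probe_map = map_probes_to_exons(probes, exons)
```

Grouping probes per exon in this way is what would allow expression to be summarised per exon rather than per original probeset; a real pipeline would also need sequence alignment, strand handling, and transcript and gene annotations.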

Available from: Marcel Dinger, Jul 26, 2015
  • Source
    • "Biological researchers are constantly looking for fresh ways of drawing conclusions from complex data sets and combining them to generate new insights [6]. As a result, we are witnessing a rapid increase in data visualisation tools and software libraries developed in the context of genomics [7] [8] [9] [10] [11], transcriptomics [12] [13], proteomics [14] [15] [16] and systems biology approaches [5] [17], among other 'omics' fields. These tools can be broadly classified in two main categories: "
    ABSTRACT: Recent advances in high-throughput experimental techniques have led to an exponential increase in both the size and the complexity of the datasets commonly studied in biology. Data visualisation is increasingly used as the key to unlock this data, going from hypothesis generation to model evaluation and tool implementation. It is becoming more and more the heart of bioinformatics workflows, enabling scientists to reason and communicate more effectively. In parallel, there has been a corresponding trend towards the development of related software, which has triggered the maturation of different visualisation libraries and frameworks. For bioinformaticians, scientific programmers and software developers, the main challenge is to pick out the most fitting one(s) to create clear, meaningful and integrated data visualisation for their particular use cases. In this review, we introduce a collection of open source or free to use libraries and frameworks for creating data visualisation, covering the generation of a wide variety of charts and graphs. We will focus on software written in Java, JavaScript or Python. We truly believe this software offers the potential to turn tedious data into exciting visual stories. This article is protected by copyright. All rights reserved.
    Proteomics 02/2015; 15(8). DOI:10.1002/pmic.201400377 · 3.97 Impact Factor
  • Source
    • "Currently, there are at least 11 databases which record lncRNAs (Dinger et al. 2009; Amaral et al. 2011; Bu et al. 2012; Risueño et al. 2010; Gibb et al. 2011a). "
    ABSTRACT: In this paper the TileShuffle method is evaluated as a search method for candidate lncRNAs at 8q24.2. The method is run on three microarrays, all of which contained the same sample and repeated copies of tiled probes. This allows the coherence of the selection method within and between microarrays to be estimated by Monte Carlo simulations on the repeated probes.
  • Source
    • "With the availability of individual probe and reference genome sequences, it is possible to re-map probes based on new sources of genome annotations. This allows custom Chip Description Files (CDFs) wherein probes are grouped into novel probe sets based on exon, transcript, and gene level annotation [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22]. Most notable is the effort of the BrainArray group [10] which updates custom CDFs for a large number of Affymetrix® GeneChips® by creating probe sets based on annotated features such as Entrez Gene [23], Ensembl transcript, Ensembl gene, and RefSeq Gene [24]. "
    ABSTRACT: Affymetrix® GeneChip® microarray design defines probe sets consisting of 11, 16, or 20 distinct 25 base pair (bp) probes for determining mRNA expression for a specific gene, which may be covered by one or more probe sets. Each probe has a corresponding perfect match (PM) and mismatch (MM) set. Traditional analytical techniques have either used the MM probes to determine the level of cross-hybridization or the reliability of the PM probe, or have ignored them completely. Given the availability of reference genome sequences, we have reanalyzed the mapping of both PM and MM probes to reference genomes in transcript regions. Our results suggest that, depending on the species of interest, 66%-93% of the PM probes can be used reliably in terms of single unique matches to the genome, while a small number of the MM probes (typically less than 1%) could be incorporated into the analysis. In addition, we have examined the mapping of PM and MM probes to five different human genome projects, resulting in approximately a 70% overlap of uniquely mapping PM probes, and a subset of 51 uniquely mapping MM probes commonly found in all five projects, 24 of which are found within annotated exonic regions. These results suggest that individual variation in transcriptome regions adds complexity to microarray data analysis. Given these results, we conclude that the development of custom chip definition files (CDFs) should include MM probe sequences to provide the most effective means of transcriptome analysis of Affymetrix® GeneChip® arrays.
    Proceedings of the 2012 ASE/IEEE International Conference on BioMedical Computing (BioMedCom 2012), Washington D.C., DC, USA; 12/2012