Falcon, S. & Gentleman, R. Using GOstats to test gene lists for GO term association. Bioinformatics 23, 257-258 (2007)

Fred Hutchinson Cancer Research Center, Program in Computational Biology, 1100 Fairview Avenue North, P.O. Box 19024, Seattle, WA 98109, USA.
Bioinformatics (Impact Factor: 4.98). 02/2007; 23(2):257-8. DOI: 10.1093/bioinformatics/btl567
Source: PubMed


Functional analyses based on the association of Gene Ontology (GO) terms to genes in a selected gene list are useful bioinformatic tools and the GOstats package has been widely used to perform such computations. In this paper we report significant improvements and extensions such as support for conditional testing.
We discuss the capabilities of GOstats, a Bioconductor package written in R, that allows users to test GO terms for over- or under-representation using either a classical hypergeometric test or a conditional hypergeometric test that uses the relationships among GO terms to decorrelate the results.
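The classical hypergeometric test mentioned above can be illustrated with a minimal sketch. This is not the GOstats API or its implementation; the function names and parameters (`universe_size`, `term_size`, `selected_size`, `overlap`) are hypothetical and chosen only for illustration. The sketch assumes a gene universe in which a fixed number of genes carry a given GO term, and asks how surprising the observed overlap with a selected gene list is. The conditional test, which uses relationships among GO terms, is not covered here.

```python
from math import comb


def hypergeom_over_p(universe_size, term_size, selected_size, overlap):
    """Over-representation p-value: P(X >= overlap).

    X is the number of term-annotated genes seen when selected_size genes
    are drawn without replacement from universe_size genes, of which
    term_size carry the GO term. All names here are illustrative.
    """
    max_overlap = min(term_size, selected_size)
    total = comb(universe_size, selected_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, selected_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def hypergeom_under_p(universe_size, term_size, selected_size, overlap):
    """Under-representation p-value: P(X <= overlap)."""
    # Smallest overlap that is possible given the number of unannotated genes.
    min_overlap = max(0, selected_size - (universe_size - term_size))
    total = comb(universe_size, selected_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, selected_size - k)
        for k in range(min_overlap, overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers only: 10 genes in the universe, 4 annotated to the term,
    # 3 genes selected, 2 of them annotated.
    print(hypergeom_over_p(10, 4, 3, 2))
    print(hypergeom_under_p(10, 4, 3, 2))
```

In practice many GO terms are tested at once, so p-values from a sketch like this would still need to be interpreted with multiple testing in mind.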
GOstats is available as an R package from the Bioconductor project:

Available from:
  • Source
    • "Gene sets not believed to be relevant will not be tested with the result that novel associations may never be found. For hierarchical gene set collections such as GO, methods have also been developed that reduce the number of tested gene sets by using information theoretic measures [12], [13] or by computing the association for gene sets higher in the hierarchy conditional on the results for child gene sets [14], [15], [16], [17]. Although such GO-specific methods are effective at addressing the significant overlap between GO term annotations, they are specific to hierarchical gene set collections and, for those based on a specific data set, use a criterion for filtering that is not independent of the statistic used to test gene set enrichment."
    ABSTRACT: Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome, resulting in significantly increased gene set testing power.
    IEEE/ACM Transactions on Computational Biology and Bioinformatics 10/2015; 12(5):1-1. DOI:10.1109/TCBB.2015.2415815 · 1.44 Impact Factor
  • Source
    • "The GO enrichment analysis was conducted using the Bioconductor package GOstats (Falcon and Gentleman, 2007) as previously described (Liu et al. 2015). For the plots in Figure 4, all enriched goBP terms of five transcription factors were pooled, sorted by the term size, and filtered for enriched GO terms with sizes ranging from 100 to 1000, resulting in 120 terms total. "
    ABSTRACT: Identifying transcription factor target genes is essential for modeling the transcriptional networks underlying developmental processes. Here we report a chromatin immunoprecipitation sequencing (ChIP-seq) resource consisting of genome-wide binding regions and associated putative target genes for four Populus homeodomain transcription factors expressed during secondary growth and wood formation. Software code (programs and scripts) for processing the Populus ChIP-seq data is provided within a publicly available iPlant image, including tools for ChIP-seq data quality control and evaluation adapted from the human ENCODE project. Basic information on the binding of each transcription factor (including members of the Class I KNOX, Class III HD ZIP, and BEL1-like families) is summarized, including the number and location of binding regions, distribution of binding regions relative to gene features, associated putative target genes, and enriched functional categories of putative target genes. These ChIP-seq data have been integrated within the Populus Genome Integrative Explorer (PopGenIE), where they can be analyzed using a variety of web-based tools. We present an example analysis that shows preferential binding of transcription factor ARBORKNOX1 to the nearest neighbor genes in a pre-calculated co-expression network module, and enrichment for meristem-related genes within this module including multiple orthologs of Arabidopsis KNOTTED-like Arabidopsis 2/6. This article is protected by copyright. All rights reserved.
    The Plant Journal 04/2015; 82(5). DOI:10.1111/tpj.12850 · 5.97 Impact Factor
  • Source
    • "The number of novel isoforms was determined from the final annotation based on transcripts flagged by Cufflinks with class_code 'j' (indicating a novel, spliced transcript sharing at least one junction with an annotated transcript). Human GO terms were mapped to X. tropicalis genes and the hypergeometric test for enriched terms was performed using the GOstats Bioconductor package (Falcon and Gentleman, 2007). The aligned reads have been submitted to the NCBI Sequence Read Archive under BioProject ID PRJNA266550."
    ABSTRACT: Alternative splicing is pervasive in vertebrates, yet little is known about most isoforms or their regulation. transformer-2b (tra2b) encodes a splicing regulator whose endogenous function is poorly understood. Tra2b knockdown in Xenopus results in embryos with multiple defects, including defective somitogenesis. Using RNA sequencing, we identify 142 splice changes (mostly intron retention and exon skipping), 89% of which are not in current annotations. A previously undescribed isoform of wnt11b retains the last intron, resulting in a truncated ligand (Wnt11b-short). We show that this isoform acts as a dominant-negative ligand in cardiac gene induction and pronephric tubule formation. To determine the contribution of Wnt11b-short to the tra2b phenotype, we induce retention of intron 4 in wnt11b, which recapitulates the failure to form somites but not other tra2b morphant defects. This alternative splicing of a Wnt ligand adds intricacy to a complex signaling pathway and highlights intron retention as a regulatory mechanism.
    Cell Reports 02/2015; 10(1):1-10. DOI:10.1016/j.celrep.2014.12.046 · 8.36 Impact Factor