Biological profiling of Gene Groups utilizing Gene Ontology – A Statistical Framework

Institute for Theoretical Biology, Humboldt University Berlin, Germany.
Genome informatics. International Conference on Genome Informatics 02/2005; 16(1):106-15. DOI: 10.11234/gi1990.16.106
Source: PubMed


Increasingly used high-throughput experimental techniques, such as DNA or protein microarrays, yield groups of interesting genes (e.g. differentially regulated genes) that require further biological interpretation. With the systematic functional annotation provided by the Gene Ontology, the information required to automate the interpretation task is now accessible. However, how to determine the statistical significance of a biological process within these groups is still an open question. In answering this question, multiple testing issues must be taken into account to avoid misleading results. Here we present a statistical framework that tests whether functions, processes or locations described in the Gene Ontology are significantly enriched within a group of interesting genes when compared to a reference group. First, we define an exact analytical expression for the expected number of false positives that allows us to calculate adjusted p-values to control the false discovery rate. Next, we demonstrate and discuss the capabilities of our approach using publicly available microarray data on cell-cycle regulated genes. Further, we analyze the robustness of our framework with respect to the exact gene group composition and compare the performance with earlier approaches. The software package GOSSIP implements our method and is made freely available at
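
As a rough illustration of the kind of test the abstract describes, the following Python sketch computes a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) for one Gene Ontology term and then adjusts a list of p-values. It is not the GOSSIP implementation: the function names and the toy numbers are hypothetical, and the Benjamini-Hochberg step-up procedure is used here only as a generic stand-in for false discovery rate control, not as the exact analytical expression defined in the paper.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact test for a single GO term (hypothetical helper).

    k: genes in the test group annotated with the term
    n: size of the test group
    K: genes in the whole reference set annotated with the term
    N: size of the whole reference set (test group included)
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (a stand-in, not the paper's method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: 3 of 3 test genes carry a term that annotates 3 of 10 genes.
p = enrichment_p_value(k=3, n=3, K=3, N=10)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```

In practice each GO term would get its own p-value from `enrichment_p_value`, and the whole list would be adjusted together, since the number of terms tested drives the multiple testing correction.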



Available from: Hanspeter Herzel, Mar 12, 2014
  • Source
    • "The GOSSIP program embedded in the Blast2GO package was used for GO enrichment analysis (Bluthgen et al., 2005a). This program examines each GO term for gene annotation enrichment by comparing a test set with a reference set using Fisher's exact test method."
    ABSTRACT: Genetic improvement for enhanced disease resistance in fish is an increasingly utilized approach to mitigate endemic infectious disease in aquaculture. In domesticated salmonid populations, large phenotypic variation in disease resistance has been identified but the genetic basis for altered responsiveness remains unclear. We previously reported three generations of selection and phenotypic validation of a bacterial cold water disease (BCWD) resistant line of rainbow trout, designated ARS-Fp-R. This line has higher survival after infection by either standardized laboratory challenge or natural challenge as compared to two reference lines, designated ARS-Fp-C (control) and ARS-Fp-S (susceptible). In this study, we utilized 1.1 g fry from the three genetic lines and performed RNA-seq to measure transcript abundance from the whole body of naive and Flavobacterium psychrophilum infected fish at day 1 (early time-point) and at day 5 post-challenge (onset of mortality). Sequences from 24 libraries were mapped onto the rainbow trout genome reference transcriptome of 46,585 predicted protein coding mRNAs that included 2633 putative immune-relevant gene transcripts. A total of 1884 genes (4.0% genome) exhibited differential transcript abundance between infected and mock-challenged fish (FDR < 0.05) that included chemokines, complement components, tnf receptor superfamily members, interleukins, nod-like receptor family members, and genes involved in metabolism and wound healing. The largest number of differentially expressed genes occurred on day 5 post-infection between naive and challenged ARS-Fp-S line fish correlating with high bacterial load. After excluding the effect of infection, we identified 21 differentially expressed genes between the three genetic lines. In summary, these data indicate global transcriptome differences between genetic lines of naive animals as well as differentially regulated transcriptional responses to infection.
    Frontiers in Genetics 01/2015; 5(453):1-15. DOI:10.3389/fgene.2014.00453
  • Source
    • "The analysis of biological processes/pathways was carried out using the KEGG (Kyoto Encyclopedia of Genes and Genomes) map module supported by the Blast2GO bioinformatics tool. Blast2GO also enabled analysis related to over-representation of functional categories through the Gossip package (Blüthgen et al., 2005) for statistical assessment of annotation differences between two sets of sequences, using Fisher's exact test for each GO term. False discovery rate (FDR) controlled P values (FDR < 0.01) were used for the assessment of differentially significant metabolic pathways. "
    ABSTRACT: A fundamental goal of cold acclimation research is to understand the mechanisms responsible for the increase in freezing tolerance in response to environmental cues. Changes in gene expression underlie some of the biochemical and physiological changes that occur during cold acclimation. Detailed and comprehensive transcriptome annotation can be considered a prerequisite for effective analysis and a fast and cost-effective way to rapidly obtain information in the context of a given physiological condition. By computational predictions and manual curation, we have annotated 454 sequence assemblies from two blueberry cDNA libraries that represent flower buds in the first and second stages of cold acclimation. Gene ontology functional classification terms were retrieved for 4343 (80.0%) sequences. GO annotation files compatible with a commonly used annotation tool have been generated and are publicly available. By mining the dataset further, it was possible to associate presence of certain transcripts related to carbohydrate metabolism and lipid metabolism with different stages of cold acclimation. This was concomitant with differential presence of Zn finger functional domains and C3H-family transcription factors. The expression of a few selected genes was validated by quantitative real-time PCR assay. Results demonstrate that our transcriptome database is a rich resource for mining cold acclimation-responsive genes.
    Environmental and Experimental Botany 10/2014; 106. DOI:10.1016/j.envexpbot.2013.12.017 · 3.36 Impact Factor
  • Source
    • "A Gene Ontology (GO) term enrichment analysis [48] was performed on the sets of transcripts showing a significant change in expression. The results showed that in response to the non-virulent treatment, 12 biological processes were significantly enriched in the set of up-regulated genes (p<0.05; "
    ABSTRACT: Global change and its associated temperature increase has directly or indirectly changed the distributions of hosts and pathogens, and has affected host immunity, pathogen virulence and growth rates. This has resulted in increased disease in natural plant and animal populations worldwide, including scleractinian corals. While the effects of temperature increase on immunity and pathogen virulence have been clearly identified, their interaction, synergy and relative weight during pathogenesis remain poorly documented. We investigated these phenomena in the interaction between the coral Pocillopora damicornis and the bacterium Vibrio coralliilyticus, for which the infection process is temperature-dependent. We developed an experimental model that enabled unraveling the effects of thermal stress, and virulence vs. non-virulence of the bacterium. The physiological impacts of various treatments were quantified at the transcriptome level using a combination of RNA sequencing and targeted approaches. The results showed that thermal stress triggered a general weakening of the coral, making it more prone to infection, non-virulent bacterium induced an ‘efficient’ immune response, whereas virulent bacterium caused immuno-suppression in its host.
    PLoS ONE 09/2014; 9(9-9):e107672. DOI:10.1371/journal.pone.0107672 · 3.23 Impact Factor