Question
Asked 11th Feb, 2015

How can I analyze a set of DEGs (differentially expressed genes) to obtain information from them?

I have obtained a set of DEGs from a de novo RNA-seq analysis with Trinity, and I've already annotated them using KAAS. What other tests can I do?
Thank you

All Answers (11)

11th Feb, 2015
Serhii Vakal
Åbo Akademi University
For example, you can perform GO-term enrichment analysis to find out which biological processes these DEGs are mostly involved in. You can build a protein-protein interaction map, identify which transcription factors control these DEGs, etc. Many options and bioinformatic tools are available for these tasks.
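Here is a minimal Python sketch of what a GO-term over-representation test computes: a one-sided hypergeometric p-value per term, followed by Benjamini-Hochberg correction for multiple testing. The term names, gene IDs and term-to-gene mapping are hypothetical toy data. Real tools do considerably more, such as handling the GO hierarchy, evidence codes and redundancy between terms. The background set (`universe`) is an explicit argument because its choice changes the p-values.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes in universe, K in term, n DEGs)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def go_enrichment(degs, universe, term_to_genes):
    """One-sided over-representation test per term.

    Returns (term, k, K, p, q) tuples sorted by p, where k = DEGs in the term
    and K = term size within the universe."""
    universe = set(universe)
    degs = set(degs) & universe  # DEGs outside the background are ignored
    N, n = len(universe), len(degs)
    rows = []
    for term, genes in term_to_genes.items():
        genes = set(genes) & universe
        k = len(genes & degs)
        rows.append((term, k, len(genes), hypergeom_sf(k, N, len(genes), n)))
    qvals = benjamini_hochberg([row[3] for row in rows])
    results = [row + (q,) for row, q in zip(rows, qvals)]
    return sorted(results, key=lambda row: row[3])


# Hypothetical toy data: 100 background genes, 10 DEGs, two made-up terms.
universe = [f"g{i}" for i in range(100)]
degs = [f"g{i}" for i in range(10)]
terms = {
    "TERM_A": [f"g{i}" for i in range(8)] + ["g50", "g51"],
    "TERM_B": [f"g{i}" for i in range(20, 40)],
}
results = go_enrichment(degs, universe, terms)
for row in results:
    print(row)
```

TERM_A (8 of its 10 genes are DEGs) comes out strongly enriched, while TERM_B (no DEGs) gets p = 1.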
11th Feb, 2015
Robert A Walker
James Cook University
Another approach would be to identify putative regulatory elements in the promoters of genes that show similar expression patterns (i.e. those which have similar fold-change patterns between different samples). If you have a good genomic resource for your organism of interest, you can download the putative promoters (usually 1 kb upstream of the transcriptional start site is a good starting point) of all the genes you are interested in and search for enriched DNA motifs using the online programs RSAT (http://www.rsat.eu/) or MEME (http://meme.nbcr.net/meme/). These DNA motifs may be recognised by sequence-specific transcription factors, thereby giving you insight into how this particular transcription profile may be regulated.
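The promoter-extraction step can be sketched in Python as below. This sketch assumes 0-based TSS coordinates and a chromosome sequence already loaded as a string; the toy sequences are hypothetical. For minus-strand genes, it takes the bases to the right of the TSS and reverse-complements them. The k-mer ranking is only a crude illustration of motif enrichment: it compares the fraction of target and background promoters that contain each word. It is not the RSAT or MEME algorithm, which add proper statistics and, in MEME's case, position weight matrix models.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq, tss, strand, length=1000):
    """`length` bp immediately upstream of a 0-based TSS, written 5'->3' on the
    gene's own strand; truncated at chromosome ends."""
    if strand == "+":
        return chrom_seq[max(0, tss - length):tss]
    if strand == "-":
        return reverse_complement(chrom_seq[tss + 1:tss + 1 + length])
    raise ValueError(f"unknown strand: {strand!r}")


def kmer_presence(seqs, k):
    """Number of sequences containing each k-mer at least once, on either strand."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        seen = set()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                # Canonical form: a k-mer and its reverse complement count as one word.
                seen.add(min(kmer, reverse_complement(kmer)))
        counts.update(seen)
    return counts


def enriched_kmers(target, background, k, pseudo=1.0):
    """Rank k-mers by smoothed fraction of target vs background sequences containing them."""
    t, b = kmer_presence(target, k), kmer_presence(background, k)
    scores = {}
    for kmer, count in t.items():
        f_target = (count + pseudo) / (len(target) + 2 * pseudo)
        f_background = (b[kmer] + pseudo) / (len(background) + 2 * pseudo)
        scores[kmer] = f_target / f_background
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


chrom = "AAAACCCCGGGGTTTT"  # hypothetical toy chromosome
print(upstream_region(chrom, 8, "+", 4))  # CCCC
print(upstream_region(chrom, 7, "-", 4))  # reverse complement of GGGG -> CCCC
print(enriched_kmers(["AAGATAAGG", "CCGATAACC"], ["CCCCCCCC", "TTTTTTTT"], k=5)[0])  # ('GATAA', 3.0)
```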
19th Feb, 2015
Pedro de los Reyes Rodríguez
Max Planck Institute for Plant Breeding Research
Thanks for the answers!! I will try those options.
19th Feb, 2015
Alena Zolotarenko
Vavilov Institute of General Genetics
Cytoscape has various apps to visualise your DEGs in terms of GO, pathways, PPI and much more. And I agree with Yuriy about the "start package": DAVID is useful to get the main idea about the processes enriched with DEGs, and it is very simple and fast. You can find a particular process of interest there and download the corresponding DEG list, then use this list for PPI analysis in STRING to find putative partners for further research. REACTOME or a similar pathway database (like KEGG, which you've used, MetaCore, Ingenuity, iPathwayGuide or others) will give you an idea of the gene networks altered in your experiment; you may then want to work with upstream regulators or downstream targets. I've never worked with ConsensusPathDB, but it looks rather useful, even though a bit complicated.
Good luck with your research!
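The STRING step (finding putative partners of a DEG list) essentially reduces to a neighbourhood query on an interaction network. This minimal sketch assumes an edge list with confidence scores has already been exported. The protein names, scores and the 0.7 cut-off are hypothetical. Candidates are ranked simply by how many DEGs they interact with. This is a heuristic, not a statistical test.

```python
from collections import defaultdict


def build_graph(edges, min_score):
    """Undirected interaction graph from (protein_a, protein_b, score) tuples,
    keeping only edges whose score is at least `min_score`."""
    graph = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def candidate_partners(graph, degs):
    """Rank non-DEG proteins by the number of DEGs they directly interact with."""
    degs = set(degs)
    counts = defaultdict(int)
    for gene in degs:
        for neighbour in graph.get(gene, ()):  # .get does not add keys to the defaultdict
            if neighbour not in degs:
                counts[neighbour] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical interactions with confidence scores on a 0-1 scale.
edges = [
    ("DEG1", "P1", 0.90), ("DEG2", "P1", 0.80), ("DEG1", "P2", 0.95),
    ("DEG2", "P3", 0.20), ("DEG1", "DEG2", 0.99),
]
graph = build_graph(edges, min_score=0.7)
print(candidate_partners(graph, ["DEG1", "DEG2"]))  # [('P1', 2), ('P2', 1)]
```

P3 drops out because its only interaction falls below the cut-off, and the DEG1-DEG2 edge is ignored because both ends are already DEGs.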
19th Feb, 2015
Andrey Morgun
Oregon State University
We have just written a guide for biologists about this; try it.
It is coming out in Bioinformatics and Biology Insights soon.
Meanwhile, you can read a less polished version here.
21st Feb, 2015
Michele Donato
Stanford Medicine
Hi,
A good book is the following:
I would go for the eBook, as it is not only cheaper but probably easier to navigate. This book focuses on microarrays, but in the end it is the DEG analysis that counts.
With gene expression data you can perform GO enrichment (see the Bioconductor packages GOstats or topGO), pathway analysis (see ROntoTools) or gene set analysis (GSEA or GSA). In terms of commercial software, try iPathwayGuide at www.advaitabio.com (analyses are free; you pay only to retrieve them later on). That software does GO analysis, gene set analysis and pathway analysis, and it tells you which diseases, variants and microRNAs are associated with your list of genes.
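For intuition about gene set analysis, here is a simplified Python sketch of a GSEA-style running-sum enrichment score with a gene-permutation p-value. It is not the GSEA or GSA implementation. GSEA weights each step by the ranking statistic and, by default, permutes phenotype labels. This unweighted version instead permutes genes, so it ignores gene-gene correlation. The ranked list and gene set are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov-like).

    ranked_genes: all genes, ordered from most up- to most down-regulated.
    Returns the signed maximum deviation of the running sum from zero."""
    hits = set(gene_set) & set(ranked_genes)
    n_hit, n_miss = len(hits), len(ranked_genes) - len(hits)
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    running = best = 0.0
    for gene in ranked_genes:
        # Step up on a set member, step down otherwise; both walks sum to 1.
        running += 1.0 / n_hit if gene in hits else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def gene_set_permutation_pvalue(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Empirical p-value against random gene sets of the same size.
    Permuting genes (not samples) ignores gene-gene correlation."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = len(set(gene_set) & set(ranked_genes))
    as_extreme = 0
    for _ in range(n_perm):
        es = enrichment_score(ranked_genes, rng.sample(ranked_genes, size))
        more_extreme = es >= observed if observed >= 0 else es <= observed
        if more_extreme:
            as_extreme += 1
    return (as_extreme + 1) / (n_perm + 1)


ranked = [f"g{i}" for i in range(20)]   # hypothetical ranking, most up-regulated first
top_set = [f"g{i}" for i in range(5)]   # hypothetical set concentrated at the top
print(enrichment_score(ranked, top_set))
print(gene_set_permutation_pvalue(ranked, top_set, n_perm=200))
```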
24th Feb, 2015
Alain Coletta
www.geneplaza.com
You can copy and paste your list of genes into Enrichr: http://amp.pharm.mssm.edu/Enrichr/
Enrichr will run several web services on your behalf, many of them mentioned above. You will easily be able to obtain results from various analyses: pathways, ontologies, drugs, etc.
26th Feb, 2015
Sorin Draghici
Wayne State University
There are a number of questions that you may answer using your set of DEGs. These include:
1) What are the biological processes, cellular locations and molecular functions that are particularly over- or under-represented in your set of genes?
2) What are the pathways that are significantly impacted in your condition?
3) What are the putative mechanisms that would explain all observed gene expression changes in your system?
4) What are the miRNAs that could be important in this condition based on the gene expression changes observed?
Here are some details about each of these:
1) What are the biological processes, cellular locations and molecular functions that are particularly over- or under-represented in your set of genes? This is generally referred to as GO analysis. There are many tools that allow you to do this, but there are also many, many mistakes related to the way the p-values are calculated and corrected for multiple comparisons, the association of genes with multiple GO terms and the implicit redundancy, the choice of the background set of genes, etc. You may find this Nature Reviews Genetics paper useful: "Use and misuse of the gene ontology annotations" http://www.nature.com/nrg/journal/v9/n7/full/nrg2363.html
2) What are the signaling pathways significantly impacted in your phenotype (or significantly different between the two phenotypes compared)? There are several tools that do this kind of analysis. They can be divided into several categories:
2a) Simple lookup in pathway databases (e.g. Reactome, KEGG, etc.). Here your set of DEGs is mapped onto pathways. No real analysis is performed, but you see which DEGs are on each pathway. The limitations are obvious: no p-values are calculated, there is no indication of which pathways are affected beyond random chance, etc.
2b) Enrichment analysis of pathways. Here, pathways are treated as simple sets of genes, and an enrichment p-value is calculated for each (e.g. DAVID, GSEA, Ingenuity, etc.). One limitation is that the p-values are calculated under the assumption that all variables (genes) are independent, while the pathways exist precisely to tell you how these genes influence each other. Another limitation is that the pathways are treated as simple bags of genes, disregarding all the phenomena and interactions between genes that they describe. This approach only looks at the number of DE genes and makes no distinction between a situation in which a pathway has 3 entry points and all 3 are severely down-regulated, effectively shutting down the entire pathway, and a situation in which 3 other random genes on the same pathway are down-regulated. See, for instance, this Genome Research paper on the limitations of the enrichment approach: http://vortex.cs.wayne.edu/papers/Genome_Research_reprint1537.pdf
2c) Functional analysis of pathways (iPathway-Guide, ROntoTools). Here, the topology of the pathway is fully taken into account: the impact on the pathway is calculated by propagating the signals through the pathway topology, as well as by considering the number of DEGs. See chapter 28 of this book for details: http://www.amazon.com/Statistics-Microarrays-Bioconductor-Mathematical-Computational/dp/1439809755/ref=sr_1_1?ie=UTF8&qid=1422476470&sr=8-1&keywords=sorin++draghici
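To make the contrast between 2b and 2c concrete, here is a toy Python sketch (Python 3.9+ for `graphlib`). The pathway, fold changes and universe sizes are hypothetical. The propagation loosely follows the perturbation-factor idea from impact analysis, simplified to acyclic pathways with no significance testing. It is therefore not the ROntoTools or iPathway-Guide implementation. In both scenarios, 3 pathway genes are DE and the ORA p-value is the same. However, down-regulating the three entry points perturbs the whole pathway (total |PF| of 36), while down-regulating three terminal genes does not (total |PF| of 9).

```python
from graphlib import TopologicalSorter  # Python 3.9+
from math import comb


def ora_pvalue(k, N, K, n):
    """Over-representation p-value P(X >= k), X ~ Hypergeometric(N, K, n)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def perturbation_factors(edges, delta_e):
    """Simplified perturbation propagation on an acyclic pathway.

    edges: (upstream, downstream, beta), beta = +1 activation, -1 inhibition.
    delta_e: gene -> log fold change (missing genes are treated as unchanged).
    PF(g) = dE(g) + sum over upstream u of beta(u, g) * PF(u) / N_ds(u),
    where N_ds(u) is the number of genes directly downstream of u."""
    parents, n_downstream, genes = {}, {}, set(delta_e)
    for u, v, beta in edges:
        genes.update((u, v))
        parents.setdefault(v, []).append((u, beta))
        n_downstream[u] = n_downstream.get(u, 0) + 1
    predecessors = {g: [u for u, _ in parents.get(g, [])] for g in genes}
    pf = {}
    for g in TopologicalSorter(predecessors).static_order():  # upstream genes first
        pf[g] = delta_e.get(g, 0.0) + sum(
            beta * pf[u] / n_downstream[u] for u, beta in parents.get(g, [])
        )
    return pf


# Hypothetical pathway: entry points E1-E3 activate hub H, which activates D1-D3;
# each Di activates a terminal gene Li. All edges are activations (beta = +1).
edges = [(f"E{i}", "H", 1) for i in (1, 2, 3)]
edges += [("H", f"D{i}", 1) for i in (1, 2, 3)]
edges += [(f"D{i}", f"L{i}", 1) for i in (1, 2, 3)]
pathway_genes = {g for edge in edges for g in edge[:2]}

entry_points_down = {"E1": -3.0, "E2": -3.0, "E3": -3.0}
terminal_genes_down = {"L1": -3.0, "L2": -3.0, "L3": -3.0}


def ora_for(scenario, universe_size=1000, total_degs=50):
    """ORA only counts DEGs on the pathway (hypothetical universe and DEG totals)."""
    k = len(set(scenario) & pathway_genes)
    return ora_pvalue(k, universe_size, len(pathway_genes), total_degs)


p_entry, p_terminal = ora_for(entry_points_down), ora_for(terminal_genes_down)
pf_entry = perturbation_factors(edges, entry_points_down)
pf_terminal = perturbation_factors(edges, terminal_genes_down)
total_entry = sum(abs(v) for v in pf_entry.values())
total_terminal = sum(abs(v) for v in pf_terminal.values())
print(p_entry == p_terminal, total_entry, total_terminal)  # True 36.0 9.0
```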
Techniques that address the last two questions (putative mechanisms and miRNAs) are not well documented in the literature. iPathway-Guide (http://www.advaitabio.com/ipathwayguide.html#page=page-1) offers both. Some of its sample datasets come from phenotypes created by knocking down a specific gene, and the tool correctly identifies the mechanism as a chain of gene expression changes that starts with the targeted gene. In another sample dataset, the phenotype is created by treatment with a mimic of a specific miRNA. Even in this case, the tool correctly identifies both the mechanism and the specific miRNA, so it's pretty interesting. The tool is commercial, but one can do an unlimited number of analyses for free. The results are fully accessible for a few days; there is a charge only for storing the results over a longer period of time.
In case you prefer research software or a DIY implementation here is a review paper discussing over 20 topological pathway analysis approaches: http://journal.frontiersin.org/Article/10.3389/fphys.2013.00278/pdf
26th Feb, 2015
Pedro de los Reyes Rodríguez
Max Planck Institute for Plant Breeding Research
Thank you very much, Sorin. Excellent information!!
9th Jul, 2021
Frinty Varghese
University of Aberdeen
How can I create a list of DEG data for meta-analysis with MetaVolcanoR?
