Aix-Marseille Université
Question
Asked 5 March 2014
Does anyone have experience with Multi-Omics Interaction Network (GE + miRNA + Methylation + CNA)?
I determined the sensitivity of 40 breast cancer cell lines to a previously untested chemical compound. For each cell line (before treatment) I have multiple omics data types: gene expression, miRNA expression, methylation status, and copy number calls. Using randomForest with an appropriate feature selection technique, I came up with 4 distinct signatures that are predictive of the response to the compound.
Gene Expression: 46 genes
miRNA: 13 miRNAs
Methylation: 20 methylated promoters
Copy Number: 4 genomic regions (20-30 genes in total)
I'm satisfied with the prediction results of each of these signatures on a small test set, but combining this information into a single multi-omics signature gives better prediction results.
I see strong connections between the predictors of the different signatures. For example, some of the 46 genes are targets of some of the miRNAs, while others are known to interact with some of the genes in the methylation signature.
How would you combine and present this kind of data in a single network?
The best web-based application I have found so far is mirob.interactome.ru, for generating miRNA + gene networks. Is there a good Cytoscape plugin, another web server, or anything else you would find appropriate?
Most recent answer
Ingenuity is the best.
You can try ipathway too.
Popular answers (1)
Roche
We are also working on integrated biomarkers and these are my thoughts.
First, if you are getting signatures that are highly correlated, this could be a problem from both the mathematical and the biological standpoint. The algorithm used to construct an integrated biomarker should first take into account whether the factors you are combining are correlated with each other. If they are, the measurements are not independent, and you will never know whether a given microRNA, gene, SNP, and so on is correctly placed in the model (in short, most of the integrated models currently available work only if the numeric variables are independent and uncorrelated). In biological terms, you do not know the hierarchy of the factors, or whether some of them are included in the analysis as passive bystanders. This problem arises if you use the data as they are. If you derive categories from your numeric variables and nodes from the data, these problems are partially solved, but you end up with other ones: Are the results reproducible? Is the signature specific to my dataset? Why cut off the data in a given way? Are the cutoffs independent? The more integration you have, the more noise you have to accept in your nodes, and the larger the dataset required to fit and validate the model.
So, bottom line, there is no single solution but many. You have to understand what the model is and the price to pay. There is no magic wizard, at least to date, that can solve a multifactorial equation without knowledge of all the variables involved and the mathematical relationships among them. You simply have to choose the model, aware of which one will be the "least wrong".
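One possible way to look at the correlation concern raised above, before merging signatures, is to compute pairwise correlations between features that come from different omics layers. The following is only a minimal sketch under stated assumptions, not a method proposed in this thread: the feature names, the toy values, and the 0.8 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant feature: correlation undefined, treated as 0 here
    return cov / (sx * sy)


def correlated_pairs(features, threshold=0.8):
    """Return cross-layer feature pairs with |r| >= threshold.

    `features` maps (layer, name) -> list of values, one per cell line.
    """
    flagged = []
    for (key_a, vals_a), (key_b, vals_b) in combinations(features.items(), 2):
        if key_a[0] == key_b[0]:
            continue  # same omics layer: only cross-layer pairs are checked
        r = pearson(vals_a, vals_b)
        if abs(r) >= threshold:
            flagged.append((key_a, key_b, round(r, 3)))
    return flagged


# Hypothetical toy data: 5 cell lines, one feature from each of 3 layers.
toy = {
    ("expression", "GENE_A"): [1.0, 2.0, 3.0, 4.0, 5.0],
    ("mirna", "miR-X"): [5.1, 3.9, 3.0, 2.1, 0.9],
    ("methylation", "PROMOTER_B"): [0.2, 0.9, 0.1, 0.8, 0.3],
}

flagged = correlated_pairs(toy)
for a, b, r in flagged:
    print(a, b, r)
```

In this toy input only the GENE_A / miR-X pair is flagged. A flagged pair does not say which factor is the driver and which is the bystander; it only marks where the independence assumption is doubtful.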
3 Recommendations
All Answers (9)
I don't think there is a single software package or plugin that can reliably combine multiple omics data types in a single network. The best way (in my opinion) is to generate a table containing all the information you want to display (node1-connector-node2) plus another table containing the attributes, import both into Cytoscape, and then alter the visual properties of your network using the attribute data.
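The two tables described above could be produced along these lines. This is a rough sketch, assuming tab-separated text is an acceptable import format; the node names, interaction labels, and column headers are hypothetical placeholders, not data from the question.

```python
import csv
import io

# Hypothetical edges: (node1, connector, node2).
edges = [
    ("miR-X", "targets", "GENE_A"),
    ("GENE_A", "interacts_with", "GENE_B"),
    ("CNA_region_1", "contains", "GENE_C"),
]

# Hypothetical node attributes: the signature each node came from.
node_layer = {
    "miR-X": "mirna",
    "GENE_A": "expression",
    "GENE_B": "methylation",
    "GENE_C": "copy_number",
    "CNA_region_1": "copy_number",
}


def to_tsv(header, rows):
    """Render a header plus rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


edge_table = to_tsv(["node1", "connector", "node2"], edges)
node_table = to_tsv(["node", "omics_layer"], sorted(node_layer.items()))

# Save each string to its own file before importing it as a network
# table and as a node attribute table, respectively.
print(edge_table)
print(node_table)
```

The attribute column (here `omics_layer`) is what you would then map to node colour or shape, so that each data type is visually distinct in the combined network.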
1 Recommendation
I think so too!
You can compare your results with cBioPortal: http://www.cbioportal.org/public-portal/
Karim
PI Industries Ltd
Hi Marco,
From the information you have given, it is clear that all the miRNA and methylation signals contribute to gene silencing.
Quite complex, no standard solution available. Did you see the publication by Kim et al. doi: 10.1016/j.ymeth.2014.02.003 and their new graph-based framework?
1 Recommendation
You can use the Pathway Studio software.
It is interesting software for this purpose,
but you need to buy a license. If you need, I can collaborate in this regard.
University of Southampton
If I understand your problem correctly, you could start by generating an incidence matrix, e.g.
      G1  G2  G3  ... etc
Set1  T   F   T
Set2  F   T   F
Set3  T   F   F
etc
An incidence matrix can be converted into a bipartite graph, i.e. a network, and then you can use graph analysis methods to find what you are looking for.
Tools: I use R for this type of analysis, with the igraph package, which handles most types of graphs.
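To make the incidence matrix idea concrete, here is a small sketch of the conversion. It is written in plain Python rather than R/igraph, so it is a stand-in for the approach and not the igraph workflow itself. Reading the rows as signatures (sets) and the columns as genes is an assumption; the values are the ones from the example matrix above.

```python
from collections import defaultdict

# Example incidence matrix: rows are sets, columns are genes.
genes = ["G1", "G2", "G3"]
incidence = {
    "Set1": [True, False, True],
    "Set2": [False, True, False],
    "Set3": [True, False, False],
}


def bipartite_edges(matrix, columns):
    """Return one (set, gene) edge for every True cell of the matrix."""
    return [
        (row, col)
        for row, flags in matrix.items()
        for col, flag in zip(columns, flags)
        if flag
    ]


def shared_genes(edge_list):
    """Map each gene that belongs to two or more sets to those sets."""
    membership = defaultdict(list)
    for set_name, gene in edge_list:
        membership[gene].append(set_name)
    return {g: s for g, s in membership.items() if len(s) > 1}


edges = bipartite_edges(incidence, genes)
shared = shared_genes(edges)
print(edges)
print(shared)
```

Genes that connect to more than one set (G1 in this example) are the bridges between signatures, which is usually the first thing worth looking at in such a graph.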
Hope it helps.
1 Recommendation
West Virginia University
An elegant way to do this is with Ingenuity Pathway Analysis (IPA). You can superimpose the different omics data. If you have an IPA license, they are happy to help you with this.
1 Recommendation