Question
RNAi high throughput screening analysis.
What is the best and most robust statistical method for analyzing a screen of 150 genes? Does the z-score apply only to screens of more than 500 genes?
All Answers (11)
-
Go for an siRNA library covering your gene pathway. For validation, use qPCR with TaqMan probes. An array format will be convenient.
What cell types are you using?
-
Well, I am more concerned about the statistical method used to analyze the data than about doing RQ-PCR. Any suggestions?
-
If z-scores give you good separation of your positive and negative controls, why not? I think z-scores can also be used for small sets of genes.
-
You may need to consider the multiple-testing issue, for example by controlling the false discovery rate (FDR).
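One common way to control the FDR is the Benjamini-Hochberg step-up procedure. The answer above does not name a specific method, so this choice is an assumption. A minimal sketch, where the function name `benjamini_hochberg` and the example p-values are made up for illustration:

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    at false discovery rate `alpha` (Benjamini-Hochberg step-up)."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Illustrative p-values (not real screen data), one per siRNA pool.
example_p = [0.20, 0.001, 0.041, 0.008, 0.039]
calls = benjamini_hochberg(example_p, alpha=0.05)
```

With 150 pools you would pass in all 150 per-gene p-values at once, since the correction depends on the total number of tests.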
-
Falschlehner C, Steinbrink S, Erdmann G, Boutros M. High-throughput RNAi screening to dissect cellular pathways: a how-to guide. Biotechnol J. 2010 Apr;5(4):368-76. PubMed PMID: 20349460.
-
It would help if you provided a few more details. How many different RNAi reagents do you have per gene? Do you have experimental replicates?
-
Concur with Eugen Buehler. 500 genes can be represented by 500 RNAi reagents, or 2000, or 5000. Regardless, try using the z-score based on the median and MAD (median absolute deviation). This is often called the "robust" z-score and is much more resistant to rogue outliers (such as pipetting errors).
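For reference, a rough sketch of the robust z-score described above. The 1.4826 scaling factor (which makes the MAD comparable to a standard deviation for normally distributed data) is a common convention and an assumption here, as are the function name `robust_z` and the example values:

```python
import statistics


def robust_z(values, scale=1.4826):
    """Robust z-score: (x - median) / (scale * MAD).

    `scale` = 1.4826 is an assumed convention that makes the MAD
    comparable to a standard deviation under normality.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-score is undefined")
    return [(v - med) / (scale * mad) for v in values]


# Illustrative normalized readouts (not real data); the last well is a
# strong outlier, e.g. a cell-killing siRNA pool.
readouts = [1.0, 1.1, 0.9, 1.05, 0.95, 0.2]
z = robust_z(readouts)

# The cutoff is a separate choice; 3 is used here only as an illustration.
hits = [i for i, score in enumerate(z) if abs(score) >= 3]
```

Because the median and MAD barely move when one well is extreme, the outlier does not inflate its own denominator the way it would with a mean and standard deviation.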
-
Dear all, thanks for the advice. For now I will stick with SSMD (strictly standardized mean difference), which I found more robust, and the result is consistent with my expectation that PLK1 (my positive control, which causes major cell death in my cell lines) should also fall among the hits. By the way, I am using SmartPool siRNA from Dharmacon, and I have triplicates for my experiment.
-
I would not recommend SSMD for dealing with replicates. It assumes that each reagent has a unique experimental variance, and as a result tends to rank highly reagents (in this case pools) that by chance had replicates close to each other but whose median effect is significantly below that of other reagents. Simply taking the median of your replicates would be a better idea, in my opinion. What is your plan for confirmation of selected hits from your small screen? I would recommend that you use multiple (three or four) siRNAs from another vendor to confirm (do not simply use the single siRNAs from the SmartPool, as you have already tested them in the pool and they will not represent independent evidence). Also, you should plan on a relatively low confirmation rate. The performance of a single reagent is not strong evidence that the targeted gene is affecting the assay. Most of your hits (~80%) will turn out to be off-target effects.
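To illustrate the ranking issue with replicates, here is a small sketch. It uses a simplified per-reagent SSMD estimate (mean of the replicate values divided by their standard deviation, with values assumed to be already expressed as differences from the negative control). The exact estimator used in any given analysis may differ, and the two pools and their numbers are invented:

```python
import statistics


def simple_ssmd(replicates):
    """Simplified per-reagent SSMD estimate: mean / sample SD of the
    replicate differences from the negative control (an assumption;
    published estimators add correction factors)."""
    return statistics.mean(replicates) / statistics.stdev(replicates)


# Invented replicate values (difference from negative control).
pool_a = [-1.00, -1.05, -0.95]  # small effect, replicates very tight
pool_b = [-3.0, -4.5, -2.0]     # large effect, replicates noisy

ssmd_a = simple_ssmd(pool_a)
ssmd_b = simple_ssmd(pool_b)
median_a = statistics.median(pool_a)
median_b = statistics.median(pool_b)

# By SSMD magnitude, pool A outranks pool B only because its three
# replicates happened to agree closely; by median, pool B is stronger.
ssmd_prefers_a = abs(ssmd_a) > abs(ssmd_b)
median_prefers_b = abs(median_b) > abs(median_a)
```

With only three replicates per pool, the per-reagent standard deviation is itself very noisy, which is why tight replicates can dominate the ranking.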
-
So you have 150 data points (say, averages of triplicates for 150 SmartPool siRNAs). If these 150 genes were not randomly chosen, I'd say any hit-selection metrics used for HTS (such as z-score, SSMD, etc.) are less useful. You could simply calculate averages of triplicate wells and then run an unpaired t-test for each sample against your negative control (non-targeting). Many folks use 2-fold changes and p < 0.05 or 0.01. Even without any of these, going over a 150-row Excel table won't be too bad :-)
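A minimal sketch of that per-sample comparison, using only the Python standard library. It assumes three wells per sample and three negative-control wells, so the pooled (equal-variance) t-test has 4 degrees of freedom and the two-sided 5% critical value is 2.776; with a different number of wells the critical value changes. The function names and well values are hypothetical:

```python
import math
import statistics

# Two-sided 5% critical value of Student's t for df = 4
# (3 sample wells + 3 control wells - 2). Assumption: triplicates in both groups.
T_CRIT_DF4 = 2.776


def pooled_t(sample, control):
    """Unpaired, equal-variance (Student) t statistic."""
    n1, n2 = len(sample), len(control)
    pooled_var = (
        (n1 - 1) * statistics.variance(sample)
        + (n2 - 1) * statistics.variance(control)
    ) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(sample) - statistics.mean(control)) / se


def is_hit(sample, control, fold=2.0, t_crit=T_CRIT_DF4):
    """Call a hit when the mean changes at least `fold`-fold in either
    direction AND |t| exceeds the critical value (p < 0.05 for df = 4)."""
    ratio = statistics.mean(sample) / statistics.mean(control)
    changed = ratio >= fold or ratio <= 1 / fold
    return changed and abs(pooled_t(sample, control)) > t_crit


# Hypothetical normalized viability values.
non_targeting = [1.00, 1.05, 0.95]
strong_pool = [0.40, 0.45, 0.35]   # clear loss of viability
neutral_pool = [0.90, 1.10, 1.00]  # no change

t_strong = pooled_t(strong_pool, non_targeting)
```

Here `is_hit(strong_pool, non_targeting)` is true and `is_hit(neutral_pool, non_targeting)` is false. Running this for all 150 pools gives 150 tests, so the multiple-testing point raised earlier in the thread still applies.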
-
I have tried using +/- k MAD analysis. How do you determine the cutoff point to get hit genes? 1 SD? 2 SD?