
Microarray Analysis - Science topic

Questions related to Microarray Analysis
  • asked a question related to Microarray Analysis
Question
4 answers
" Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length "
This error appeared when I tried to download the GSE data (gset) in R; there seem to be some problems.
> gset <- getGEO("GSE77182", GSEMatrix =TRUE, AnnotGPL=FALSE, destdir ="data/")
Found 1 file(s)
GSE77182_series_matrix.txt.gz
Using locally cached version: data//GSE77182_series_matrix.txt.gz
Rows: 59899 Columns: 6
-- Column specification ---------------------------------------------
Delimiter: "\t"
chr (1): ID_REF
dbl (5): GSM2045612, GSM2045615, GSM2045616, GSM2045618, GSM2045620
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Using locally cached version of GPL21369 found here:
data//GPL21369.soft
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
Relevant answer
Answer
Simran Venkatraman, Premanand A., Jagajjit Sahu: thanks a galaxy, dear friends!
  • asked a question related to Microarray Analysis
Question
4 answers
I have a dataset with 149 columns of GSM IDs and a first column of gene names (screenshot attached). There are 20,000 rows (genes) in total. How can I analyze the dataset to find biological pathways using KEGG or another pathway database?
All GSM IDs are lung cancer microarray expression samples.
I know how to do differential gene expression and pathway analysis, but I don't know how to analyze this type of dataset.
Please also comment if you feel that the dataset is not suitable, cannot be used to find pathways, or if other information is required.
Any help would be great. Thanks in advance.
Relevant answer
Answer
Sir, can you share a tutorial, workbook or something like that? I can work in R, Python and Linux.
Thanks
  • asked a question related to Microarray Analysis
Question
4 answers
For PCR analysis and western blot, is there a minimum average expression level that DEGs should have? For example, more than 200 or 500? I have received various answers to that question and I am a little confused.
Relevant answer
Answer
For PCR, an expression value > 200 is sufficient, I think. The issue here is not the fold change (FC): the same FC with expression values of 10 vs. 20 would hardly be detectable with PCR or other techniques, whereas with expression values of 200 vs. 400 it is surely much better (identical FC of 2).
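A minimal R sketch of the answer's point: filter candidate genes on both fold change and average expression, so that a 10 vs. 20 change and a 200 vs. 400 change (both log2FC = 1) are treated differently. The threshold of 200 comes from the answer and is a rule of thumb, not a fixed standard; the gene names and values are hypothetical.

```r
# Hypothetical mean expression values (linear scale) in two groups
expr <- data.frame(
  gene    = c("geneA", "geneB", "geneC"),
  control = c(10, 200, 300),
  treated = c(20, 400, 330),
  stringsAsFactors = FALSE
)

expr$log2FC   <- log2(expr$treated / expr$control)
expr$avg_expr <- (expr$control + expr$treated) / 2

# Keep genes that change at least 2-fold AND are expressed highly enough
# to be followed up by PCR/western blot (200 is a rule-of-thumb threshold)
min_expr <- 200
candidates <- subset(expr, abs(log2FC) >= 1 & avg_expr > min_expr)
candidates$gene
```

geneA and geneB have the same fold change, but only geneB passes the expression filter.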
  • asked a question related to Microarray Analysis
Question
1 answer
I have microarray data, and I am using R to analyse it. I have cancer data for each time point:
5 samples for t0,
5 samples for t1,
4 samples for t2,
3 samples for t3.
All are disease models. I want to find the differentially expressed genes at each time point compared to t0, i.e. t0 is the reference for every time point.
So how can I make a contrast matrix for that?
Relevant answer
Answer
Which software are you planning to use for the analysis? How you write the contrasts depends on that. If you plan to use "limma" (https://www.bioconductor.org/packages/devel/bioc/vignettes/limma/inst/doc/usersguide.pdf), read the user's guide; it is easy to understand. As long as you have 2 or more replicates per group, you can do differential expression analysis.
Regards,
Shajahan
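Following the limma user's guide linked above, a common way to set this up is a cell-means design (one column per time point) plus a contrast matrix in which every time point is compared with t0. A minimal base-R sketch, assuming the 17 arrays are ordered t0, t1, t2, t3; the limma calls are left as comments because they need the real expression matrix, and `exprs_mat` is a hypothetical name.

```r
# Sample annotation: 5 x t0, 5 x t1, 4 x t2, 3 x t3 (array order assumed)
timepoint <- factor(rep(c("t0", "t1", "t2", "t3"), times = c(5, 5, 4, 3)),
                    levels = c("t0", "t1", "t2", "t3"))

# Cell-means design: one column per time point
design <- model.matrix(~ 0 + timepoint)
colnames(design) <- levels(timepoint)

# Each time point minus the t0 reference; same coefficients as
# limma::makeContrasts(t1 - t0, t2 - t0, t3 - t0, levels = design)
cont_matrix <- cbind(
  t1_vs_t0 = c(-1, 1, 0, 0),
  t2_vs_t0 = c(-1, 0, 1, 0),
  t3_vs_t0 = c(-1, 0, 0, 1)
)
rownames(cont_matrix) <- colnames(design)

# With limma and a log2 expression matrix 'exprs_mat' (hypothetical name,
# genes in rows, the 17 arrays in columns):
# fit  <- limma::lmFit(exprs_mat, design)
# fit2 <- limma::eBayes(limma::contrasts.fit(fit, cont_matrix))
# limma::topTable(fit2, coef = "t1_vs_t0", adjust.method = "BH")
```

With treatment coding instead (`model.matrix(~ timepoint)`), t0 becomes the intercept and the coefficients `timepointt1` to `timepointt3` are already the comparisons against t0, so no separate contrast matrix is needed.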
  • asked a question related to Microarray Analysis
Question
3 answers
I have a question about the "network score" in IPA network analysis. In many papers, the top 5 networks are listed in tables in which some network scores are high (around 50) but others are low (less than 20). We use the same method for network analyses and got the impression that we see a tight association between genes when the network score is higher than 40. However, we have not found literature discussing what network score is meaningful (we found one paper stating that “the networks are selected if their score is higher than 21”). We would appreciate any information about such a meaningful network score, or your impression/experience of the network score (for example, did you see a tight association of genes when the network score was less than 20?).
Relevant answer
Answer
Does anyone have the actual citations for these? All of them are dead links!
Kind Regards
R,
  • asked a question related to Microarray Analysis
Question
1 answer
I tried to find data for all of them together in GEO but couldn't. What if I got the data for each of the breast cancer cell lines MCF7, MDA-MB-231 and SK-BR-3, extracted the data for the gene I want to check (HK2), and then did the microarray analysis in RStudio to compare HK2 expression among the cell lines?
Relevant answer
Answer
As far as I know, there is no data portal for GEO that allows the simultaneous analysis of multiple microarray datasets fitting particular criteria (such as breast cancer cell lines). You will probably have to identify the datasets of interest, download the microarray data, apply an R package for microarray analysis to each dataset, and then analyse the combined results. GEO does, however, offer a web analysis tool for many microarray datasets (GEO2R), which runs the R code for you.
  • asked a question related to Microarray Analysis
Question
4 answers
Dear fellow Researchers,
I am currently trying to analyze Affymetrix microarray data with the dChip software. I have the input files (probe sequence and CDF for Rat 230 2.0) but am having trouble obtaining the expected results. Could anyone tell me whether the gene info file is really necessary (only the CDF input is listed as mandatory in the protocol I have) and where to obtain it?
Thank you in advance
Relevant answer
Answer
You can search for your question through the following link:
  • asked a question related to Microarray Analysis
Question
1 answer
I would like to perform a differentially expressed gene analysis of NimbleGen data. I have a dataset of 48 .pair files and 48 .calls files.
1) Can I perform DE gene analysis with only these data, without using the oligo package? (My data are single-channel, 532 output only.)
2) What is the appropriate method for getting differentially expressed genes?
3) When I converted my .pair files to XYS files by extracting the X-Y coordinates and signal values, the results were not accurate: the genes shown to be DE do not correlate with my experimental conditions.
Please help me 
Thank you in advance
Best Regards
Tunc
I also want to add: our .pair and .calls files don't have headers, so we don't know anything about the NDF file. We know that our chip is 100718.hg18, but we don't know the correct GPL file. The lab methods report that a NimbleGen Human Expression Array 12x135K chip was used.
Relevant answer
Answer
Hi Tunc Morova, I am facing a similar issue right now analysing PAIR files. Did you have any luck?
  • asked a question related to Microarray Analysis
Question
3 answers
Does anyone know of a source for microarrays to study tRNA expression? I was told microarrays.com provided these. I have asked them directly but have had no reply so far.
Relevant answer
Answer
tRNA microarrays were initially developed to assess the aminoacylation status of specific tRNAs. Although it is true that tRNA deep-seq techniques have problems (mainly due to modified bases), microarrays also present serious limitations in terms of detection sensitivity. My advice would be to use one of the several strategies for tRNA library preparation and go for small RNA sequencing.
  • asked a question related to Microarray Analysis
Question
8 answers
On a given microarray design there are multiple different probes spotted for many genes. The (normalized) signals of the features (all referring to the same gene) often are quite different (log2 values can vary between 2 and 16, so essentially from "almost undetectable" to "completely saturated").
If a gene set analysis or an over-representation analysis is performed, there should be one value per gene.
How should I select which signal to use for the gene? I don't feel comfortable taking the average of all the features, because they are often so different. Taking only the highest signal also seems wrong.
Any ideas?
The attached file shows a table with example data (from an Agilent microarray) with 5 different probes for the gene "PRDM". The last 4 columns show the log signal intensities for 4 different samples. The values range from 3 to 10, so there is a more than 100-fold difference in signal intensity between the probes.
Relevant answer
Answer
The NetworkAnalyst software (https://www.networkanalyst.ca/) may help you.
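One common heuristic is to keep, for each gene, the single probe with the highest mean signal across all samples, so the same probe represents the gene in every sample (unlike taking each sample's highest value). A base-R sketch with a small hypothetical matrix for "PRDM" and a second, made-up gene; the WGCNA package's `collapseRows()` implements the same idea (its `method = "MaxMean"` option), and keeping the probe with the highest variance is another frequently used choice.

```r
# Hypothetical log2 signals: 3 probes for "PRDM", 2 for "GENEX", 4 samples
expr <- matrix(
  c(3, 4, 3, 4,
    9, 10, 9, 10,
    6, 5, 6, 5,
    7, 8, 7, 9,
    2, 2, 3, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("probe", 1:5), paste0("S", 1:4))
)
gene <- c("PRDM", "PRDM", "PRDM", "GENEX", "GENEX")

collapse_max_mean <- function(expr, gene) {
  probe_means <- rowMeans(expr, na.rm = TRUE)
  # For each gene, the row index of its probe with the highest mean signal
  keep <- unlist(lapply(split(seq_along(gene), gene),
                        function(idx) idx[which.max(probe_means[idx])]))
  out <- expr[keep, , drop = FALSE]
  rownames(out) <- names(keep)   # one row per gene
  out
}

collapsed <- collapse_max_mean(expr, gene)
collapsed
```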
  • asked a question related to Microarray Analysis
Question
10 answers
Hello,
I'm trying to analyse two different microarray datasets from different chips using a web-based tool.
I don't know how to do that. Should I use only one of them?
Or should I combine them using some kind of algorithm?
Thank you
Relevant answer
Answer
You may use NetworkAnalyst software.
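If the two chips measure different probes, one option is to analyse each dataset separately, map the probes to gene symbols, and combine the per-gene p-values. A sketch of Fisher's method in base R; the gene names and p-values are hypothetical, the datasets are assumed to be independent, and Fisher's method ignores the direction of change, so the fold-change signs should be checked separately.

```r
# Fisher's method: X = -2 * sum(log(p_i)) follows a chi-squared
# distribution with 2k degrees of freedom under the null (k independent p-values)
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Hypothetical per-gene p-values from two datasets, matched by gene symbol
p_study1 <- c(TP53 = 0.01, MYC = 0.40, EGFR = 0.03)
p_study2 <- c(TP53 = 0.02, MYC = 0.60, EGFR = 0.20)

common <- intersect(names(p_study1), names(p_study2))
combined <- vapply(common,
                   function(g) fisher_combine(c(p_study1[[g]], p_study2[[g]])),
                   numeric(1))
combined_adj <- p.adjust(combined, method = "BH")   # correct across genes
combined_adj
```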
  • asked a question related to Microarray Analysis
Question
5 answers
I have analyzed the dataset GSE38132 from Gene Expression Omnibus. The data are from the breast cancer cell line ZR-75-1 and comprise 9 conditions with 4 replicates each, i.e. 36 samples. I used the limma R package to normalize the data (quantile normalization). I noticed a large difference in one group: three replicates show similar expression, whereas the fourth replicate of the same sample differs from the other three. I confirmed this in the raw data by cross-checking the probe IDs and found that one replicate differs from the other three. As a second validation, I checked the normalized data deposited in NCBI GEO and found the same. Is this possible?
Please see the heat map; the first four columns are replicates of the same sample.
Relevant answer
Answer
It is likely that the authors did not even check the sample and hybridization quality. To assess sample quality, you would need to look at the lab books or talk with the people who actually did the processing (which might not be possible!). To assess hybridization quality, you need to create diagnostic plots of the raw data. I am not very familiar with Illumina arrays. Boxplots of signal intensities, possibly stratified by probe type (positive/negative controls, regular genes), should be a minimum. I don't know whether spatial plots are possible or whether unspecific/background signals are considered.
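A simple way to put a number on "one replicate looks different" is to correlate each replicate with the others and look for the one with the lowest average correlation. A base-R sketch on simulated data (four hypothetical replicates, the fourth deliberately unrelated to the others); a boxplot of the raw intensities, as suggested in the answer, is a complementary check.

```r
set.seed(1)
base <- rnorm(1000, mean = 8, sd = 2)            # shared "true" expression profile
reps <- cbind(
  rep1 = base + rnorm(1000, sd = 0.2),
  rep2 = base + rnorm(1000, sd = 0.2),
  rep3 = base + rnorm(1000, sd = 0.2),
  rep4 = rnorm(1000, mean = 8, sd = 2)           # simulated problematic replicate
)

# Mean Pearson correlation of each replicate with the other replicates
replicate_agreement <- function(mat) {
  cc <- cor(mat, method = "pearson")
  diag(cc) <- NA                                  # ignore self-correlation
  colMeans(cc, na.rm = TRUE)
}

agreement <- replicate_agreement(reps)
outlier <- names(which.min(agreement))
# boxplot(reps)  # intensity distributions, as suggested in the answer
```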
  • asked a question related to Microarray Analysis
Question
3 answers
I have two batches of samples that were collected during two time periods. If I perform batch correction to remove the batch effect, will it affect the downstream analysis in gene expression studies?
Relevant answer
Answer
Batch correction has been described from a biologist's point of view in this paper: Unbiased data analytics for biomarker discovery in precision medicine.
In brief, it is better to have some internal controls and use the ComBat algorithm. If you don't have internal controls, there are other algorithms too; see the paper.
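As an alternative to (or alongside) ComBat, batch can be included as a covariate in the model, so that the condition effect is estimated within batches. A base-R sketch with one simulated gene; it assumes both conditions occur in both batches (if batch and condition are confounded, the design is not estimable and no correction can separate them). The same `design` could be passed to `limma::lmFit()`, and `sva::ComBat()` is the usual implementation of ComBat.

```r
set.seed(7)
batch     <- factor(rep(c("B1", "B2"), each = 6))
condition <- factor(rep(c("ctrl", "treat"), times = 6))  # 3 ctrl + 3 treat per batch
design <- model.matrix(~ batch + condition)

# Estimable only if batch and condition are not confounded (full column rank)
stopifnot(qr(design)$rank == ncol(design))

# One simulated gene: +3 shift in batch B2, true treatment effect of +2
y <- 5 + 3 * (batch == "B2") + 2 * (condition == "treat") + rnorm(12, sd = 0.1)
fit <- lm(y ~ batch + condition)
coef(fit)[["conditiontreat"]]   # close to 2 despite the batch shift
```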
  • asked a question related to Microarray Analysis
Question
3 answers
Generally, in microarray differential expression studies, the lower bound for |logFC| is chosen around 1, i.e. a fold change of 2, which seems like common sense. In other cases, when |logFC| >= 1 gives zero differentially expressed genes, the logFC cutoff is chosen to get a "reasonable" number of differentially expressed genes. It stands to reason that a more rational way of choosing the logFC cutoff would be to infer it from the microarray platform's accuracy, the quality of the hybridizations in the particular experiment, or some other evidence-based criteria.
How should one decide which logFC cutoff to choose?
Relevant answer
Answer
The fold change or log-fold change can be used as a measure of effect size in high-throughput experiments, including gene expression analysis. However, the statistical significance obtained from replicates is another important number. One can use multiple-testing-adjusted p-values or false discovery rate values, and it is then often advisable to plot everything as a volcano plot, with the log-fold change of each gene on the x-axis and the -log10 (adjusted) p-value on the y-axis. I recommend the limma package in R for such an analysis.
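The volcano-plot logic from the answer, as a base-R sketch: adjust the p-values with Benjamini-Hochberg, then call a gene up or down only if it passes both the FDR cutoff and the fold-change cutoff. The logFC and p-values are hypothetical (in practice they would come from `limma::topTable()`), and the cutoffs of 1 and 0.05 are conventions, not values derived from the platform.

```r
logFC <- c(2.1, -1.5, 0.3, 1.2, -0.2)          # hypothetical log2 fold changes
p_raw <- c(1e-6, 1e-4, 0.01, 0.2, 0.8)         # hypothetical raw p-values
adj_p <- p.adjust(p_raw, method = "BH")

lfc_cut <- 1      # |log2FC| >= 1, i.e. at least 2-fold
fdr_cut <- 0.05
status <- ifelse(adj_p < fdr_cut & logFC >= lfc_cut, "up",
          ifelse(adj_p < fdr_cut & logFC <= -lfc_cut, "down", "ns"))

if (interactive()) {
  plot(logFC, -log10(adj_p), pch = 20,
       col = c(up = "red", down = "blue", ns = "grey")[status],
       xlab = "log2 fold change", ylab = "-log10 adjusted p-value")
  abline(v = c(-lfc_cut, lfc_cut), h = -log10(fdr_cut), lty = 2)
}
```

The third gene is significant after adjustment but fails the fold-change cutoff, which is exactly the case the two-criteria plot makes visible.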
  • asked a question related to Microarray Analysis
Question
16 answers
Hello everyone,
I am currently trying to do k-means clustering on a microarray dataset that consists of 127 columns and 1000 rows. When I plot the result, I get the error "figure margins too large". So I wrote this in the R console:
par("mar") # gives the current margins
par(mar = c(1, 1, 1, 1)) # tried to update the margins
But it did not work. Can anyone suggest another way of fixing this problem? (Part of the code is attached below.)
Thanks,
Hasan
--------------------------------------------------------------------------------------------------------------
x = as.data.frame(x)
km_out = kmeans(x, 2, nstart = 20)
km_out$cluster
plot(x, col=(km_out$cluster+1), main="K - Means Clustering Results with K=2", xlab="", ylab="", pch=20, cex=2)
>Error in plot.new() : figure margins too large
Relevant answer
Answer
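Two things seem to be going on in the posted code: `km.out` vs `km_out` is a naming mismatch, and `plot()` on a data frame with more than two columns draws a scatterplot matrix via `pairs()`, here 127 x 127 panels, which cannot fit on the device, hence "figure margins too large". Shrinking `par(mar = ...)` does not help with that many panels. A sketch on simulated data (standing in for the real 1000 x 127 matrix) that plots the clustered rows on their first two principal components instead:

```r
set.seed(42)
# Simulated stand-in for the 1000 x 127 matrix in the question
x <- as.data.frame(matrix(rnorm(1000 * 127), nrow = 1000, ncol = 127))

km_out <- kmeans(x, centers = 2, nstart = 20, iter.max = 50)

# Project the rows onto 2 dimensions instead of drawing all 127 x 127 panels
pcs <- prcomp(x)$x[, 1:2]
if (interactive()) {
  plot(pcs, col = km_out$cluster + 1,
       main = "K-Means Clustering Results with K=2",
       xlab = "PC1", ylab = "PC2", pch = 20, cex = 2)
}
```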
  • asked a question related to Microarray Analysis
Question
11 answers
When examining the differential expression of non-coding RNAs in blood samples, cell cultures or animal models, how many times should we repeat the microarray experiments?
Does the number of repetitions depend on the sample type? For example, is it established that we should repeat the experiments at least 3 times in a cell culture model to identify non-coding RNA expression profiles?
Relevant answer
Answer
I think that if a particular study has already been done quite a few times and you just want to check some of the facts, then you can choose fewer replicates. In addition, it is important to establish the truth first and then consider the numbers.
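There is no universal number, but a per-gene power calculation shows how the answer depends on the expected effect, the variability of the samples, and how strict the significance level is. A base-R sketch with `power.t.test()`; the SD values and the stricter alpha standing in for multiple testing are illustrative assumptions, not recommendations for non-coding RNA arrays in particular.

```r
# Replicates per group needed to detect a given log2 fold change
# (two-group t-test; sd = biological SD on the log2 scale, assumed)
replicates_needed <- function(log2fc, sd, alpha, power = 0.8) {
  ceiling(power.t.test(delta = log2fc, sd = sd, sig.level = alpha,
                       power = power)$n)
}

# Hypothetical scenarios: detect a 2-fold change (log2FC = 1)
n_loose  <- replicates_needed(1, sd = 0.5, alpha = 0.05)
n_strict <- replicates_needed(1, sd = 0.5, alpha = 0.001)  # mimics multiple testing
n_noisy  <- replicates_needed(1, sd = 1.0, alpha = 0.001)  # more heterogeneous samples
c(n_loose, n_strict, n_noisy)
```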
  • asked a question related to Microarray Analysis
Question
9 answers
I want to generate a heatmap and clustering in R for the DEGs of 3 samples (fold change), some 15,000 genes. Every time I run the command, it shows
" Error in heatmap(data_matrix) :'x' must be a numeric matrix"
I have searched web and tried multiple commands but every time it gives the same error.
Relevant answer
Answer
Dear Humaira
Did you try adding "header=T" to your first command (the file upload)?
But in general, as Jan Lisec said, carefully inspect your data for NAs in place of gene expression values. If you have trouble doing this in R, you can do it in Excel too.
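The error usually means that the object passed to `heatmap()` is a data frame containing a non-numeric column (such as gene names) or columns that were read as text. A base-R sketch with a small hypothetical table: move the gene column into the row names, coerce the rest to numbers, and drop rows with NAs before plotting.

```r
# Hypothetical fold-change table as it might come out of read.table()
df <- data.frame(
  gene = c("GeneA", "GeneB", "GeneC", "GeneD"),
  s1 = c(1.2, -0.5, 2.3, 0.1),
  s2 = c("0.8", "-1.1", "NA", "0.4"),   # column read in as character
  s3 = c(1.5, -0.7, 2.0, NA),
  stringsAsFactors = FALSE
)

to_numeric_matrix <- function(df, id_col = 1) {
  # Convert every non-ID column to numeric; unparsable entries become NA
  mat <- as.matrix(sapply(df[, -id_col, drop = FALSE],
                          function(col) suppressWarnings(as.numeric(col))))
  rownames(mat) <- df[[id_col]]
  mat
}

mat <- to_numeric_matrix(df)
mat <- mat[complete.cases(mat), , drop = FALSE]   # heatmap() cannot handle NAs here
if (interactive()) heatmap(mat, scale = "row")
```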
  • asked a question related to Microarray Analysis
Question
6 answers
Hi all,
I am running a Random Forest Mean Decrease in Accuracy (RF-MDA) algorithm for feature selection on my microarray data, in order to use the selected genes as a classifier to discriminate between 2 classes of cell lines. I am having trouble interpreting the output of the algorithm. It gives me a small list of selected genes, and for each gene there is a Pearson correlation value, a fold change value and a q-value (false discovery rate).
The variable “class” is discrete (normal vs disease), so what does the Pearson correlation mean in this case?
Should I treat the q-value shown as a multiple-testing correction and give less importance to, or exclude, the genes with a q-value (FDR) higher than 0.2 (or any other pre-determined significance cut-off)?
I would appreciate any suggestion on how to interpret the results of a RF-MDA for feature selection algorithm.
Relevant answer
Answer
Hello Priscila,
Since you mentioned using a GUI and not having a manual, you may try contacting the developers for the best explanation (or possibly have a look at the source code) to be sure exactly what the Pearson coefficients and q-values relate to. In the meantime, you may have a look at the following studies, which explain the importance of various correlation factors in RF-MDA:
Thanks
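Assuming the GUI computes an ordinary Pearson correlation between each gene and the class coded as 0/1 (not confirmed, see the answer above), that value is the point-biserial correlation: its sign shows which class has higher expression, and it is a rescaled two-sample t-statistic. A base-R sketch with hypothetical values:

```r
# Hypothetical expression of one gene in 6 samples, class coded 0/1
expr  <- c(5.1, 4.8, 5.3, 6.9, 7.2, 6.5)
class <- c(0, 0, 0, 1, 1, 1)   # 0 = normal, 1 = disease

r <- cor(expr, class)           # Pearson r with a binary variable = point-biserial r
n <- length(expr)
t_from_r <- r * sqrt((n - 2) / (1 - r^2))

# Same statistic as a two-sample Student t-test with equal variances
t_test <- t.test(expr[class == 1], expr[class == 0], var.equal = TRUE)$statistic
c(r = r, t_from_r = t_from_r, t_test = unname(t_test))
```

A positive r here means higher expression in the class coded 1; flipping the coding flips the sign.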
  • asked a question related to Microarray Analysis
Question
3 answers
I'm looking to take an Arabidopsis RNA-Seq differentially expressed gene set and search it against other publicly available RNA-Seq (and possibly microarray) experiments, to find the experiments with the most similar patterns of differentially expressed genes.
Does anyone know if a tool that enables this has already been created?
Relevant answer
Answer
You can check out the following two resources for gene expression analysis:
Another option is to browse Arabidopsis datasets on GEO (https://www.ncbi.nlm.nih.gov/geo/). It hosts most publicly available datasets and predominantly stores raw files from different transcriptome experiments. Once you have listed the GEO IDs of interest, you can analyze these datasets yourself with the click of a button using GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/). It hardly takes a minute, and you can store the result of each analysis you do. You do not need a computational background to run this program. It's as simple as plug and play.
I hope it helps!
All the best.
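If no ready-made tool fits, a simple way to rank experiments by similarity is the overlap between DEG lists, for example the Jaccard index plus a hypergeometric p-value for the overlap. A base-R sketch; the gene IDs are hypothetical, and the universe size of 27,000 genes is an assumption that should match the genes actually measured in the experiments compared.

```r
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Probability of an overlap at least this large by chance (hypergeometric)
overlap_p <- function(a, b, universe_size) {
  k <- length(intersect(a, b))
  phyper(k - 1, m = length(a), n = universe_size - length(a),
         k = length(b), lower.tail = FALSE)
}

# Hypothetical DEG lists
query <- c("g1", "g2", "g3", "g4")
experiments <- list(
  exp1 = c("g1", "g2", "g3", "g9"),
  exp2 = c("g7", "g8", "g4")
)

scores <- sapply(experiments, function(e) {
  c(jaccard = jaccard(query, e),
    p = overlap_p(query, e, universe_size = 27000))
})
scores[, order(scores["jaccard", ], decreasing = TRUE), drop = FALSE]
```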
  • asked a question related to Microarray Analysis
Question
3 answers
Hello,
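I am trying to normalize the GSE8397 data with MAS5.0 in R:
setwd("D:/justforR/GSE8397")
# biocLite() is deprecated; current Bioconductor installs go through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("affy")
library(affy)
affy.data = ReadAffy()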
However, the data come from 2 platforms: Affymetrix Human Genome U133A and Affymetrix Human Genome U133B Array.
The code gave me this error message: "Error in affyio::read_abatch(filenames, rm.mask, rm.outliers, rm.extra, :
Cel file D:/justforR/GSE8397/GSM208669.cel does not seem to be of HG-U133A type"
So how can I continue normalizing the data when they come from both U133A and U133B? Should I try another normalization method (RMA or GCRMA)?
Do you have any ideas about this problem?
Thank you so much!
Relevant answer
Answer
Hi Phung,
I guess it's impossible to analyze these two types of Affymetrix arrays together with simple command lines. In fact, U133A and U133B share only 168 probes among the ~22k probes in each design; take a look at this post (https://www.biostars.org/p/283639/). Both designs are sold to be used as complementary sets and can't be compared directly.
fred
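Following the answer, the U133A and U133B files have to be processed as separate sets. A sketch that groups the CEL files by the array type stored in their headers and runs MAS5 once per group; it assumes the affy and affyio packages are installed and that `affyio::read.celfile.header()` returns the CDF name in `$cdfName` (worth checking in `?read.celfile.header`). The guarded block only runs when CEL files are present in the working directory.

```r
# Group CEL files by array type; each group is then normalized on its own
group_cel_files <- function(files, cdf_names) split(files, cdf_names)

cel_files <- list.files(pattern = "\\.cel(\\.gz)?$", ignore.case = TRUE)

if (length(cel_files) > 0 &&
    requireNamespace("affy", quietly = TRUE) &&
    requireNamespace("affyio", quietly = TRUE)) {
  # The CDF name (e.g. "HG-U133A" or "HG-U133B") is read from each CEL header
  cdf <- vapply(cel_files,
                function(f) affyio::read.celfile.header(f)$cdfName,
                character(1))
  groups <- group_cel_files(cel_files, cdf)
  # One MAS5 run per platform, e.g. eset_list[["HG-U133A"]]
  eset_list <- lapply(groups,
                      function(f) affy::mas5(affy::ReadAffy(filenames = f)))
}
```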
  • asked a question related to Microarray Analysis
Question
4 answers
As an example, I have downloaded a .cel file. Now how can I get the data on up- and downregulation? And what is the principle behind these values?
Relevant answer
Answer
It's been a while since I used this. If you are referring to microarray data, here is my script for HuGene ST arrays (written for HuGene 1.1 ST, i.e. pd.hugene.1.1.st.v1, and easily modifiable to other versions such as HuGene-1_0-st-v1). I'm using R. Just download all .CEL files to a folder and set this folder as your working directory.
library("oligo")
library("pd.hugene.1.1.st.v1")
celFiles <- list.celfiles()
affyRaw <- read.celfiles(celFiles)
eset <- rma(affyRaw, background=TRUE, normalize=TRUE, subset=NULL, target="core")
library(annotate)
library(hugene11sttranscriptcluster.db)
annodb <- "hugene11sttranscriptcluster.db"
ID <- featureNames(eset)
Symbol <- as.character(lookUp(ID, annodb, "SYMBOL"))
Name <- as.character(lookUp(ID, annodb, "GENENAME"))
Entrez <- as.character(lookUp(ID, annodb, "ENTREZID"))
theProbes <- exprs(eset)
df <- cbind.data.frame(theProbes, ID, Symbol, Name, Entrez )
write.table(df, "something.csv", sep=";" ) # Nice to have a spreadsheet of expression values
Now you can use LIMMA.
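# pData: make a data frame with the phenotypic data described with the dataset
# (one row per sample, same order as the CEL files); "phenodata.csv" is an illustrative name
pData <- read.csv("phenodata.csv", row.names = 1)
library(limma)
affy <- read.table(file = "something.csv", sep = ";", header = TRUE)
library("Biobase")
# ExpressionSet() needs a numeric matrix, so drop the annotation columns first
expr_mat <- as.matrix(affy[, !(colnames(affy) %in% c("ID", "Symbol", "Name", "Entrez"))])
eset <- ExpressionSet(assayData = expr_mat)
Make the design as you wish, e.g.:
# timepoint and ID (e.g. subject ID) are looked up as columns of pData
design <- model.matrix(~ timepoint + ID, data = pData)
fit <- lmFit(eset, design)
fit <- eBayes(fit)
topTable(fit, coef = "something", adjust.method = "BH", number = Inf, confint = TRUE)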
  • asked a question related to Microarray Analysis
Question
3 answers
I am trying to perform miRNA microarray analysis for a human cell line. Although the samples have good A260/A280 ratios (2 and above) and good RINs (most above 8), the A260/A230 ratio is low in many of the samples (as low as 0.3 in some).
So the question is: is it wise to proceed with these samples? Can the A260/A230 ratio affect the quality of the microarray (I will use an Agilent microarray chip)?
PS: The analysis will be done by a company, not in-house. The Qiagen miRNeasy kit was used to extract the RNA, and a DNase treatment step was included.
Relevant answer
Answer
Dear Sherif
There is no need to worry about the A260/A230 ratio when you have a good RIN value. If your miRNA purity and concentration are good, you can proceed. I have obtained good miRNA microarray results even from FFPE tissue with a poor RIN value. All the best.
  • asked a question related to Microarray Analysis
Question
15 answers
I am doing differential gene expression analysis between control and disease tissue. It is a skin disorder, and in the microarray analysis the normal tissue and the lesion tissue are taken from the same persons. The normal tissue is considered the control and the lesion is considered the disease. My question is whether to use a paired or an unpaired t-test. I think that since the normal and disease tissues are from the same person, the samples have a pairing relationship and I should use a paired test. Am I right? Or should I use an unpaired t-test?
Relevant answer
Answer
Due to the high cost of gene expression chips, there is often only a limited number of samples to compare across groups. That's why bootstrapping is usually used for gene expression comparison.
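A base-R illustration of why the pairing matters: with a large person-to-person baseline difference and a consistent lesion effect, the paired test detects the effect while the unpaired test does not. The values are hypothetical. For a genome-wide analysis, the limma equivalent is to put the patient into the design, as in the `design` below, and test the `tissuelesion` coefficient.

```r
# Hypothetical log2 expression of one gene in 5 patients
normal <- c(5.0, 8.0, 11.0, 6.0, 9.0)      # large differences between patients
lesion <- c(6.1, 8.9, 12.05, 6.95, 10.0)   # about +1 within every patient

paired   <- t.test(lesion, normal, paired = TRUE)
unpaired <- t.test(lesion, normal)          # ignores that samples share a patient

# Design for limma::lmFit(): patient as a blocking factor plus tissue
patient <- factor(rep(paste0("P", 1:5), times = 2))
tissue  <- factor(rep(c("normal", "lesion"), each = 5),
                  levels = c("normal", "lesion"))
design  <- model.matrix(~ patient + tissue)

c(paired = paired$p.value, unpaired = unpaired$p.value)
```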
  • asked a question related to Microarray Analysis
Question
5 answers
I am planning to extract the whole serum protein from serum samples for subsequent microarray analysis. What are your recommendations? A good protocol would be highly appreciated.
Thanks in advance
Relevant answer
Answer
Dear Pranita Kamble Waghmare,
Thank you so much for your answer. It is really helpful.
  • asked a question related to Microarray Analysis
Question
13 answers
Hi there
I would like to know your favourite gene-enrichment analysis software with a graphical environment, preferably online, and to get your feedback about it.
Please tell me what you like most about it! Thanks
All the best
Paco
Relevant answer
Answer
Dear Francisco,
For quick GO enrichment in multiple gene lists at once, I use g:Cocoa from g:Profiler:
Its main problem is that usually the analysis goes too deep into the GO annotation and the results might be hard to "digest".
g:Cocoa also allows other enrichment analyses, such as KEGG pathways.
For broad functions enrichment I use GOTermMapper:
Another useful online tool for GO enrichment is Metascape:
To analyse protein interactions / genes network from a list of genes I use Genemania:
Best,
Gautier
  • asked a question related to Microarray Analysis
Question
5 answers
Can anyone suggest a collection of numeric variables for deep learning? I have a set of features (DEGs with their fold-change values) from a microarray. I want to prepare training and test sets for deep learning. For this, at least three numeric variables are needed; any suggestions?
Thanks in advance
Relevant answer
Answer
Machine learning works on a simple rule – if you put garbage in, you will only get garbage to come out. By garbage here, I mean noise in data.
This becomes even more important when the number of features is very large. You need not use every feature at your disposal when creating an algorithm. You can assist your algorithm by feeding in only those features that are really important. I have myself seen feature subsets give better results than the complete set of features for the same algorithm. Or, as Rohan Rao puts it, “Sometimes, less is better!”
This is useful not only in competitions but also in industrial applications. You not only reduce the training time and the evaluation time, you also have fewer things to worry about!
Top reasons to use feature selection are:
  • It enables the machine learning algorithm to train faster.
  • It reduces the complexity of a model and makes it easier to interpret.
  • It improves the accuracy of a model if the right subset is chosen.
  • It reduces overfitting.
The following methods can be used for feature selection:
  • a. Filter methods (like LDA, ANOVA, chi-square, Pearson correlation)
  • b. Wrapper methods (forward selection, backward selection, recursive feature elimination methods)
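A sketch of a filter method with a train/test split in base R, on simulated data (50 hypothetical samples, 200 genes, of which the first 5 are informative). The key design choice is that features are selected on the training samples only; selecting on all samples first would leak test information into the model. The t-test filter here stands in for the ANOVA/correlation filters listed above.

```r
set.seed(123)
n_samples <- 50
n_genes   <- 200
labels <- factor(rep(c("normal", "disease"), length.out = n_samples))
expr <- matrix(rnorm(n_samples * n_genes), nrow = n_samples,
               dimnames = list(NULL, paste0("g", seq_len(n_genes))))
expr[labels == "disease", 1:5] <- expr[labels == "disease", 1:5] + 3  # informative genes

train_idx <- sample(n_samples, size = 35)
test_idx  <- setdiff(seq_len(n_samples), train_idx)

# Filter method: rank genes by two-sample t-test p-value, keep the top k
filter_select <- function(x, y, k) {
  p <- apply(x, 2, function(g) t.test(g ~ y)$p.value)
  names(sort(p))[seq_len(k)]
}

selected  <- filter_select(expr[train_idx, ], labels[train_idx], k = 5)
train_set <- data.frame(expr[train_idx, selected], class = labels[train_idx])
test_set  <- data.frame(expr[test_idx, selected],  class = labels[test_idx])
```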
  • asked a question related to Microarray Analysis
Question
16 answers
Happy Sunday everyone,
I am trying to calculate row-wise means and variances in R and then sort them. I used the "Absent/Present" calls from the Affymetrix algorithm to flag genes with questionable expression levels, but there are many NAs in the data frame. So I have to remove the genes with questionable expression levels (NAs) and then do the mean and variance calculations. This is what I did:
library(data.table)
dat <- as.data.table(df)
rowvar <- function(x, na.rm =F) rowSums((x - rowMeans(x, na.rm=T))^2, na.rm=T)/(rowSums(!is.na(x)) - 1)
dat[,`:=` (variance = rowvar(.SD, na.rm = T), mean = rowMeans(.SD, na.rm = T))]
But it gives the error "Error in rowMeans(x, na.rm = T) : 'x' must be numeric". How can I handle this error?
I have attached the document I am currently working on.
Thank you for your interest.
Hasan,
Relevant answer
Answer
@Hasan, I also recommend stackoverflow.com for all your R questions; it is by far the best and most up-to-date forum.
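The error most likely comes from `.SD` including a non-numeric column (for example the probe IDs); in data.table, `.SDcols` can restrict `.SD` to the sample columns. A base-R sketch of the whole task with a hypothetical table: keep only the numeric columns, drop genes with any NA (the flagged ones), then add the mean and variance and sort.

```r
# Hypothetical table: probe IDs plus 3 samples, NA = flagged as questionable
df <- data.frame(
  probe = c("p1", "p2", "p3", "p4"),
  s1 = c(5.1, NA, 7.0, 8),
  s2 = c(5.3, 6.2, NA, 9),
  s3 = c(4.9, 6.0, 7.4, 10),
  stringsAsFactors = FALSE
)

num_cols <- vapply(df, is.numeric, logical(1))   # FALSE for the probe column
num <- as.matrix(df[, num_cols])

# Drop genes with any NA, then compute row statistics on the rest
keep <- complete.cases(num)
clean <- df[keep, ]
clean$mean     <- rowMeans(num[keep, , drop = FALSE])
clean$variance <- apply(num[keep, , drop = FALSE], 1, var)
clean <- clean[order(clean$variance, decreasing = TRUE), ]
clean
```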
  • asked a question related to Microarray Analysis
Question
29 answers
Hello everyone,
I have an Excel spreadsheet with 17 columns and 54,675 rows, and I need to calculate the mean and variance of each of the 54,675 rows in RStudio. After that, I have to add each row's mean and variance as new columns in the Excel spreadsheet. How can I do this? I suppose the apply() function would work, but somehow I could not get it to work. Any suggestions?
Thanks,
Hasan
Relevant answer
Answer
If you need it in Excel, why don't you do it in Excel?
In R, you get the row means with rowMeans(). To get the variances, you will have to apply() the function var() to the rows. Here is example code, assuming that the data is in a 54675x17 data.frame or matrix "df":
rm <- rowMeans(df, na.rm=TRUE)
rv <- apply(df, MARGIN=1, FUN=var, na.rm=TRUE)
Both rm and rv will be numeric vectors of length 54675. You can save them as a csv and import them into Excel. You can also add them as new columns to df (df <- cbind(df, "Mean"=rm, "Variance"=rv)) and save the entire df object. If you use the package openxlsx, you can use write.xlsx(df, file="new filename.xlsx") to save it as an xlsx file.
  • asked a question related to Microarray Analysis
Question
4 answers
I want to plot the expression trend of some genes, and to do so I need to know whether the samples in a GEO series matrix (I mean the GSMxxxxxx samples) are comparable with each other (i.e. whether normalization and similar processing has been applied to them). Also, what is the meaning of the caution in GEO that I attached to my question?
thanks
Relevant answer
Answer
Not directly. They might be comparable to a certain degree after a common normalization, but even then it is important to consider the experimental differences (cell types/extraction, passage numbers, RNA isolation, amplification and labelling procedures). It is very dangerous to interpret differences between groups taken from different experiments/series, as there might be serious confounding between experimental conditions and groups.
  • asked a question related to Microarray Analysis
Question
3 answers
Hello everyone,
I have to analyse data from an Affymetrix microarray (Human Genome U133 Plus 2.0 Array) with Bioconductor, and it is my first time using Bioconductor. I got .cel files from NCBI GEO, but I could not get the chip description file. How can I obtain a CDF?
One more thing: when I check the number of genes in my dataset, R shows that it contains 54,675 genes. However, this number should be between 20,000 and 25,000. So I am wondering whether some of them might be replicates?
Any suggestions? Can someone help, please?
Thanks,
Hasan
Relevant answer
Answer
Thank you for your answer and time, Dr. Joachim. The link you sent helped me a lot, and as you said, the number 54,675 is correct, since it counts probe sets. Thanks again.
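The difference between probe sets and genes can be checked directly from the annotation: several probe sets often map to the same gene, some map to several genes, and some to none. A base-R sketch with a hypothetical annotation table; for real data, the Bioconductor packages hgu133plus2.db (annotation) and hgu133plus2cdf (the CDF environment that affy uses for this array) provide this information.

```r
# Hypothetical annotation: one symbol string per probe set ("///" separates genes)
annot <- data.frame(
  probeset = paste0("ps", 1:6),
  symbol   = c("GENE1", "GENE1", "GENE2 /// GENE3", "GENE4", NA, "GENE2"),
  stringsAsFactors = FALSE
)

# Split multi-gene entries and count distinct genes
genes <- unlist(strsplit(annot$symbol[!is.na(annot$symbol)], " /// ", fixed = TRUE))
n_probesets <- nrow(annot)
n_genes     <- length(unique(genes))
c(probesets = n_probesets, genes = n_genes)
```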
  • asked a question related to Microarray Analysis
Question
2 answers
I am trying to design a probe for microarray analysis. For the controls, I prefer constitutively expressed housekeeping genes. If anyone could guide me in designing a probe and primers for a particular gene, for example 23S rRNA, it would be helpful. Manuscripts state the sequences and the corresponding primer pairs but give no description of the methodology involved in the design. Can anyone help me with this?
Relevant answer
Answer
Thank you Marcus for your suggestion.
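Two first-pass checks often applied to candidate probe or primer sequences are GC content and a rough melting temperature. A base-R sketch; the sequence is hypothetical, and the formulas (Wallace rule below 14 nt, the basic GC-based formula above) ignore salt and oligo concentration. Dedicated tools such as Primer3, plus a BLAST specificity check, are the usual route for real designs.

```r
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  mean(bases %in% c("G", "C"))
}

# Rough melting temperature in degrees C (no salt/concentration correction)
tm_basic <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  n  <- length(bases)
  gc <- sum(bases %in% c("G", "C"))
  at <- sum(bases %in% c("A", "T"))
  if (n < 14) {
    2 * at + 4 * gc                    # Wallace rule for short oligos
  } else {
    64.9 + 41 * (gc - 16.4) / n        # basic formula for longer oligos
  }
}

candidate <- "GATCGGCTAAGCTTAGGCAT"   # hypothetical 20-mer
gc_content(candidate)
tm_basic(candidate)
```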
  • asked a question related to Microarray Analysis
Question
7 answers
Dear all, I am totally new to RNA-seq data analysis. Here is the background of my dataset: I have normalized RNA-seq data with 3 replicates in each of 2 conditions. I first want to check how the gene expression profiles differ between the 2 conditions. So I combined the 3 replicates (using the mean across the 3 samples; Q1: is this the correct way to do so?) and checked the MA plot (it looks fine). However, when I check the MA plots for each sample, I clearly see two clusters of gene expression levels. Could an expert explain this to me? Is it due to an experimental issue or the nature of the data? Thanks a lot in advance!
Relevant answer
Answer
Hi Lin,
First of all, please keep in mind that the MA (ratio-intensity) plot is meant to compare two samples or two groups of samples. It shows how different your samples are in terms of signal intensities (in microarrays) or read counts (in RNA-seq experiments). If you plot sample A against sample A (the same sample), you will notice that the data points converge to zero on the y-axis, because log(A/A) is zero. Having said that, I would refer you to https://en.wikipedia.org/wiki/MA_plot for the basics.
Coming back to your plots, it looks like the data are not correct or not normalized correctly (as other people pointed out above). If this is an RNA-seq experiment, how can the average read count (the x-axis in all your plots) be negative?
I would suggest first normalizing or pre-processing your data correctly and then using one of the Bioconductor tools (edgeR or DESeq2).
Best
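To make the answer's point concrete: for two samples x and y, M = log2(x) - log2(y) and A = (log2(x) + log2(y)) / 2. A base-R sketch with hypothetical normalized counts and a pseudocount of 1 (an assumption, to avoid taking the log of zero); comparing a sample with itself gives M = 0 everywhere, and with non-negative counts plus the pseudocount, A cannot be negative.

```r
ma_values <- function(x, y, pseudocount = 1) {
  lx <- log2(x + pseudocount)
  ly <- log2(y + pseudocount)
  list(M = lx - ly,            # log ratio
       A = (lx + ly) / 2)      # average log intensity
}

counts_a <- c(0, 10, 100, 1000)   # hypothetical normalized counts, condition A
counts_b <- c(3, 20, 100, 250)    # hypothetical normalized counts, condition B

ma   <- ma_values(counts_a, counts_b)
self <- ma_values(counts_a, counts_a)   # a sample against itself
if (interactive()) plot(ma$A, ma$M, pch = 20, xlab = "A", ylab = "M")
```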
  • asked a question related to Microarray Analysis
Question
6 answers
Hello.
I have been working on transcriptomics data from NCBI's GEO for a while now. However, I have recently been made aware of this phenomenon [see image attached]:
- When plotting the probes (y-axis) against the subjects (x-axis), the heatmap shows a very large area (around 10,000 probes) with lower intensity than its surroundings, for both control and diseased individuals;
- This effect is also visible (more clearly) in the other image, which shows that for around 10,000 probes (x-axis) the intensity (y-axis) is lower than the average intensity.
My question is: do you have any idea what this could be due to? Is this "area" of lower intensity a common thing in microarray analysis? Should I exclude this area from the analysis? 
Thank you in advance
Relevant answer
Answer
It seems you did not scale your data when you plotted the heatmap, and you did not use hierarchical clustering to show the relationship between your samples. So your heatmap shows the absolute expression of many low-expressed probes rather than the relative expression after scaling.
Low expression is not common in microarray data but is very common in RNA-seq data. Your data should be microarray data, so the best way forward is to obtain the CEL files or other raw data from the authors and repeat the normalization yourself.
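A base-R sketch of the answer's suggestion: scale each probe across samples (row z-scores) before plotting, and cluster the samples on the scaled data. The matrix is simulated with one low-intensity and one high-intensity block of probes, mimicking the pattern described; `heatmap()` with `scale = "row"` applies the same row scaling internally.

```r
set.seed(3)
expr <- rbind(
  matrix(rnorm(5 * 6, mean = 4),  nrow = 5),   # low-intensity probes
  matrix(rnorm(5 * 6, mean = 12), nrow = 5)    # high-intensity probes
)
colnames(expr) <- paste0("S", 1:6)

# Row-wise z-scores: each probe centred to mean 0 and scaled to SD 1
z <- t(scale(t(expr)))

# Hierarchical clustering of samples on the scaled data (correlation distance)
hc <- hclust(as.dist(1 - cor(z)), method = "average")
if (interactive()) heatmap(expr, scale = "row")
```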
  • asked a question related to Microarray Analysis
Question
5 answers
I am working on microarray data for some genes I am interested in. I want to know the expression levels of these genes in different cell types. The data were deposited in duplicate or triplicate for each cell type, and all the microarray experiments were done on the same kind of microarray chip. Now I have to get a mean value for each cell type and compare it with the other cell types. So far I have normalized the data using the gcrma package. Now I think I have to normalize the values against a housekeeping gene and proceed further, but I am not sure how. Please guide me through this.
I also have to find the coexpression values for all gene pairs. I calculated the Pearson correlation coefficient (PCC) from the normalized values of all the datasets. Is that okay, or do I have to calculate the PCC only after normalizing the values against a housekeeping gene? Thank you.
Relevant answer
Answer
ComBat is a useful tool for batch correction. Alternatively, you can simply adjust for batch in your model (e.g. as a covariate) without ComBat.
  • asked a question related to Microarray Analysis
Question
9 answers
I need to compare a gene's expression between tumor and matched normal tissue from the TCGA database. I've tried using Firehose to search for differential expression of the gene across different cancer types. The problem is that the number of tumor samples is not equal to the number of normal samples, but I need to compare matched tumor and normal tissues. Are there any tools to do that?
Relevant answer
Answer
Try UCSC Xena (xena.ucsc). You will fall in love with it.
  • asked a question related to Microarray Analysis
Question
12 answers
I want to convert expression values to z-scores, z = (x - mean)/sd. I have two types of samples in my microarray data (Affymetrix GeneChip Human Genome U133 Plus 2.0): 31 normal vs 30 case. Do I have to calculate the mean and sd for the normal samples only, apply the z-score formula, and then do the same for the case samples, or do I have to calculate the mean and sd across all samples?
Relevant answer
Answer
The mean and sd are calculated using all 61 samples.
Make sure you use log(signals) to calculate the z-scores.
Note that z-scores are usually calculated to show co-regulation patterns or to identify samples with differing (outlying) profiles in heatmaps. If you just want to identify the genes with the best statistical evidence for differential expression between the groups, you can simply run t-tests (again, using the log(signal) values).
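A minimal Python sketch of both suggestions, assuming one gene's raw signals: the signals are log2-transformed, a single mean and SD are taken over all samples for the z-scores, and Welch's t statistic (a t-test variant that does not assume equal variances) compares the groups. The signal values are hypothetical. The sketch only computes the statistic; for a p-value you would use a t distribution, e.g. `scipy.stats.ttest_ind(..., equal_var=False)`.

```python
import math
from statistics import mean, stdev


def zscores(log_signals):
    """z = (x - mean) / sd, with mean and sd taken over ALL samples of one gene."""
    m = mean(log_signals)
    s = stdev(log_signals)
    return [(x - m) / s for x in log_signals]


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in mean log signal (unequal variances)."""
    va = stdev(group_a) ** 2 / len(group_a)
    vb = stdev(group_b) ** 2 / len(group_b)
    return (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)


# Hypothetical raw signals for one gene; log2-transform first, as advised above.
normal = [math.log2(v) for v in (100, 120, 110)]
case = [math.log2(v) for v in (400, 420, 380)]

z = zscores(normal + case)  # one mean/sd over all samples (6 here, 61 in the question)
t = welch_t(case, normal)
print([round(v, 2) for v in z], round(t, 2))
```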
  • asked a question related to Microarray Analysis
Question
4 answers
When validating microarray or RNA-seq expression data, do we need to select genes randomly, or can we choose the ones that matter to us?
Relevant answer
Answer
It depends on your aim. If you really want to validate the overall screening result, you should select genes at random.
If you just want to confirm that the genes you plan to continue working on are regulated as estimated from the screen, you should use those genes. But this is still only a technical validation. If there are not too many interesting candidate genes, and if possible, you should go for a biological validation using appropriate experimental assays to determine the biological role of the regulation (knock-in, knock-out, inhibitors, enhancers).
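For the random-selection route, a small Python sketch can make the draw reproducible so the choice of validation genes is documented. The gene names and the seed are hypothetical.

```python
import random


def pick_validation_genes(de_genes, n, seed=None):
    """Draw n genes at random from a DE gene list for an unbiased validation.

    Sorting first makes the draw reproducible for a given seed,
    because the iteration order of a Python set is not guaranteed.
    """
    rng = random.Random(seed)
    return rng.sample(sorted(de_genes), n)


de_list = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"}  # hypothetical
print(pick_validation_genes(de_list, 3, seed=42))
```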
  • asked a question related to Microarray Analysis
Question
2 answers
Analysis of data
Relevant answer
Answer
Thanks, ma'am. I have received data from a DMET analysis. How should I present the data in various representations? Which software and applications would help with this?
  • asked a question related to Microarray Analysis
Question
4 answers
While working with gcrma, I found that the package ‘hgu95av2cdf’ is not available (for R version 3.4.0).
So I would like to know a stable version of R for which all Bioconductor packages are available.
Relevant answer
Answer
The current release of Bioconductor is version 3.5; it works with R version 3.4.0. Users of older R and Bioconductor versions must update their installation to take advantage of new features and to access packages that have been added to Bioconductor since the last release.
regards,
Milan
  • asked a question related to Microarray Analysis
Question
12 answers
Hi experts,
RNA-seq with NGS technology is transforming gene expression studies with great advantages, yet we still see a lot of studies using microarray techniques (e.g. Affymetrix Gene Atlas) and even qPCR (to a certain extent).
I am personally biased towards NGS technology and RNA-seq for gene expression studies. Not only that, RNA-seq can discover novel transcripts, which may open up new fields of study.
RNA-seq can be costly, but I personally believe that in the end it is better than microarrays. So in what situations could I say that a microarray is better than RNA-seq? I am working with primary cells, cell lines, and mice as my animal model for brain-related studies.
I am looking forward to hearing your opinion.
Relevant answer
Answer
RNA-Seq is a powerful tool if you're trying to detect novel transcripts/splice forms, go on an unbiased "fishing trip" for genes/biomarkers, detect extremely rare transcripts, or look at changes in transcript abundance that occur over a very wide dynamic range. 
However, if you're interested in studying the expression of a known panel of transcripts and none of them are expressed at an extremely high or low level, a well-designed microarray will work just as well and cost less. Microarrays can be used for most of the same experiments as targeted PCR primer sets for RNA-Seq.
  • asked a question related to Microarray Analysis
Question
4 answers
Hi!
Does anybody know a program, software or website to make heatmaps without using R?
I have a set of 3099 upregulated genes and a set of 2686 downregulated genes under my unique experimental condition, and I would like to compare them.
Thanks a lot!
Relevant answer
Answer
Hi,
I have never used it, but I have seen good heatmaps made with this website:
Alternatively, you can make heatmaps in Excel; I think you need to assign a color gradient to a range of values:
Good luck!
I agree with Thomas: learning R is worthwhile, since it is a great tool for bioinformatics analyses and figures.
  • asked a question related to Microarray Analysis
Question
1 answer
So, basically, I have abundance values for my first 3 sets of experiments, with variable control values for all peptides within each protein. How do I use this data to find significant peptides or calculate the fold change statistically? I was wondering if there is a way to do this without control abundance data. Also, should I use normalization techniques, and if so, which one?
Relevant answer
Answer
Dear Abhijeet KISHANPAL Mavi,
I recommend that you use Perseus for your statistics. Perseus is fairly user-friendly point-and-click software for the kind of statistics I understand you are planning to do.
The program can be downloaded from the link below:
The developing group hosts an excellent summer school each year, which I really recommend. If attending the summer school is not an option, they also publish all the lectures on YouTube. Try searching for e.g. "Perseus summer school iTRAQ" for instructions.
I hope this helps and good luck with your research.
  • asked a question related to Microarray Analysis
Question
5 answers
I am new to RNA-seq analysis. I have normalized data rather than raw counts for RNA-seq, and I want to do a differential expression study (negative binomial model). Can anyone recommend an R package that handles this kind of data? From what I have read, DESeq only accepts raw count data. Any answer is appreciated!
Relevant answer
Answer
Limma
  • asked a question related to Microarray Analysis
Question
10 answers
Hello everyone. I have normalized RNA-seq reads and I am trying to generate a Venn diagram of upregulated and downregulated genes. I have three replicates each of control and test samples. I searched online but couldn't decide which tools would be better to use. Can anyone suggest any Windows-based offline or online tools to generate Venn diagrams from RNA-seq data? Thank you very much.
Raghu.
Relevant answer
Answer
Hello Raghuram Sir,
As I understand your question, you have read counts for the control and treatment samples but not the list of differentially expressed genes. Therefore, you first have to perform a differential expression analysis, which can be done using Cuffdiff, DESeq or other programs. However, these programs are mostly used on Linux, and I don't know of any Windows-based program for differential expression analysis. After the differential expression analysis, you will get the list of genes upregulated or downregulated in treatment vs control, with their significance in terms of P and Q values. With this list you can make a Venn diagram using the program 'Venny'. Venny is an online tool and very simple.
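What Venny does with such lists is essentially set arithmetic. The sketch below, assuming a DE table exported as (gene, log2 fold change, adjusted p) rows (a hypothetical layout and hypothetical values), splits each comparison into up- and downregulated sets and counts the regions of a two-set Venn diagram.

```python
def split_de(results, padj_cutoff=0.05):
    """Split DE results into up- and downregulated gene sets.

    results: iterable of (gene, log2_fold_change, adjusted_p) tuples
    (this column layout is an assumption about the exported table).
    """
    up, down = set(), set()
    for gene, lfc, padj in results:
        if padj < padj_cutoff:
            if lfc > 0:
                up.add(gene)
            elif lfc < 0:
                down.add(gene)
    return up, down


def venn_counts(a, b):
    """Sizes of the three regions of a two-set Venn diagram."""
    a, b = set(a), set(b)
    return {"only_first": len(a - b), "only_second": len(b - a), "both": len(a & b)}


# Hypothetical results for two treatments compared with the same control.
treat1 = [("g1", 2.1, 0.001), ("g2", -1.5, 0.01), ("g3", 0.8, 0.2), ("g4", 1.2, 0.03)]
treat2 = [("g1", 1.7, 0.004), ("g2", 0.9, 0.04), ("g4", -0.3, 0.5), ("g5", 3.0, 0.0001)]
up1, down1 = split_de(treat1)
up2, down2 = split_de(treat2)
print(venn_counts(up1, up2))
```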
All the Best
  • asked a question related to Microarray Analysis
Question
2 answers
I have used the MultiNA to quantify RNA for the first time.
Could someone help me interpret the output results? Does it have an equivalent number to RIN?
Can I trust the "Total conc" readout?
Many thanks
Relevant answer
I think you can trust the "Total conc" readout.
  • asked a question related to Microarray Analysis
Question
3 answers
Our GeneSpring user license has expired so I am investigating whether there is an appropriate online open source application I can use to analyse microarray data.
Relevant answer
Answer
R is a good option for open-source analysis of a wide variety of data. If you were using GeneSpring, I assume you want to analyze Agilent array data. There are specific packages for different Agilent arrays, such as https://www.bioconductor.org/packages/release/bioc/html/agilp.html, https://www.bioconductor.org/packages/3.5/bioc/html/AgiMicroRna.html and https://www.bioconductor.org/packages/3.5/bioc/html/LVSmiRNA.html, to name a few.
  • asked a question related to Microarray Analysis
Question
4 answers
Hi,
I have 33 ligands in total, which were analyzed with SAM, as reported in the article "Analysis of the major patterns of B cell gene expression changes in response to short-term stimulation with 33 single ligands". I selected 10 of these ligands and want to do additional analysis, but the authors didn't provide the raw data (CEL files), so I downloaded the processed data from ArrayExpress. I reviewed the limma tutorial and want to make sure the downloaded data are suitable for limma. I need a starting point for the limma analysis; I attached one of the processed data files as an example. Can I use processed data files as input for limma, and what type of analysis can be performed? I look forward to your answers.
Thank you,
Relevant answer
Answer
I have some sample code here for a paper, where the data are downloaded from the GEO database and analysed using limma. You can modify some sections and use the Bioconductor ArrayExpress package instead of GEOquery.
Hope it helps. Good luck.
  • asked a question related to Microarray Analysis
Question
7 answers
I analysed a PPI network after integrating gene expression data from an Alzheimer's disease experiment into it, and identified some subnetworks. First, I used the limma package for differentially expressed gene (DEG) analysis. Second, I mapped the DEGs onto the PPI network and assigned each gene's fold change to the corresponding protein. Third, I searched the network starting from my selected candidate genes and extracted subnetworks. I scored them with my own formula, then merged the top-scoring subnetworks.
Now, I want to validate my results (merged sub network) and I have no idea how to do.
Could anyone help me or suggest a method to validate my results, please? I would highly appreciate it.
Relevant answer
Answer
We could perform Gene Ontology enrichment analysis on that subnetwork and compare the result with the computational prediction. Before doing that, we could look for evidence of those specific PPIs in databases such as STRING, and then do a comparative analysis.
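GO enrichment of a subnetwork is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below computes that p-value for a single GO term; the gene counts in the example are hypothetical, and in practice you would test many terms and correct for multiple testing.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for a hypergeometric X: chance of seeing at least k term genes.

    N: genes in the background (e.g. all genes in the PPI network)
    K: background genes annotated to the GO term
    n: genes in the subnetwork
    k: subnetwork genes annotated to the GO term
    """
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


# Hypothetical: 20000 background genes, 200 with the term, 100 in the subnetwork, 5 hits.
print(hypergeom_enrichment_p(20000, 200, 100, 5))
```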
  • asked a question related to Microarray Analysis
Question
5 answers
Dear colleagues, I have Affymetrix microarray data from endothelial cells co-cultured with mononuclear cells under normoxia, hyperoxia and hypoxia. Control cultures of endothelial cells (alone, without mononuclear cells) were also grown under the same conditions. The Affymetrix microarray data have been processed with Expression Console (Gene level >> extended: RMA-Sketch) and filtered. I wish to use Excel to identify differentially expressed genes.
What steps do I need to take to identify these differentially expressed genes using Excel? I am new to high-throughput data analysis.
Thank you in advance for your response.
Relevant answer
Answer
These steps are helpful for exporting the Excel file:
You can then input that Excel file into this JAR tool and it will run the query.
  • asked a question related to Microarray Analysis
Question
5 answers
Hi all,
To analyse RNA-seq datasets, I need to know which is the most accurate and reliable pipeline to go with. Could you suggest one?
Note: I have good experience with the Tuxedo suite (Bowtie, TopHat and CummeRbund) in addition to edgeR.
Thanks
Relevant answer
Answer
This paper should answer most of your questions:
Anders, S., McCarthy, D., Chen, Y., Okoniewski, M., Smyth, G., Huber, W., & Robinson, M. (2013). Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature Protocols, 8(9), 1765–1786. http://doi.org/10.1038/nprot.2013.099
If you want to look at some example scripts, you can see them in my github repository for a recent data set we analysed. https://github.com/uhkniazi/BRC_Organoid_Joana
Once you have the count matrix, you have various options for differential expression analysis of genes: edgeR, DESeq2, etc.
  • asked a question related to Microarray Analysis
Question
3 answers
I have access to 3 experiments from GEO. The sample type for one experiment is blood and for the other two it is skin. I have the RPKM values of control and patient samples from these tissues. The platform for all these experiments is the same.
How should I proceed with a meta-analysis of these experiments? There are very few papers describing a protocol, so a pipeline would be a great help.
  • asked a question related to Microarray Analysis
Question
4 answers
I am doing differential expression studies using iTRAQ, and I have problems determining the fold change (fold enrichment) for downregulated iTRAQ ratios. For example, an iTRAQ ratio for 117:114 of 3.256 means about 3-fold upregulation, but what about downregulated ratios, which have values below 1, for example 0.2679? Is it possible to calculate the fold change from the iTRAQ ratio given its PVal (ratio)? I am using ProteinPilot software.
Thanks in advance! 
Relevant answer
Answer
Hi Yee, 
Basically, using iTRAQ you can measure the absolute and relative quantitative ratios of peptides and proteins. 
Maybe you should take a look at this paper, which gives more detailed answers to your questions.
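A common convention (general practice, not something specific to ProteinPilot) is to report ratios below 1 as negative fold changes, -1/ratio, or to work with log2 ratios, which are symmetric around 0. A minimal sketch using the two ratios from the question:

```python
import math


def ratio_to_fold_change(ratio):
    """Signed fold change: ratios >= 1 are kept; ratios < 1 become -1/ratio."""
    if ratio <= 0:
        raise ValueError("iTRAQ ratios must be positive")
    return ratio if ratio >= 1 else -1.0 / ratio


for r in (3.256, 0.2679):
    # 0.2679 corresponds to about -3.73-fold, i.e. a log2 ratio of about -1.90
    print(r, round(ratio_to_fold_change(r), 2), round(math.log2(r), 2))
```

On the log2 scale, 3-fold up and 3-fold down have the same magnitude with opposite signs, which makes up- and downregulated proteins directly comparable.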
  • asked a question related to Microarray Analysis
Question
9 answers
When identifying differentially expressed genes from microarray data, which parameters do we have to take into account to get a more satisfying result? What ranges (minimum and maximum values) can be set for FDR, fold change, etc., in accordance with the p-value of log2-normalized data?
Relevant answer
Answer
Unfortunately, there is no good general answer to this.
Some genes are biologically relevant even when only slightly regulated, while others can be considerably regulated without much biological impact.
The p-values, and whatever you get from an FDR-based selection, depend not only on your chosen cut-off but also on the sample size.
A strategy to select "candidates" could involve two steps:
1) select the top 50 genes with the largest absolute log fold change (LFC) and also the top 50 genes with the lowest p-values. Some may overlap, so you get a list of at most 100 genes.
2) go through this list and make a subselection based on your biological understanding of the genes in the experimental context.
Then you should have a list that gives you an idea of which experiments to plan next.
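Step 1 of this strategy can be sketched in Python as follows, assuming the results are available as (gene, LFC, p-value) tuples; the example rows are hypothetical and use n = 2 instead of 50 to keep them short.

```python
def candidate_genes(results, n=50):
    """Union of the top-n genes by |LFC| and the top-n genes by p-value.

    results: list of (gene, lfc, pvalue) tuples.
    """
    by_lfc = sorted(results, key=lambda r: abs(r[1]), reverse=True)[:n]
    by_p = sorted(results, key=lambda r: r[2])[:n]
    return {r[0] for r in by_lfc} | {r[0] for r in by_p}


rows = [("a", 3.0, 0.2), ("b", 0.1, 1e-6), ("c", -2.5, 0.01), ("d", 0.5, 0.5)]
print(sorted(candidate_genes(rows, n=2)))
```

Step 2, the biological subselection, is deliberately left to the researcher.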
  • asked a question related to Microarray Analysis
Question
1 answer
Hello.
I tried to perform a meta-analysis of differential gene expression data using GEO.
The A-madman program looks fancy, but it does not work in the process.
The error occurs when I click Analyze after grouping samples on the Basket tab.
Can anyone help with this program, or recommend another program or R package?
Thanks in advance
  • asked a question related to Microarray Analysis
Question
5 answers
Dear All, 
I am trying to see which CpG sites (with its associated genes) are involved in particular pathways and diseases, and get an overview of the functions of these genes. 
Currently, I have tried to import my dataset (>800k CpG sites in total), which has the following columns: 1) CpG site ID, 2) p-value, 3) q-value, 4) fold change and 5) difference. My datasets are quite large, with more than 200,000 CpG sites (the row limit of IPA). Is there a way to import a file this large?
I have also tried importing a file with a more specific set of around 1000 CpG sites, but IPA does not map it properly: I get 0 mapped sites, due to errors or possibly because I am using the wrong template (i.e. not expression data).
I think the errors come from the formatting of my Excel file for IPA: either the headings are incorrect or the way I assign each header/observation is incorrect. For example, I set my identifier type to Illumina (which is what I used to get my CpG methylation data), but I do not know what other options I could choose instead. IPA also showed errors, first 'no IDs matched to particular genes', and then 'removing fold change between 1 and -1'.
In summary, I would really appreciate any tips/guidance with uploading CpG methylation data into IPA. 
Thank you very much.
Relevant answer
Answer
Thank you very much Mr. Kamstra and Dr. Muley! 
I am looking at human samples from infant cord blood lymphocytes. 
So far, I have a smaller list of relevant CpG sites based on p-values and fold change (as Dr. Muley suggested).
Mr. Kamstra, I have associated CpG sites with genes using GenomeStudio, but would I have to use BioMart to convert these gene names to Ensembl IDs or Entrez IDs?
Thank you for your help!
  • asked a question related to Microarray Analysis
Question
3 answers
I am trying to find the expression profiles of my candidate genes in cancers using publicly available RNA-seq or CAGE data.
At this stage I would prefer an online search tool for a quick analysis.
  • asked a question related to Microarray Analysis
Question
15 answers
I have some genes with their FPKM values, and now I want to convert these values into log2 fold changes.
Relevant answer
Hello Tinku,
First, divide the FPKM value of the second group by the FPKM value of the first group to get the fold change (FC). Then use the Excel formula =LOG(FC, 2) to get the log2 fold change.
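The same calculation in Python, as a small sketch. The optional pseudocount is an assumption, not part of the Excel recipe; it avoids division by zero for genes with FPKM = 0. Note that this is a descriptive value only; with replicates, a statistical test (e.g. DESeq2 or edgeR on raw counts) is needed to judge significance.

```python
import math


def log2_fold_change(fpkm_group1, fpkm_group2, pseudocount=0.0):
    """log2(FPKM_group2 / FPKM_group1), the equivalent of Excel's =LOG(FC, 2).

    The pseudocount (e.g. 1) is an optional assumption for unexpressed genes.
    """
    return math.log2((fpkm_group2 + pseudocount) / (fpkm_group1 + pseudocount))


print(log2_fold_change(5, 20))                 # 4-fold up   -> 2.0
print(log2_fold_change(0, 3, pseudocount=1))   # (3+1)/(0+1) -> 2.0
```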
  • asked a question related to Microarray Analysis
Question
11 answers
Hi Everyone,
I'm using microarray data to identify DEGs and map their PPI network, but now I want to use multiple datasets reported by different studies of acute myeloid leukemia (AML). Please suggest a good step-by-step methodology, and specifically how I can merge different datasets. I also need some information about the requirements for merging.
Thanking you in Advance.
Relevant answer
Answer
Batch correction only works when the experimental conditions are (nearly) evenly distributed over the batches. Otherwise, batch correction models will make things even worse (see link).
This is a severe problem when selecting data from different experiments, where many experimental factors are 100% confounded with the batches/experiments. If, additionally, biological groups are confounded with experiments (e.g. you take group A from GSEx and group B from GSEy and compare A against B), the result is almost completely arbitrary. Usually, the samples will already cluster nicely within experiments/batches, indicating considerable (but artificial!) differences in the expression profiles between the groups, and after batch correction these differences will be exaggerated even further.
Further, if there are many different batches and relatively few samples per batch, a correction with ComBat will lead to an overestimate of the residual degrees of freedom in tests for differential expression (this can be corrected manually, but ignoring it would be a bad idea). This problem is also discussed in the attached link.
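Before merging datasets, it helps to check how the biological groups are distributed over the batches. The sketch below, using hypothetical (batch, group) labels, tabulates the design and flags the fully confounded case described above, in which every batch contains only one group.

```python
from collections import Counter


def batch_group_table(samples):
    """Count samples per (batch, group) combination; samples: list of (batch, group)."""
    return Counter(samples)


def fully_confounded(samples):
    """True if every batch contains a single group, so batch and group cannot be separated."""
    groups_per_batch = {}
    for batch, group in samples:
        groups_per_batch.setdefault(batch, set()).add(group)
    return all(len(groups) == 1 for groups in groups_per_batch.values())


# Hypothetical designs: group A only from GSEx and group B only from GSEy, versus a balanced design.
confounded = [("GSEx", "A"), ("GSEx", "A"), ("GSEy", "B"), ("GSEy", "B")]
balanced = [("b1", "A"), ("b1", "B"), ("b2", "A"), ("b2", "B")]
print(fully_confounded(confounded), fully_confounded(balanced))
print(batch_group_table(balanced))
```

A design that is only partly unbalanced is not flagged by this check, but the table still shows how far the groups are from an even distribution.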
  • asked a question related to Microarray Analysis
Question
6 answers
I have heard that a scanned microarray slide looks like the image below: a black background with fluorescent dots. But what I saw was an entirely white slide with black frosted ends (constantly black, below). I have no photos on my PC, but I saw a white slide with black ends. I don't know why the scanned image is only black and white. Please give your opinion.
Relevant answer
Answer
@Björn Abendroth typhoon 9410
  • asked a question related to Microarray Analysis
Question
6 answers
Hello guys!
We have several transcriptome datasets from samples treated at low temperature for different lengths of time, say at 4 ℃ for 1, 3, 5 and 7 hours. After analyzing the data, we obtained differentially expressed genes (DEGs) for each treatment time point. For example, for samples treated at 4 ℃ for 1 hour we got 2000 upregulated and 3000 downregulated genes; for 3 hours, 1500 upregulated and 2500 downregulated genes; for 5 hours …; for 7 hours ….
My question is: how can I analyze these DEGs further to find the subset of genes that are really crucial for this sample at low temperature?
And are there tools for this?
By the way, my datasets come from RNA-seq, and the fold change of each unigene at each treatment time point was calculated with DESeq.
Thanks in advance.
Relevant answer
Answer
thank you @ Audrey.  
  • asked a question related to Microarray Analysis
Question
2 answers
Well... I ruined my microarray, so I want to ask you something.
Is it OK to store my buffers at RT?
Prehybridization buffer: 5X SSC/0.1% SDS/1% BSA
Hybridization buffer: 5X SSC/0.1% SDS/50% Formamide
Low-stringency wash buffer: 1X SSC/0.2% SDS
High-stringency wash buffer: 0.1X SSC/0.2% SDS
0.1X SSC
50% DMSO
These are my buffers. I currently store them at 4 °C because BSA is stored at 4 °C, but some solutes do not dissolve at 4 °C.
Relevant answer
Answer
Thx :)
The BSA and SDS precipitated when stored at 4 °C, but they dissolved completely at RT.
  • asked a question related to Microarray Analysis
Question
2 answers
Currently, differential gene expression is usually identified using RPKM, TPM or TMM, but the sequencing depth is chosen by the experimenter and all these quantifications are relative. To compare samples, some methods, such as DESeq2 and edgeR, use distribution-based normalization. The problem is that these methods are not entirely correct either: if we sequence a low-expression sample deeply and a high-expression sample shallowly, none of these methods seems able to detect the true difference. One idea is that if there were a group of universal genes with unchanged expression levels, these genes could be used as the baseline for normalization and for comparison between samples.
I noticed a paper that used this idea to normalize gene expression in plant tissues; they established a database of stably expressed genes: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5178351/.
But for prokaryotic microorganisms, there does not seem to be any stably expressed gene set yet.
Any comments will be appreciated.
Relevant answer
Answer
Hi Xiao-Tao,
DESeq2 and edgeR, indeed, have a problem when global changes come into play (e.g. if cells produce different overall amounts of RNA under different conditions - a fairly normal situation among prokaryotes), because they actually assume that the majority of genes DO NOT change. As you said, there are apparently no genes that keep their expression levels under all possible conditions in all possible genetic backgrounds, even ribosomal RNAs vary hugely between different growth phases.
I can see two possible solutions here. 1) Use spike-in RNA. If you know how many cells you isolate RNA from, the spike-in will always let you estimate the actual abundances of transcripts independently of the sequencing depth. 2) Use RNA-seq in conjunction with other quantitative techniques (e.g. northern blotting or RT-qPCR) applied to the same samples. Again, if you know how many cells you isolate RNA from and load the gel accordingly (e.g. 1/10 of the amount you isolated, whatever it is), probing for several RNAs will give you a precise idea of how the abundance of these RNAs in one sample relates to that in another. This way you can derive normalisation factors and account for the differences in sequencing depth.
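The idea behind option 1 can be sketched in Python: each gene's reads are expressed relative to the spike-in reads of the same sample, so sequencing depth cancels out. This assumes the same amount of spike-in RNA was added per equal number of cells; the counts are hypothetical, and real pipelines usually fit normalization factors over many spike-in transcripts rather than one total.

```python
def spike_in_normalize(gene_counts, spike_counts):
    """Express each gene's reads relative to the spike-in reads of the same sample.

    gene_counts: dict sample -> dict gene -> read count
    spike_counts: dict sample -> total reads mapped to the spike-in
    Assumes an equal amount of spike-in per equal number of cells, so the result
    is proportional to transcript abundance per cell.
    """
    return {
        sample: {gene: count / spike_counts[sample] for gene, count in genes.items()}
        for sample, genes in gene_counts.items()
    }


# Hypothetical: sample A was sequenced twice as deeply as sample B.
genes = {"A": {"geneX": 4000}, "B": {"geneX": 1000}}
spikes = {"A": 2000, "B": 1000}
norm = spike_in_normalize(genes, spikes)
print(norm["A"]["geneX"] / norm["B"]["geneX"])
```

The raw counts suggest a 4-fold difference, but after accounting for the spike-in the per-cell difference is 2-fold.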
  • asked a question related to Microarray Analysis
Question
13 answers
Hi all,
I want to analyse an RNA-seq dataset from a paper, so I got the data from GEO. The data have a column called "unique hits" for each ID, which I think is the most relevant for the next step. However, I don't know whether these are values I can use to analyse gene expression levels.
I've used log_RMA and RMA in the microarray before to analyse gene expression but I don't know if these are the same.
Thank you!
Relevant answer
Answer
In next gen sequencing the first crucial step is mapping your sequence fragments, the reads, to the reference sequence (genome or transcriptome). However, reads are short (typically 75 bp or 100 bp) and it is not always possible to get a single best hit.  It is common to use only those reads with a single best match (= "unique hits") in downstream  analyses.
RNA-seq is different from microarrays. Microarrays are based on hybridization and produce a continuous fluorescence signal, whereas RNA-seq is based on short-read sequencing and mapping and produces a discrete signal: the number of unique reads mapped per gene. The normalization and statistical analysis for RNA-seq differ from those for microarrays. The most commonly used analysis packages for RNA-seq are edgeR and DESeq2, both available on Bioconductor.
  • asked a question related to Microarray Analysis
Question
3 answers
Hello friends
I am doing microarray data analysis (HG-U133 Plus 2). I got the expression matrix using gcrma, but some probes map to multiple genes, as shown below. How should I treat these? Some probes are not matched and show NA; can I delete them? Next, I will use this file for WGCNA. Please share your knowledge.
221251_x_at   1   221251_x_at   INO80B /// INO80B-WBP1            NA
65133_i_at    1   65133_i_at    INO80B /// INO80B-WBP1            NA
223072_s_at   1   223072_s_at   INO80B /// INO80B-WBP1 /// WBP1   NA
1559716_at    1   1559716_at    INO80C                            INO80C
229582_at     1   229582_at     INO80C                            INO80C
220165_at     1   220165_at     INO80D                            INO80D
Relevant answer
Answer
Hi Mathavan,
If you are asking about this particular case, it can be explained by the fact that INO80B and WBP1 genes can sometimes occur as a read-through transcript that cannot be distinguished from either of the genes individually by the probes in question (see https://genome.ucsc.edu/cgi-bin/hgc?hgsid=579057461_Tm25CaPexEPPpKrMhRyrGSqaNmCR&c=chr2&l=74456725&r=74462493&o=74455022&t=74460891&g=refGene&i=NR_037849).
You can also look up more information about the sensitivity and specificity of the U133 array probes for particular transcripts to understand why this happens for other probes and genes. For this particular example, see the information for probe 65133_i_at here https://genecards.weizmann.ac.il/cgi-bin/geneannot/GA_search.pl?keyword_type=probe_set_id&array=HG-U95&target=genecards&keyword=65133_i_at and for probe 223072_s_at see https://genecards.weizmann.ac.il/cgi-bin/geneannot/GA_search.pl?keyword_type=probe_set_id&array=HG-U133&target=genecards&keyword=223072_s_at.
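One common (but debatable) way to prepare such a matrix for WGCNA is to drop probes annotated as NA or to several genes ("///"), and then keep one probe per gene, for example the one with the highest mean expression. WGCNA provides a `collapseRows()` function for the second step; the Python sketch below only illustrates the logic. The probe annotations are taken from the table above, while the expression values and the extra NA probe are hypothetical.

```python
def parse_symbols(annotation):
    """Split an Affymetrix-style symbol string on '///'; 'NA' or empty gives []."""
    if annotation is None or annotation.strip() in ("", "NA"):
        return []
    return [s.strip() for s in annotation.split("///") if s.strip()]


def one_to_one_probes(annotations):
    """Keep probes annotated to exactly one gene symbol: dict probe -> gene."""
    kept = {}
    for probe, ann in annotations.items():
        symbols = parse_symbols(ann)
        if len(symbols) == 1:
            kept[probe] = symbols[0]
    return kept


def collapse_by_max_mean(probe_to_gene, expression):
    """For genes with several probes, keep the probe with the highest mean expression."""
    best = {}
    for probe, gene in probe_to_gene.items():
        values = expression[probe]
        m = sum(values) / len(values)
        if gene not in best or m > best[gene][1]:
            best[gene] = (probe, m)
    return {gene: probe for gene, (probe, _) in best.items()}


annotations = {
    "221251_x_at": "INO80B /// INO80B-WBP1",
    "223072_s_at": "INO80B /// INO80B-WBP1 /// WBP1",
    "1559716_at": "INO80C",
    "229582_at": "INO80C",
    "220165_at": "INO80D",
    "hypothetical_na_at": "NA",
}
expression = {  # hypothetical log2 values per sample
    "1559716_at": [5.1, 5.3, 4.9],
    "229582_at": [7.8, 8.1, 7.9],
    "220165_at": [6.0, 6.2, 6.1],
}
kept = one_to_one_probes(annotations)
print(collapse_by_max_mean(kept, expression))
```

Dropping multi-gene probes loses read-through transcripts such as INO80B-WBP1, so whether this is appropriate depends on your question.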
  • asked a question related to Microarray Analysis
Question
1 answer
I have obtained microarray data for a large cohort (both sexes). I performed an initial GWAS of all SNPs on all chromosomes to check for genetic association with the trait I am interested in. I found some regions, but the most interesting one is on the X chromosome (in my opinion it is not a false positive). However, I am a bit confused, because I do not know whether, and how, I can analyse these data: for women there is the standard three-genotype distribution, but men can only have two variants, presence or absence of the allele.
- Should I divide the cohort and analyse men and women separately?
- What kind of statistics should I use for men, since I think a simple MAF cannot be used? And are PLINK's results for the male subset alone reliable?
- Or do you have any other advice?
I would be very grateful for all you help.
Relevant answer
Answer
I guess separate analyses need to be done for men and women, since men are XY and their analysis would differ from the standard (MAF) analysis.
I think PLINK should help you do this.
There is plenty of literature on GWAS in different organisms. The field is exploding. For fruitful guidance in your case, a journal like American Journal of Human Genetics would be very useful to search problems and solutions similar to yours.
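On the MAF question: an allele frequency can still be computed on the X chromosome if you count chromosomes rather than individuals, with women contributing two copies and hemizygous men one. A minimal sketch, with hypothetical genotypes coded as the number of copies of the allele:

```python
def x_allele_frequency(female_genotypes, male_genotypes):
    """Frequency of an allele on chrX, counting X chromosomes rather than people.

    female_genotypes: copies of the allele per woman (0, 1 or 2)
    male_genotypes: copies of the allele per man (0 or 1, hemizygous)
    """
    if any(g not in (0, 1, 2) for g in female_genotypes):
        raise ValueError("female genotypes must be 0, 1 or 2")
    if any(g not in (0, 1) for g in male_genotypes):
        raise ValueError("male genotypes must be 0 or 1")
    allele_copies = sum(female_genotypes) + sum(male_genotypes)
    chromosomes = 2 * len(female_genotypes) + len(male_genotypes)
    return allele_copies / chromosomes


# Hypothetical: 4 women carry 4 copies on 8 X chromosomes, 3 men carry 2 copies on 3.
print(x_allele_frequency([0, 1, 2, 1], [1, 0, 1]))
```

How male genotypes are coded in the association model itself (e.g. 0/1 versus 0/2) is a separate modelling choice; see the PLINK documentation for how it handles male X genotypes.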
  • asked a question related to Microarray Analysis
Question
1 answer
I performed the Cell Cycle Control Phospho Antibody Array (http://www.fullmoonbio.com/product/cell-cycle-control-phospho-antibody-array/) with 7 control and 7 treatment samples. To identify the signal intensities I used GenePix Pro 7 and created .GPR files.
How do I continue with my statistics? I want to normalize the data and calculate z-scores or run SAM. I can normalize the data in Excel, but I am sure there is a more convenient way to proceed. I read about the program Prospector from Invitrogen and the protMAT website, but Prospector does not work with my .GPR files.
I am new to protein array and microarray research and would be very happy for any suggestions.
Thank you so much!
Relevant answer
Answer
Dear Denise,
the basic problem of high-throughput data (common to metabolomics, transcriptomics and protein arrays) is the huge number of variables (protein species in your case) relative to the number of statistical units (the 14 samples), which opens the way to a plethora of chance correlations. Thus the main approach is to exchange the roles of variables and statistical units: simply work on the transpose of your original data matrix, i.e. the matrix with the protein species as rows and the samples as columns (variables). On this matrix, run a Principal Component Analysis (PCA), which gives a dual representation of the same data set in terms of loadings (correlation coefficients of the variables with the components) and scores (values of each component for each statistical unit, here each protein species). If you have 14 variables (7 control + 7 treatment), you will in principle have 14 components, but because of the mutual correlation between the abundances of different protein species, you will end up with very few (2 or 3) principal components explaining by far the major part of the total variance.
Look at the component loadings: you will almost certainly get a PC1 with all loadings of the same sign (a size component), which is the signature of a common global profile shared by all the samples. Then go to PC2, PC3, PC4 and so on. You must check whether there is a component whose loading values differ significantly between control and treated samples (ideally, a component in which control and treated samples have loadings of opposite sign), allowing a perfect separation of the two groups in loading space (shape components). If this is the case, go to the scores of the discriminating component and look for the protein species with the highest (in absolute value) scores on that component: those protein species are the ones that separate the two groups, and you have solved your problem. If you run the PCA on the correlation matrix, you do not need to normalize the protein values, because normalization is implicit in the correlation metric.
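The first step of this recipe can be sketched in plain Python: protein species are rows, samples are columns (the variables), PCA is run on the sample-by-sample correlation matrix, and PC1 is found by power iteration. This sketch only extracts PC1, the size component; finding the discriminating components would need deflation or a linear-algebra routine such as `numpy.linalg.eigh`. The small data matrix is hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize_columns(data):
    """z-score each column (sample) across the rows (protein species)."""
    cols = list(zip(*data))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in data]


def correlation_matrix(data):
    """Sample-by-sample Pearson correlation matrix, computed over protein species."""
    z = standardize_columns(data)
    n_rows, n_cols = len(z), len(z[0])
    return [
        [sum(z[r][i] * z[r][j] for r in range(n_rows)) / n_rows for j in range(n_cols)]
        for i in range(n_cols)
    ]


def first_component(corr, iterations=1000):
    """Leading eigenvalue, unit eigenvector and loadings, by power iteration."""
    k = len(corr)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(k) for j in range(k))
    loadings = [x * math.sqrt(eigenvalue) for x in v]  # correlations of samples with PC1
    return eigenvalue, v, loadings


def scores(data, vector):
    """Component scores of each protein species (row)."""
    z = standardize_columns(data)
    return [sum(a * b for a, b in zip(row, vector)) for row in z]


# Hypothetical matrix: 5 protein species (rows) x 4 samples (columns).
data = [
    [1.0, 1.1, 2.0, 2.1],
    [2.0, 2.2, 4.1, 3.9],
    [3.0, 2.9, 6.0, 6.2],
    [4.0, 4.1, 8.2, 7.9],
    [5.0, 5.0, 9.9, 10.1],
]
eigval, vec, load = first_component(correlation_matrix(data))
print(round(eigval, 3), [round(x, 3) for x in load])
print([round(s, 2) for s in scores(data, vec)])
```

Because all samples share the same overall profile here, every PC1 loading has the same sign, which is the size component the answer describes.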
  • asked a question related to Microarray Analysis
Question
5 answers
I am working on a biological dataset that does not follow an ideal normal (Gaussian) distribution. Which statistical test or technique would be best for analyzing this dataset?
Relevant answer
Answer
What "difference" do you mean?
- a difference in distribution?
- a stochastic difference in magnitude?
- a more specific difference of the distributions (e.g. difference in concentration, in variation, or in some quantile?)
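The point of these questions is that the statistic should follow from the kind of difference you care about. A minimal NumPy sketch, assuming two independent groups that are exchangeable under the null hypothesis, uses one permutation test with three interchangeable statistics: the largest gap between empirical CDFs (a difference in distribution, as in the Kolmogorov-Smirnov test), P(X > Y) - 0.5 (a stochastic difference in magnitude, the quantity behind the Mann-Whitney U test), and a median difference (one specific quantile). The helper names and the lognormal demo data are hypothetical; `scipy.stats.ks_2samp` and `scipy.stats.mannwhitneyu` are the standard library versions of the first two.

```python
import numpy as np


def ks_distance(x, y):
    """Largest gap between the two empirical CDFs: any difference in distribution."""
    xs, ys = np.sort(x), np.sort(y)
    grid = np.concatenate([xs, ys])
    cdf_x = np.searchsorted(xs, grid, side="right") / len(xs)
    cdf_y = np.searchsorted(ys, grid, side="right") / len(ys)
    return float(np.max(np.abs(cdf_x - cdf_y)))


def stochastic_shift(x, y):
    """P(X > Y) + 0.5 * P(X == Y) - 0.5, i.e. Mann-Whitney U / (n * m) - 0.5; 0 means no shift."""
    diff = np.subtract.outer(x, y)
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size - 0.5)


def median_difference(x, y):
    """Difference in one specific quantile (the median)."""
    return float(np.median(x) - np.median(y))


def permutation_pvalue(x, y, stat, n_perm=2000, seed=0):
    """Two-sided permutation p-value for |stat(x, y)|, assuming exchangeable groups under H0."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    observed = abs(stat(x, y))
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if abs(stat(perm[:len(x)], perm[len(x):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical skewed (non-normal) demo data.
rng = np.random.default_rng(1)
control = rng.lognormal(mean=0.0, sigma=1.0, size=30)
treated = rng.lognormal(mean=2.0, sigma=1.0, size=30)
p_values = {
    "distribution (KS)": permutation_pvalue(control, treated, ks_distance),
    "stochastic shift": permutation_pvalue(control, treated, stochastic_shift),
    "median": permutation_pvalue(control, treated, median_difference),
}
```

Swapping the statistic while keeping the same resampling scheme makes explicit that "which test?" is really "which difference?".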
  • asked a question related to Microarray Analysis
Question
3 answers
Hi,
Let me start off by saying that working with proteomic datasets is quite new to me, and while I find it terribly interesting, I am currently having trouble finding answers to some questions about my dataset.
Very briefly, my question is whether, and how, I can use raw intensities to examine protein expression and interactions (I am using Perseus). I am working with raw intensities because I've been told that LFQ intensities cannot be used when protein identifications vary widely between samples, which is the case in my data. Nevertheless, let me first describe my dataset before moving on to my specific questions.
Dataset
* I am comparing four different methods for isolation of the same plasma constituent.
* There are three unique biological samples (3 different controls) for each isolation method (12 samples).
* Additionally, all four methods are performed as technical duplicates, meaning I have an A and a B series, both run on the same samples (22 samples).
Questions
1. First and foremost, am I even able to do statistical analysis on my dataset?
2. Should I normalize my peak intensities? From my reading, I understand that raw intensities only partly correlate with actual abundance, and that if one wants to analyse raw intensities, one needs to apply some form of peak-intensity normalization. I've been looking at a normalization method called EigenMS and at global normalization. Global normalization seems simple enough, but my concern is that, because of the large differences between isolation methods, this form of normalization cannot be used. So: should I normalize my data, and if so, what would be the best method?
3. How should I group the different methods when analysing? Currently I am grouping all three controls per isolation method (6 with technical duplicates) into the same group using the annotation rows feature.
Any help is greatly appreciated, and if there are any features of my dataset I forgot to mention, please don't hesitate to ask.
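For question 2, "global normalization" is often implemented as median centering of log2 intensities. The sketch below assumes that meaning (it is not EigenMS, and it is not a description of Perseus internals): each sample is shifted so that its median log2 intensity equals the median of all sample medians, with zeros and NaNs treated as missing. The helper name is hypothetical. Its key assumption is that most proteins do not change between samples, which is exactly what may fail across very different isolation methods.

```python
import numpy as np


def global_median_normalize(intensities):
    """Median-centering of a proteins x samples raw-intensity matrix on the log2 scale.

    Zeros and NaNs are treated as missing and stay NaN. Assumes most proteins
    are unchanged between samples (the usual premise of global normalization).
    """
    X = np.asarray(intensities, dtype=float)
    X = np.where(X > 0, X, np.nan)            # zero intensity = not quantified
    log2 = np.log2(X)
    medians = np.nanmedian(log2, axis=0)      # one median per sample (column)
    target = np.median(medians)               # common level shared by all samples
    return log2 - medians + target
```

Comparing the per-sample medians before normalization can itself be informative: large shifts between isolation methods suggest that the "most proteins are unchanged" premise does not hold for those comparisons.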
Relevant answer
Answer
Hi, sorry for the somewhat late reply.
I think your suggested workflow could work. Be careful when imputing data. Always check the histogram of log2 LFQ intensities to make sure you do not introduce too many imputed values (e.g. when imputing from a normal distribution); you would see that as a somewhat bimodal distribution. I think this will be difficult in your dataset when comparing different isolation methods, because just by comparing methods 4 and 1 you would have to impute more values than you actually measured in method 4. Again, comparing similar isolation methods is fine, I guess. :)
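A common imputation scheme draws missing log2 values from a narrow normal distribution shifted below each sample's observed values. The sketch below assumes that scheme, with a width of 0.3 and a down shift of 1.8 standard deviations (values often used as defaults in Perseus, stated here as an assumption). The helper names and demo data are hypothetical. `histogram_check` returns observed and imputed counts on shared bins, which is one way to run the histogram check described above: a large imputed mass sitting apart from the observed values shows up as the bimodal shape.

```python
import numpy as np


def impute_downshifted_normal(log2_values, width=0.3, down_shift=1.8, seed=0):
    """Replace NaNs per sample column with draws from a narrow, down-shifted normal.

    Per column: mean = observed mean - down_shift * observed SD,
                SD   = width * observed SD.
    Returns the imputed matrix (a copy) and a boolean mask of imputed cells.
    """
    rng = np.random.default_rng(seed)
    X = np.array(log2_values, dtype=float)    # copy, so the input is not modified
    mask = np.isnan(X)
    for j in range(X.shape[1]):
        observed = X[~mask[:, j], j]
        mu, sd = observed.mean(), observed.std(ddof=1)
        X[mask[:, j], j] = rng.normal(mu - down_shift * sd, width * sd, mask[:, j].sum())
    return X, mask


def histogram_check(X, mask, j, bins=20):
    """Counts of observed vs imputed values of sample j on the same bin edges."""
    edges = np.histogram_bin_edges(X[:, j], bins=bins)
    observed, _ = np.histogram(X[~mask[:, j], j], bins=edges)
    imputed, _ = np.histogram(X[mask[:, j], j], bins=edges)
    return edges, observed, imputed


# Hypothetical demo: 500 proteins x 4 samples; the lowest 20% of sample 3 is missing.
rng = np.random.default_rng(42)
demo = rng.normal(25.0, 2.0, (500, 4))
threshold = np.quantile(demo[:, 3], 0.2)
demo[demo[:, 3] < threshold, 3] = np.nan

X_imputed, imputed_mask = impute_downshifted_normal(demo)
fraction_imputed = imputed_mask.mean(axis=0)  # per-sample share of invented values
```

Reporting `fraction_imputed` per sample alongside the histograms makes the "more imputed than measured" situation between very different methods easy to spot.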
To compare the different methods, I would simply compare the numbers of quantified proteins across your replicates.
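Counting quantified proteins per replicate and summarizing per method takes only a few lines. A minimal sketch, assuming a proteins x samples intensity matrix in which NaN or 0 means "not quantified"; the method labels and numbers below are hypothetical.

```python
import numpy as np


def quantified_per_sample(intensities):
    """Number of proteins with a valid (finite, > 0) intensity in each sample column."""
    X = np.asarray(intensities, dtype=float)
    return np.sum(np.isfinite(X) & (X > 0), axis=0)


def summarize_by_method(counts, methods):
    """(mean, min, max) of quantified-protein counts for each method label."""
    labels = np.asarray(methods)
    summary = {}
    for m in dict.fromkeys(methods):          # unique labels in first-seen order
        c = counts[labels == m]
        summary[m] = (float(c.mean()), int(c.min()), int(c.max()))
    return summary


# Hypothetical demo: 3 proteins, 4 replicates from two methods.
intensities = np.array([
    [1e6, 2e6, np.nan, 5e5],
    [3e5, 0.0, 4e5, np.nan],
    [7e5, 8e5, 9e5, 1e6],
])
methods = ["M1", "M1", "M2", "M2"]
counts = quantified_per_sample(intensities)
summary = summarize_by_method(counts, methods)
```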
As for iBAQ: indeed, it was "developed" for absolute quantification, but it is also used for relative comparisons in some studies and in immunoprecipitation experiments.
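iBAQ divides a protein's summed peptide intensity by the number of theoretically observable peptides, which is what makes values comparable between proteins within a sample. A minimal sketch, assuming a simple in silico trypsin digest (cleave after K or R unless followed by P, no missed cleavages) and an observable length window of 6 to 30 residues; quantification software may use different digestion rules and windows, and the helper names and toy sequence are hypothetical.

```python
import re


def tryptic_peptides(sequence):
    """In silico trypsin digest: cleave after K or R unless the next residue is P."""
    # Zero-width split points; re.split supports empty matches in Python 3.7+.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def ibaq(total_intensity, sequence, min_len=6, max_len=30):
    """Summed peptide intensity / number of theoretically observable peptides.

    The 6-30 residue window is an assumption for this sketch.
    """
    n_observable = sum(min_len <= len(p) <= max_len for p in tryptic_peptides(sequence))
    return total_intensity / n_observable if n_observable else float("nan")


# Toy sequence: digests into "MAAAAAK" (7 aa) and "GGGGGGRPLLLLLLLK" (16 aa; R before P is not cut).
toy = "MAAAAAKGGGGGGRPLLLLLLLK"
```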
Best
Hendrik
  • asked a question related to Microarray Analysis
Question
1 answer
We ran miRNA microarrays using the Agilent Human miRNA Microarray Kit Ver. 3.0 (Cat No: AGT-G4470C). I have .gpr files for my samples, but I could not analyze their miRNA profiles in GeneSpring. How can I analyze them in GeneSpring?
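If GeneSpring will not import the .gpr files directly, it can help to inspect or convert them first. GenePix Results (.gpr) files are tab-delimited text in Axon Text File (ATF) layout: a line starting with "ATF", a line giving the number of optional header records and of data columns, the quoted "Key=Value" header records, then a column-name row and one row per feature. A minimal parser sketch under that assumption follows; `read_gpr` and the file name in the comment are hypothetical, and in R, limma's `read.maimages(..., source = "genepix")` reads the same files.

```python
import csv
import io


def read_gpr(text):
    """Parse GenePix Results (.gpr) text into (header_dict, list_of_row_dicts)."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("ATF"):
        raise ValueError("not an ATF/GPR file")
    n_header = int(lines[1].split("\t")[0])   # number of optional header records
    header = {}
    for record in lines[2:2 + n_header]:
        key, _, value = record.strip().strip('"').partition("=")
        header[key] = value
    # The remaining lines are a normal tab-delimited table with a column-name row.
    table = csv.DictReader(io.StringIO("\n".join(lines[2 + n_header:])), delimiter="\t")
    return header, list(table)


# Usage (hypothetical file name):
# with open("sample1.gpr", encoding="latin-1") as fh:
#     header, rows = read_gpr(fh.read())
```

Once parsed, the rows can be written out as a plain tab-delimited table with just the columns a generic import expects (e.g. feature name and foreground/background medians).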
  • asked a question related to Microarray Analysis
Question
3 answers
Cancer microarray dataset from GEO, using .cel files
  • asked a question related to Microarray Analysis
Question
3 answers
In my microarray data, two probe set IDs for the same gene showed significantly different results. How should I interpret the expression of this gene?
Relevant answer
Answer
That depends on the probe sets.
If they target different isoforms/splice variants, the difference may be due to variant-specific expression. If the probes were designed against different builds of the genome/transcriptome, one build may be more "reliable" than the other.
You may also consider sequencing the respective genes and confirming the regulation with a different assay (ISH, northern blot, qPCR, ...).
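Before digging into isoforms or genome builds, a quick first step is to list which genes have several significant probe sets that disagree in direction. A minimal sketch, assuming a results table of (probe set ID, gene symbol, log fold change, adjusted p-value) rows, e.g. exported from a differential-expression run; the helper name, threshold, probe IDs and gene symbols below are hypothetical.

```python
from collections import defaultdict


def flag_discordant_genes(results, alpha=0.05):
    """Return genes whose significant probe sets disagree in direction.

    results: iterable of (probe_set_id, gene_symbol, log_fc, adj_p) tuples.
    Maps each flagged gene to its significant (probe_set_id, log_fc) pairs.
    """
    by_gene = defaultdict(list)
    for probe, gene, log_fc, adj_p in results:
        if adj_p < alpha:
            by_gene[gene].append((probe, log_fc))
    discordant = {}
    for gene, probes in by_gene.items():
        directions = {log_fc > 0 for _, log_fc in probes}   # True = up, False = down
        if len(probes) > 1 and len(directions) > 1:
            discordant[gene] = probes
    return discordant


# Hypothetical results table.
results = [
    ("1001_at", "GENEA", 1.2, 0.01),
    ("1002_s_at", "GENEA", -0.9, 0.02),
    ("2001_at", "GENEB", 0.8, 0.001),
    ("2002_at", "GENEB", 0.7, 0.03),
    ("3001_at", "GENEC", 1.5, 0.20),
]
flagged = flag_discordant_genes(results)
```

Flagged genes are the ones where checking what each probe set targets (isoform, splice variant, genome build) matters most; concordant probe sets usually support a single interpretation.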
  • asked a question related to Microarray Analysis
Question
6 answers
I observed this in one of my microarray experiments: the first two genes were upregulated and the last gene was downregulated, although all three genes belong to the same operon. Kindly suggest possible explanations.
Relevant answer
Answer
If you are sure that these genes are in the operon and can confirm these results using qRT-PCR, then the difference can be due to transcriptional attenuation, e.g. premature termination of transcription. Transcription termination is one of the mechanisms that regulate gene expression.