Microarray Analysis - Science topic
Questions related to Microarray Analysis
I am performing an Affymetrix microarray analysis to identify differentially expressed genes. After my analysis I have a list of differentially expressed genes; however, several probe sets map to a single gene. For example, probe sets 209201_x_at, 211919_s_at, and 217028_at all map to CXCR4, with 3 different expression values.
What is an appropriate method for selecting a specific probe set if I want to identify differentially expressed genes? Does averaging the expression values of the probe sets for a single gene work?
Many thanks!
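The two options raised in this question can be compared side by side. The sketch below uses base R on a made-up expression matrix; the values, the sample names, `probe_other` and `GENE_B` are illustrative assumptions, and only the three CXCR4 probe set IDs come from the question. It shows (A) keeping the probe set with the highest mean signal per gene and (B) averaging all probe sets of a gene. Neither is presented as the one correct choice.

```r
# Toy expression matrix: rows are probe sets, columns are samples (made-up values).
expr <- matrix(
  c(8.1, 8.3, 9.0, 9.2,
    6.0, 6.1, 6.2, 6.0,
    4.2, 4.0, 4.1, 4.3,
    7.5, 7.7, 7.4, 7.6),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("209201_x_at", "211919_s_at", "217028_at", "probe_other"),
                  c("ctrl1", "ctrl2", "case1", "case2"))
)
# Hypothetical probe-to-gene annotation, in the same order as the rows of expr.
gene <- c("CXCR4", "CXCR4", "CXCR4", "GENE_B")

# Strategy A: per gene, keep the probe set with the highest mean signal.
probe_mean <- rowMeans(expr)
keep <- sapply(split(seq_along(gene), gene),
               function(i) i[which.max(probe_mean[i])])
expr_max <- expr[keep, , drop = FALSE]
rownames(expr_max) <- names(keep)

# Strategy B: per gene, average all probe sets.
sums <- rowsum(expr, group = gene)
n_probes <- table(gene)[rownames(sums)]
expr_avg <- sums / as.vector(n_probes)
```

Strategy A keeps a value that was actually measured, while strategy B can blur a real signal when one probe set performs poorly, which is the concern raised in the question.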
Good day, dear colleagues!
Can I use LogFC values for co-expression analysis?
We study the role of RPOTmp, the dual-targeted (mitochondria and plastids) single-subunit RNA polymerase in plants. For this reason our lab made various transgenic plants with altered RPOTmp expression and conducted a two-channel DNA microarray experiment.
What I'm trying to find out is whether there are any genes that are co-expressed with RPOTmp, or clusters of genes that are co-expressed in response to retrograde and anterograde signals caused by altered RPOTmp expression.
So there is probably no sense in performing the enrichment analysis using a table of expression values of the lines and wild type (although nearly every package states that).
" Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length "
This error appeared when I was trying to download the GSE data with getGEO in R; it seems there is some problem.
> gset <- getGEO("GSE77182", GSEMatrix =TRUE, AnnotGPL=FALSE, destdir ="data/")
Found 1 file(s)
GSE77182_series_matrix.txt.gz
Using locally cached version: data//GSE77182_series_matrix.txt.gz
Rows: 59899 Columns: 6
-- Column specification ---------------------------------------------
Delimiter: "\t"
chr (1): ID_REF
dbl (5): GSM2045612, GSM2045615, GSM2045616, GSM2045618, GSM2045620
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Using locally cached version of GPL21369 found here:
data//GPL21369.soft
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
I have a dataset with 149 columns of GSM IDs, plus a first column with the gene name (screenshot attached). There are 20,000 rows (genes) in total. How can I analyze the dataset to find the biological pathways using KEGG or another pathway database?
All GSM IDs are lung cancer microarray expression data.
I know how to do differential gene expression and pathway analysis, but I don't know how to analyze this type of dataset.
Please also comment if you feel that the dataset is not correct, cannot be used to find pathways, or that other information is required.
Any help would be great. Thanks in advance

For PCR analysis and western blot, is there a minimum average expression level required for DEGs? For example, more than 200 or 500? I have received various answers to that question and I am a little confused.
I have microarray data that I am analysing in R. It is cancer data with several time points.
I have 5 samples for t0
5 samples for t1
4 samples for t2
3 samples for t3
All are disease models. I want to do a microarray analysis to find the differentially expressed genes for each time point compared to t0, i.e. I want to take t0 as the reference for every time point.
So how can I make a contrast matrix for that?
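One way to set this up is a design matrix with t0 as the reference level, so that each remaining coefficient is already a "time point minus t0" comparison. The sketch below uses base R only and takes the sample counts from the question. The order of the samples is an assumption and must match the column order of the real expression matrix.

```r
# Sample annotation: 5 x t0, 5 x t1, 4 x t2, 3 x t3 (order assumed to match the data columns).
time <- factor(rep(c("t0", "t1", "t2", "t3"), times = c(5, 5, 4, 3)),
               levels = c("t0", "t1", "t2", "t3"))  # first level is the reference

# With the default treatment coding, the intercept is the t0 mean and
# the coefficients timet1, timet2, timet3 are t1 - t0, t2 - t0, t3 - t0.
design <- model.matrix(~ time)
colnames(design)

# Assumption, not run here: with the limma package the design could be used as
#   fit <- eBayes(lmFit(expr, design)); topTable(fit, coef = "timet1")
# where expr is the normalized expression matrix.
```

With this parameterization no separate contrast matrix is needed for the comparisons against t0. A contrast matrix only becomes necessary for comparisons such as t2 versus t1.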
I have a question about "network score" of IPA Network analysis. In many papers, the top 5 networks were listed in tables, while in these tables some network scores are high (around 50), but others are low (less than 20). We use the same method for network analyses, and got the impression that we can see tight association between genes when "the network score" is higher than 40. However, we have not found literature discussing the meaningful "network score" (we found one paper described that “the networks are selected if their score is higher than 21”). We would appreciate it if you could let us know information about such a meaningful network score or your impression/experience of the network score (for example, did you see tight association of genes when the network score was less than 20?).
I tried to find all the data together in GEO but couldn't. So what if I get the data for the breast cancer cell lines MCF7, MDA-MB-231 and SK-BR-3, then take the data for the gene I want to check, HK2, and do the microarray analysis in RStudio to check the differential expression of HK2 among the cell lines?
Dear fellow Researchers,
I am currently trying to analyze Affymetrix microarray data with the dChip software. I have the input files (probe sequence and CDF for Rat 230 2), yet I am having trouble obtaining the expected results. Could anyone please tell me whether the gene info file is really necessary (only the CDF input is listed as mandatory in the protocol I have) and where to obtain it?
Thank you in advance
I would like to perform a differentially expressed gene analysis of NimbleGen data. I have a dataset of 48 .pair files and 48 .calls files.
1) Can I perform the DE gene analysis with only these data, without using the oligo package? (My data are single channel, 532 output only.)
2) What is the appropriate method for finding differentially expressed genes?
3) When I converted my pair files to xys files by extracting the X-Y and signal values, the results were not accurate. The genes that were shown to be DE are not correlated with my experimental conditions.
Please help me
Thank you in advance
Best Regards
Tunc
I also want to add: our pair and calls files don't have a header, which is why we don't know anything about the NDF file. We do know that our chip is 100718.hg18, but we don't know the correct GPL file. The lab methods report that the NimbleGen Human Expression Array 12x135K chip was used.
Does anyone know of a source for microarrays to study tRNA expression? I was told microarrays.com provided these. I have asked them directly but no reply so far.
On a given microarray design there are multiple different probes spotted for many genes. The (normalized) signals of the features (all referring to the same gene) often are quite different (log2 values can vary between 2 and 16, so essentially from "almost undetectable" to "completely saturated").
If a gene set analysis or an over-representation analysis is performed, there should be one value per gene.
How do I select which signal to use for the gene? I am not comfortable taking the average of all the features, because they are often so different. Taking only the highest signal also seems wrong.
Any ideas?
The attached file shows a table with example data (from an Agilent microarray) with 5 different probes addressing the gene "PRDM". The last 4 columns show the log signal intensities for 4 different samples. The values range from 3 to 10, so there is a more than 100-fold difference in signal intensity between the probes.
Hello,
I'm trying to analyse two different microarray datasets from different chips using a web-based tool.
I don't know how to do that. Should I use only one of them,
or should I combine them using some kind of algorithm?
Thank you
Does anybody use a specific software (free) for the densitometric analysis of protein array data, or do you know how to add this tool in ImageJ?
Thanks
I have analyzed dataset GSE38132 from Gene Expression Omnibus. The data are from the breast cancer cell line ZR-75-1 and comprise 9 conditions with 4 replicates each, making 36 samples. I used the limma R package to normalize the data (quantile normalization). I noticed a large discrepancy in one group, where three replicates show similar expression but the 4th replicate of the same sample differs from the other three. I confirmed this in the raw data by cross-checking with the probe ID and again found that 1 replicate differs from the other three. As a second check I looked at the normalized data deposited in NCBI GEO and found the same. Is this possible?
Please see the heat map; the first four columns are replicates from the same sample.

I have two batches of samples which were collected during two time periods. If I perform batch correction to remove the batch effect, will it affect the downstream gene expression analysis?
Generally, in microarray differential expression studies the lower bound for |logFC| is chosen around 1, which corresponds to a fold change of 2 and sounds like common sense. In other cases, when |logFC| >= 1 gives zero differentially expressed genes, the logFC cutoff is chosen so as to get a "reasonable" number of differentially expressed genes. It stands to reason that a more rational way of choosing the logFC cutoff would be to infer it from the microarray platform's accuracy, the quality of the hybridizations in the particular experiment, or some other evidence-based criterion.
How to decide which logfc to choose?
Hello everyone,
Currently I am trying to do K-means clustering on a microarray dataset which consists of 127 columns and 1000 rows. When I plot the graph, it gives an error like "figure margins too large". Then I wrote this in the R console:
par("mar") # It gives the current margins
par(mar = c(1, 1, 1, 1)) # Tried to update the margins
But it did not work. So, can anyone suggest another way of fixing this problem? (Part of the code is attached below.)
Thanks,
Hasan
--------------------------------------------------------------------------------------------------------------
x = as.data.frame(x)
km_out = kmeans(x, 2, nstart = 20)
km_out$cluster
plot(x, col = (km_out$cluster + 1), main = "K - Means Clustering Results with K=2", xlab = "", ylab = "", pch = 20, cex = 2)
>Error in plot.new() : figure margins too large
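A likely cause, offered as a hedged reading of the error: calling plot() on a data frame with more than two columns draws a scatterplot matrix, and with 127 columns each panel is too small for its margins, so changing par("mar") does not help. A minimal sketch of one alternative is to cluster on all columns but plot only the first two principal components. The data here are simulated as a stand-in for the real 1000 x 127 dataset.

```r
set.seed(1)
# Simulated stand-in for the 1000-row, 127-column dataset (assumption).
x <- as.data.frame(matrix(rnorm(1000 * 127), nrow = 1000, ncol = 127))

# Cluster on all 127 columns, as in the original code.
km_out <- kmeans(x, centers = 2, nstart = 20)

# Project onto the first two principal components so that plot()
# receives two columns and draws a single scatter plot.
pcs <- prcomp(x)$x[, 1:2]
plot(pcs, col = km_out$cluster + 1, pch = 20,
     main = "K-means clustering results with K = 2",
     xlab = "PC1", ylab = "PC2")
```

Plotting two chosen columns, for example x[, 1:2], avoids the error in the same way, but the principal components usually separate the clusters more visibly.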
When examining the differential expression of non-coding RNAs in blood samples, cell cultures or animal models, how many times should we repeat the microarray experiment?
Does the number of repetitions change according to the sample type? For example, is it established that experiments on a cell culture model should be repeated at least 3 times to identify non-coding RNA expression profiles?
Hi all,
I am running a Random Forest –Mean Decrease in Accuracy algorithm for feature selection on my Microarray data in order to use the selected genes as a classifier to discriminate between 2 classes of cell lines. I am having problems to interpret the output information given by the algorithm. It gives me a small list of selected genes and for each gene there is a Pearson correlation value, a fold change value and a q-value (False Discovery Rate) .
The variable “class” is discrete (normal vs disease), so what does the Pearson correlation mean in this case?
Should I take the q-value showed as a multiple test correction and give less importance, or exclude, the genes that showed a q-value (FDR) higher than 0.2 (or any other pre-determined cut-off for significance)?
I would appreciate any suggestion on how to interpret the results of a RF-MDA for feature selection algorithm.
I'm looking to take an Arabidopsis RNA-Seq differentially expressed gene set and search it against other publicly available RNA-Seq (and possibly microarray) experiments to find the experiments that found the most similar patterns of differentially expressed genes.
Does anyone know if a tool that enables this has already been created?
Hello,
I am trying to normalize the GSE8397 data with MAS5.0 using R:
setwd("D:/justforR/GSE8397")
source("http://bioconductor.org/biocLite.R")
biocLite()
library(affy)
affy.data = ReadAffy()
However, the data come from 2 platforms: Affymetrix Human Genome U133A and Affymetrix Human Genome U133B arrays.
The code gave me this error message: "Error in affyio::read_abatch(filenames, rm.mask, rm.outliers, rm.extra, :
Cel file D:/justforR/GSE8397/GSM208669.cel does not seem to be of HG-U133A type"
So, how can I normalize the data when they come from both U133A and U133B? Should I try another normalization method (RMA or GCRMA)?
Do you have any ideas about this problem?
Thank you so much!
As an example, I have downloaded a .cel file. How can I get the data on upregulation and downregulation from it, and what is the principle behind these values?
I am trying to perform miRNA microarray analysis for a human cell line. Though the samples have good A260/A280 ratio (2 and above) and good RIN (most above 8) the A260/A230 ratio is low in many of the samples (as low as 0.3 in some samples).
So the question is, is it wise to proceed with these samples? can the A260/A230 ratio affect the quality of the microarray (I will use Agilent microarray chip)?
PS: The analysis will be done with a company and not in-house. Qiagen miRNeasy kit was used to extract the RNA, a DNASE treatment step was included.
I am planning to extract whole serum protein from serum samples for further microarray analysis. What are your recommendations, please? A good protocol would be highly appreciated.
Thanks in advance
Hi there
I would like to get your feedback on your favourite gene-enrichment analysis software with a graphical environment, preferably online.
Please give me your feedback about what you like most !... Thanks
All the best
Paco
Can anyone suggest a collection of numeric variables for deep learning? I have a set of features (DEGs with their fold change values) from a microarray. I want to prepare a training set as well as a test set for deep learning. For this, at least three numeric variables are needed. Any suggestions?
Thanks in advance
Happy Sunday everyone,
I am trying to calculate the row-wise mean and variance in R, and then I will sort by them. I used the "Absent/Present" calls from the Affymetrix algorithm to flag genes with questionable expression levels, so there are many NAs in the data frame. I have to remove the genes with questionable expression levels (NAs) and then do the mean and variance calculations. What I did is this:
library(data.table)
dat <- as.data.table(df)
rowvar <- function(x, na.rm =F) rowSums((x - rowMeans(x, na.rm=T))^2, na.rm=T)/(rowSums(!is.na(x)) - 1)
dat[,`:=` (variance = rowvar(.SD, na.rm = T), mean = rowMeans(.SD, na.rm = T))]
But it gives an error: "Error in rowMeans(x, na.rm = T) : 'x' must be numeric". How can I handle this error?
I have attached the document that I am currently working on.
Thank you for your interest.
Hasan,
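The "'x' must be numeric" message most likely means that a non-numeric column (for example a probe ID or a call column) is being passed to rowMeans(). A minimal base R sketch, on a made-up data frame whose column names are assumptions, restricts the calculation to the numeric columns and ignores NAs:

```r
# Toy data frame: an ID column plus numeric expression columns containing NAs.
df <- data.frame(
  probe = c("p1", "p2", "p3"),
  s1 = c(1, 2, NA),
  s2 = c(3, NA, NA),
  s3 = c(5, 6, 7),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns for the calculation.
num <- as.matrix(df[, sapply(df, is.numeric), drop = FALSE])

df$mean <- rowMeans(num, na.rm = TRUE)
df$variance <- apply(num, 1, var, na.rm = TRUE)  # NA when fewer than 2 values remain

# Sort by variance, highest first; rows with NA variance go last.
df_sorted <- df[order(df$variance, decreasing = TRUE), ]
```

A row with only one non-missing value has no defined variance, so it ends up with NA and is placed at the end of the sorted table.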
Hello everyone,
I have an Excel spreadsheet which contains 17 columns and 54,675 rows, and I need to calculate the mean and variance of each of the 54,675 rows in RStudio. After that, I have to add each row's mean and variance as new columns in the Excel spreadsheet. How can I do this? I suppose the apply() function works, but somehow I could not get it to. Any suggestions?
Thanks,
Hasan
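A rough sketch of the round trip with apply(), assuming the spreadsheet has first been saved from Excel as CSV and that the first column holds the probe IDs (both are assumptions). A small temporary file stands in for the real spreadsheet.

```r
# Stand-in for the spreadsheet exported from Excel as CSV (hypothetical content).
csv_in <- tempfile(fileext = ".csv")
write.csv(data.frame(probe = c("a", "b"), s1 = c(1, 4), s2 = c(3, 8)),
          csv_in, row.names = FALSE)

dat <- read.csv(csv_in)
vals <- dat[, -1]                    # assumes the first column holds the IDs

# apply(..., 1, f) calls f once per row.
dat$mean <- apply(vals, 1, mean)
dat$variance <- apply(vals, 1, var)

# Write the table back out; the CSV opens directly in Excel.
csv_out <- tempfile(fileext = ".csv")
write.csv(dat, csv_out, row.names = FALSE)
```

Reading the .xlsx file directly is also possible with an add-on package, but the CSV route needs nothing beyond base R.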
I want to plot the expression trend of my genes of interest, and to do so I need to know whether the samples in a GEO series matrix (I mean the GSMxxxxxx samples) are comparable with each other, i.e. whether normalization or similar processing has been applied to them. Also, what is the meaning of the caution in GEO that I attached to my question?
thanks

Hello everyone,
I have to analyse data from an Affymetrix microarray (Human Genome U133 Plus 2.0 Array) with Bioconductor, and it is the first time I am using Bioconductor. I got the .cel files from NCBI GEO but I could not get the chip description file. How can I obtain a CDF?
One more thing: when I check the number of genes in my dataset, R shows that it contains 54,675 genes. However, this number should be between 20,000 and 25,000. So I am wondering whether some of them are replicates.
Does anyone have suggestions or could help, please?
Thanks,
Hasan
I am trying to design a probe for microarray analysis. For the control genes I prefer constitutively expressed housekeeping genes. It would be helpful if anyone could guide me in designing a probe and primers for a particular gene, for example 23S rRNA. Manuscripts state the sequence and the corresponding primer pairs but give no description of the methodology used to design them. Could anyone help me with this?
Hello.
I have been working on transcriptomics data from NCBI's GEO for a while now. However, I have recently been made aware of this phenomenon [see image attached]:
- When plotting the probes (y-axis) against the subjects (x-axis), the heatmap shows a very large area (around 10,000 probes) with an intensity lower than its surroundings, both for control and for diseased individuals;
- This effect is also visible (more clearly) in the other image, which shows that for around 10,000 probes (x-axis) the intensity (y-axis) is lower than the average intensity.
My question is: do you have any idea what this could be due to? Is this "area" of lower intensity a common thing in microarray analysis? Should I exclude this area from the analysis?
Thank you in advance

I am working on microarray data for some genes I am interested in. I want to know the expression levels of these genes in different cell types. The data were deposited in duplicate or triplicate for each cell type, and all the microarray experiments were done on the same kind of chip. Now I have to get a mean value for every cell type and compare it with the other cell types. So far I have normalized the data using the gcrma package. I think I now have to normalize the values against a housekeeping gene and proceed further, but I am not sure how. Please guide me through this.
I also have to find the co-expression values for all gene pairs. I calculated the Pearson correlation coefficient from the normalized values across all the data sets. Is that okay, or should I calculate the PCC only after normalizing the values against a housekeeping gene? Please help me. Thank you
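Both calculations mentioned here, the per-cell-type mean and the pairwise Pearson correlation, can be sketched in base R. The matrix below is simulated and its gene names, cell-type labels and replicate counts are assumptions. The sketch does not settle whether housekeeping-gene normalization is needed on top of gcrma.

```r
set.seed(42)
# Simulated normalized log2 expression: 5 genes x 12 samples (assumption).
expr <- matrix(rnorm(5 * 12, mean = 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:12)))
# gene2 is built as a linear function of gene1, so their correlation is 1.
expr["gene2", ] <- expr["gene1", ] * 2 + 3

# Hypothetical cell-type label for each sample (triplicates).
cell_type <- rep(c("typeA", "typeB", "typeC", "typeD"), each = 3)

# Mean expression of every gene within each cell type.
type_means <- sapply(split(seq_along(cell_type), cell_type),
                     function(i) rowMeans(expr[, i, drop = FALSE]))

# Gene-by-gene Pearson correlation across all samples.
pcc <- cor(t(expr), method = "pearson")
```

cor() works on columns, which is why the matrix is transposed so that genes become the variables being correlated.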
I need to compare a gene's expression between tumor tissue and matched normal tissue from the TCGA database. I've tried using Firehose to look up the differential expression of the gene across different types of cancer. The problem is that the number of tumor samples is not equal to the number of normal samples, but I need to compare matched tumor and normal tissues. Are there any tools to do that?
I want to convert expression values to z-scores, z = (x - mean)/sd. I have two types of samples in my microarray data (Affymetrix GeneChip Human Genome U133 Plus 2.0): 31 normal vs 30 case. Do I have to calculate the mean and sd for the normal samples only and apply the z-score formula, then do the same for the case samples, or do I have to use the mean and sd of all samples?
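The two readings of the question can be written out explicitly. This sketch uses simulated values with the 31 normal and 30 case samples from the question. Option A standardizes each gene across all samples. Option B, which is one possible choice rather than a required one, uses only the normal samples as the reference for the mean and sd.

```r
set.seed(7)
# Simulated log2 expression for 3 genes x 61 samples (assumption).
expr <- matrix(rnorm(3 * 61, mean = 8), nrow = 3,
               dimnames = list(paste0("gene", 1:3), NULL))
group <- rep(c("normal", "case"), times = c(31, 30))

# Option A: z = (x - mean) / sd per gene, using all 61 samples.
z_all <- t(scale(t(expr)))

# Option B: mean and sd taken from the normal samples only,
# then applied to every sample (normal and case).
mu <- rowMeans(expr[, group == "normal"])
sdev <- apply(expr[, group == "normal"], 1, sd)
z_ref <- (expr - mu) / sdev
```

Under option B a z-score reads as "how many normal-sample standard deviations away from the normal mean", which can be convenient when the cases are to be interpreted relative to the controls.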
When validating microarray or RNA-seq expression data, do we need to select the genes randomly, or can we choose the ones that matter to us?
Hi experts,
RNA-seq with NGS technology is changing gene expression studies and has great advantages, yet we still see a lot of studies using microarray techniques (e.g. Affymetrix Gene Atlas) and even qPCR (to a certain extent).
I personally believe in, and am biased towards, NGS technology and RNA sequencing for gene expression studies. Not only that, RNA-seq can discover novel gene transcripts, potentially opening a new field of study.
RNA-seq can be costly, but I personally believe that in the end it is better than microarray. So in what instances can I say that microarray is better than RNA-seq? I am working with primary cells, cell lines, and mouse as my animal model for brain-related studies.
I am looking forward to hearing your opinion.
Hi!
Does anybody know a programme/software/website to perform HeatMaps without using the R language??
I have a set of 3099 genes up regulated and a set of 2686 genes down regulated under my unique experimental condition and I would like to compare them.
Thanks a lot!
So, basically there are abundance values for my first 3 sets of experiment that have variable control values for all peptides within each protein. How do I make use of this data to get significant peptides or calculate the fold change statistically? I was wondering if there is a way to do this without control abundance data. Also, should I use normalisation techniques, and which one?
Hello everyone. I have normalized reads of RNA-seq data and I am trying to generate a venn diagram of upregulated and downregulated genes. I have three replicates each of control and test samples. I tried to search online but couldn't decide which tools would be better to use. Can anyone please suggest me any windows based offline/online tools to generate venn diagrams from RNA-seq data? Thank you very much.
Raghu.
I have used the MultiNA to quantify RNA for the first time.
Could someone help me interpret the output results? Does it have an equivalent number to RIN?
Can I trust the "Total conc" readout?
Many thanks
Our GeneSpring user license has expired so I am investigating whether there is an appropriate online open source application I can use to analyse microarray data.
Hi,
I have 33 ligands in total, which were analyzed with SAM and reported in an article entitled "Analysis of the major patterns of B cell gene expression changes in response to short-term stimulation with 33 single ligands". I selected 10 ligands from these data and want to do additional analysis, but the authors didn't provide the raw data/CEL files, so I downloaded the processed data from ArrayExpress. I reviewed the limma tutorial and want to make sure the downloaded data files are suitable for limma. I need a starting point for the analysis; I have attached one of the processed data files as an example. Can I use processed data files as input for limma, and which type of analysis should be performed? I look forward to your answers.
Thank you,
I analysed a PPI network after integrating gene expression data from an Alzheimer's disease experiment into it, and this revealed some sub-networks. First, I used the limma package for differential expression analysis. Second, I mapped the DEGs onto the PPI network and assigned each gene's fold change to the corresponding protein. Third, I searched the network starting from my candidate gene, which revealed sub-networks. I scored them with my own formula and then merged the top-scoring sub-networks.
Now I want to validate my result (the merged sub-network) and I have no idea how to do it.
Could anyone help me or suggest a method to validate my outcome, please? It would be highly appreciated.
Dear colleagues, I have Affymetrix microarray data, from endothelial cells, co-cultured with mononuclear cells in conditions of normoxia, hyperoxia and hypoxia. Control cultures of endothelial cells are also cultured (alone without mononuclear cells) in these same conditions. The affymetrix microarray data have been processed with the Expression Console(Gene level >> extended:RMA-Sketch) and filtered. I wish to use excel to elucidate differentially expressed genes.
Please, what steps do I need to take to proceed in the elucidation of these differentially expressed gene using excel? I am new to high-throughput data analysis.
Thank you in advance for your response.
Hi all,
To analyse RNA-Seq data sets I need to know which pipeline is the most accurate and reliable. Could you suggest one?
Note// I have good experience with the Tuxedo package (Bowtie, Top Hat, and CummeRbund) in addition to EdgeR,
Thanks
I have access to 3 experiments from GEO. The sample type for one experiment is blood and other two types is skin. I have the RPKM values of control and patient samples from these tissues. The platform for all these experiments is same.
How to proceed with meta-analysis for these experiments? There are very few papers regarding the protocol. It will be a great help if I can get any pipeline.
I am doing differential expression studies using iTRAQ. I have problems with identifying the fold change / fold enrichment on the downregulated iTRAQ ratios. For example, iTRAQ ratio for 117:114 shows 3.256, which means that it shows upregulation of 3 fold change, but how about downregulated ratios since it shows value less than 1, for example 0.2679. Is it possible for us to calculate how many fold change from the iTRAQ ratio with PVal (ratio) given? I am using ProteinPilot Software.
Thanks in advance!
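For ratios below 1, two common conventions are to report the log2 ratio, which is symmetric around 0, or a signed fold change in which a ratio r < 1 becomes -1/r. A short sketch using the two ratios quoted in the question (the vector names are illustrative):

```r
# iTRAQ ratios from the question (e.g. 117:114).
ratio <- c(up = 3.256, down = 0.2679)

# Log2 ratio: positive means up-regulated, negative means down-regulated.
log2_ratio <- log2(ratio)

# Signed fold change: ratios below 1 are inverted and given a minus sign.
signed_fc <- ifelse(ratio >= 1, ratio, -1 / ratio)
```

Under this convention a ratio of 0.2679 is roughly a 3.7-fold down-regulation, since 1/0.2679 is about 3.73.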
When finding differentially expressed genes from microarray data, which parameters do we have to take into account for a more reliable result? What ranges (maximum and minimum values) can be set for the FDR, the fold change etc., given log2-normalized data and the p-value?
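As an illustration only, the sketch below applies one commonly used pair of cutoffs (Benjamini-Hochberg FDR below 0.05 and |log2 fold change| of at least 1) to a made-up result table. The thresholds, gene names and values are assumptions, not recommendations derived from any particular dataset.

```r
# Made-up differential expression results.
res <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5"),
  log2fc = c(2.1, -1.5, 0.3, 1.2, -0.2),
  p = c(0.001, 0.004, 0.30, 0.04, 0.80),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment of the raw p-values.
res$fdr <- p.adjust(res$p, method = "BH")

# Keep genes passing both cutoffs (illustrative thresholds).
deg <- res[res$fdr < 0.05 & abs(res$log2fc) >= 1, ]
```

In this toy table g4 has a raw p-value below 0.05 and a log2 fold change above 1, yet it drops out once the p-values are adjusted for multiple testing.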
Hello.
I tried to perform a meta-analysis of differential gene expression data from GEO.
The A-MADMAN program looks nice. However, it fails partway through the process.
The error occurs when I click Analyze after grouping on the Basket tab.
Can anyone help with this program or recommend another program or R package?
Thanks in advance

Dear All,
I am trying to see which CpG sites (with its associated genes) are involved in particular pathways and diseases, and get an overview of the functions of these genes.
Currently, I have tried to import my dataset (>800k CpG sites total) which shows the following: 1) each CpG site as the ID, 2) p-value, 3) q-value, 4) fold change and 5) difference. My data sets are quite large with >200,000 CpG sites (the row limit of IPA) - is there a way to import a file this large?
I have also tried importing a file with more specific CpG sites of around 1000 CpG sites but it is not being mapped properly by IPA as I have 0 mapped sites due to errors or possibly I am using the wrong template (i.e. not expression data)?
I think the errors are coming from my formatting in my excel file to IPA, where either the headings are incorrect and the way I am assigning each header/observation is incorrect i.e. I think I set my Identifier as Illumina (which is what I used to get my CpG methylation data), but I do not know what other options I can choose instead of this. IPA also showed errors first with 'no IDs matched to particular genes',and then with 'removing fold change between 1 and -1'.
In summary, I would really appreciate any tips/guidance with uploading CpG methylation data into IPA.
Thank you very much.
I am trying to find out expression profile of my candidate genes from RNAseq or CAGE data from cancers using publicly available RNA seq data
I prefer any online search tools at this stage for a quick analysis.
I have some genes with their FPKM values, and now I want to convert these values into log2 fold changes.
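A minimal sketch of the arithmetic, assuming there is one FPKM value (or one group mean) per gene for each of two conditions. The condition names, the values and the pseudocount of 1 are all assumptions. The pseudocount is only there to avoid dividing by zero or taking log2 of zero.

```r
# Made-up FPKM values for two conditions.
fpkm <- data.frame(
  gene = c("A", "B", "C"),
  control = c(10, 0, 50),
  treated = c(40, 5, 25),
  stringsAsFactors = FALSE
)

pseudo <- 1  # pseudocount (assumption); keeps zero FPKM values usable

# log2 fold change of treated over control.
fpkm$log2fc <- log2((fpkm$treated + pseudo) / (fpkm$control + pseudo))
```

This gives a descriptive fold change only; it carries no information about statistical significance.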
Hi Everyone,
I'm using microarray data to identify DEGs and map their PPI network, but now I want to use multiple datasets reported by different studies on acute myeloid leukemia (AML). Please describe a good methodology step by step, and in particular how I can merge the different datasets. I also need some information about the requirements for merging.
Thanking you in Advance.
I have heard that a scanned microarray slide should look like the image below: a black background with fluorescent dots. But what I saw is an all-white slide with black frosted ends (constantly black, see below). I have no photos on my PC, but I saw a white slide with a black end. I don't know why the scanned image is only black and white. Please give your opinion.

Hello guys!
We have several transcriptome data sets from samples that were treated at low temperature for different lengths of time, let's say at 4 ℃ for 1, 3, 5 and 7 hours. After analyzing the data we obtained differentially expressed genes (DEGs) for each treatment time point. For example, when the sample was treated at 4 ℃ for 1 hour, we got 2000 up-regulated and 3000 down-regulated genes; for 3 hours, 1500 up-regulated and 2500 down-regulated genes; 5 hours …; 7 hours ….
My question is how I can analyze these DEGs further to find the subset of genes that are really crucial at low temperature in this sample.
And are there tools for this work?
By the way, my data sets come from RNA-seq, and the fold-change value of each unigene at different treatment time point is calculated with DEseq.
Thanks in advance.
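One simple starting point, sketched here with made-up gene IDs, is to look for genes that are differentially expressed at every time point and to count how often each gene recurs. The list names and contents are assumptions.

```r
# Hypothetical up-regulated DEG lists, one per treatment time.
deg_up <- list(
  h1 = c("g1", "g2", "g3", "g4"),
  h3 = c("g2", "g3", "g5"),
  h5 = c("g2", "g3", "g6"),
  h7 = c("g2", "g7", "g3")
)

# Genes up-regulated at all four time points.
core_up <- Reduce(intersect, deg_up)

# Number of time points at which each gene is up-regulated.
recurrence <- table(unlist(deg_up))
```

Genes shared by all time points are candidates for a sustained cold response, while genes seen only at 1 hour or only at 7 hours point to early or late responses. The same steps can be repeated for the down-regulated lists.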
Well... I ruined my microarray, so I want to ask you something.
Is it OK to store my buffers at RT?
Pre-hybridization buffer - 5X SSC/0.1% SDS/1% BSA
Hybridization buffer - 5X SSC/0.1% SDS/50% Formamide
Low stringency wash buffer - 1X SSC/0.2% SDS
High stringency wash buffer - 0.1X SSC/0.2% SDS
0.1X SSC
50% DMSO
These are my buffers. I store them at 4 degrees Celsius now, because BSA is stored at 4 degrees, but some solutes do not dissolve at 4 degrees.
Currently, differential gene expression is usually identified using RPKM, TPM or TMM; however, the sequencing depth is chosen by the experimenter and all of these quantifications are relative. To compare between samples, some methods use distribution-based normalization, like DESeq2 and edgeR. The problem is that these methods are not fully correct either. If we sequence a low-expression sample at high depth and a high-expression sample at shallow depth, none of these methods seems able to detect the true difference. One idea is that if there is a group of universal genes with unchanged expression levels, these genes could be taken as the baseline for normalization and comparison between samples.
I have noticed one paper that uses this idea to normalize gene expression in plant tissue; they established a database of stably expressed genes. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5178351/.
But for prokaryotic microorganisms, there does not seem to be any stably expressed gene set yet.
Any comments will be appreciated.
Hi all,
I want to analyse an RNA-seq data set from a paper so I've got the data from GEO. In the data, they have a column called "unique hits" for each ID that I think mostly relevant to the next step. However, I just don't know if they are the values that I can use to analyse the gene expression level.
I've used log_RMA and RMA in the microarray before to analyse gene expression but I don't know if these are the same.
Thank you!
Hello friends
I am doing microarray data analysis (HG-U133 Plus 2). I got the expression matrix using gcrma, but some probes represent multiple genes, as shown below. How should we treat these? Also, some probes are not matched and show NA; can I delete them? Next I will take this file for WGCNA analysis. Please share your knowledge.
221251_x_at    1    221251_x_at    INO80B /// INO80B-WBP1             NA
65133_i_at     1    65133_i_at     INO80B /// INO80B-WBP1             NA
223072_s_at    1    223072_s_at    INO80B /// INO80B-WBP1 /// WBP1    NA
1559716_at     1    1559716_at     INO80C                             INO80C
229582_at      1    229582_at      INO80C                             INO80C
220165_at      1    220165_at      INO80D                             INO80D
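One conservative option, offered as a sketch rather than the only valid approach, is to keep only probes that map to exactly one gene symbol and to drop probes without any annotation. The annotation table below reuses some rows from the question; its column names and the unmapped probe are hypothetical.

```r
# Annotation table modelled on the rows above; "unmapped_at" is a made-up example.
annot <- data.frame(
  probe = c("221251_x_at", "223072_s_at", "1559716_at", "unmapped_at"),
  symbol = c("INO80B /// INO80B-WBP1",
             "INO80B /// INO80B-WBP1 /// WBP1",
             "INO80C",
             NA),
  stringsAsFactors = FALSE
)

# Count the gene symbols per probe by splitting on the " /// " separator.
n_genes <- lengths(strsplit(annot$symbol, " /// ", fixed = TRUE))

# Keep probes with exactly one symbol and a non-missing annotation.
unique_probes <- annot[n_genes == 1 & !is.na(annot$symbol), ]
```

The explicit NA check is needed because splitting a missing value still yields one element, which would otherwise count as a single gene.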
I have obtained microarray data for a large cohort (both sexes). I performed an initial GWAS on all SNPs from all chromosomes to check for genetic association with the trait I am interested in. I found some regions, but the most interesting one is on the X chromosome (in my opinion it is not an artifact). However, I am a bit confused because I do not know whether and how I can analyse these data. For women there is the standard distribution of 3 genotypes, but for men only 2 variants are possible: presence or absence of the allele.
- Should I split the cohort and analyse the male and female subsets separately?
- What kind of statistics should I use for men, since I think a simple MAF cannot be used? And are the PLINK results for the male subset alone reliable?
- Or do you have any other advice?
I would be very grateful for all your help.
I performed the Cell Cycle Control Phospho Antibody Array (http://www.fullmoonbio.com/product/cell-cycle-control-phospho-antibody-array/) with 7 control and 7 treatment samples. To identify the signal intensities I used GenePix Pro 7 and created .GPR files.
How do I continue with my statistics? I want to normalize the data and calculate z-scores or run SAM. I can normalize the data in Excel, but I am sure there is a more convenient way to proceed. I read about the program Prospector from Invitrogen and the protMAT website, but Prospector does not work with my .GPR files.
I am new to protein array and microarray research and would be very happy for any suggestions.
Thank you so much!
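The normalization and z-score steps mentioned above can be sketched outside Excel in a few lines. This is a minimal illustration, assuming the spot intensities have already been read from the .GPR files, that a per-array median centring on the log2 scale is an acceptable normalization, and that the z-score of a treated sample is taken relative to the control samples for the same spot. All values and function names are hypothetical.

```python
import math
from statistics import mean, median, stdev

def median_center_log2(intensities):
    """log2-transform the spots of one array and subtract the array median."""
    logged = [math.log2(v) for v in intensities]
    centre = median(logged)
    return [v - centre for v in logged]

def z_score(value, reference):
    """z-score of one value relative to a reference group (e.g. the controls)."""
    return (value - mean(reference)) / stdev(reference)

# Hypothetical raw intensities for five spots on one array.
array_spots = [1200.0, 850.0, 4300.0, 640.0, 2100.0]
centred = median_center_log2(array_spots)

# Hypothetical normalized values of one spot in the 7 control samples,
# and the same spot in one treated sample.
control_spot = [0.10, -0.05, 0.20, 0.00, -0.15, 0.05, -0.10]
treated_value = 0.90
z = z_score(treated_value, control_spot)
print(centred, z)
```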
I am working on a biological dataset that does not follow a normal (Gaussian) distribution. Which statistical test and technique would be best for analyzing this dataset?
Hi,
First let me start off by saying that working with proteomic datasets is quite new to me, and while I find it terribly interesting, I am currently having trouble finding some answers related to my dataset.
Very briefly, my question is whether and how I can use "raw intensities" to examine protein expression and interactions (I am using Perseus). I am working with raw intensities because I have been told that LFQ intensities cannot be used if the number of protein identifications varies widely between samples, which is the case for my data. Before moving on to my specific questions, let me describe my dataset.
Dataset
* I am comparing four different methods for isolation of the same plasma constituent.
* There are three unique biological samples (3 different controls) in each isolation method (12 samples).
* Additionally, all four methods are performed as technical duplicates, meaning I have an A and a B series, both in the same dataset (22 samples)
Questions
1. First and foremost, am I even able to do statistical analysis on my dataset?
2. Should I normalize my peak intensities? From my reading, I understand that raw intensities only somewhat correlate with actual abundance, and that analysing raw intensities requires some form of peak intensity normalization. I have been looking at a normalization method called EigenMS and at global normalization. Global normalization seems simple enough, but I suspect it cannot be used because of the large differences between isolation methods. My question is then: should I normalize my data, and if so, what would be the best method?
3. How should I group the different methods when analysing? Currently I am grouping all three controls per isolation method (6 with technical duplicates) into the same group using the annotation rows feature.
Any help is greatly appreciated, and if there are any features of my dataset I forgot to mention, please don't hesitate to ask.
We have run a miRNA microarray using the Agilent Human miRNA Microarray Kit Ver. 3.0 (Cat No: AGT-G4470C). I have .gpr files for my samples, but I could not analyze their miRNA profiles in GeneSpring. How can I analyze them in GeneSpring?
I observed this in one of my microarray experiments: the first two genes were upregulated and the last gene was downregulated, although all three genes belong to the same operon. Kindly suggest an explanation.
Dear colleagues,
Which statistical test(s) can I use to identify differentially expressed genes after Affymetrix microarray analysis? I am using endothelial cells co-cultured with mononuclear cells at three different oxygen tensions.
Thank you in advance for your answer...
Dear All,
I am working on microarray data analysis with GEO accession no. GSE31747. In the paper PMID 22024983 (https://www.ncbi.nlm.nih.gov/pubmed/22028943) the authors used ANOVA to identify the significant genes. I am now trying to compare their results using the limma package, but I do not get a single significant gene (p-value < 0.05). If limma does not give any significant genes, what other methods can I try?
Good morning! Recently I tried to confirm gene microarray results by qPCR using the SYBR method (StepOne Plus), but some melt curves showed two Tm peaks (Figure 1). After decreasing the primer concentration from 400 nM to 200 nM and increasing the temperature from 60 to 65, some of them started to look better (Figure 2), so I decided to decrease the primer concentration again, but the problem came back (Figure 3). Can anyone give me some suggestions?
See the figures in my Google Drive, thanks
I am culturing mesenchymal stem cells on treated 1 mm^2 culture surfaces. The area cannot be scaled up, and the experiment can only be run in triplicate, i.e. I have 3 mm^2 total area. If I assume that confluent cells are approx. 625 µm^2 in area, that gives me approx. 1.5 x 10^3 cells per substrate. Is it best to pool these three samples to give me 4,500 cells in total, or can I extract and detect proteins via microarray in triplicate with 1,500 cells each? Would a DNA microarray be a better option?
I want to analyze RNA-Seq data that I found on the internet for practice. I found 8 RNA-Seq datasets (four different immune cell types from mice, each with two biological replicates). The goal of the analysis is to discover whether there are genes that are differentially expressed.
I want to use edgeR to analyze the data. In order to do that, I need to specify a design matrix, and that is where I am stuck. Cell type is one factor with 4 levels in this analysis, but what about biological replicate? Should it also get its own factor with two levels?
I think biological replicate should not get its own factor, but I cannot really explain why. It is a hunch.
Thanks in advance.
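The hunch above can be made concrete by writing out what a one-factor design matrix looks like: one column per cell type, one row per sample, with the two replicates of a cell type simply being two rows that share a column. The sketch below is an illustration only, assuming the replicates are independent (not paired or processed in shared batches across cell types); the cell-type labels are hypothetical. In R, a comparable matrix would come from a formula of the form `~ 0 + group`.

```python
def design_matrix(groups):
    """One-hot design matrix without intercept: one column per group level."""
    levels = sorted(set(groups))
    matrix = [[1 if g == level else 0 for level in levels] for g in groups]
    return levels, matrix

# Hypothetical labels: four immune cell types, two biological replicates each.
cell_types = ["B", "B", "T", "T", "NK", "NK", "DC", "DC"]
levels, design = design_matrix(cell_types)

print(levels)
for row in design:
    print(row)
```

Replicate 1 of one cell type has nothing in common with replicate 1 of another, so a "replicate" column would not correspond to any real effect; the replicates instead provide the within-group variation that the dispersion estimate relies on.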
I have never done a microarray before and have limited knowledge of the analysis, so I am looking for a facility that would give me output that is easy to analyze and of good quality. I am studying changes in IFN-alpha-stimulated genes under adenovirus vector therapy.
Normalizing microarray data across platforms can be very tedious. When data come from different platforms such as Illumina, Affymetrix and Agilent, does quantile normalization across the individually normalized data for each experiment and platform remove batch effects?
I'm working with samples from FACS, where I sort one specific population, and I aim to do a microarray analysis. After FACS, I centrifuge the sorted cells at 4500 rpm for 15 min, remove the supernatant and store the samples at -80 °C. Then I extract the RNA using the Promega ReliaPrep RNA cell kit for small samples, which includes a step for elimination of genomic DNA. After this, we check the quantity and purity on a NanoDrop (we usually get around 5-6 ng/µl and an A260/A280 of 2.0) and check the integrity (RIN) with a Bioanalyzer (we usually have a RIN of around 6-8, but the quantity is much lower than with the NanoDrop). Why do the quantity measurements differ between methods? In this case, should I trust the NanoDrop values, or is that not recommended?
For how long can we store processed microarray chips before scanning?
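For reference, quantile normalization itself is a short procedure: sort each sample, average the sorted values rank by rank, and give every value the average for its rank. The sketch below is a minimal pure-Python version, assuming all samples have the same set of features and ignoring ties (tied values get adjacent ranks rather than a shared average); the example numbers are hypothetical. It shows only that the procedure forces identical distributions, which is a separate matter from whether platform or batch effects are removed.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists (one per sample)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        # Indices of the sample's values from smallest to largest.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical expression values: three samples, four features each.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
result = quantile_normalize(samples)
for col in result:
    print(col)
```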
Dear all,
I want to ask a company to make a tissue array with our own liver tissue (HCC) from our tissue bank.
Can someone tell me how many samples should be on one slide? In other words, how should I design the tissue array?
Thanks for your answer.
Dachen.
Hi, BioMart is down, so is there any other way to convert a list of official gene symbols back to probeset IDs for a given chip?
We have a list of gene symbols from the HuGene 1.0 ST array, but I need to convert those IDs to the equivalent probeset IDs from the HG-U133A array to run them through CMAP.
Thanks,
Steve
I have data from a microarray analysis using Affymetrix. I have the fold change calculated by the Affymetrix software, but not the log2 fold change. Which is better for making the heat map: the fold change or the log2 fold change? And why?
Thank you in advance for your contributions.
I’ve worked on salinity tolerance and transcriptomics in rice using the Agilent 4x44K rice genome array, and I am now writing a paper on it.
I’ve done extensive Gene Ontology enrichment analysis and gene network analysis and have some lists of probes, such as:
Os01g0557500
Os01g0645200
Os05g0382200
Os06g0152200
Os06g0701600
Os08g0503700
Os09g0286400
Os09g0299400
Os09g0484900
Os10g0436900
I want to know which genes are these probes referring to.
Also, which stress adaptation related pathway(s) is/are influenced by my list of significant genes.
Can anyone please help? You can be a co-author of my paper as well.
Thanks
I'm thinking of doing some Gene Ontology analysis on supplementary information I found in a paper (DOI: 10.1021/acs.jproteome.5b00770). However, the information on identified proteins is available from both iTRAQ and iBAQ analysis. I'm not sure what the difference between the two is, or which one to use.
Thank you.
Let's say that I want to compare the effect of stimulating monocytes with factor X on gene expression (microarrays), as measured by five distinct groups (this is just an example). All of them have uploaded their data to bioinformatics databases. Each group analyzed gene expression with a different microarray platform, preprocessed the signal from the machine in a slightly different way, and applied a different normalization procedure, so two types of data are available: raw and processed. It seems that the authors knew what they were doing while processing the data, so I am trying to make use of the processed results. Naturally, it would be wrong simply to compare the processed expression data of the stimulated cells between the groups. I wonder whether, for every group and every gene, I could calculate the relative change in expression as the ratio before_stimulation/after_stimulation and compare these values?
- That would free me from the effect of distinct platforms (since within each pair the same platform was utilized)
- Reduce the effect of the data transformation on the resulting values of gene expression (since data transformation within the pair was the same)
- Free me from the effect of distinct monocyte cultures in the beginning
Alternatively, I will have to use the raw data, but by transforming the raw signal differently from most of the authors I will mostly obtain slightly different results than they have published... This seems odd...
*Also, I know that distinct microarray platforms analyze different genes sets - that is another problem.
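The within-study ratio proposed above can be sketched as follows. This is only an illustration of the idea, assuming the processed values are on a linear (not log) scale, using the ratio direction given in the question (before/after), and restricting the comparison to genes present in every study. The study names, gene names, and numbers are hypothetical. If the processed values are already log-transformed, the ratio becomes a difference instead.

```python
import math

def log2_ratios(study):
    """Per-gene log2(before / after) for one study.

    study: dict mapping gene -> (before_stimulation, after_stimulation),
    both on a linear scale.
    """
    return {gene: math.log2(before / after) for gene, (before, after) in study.items()}

# Hypothetical processed data from two groups using different platforms.
studies = {
    "study_A": {"GENE1": (100.0, 400.0), "GENE2": (50.0, 50.0), "GENE3": (10.0, 5.0)},
    "study_B": {"GENE1": (8.0, 32.0), "GENE2": (3.0, 6.0)},
}

# Only genes measured in every study can be compared.
shared = set.intersection(*(set(s) for s in studies.values()))
comparison = {
    gene: {name: log2_ratios(study)[gene] for name, study in studies.items()}
    for gene in sorted(shared)
}
print(comparison)
```

Working on the log scale makes a fourfold increase and a fourfold decrease equally large in magnitude, which is usually more convenient than comparing raw ratios.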
Hi, I am using Ingenuity Pathway Analysis (IPA) to analyse breast cancer microarray data. I want to use the upstream regulator analysis to obtain relevant transcriptional regulators, but I don't understand exactly how it works. The analysis predicts the upstream regulators that explain the observed gene expression changes in my data, but why is a predicted upstream regulator not among my differentially expressed genes?
For example, I have TNF as an upstream regulator, but this gene is not part of my data. Why?
Thanks for the help.
Hello,
I was wondering which cell detachment technique would be most suitable for RNA extraction? I will be doing microarray analysis and need the gene expression profile to stay intact.
Thanks!
I am working on optimizing gene selection in microarray data for cancer classification. I am going to use an SVM (LIBSVM) in a wrapper approach to evaluate gene subsets using 10-fold cross-validation.
Microarray data are very high-dimensional (e.g. the Lymphoma data set consists of 4026 genes as features, 62 instances and 3 class labels).
Does LIBSVM support multiclass classification? In my work, Lymphoma and MLL have 3 classes.
What are the appropriate SVM type, kernel type and kernel parameters (C, gamma, etc.) in LIBSVM for multiclass classification of microarray data?
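With only 62 instances in 3 classes, the 10-fold cross-validation mentioned above is usually stratified so that every fold contains each class in roughly its overall proportion. The sketch below shows one simple way to build such folds in pure Python; it does not call LIBSVM, and the class sizes, seed, and function name are hypothetical. LIBSVM's documentation describes its multi-class support as a one-against-one scheme, so the folds could be passed to it unchanged.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with classes spread evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin over the folds.
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    return folds

# Hypothetical class sizes for a 62-sample, 3-class data set.
labels = ["class1"] * 40 + ["class2"] * 12 + ["class3"] * 10
folds = stratified_folds(labels, k=10)
print([len(fold) for fold in folds])
```

In a wrapper approach, the gene subset would normally be chosen using the training folds only, so that the held-out fold gives an unbiased estimate.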
Hi everyone,
I am planning to conduct a qPCR gene expression analysis of some defense-related genes in wheat. I am using resistant and susceptible plants to compare gene expression. I plan a time-course study at 0 hours (control), 12 hpi, 24 hpi and 48 hpi after fungal inoculation, with the 0-hour (uninoculated) time point as the control in my experiment.
My question is: can I use the 0-hour (uninoculated) samples as the control in my experiment? I am not using Tween 20 in the fungal suspension for spraying.
I need to extract RNA from some human cell lines for microarray analysis, but I don't know the best way to purify RNA from a human cell line.
Searching online, I found the 'MagMAX™-96 for Microarrays Total RNA Isolation Kit', but it needs some accessories that we currently don't have in our lab, such as a Magnetic-Ring stand.
Does anybody have experience with microarray sample preparation?
I have microarray data with fold change values, with a cut-off of 0.6 and above for upregulated genes and -0.6 and below for downregulated genes. How can I convert fold change values to log2 ratios and vice versa? Does Excel have such a feature? I also want to ask how I can generate a heat map from my microarray data. What are the prerequisites for generating a heat map, such as fold change, p-value or log2 ratio? What tools are required? Can it also be generated in Excel, and how?
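The conversion asked about above is a base-2 logarithm in one direction and a power of two in the other. The sketch below assumes "fold change" means a plain positive ratio (treated/control); the function names are hypothetical. In Excel the same two steps can be written as `=LOG(A1,2)` and `=2^A1`.

```python
import math

def fold_change_to_log2(fold_change):
    """Convert a positive ratio (e.g. 2.0 = doubled) to a log2 ratio."""
    if fold_change <= 0:
        raise ValueError("a ratio must be positive to take its logarithm")
    return math.log2(fold_change)

def log2_to_fold_change(log2_ratio):
    """Convert a log2 ratio back to a plain ratio."""
    return 2.0 ** log2_ratio

print(fold_change_to_log2(4.0))   # fourfold increase
print(fold_change_to_log2(0.25))  # fourfold decrease
print(log2_to_fold_change(0.6))   # ratio corresponding to a log2 value of 0.6
```

A plain ratio can never be negative, so data that contain negative values are not raw ratios and need to be identified (log ratios, or a signed fold-change convention) before converting.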
Hello,
We have a study where we used two different chips, and we now need to analyze the data.
Does anyone know how to normalize microarray data from two different chips?
We are starting some work with a data file that is available online.
The Agilent microarray result file has several columns representing various parameters.
Please indicate whether "gMedianSignal" holds the actual expression values. I suppose that "gBGMedianSignal" should be subtracted to remove the background effect.
The remainder is then to be log-transformed.
Please advise whether this is correct or not!
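The two steps described above (subtract the background column, then log-transform) can be sketched as below. This illustrates the arithmetic only, assuming the values come from the "gMedianSignal" and "gBGMedianSignal" columns named in the question; the floor of 1.0 applied before the logarithm is an assumption made here to avoid taking the log of zero or negative numbers, and the example values are hypothetical.

```python
import math

def background_corrected_log2(signal, background, floor=1.0):
    """Subtract local background from a feature signal and log2-transform.

    Values at or below the floor after subtraction are set to the floor
    (an assumption of this sketch, so that the logarithm is defined).
    """
    corrected = max(signal - background, floor)
    return math.log2(corrected)

# Hypothetical (gMedianSignal, gBGMedianSignal) pairs for three features.
features = [(1056.0, 32.0), (300.0, 44.0), (30.0, 35.0)]
log_values = [background_corrected_log2(sig, bg) for sig, bg in features]
print(log_values)
```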
Looking forward,
I have had a request for 15 µm sections from very precious TMAs. I don't think this will work, because thick sections usually roll and the individual cores usually fold if the section is too thick. I usually section them at 3-5 µm.
Does anyone have any experience of cutting TMAs thicker?
Hi,
I would like to know what Rinmatched is in a microarray experiment, and what it indicates when it is equal to 0 or 1. I am familiar with RIN, the RNA Integrity Number, but I don't have a clue about Rinmatched. Is there any difference between these parameters? I have also enclosed a screenshot of the GEO2R panel. What is the difference between two controls with RIN=6, where one has Rinmatched=0 and the other has Rinmatched=1?
Many thanks
Mona Azodi

Which datasets are you exploiting to evaluate recommender systems?
I am using ImageJ with the microarray profile tool, but it takes a lot of time to place the software's circles on my objects to measure color intensity.
Basically, I want the software to recognize all my samples in the same picture and let me measure their color intensity. Thanks!