Microarray Analysis - Science topic
Questions related to Microarray Analysis
" Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length "
This error appeared when I tried to download the gset data in R; there seems to be a problem.
> gset <- getGEO("GSE77182", GSEMatrix =TRUE, AnnotGPL=FALSE, destdir ="data/")
Found 1 file(s)
GSE77182_series_matrix.txt.gz
Using locally cached version: data//GSE77182_series_matrix.txt.gz
Rows: 59899 Columns: 6
-- Column specification ---------------------------------------------
Delimiter: "\t"
chr (1): ID_REF
dbl (5): GSM2045612, GSM2045615, GSM2045616, GSM2045618, GSM2045620
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Using locally cached version of GPL21369 found here:
data//GPL21369.soft
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
I have a dataset with 149 columns of GSM IDs and a first column of gene names (screenshot attached). There are 20,000 rows (genes) in total. How can I analyze the dataset to find biological pathways using KEGG or another pathway database?
All GSM IDs are lung cancer microarray expression samples.
I know how to do differential gene expression and pathway analysis, but I don't know how to analyze this type of dataset.
Please also comment if you think the dataset is not suitable for finding pathways, or if other information is required.
Any help would be great. Thanks in advance.
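A common first step for pathway analysis is an over-representation test: given a list of genes of interest (for example DEGs between two groups of samples) and the gene set of one pathway, a hypergeometric test asks whether the overlap is larger than expected by chance. Below is a minimal base-R sketch; the gene universe, pathway membership and selected genes are hypothetical placeholders, and real pathway definitions would have to come from KEGG or a similar database.

```r
# Hypergeometric over-representation test for one pathway (base R only).
# All gene names below are hypothetical placeholders.
universe <- paste0("gene", 1:20000)            # all genes measured on the array
pathway  <- paste0("gene", 1:150)              # genes annotated to one pathway
selected <- c(paste0("gene", 1:12),            # genes of interest, e.g. DEGs
              paste0("gene", 5001:5188))

ora_test <- function(selected, pathway, universe) {
  selected <- intersect(selected, universe)
  pathway  <- intersect(pathway, universe)
  k <- length(intersect(selected, pathway))    # observed overlap
  K <- length(pathway)                         # pathway size
  N <- length(universe)                        # universe size
  n <- length(selected)                        # number of selected genes
  # P(X >= k) for X ~ Hypergeometric(K successes, N - K failures, n draws)
  p <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
  c(overlap = k, expected = n * K / N, p.value = p)
}

res <- ora_test(selected, pathway, universe)
print(res)
```

When many pathways are tested, the resulting p-values would normally be corrected for multiple testing, for example with `p.adjust(p, method = "BH")`.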
For validation by PCR and western blot, is there a minimum average expression level that DEGs should have, for example more than 200 or 500? I have received different answers to this question and I am a little confused.
I have microarray data that I am analysing in R. I have cancer data for each time point:
5 samples for t0
5 samples for t1
4 samples for t2
3 samples for t3
All are disease models. I want to find the differentially expressed genes at each time point compared with t0, using t0 as the reference for every time point.
How can I build the contrast matrix for that?
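One way to set this up is a cell-means design (one coefficient per time point) plus a contrast matrix whose columns each subtract the t0 coefficient. The sketch below builds both in base R for the sample layout described above; in limma, `makeContrasts(t1 - t0, t2 - t0, t3 - t0, levels = design)` should produce an equivalent matrix. `expr_matrix` in the commented limma lines is a hypothetical name for the normalized expression matrix.

```r
# Sample layout from the question: 5 x t0, 5 x t1, 4 x t2, 3 x t3
time <- factor(rep(c("t0", "t1", "t2", "t3"), times = c(5, 5, 4, 3)),
               levels = c("t0", "t1", "t2", "t3"))

# Cell-means design: one column (coefficient) per time point
design <- model.matrix(~ 0 + time)
colnames(design) <- levels(time)

# Contrast matrix: rows = coefficients, columns = comparisons against t0
contrast_matrix <- cbind(
  t1_vs_t0 = c(t0 = -1, t1 = 1, t2 = 0, t3 = 0),
  t2_vs_t0 = c(t0 = -1, t1 = 0, t2 = 1, t3 = 0),
  t3_vs_t0 = c(t0 = -1, t1 = 0, t2 = 0, t3 = 1)
)
print(design)
print(contrast_matrix)

# With limma (not run here; expr_matrix is hypothetical):
# fit  <- limma::lmFit(expr_matrix, design)
# fit2 <- limma::eBayes(limma::contrasts.fit(fit, contrast_matrix))
# limma::topTable(fit2, coef = "t1_vs_t0")
```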
I have a question about the "network score" in IPA network analysis. In many papers, the top 5 networks are listed in tables, and some network scores are high (around 50) while others are low (less than 20). We use the same method for network analyses and have the impression that genes are tightly associated when the network score is higher than 40. However, we have not found literature discussing what a meaningful network score is (we found one paper stating that “the networks are selected if their score is higher than 21”). We would appreciate any information about a meaningful network score, or your impression/experience of it (for example, did you see tight association of genes when the network score was less than 20?).
I tried to find a GEO dataset containing all of them together, but I couldn't. What if I take the data for the breast cancer cell lines MCF7, MDAMB321 and SKB3R separately, extract the data for the gene I want to check (HK2), and run the microarray analysis in RStudio to check the differential expression of HK2 among the cell lines?
Dear fellow Researchers,
I am trying to analyze Affymetrix microarray data in dChip. I have the input files (the probe sequence and CDF for Rat 230 2), but I am not getting the expected results. Could anyone tell me whether the gene info file is really necessary (the protocol I have lists only the CDF input as mandatory) and, if so, where to obtain it?
Thank you in advance
I would like to perform differentially expressed gene analysis on NimbleGen data. My dataset consists of 48 .pair files and 48 .calls files.
1) Can I perform DE gene analysis with only these data, without using the oligo package? (My data are single-channel, 532 output only.)
2) What is the appropriate method for identifying differentially expressed genes?
3) When I converted my .pair files to .xys files by extracting the X, Y and signal values, the results were not accurate: the genes reported as DE do not correspond to my experimental conditions.
Please help me
Thank you in advance
Best Regards
Tunc
I also want to add: our .pair and .calls files don't have headers, so we don't know anything about the NDF file. We know that our chip is 100718.hg18, but we don't know the correct GPL file. According to the lab methods, a NimbleGen Human Expression Array 12x135K chip was used.
Does anyone know of a source for microarrays to study tRNA expression? I was told microarrays.com provided these. I have asked them directly but have had no reply so far.
On a given microarray design there are multiple different probes spotted for many genes. The (normalized) signals of the features (all referring to the same gene) often are quite different (log2 values can vary between 2 and 16, so essentially from "almost undetectable" to "completely saturated").
If a gene set analysis or an over-representation analysis is performed, there should be one value per gene.
How should I select which signal to use for the gene? I am not comfortable taking the average of all the features, because they are often so different. Taking only the highest signal also seems wrong.
Any ideas?
The attached file shows a table with example data (from an Agilent microarray) with 5 different probes for the gene "PRDM". The last 4 columns show the log signal intensities for 4 different samples. The values range from 3 to 10, so there is a more than 100-fold difference in signal intensity between the probes.
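One widely used convention is to keep, for each gene, the probe with the highest average signal across all samples (similar in spirit to the "MaxMean" option of WGCNA's `collapseRows()`); taking the median of the probes is another. This is a choice among conventions, not a settled answer. A minimal base-R sketch of the max-mean rule, with a small hypothetical matrix (the values and the gene "GENEX" are invented for illustration):

```r
# Hypothetical log2 expression matrix: rows = probes, columns = samples
expr <- matrix(c(3.1, 3.4, 2.9, 3.0,
                 9.8, 10.1, 9.5, 9.9,
                 6.0, 6.2, 5.8, 6.1,
                 7.5, 7.1, 7.9, 7.3),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3", "p4"),
                               c("s1", "s2", "s3", "s4")))
probe_to_gene <- c(p1 = "PRDM", p2 = "PRDM", p3 = "PRDM", p4 = "GENEX")

collapse_max_mean <- function(expr, probe_to_gene) {
  genes <- probe_to_gene[rownames(expr)]   # gene symbol for each probe
  avg <- rowMeans(expr)                    # average signal per probe
  # For each gene, pick the probe with the largest average signal
  keep <- sapply(split(names(avg), genes), function(p) p[which.max(avg[p])])
  out <- expr[keep, , drop = FALSE]
  rownames(out) <- names(keep)             # one row per gene
  out
}

collapsed <- collapse_max_mean(expr, probe_to_gene)
print(collapsed)
```

Because the selection uses the average over all samples rather than each sample separately, the same probe represents the gene in every sample, which keeps the values comparable across samples.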
Hello,
I'm trying to analyze two different microarray datasets from different chips using a web-based tool.
I don't know how to do that. Should I use only one of them,
or should I combine them using some kind of algorithm?
Thank you
I have analyzed the dataset GSE38132 from Gene Expression Omnibus. The data come from the breast cancer cell line ZR-75-1 and comprise 9 conditions with 4 replicates each, 36 samples in total. I used the limma R package to normalize the data (quantile normalization). I noticed a striking pattern in one group: three replicates show similar expression, while the 4th replicate of the same sample differs from the other three. I confirmed this in the raw data by cross-checking the probe IDs and found that one replicate differs from the other three. As a second check, I looked at the normalized data deposited in NCBI GEO and found the same. Is this possible?
Please see the heat map; the first four are replicates from the same sample.
I have two batches of samples that were collected in two different time periods. If I perform batch correction to remove the batch effect, will it affect the downstream gene expression analysis?
In microarray differential expression studies, the lower bound for |logFC| is generally set around 1, corresponding to a fold change of 2, which seems like common sense. In other cases, when |logFC| >= 1 yields no differentially expressed genes, the logFC cutoff is chosen to get a "reasonable" number of differentially expressed genes. It stands to reason that a more rational way of choosing the cutoff would be to infer it from the platform's accuracy, the quality of the hybridizations in the particular experiment, or some other evidence-based criterion.
How should I decide which logFC cutoff to choose?
Hello everyone,
I am trying to do k-means clustering on a microarray dataset with 127 columns and 1000 rows. When I plot the result, I get the error "figure margins too large". I then typed this into the R console:
par("mar")                # shows the current margins
par(mar = c(1, 1, 1, 1))  # tried to shrink the margins
But it did not work. Can anyone suggest another way of fixing this problem? (Part of the code is below.)
Thanks,
Hasan
--------------------------------------------------------------------------------------------------------------
x <- as.data.frame(x)
km_out <- kmeans(x, 2, nstart = 20)
km_out$cluster
plot(x, col = (km_out$cluster + 1), main = "K-Means Clustering Results with K=2", xlab = "", ylab = "", pch = 20, cex = 2)
>Error in plot.new() : figure margins too large
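A likely cause: when `x` is a data frame with more than two columns, `plot()` draws a scatterplot matrix (via `pairs()`), here 127 x 127 panels, which cannot fit on the device, so shrinking `par("mar")` does not help. A minimal sketch, using simulated data in place of the real 1000 x 127 matrix, that plots the clusters on the first two principal components instead:

```r
set.seed(1)
# Simulated stand-in for the 1000 x 127 matrix from the question
x <- matrix(rnorm(1000 * 127), nrow = 1000, ncol = 127)
x[1:500, ] <- x[1:500, ] + 2            # two artificial groups of rows

km_out <- kmeans(x, centers = 2, nstart = 20)

# Project the rows onto the first two principal components and
# colour each point by its k-means cluster
pcs <- prcomp(x)$x[, 1:2]
plot(pcs, col = km_out$cluster + 1, pch = 20, cex = 1,
     main = "K-Means Clustering Results with K=2",
     xlab = "PC1", ylab = "PC2")
```

Plotting two chosen columns, e.g. `plot(x[, 1], x[, 2], ...)`, would also avoid the error, but the PCA view summarizes all 127 columns at once.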
When examining the differential expression of non-coding RNAs in blood samples, cell cultures or animal models, how many times should we repeat the microarray experiments?
Does the number of repetitions depend on the sample type? For example, is it established that we should repeat the experiments at least 3 times in a cell culture model to identify non-coding RNA expression profiles?
I want to generate a heatmap with clustering in R for the DEGs (fold change) of 3 samples, about 15,000 genes. Every time I run the command, it shows
" Error in heatmap(data_matrix) :'x' must be a numeric matrix"
I have searched the web and tried multiple commands, but every time I get the same error.
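This error usually means the object passed to `heatmap()` is a data frame, or a matrix that still contains a non-numeric column such as gene names (which turns the whole matrix into character). A minimal sketch with a small hypothetical table: move the gene names to row names and keep only numeric columns before converting.

```r
# Hypothetical input: first column holds gene names, the rest are fold changes
df <- data.frame(gene = c("G1", "G2", "G3", "G4"),
                 s1 = c(1.2, -0.5, 2.1, 0.3),
                 s2 = c(1.0, -0.7, 1.8, 0.1),
                 s3 = c(0.9, -0.4, 2.4, 0.2),
                 stringsAsFactors = FALSE)

# heatmap() needs a numeric matrix: keep only numeric columns,
# then use the gene names as row names
data_matrix <- as.matrix(df[, sapply(df, is.numeric)])
rownames(data_matrix) <- df$gene

heatmap(data_matrix, scale = "none")
```

If the fold changes were imported as text (for example because of comma decimal separators), they would need converting with `as.numeric()` first. With about 15,000 rows, clustering and drawing can be slow, so plotting a filtered subset of genes is a common choice.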
Hi all,
I am running a Random Forest Mean Decrease in Accuracy (RF-MDA) algorithm for feature selection on my microarray data, in order to use the selected genes as a classifier to discriminate between 2 classes of cell lines. I am having trouble interpreting the output of the algorithm. It gives me a short list of selected genes, and for each gene there is a Pearson correlation value, a fold change value and a q-value (false discovery rate).
The variable “class” is discrete (normal vs disease), so what does the Pearson correlation mean in this case?
Should I treat the reported q-value as a multiple-testing correction and give less importance to, or exclude, genes with a q-value (FDR) higher than 0.2 (or any other pre-determined significance cutoff)?
I would appreciate any suggestion on how to interpret the results of a RF-MDA for feature selection algorithm.
I'm looking to take an Arabidopsis RNA-Seq differentially expressed gene set and search it against other publicly available RNA-Seq (and possibly microarray) experiments to find the experiments with the most similar patterns of differentially expressed genes.
Does anyone know if a tool that enables this has already been created?
Hello,
I am trying to normalize the data of GSE8397 with MAS5.0 in R:
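setwd("D:/justforR/GSE8397")
# biocLite() is deprecated; current Bioconductor releases install packages via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("affy")
library(affy)
affy.data <- ReadAffy()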
However, the data come from 2 platforms: Affymetrix Human Genome U133A and Affymetrix Human Genome U133B arrays.
The code gave me this error message: "Error in affyio::read_abatch(filenames, rm.mask, rm.outliers, rm.extra, :
Cel file D:/justforR/GSE8397/GSM208669.cel does not seem to be of HG-U133A type"
So how can I normalize the data when they come from both U133A and U133B? Should I try another normalization method (RMA or GCRMA)?
Do you have any ideas about this problem?
Thank you so much!
For example, I have downloaded a .cel file. How can I now get data on upregulation and downregulation, and what is the principle behind these values?
I am trying to perform miRNA microarray analysis for a human cell line. Although the samples have a good A260/A280 ratio (2 and above) and good RIN (mostly above 8), the A260/A230 ratio is low in many of the samples (as low as 0.3 in some).
So the question is: is it wise to proceed with these samples? Can the A260/A230 ratio affect the quality of the microarray (I will use an Agilent microarray chip)?
PS: The analysis will be done by a company, not in-house. The Qiagen miRNeasy kit was used to extract the RNA, and a DNase treatment step was included.
I am doing differential gene expression analysis between control and disease tissue in a skin disorder. In the microarray study, the normal and lesional tissue were taken from the same persons; the normal tissue is the control and the lesion is the disease sample. My question is whether to use a paired or an unpaired t-test. Since the normal and disease tissue come from the same person, I think the samples are paired and I should use a paired test. Am I right, or should I use an unpaired t-test?
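The reasoning about pairing can be illustrated for a single gene: a paired t-test works on the within-person differences, so variation between persons cancels out. A minimal sketch with simulated values (8 hypothetical patients; all numbers invented):

```r
set.seed(42)
# Simulated log2 expression of one gene in 8 patients:
# normal and lesional skin are taken from the same person
patient_effect <- rnorm(8, mean = 8, sd = 1)
normal <- patient_effect + rnorm(8, sd = 0.2)
lesion <- patient_effect + 0.5 + rnorm(8, sd = 0.2)

paired   <- t.test(lesion, normal, paired = TRUE)  # uses lesion - normal per patient
unpaired <- t.test(lesion, normal)                 # Welch test, ignores the pairing

c(paired = paired$p.value, unpaired = unpaired$p.value)
```

For a genome-wide analysis, the same idea is usually expressed in limma by including the patient as a blocking factor in the design (e.g. `~ patient + tissue`).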
I am planning to extract total protein from serum samples for subsequent microarray analysis. What are your recommendations? A good protocol would be highly appreciated.
Thanks in advance
Hi there
I would like to get your feedback on your favourite gene-enrichment analysis software with a graphical interface, preferably online.
Please tell me what you like most about it! Thanks
All the best
Paco
Can anyone suggest a set of numeric variables for deep learning? I have a set of features (DEGs with their fold change values) from a microarray, and I want to prepare training and test sets. For this, at least three numeric variables are needed. Any suggestions?
Thanks in advance
Happy Sunday everyone,
I am trying to calculate row-wise means and variances in R and then sort them. I used the "Absent/Present" calls from the Affymetrix algorithm to flag genes with questionable expression levels, so there are many NAs in the data frame. I have to remove the genes with questionable expression levels (NAs) and then calculate the mean and variance. What I did is:
library(data.table)
dat <- as.data.table(df)
rowvar <- function(x, na.rm =F) rowSums((x - rowMeans(x, na.rm=T))^2, na.rm=T)/(rowSums(!is.na(x)) - 1)
dat[,`:=` (variance = rowvar(.SD, na.rm = T), mean = rowMeans(.SD, na.rm = T))]
But it gives the error "Error in rowMeans(x, na.rm = T) : 'x' must be numeric". How can I handle this error?
I have attached the document I am currently working on.
Thank you for your interest.
Hasan,
Hello everyone,
I have an Excel spreadsheet with 17 columns and 54,675 rows, and I need to calculate the mean and variance of each row in RStudio. After that, I have to add the row means and variances as new columns in the spreadsheet. How can I do this? I suppose the apply() function would work, but somehow I could not get it to. Any suggestions?
Thanks,
Hasan
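For both row-wise mean/variance questions above, the "'x' must be numeric" error typically comes from a character column (for example probe IDs) being included in the calculation. A minimal base-R sketch with a small hypothetical table standing in for the spreadsheet (probe names and values are invented; in practice the data would be read with `read.csv()` or a package such as readxl):

```r
# Hypothetical stand-in for the spreadsheet: an ID column plus sample columns,
# with NA where an Absent call was masked
df <- data.frame(probe = c("probe1", "probe2", "probe3"),
                 s1 = c(7.1, 5.2, NA),
                 s2 = c(7.4, NA, 3.9),
                 s3 = c(6.9, 5.0, 4.3),
                 stringsAsFactors = FALSE)

# rowMeans() fails with "'x' must be numeric" if the probe ID column is
# included, so keep only the numeric (sample) columns
num <- as.matrix(df[, sapply(df, is.numeric)])

df$mean     <- rowMeans(num, na.rm = TRUE)
df$variance <- apply(num, 1, var, na.rm = TRUE)  # NA if fewer than 2 values remain

# Sort by variance (largest first) and write a CSV that Excel can open;
# replace tempdir() with your own output folder
df_sorted <- df[order(df$variance, decreasing = TRUE), ]
write.csv(df_sorted, file.path(tempdir(), "row_stats.csv"), row.names = FALSE)
```

In the data.table version, the equivalent fix is to pass only the numeric columns to `.SD` through the `.SDcols` argument.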
I want to plot expression trends for my genes, and for that I need to know whether the samples in a GEO series matrix (I mean the GSMxxxxxx columns) are comparable with each other, i.e. whether normalization and similar processing have already been applied to them. Also, what does the caution in GEO that I attached to my question mean?
Thanks
Hello everyone,
I have to analyse Affymetrix microarray data (Human Genome U133 Plus 2.0 Array) with Bioconductor, and it is the first time I am using Bioconductor. I got the .cel files from NCBI GEO, but I could not get the chip description file. How can I obtain a CDF?
One more thing: when I check the number of genes in my dataset, R shows 54,675 genes. However, this number should be between 20,000 and 25,000. Could some of them be replicates?
Any suggestions? Can someone help, please?
Thanks,
Hasan
I am trying to design a probe for microarray analysis. For the control genes, I prefer constitutively expressed housekeeping genes. It would be helpful if anyone could guide me in designing a probe and primers for a particular gene, for example 23S rRNA. Manuscripts state the sequences and corresponding primer pairs but do not describe the design methodology. Can anyone help me with this?
Dear all, I am totally new to RNA-seq data analysis. Here is my dataset background: there are normalized RNA-seq data for 2 conditions with 3 replicates each. I first want to check how the gene expression profiles differ between the 2 conditions. So I combined the 3 replicates (using the mean across the 3 samples; Q1: is this the correct way to do it?) and checked the MA plot (it looks fine). However, when I check the MA plots for each sample, I clearly see two clusters of gene expression levels. Can anyone with expertise explain this? Is it due to an experimental issue or the nature of the data? Thanks a lot in advance!
Hello.
I have been working with transcriptomics data from NCBI's GEO for a while. However, I have recently noticed this phenomenon (see attached image):
- When plotting the probes (y-axis) against the subjects (x-axis), the resulting heatmap shows a very large area (around 10,000 probes) with lower intensity than the surrounding probes, in both control and disease individuals;
- The effect is also visible (more clearly) in the other image, which shows that for around 10,000 probes (x-axis), the intensity (y-axis) is lower than the average intensity.
My question is: do you have any idea what this could be due to? Is this "area" of lower intensity a common thing in microarray analysis? Should I exclude this area from the analysis?
Thank you in advance
I am working with microarray data for some genes I am interested in, and I want to know their expression levels in different cell types. The data were deposited in duplicates or triplicates for each cell type, and all the experiments used the same kind of microarray chip. Now I need a mean value for each cell type to compare with the other cell types. So far I have normalized the data using the gcrma package. I think I now have to normalize the values against a housekeeping gene before proceeding further, but I am not sure how. I need help; please guide me through this.
I also have to find the coexpression values for all gene pairs. I calculated the Pearson correlation coefficient (PCC) from the normalized values across all the data sets. Is that okay, or do I have to calculate the PCC only after normalizing the values to a housekeeping gene? Please help me. Thank you.
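Computing the PCC for all gene pairs from a normalized matrix is a single `cor()` call on the transposed matrix, since `cor()` correlates columns. A minimal sketch with a small simulated matrix standing in for the gcrma output (gene and sample names are hypothetical). Whether to rescale by a housekeeping gene first is a separate question; note that subtracting a per-sample housekeeping value from log-scale data shifts every gene's profile and can therefore change the correlations.

```r
set.seed(7)
# Simulated normalized (log2) matrix: rows = genes, columns = samples
expr <- matrix(rnorm(5 * 12, mean = 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:12)))

# cor() correlates columns, so transpose to correlate genes across samples
pcc <- cor(t(expr), method = "pearson")

# One row per unordered gene pair (upper triangle of the matrix)
idx <- which(upper.tri(pcc), arr.ind = TRUE)
pairs_df <- data.frame(gene1 = rownames(pcc)[idx[, 1]],
                       gene2 = colnames(pcc)[idx[, 2]],
                       pcc   = pcc[idx])
print(pairs_df)
```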
I need to compare a gene's expression between tumor and matched normal tissue from the TCGA database. I've tried using Firehose to search for differential expression of the gene across different cancer types. The problem is that the number of tumor samples is not equal to the number of normal samples, but I need to compare matched tumor and normal tissues. Are there any tools to do that?
I want to convert expression values to z-scores, z = (x - mean)/sd. I have two types of samples in my microarray data (Affymetrix GeneChip Human Genome U133 Plus 2.0): 31 normal and 30 case. Do I have to calculate the mean and sd for the normal samples only, apply the z-score formula, and then do the same for the case samples, or do I have to use the mean and sd of all samples together?
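The two options answer different questions: z-scores over all samples describe where a sample sits within the whole cohort, while z-scores computed against the normal group's mean and sd express how far each sample deviates from normal. Standardizing each group separately against its own mean and sd would center both groups at zero and remove the normal-vs-case difference. A minimal sketch for one gene with simulated values (`expr_matrix` in the final comment is a hypothetical name):

```r
set.seed(3)
# Simulated log2 expression of one gene: 31 normal and 30 case samples
normal <- rnorm(31, mean = 8, sd = 0.5)
case   <- rnorm(30, mean = 9, sd = 0.5)
all_samples <- c(normal, case)

# Option 1: standardize against all samples, z = (x - mean) / sd
z_all <- (all_samples - mean(all_samples)) / sd(all_samples)

# Option 2: use the normal group as the reference distribution
z_ref <- (all_samples - mean(normal)) / sd(normal)

# For a whole matrix (rows = genes), scale() works column-wise, so transpose:
# z_matrix <- t(scale(t(expr_matrix)))
```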
When validating microarray or RNA-seq expression data, do we need to select the genes randomly, or can we choose the ones that matter to us?
While working with gcrma I found that the package ‘hgu95av2cdf’ is not available (for R version 3.4.0).
So I would like to know which stable version of R has all Bioconductor packages available.
Hi experts,
RNA-seq with NGS technology is changing gene expression studies with great advantages, yet we still see many studies using microarrays (e.g. Affymetrix Gene Atlas) and even qPCR (to a certain extent).
I personally favour NGS and RNA sequencing for gene expression studies. Not only that, RNA-seq can discover novel transcripts, which can open new fields of study.
RNA-seq can be costly, but I believe that in the end it is better than microarrays. So in which situations would a microarray be better than RNA-seq? I am working with primary cells, cell lines, and mouse as my animal model for brain-related studies.
I am looking forward to hearing your opinion.
Hi!
Does anybody know a program, software or website for making heatmaps without using R?
I have a set of 3,099 upregulated genes and a set of 2,686 downregulated genes under my unique experimental condition, and I would like to compare them.
Thanks a lot!
Basically, I have abundance values for my first 3 sets of experiments, with variable control values for all peptides within each protein. How can I use these data to identify significant peptides or calculate fold changes statistically? Is there a way to do this without control abundance data? Also, should I use normalisation, and if so, which method?
I am new to RNA-seq analysis. I have normalized RNA-seq data rather than raw counts, and I want to do a differential expression study (negative binomial model). Can anyone recommend an R package that handles this kind of data? From what I have read, DESeq only accepts raw count data. Any answer is appreciated!
Hello everyone. I have normalized RNA-seq reads and I am trying to generate a Venn diagram of upregulated and downregulated genes. I have three replicates each of control and test samples. I searched online but couldn't decide which tools would be better to use. Can anyone suggest any Windows-based offline or online tools to generate Venn diagrams from RNA-seq data? Thank you very much.
Raghu.
I have used the MultiNA to quantify RNA for the first time.
Could someone help me interpret the output results? Does it have an equivalent number to RIN?
Can I trust the "Total conc" readout?
Many thanks
Our GeneSpring user license has expired so I am investigating whether there is an appropriate online open source application I can use to analyse microarray data.
Hi,
I have data for 33 ligands in total, analyzed with SAM and reported in an article entitled "Analysis of the major patterns of B cell gene expression changes in response to short-term stimulation with 33 single ligands". I selected 10 of these ligands for additional analysis, but the authors did not provide the raw data (CEL files), so I downloaded the processed data from ArrayExpress. I reviewed the limma tutorial and want to make sure the downloaded data files are suitable for limma. I need a starting point for the analysis. I have attached one of the processed data files as an example: can I use processed data files as input for limma, and which type of analysis would be performed? I look forward to your answers.
Thank you,
I analysed a PPI network after integrating gene expression data from an Alzheimer's disease experiment into it, which revealed some subnetworks. First, I used the limma package for differentially expressed gene (DEG) analysis. Second, I mapped the DEGs onto the PPI network and assigned each gene's fold change to the corresponding protein. Third, I searched the network starting from my selected candidate gene and extracted subnetworks. I scored them with my own formula and then merged the top-scoring subnetworks.
Now I want to validate my results (the merged subnetwork), and I have no idea how to do it.
Could anyone help me or suggest a method to validate my results, please? I would highly appreciate it.
Dear colleagues, I have Affymetrix microarray data from endothelial cells co-cultured with mononuclear cells under normoxia, hyperoxia and hypoxia. Control cultures of endothelial cells alone (without mononuclear cells) were also grown under these same conditions. The Affymetrix data have been processed with Expression Console (Gene level >> extended: RMA-Sketch) and filtered. I wish to use Excel to identify differentially expressed genes.
What steps do I need to take to identify these differentially expressed genes using Excel? I am new to high-throughput data analysis.
Thank you in advance for your response.
Hi all,
To analyse RNA-Seq data sets, I need to know which is the most accurate and reliable pipeline to use. Could you suggest one?
Note: I have good experience with the Tuxedo suite (Bowtie, TopHat, and CummeRbund) in addition to edgeR.
Thanks
I have access to 3 experiments from GEO. The sample type is blood for one experiment and skin for the other two. I have the RPKM values of control and patient samples from these tissues. All these experiments use the same platform.
How should I proceed with a meta-analysis of these experiments? There are very few papers describing the protocol. A pipeline would be a great help.
I am doing differential expression studies using iTRAQ. I have trouble determining the fold change (fold enrichment) for downregulated iTRAQ ratios. For example, an iTRAQ ratio of 3.256 for 117:114 indicates a 3.256-fold upregulation, but what about downregulated ratios, which are less than 1, for example 0.2679? Is it possible to calculate the fold change from the iTRAQ ratio, given the PVal (ratio)? I am using ProteinPilot software.
Thanks in advance!
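On a log2 scale, up- and downregulation are symmetric, so a common convention is to report log2(ratio), or to express a ratio below 1 as a negative fold change equal to -1/ratio. This is a general convention, not necessarily what ProteinPilot reports; the p-value indicates whether the ratio differs significantly from 1 but does not change the fold change itself. A minimal sketch using the two ratios from the question:

```r
# iTRAQ ratios from the question (117:114)
ratios <- c(up = 3.256, down = 0.2679)

# log2 scale: symmetric around 0 (here about +1.70 and -1.90)
log2_fc <- log2(ratios)

# Signed fold change: ratios below 1 reported as -1/ratio
signed_fc <- ifelse(ratios >= 1, ratios, -1 / ratios)

round(log2_fc, 2)
round(signed_fc, 2)
```

So 0.2679 corresponds to roughly a 3.7-fold downregulation.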
When identifying differentially expressed genes from microarray data, which parameters must we take into account for a more satisfying result? What ranges (minimum and maximum values) can be set for FDR, fold change, etc., for log2-normalized data and their p-values?
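There is no universal interval, but a frequent convention is to adjust the p-values for multiple testing (e.g. Benjamini-Hochberg FDR) and then require FDR < 0.05 (sometimes 0.1) together with |log2FC| >= 1, i.e. a 2-fold change. A minimal sketch of that filter on a simulated results table (in practice the `logFC` and p-value columns would come from a tool such as limma's `topTable()`):

```r
set.seed(11)
# Simulated results table: log2 fold changes and raw p-values for 1000 genes
res <- data.frame(gene  = paste0("gene", 1:1000),
                  logFC = rnorm(1000, sd = 0.8),
                  p     = runif(1000))
res$p[1:30]     <- runif(30, max = 1e-5)   # a few strongly significant genes
res$logFC[1:30] <- 2

# Benjamini-Hochberg adjustment controls the false discovery rate
res$FDR <- p.adjust(res$p, method = "BH")

# Commonly used (not universal) cutoffs: FDR < 0.05 and |logFC| >= 1
deg <- subset(res, FDR < 0.05 & abs(logFC) >= 1)
nrow(deg)
```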
Hello.
I tried to perform meta-analysis of differential gene expression data using GEO.
The A-madman program looks nice. However, it does not work.
The error occurs when I click Analyze after grouping on the Basket tab.
Can anyone help with this program, or recommend another program or R package?
Thanks in advance
Dear All,
I am trying to see which CpG sites (with its associated genes) are involved in particular pathways and diseases, and get an overview of the functions of these genes.
I have tried to import my dataset (>800k CpG sites in total), which contains: 1) the CpG site as the ID, 2) p-value, 3) q-value, 4) fold change and 5) difference. My data sets are quite large, exceeding 200,000 CpG sites (the row limit of IPA). Is there a way to import a file this large?
I have also tried importing a file with a more specific subset of around 1,000 CpG sites, but it is not mapped properly by IPA: I get 0 mapped sites, either because of errors or possibly because I am using the wrong template (i.e. not expression data)?
I think the errors come from the formatting of my Excel file for IPA: either the headings are incorrect or the way I assign each header/observation is wrong. For example, I set my identifier type to Illumina (the platform I used to obtain my CpG methylation data), but I do not know what other options I could choose instead. IPA also showed errors, first 'no IDs matched to particular genes', and then 'removing fold change between 1 and -1'.
In summary, I would really appreciate any tips/guidance with uploading CpG methylation data into IPA.
Thank you very much.
I am trying to find the expression profiles of my candidate genes in cancers using publicly available RNA-seq or CAGE data.
At this stage I would prefer an online search tool for a quick analysis.
I have some genes with their FPKM values, and now I want to convert these values into log2 fold changes.
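A log2 fold change is log2(treated / control). With FPKM values a small pseudocount is usually added so that genes with zero FPKM do not produce infinite values; the pseudocount of 1 below is a common but arbitrary choice. A minimal sketch with hypothetical values:

```r
# Hypothetical FPKM values for a few genes in control and treated samples
fpkm <- data.frame(gene = c("geneA", "geneB", "geneC"),
                   control = c(10, 0, 250),
                   treated = c(40, 5, 62.5),
                   stringsAsFactors = FALSE)

# Pseudocount avoids log2(0) and division by zero for unexpressed genes
pseudo <- 1
fpkm$log2FC <- log2((fpkm$treated + pseudo) / (fpkm$control + pseudo))
print(fpkm)
```

With replicates, the values would normally be averaged within each group first; a ratio of FPKM values by itself carries no significance test, for which count-based tools such as DESeq2 or edgeR on raw counts are usually preferred.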
Hi Everyone,
I'm using microarray data to identify DEGs and map their PPI network, but now I want to use multiple datasets from different studies of acute myeloid leukemia (AML). Please describe a good step-by-step methodology, specifically how I can merge different datasets. I also need some information about the requirements for merging.
Thanking you in advance.
I have heard that a scanned microarray slide looks like the image below: a black background with fluorescent dots. But what I saw was an all-white slide with black frosted ends (constantly black, as below). I have no photos on my PC, but I saw a white slide with a black end. I don't know why the scanned image is only black and white. Please give your opinion.
Hello guys!
We have several transcriptome data sets from samples treated at low temperature for different lengths of time, say at 4 ℃ for 1, 3, 5 and 7 hours. After analyzing these data, we obtained differentially expressed genes (DEGs) for each treatment time point. For example, for samples treated at 4 ℃ for 1 hour, we got 2000 up-regulated and 3000 down-regulated genes; for 3 hours, 1500 up- and 2500 down-regulated genes; 5 hours …; 7 hours ….
My question is: how can I analyze these DEGs further to identify the subset of genes that is really crucial at low temperature in this sample?
And are there tools for this work?
By the way, my data sets come from RNA-seq, and the fold-change value of each unigene at each treatment time point was calculated with DESeq.
Thanks in advance.
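A simple first step is set operations on the DEG lists: genes regulated at every time point, or at most time points, are candidates for a core cold-response set. A minimal base-R sketch with small hypothetical up-regulated lists (gene IDs invented; the same applies to the down-regulated lists):

```r
# Hypothetical up-regulated DEG lists per time point at 4 degrees C
deg_up <- list(h1 = c("g1", "g2", "g3", "g4"),
               h3 = c("g2", "g3", "g4", "g7"),
               h5 = c("g2", "g3", "g4", "g8"),
               h7 = c("g3", "g4", "g9"))

# Genes up-regulated at every time point
core_up <- Reduce(intersect, deg_up)

# Number of time points at which each gene is up-regulated
counts <- table(unlist(deg_up))
persistent <- names(counts)[counts >= 3]   # up at 3 or more time points

core_up
persistent
```

Such overlaps ignore how the fold changes evolve over time; clustering the fold-change profiles across time points is a common complementary step.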
Well... I ruined my microarray, so I want to ask you something.
Is it OK to store my buffers at RT?
Prehybridization buffer: 5X SSC/0.1% SDS/1% BSA
Hybridization buffer: 5X SSC/0.1% SDS/50% Formamide
Low-stringency wash buffer: 1X SSC/0.2% SDS
High-stringency wash buffer: 0.1X SSC/0.2% SDS
0.1X SSC
50% DMSO
These are my buffers. I currently store them at 4 °C because BSA is stored at 4 °C, but some solutes do not dissolve at 4 °C.
Currently, differential gene expression analysis usually uses RPKM, TPM or TMM; however, sequencing depth is set by the experimenter and all quantification is relative. To compare samples, some methods, like DESeq2 and edgeR, use distribution-based normalization. The problem is that these methods are not entirely correct either: if we sequence a low-expression sample deeply and a high-expression sample shallowly, these methods seem unable to detect the true difference. One idea is that if there is a group of universal genes with unchanged expression levels, these genes could be used as the baseline to normalize and compare samples.
I have noticed one paper that uses this idea to normalize gene expression in plant tissue; they established a database of stably expressed genes. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5178351/.
But for prokaryotic microorganisms, there does not seem to be any stably expressed gene set yet.
Any comments will be appreciated.
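The idea of normalizing against a set of stably expressed genes can be sketched by computing DESeq-style median-of-ratios size factors from those reference genes only (DESeq2's `estimateSizeFactors()` accepts a `controlGenes` argument for this purpose). A minimal base-R sketch; the counts and the choice of reference genes are hypothetical, and the reference genes need non-zero counts because the method uses logarithms.

```r
# Hypothetical raw counts: rows = genes, columns = samples.
# Sample s2 was sequenced about twice as deeply as s1.
counts <- matrix(c(100, 200,
                    50, 100,
                    80, 160,
                    10,  90),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("ref1", "ref2", "ref3", "target"),
                                 c("s1", "s2")))
ref_genes <- c("ref1", "ref2", "ref3")     # assumed stably expressed

# Median-of-ratios size factors computed on the reference genes only
size_factors_from_refs <- function(counts, ref_genes) {
  log_ref <- log(counts[ref_genes, , drop = FALSE])
  log_geo_means <- rowMeans(log_ref)       # log geometric mean of each gene
  apply(log_ref, 2, function(col) exp(median(col - log_geo_means)))
}

sf <- size_factors_from_refs(counts, ref_genes)
normalized <- sweep(counts, 2, sf, "/")    # divide each sample by its factor
normalized
```

After normalization the reference genes are equal in both samples, and the target gene shows a 4.5-fold difference rather than the raw 9-fold one.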
Hi all,
I want to analyse an RNA-seq data set from a paper, so I downloaded the data from GEO. The data have a column called "unique hits" for each ID, which I think is the most relevant for the next step. However, I don't know whether these are values I can use to analyse gene expression levels.
I've used log_RMA and RMA values from microarrays before to analyse gene expression, but I don't know if these are the same.
Thank you!
Hello friends
I am doing microarray data analysis (HGU133plus2). I obtained the expression matrix using gcrma, but some probes map to multiple genes, as shown below. How should I treat these? Also, some probes are not matched and show NA; can I delete them? I will then use this file for WGCNA analysis. Please share your knowledge.
221251_x_at | 1 | 221251_x_at | INO80B /// INO80B-WBP1 | NA
65133_i_at | 1 | 65133_i_at | INO80B /// INO80B-WBP1 | NA
223072_s_at | 1 | 223072_s_at | INO80B /// INO80B-WBP1 /// WBP1 | NA
1559716_at | 1 | 1559716_at | INO80C | INO80C
229582_at | 1 | 229582_at | INO80C | INO80C
220165_at | 1 | 220165_at | INO80D | INO80D
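Multi-gene annotations such as "INO80B /// INO80B-WBP1" can be handled in a few ways; two common ones are dropping ambiguous and unannotated probes, or keeping only the first listed symbol. Which is appropriate before WGCNA is a judgment call. A minimal base-R sketch using the probe IDs and symbols from the rows above:

```r
# Probe IDs and gene symbols from the annotation rows above
annot <- data.frame(
  probe  = c("221251_x_at", "65133_i_at", "223072_s_at",
             "1559716_at", "229582_at", "220165_at"),
  symbol = c("INO80B /// INO80B-WBP1", "INO80B /// INO80B-WBP1",
             "INO80B /// INO80B-WBP1 /// WBP1", "INO80C", "INO80C", "INO80D"),
  stringsAsFactors = FALSE)

# Option 1: drop probes without a symbol and probes mapping to several genes
is_multi   <- grepl("///", annot$symbol, fixed = TRUE)
unique_map <- annot[!is.na(annot$symbol) & !is_multi, ]

# Option 2: keep only the first listed symbol for multi-gene probes
annot$first_symbol <- trimws(sapply(strsplit(annot$symbol, "///", fixed = TRUE),
                                    "[", 1))
unique_map
annot
```

Either way, several probes can still map to the same gene (here 1559716_at and 229582_at both map to INO80C), so a collapsing step to one row per gene is usually still needed afterwards.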
I have obtained microarray data for a large cohort (both sexes). I performed an initial GWAS of all SNPs on all chromosomes to check the genetic association with the trait I am interested in. I found some regions, but the most interesting one is on the X chromosome (in my opinion it is not a false positive). However, I am a bit confused because I do not know whether, and how, I can analyse these data. For women there is the standard three-genotype distribution, but for men only two variants are possible: presence or absence of the allele.
- Should I divide the cohort and analyse the men and women subsets separately?
- What kind of statistics should I use for men, since I think it is impossible to use a simple MAF? And are the PLINK results for the men-only subset reliable?
- Or do you have any other advice?
I would be very grateful for all your help.
I performed the Cell Cycle Control Phospho Antibody Array (http://www.fullmoonbio.com/product/cell-cycle-control-phospho-antibody-array/) with 7 control and 7 treatment samples. To identify the signal intensities I used GenePix Pro 7 and created .GPR files.
How do I continue with my statistics? I want to normalize the data and calculate z-scores or run SAM. I can normalize the data in Excel, but I am sure there is a more convenient way to proceed. I read about the program Prospector from Invitrogen and the protMAT website, but Prospector does not work with my .GPR files.
I am new to protein array and microarray research and would be very happy for any suggestions.
Thank you so much!
I am working on a biological dataset that does not follow a normal (Gaussian) distribution. Which statistical tests and techniques would be best for analyzing it?
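For comparing two groups when normality is doubtful, two common routes are a rank-based test (Wilcoxon rank-sum / Mann-Whitney) or a transformation (often log for skewed biological data) followed by a t-test. A Shapiro-Wilk test can flag departures from normality, although with small samples it has little power. A minimal sketch with simulated skewed data:

```r
set.seed(5)
# Simulated right-skewed (log-normal) measurements for two groups
group_a <- rlnorm(20, meanlog = 1,   sdlog = 0.8)
group_b <- rlnorm(20, meanlog = 1.6, sdlog = 0.8)

# Shapiro-Wilk test: a small p-value suggests departure from normality
sw <- shapiro.test(group_a)

# Option 1: rank-based test that does not assume normality
wt <- wilcox.test(group_a, group_b)

# Option 2: log-transform, then use a t-test on the transformed scale
tt <- t.test(log(group_a), log(group_b))

c(shapiro = sw$p.value, wilcoxon = wt$p.value, t_log = tt$p.value)
```

For more than two groups, `kruskal.test()` is the rank-based counterpart of a one-way ANOVA.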
Hi,
Let me start by saying that working with proteomic datasets is quite new to me, and while I find it terribly interesting, I am currently having trouble finding answers related to my dataset.
Briefly, my question is whether and how I can use raw intensities to examine protein expression and interactions (I am using Perseus). I am working with raw intensities because I've been told that LFQ intensities cannot be used if protein identifications vary widely between samples, which is the case here. First, let me describe my dataset before moving on to my specific questions.
Dataset
* I am comparing four different methods for isolation of the same plasma constituent.
* There are three unique biological samples (3 different controls) in each isolation method (12 samples).
* Additionally, all four methods are performed as technical duplicates, meaning I have an A and a B series, both on the same dataset (22 samples).
Questions
1. First and foremost, am I even able to do statistical analysis on my dataset?
2. Should I normalize my peak intensities? From my reading, I understand that raw intensities only somewhat correlate with actual abundance, and that analysing raw intensities requires some form of peak intensity normalization. I've been looking at a normalization method called EigenMS and at global normalization. Global normalization seems simple enough, but my concern is that, because of the large differences between isolation methods, it cannot be used here. So should I normalize my data, and if so, what would be the best method?
3. How should I group the different methods when analysing? Currently I am grouping all three controls per isolation method (6 with technical duplicates) into the same group using the annotation rows feature.
Any help is greatly appreciated, and if there are any features of my dataset I forgot to mention, please don't hesitate to ask.
We performed a miRNA microarray experiment using the Agilent Human miRNA Microarray Kit Ver. 3.0 (Cat No: AGT-G4470C). I have .gpr files for my samples, but I could not analyze their miRNA profiles in GeneSpring. How can I analyze them in GeneSpring?
In a microarray dataset, when 2 probe set IDs for one gene both show a significant difference, how should I interpret the expression of that gene?
I observed in one of my microarray experiments that the first two genes were upregulated and the last gene was downregulated, although these three genes belong to the same operon. Kindly suggest an explanation.