Bioconductor - Science topic
Questions related to Bioconductor
Hi good people,
I am trying to analyze CyTOF data with the PhenoGraph algorithm of cytofkit. I ran into a lot of issues running the code in R. Has anyone performed the same analysis? I would like to know which versions of R and Bioconductor you used.
Thank You,
Sadi
I'm aware that Biobase is part of the Bioconductor project and that various other packages use it. But what are the functions of this package, and what kind of data do we use it for?
When I tried to install a Bioconductor package, I got this error: "package ‘bioconductor’ is not available (for R version 3.6.3)". Can anyone help?
Dear all, hope you all are doing well,
I'm installing edgeR in R version 3.4.4 (2018-03-15) on Ubuntu 18.04 by running the command:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
then
BiocManager::install("edgeR")
but I'm getting the error which is as follows...
if (!requireNamespace("BiocManager", quietly = TRUE))
+ install.packages("BiocManager")
> BiocManager::install("edgeR")
'getOption("repos")' replaces Bioconductor standard repositories, see
'?repositories' for details
replacement repositories:
CRAN: https://cloud.r-project.org
Bioconductor version 3.6 (BiocManager 1.30.16), R 3.4.4 (2018-03-15)
Installing package(s) 'BiocVersion', 'edgeR'
Error in download.file(url, destfile, method, mode = "wb", ...) :
unused argument (checkBuilt = FALSE)
In addition: Warning messages:
1: In .inet_warning(msg) :
package ‘BiocVersion’ is not available (for R version 3.4.4)
2: In .inet_warning(msg) : dependency ‘locfit’ is not available
Installation paths not writeable, unable to update packages
path: /usr/lib/R/library
packages:
boot, class, cluster, codetools, KernSmooth, lattice, MASS, nlme, nnet,
rpart, spatial
Warning message:
In .inet_warning(msg) : download of package ‘edgeR’ failed
Please help and let me know how I can solve the above issue.
thank you.
Dear all, I hope you are all doing well. I just installed R version 4.1.2 on Ubuntu 20.04 and then installed edgeR by running the following command:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
then..
BiocManager::install("edgeR")
After that it required the limma package, and when I tried to install it I got the following warning:
Warning message:
package(s) not installed when version(s) same as current; use `force = TRUE` to
re-install: 'limma'
Please help and let me know to find solution.
thank you
I am unable to open the raw microarray data files in R. I contacted
I checked a few resources online, Bioconductor packages, and also EBI training courses, etc., but the files still do not open.
Can anyone suggest the best way to learn how to use Perl or R/Bioconductor? Can anyone suggest a good link so that I can learn on my own? Although my PhD focused on immunology, I want to learn how to use these two tools in the field of epigenetics.
I am analysing a 14-parameter flow cytometry panel in FlowJo v10.3 and would like to clean up the data before analysis. There are two plugins (flowClean and FlowAI) which use R to get rid of bad quality data (e.g. interrupted flow or signal acquisition issues).
Despite following the tutorials, I am getting various error messages including:
"Could not create Gating-ML elements:
gating:RectangleGate
The target sample does not have some parameters referenced in the GatingML definition"
When this happens, I get some basic plots, but the programme does not split my "good events" from my "bad quality" events.
Alternatively "FlowJo could not derive the expected parameter" (ie the calculation fails totally).
Can anyone tell me why this is happening and how to fix it please?
I have a dataset with 149 columns of GSM IDs and a first column of gene names (screenshot attached). There are 20,000 rows (genes) in total. How can I analyze the dataset to find biological pathways using KEGG or another pathway database?
All GSM ID's are lung cancer micro array expression data.
I know how to do differential gene expression and pathway analysis, but I don't know how to analyze this type of dataset.
Please also comment if you feel that the dataset is not correct, cannot be used to find pathways, or needs other information.
Any help would be great. Thanks in advance.
Hi everyone, I have a transcription dataset from the Affymetrix HG-U133_Plus_2 platform and I want to convert the probe IDs into official gene symbols. I have tried the Bioconductor "hgu133a.db" library but it doesn't work. Can somebody help me understand how to do it? Here are some of the probes I'm unable to convert. Thanks in advance!
1552256_a_at
1552257_a_at
1552258_at
1552261_at
1552263_at
1552264_a_at
1552266_at
1552269_at
1552271_at
1552272_a_at
1552274_at
1552275_s_at
1552276_a_at
1552277_a_at
1552278_a_at
1552279_a_at
1552280_at
1552281_at
1552283_s_at
1552286_at
1552287_s_at
1552288_at
The Bioconductor package and some other libraries need this version. So if someone uses or knows about the alternate source kindly help.
I am trying to install Bioconductor packages to open CEL files and analyze the raw data files generated by Affymetrix microarrays. I found some workflows on the Bioconductor website but I could not install the packages, maybe due to the different Bioconductor versions. I would greatly appreciate if anyone can give some suggestions about the workflows that I should use and/or how to download the old Bioconductor versions.
Dear colleagues,
I am looking for a Python library equivalent to the Bioconductor annotation package for the TxDb object "TxDb.Hsapiens.UCSC.hg19.knownGene". I would also like to know how essential this particular package is to you and to your colleagues in genomics or similar disciplines.
The Bioconductor package direct URL: https://bioconductor.org/packages/release/data/annotation/html/TxDb.Hsapiens.UCSC.hg19.knownGene.html
much appreciated,
Nawaf.
I have more than 500 Datasets of RNA-sequencing data (both FASTA and FASTQ format) and I'd like to analyze gene expression and differentially expressed genes.
The files in FASTA format are on my PC (Windows OS) and those in FASTQ format have been imported to the Galaxy website (usegalaxy.eu).
I'm not familiar with gene expression analysis (GEA). I recently installed R and I'm working with Bioconductor packages (such as DESeq2, edgeR, Biobase, etc.) to learn how to use them for GEA. I don't know how long it will take, but it seems to take a long time to learn and use them.
My question is: could anyone let me know the best and fastest way to do GEA?
Is R the best software for GEA? If yes, is there a simple tutorial for GEA in R?
Given the huge amount of RNA-seq data, my PC may not be able to analyze it. Is there any software on the Galaxy website for GEA?
Any guidance is warmly appreciated.
tl;dr: Why is data linearization applied in the identification of differentially expressed genes/proteins?
I am new to data analysis of big data like proteomics. As far as I know, a simple t-test is not enough as there is a high chance of false positives. I've been reading and it seems that Limma is a good package with better statistics to be applied in the identification of differentially expressed proteins (and genes). Most of the papers apply linearization in the process of identifying the genes but I would like to understand why this step is necessary.
An example of a R code that I generally see people using is this one:
design <- model.matrix(~factor(c(2,2,2,2,1,1,1,1)))
Thank you in advance!
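If "linearization" here refers to the linear model that the `design` line sets up, a small base-R sketch may help. The design matrix does not transform the data; it only encodes which sample belongs to which group so that a linear model can be fitted gene by gene. The variable names `group` and `y` and the expression values are made up for illustration, and no limma function is called.

```r
# Group labels for 8 samples: four in group 2, then four in group 1
group <- factor(c(2, 2, 2, 2, 1, 1, 1, 1))

# One row per sample: an intercept column (group 1 mean) and an
# indicator column "group2" (difference group 2 minus group 1)
design <- model.matrix(~ group)
print(design)

# For a single gene, least squares on this design returns the
# group 1 mean and the group 2 minus group 1 difference
y <- c(5.1, 4.9, 5.3, 5.0, 3.0, 3.2, 2.9, 3.1)  # made-up log2 values
fit_coef <- coef(lm.fit(design, y))
print(fit_coef)
```

The second coefficient is the log fold change between the two groups for that gene, which is the quantity a moderated test would then assess.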
Hi, I have sent my RRBS raw data to two different bioinformaticians. The gene lists I got back differ greatly, with an overlap of less than 25%. The only step I can identify as different in the pipelines they used is the methylation calling step: one used Bismark, and the other used Bioconductor. How should I choose the right one? Thanks in advance.
Hello everyone,
Currently I am trying to do k-means clustering on a microarray dataset which consists of 127 columns and 1000 rows. When I plot the graph, it gives an error like "figure margins too large". Then I wrote this in the R console:
par("mar") # shows the current margins
par(mar = c(1, 1, 1, 1)) # tried to update the margins
But it did not work. Can anyone suggest another way of fixing this problem? (Part of the code is attached below.)
Thanks,
Hasan
--------------------------------------------------------------------------------------------------------------
x = as.data.frame(x)
km_out = kmeans(x, 2, nstart = 20)
km_out$cluster
plot(x, col = (km_out$cluster + 1), main = "K - Means Clustering Results with K=2", xlab = "", ylab = "", pch = 20, cex = 2)
>Error in plot.new() : figure margins too large
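A likely cause, offered as an assumption since the full data are not shown: calling `plot()` on a data frame with 127 columns asks for a 127 x 127 scatterplot matrix, which cannot fit in the plotting device whatever `mar` is set to. A minimal sketch that draws the clusters on the first two principal components instead, using simulated data in place of the real matrix (the name `pcs` is mine):

```r
set.seed(1)
# Simulated stand-in for the real data: 1000 rows, 127 columns
x <- as.data.frame(matrix(rnorm(1000 * 127), nrow = 1000, ncol = 127))

km_out <- kmeans(x, centers = 2, nstart = 20)

# Reduce to two dimensions so that a single scatterplot is drawn
pcs <- prcomp(x)$x[, 1:2]
plot(pcs, col = km_out$cluster + 1,
     main = "K-means clustering results with K = 2",
     xlab = "PC1", ylab = "PC2", pch = 20)
```

Plotting any two chosen columns, for example `plot(x[, 1:2], ...)`, would avoid the error in the same way.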
We are doing a study analyzing the expression of certain genes and correlating that with response to chemotherapy. So far I have been manually going through every dataset on the NCBI website and teasing out which ones have "therapy response" or any variation of that as a variable. Is there a more efficient way to do this? Like a query to filter out highthroughput/microarray data that also contains therapy response/pathologic complete response/etc. Any help would be greatly appreciated. Thanks.
Hi.
I have two conditions of my Gene Expression microarray data sets. The control/untreated and the treated sample. Each condition has two replicates. However, the first and second replicates were prepared at different times.
I analysed the data using the two replicates in Agilent GeneSpring. I wanted to produce a volcano plot. I chose the moderated t-test and Benjamini-Hochberg FDR (FC > 2); however, I did not get any significant entities using the corrected P-value. I raised the corrected P-value cut-off as far as 0.98 before I got any entities, and even then there were very few, fewer than 5.
I thought this problem could be due to the batch effect as the replicates were prepared at different times.
And since I'm still new in preparing the samples, the handling was inconsistent.
I believe that the best thing would be to run the replicates again at the same time, but I hope that will be a last resort, since it is costly to repeat the microarray.
I need some suggestion on how to correct this.
Is it fine to report the entities without performing the multiple testing and corrected P-value? How should I resolve the problem of false positives if I don't do the correction?
Besides Agilent GeneSpring, is there any recommendation to perform the analysis (Agilent datasets), maybe by Bioconductor R platform to tackle this problem?
Thank you.
Good day!
I'm trying to carry out co-expression analysis using CEMiTool after limma preparation of microarray results.
The CEMiTool user guide states that one should use unprocessed expression data.
Experimentally, I found that there is no big difference between the resulting co-expression modules if I change the FDR p-value in the topTable function from 0.05 to 0.1.
But there really is a big difference depending on whether I use topTable(... adjust.method = "BH") or "none" before submitting the data to CEMiTool: the genes change their positions in the co-expression modules.
Should I use the Benjamini-Hochberg correction? Or should I not filter the data by p-values or correct them at all?
The values I use are average logFCs, each from 3 replicates.
> source("https://bioconductor.org/biocLite.R")
Bioconductor version 3.8 (BiocInstaller 1.32.1), ?biocLite for help
Warning message:
'BiocInstaller' and 'biocLite()' are deprecated, use the 'BiocManager' CRAN package
instead.
> if (!requireNamespace("BiocManager", quietly = TRUE))
+ install.packages("BiocManager")
> BiocManager::install("BiocInstaller", version = "3.8")
Bioconductor version 3.8 (BiocManager 1.30.4), R 3.5.2 (2018-12-20)
Installing package(s) 'BiocInstaller'
Warning: package ‘BiocInstaller’ is in use and will not be installed
installation path not writeable, unable to update packages: class, codetools
Need Suggestions
I am running an R script that downloads and preprocesses all the available methylation data sets from TCGA. I'm using the Bioconductor package MethylMix for this. However, when I try to process the 450K breast cancer methylation data set (size ~13GB), I get a "Cannot allocate vector of size 12.8 GB" error.
I am running R 3.4.0 on 64-bit x86_64_pc-linux-gnu using my school's computing cluster, and each node has the following properties:
- Dual Socket
- Xeon E5-2690 v3 (Haswell) : 12 cores per socket (24 cores/node), 2.6 GHz
- 64 GB DDR4-2133 (8 x 8GB dual rank x8 DIMMS)
- No local disk
- Hyperthreading Enabled - 48 threads (logical CPUs) per node
so it seems as though there should be enough memory for this operation. The operating system is Linux, so I thought that R will just use all available memory, unlike on Windows? And checking the process memory using ulimit returns "unlimited." I am not sure where the problem lies. My script is a loop that iterates over all cancers available on TCGA, if that makes any difference.
Hi all
I have to design sgRNAs using CRISPRseek and screen the whole genome for off-target analysis. Unfortunately, no BSgenome package for banana is available on the Bioconductor website (https://bioconductor.org/packages/release/bioc/html/BSgenome.html). Can anyone suggest alternatives or other packages? Or else, how can I create a BSgenome package for banana using an R script? I have no idea how to create a new BSgenome package.
Thanks in advance
Our research group works on viral detection in human samples through PCR-based methods. We usually sequence the PCR amplicons on a Sanger-based platform (Applied Biosystems 3500 genetic analyzer) to confirm the specific amplification of viral sequences. When analyzing the resulting electropherograms, it is common to observe degenerate bases (usually Ys and Ws) that do not seem to be generated by errors in the sequencing process, but rather to represent intra-host variability in the viral sequences.
This raised our interest in further investigating these candidate variations and searching for possible active mutational processes. Specifically, we are interested in quantifying the possible influence of APOBEC cytidine deaminases in generating these variations (by searching for mutations in APOBEC-specific recognition sites, namely 5'-TC-3', over random candidate mutations). Is there any software, package or pipeline adapted for this analysis?
I've read about and downloaded the Minor Variant Finder software (MVF, from Applied Biosystems), but it does not seem suited to this question, since it was developed to identify low-frequency human variants and requires the parallel sequencing of a control sequence, and I don't know what that could be in my case.
Thank you!
Hello everyone,
I have to analyse data from an Affymetrix microarray (Human Genome U133 Plus 2.0 Array) with Bioconductor, and it is the first time I am using Bioconductor. I got the .cel files from NCBI GEO but I could not get the chip description file. How can I obtain a CDF?
One more thing: when I check the number of genes in my dataset, R shows that it contains 54,675 genes. However, this number should be between 20,000 and 25,000. So I am wondering whether some of them might be replicates?
Does anyone have any suggestions, or can someone help, please?
Thanks,
Hasan
Hello.
I have 2 large FASTQ files that I need to analyze and compare for differential gene expression in R.
1. How would I go about opening them to see what they look like?
2. What packages can I use to analyze and compare them? I tried Bioconductor, but it does not work because these files are too large.
Thanks for your help in advance!
While working with gcrma I found that the package ‘hgu95av2cdf’ is not available (for R version 3.4.0).
So I would like to know a stable version of R for which all packages from Bioconductor are available
Hi,
I have 33 ligands in total, which were analyzed through SAM, as reported in an article entitled "Analysis of the major patterns of B cell gene expression changes in response to short-term stimulation with 33 single ligands". I selected 10 ligands from these data and want to do additional analysis, but the authors didn't provide the raw data (CEL files), so I downloaded the processed data from ArrayExpress. I reviewed the limma tutorial and want to make sure the downloaded data files are suitable for limma. I need a starting point for the analysis, and I have attached one of the processed data files as an example. Can I use processed data files as input for limma, and which type of analysis can be performed? I look forward to your answers.
Thank you,
Dear All,
I am trying to see which CpG sites (with its associated genes) are involved in particular pathways and diseases, and get an overview of the functions of these genes.
Currently, I have tried to import my dataset (>800k CpG sites total) which shows the following: 1) each CpG site as the ID, 2) p-value, 3) q-value, 4) fold change and 5) difference. My data sets are quite large with >200,000 CpG sites (the row limit of IPA) - is there a way to import a file this large?
I have also tried importing a file with more specific CpG sites of around 1000 CpG sites but it is not being mapped properly by IPA as I have 0 mapped sites due to errors or possibly I am using the wrong template (i.e. not expression data)?
I think the errors are coming from my formatting in my excel file to IPA, where either the headings are incorrect and the way I am assigning each header/observation is incorrect i.e. I think I set my Identifier as Illumina (which is what I used to get my CpG methylation data), but I do not know what other options I can choose instead of this. IPA also showed errors first with 'no IDs matched to particular genes',and then with 'removing fold change between 1 and -1'.
In summary, I would really appreciate any tips/guidance with uploading CpG methylation data into IPA.
Thank you very much.
I would like to integrate mRNA expression data (microarray) with a PPI network. I tried to use the Bioconductor package "STRINGdb", but when I tried to get the network I got an error saying that it only supports 200 nodes.
Is there another method to integrate the data?
Thanks in advance
Hello friends
I am doing microarray data analysis (HGU133plus2). I got the expression matrix using gcrma, but some probes represent multiple genes, as shown below. How should these be treated? Also, some probes are not matched and show NA; can I delete them? I will then take this file into a WGCNA analysis. Please share your knowledge.
221251_x_at   1   221251_x_at   INO80B /// INO80B-WBP1            NA
65133_i_at    1   65133_i_at    INO80B /// INO80B-WBP1            NA
223072_s_at   1   223072_s_at   INO80B /// INO80B-WBP1 /// WBP1   NA
1559716_at    1   1559716_at    INO80C                            INO80C
229582_at     1   229582_at     INO80C                            INO80C
220165_at     1   220165_at     INO80D                            INO80D
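One common convention, sketched here in base R and not the only defensible choice, is to drop probes with no symbol and to set aside probes whose annotation lists several genes separated by `///`. The data frame `annot`, its column names `probe` and `symbol`, and the probe `hypothetical_at` are assumptions made up for the illustration.

```r
annot <- data.frame(
  probe  = c("221251_x_at", "223072_s_at", "1559716_at",
             "220165_at", "hypothetical_at"),
  symbol = c("INO80B /// INO80B-WBP1", "INO80B /// INO80B-WBP1 /// WBP1",
             "INO80C", "INO80D", NA),
  stringsAsFactors = FALSE
)

# TRUE where the annotation names more than one gene
multi <- grepl("///", annot$symbol, fixed = TRUE)

# Keep probes that map to exactly one gene symbol
unique_probes <- annot[!is.na(annot$symbol) & !multi, ]
print(unique_probes)
```

The expression matrix can then be subset to `unique_probes$probe` before it is passed on to the network analysis.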
I am trying to analyze GEO data with Bioconductor in R. I have installed the required packages and downloaded the dataset. But when I load simpleaffy using the library function, I get the following message: "Loading required package: genefilter
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called ‘digest’
Error: package ‘genefilter’ could not be loaded"
Also, I am unable to read the GEO data using "read.affy". I am getting "Error: could not find function "read.affy"".
How can I fix these errors?
I performed the Cell Cycle Control Phospho Antibody Array (http://www.fullmoonbio.com/product/cell-cycle-control-phospho-antibody-array/) with 7 control and 7 treatment samples. To identify the signal intensities I used GenePix Pro 7 and created .GPR files.
How do I continue with my statistics? I want to normalize the data and calculate z-scores or SAM. I can normalize the data in Excel, but I am sure there is a more convenient way to proceed. I read about the program Prospector from Invitrogen and the protMAT website, but Prospector is not working with my .GPR files.
I am new to protein array and microarray research and would be very happy for any suggestions.
Thank you so much!
I am trying to develop a classification model using RNA-Seq gene expression data. Two independent models were developed successfully using RSEM and RPKM values. However, I was wondering if a transfer learning approach can be used to develop a more general model. I am also wondering if such approach would be useful for extracting a biologically relevant learning rule.
We have done a miRNA microarray using the Agilent Human miRNA Microarray Kit Ver. 3.0 (Cat No: AGT-G4470C). I have .gpr files of my samples but I could not analyze their miRNA profile in GeneSpring. How can I analyze them in GeneSpring?
Is it possible to extract the data from the GSE (or GPL) file from getGEO for this analysis as well?
I am very new to Bioconductor and R, so please keep this in mind when answering. For my experiment I first have to run an in silico analysis on several gene expression datasets from GEO and ArrayExpress. I want to check for biologically meaningful differential expression between samples (which are gene expression profiles from 3 different cell types), and then, to visualize the data more explicitly, I want to do gene enrichment and also interpret it as a pathway. How can I do so? Can someone please walk me through the steps and the software or packages I would need for this in silico analysis?
I want to get Entrez IDs for Affymetrix probe sets (hgu133a) to map them on genome-scale metabolic models (GSMMs). Generally, GSMMs use Entrez gene IDs; therefore, to integrate gene expression profiles with these reconstructions, ID conversion between probe sets in respective Affy platform and Entrez IDs is required. The problem with this conversion in Bioconductor is the presence of multiple mapping between identifiers. A simple example would be:
> select(hgu133a.db, c("200080_s_at"), c("SYMBOL","ENTREZID", "GENENAME"))
'select()' returned 1:many mapping between keys and columns
PROBEID SYMBOL ENTREZID GENENAME
1 200080_s_at H3F3A 3020 H3 histone, family 3A
2 200080_s_at H3F3B 3021 H3 histone, family 3B (H3.3B)
3 200080_s_at H3F3AP4 440926 H3 histone, family 3A, pseudogene 4
The question is, what is the choice here? 3020, 3021 or 440926?
One should notice that the resulting Entrez IDs will be used for GPR (gene-protein-reaction) purpose; therefore, the expression level of all these three Entrez IDs is the same.
Thanks in advance for sharing your thoughts.
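Since the three Entrez IDs would receive the same expression value anyway, one option is to keep every mapping and copy the probe-level value to each gene. This is an assumption about what the GPR rules need, not a recommendation taken from the annotation package. A base-R sketch using the mapping shown above, typed in by hand, with a made-up expression value:

```r
# Mapping as returned above (no annotation package needed here)
map <- data.frame(
  PROBEID  = "200080_s_at",
  ENTREZID = c("3020", "3021", "440926"),
  stringsAsFactors = FALSE
)

# Made-up probe-level expression value
expr <- c("200080_s_at" = 9.7)

# Every Entrez ID inherits the value of the probe it maps to
map$expression <- unname(expr[map$PROBEID])
print(map)
```

Whether the pseudogene entry (440926) should be kept depends on whether it appears in the metabolic model at all; IDs absent from the model can simply be dropped after this step.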
I understand the practical details but is there anyone using Bioconductor programs to do the full gene sequencing pathway from alignment to variant calling.
Hello,
I'm studying the detection of differentially expressed genes (DEGs) using disease vs. healthy microarray data. I use limma in Bioconductor to analyze the DEGs. I noticed that some DEGs are both up- and down-regulated. For example, the ARAP2 gene was up-regulated in 2 probe sets but down-regulated in 3 probe sets within one dataset. How can this situation occur at the transcriptome level? Is this gene really up- or down-regulated? How do we explain genes that are both up- and down-regulated in the same dataset?
> edesign
Time Replicate Control hypoxic
Array1 3 1 1 2
Array2 2 1 2 1
Array3 2 1 1 2
Array4 1 2 2 1
Array5 1 2 1 2
Array6 3 2 2 1
Array7 3 2 1 2
Array8 2 2 2 1
Array9 2 2 1 2
Array10 1 3 2 1
Array11 1 3 1 2
Array12 3 3 2 1
Array13 3 3 1 2
Array14 2 3 2 1
Array15 2 3 1 2
Array16 1 1 2 1
Array17 1 1 1 2
Array18 3 1 2 1
fit <- p.vector(eset, design, Q = 0.05, MT.adjust = "BH", min.obs = 20)
Error in dat[, as.character(rownames(dis))] : subscript out of bounds
I have taken E-GEOD-35819 dataset from http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-35819/samples/
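The error message shows the data columns being indexed by the row names of the design, so one thing worth checking (a guess, since the actual objects are not shown) is whether names such as `Array1` exist as column names of the expression matrix. A base-R sketch with simulated stand-ins; `expr_mat` and the `GSM` column names are hypothetical:

```r
# Simulated stand-ins: an expression matrix and the experimental design
expr_mat <- matrix(rnorm(5 * 18), nrow = 5,
                   dimnames = list(paste0("gene", 1:5), paste0("GSM", 1:18)))
edesign <- matrix(0, nrow = 18, ncol = 4,
                  dimnames = list(paste0("Array", 1:18),
                                  c("Time", "Replicate", "Control", "hypoxic")))

# Row names of the design that have no matching column in the data
missing_cols <- setdiff(rownames(edesign), colnames(expr_mat))
length(missing_cols)   # anything above 0 would explain the error

# One possible fix, valid only if the columns are already in design order
colnames(expr_mat) <- rownames(edesign)
```

It is also worth confirming that the first argument is a plain matrix or data frame of expression values, and that `min.obs = 20` is intended when only 18 arrays are listed.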
I have a data frame in which the first column contains gene symbols and the other columns contain expression values. The symbol column can contain the same symbol more than once.
For each set of rows with the same symbol, I would like to calculate the average (or median) of the rows, so that I end up with a single row per gene.
Thank you in advance
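A minimal base-R sketch of this with `aggregate()`, assuming the symbol column is called `symbol` and every other column is numeric. The data frame and its values are made up.

```r
df <- data.frame(
  symbol  = c("TP53", "TP53", "BRCA1", "MYC", "MYC", "MYC"),
  sample1 = c(1, 3, 5, 2, 4, 6),
  sample2 = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)

# Mean of every expression column within each gene symbol
avg <- aggregate(. ~ symbol, data = df, FUN = mean)
print(avg)

# Same call with the median instead of the mean
med <- aggregate(. ~ symbol, data = df, FUN = median)
print(med)
```

Note that the formula interface of `aggregate()` drops rows containing `NA` by default, so missing values should be handled before this step if they matter.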
Dear All,
I would like to analyze several datasets from GEO. These are cancer datasets involving genes as well as miRNAs. However, the approaches to analyzing differential expression are very different and sometimes unclear. I decided to try the method used in one paper. The authors wrote that they calculated the absolute log2FC using limma (Bioconductor), although the FC calculated by limma is not absolute (or is it?). In addition, I have not found any information anywhere on how to do this, so is there something wrong with the paper, or am I doing something wrong? I appreciate any suggestion.
Second thing is that in my work I would like to evaluate the differential expression of genes from few platforms (I mean, integrate the mRNA and miRNA data from e.g. Affymetrix and Agilent arrays). What would be the best method for array normalization?
Thanks in advance!
I have control samples (6 biological replicates, each of which in turn has a technical replicate), and I need to include all of them (12 samples) in the contrast analysis.
Targets file:
Sample Block Treatment
Control11 1 Control
Control12 2 Control
Control13 3 Control
Control14 4 Control
Control15 5 Control
Control16 6 Control
Control21 1 Control
Control22 2 Control
Control23 3 Control
Control24 4 Control
Control25 5 Control
Control26 6 Control
Treat1 1 treatment
Treat2 2 treatment
Treat3 3 treatment
Treat4 4 treatment
Treat5 5 treatment
Treat6 6 treatment
In limma, when normalizing gene expression data, an offset is used in background correction and quantile normalization is used between arrays. How does this work, and what does setting different offset values, such as 16 or 50, mean? It would be easier for me to understand in words rather than in equations.
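In words: the offset is a constant added to every background-corrected intensity before taking logs. For weak spots the constant dominates, so their noisy log-ratios are pulled towards zero; for bright spots it barely matters. A base-R sketch with made-up intensities (no limma function is called; the helper `log2_diff` is mine, and 16 and 50 are the values from the question):

```r
# Made-up background-corrected intensities of one weak spot on two arrays
low  <- c(2, 8)
# and of one bright spot with the same fourfold ratio
high <- c(2000, 8000)

# log2 ratio between the two arrays after adding the offset
log2_diff <- function(x, offset) diff(log2(x + offset))

for (offset in c(0, 16, 50)) {
  cat("offset", offset,
      "| weak spot:",   round(log2_diff(low, offset), 2),
      "| bright spot:", round(log2_diff(high, offset), 2), "\n")
}
```

A larger offset therefore means stronger damping of low-intensity variability, at the price of also shrinking genuine changes in weakly expressed genes.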
Could anyone please tell me about the right choice of Bioconductor annotation package for Affymetrix porcine gene 1.0 ST array data set?
Hi,
I have calculated RPKM values from raw RNA-seq (NGS) data. I also know that a lot of people analyze such data with R and Bioconductor packages, but I do not know the detailed processing steps or the R code. Could anyone provide a detailed R program for processing the RPKM values? Thank you very much!
Hi everyone,
I am an experimentalist trying to get introduced to some basic bioinformatics. I've recently started to use R and Bioconductor, but I am still quite lost.
I would like to use these new skills to convert a GenBank file (.gbk) downloaded from NCBI to a multi-FASTA file containing the nucleotide sequences of all genes (.ffn), so it would be great if anyone could give me the exact script I need to:
1) Import the .gbk file
2) Transform .gbk to .ffn
3) Export .ffn to a tab file (so I can open it using TextEdit)
I know this is very basic, but thank you in advance.
Cristina
Hello,
I was wondering if the following approach is correct:
- I have a predefined list of the Ensembl gene IDs (n=28) and I want to perform Gene Ontology using topGO in R.
- I don't need to use expression values, but I do need to set a universe of genes. For that I chose all gene IDs available in Ensembl (n=64769)
If needed, the code can be provided.
Thanks!
I have an experiment with multiple time points and I have a table of enriched GO terms for each time. After wondering and discussing with colleagues, we couldn't reach an agreement on how to best represent such information.
I'm just beginning to wade into the world that is R. I'm currently having some very basic problems. The problem is so basic that I cannot find any examples out there. I don't have any problem getting my data into R, but what I want to know is how to best group the data prior to importing.
Please refer to the spreadsheet for details. Should I use option_1 (sheet 1) or option_2 (sheet 2) as the format for my data? Does it matter? Will this affect what I can do for my analysis?
What I ultimately want to do is compare the data (ANOVA) from the SC animals to the SD animals. There are some data points missing for some of the data as you can see from the file. I want to be able to compare the data using either the protein column or the peptide column. Do I need the unique_ID column?
The actual data has more samples and more data points.
Any and all suggestions are greatly appreciated.
I'm reprocessing a previously processed microarray gene expression dataset from NCBI GEO, using bioconductor packages. This is Illumina microarray chip. The data provided at GEO is non-normalized but it contains a lot of negative values. Probably it is caused by previous background subtraction. Is there any way to convert/transform those negatives and use them in further analyses? Or should I exclude them?
I am trying to use the oligo package from Bioconductor to analyze my latest Affymetrix array, a MoGene 2.0. I am encountering problems building the pd.mogene.2.0.st.v1 package using pdInfoBuilder. I would like to know if anyone has a working script.
I've used the "AgiMicroRna" package from Bioconductor in R to analyze my miRNA microarray data. The data analysis went fine, and arriving at differential expression was smooth using Pedro Lopez's guide to the AgiMicroRna package. The hurdle lies in the next steps: gene annotation, GO and pathway enrichment (KEGG), interactome analysis, etc. I'd really appreciate any input on this. Has anybody done this before? Could you share your scripts with me?
CRAN and BioConductor are full of very exciting applications. Therefore it can be hard to get visibility amongst the mass. Furthermore it is not always sufficient to refer to a package available on R/BioConductor when this package has not been peer-reviewed.
Publishing software related to an established method can get tricky if you aim for visibility. High impact factor journals are unlikely to accept and those who do are not necessarily visible enough.
So what are the "best" journals to publish such methods?
Using Geneplotter R package, there is a function named plotMA (http://www.bioconductor.org/packages/2.13/bioc/manuals/geneplotter/man/geneplotter.pdf). To get the plot, your object (data.frame) needs at least three columns, the first containing the mean expression values (for the x-axis), the second one is logarithmic fold change (for the-y axis) and the third is a logical vector indicating significance (for the coloring of the dots). I have attached my file. I uploaded the file via Rcmdr with the name of my data-set as dat1 and then input the following command:
plotMA(dat1, ylim = NULL, colNonSig = "gray32", colSig = "red3", colLine = "#ff000080", log = "x", cex=0.45, xlab="mean expression", ylab="log fold change")
However, it gave me the following error:
Error in .local(object, ...) :
When called with a data.frame, plotMA expects the data frame to have 3 columns, two numeric ones for mean and log fold change, and a logical one for significance.
I tried many things, without any luck. Any suggestions?
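The error text points at the column count and types, so a first check is what R thinks each column is. A common cause, assumed here because the attached file is not visible, is that the significance column was imported as text or that extra columns (such as gene names) are present. A base-R sketch with a made-up `dat1`; the column names `mean`, `lfc` and `sig` are hypothetical:

```r
# Made-up stand-in for the imported table; "sig" arrived as text
dat1 <- data.frame(
  mean = c(10.5, 250.0, 3.2),
  lfc  = c(0.1, -2.3, 1.8),
  sig  = c("FALSE", "TRUE", "TRUE"),
  stringsAsFactors = FALSE
)

sapply(dat1, class)   # shows which columns are not numeric/logical

# Keep exactly three columns in the expected order and coerce the types
dat1 <- data.frame(
  mean = as.numeric(dat1$mean),
  lfc  = as.numeric(dat1$lfc),
  sig  = as.logical(dat1$sig)
)
sapply(dat1, class)
```

After this, the data frame has two numeric columns followed by one logical column, which is the layout the error message asks for.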
Can you direct me to an open source program or good tutorials in R or matlab that can do/infer copy number variations from microarray data?
In R, what is your favourite approach to cluster genes by their expression profiles? There is a myriad approaches and tools all over: standard clustering, specialized tools, tools like Aracne. I often use a linear model to remove the group-wise effects and then apply a clustering using abs(cor) as the distance metric.
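For concreteness, the correlation-based approach mentioned above can be sketched in base R as follows. This assumes that "abs(cor) as the distance metric" means the dissimilarity 1 - |r|, which is my reading and not something stated in the post; the data are simulated and the choice of average linkage and k = 4 is arbitrary.

```r
set.seed(1)
# Simulated expression matrix: 20 genes (rows) x 10 samples (columns)
expr <- matrix(rnorm(20 * 10), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("s", 1:10)))

# Gene-gene correlation; 1 - |r| is small for strongly correlated
# or strongly anti-correlated genes
d <- as.dist(1 - abs(cor(t(expr))))

hc <- hclust(d, method = "average")
clusters <- cutree(hc, k = 4)
table(clusters)
```

Using the absolute value groups anti-correlated genes together; dropping `abs()` keeps them apart, which may be preferable when the direction of co-regulation matters.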
To my knowledge there are at least 11 different methods available (http://www.biomedcentral.com/1471-2105/14/91/). What tests do you prefer and for which kind of data/conditions?
I am unable to install biocLite.R in my system. I am using R 2.15.7 on Windows 7. I used the following command:
source("http://bioconductor.org/biocLite.R")
but it's showing this error message:
Error in file(filename, "r", encoding = encoding) :
cannot open the connection
In addition: Warning message:
In file(filename, "r", encoding = encoding) :
unable to resolve 'bioconductor.org'
I have already set proxy for R using http_proxy="address:port"; and http_proxy_user="username:password". I don't know what is the problem.
Hi All
I am completely new to R and I am currently working on a project using it. I would like to know how to normalize gene expression data in R.
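The right method depends on the platform, but as one illustration, quantile normalization of log2 intensities can be sketched in a few lines of base R. The function name `quantile_normalize` and the simulated matrix are mine; in practice a package implementation such as limma's `normalizeBetweenArrays()` would normally be used instead, and its handling of ties may differ from this sketch.

```r
# Force every sample (column) to share the same distribution of values
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each sample
  sorted <- apply(x, 2, sort)                         # sort each sample
  ref    <- rowMeans(sorted)                          # mean of each quantile
  out    <- apply(ranks, 2, function(r) ref[r])       # map ranks back to means
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
# Simulated raw intensities: 10 genes (rows) x 5 samples (columns)
expr <- matrix(2^rnorm(50, mean = 8), nrow = 10,
               dimnames = list(paste0("gene", 1:10), paste0("s", 1:5)))

norm <- quantile_normalize(log2(expr))
boxplot(norm, main = "After quantile normalization")
```

After normalization every column contains the same set of values, only assigned to different genes, which is why the boxplots line up exactly.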