Questions related to Blast2GO
I'm working with the transcriptome of a non-model organism. I want to compare a subset of the transcripts with the whole transcriptome to see whether any GO terms are enriched. All transcripts have already been annotated in Blast2GO, so I simply need a program that runs Fisher's exact test for every GO term in my genes-of-interest list (I can do the FDR correction myself afterwards). I tried topGO and got very strange results: some enriched GO terms didn't even appear in my genes-of-interest list. So now I'm trying BiNGO in Cytoscape.
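For reference, the per-term test described above is a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution), with the whole transcriptome as the reference set. Below is a minimal, standard-library-only Python sketch, assuming the annotations are held as a dict of transcript ID to GO IDs (e.g. parsed from a Blast2GO export). It uses terms exactly as annotated and does not propagate them to parent GO terms. topGO works on the GO graph, where parent terms inherit annotations, so the two approaches will not give identical term lists.

```python
from math import comb


def fisher_upper_tail(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: genes of interest annotated with the GO term
    n: number of genes of interest
    K: reference (whole transcriptome) genes annotated with the term
    N: number of reference genes
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def go_enrichment(annotations, genes_of_interest):
    """annotations: dict of gene ID -> set of GO IDs for every annotated transcript.
    genes_of_interest: iterable of gene IDs (a subset of the annotated transcripts).
    Returns {GO ID: p-value} for each term seen in the genes of interest.
    Terms are used as annotated; there is no propagation to parent GO terms."""
    study = set(genes_of_interest) & set(annotations)
    N, n = len(annotations), len(study)
    ref_counts, study_counts = {}, {}
    for gene, terms in annotations.items():
        for go in set(terms):
            ref_counts[go] = ref_counts.get(go, 0) + 1
            if gene in study:
                study_counts[go] = study_counts.get(go, 0) + 1
    return {go: fisher_upper_tail(k, n, ref_counts[go], N)
            for go, k in study_counts.items()}
```

The resulting dict of raw p-values can then go straight into whatever FDR procedure you prefer.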
I followed the manual for customization (https://www.psb.ugent.be/cbd/papers/BiNGO/Customize.html) as well as this post (https://infoplatter.wordpress.com/2014/04/03/gene-ontology-go-enrichment-analysis-in-novel-transcriptomes-using-bingo/), but the output file was 0 bytes. In fact, when I click "Start BiNGO", no progress indicator appears as the manual says it should, although the Cytoscape log indicates that the task finished.
Here is what my annotation file looks like:
(species=laupala hybrid)(type=Biological Process)(curator=GO)
I'm wondering if anyone has experience getting BiNGO to work with a custom annotation, or can suggest other programs to use. I attached my BiNGO settings and the gene association file here. Many thanks!
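One thing worth ruling out is the gene association file itself. As I understand the customization page linked above, the file starts with the header line shown, followed by one "gene ID = GO ID" pair per line, with the numeric part of the GO ID (no "GO:" prefix). The gene IDs also need to match the names you paste into BiNGO. Below is a hedged Python sketch that converts tab-separated Blast2GO-style annotation lines (sequence ID, GO ID, optional extra columns) into that layout. The exact BiNGO format here is an assumption to check against the manual, and the file names in the comments are placeholders.

```python
import csv


def annot_to_bingo(annot_lines, species, ontology="Biological Process"):
    """Convert tab-separated lines 'sequence ID <tab> GO:XXXXXXX [...]' into
    BiNGO custom-annotation lines. Assumes BiNGO wants 'ID = XXXXXXX'
    (GO: prefix dropped), preceded by the (species=...)(type=...) header."""
    out = [f"(species={species})(type={ontology})(curator=GO)"]
    seen = set()
    for row in csv.reader(annot_lines, delimiter="\t"):
        # Skip blank lines and non-GO entries (e.g. EC codes in the same export).
        if len(row) < 2 or not row[1].strip().startswith("GO:"):
            continue
        pair = (row[0].strip(), row[1].strip()[3:])
        if pair not in seen:  # drop duplicate gene-term pairs
            seen.add(pair)
            out.append(f"{pair[0]} = {pair[1]}")
    return out

# Example use (file names are placeholders):
# with open("laupala.annot") as src, open("laupala_bingo.txt", "w") as dst:
#     dst.write("\n".join(annot_to_bingo(src, "laupala hybrid")) + "\n")
```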
I feel this must be something simple I'm not getting. I'm trying to obtain GO terms for genes predicted by MAKER. I ran BLAST+ locally against the NCBI nr database (Animalia subset) and imported the XML file into Blast2GO. The BLAST results show up fine, but the "Mapping" button is grey. Does anyone know why? Thanks a lot!
I would like to find out which KEGG pathways and/or gene ontology (GO) terms are present in a microbial community from shotgun metagenomic data. I understand this can be done without assembling the reads first, and I would prefer such an approach, although advice on assembled scaffolds is also welcome. The environment I am studying (plant aerial surfaces) is not well characterized, i.e., few reference genomes are available.
What I have:
- Shotgun metagenomic reads (Illumina, 2x150 bp) from three biologically independent samples
- Access to a large computer cluster, although online methods might be preferable, as software installation is complicated.
What I need:
- A list of KEGG ids present (maybe above a threshold of abundance) in the samples.
- A list of gene ontologies present (maybe above a threshold of abundance) in the samples.
What for? To know what metabolic capabilities are present in the microbial communities.
What I have considered:
- GhostKOALA. Advantages: Best option so far. Well integrated with KEGG. Disadvantages: I would have to submit a subsample of my reads due to file-size limits (not too bad, right?). Only KEGG, no GO (I can live with that).
- Blast2GO. Advantages: Seems to be just what I need for GO. Disadvantages: Very expensive. Only a one-week free trial.
- KAAS. Advantages: Easy to use. Can upload clean FASTA files directly. Disadvantages: It only searches against a small number of reference organisms.
- BLAST against nr or another database (Swiss-Prot?). Advantages: Searches a very large and diverse reference database. Disadvantages: Collecting and parsing the results seems very difficult, and would give a messy collection of reference sequence names.
So far, my plan is this: clean the reads with Trimmomatic, merge forward and reverse read pairs with vsearch, translate them to amino acids in all three possible reading frames, discard all reads with a stop codon, take a random subsample of 1 to 3 million reads, and submit each biological sample separately to GhostKOALA. Does this sound right? Are there better options I have not considered?
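Below is a minimal sketch of the translation, stop-codon filter and subsampling steps of this plan, using only the Python standard library and the standard genetic code. Merged reads can come from either strand of the DNA, so the sketch checks six frames by default. Setting both_strands=False reproduces the three-frame version described above. Reservoir sampling draws a uniform random subsample in one pass, without holding all reads in memory. The sample size and seed are placeholders.

```python
import random

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate_frame(seq, offset):
    """Translate seq starting at offset 0, 1 or 2; codons containing N or other
    ambiguity codes become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(offset, len(seq) - 2, 3))


def open_frames(read, both_strands=True):
    """Return (frame, peptide) pairs whose translation has no stop codon.
    An empty list means the read would be discarded."""
    read = read.upper()
    strands = [("+", read)]
    if both_strands:
        strands.append(("-", read.translate(COMPLEMENT)[::-1]))
    frames = []
    for sign, strand in strands:
        for offset in range(3):
            peptide = translate_frame(strand, offset)
            if "*" not in peptide:
                frames.append((f"{sign}{offset + 1}", peptide))
    return frames


def reservoir_sample(items, k, seed=None):
    """Uniform random sample of k items from a stream of unknown length
    (reservoir sampling, Algorithm R), in a single pass."""
    rng = random.Random(seed)
    sample = []
    for n, item in enumerate(items):
        if n < k:
            sample.append(item)
        else:
            j = rng.randint(0, n)  # inclusive range: item kept with probability k/(n+1)
            if j < k:
                sample[j] = item
    return sample
```

A read that has more than one stop-free frame yields several peptides. Whether to keep all of them or only one is a choice to make before submitting.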
Thanks for your time.
I have been asked to find the genetic basis of two traits of Moloch horridus: its ability to drink water through the skin and its colour change. There was no genomic information for this species, so we sequenced and assembled its genome, and annotated it structurally (using ab initio and transcriptome-based approaches) and functionally with GO terms using Blast2GO.
However, we now have to use comparative genomics to identify the genes. We thought of using 1:1 orthologues because most projects of this kind use them, but if we are comparing closely related species that do not share these traits, I don't see the point in looking for them.
I also have doubts about studying the expansion or contraction of gene families, and about the use of a phylogenetic tree. Finally, regarding GO term enrichment, I would like to know why it is useful. Thank you so much.
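One common and simple way to get candidate 1:1 orthologues between two proteomes is reciprocal best BLAST hits (RBH). The sketch below assumes two all-vs-all BLAST searches (species A vs B and B vs A) saved in the default tabular format (-outfmt 6, bit score in column 12). Dedicated orthology-inference tools handle gene duplications much better, so treat this only as an illustration of the idea.

```python
def best_hits(blast_tab_lines):
    """Best hit per query from BLAST tabular output (-outfmt 6), chosen by the
    highest bit score (column 12). On ties, the first hit seen is kept."""
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in species B and a is b's best hit in A."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)
```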
I am using Blast2GO, and it needs the input data in FASTA format. I have my gene sequence data as a text file.
Could anyone recommend software I can use for this conversion? For information, I am using Windows 10.
Thanks a lot
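FASTA is plain text: a header line starting with ">" followed by the sequence lines, so no special software is strictly needed. Below is a minimal Python sketch (Python runs on Windows 10) that turns a text file containing one raw sequence into a single FASTA record. It assumes the file holds only the sequence: any letters in it, including a gene name, would be kept as sequence. The header and file names are placeholders.

```python
def text_to_fasta(raw_text, seq_id="my_gene", width=60):
    """Turn one raw sequence into a FASTA record.

    Removes spaces, line breaks, digits and punctuation (keeps letters only),
    upper-cases the sequence and wraps it at `width` characters.
    seq_id is a placeholder header; replace it with your own gene name.
    """
    seq = "".join(ch for ch in raw_text.upper() if ch.isalpha())
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + seq_id + "\n" + "\n".join(lines) + "\n"

# Example use (file names are placeholders):
# with open("my_gene.txt") as src, open("my_gene.fasta", "w") as dst:
#     dst.write(text_to_fasta(src.read(), "my_gene"))
```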
I am working on a plant gene family of ~12000 genes. I have protein and CDS sequences in FASTA format, and I want to identify their GO terms and KEGG pathways.
I am using Blast2GO, but it is limited when handling a large number of genes.
Is there any tool or procedure (e.g., a BLAST-based system for GO assignment) that may be helpful for this analysis?
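If the limitation is the number of sequences per run, one workaround is to split the FASTA file into batches, for example to run BLAST on each batch on a cluster and load the results separately. Below is a minimal sketch. The batch size is arbitrary, the output file names in the comments are hypothetical, and whether batching helps depends on where the bottleneck actually is.

```python
def split_fasta(lines, chunk_size):
    """Yield lists of FASTA records, chunk_size records per list.
    Each record is [header, sequence line, ...]; blank lines are skipped."""
    chunk, record = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if record:
                chunk.append(record)
                if len(chunk) == chunk_size:
                    yield chunk
                    chunk = []
            record = [line]
        elif line.strip() and record:
            record.append(line)
    if record:
        chunk.append(record)
    if chunk:
        yield chunk

# Example use (file names are placeholders):
# with open("proteins.fasta") as src:
#     for i, batch in enumerate(split_fasta(src, 1000), start=1):
#         with open(f"batch_{i}.fasta", "w") as dst:
#             dst.write("\n".join(l for rec in batch for l in rec) + "\n")
```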
Hi. I'm using Blast2GO v5.1.13 for my analysis.
I have RNA-Seq data for leaves under well-watered (WW) and droughted (D) conditions at 3 different time points. Differential expression (DE) analysis was performed at each time point, comparing WW vs. D.
For each time point, I generated a list of DE genes (q-value < 0.01). Using each list, I am performing enrichment analysis with Fisher's exact test at 0.01. My reference set is always the same; only the test set changes to one of the gene lists mentioned above.
However, the Application Messages window in Blast2GO shows that every time I start a new enrichment analysis, the Fisher's reference-set size changes, e.g., from 49650 to 48641.
I am not loading a new .b2g file between analyses.
Does anyone have an explanation for this? Hopefully my description is clear enough.
Thanks in advance.
I have a batch of sequence data that needs to be annotated, and I want to try the Prokka annotation pipeline (http://www.vicbioinformatics.com/software.prokka.shtml), but I have no idea how to run it. Any detailed input would be highly appreciated.
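Prokka is a command-line tool for annotating prokaryotic (bacterial, archaeal and viral) genomes. Its basic usage is a single command such as "prokka --outdir out --prefix sample1 contigs.fa". Below is a hedged sketch of the same call wrapped in Python. The file and directory names are placeholders, and Prokka must already be installed (for example via Bioconda). By default, Prokka refuses to write into an existing output directory unless --force is given.

```python
import shutil
import subprocess


def prokka_command(contigs, outdir, prefix, cpus=4):
    """Build a basic Prokka command line (--outdir, --prefix and --cpus options)."""
    return ["prokka", "--outdir", outdir, "--prefix", prefix, "--cpus", str(cpus), contigs]


def run_prokka(contigs, outdir, prefix, cpus=4):
    """Run Prokka on one FASTA file of contigs; raises if Prokka is missing or fails."""
    if shutil.which("prokka") is None:
        raise RuntimeError("prokka is not on PATH; install it first")
    subprocess.run(prokka_command(contigs, outdir, prefix, cpus), check=True)

# Example use (names are placeholders):
# run_prokka("contigs.fa", "sample1_prokka", "sample1", cpus=8)
```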
Can we compare two separate de novo RNA-Seq assemblies from two different plant species?
If yes, which software should be used? Can we use RSEM expression values to compare the expression patterns of annotated genes in both species, or are other tools needed?
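If each assembly has its own RSEM run, one rough way to line the two up is to give genes from both species a shared label (for example a common best BLAST hit, or an orthologue pair) and sum TPM per label. Below is a minimal sketch, assuming RSEM's *.genes.results layout (tab-separated, with a header that includes gene_id and TPM columns). The gene_to_label mapping is hypothetical and must be built separately. TPM values from two different assemblies are not strictly comparable, so treat this as exploratory rather than as a formal cross-species test.

```python
import csv


def tpm_by_annotation(rsem_lines, gene_to_label):
    """Sum RSEM TPM per annotation label.
    rsem_lines: lines of an RSEM *.genes.results file (tab-separated, with header).
    gene_to_label: dict mapping this assembly's gene_id to a label shared across
    species (hypothetical; e.g. built from BLAST hits). Unmapped genes are skipped."""
    totals = {}
    for row in csv.DictReader(rsem_lines, delimiter="\t"):
        label = gene_to_label.get(row["gene_id"])
        if label is not None:
            totals[label] = totals.get(label, 0.0) + float(row["TPM"])
    return totals


def compare_species(tpm_a, tpm_b):
    """Labels present in both species, with (TPM in A, TPM in B)."""
    return {label: (tpm_a[label], tpm_b[label])
            for label in sorted(tpm_a.keys() & tpm_b.keys())}
```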
I want to predict GO annotations for a newly sequenced insect genome. I can do this in Blast2GO, but I am not able to apply the Bonferroni correction.
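The Bonferroni correction applies to the p-values of an enrichment test, not to the annotation step itself. Each p-value is multiplied by the number of GO terms tested, m, and capped at 1, i.e. p_adj = min(1, m * p). If the raw per-term p-values can be exported, here is a minimal sketch, assuming they are held as a dict of GO term to p-value:

```python
def bonferroni(p_values):
    """Bonferroni-adjust {GO term: raw p-value}: p_adj = min(1, m * p),
    where m is the number of terms tested (the size of the dict)."""
    m = len(p_values)
    return {term: min(1.0, p * m) for term, p in p_values.items()}

# Example use: keep terms whose adjusted p-value is below 0.05.
# significant = {t: p for t, p in bonferroni(raw_p).items() if p < 0.05}
```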
What is an effective and user-friendly method for GO and pathway enrichment analysis of RNA-seq data in rice? I have seen tools such as AgriGO, BiNGO and Blast2GO. With little bioinformatics knowledge, which is the easiest tool for enrichment analysis when a differential expression dataset comparing the wild type and the mutant is available?
Let's suppose I have the following transcript assembled from reads generated by RNA-seq:
Now I want to use Blast2GO or Argot2 to assign gene ontology (GO) terms to this transcript. What's the first step? BLASTX!
In this case, BLASTX finds an ORF highly similar to other proteins in reading frame -2, i.e., in the reverse complement of this sequence. Below is a piece of the XML output showing the description of hit 1:
<Hit_def>PREDICTED: UDP-glycosyltransferase 73C3-like [Nicotiana tomentosiformis] >gi|62241063|dbj|BAD93688.1| glucosyltransferase [Nicotiana tabacum]</Hit_def>
Now let's suppose (again!) that this protein (XP_009623864) is associated with the GO term "metabolic process". Based on their algorithms, Blast2GO or Argot2 will transfer this term to my transcript_A, so in the end transcript_A will be associated with metabolic process.
However, is this really OK? This GO term is associated with a protein that is highly similar to an ORF in the REVERSE COMPLEMENT of my transcript. But here we are considering a transcript and, to the best of my knowledge, a transcript is the "final" gene product that ribosomes translate into a protein. In this regard, it makes no sense to consider its reverse complement and, accordingly, no sense to associate it with a GO term transferred from a protein encoded in its reverse complement.
Could someone please let me know if I am right or wrong?
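Here is a small sketch of what "frame -2" refers to. BLASTX translates the query in six frames: three on the sequence as given (+1, +2, +3) and three on its reverse complement (-1, -2, -3). The code assumes the usual NCBI convention that frame -1 starts at the last base of the query, so frame -2 skips one base of the reverse complement. One relevant point: if the RNA-seq library was unstranded, a de novo assembler cannot tell which strand the mRNA came from, so an assembled transcript may be reported in either orientation.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (case preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


def frame_sequence(seq, frame):
    """Nucleotides that a six-frame translation reads in the given frame.

    Frames +1..+3 start at base 1, 2 or 3 of the sequence as given; frames
    -1..-3 start at base 1, 2 or 3 of its reverse complement (assumed NCBI
    convention: frame -1 begins at the last base of the original sequence).
    Only complete codons are kept.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    strand = seq if frame > 0 else reverse_complement(seq)
    start = abs(frame) - 1
    usable = max(0, (len(strand) - start) // 3 * 3)
    return strand[start:start + usable]
```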
I have the following questions:
1) I have raw sequences of differentially expressed ESTs from different libraries. The mRNA of differentially expressed genes was reverse transcribed to cDNA and then to double-stranded cDNA, which was cloned into plasmid vectors in E. coli. I used universal primers (SP6 and T7 as forward and reverse) for insert verification and sequencing. Now I have two sequences for each single-insert clone (EST), and one is expected to be the complementary strand of the other. During homology search (annotation), which sequence (SP6 or T7) should I choose for each EST, and based on what criteria, given that either primer may read the complementary strand of the gene (i.e., not the coding sequence)? In some cases I found homology to different genes for the two strands (one read from each primer). Which one to choose is the difficult question!
2) Is there any software to remove the "deep blue" trash sequences, which I am currently doing manually, so that my sequences are ready for annotation and analysis with a minimum investment of effort and time?
3) Is there any annotation algorithm better than Blast2GO in terms of simplicity and fidelity?
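Because BLASTX translates all six frames, the strand of an SP6 or T7 read does not by itself change which protein it can hit. Below is a hedged sketch of two simple preprocessing steps. The first puts the T7 read on the same strand as the SP6 read (reverse complement) and merges the pair if they overlap exactly. The second trims everything up to a known vector flank. The flank sequence is a placeholder for the vector sequence next to your cloning site. Exact matching ignores sequencing errors, so real data will usually need a tolerant aligner or a dedicated vector-screening tool.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA read (upper-cased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def merge_pair(sp6_read, t7_read, min_overlap=20):
    """Put the T7 read on the SP6 strand and merge the two reads if the end of
    the SP6 read exactly matches the start of the reverse-complemented T7 read.
    Returns the merged sequence, or None if no overlap >= min_overlap is found."""
    a, b = sp6_read.upper(), reverse_complement(t7_read)
    for olen in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-olen:] == b[:olen]:
            return a + b[olen:]
    return None


def trim_after_flank(read, flank):
    """Drop everything up to and including the first exact match of a vector
    flank sequence (a placeholder; use the vector sequence next to your insert)."""
    read = read.upper()
    pos = read.find(flank.upper())
    return read[pos + len(flank):] if pos >= 0 else read
```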
Does anyone know a nice, manageable tool to annotate GO (gene ontology) terms for thousands of ESTs from a non-model organism? Blast2GO is just too slow.
I am trying to set up a local Blast2GO database following the official instructions (http://www.blast2go.com/b2gsupport/resources/35-localb2gdb). I reached step 6, where the database is imported, but the terminal has been stuck at the same point for ~24 h. I know the process is slow and computationally heavy, but could anyone give me an idea of how long this operation takes?
I am using a 32-bit machine with 4 GB RAM running Ubuntu 12.04, and I have downloaded the databases to the local machine.