Blast2GO - Science topic

Questions related to Blast2GO
  • asked a question related to Blast2GO
Question
7 answers
Hi,
I'm working with the transcriptome of a non-model organism. I want to compare a subset of the transcripts with the whole transcriptome to see whether any GO term is enriched. All transcripts have already been annotated in Blast2GO, so I simply need a program that runs Fisher's exact test for every GO term in my genes-of-interest list (I can do the FDR correction myself afterwards). I tried topGO and got very strange results: some enriched GO terms didn't even appear in my genes-of-interest list. So now I'm trying BiNGO in Cytoscape.
I followed the customization manual https://www.psb.ugent.be/cbd/papers/BiNGO/Customize.html as well as this post https://infoplatter.wordpress.com/2014/04/03/gene-ontology-go-enrichment-analysis-in-novel-transcriptomes-using-bingo/ but I only got an output file of 0 bytes. In fact, when I hit "start bingo", no progress indicator appears, although the manual says one should, yet the Cytoscape log indicates that the task finished.
Here is what my annotation file looks like:
(species=laupala hybrid)(type=Biological Process)(curator=GO)
TRINITY_GG_2359_c0_g1_i1=0045805
TRINITY_GG_2359_c0_g1_i1=0035002
TRINITY_GG_2359_c0_g1_i1=0035003
TRINITY_GG_2359_c0_g1_i1=0007593
TRINITY_GG_2359_c0_g1_i1=0048526
TRINITY_GG_2359_c0_g1_i1=0045179
I'm wondering if anyone has experience getting BiNGO to work with a custom annotation, or can suggest other programs to use. I have attached my BiNGO settings and the gene association file. Many thanks!
Relevant answer
Answer
topGO or DAVID.
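For anyone who just wants the plain per-term test described in the question, here is a minimal Python sketch. It assumes the annotation file has the `gene=GOnumber` layout shown above, uses only the standard library, and the function names are made up for illustration. It is a flat test on the annotations exactly as listed, not a reimplementation of BiNGO or topGO.

```python
from collections import defaultdict
from math import comb


def parse_bingo_annotation(lines):
    """Map each gene ID to its set of GO numbers from 'gene=GOnumber' lines."""
    gene_to_go = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skip blanks and the '(species=...)(type=...)(curator=...)' header.
        if not line or line.startswith("("):
            continue
        gene, _, go_number = line.partition("=")
        gene_to_go[gene.strip()].add(go_number.strip())
    return dict(gene_to_go)


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: genes of interest with the term, n: genes of interest in total,
    K: background genes with the term, N: background genes in total.
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def enrichment_table(gene_to_go, genes_of_interest):
    """Return {GO number: (k, K, p)} for terms seen in the genes of interest."""
    background = set(gene_to_go)
    test = background & set(genes_of_interest)
    N, n = len(background), len(test)
    term_to_genes = defaultdict(set)
    for gene, terms in gene_to_go.items():
        for term in terms:
            term_to_genes[term].add(gene)
    results = {}
    for term, genes in term_to_genes.items():
        k = len(genes & test)
        if k == 0:
            continue  # only test terms that occur in the list of interest
        results[term] = (k, len(genes), fisher_enrichment_p(k, n, len(genes), N))
    return results
```

The p-values still need a multiple-testing correction afterwards. If SciPy is available, `scipy.stats.fisher_exact` with `alternative="greater"` should give the same one-sided p-value for each 2x2 table. Note also that topGO works on the GO graph, so genes annotated to a term count towards its ancestor terms too; that is one plausible reason why it reports terms that never appear literally in the gene list.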
  • asked a question related to Blast2GO
Question
1 answer
Hi,
I feel this must be something simple I'm not getting. I'm trying to obtain GO terms for genes predicted by MAKER. I ran BLAST+ locally against the NCBI nr database (Animalia subset) and imported the XML file into Blast2GO. The BLAST results show up fine, but the "mapping" button is greyed out. Does anyone know why? Thanks a lot!
Relevant answer
Answer
Same issue here. Did you manage to find a solution?
  • asked a question related to Blast2GO
Question
7 answers
Hi,
I would like to find out which KEGG pathways and/or gene ontologies (GO) are present in a microbial community from shotgun metagenomic data. I understand it can be done without pre-assembly of the reads, and I would prefer such an approach, although advice with assembled scaffolds is also welcome. The environment I am studying (plant aerial surfaces) is not so well known, i.e., not a lot of genomes are available.
What I have:
- Shotgun metagenomics reads from Illumina (2x150) from three biologically independent samples
- Access to a large computer cluster, although online methods might be preferable, as software installation is complicated.
What I need:
- A list of KEGG ids present (maybe above a threshold of abundance) in the samples.
- A list of gene ontologies present (maybe above a threshold of abundance) in the samples.
What for? To know what metabolic capabilities are present in the microbial communities.
What I have considered:
- GhostKOALA. Advantages: Best option so far. Well integrated with KEGG. Disadvantages: I would have to submit a subsample of my reads because of file-size constraints (not too bad, right?). Only KEGG, no GO (I can live with that).
- Blast2GO. Advantages: Seems to be just what I need for GO. Disadvantages: Very expensive. Only a one-week free trial.
- KAAS. Advantages: Easy to use. Can upload clean FASTA files directly. Disadvantages: It only searches against a small number of reference organisms.
- BLAST against nr or another database (Swiss-Prot?). Advantages: Searches against a really large and diverse reference database. Disadvantages: Collecting and parsing the results seems very difficult and would give a messy collection of reference sequence names.
So far, my plan is this: clean the reads with Trimmomatic, join forward and reverse pairs with VSEARCH, translate to amino acids in all three possible reading frames, discard all reads with a stop codon, take a random subsample of 1 to 3 million reads, and submit each biological sample separately to GhostKOALA. Does this sound right? Are there better options I have not considered?
Thanks for your time.
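The translate, filter and subsample steps of the plan above could be sketched roughly as follows in Python. This assumes the standard genetic code and reads already held in memory as strings; the function names are hypothetical and nothing here is specific to GhostKOALA.

```python
import random

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame):
    """Translate seq from offset frame (0, 1 or 2); unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def stop_free_translations(seq):
    """Return the forward-frame translations that contain no stop codon."""
    peptides = (translate(seq, frame) for frame in range(3))
    return [p for p in peptides if p and "*" not in p]


def subsample(records, n, seed=0):
    """Draw up to n records at random without replacement (seeded for repeatability)."""
    records = list(records)
    return random.Random(seed).sample(records, min(n, len(records)))
```

This only covers the three forward frames, as in the plan. Merged reads can come from either strand, so a six-frame version would also translate the reverse complement of each read.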
Relevant answer
Answer
I think maybe it depends on the goals that you want to reach. If you map to the databases, I am not sure if you will identify something new.
However, I understand what you are saying about the possibility of biasing your analysis; maybe that could be solved by trying different binning software to recover the low-abundance organisms.
I think you already went further with your analysis, and I think you are right; you will be discarding some good reads, but then maybe you can map those.
I don't know if someone has compared both strategies, probably they have; it would be good to see.
  • asked a question related to Blast2GO
Question
5 answers
I have been asked to discover the genetic causes that allow Moloch horridus to drink water through its skin and to change colour. There was no genomic information about this species, so we sequenced it, assembled the genome, and annotated it structurally (using ab initio and transcriptomic approaches) and functionally with GO terms from Blast2GO.
However, we have to use comparative genomics to identify the genes. We thought of using one-to-one orthologues because most projects of this kind use them, but if we are comparing close species that do not share these traits, I don't see the point in looking for them.
Another doubt I have concerns the study of gene family expansion or contraction and the use of a phylogenetic tree. Finally, regarding enrichment of GO terms, I would like to know why it is useful. Thank you so much.
Relevant answer
Answer
Got it. Regrettably, I have not had sufficient time to explore positive selection; I only got as far as installing and running PAML, which is a useful tool for such measurements. Although we tried an analysis, we did not apply sufficient rigor for me to speak knowledgeably, beyond saying that this seems like a useful way to distinguish orthologs.
  • asked a question related to Blast2GO
Question
4 answers
I am using the Blast2GO software, and to input data I need it in FASTQ format. I have my gene sequence data as a text file.
Please recommend software I can use for this purpose. For information, I am using Windows 10.
Thanks a lot
Relevant answer
Answer
You need a quality file to go with the FASTA file in order to convert it to FASTQ, because FASTQ contains a sequencing quality value for each nucleotide. Normally, if your data come from a sequencing service, you should have received the .fastq files or some other files with per-nucleotide quality values.
Once you have found the quality files, try this software. It looks like it is made for Windows, but I can't test it because I have a Linux computer.
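To show why the quality values are needed, here is a small Python sketch of how one FASTQ record is put together from a sequence and its per-base Phred scores. It assumes the common Phred+33 encoding; the function name and the example read are made up.

```python
def fastq_record(name, sequence, qualities, offset=33):
    """Build one four-line FASTQ record from a sequence and per-base Phred scores."""
    if len(sequence) != len(qualities):
        raise ValueError("need exactly one quality value per base")
    # Each Phred score is stored as a single ASCII character (score + offset).
    quality_string = "".join(chr(q + offset) for q in qualities)
    return f"@{name}\n{sequence}\n+\n{quality_string}\n"


print(fastq_record("read_1", "ACGT", [40, 40, 30, 2]), end="")
```

Without real quality values the fourth line can only be filled with made-up scores, so it is better to go back to the original sequencing output if it exists.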
  • asked a question related to Blast2GO
Question
6 answers
I am working on a plant gene family. The family has ~12000 genes. I have Protein and CDS sequences in fasta format. I want to identify their gene ontologies and KEGG pathways.
I am using the Blast2GO tool, but it has limitations for large numbers of genes.
Is there any tool or procedure (a BLAST-based system for GO identification) that may be helpful for this analysis?
Relevant answer
Answer
Hello Athar,
If you still want to use Blast2GO, you can first run your BLAST search remotely (using BLAST+ https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download) and save the results in an XML file, the same format that Blast2GO produces (I think it is -outfmt 5). Then you can load the results file into Blast2GO and get your GO terms.
Alternatively, you can use InterPro http://www.ebi.ac.uk/interpro/
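As a rough illustration of the BLAST+ route, the Python sketch below only assembles the command line for XML output. The file names, database name and default values are placeholders, and the helper function is hypothetical; the flags themselves (`-query`, `-db`, `-out`, `-outfmt`, `-evalue`, `-num_threads`, `-remote`) are standard BLAST+ options.

```python
import shlex


def blast_xml_command(program, query_fasta, database, out_xml,
                      evalue=1e-5, threads=4, remote=False):
    """Return a BLAST+ argument list that writes XML output (-outfmt 5)."""
    cmd = [program, "-query", query_fasta, "-db", database,
           "-out", out_xml, "-outfmt", "5", "-evalue", str(evalue)]
    if remote:
        # Send the search to NCBI; -num_threads cannot be combined with this.
        cmd.append("-remote")
    else:
        cmd += ["-num_threads", str(threads)]
    return cmd


# Placeholder file and database names, for illustration only.
cmd = blast_xml_command("blastp", "proteins.fasta", "nr", "proteins_blast.xml")
print(shlex.join(cmd))
# To actually run it (BLAST+ must be installed and the database available):
# import subprocess; subprocess.run(cmd, check=True)
```

For ~12000 sequences it may help to split the FASTA file into chunks and run one command per chunk, so that a failed job does not cost the whole run.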
  • asked a question related to Blast2GO
Question
2 answers
Hi. I'm using Blast2GO v. 5.1.13 to do my analysis.
I have RNASeq data for leaves under well-watered (WW) and droughted (D) conditions at 3 different time points. Differential expression (DE) was performed at each time point comparing WW x D.
For each time point, I generated a list of DE genes (q-value < 0.01). Using this list, I am performing enrichment analysis with Fisher's at 0.01. My reference set is always the same, only the test set changes to one of the gene lists I mentioned.
However, the B2GO report in the Application Messages window shows that every time I start a new enrichment analysis, the Fisher's Reference-Set Size changes, e.g. from 49650 to 48641.
I am not loading a new .b2g file between analyses.
Does anyone have an explanation for this? Hopefully my explanation was clear enough.
Thanks in advance.
Best wishes,
Renata
Relevant answer
Answer
Thanks, Dibakar. I thought it would only remove the IDs if I ticked the option. I understand now! :)
  • asked a question related to Blast2GO
Question
3 answers
Dear All,
How can I analyse Gene Ontology for BLAST results, such as those from standalone BLAST? Blast2GO now offers only limited analysis in the trial version. Could anyone kindly suggest another tool for GO analysis?
Relevant answer
Answer
Instead of DAVID or WebGestalt, which are becoming quite obsolete, I'd suggest XGR (http://galahad.well.ox.ac.uk:3020/) or FUMA (http://fuma.ctglab.nl/gene2func). The former was developed by the Wellcome Trust Centre for Human Genetics, University of Oxford, the latter by VU University Amsterdam.
  • asked a question related to Blast2GO
Question
5 answers
I have a batch of sequence data that needs to be annotated. I want to try the Prokka annotation pipeline for this (http://www.vicbioinformatics.com/software.prokka.shtml), but I have no idea how to run it. Any detailed input would be highly appreciated.
Thank you,
Regards,
Amal.
Relevant answer
Answer
Hi Amal,
You have a third option. If you don't have enough expertise with Linux, you can use online tools based on a Galaxy server (https://orione.crs4.it/).
Just upload your sequence and annotate it with Prokka.
Best regards,
Fernando
  • asked a question related to Blast2GO
Question
3 answers
Can we compare two separate de novo RNA-Seq assemblies from two different plant species?
If yes, which software should be used? Can we use RSEM expression values to calculate gene-specific expression patterns for annotated genes in both plant species, or do other tools need to be used?
Thanks.
Relevant answer
Answer
Yes, but you have to deal with alternative transcripts in your de novo assembly. You could assemble the transcriptome of each plant with Oases, Trinity or DRAP (https://peerj.com/articles/2988/), then use glutton to find orthologous contigs and merge the counts using the orthology links.
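The "merge the counts using the orthology links" step might look something like this in Python. It assumes you already have per-contig counts (for example from RSEM) and a contig-to-orthogroup table from whatever orthology tool you used; the data layout and function names are assumptions, not the output format of any particular program.

```python
from collections import defaultdict


def merge_counts_by_orthogroup(contig_counts, contig_to_orthogroup):
    """Sum contig-level counts into one value per orthologous group."""
    merged = defaultdict(float)
    for contig, count in contig_counts.items():
        group = contig_to_orthogroup.get(contig)
        if group is not None:  # contigs without an orthology link are dropped
            merged[group] += count
    return dict(merged)


def shared_groups(counts_a, counts_b):
    """Pair the two species' counts for orthogroups present in both."""
    return {g: (counts_a[g], counts_b[g]) for g in counts_a.keys() & counts_b.keys()}
```

Summing over contigs in a group is one simple way to handle alternative transcripts; the merged values would still need normalisation before the two species can be compared.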
  • asked a question related to Blast2GO
Question
1 answer
As far as I know, Phobius doesn't show any kind of threshold when run via Blast2GO. In particular, for the plot of probabilities. How can we apply some kind of threshold then?
Phobius website: http://phobius.sbc.su.se/
Relevant answer
Answer
Hi Bernardo.
Which threshold are you referring to in Phobius?
In Blast2GO, InterProScan is used, and at the moment InterProScan does not support parameters for the individual databases/algorithms.
  • asked a question related to Blast2GO
Question
2 answers
Could someone suggest what the test set should be for performing Fisher's Exact Test in Blast2GO?
Relevant answer
Answer
This really depends on the biological question you want to answer.
A typical basic scenario would be: You have a functionally annotated genome (e.g. with Blast2GO) and identified a list of up-regulated genes under a certain condition.
Your question would be: Is there a certain molecular function overrepresented among these genes?
Technically speaking: You take the GO annotations of the whole genome (reference set) and a list of upregulated genes (test set) and perform a Fisher's Exact Test to find statistically overrepresented functions. The test compares all functions associated with the genes in your list to those of the rest of the genome or the whole genome. Since the test is done for each function (e.g. a GO term) separately, you have to adjust for multiple testing, e.g. via a false discovery rate (FDR). The result is a list of functions with an adjusted p-value below (or close to) 0.05. Blast2GO allows you to further summarize, visualize, condense and export these functions.
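The multiple-testing step can be illustrated with a short Python sketch of the Benjamini-Hochberg procedure, one common way to control the FDR. It is a generic implementation written for this thread, not the code Blast2GO runs internally.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is the
    # minimum over its own rank and all larger ranks (keeps them monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # one made-up p-value per GO term
print(benjamini_hochberg(raw))
```

Terms whose adjusted value falls below the chosen FDR level (0.05 in the answer above) would be reported as overrepresented.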
  • asked a question related to Blast2GO
Question
5 answers
I want to predict the GO annotation of a newly sequenced insect genome. I could do this in Blast2GO, but I am not able to apply the Bonferroni correction.
Relevant answer
Answer
The question is really why one should do this, and how one should reasonably select the family-wise error-rate.
If you have some plan to use the selected candidates for some further experiments and if you were able to guess the expected losses you have from selecting a wrong candidate, then you might be able to value a reasonable error-rate. But I doubt that this is possible.
Any cut-off you "just use" - based on whatever principle - is arbitrary and remains arbitrary and gives you a list of arbitrary length. So I would not invest too much time and effort in thinking about such things. It's arbitrary anyway.
The important result of such an analysis is the order, the ranking of the GO terms in the entire list. The interpretation should be done wisely, based on expert knowledge, and one may find that the top 5 terms are sufficient, or that the top 100 terms are required to get some idea about what is going on (it may happen that one never gets an idea, but that's a different problem...).
For reporting or some further global analysis of "enriched terms" or something similar, it might be a bad idea to control the family-wise error rate. Instead, it is usually better to control the false-discovery rate. But again, it is usually impossible to give a reasonable cut-off (same problem as above: it is arbitrary).
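If the Bonferroni correction is still wanted, it is easy to apply outside Blast2GO to the exported raw p-values. The Python sketch below uses made-up term names and p-values, and also shows the point about ranking: the correction rescales the values but does not change the order of the terms.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def rank_terms(term_p_values):
    """Return GO terms ordered from smallest to largest p-value."""
    return sorted(term_p_values, key=term_p_values.get)


# Hypothetical terms and raw p-values, for illustration only.
raw = {"term_a": 0.01, "term_b": 0.5, "term_c": 0.001}
corrected = dict(zip(raw, bonferroni(list(raw.values()))))
print(rank_terms(raw))
print(rank_terms(corrected))
```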
  • asked a question related to Blast2GO
Question
7 answers
What is an effective and user-friendly method for GO and pathway enrichment analysis of RNA-seq data for rice? I have seen tools such as AgriGO, BiNGO and Blast2GO. For someone with little bioinformatics knowledge, which is an easy tool for enrichment analysis when the differentially expressed dataset is available for the wild type and the mutant?
Relevant answer
Answer
Gene Set Enrichment Analysis (GSEA) is a good tool for performing enrichment analysis. You can also use GOrilla (Gene Ontology enRIchment anaLysis and visuaLizAtion tool), WebGestalt (WEB-based GEne SeT AnaLysis Toolkit), DAVID, etc. I hope these tools help you get good results. Good luck!
  • asked a question related to Blast2GO
Question
11 answers
Hi everyone!
Let's suppose I have the following transcript assembled from reads generated by RNA-seq:
>transcript_A
tccgcaatgagtcaatactccaccaattgcagggtgtgaaagtataagcacttgaggagcccatcctctaatcaaaactcctctttctttaattctttgctcaaatccattctcaagaatccatttctctaaatcatttaatttatcacctcctcctaatacccatacaaaaggcctatttgactcttctaaaccaagaccaagttccaccatttgcaataatgtcaaacgagataaacttccaagacttgcataaaccacagattctgtttcaaaattatctaaccatttcaagcaatcttgattatcaattgcagttttattaccccttgtaaccaaatcttcaatttccttattacacaaagaaacaggaccaacacaccaaacttttttccctctagctttcctatattctttctcatacacttgctccaactcctcaaaactattaacaattacaccatatgatgattcctcggctaatctgatttgctcagtaacttctttcaatacagaagaactaacagaagtagtatttttcgtcgatcctgaaacctgagctttcgttagttcaactctatcgggtaaatcaggaacaacaaaatactctgaatctgaggttatattttcaagaatgttggaggaaagtattttataggaacataaaagtgagaaacaacaagtaccatgaaaaacaattcttgggatattaaaattttgtgcaatttgagtagtccaaggaaatcccatatctgaaataacacaacttggacttggatttattccttctaagagattttcaacttgttgtttcagcatactaattgcagcaaaaaactttgaagccaagtcaagagaaggaagcatg
Now, I want to use Blast2GO or Argot2 to assign gene ontology (GO) terms to this transcript. What's the first step? Blastx!
In this case, blastx finds an ORF highly similar to other proteins in reading frame -2, i.e. in the reverse complement of this sequence. See below a piece of the XML output showing the description of hit 1:
<Hit_num>1</Hit_num>
<Hit_id>gi|697139547|ref|XP_009623864.1|</Hit_id>
<Hit_def>PREDICTED: UDP-glycosyltransferase 73C3-like [Nicotiana tomentosiformis] &gt;gi|62241063|dbj|BAD93688.1| glucosyltransferase [Nicotiana tabacum]</Hit_def>
<Hit_accession>XP_009623864</Hit_accession>
<Hit_len>496</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>562.377</Hsp_bit-score>
<Hsp_score>1448</Hsp_score>
<Hsp_evalue>0</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from>
<Hsp_query-to>867</Hsp_query-to>
<Hsp_hit-from>87</Hsp_hit-from>
<Hsp_hit-to>375</Hsp_hit-to>
<Hsp_query-frame>-2</Hsp_query-frame>
<Hsp_hit-frame>0</Hsp_hit-frame>
<Hsp_identity>289</Hsp_identity>
<Hsp_positive>289</Hsp_positive>
<Hsp_gaps>0</Hsp_gaps>
<Hsp_align-len>289</Hsp_align-len>
Now let's suppose (again!) that this protein (XP_009623864) is associated with the GO term "metabolic process". Blast2GO or Argot2 will, based on their algorithms, transfer this term to my transcript_A and, in the end, my transcript_A will be associated with metabolic process.
However, is this really OK? This GO term is associated with a protein that is highly similar to an ORF in the REVERSE COMPLEMENT of my transcript. But here we are considering a transcript and, to the best of my knowledge, a transcript is the "final" gene product that will be translated into a protein by ribosomes. In this regard, it makes no sense to consider a reverse complement and, accordingly, it makes no sense to associate a GO term transferred from a protein encoded in the reverse complement of this transcript.
Please, I kindly ask someone to let me know if I am right or wrong.
Best regards,
Marcio
Relevant answer
Answer
Hi Marcio,
The transcript that the cell produces has, of course, a defined orientation. We have to differentiate between the actual RNA transcript and a computational reconstruction.
The problem is that with current sequencing techniques it is not possible to sequence a typical transcript in one piece because it is too long, so a fragmentation step is required.
In addition, what is actually sequenced is not the original RNA but synthesized cDNA, and it is in this synthesis step, with the standard protocol, that the directionality is lost. Of course, it is possible to retrieve the proper orientation of the fragments after sequencing, and we have protocols to do so, but the non-stranded protocol is cheaper and is still used when strand information is not required.
Maybe this page can help clarify your doubts: http://rnaseq.uoregon.edu/. Otherwise, I can't think of a reference that explicitly explains this issue. In this publication, http://www.nature.com/nmeth/journal/v7/n9/full/nmeth.1491.html, for instance, the authors just say: "Synthesis of randomly primed double-stranded cDNA followed by addition of adaptors for next-generation sequencing leads to the loss of information about which strand was present in the original mRNA template".
Best,
Juan
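In practice, with a non-stranded library one can use the sign of the blastx query frame to put each assembled contig into its likely coding orientation. A minimal Python sketch, assuming a single well-formed `<Hsp>` element like the one quoted in the question (the function names are hypothetical):

```python
import xml.etree.ElementTree as ET

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def query_frame(hsp_xml):
    """Read the integer in <Hsp_query-frame> from a single <Hsp> element."""
    hsp = ET.fromstring(hsp_xml)
    return int(hsp.findtext("Hsp_query-frame"))


def orient_by_best_hit(seq, frame):
    """Return the sequence in the coding orientation implied by the frame sign."""
    return reverse_complement(seq) if frame < 0 else seq


hsp = "<Hsp><Hsp_num>1</Hsp_num><Hsp_query-frame>-2</Hsp_query-frame></Hsp>"
print(orient_by_best_hit("ATGCCG", query_frame(hsp)))
```

This only makes sense when the library was not strand-specific, as described above; the contig orientation is then arbitrary, and a hit in a negative frame says nothing unusual about the gene.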
  • asked a question related to Blast2GO
Question
7 answers
I have the following questions:
1) I have raw sequences of differentially expressed ESTs from different libraries. The mRNA of differentially expressed genes was reverse transcribed to cDNA and then to double-stranded cDNA, which was cloned separately into plasmid vectors in E. coli. I used universal primers (SP6 and T7 as forward and reverse) for insert verification and sequencing. I now have two sequences for each insert (clone), known as an EST, for all my clones. One is expected to be the complement of the other strand. My question is: during the homology search (annotation), which sequence (SP6 or T7) should I choose for each EST, and based on what criteria, given that either primer may amplify the complementary sequence of the gene rather than the coding sequence? In some cases I found homologies for both strands (amplified by both primers) to different genes. Which one to choose is the difficult question I am raising!
2) Is there any software to clean out the deep blue trash sequences, which I am currently doing manually, so that I can get my sequences ready for annotation and analysis with a minimum investment of energy and time?
3) Is there any annotation algorithm better than Blast2GO in terms of simplicity and fidelity?
Relevant answer
Answer
Pleasure :)
  • asked a question related to Blast2GO
Question
4 answers
Does anyone know a nice and manageable tool to annotate GO (gene ontology) terms for thousands of ESTs from a non-model organism? Blast2GO is just too slow.
Relevant answer
Answer
You can give Argot a try: http://www.medcomp.medicina.unipd.it/Argot2/index.php You'll need to perform the BLAST search yourself, though.
  • asked a question related to Blast2GO
Question
1 answer
I am trying to set up a local Blast2GO database following the official instructions http://www.blast2go.com/b2gsupport/resources/35-localb2gdb. I reached step 6 of the list, where I am trying to import the database, but the terminal window has been stuck at the same point for ~24 h. I know the process is quite slow and computationally heavy, but could anyone give me an indication of the time this operation takes?
I am using a 32-bit machine with 4 GB RAM running Ubuntu 12.04, and I have downloaded the databases to the local machine.
Relevant answer
Answer
I also faced the same problem. I suggest selecting approx. 40-50 at a time; the job will then finish quickly. After saving, start the next 50, and repeat until the end of your project. I hope that this way you can save time and complete the job.
  • asked a question related to Blast2GO
Question
4 answers
I recently analysed a transcriptome and ran blastx against the nr database. Now I want to do GO analysis. Is there any software other than Blast2GO that can do GO analysis?
Relevant answer
Answer
Use this link; it may be simple enough and will help you. I am using the same thing myself.