Question
Looking for a method to filter out data from related BLAST results
I am analyzing expressed sequence tags from a fern species and using them to search (blastn) against the miRNA database on mirbase (both mature and hairpin sequences). To search for non-coding ESTs I am searching (blastx) against the plant database on UniProt. I have completed the initial searches which have yielded quite a bit of data. I have to now filter out all the protein coding ESTs (evidenced by the blastx result) from my blastn result to end up with only non-protein coding ESTs and their respective mature/hairpin alignments.
Does anyone know how to filter data from one excel sheet from another? Or another method.
Does anyone know how to filter data from one excel sheet from another? Or another method.
All Answers (10)
-
I'm not sure if that's gonna help but you can try ASAP, just google it, it should come on the first page, it's free software for excel, has lots of features, you may try it and see if it helps. On other note i might not have understood what you want to do. Do you want to identify ncRNA sequences among all ESTs? If so, why don't you just filter everything against the mirbase? you should be able to detect all ncRNAs, assuming that's what you looking for. -
Hi Yevgeniy
I have done a similar problem few years ago...where I could use excel to arrange the blast results, sort the results based on the blast out put values and filter the non-essentail entries....I actually down loaded the databases for coding sequencing, non-coding sequences and made individual local databases on my PC...and I was performing local blast searches on these databases.....everything I was doing on Excel was on my own and not anyother software or tool.....the best thing was I was able to do filtering of sequences after doing the blast...but the steps were very slow and ....these searches and analysis on excel (if the files are huge) takes a lot of time....I hope better tools will be available now...
bye -
Can u send a sample data with telling where is the info of descimination. I will tell u the formula to do it in excel
-
Send a sample? Unix utilities "Sed, Ed, Awk, Nawk, and Grep", or Excel or WinGrep
-
Hey all,
Thank you for your answers. However, I have found a very simple solution. Using Galaxy (https://main.g2.bx.psu.edu/) tools I can convert the columns of interest using Convert delimiters to TAB, to remove white spaces and then using Join, Subtract, Group -Compare two Datasets tool to find common or distinct rows to display only non-matching alignments. -
I have heard of blast2go and how it is useful for annotations which I my include in my workflow a bit later on. Just to update there is another method using excel LOOKUP formula
For example
Copying the query id column from blastx with %identity>85 into the blastn results column AA, you can create a new column , B and use =lookup(A2,$AA$1:$AA$50,000). Then copy the formula into all B rows adjacent to an A row , until the last A row. Hope that made sense. Once the formula finishes you copy A and B and paste as values into AB/AC to enable further data sorting.
This formula yielded better results than the galaxy compare tool because it is still in excel, no need to use .txt delimited... -
Vlookup in excel is best, specially when i found to search non unique datasets too :)
-
Probably this article will help you.
Http://bib.oxfordjournals.org/content/8/1/6.full.pdf+html