University of Cape Coast
Question
Asked 12 May 2018
How do I edit away these strange nucleotide letters (N, K, Y, B, etc.) from a Sanger sequence DNA read?
Hi there! I want to quality-check some Sanger sequence reads and realized that the reads contain some odd letters (N, K, Y, B, etc.) in addition to the four normal DNA base letters (A, T, G, C). What can I do to edit away the nucleotides represented by these strange letters?
All Answers (7)
University College London
These letters represent multiple possible bases. They can be polymorphisms or mutations, which are important, or just areas of poor sequencing; if there are many Ns, they are telling you not to trust the sequence in that area. Either way they are possibly important, so the best way of dealing with them is either to leave them alone or, if there are too many, to improve your sequencing. Ns at the very beginning of the sequence and at the very end, where the read runs out, are not important.
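As a rough illustration of the point about terminal Ns, here is a minimal Python sketch that strips runs of N from both ends of a read and reports how much of what remains is ambiguous. The function names and the example read are made up for illustration; internal ambiguity codes are deliberately left alone.

```python
def trim_terminal_n(seq):
    """Strip runs of N from both ends of a read; internal codes are untouched."""
    return seq.upper().strip("N")


def ambiguity_fraction(seq):
    """Fraction of symbols in the read that are not plain A, C, G or T."""
    seq = seq.upper()
    if not seq:
        return 0.0
    ambiguous = sum(1 for base in seq if base not in "ACGT")
    return ambiguous / len(seq)


# Hypothetical read with poor-quality ends and two internal ambiguous calls.
read = "NNNNACGTRACGTNACGTNN"
trimmed = trim_terminal_n(read)
print(trimmed)
print(round(ambiguity_fraction(trimmed), 3))
```

A high fraction after trimming would suggest the read needs resequencing rather than editing; where to draw that line is a judgment call and is not fixed by anything in this thread.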
2 Recommendations
These are not strange nucleotides; they represent ambiguous peaks at those positions, or polymorphism. Each of these letters denotes a specific combination of signals and is defined in the IUPAC list of ambiguous nucleotides. You need to check these positions in your sequence electropherograms and confirm them. If you have peaks of comparable intensity, you may retain these letters when submitting your sequences to GenBank; GenBank accepts sequences with IUPAC codes. Follow the link while revising your sequences:
SD
5 Recommendations
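To make the IUPAC codes concrete, the sketch below lists the standard ambiguity codes and reports where they occur in a read, so those positions can be checked against the electropherogram. The helper name `ambiguous_positions` and the example read are illustrative only.

```python
# Standard IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def ambiguous_positions(seq):
    """Return (1-based position, code, possible bases) for each non-ACGT symbol."""
    hits = []
    for pos, base in enumerate(seq.upper(), start=1):
        if base not in "ACGT":
            # Symbols outside the IUPAC table are reported with "?".
            hits.append((pos, base, IUPAC.get(base, "?")))
    return hits


# Hypothetical read: K at position 5, Y at 7, N at 8.
for pos, code, bases in ambiguous_positions("ACGTKAYN"):
    print(pos, code, "/".join(bases))
```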
After going through the IUPAC nucleotide code list, everything now makes sense. It shows that the ambiguous positions could be mutations. Now, the purpose of my analysis is to identify SNPs relative to reference sequences. How do I proceed with all the ambiguous nucleotide positions in almost every sequence?
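One possible way to approach this, sketched in Python under strong assumptions: the read has already been aligned to the reference, both strings have the same length, and there are no gaps. The function name `classify_sites` and the category labels are invented for this example and do not come from any particular tool. Each differing site is sorted by whether the observed code is a plain base, an ambiguity code that includes the reference base, or an uninformative N.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def classify_sites(reference, read):
    """Compare a pre-aligned, gap-free read to its reference, site by site."""
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to equal length")
    calls = []
    pairs = zip(reference.upper(), read.upper())
    for pos, (ref, obs) in enumerate(pairs, start=1):
        possible = IUPAC.get(obs)
        if possible is None or obs == "N":
            label = "no_call"                  # uninformative or unknown symbol
        elif possible == ref:
            continue                           # plain match, nothing to report
        elif len(possible) == 1:
            label = "candidate_snp"            # clean base that differs
        elif ref in possible:
            label = "possible_heterozygous"    # double peak including reference
        else:
            label = "ambiguous_non_reference"  # double peak, neither is reference
        calls.append((pos, ref, obs, label))
    return calls


# Hypothetical aligned pair.
for call in classify_sites("ACGTACGT", "ACGTRCTN"):
    print(call)
```

Sites labelled `possible_heterozygous` or `ambiguous_non_reference` are the ones worth confirming on the electropherogram before being reported as SNPs.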
Vall d’Hebron Institute of Oncology
Hi Nebangwa,
Your sequence is probably heterogeneous: you could have different sequencing products, which the sequencing program can't discriminate between. If this is the case, you need to improve your PCR program in order to obtain a single, homogeneous product. Another option is that you have a poor concentration of DNA, and all the peaks that you see could be noise.
Can you attach an image of the electropherogram of your sequence?
1 Recommendation
University of the Ryukyus
I have a similar question, and I got a lot of help from your question.
1 Recommendation
University of Cape Coast
Ambiguous nucleotide bases are seen when the amplified region yields double peaks. Double peaks may imply that co-amplification of nuclear pseudogenes has occurred, or that the PCR product was prepared from diploid genomic DNA, in which polymorphic regions show both nucleotides simultaneously.
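To illustrate how a double peak can turn into an ambiguity letter, here is a toy Python sketch. It assumes you have the four channel peak heights at one position as a dictionary with keys A, C, G and T; the function name `call_position` and the `min_ratio` cutoff of 0.5 are arbitrary choices for this example, not what any real base caller uses.

```python
# Two-base IUPAC codes, keyed by the unordered pair of bases they represent.
TWO_BASE_CODES = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_position(heights, min_ratio=0.5):
    """Call one position from four peak heights (keys A, C, G, T).

    If the second-highest peak reaches min_ratio of the highest, the position
    is reported as the matching two-base IUPAC code; otherwise the top base.
    """
    ranked = sorted(heights.items(), key=lambda item: item[1], reverse=True)
    (top_base, top), (second_base, second) = ranked[0], ranked[1]
    if top <= 0:
        return "N"  # no usable signal at this position
    if second / top >= min_ratio:
        return TWO_BASE_CODES[frozenset((top_base, second_base))]
    return top_base


# Hypothetical peak heights: a clean A, then an A/G double peak.
print(call_position({"A": 1000, "C": 30, "G": 40, "T": 10}))
print(call_position({"A": 900, "C": 20, "G": 850, "T": 15}))
```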
1 Recommendation