Asked 17th Apr, 2015

How can I align intron sequences consisting of repeat regions?

I have some sequences of an intron that I want to use to reconstruct a phylogeny, but aligning them is difficult because the intron consists of repeat regions and the sequence length varies considerably. Does anybody know how to deal with this problem?

17th Apr, 2015
Andrii Tarieiev
Georg-August-Universität Göttingen
This is a general problem, not only for introns but also for microsatellites and so on, and there is no general answer for how to improve the alignment. But there are some steps you can take:
1. Be sure that all your sequences are in the same direction.
2. Check the results using different algorithms (Clustal, MUSCLE and so on) and settings (e.g. gap penalties).
3. If that does not help, you can try the algorithm for SSR alignment (information from
1- User must identify the following items:
   a. Data set file
   b. Repeated units
   c. SSR length (first and last nucleotide)
2- Identify the sequences that do not match the first repeated unit from the beginning of the selected SSR
3- Do this for each repeated unit:
   a. Put the tandem repeat in a temporary array
   b. Check if the next nucleotides match the next repeated unit
   c. If not, put the unmatched nucleotides in another temporary array
   d. Fill the gaps to the longest sequence of the repeats in the same array
   e. Merge the temporary arrays
4- Put your results instead of the SSR region.
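To make the SSR steps above more concrete, here is a rough Python sketch of how I read them. It is not the published implementation: the function names `split_ssr` and `align_ssr` are made up for illustration, and it assumes you have already cut the SSR region out of each sequence and that the repeat units occur in the same order in all of them.

```python
def split_ssr(region, units):
    """Split one SSR region into blocks: for each repeat unit (in order),
    the unmatched nucleotides before it, then its tandem run; finally
    whatever is left over at the end."""
    blocks = []
    pos = 0
    for unit in units:
        start = region.find(unit, pos)
        if start == -1:
            start = pos  # unit absent: both blocks stay empty
        blocks.append(region[pos:start])   # unmatched nucleotides
        end = start
        while region.startswith(unit, end):
            end += len(unit)
        blocks.append(region[start:end])   # tandem repeat run
        pos = end
    blocks.append(region[pos:])            # trailing unmatched part
    return blocks


def align_ssr(regions, units):
    """Pad every block with '-' to the longest block of the same kind
    across all sequences, then merge the blocks back together."""
    split = {name: split_ssr(seq, units) for name, seq in regions.items()}
    n_blocks = 2 * len(units) + 1
    widths = [max(len(b[i]) for b in split.values()) for i in range(n_blocks)]
    return {
        name: "".join(b[i].ljust(widths[i], "-") for i in range(n_blocks))
        for name, b in split.items()
    }


# Toy example with two hypothetical repeat units
regions = {"a": "ATATATGCGC", "b": "ATATGGCGCGC", "c": "ATGC"}
for name, row in align_ssr(regions, ["AT", "GC"]).items():
    print(name, row)
# a ATATAT-GCGC--
# b ATAT--GGCGCGC
# c AT-----GC----
```

The aligned block would then be pasted back between the flanking regions, which can be aligned with the usual software. Note that this simple version can go wrong when one repeat unit also occurs inside another (e.g. AT and TA), so check the result by eye.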
Similar information is also published here:
For protein sequences with repeats, some special software has already been created (like RADAR or ExPASy). I don't know of much such software for nucleotide sequence analysis, but maybe this would be helpful:
4th May, 2015
Ray C. Schmidt
Randolph-Macon College
Hi Guohua,
Most users remove these areas of variation (insertions/deletions) from the alignment prior to the analysis. I have found that the inclusion of these gap characters actually improves resolution in species level phylogenies. 
Standard alignment software is unable to align markers with considerable variation in size and indel areas. You may be interested in looking at our study, which used growth hormone introns to resolve the phylogeny of some East African fishes. I used Prankster to align the introns and then coded the gaps with FastGap. FastGap generates a matrix of gap characters which you can then analyze in addition to your sequence data.
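To show what a matrix of gap characters looks like, here is a small Python sketch of presence/absence gap coding. It is only a simplified illustration and not FastGap's actual code; the names `gap_intervals` and `code_gaps` are hypothetical. Each distinct gap (same start and end) becomes one character, scored "1" if the sequence has exactly that gap, "?" if the sequence has a longer gap that covers it (so the character cannot be scored), and "0" otherwise.

```python
import re


def gap_intervals(seq):
    """Return (start, end) half-open intervals for every run of '-'."""
    return [m.span() for m in re.finditer(r"-+", seq)]


def code_gaps(alignment):
    """alignment: dict of name -> aligned sequence (all the same length).
    Returns the sorted list of gap characters and a dict of name -> row."""
    gaps = {name: gap_intervals(seq) for name, seq in alignment.items()}
    characters = sorted({g for own in gaps.values() for g in own})
    matrix = {}
    for name, own in gaps.items():
        row = []
        for start, end in characters:
            if (start, end) in own:
                row.append("1")   # this exact gap is present
            elif any(s <= start and end <= e for s, e in own):
                row.append("?")   # covered by a longer gap: inapplicable
            else:
                row.append("0")   # gap absent
        matrix[name] = "".join(row)
    return characters, matrix


alignment = {
    "a": "ACGT--ACGT",
    "b": "AC------GT",
    "c": "ACGTTTACGT",
}
characters, matrix = code_gaps(alignment)
print(characters)  # [(2, 8), (4, 6)]
print(matrix)      # {'a': '01', 'b': '1?', 'c': '00'}
```

This sketch does not treat gaps at the ends of the alignment specially; in practice those are usually better handled as missing data than as indels.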

