Asked 17th Apr, 2015

How can I align intron sequences consisting of repeat regions?

I have some intron sequences that I want to use to reconstruct a phylogeny, but aligning them is difficult because the intron consists of repeat regions and its length varies considerably between sequences. Does anybody know how to deal with this problem?

All Answers (3)

17th Apr, 2015
Andrii Tarieiev
Georg-August-Universität Göttingen
This is a general problem, not only for introns but also for microsatellites and so on, and there is no general answer for how to improve the alignment. But you can take some steps to improve it:
1. Be sure that all your sequences are in the same direction.
2. Check the results using different algorithms (Clustal, MUSCLE and so on) and settings (e.g. penalties and so on).
3. If that does not help, you can try the algorithm for SSR alignment (information from
1- User must identify the following items:
a. Data set file
b. Repeated units
c. SSR length (first and last nucleotide)
2- Identify the sequences that do not match the first repeated unit from the beginning of the selected SSR
3- Do this for each repeated unit
a. Put the tandem repeat in a temporary array
b. Check if the next nucleotides match the next repeated unit
c. If not, put the unmatched nucleotides in another temporary array
d. Fill the gaps to the longest sequence of the repeats in the same array
e. Merge the temporary arrays
4- Replace the SSR region with your results.
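The steps above could be sketched roughly as follows in Python. This is only a minimal illustration under several assumptions: the SSR region has already been cut out of each sequence, the repeated units are given in the order they occur, and the function names `split_ssr` and `align_ssr` are made up for this example. It is not the published implementation.

```python
def split_ssr(seq, units):
    """Split one SSR region into blocks: for each repeated unit, the
    unmatched nucleotides before it, then its tandem repeat run."""
    blocks = []
    pos = 0
    for unit in units:
        start = seq.find(unit, pos)
        if start == -1:
            start = pos  # unit absent: empty spacer and empty repeat block
        end = start
        while seq.startswith(unit, end):
            end += len(unit)  # extend over consecutive copies of the unit
        blocks.append(seq[pos:start])  # unmatched nucleotides (step 3c)
        blocks.append(seq[start:end])  # tandem repeat (step 3a)
        pos = end
    blocks.append(seq[pos:])  # whatever is left after the last unit
    return blocks


def align_ssr(seqs, units):
    """Pad every block with gaps to the longest block in the same
    column (step 3d), then merge the blocks again (step 3e)."""
    split = [split_ssr(s, units) for s in seqs]
    n_blocks = 2 * len(units) + 1
    widths = [max(len(b[i]) for b in split) for i in range(n_blocks)]
    return ["".join(b[i].ljust(widths[i], "-") for i in range(n_blocks))
            for b in split]


ssr_regions = ["ATATATGCGC", "ATATGGCGCGC", "ATGC"]
aligned = align_ssr(ssr_regions, ["AT", "GC"])
for row in aligned:
    print(row)
```

Note that gaps are always placed at the right end of each repeat block, which is an arbitrary choice: for tandem repeats the position of an indel inside the run usually cannot be determined anyway.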
Information like the above is also published here:
For protein sequences with repeats, some special software has already been created (like RADAR or ExPASy). I don't know of much such software for nucleotide sequence analysis, but maybe this would be helpful:
4th May, 2015
Ray C. Schmidt
Randolph-Macon College
Hi Guohua,
Most users remove these areas of variation (insertions/deletions) from the alignment prior to the analysis. I have found that the inclusion of these gap characters actually improves resolution in species level phylogenies. 
Standard alignment software is often unable to align markers with considerable variation in size and many indel areas. You may be interested in looking at our study using Growth Hormone introns to resolve the phylogeny of some East African fishes. I used Prankster to align the introns and then coded the gaps with FastGap. FastGap generates a matrix of gap characters which you can then analyze in addition to your sequence data.
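To give an idea of what a matrix of gap characters looks like, here is a small Python sketch. It is not FastGap's code or output format. The scoring rule (one presence/absence character per distinct gap, with "?" where a longer gap covers the position) is a simplified scheme assumed for illustration, and the function names `gap_runs` and `code_gaps` are made up.

```python
import re


def gap_runs(seq):
    """Return (start, end) for every run of '-' in an aligned sequence."""
    return [(m.start(), m.end()) for m in re.finditer(r"-+", seq)]


def code_gaps(alignment):
    """Turn each distinct gap into one character: '1' if the sequence
    has exactly that gap, '?' if a longer gap covers it, else '0'."""
    per_seq = [gap_runs(s) for s in alignment]
    gaps = sorted({g for runs in per_seq for g in runs})
    matrix = []
    for runs in per_seq:
        row = []
        for start, end in gaps:
            if (start, end) in runs:
                row.append("1")
            elif any(s <= start and e >= end for s, e in runs):
                row.append("?")  # not comparable: hidden inside a longer gap
            else:
                row.append("0")
        matrix.append("".join(row))
    return gaps, matrix


alignment = ["ACGT--ACGT", "ACGTTTACGT", "AC------GT"]
gaps, matrix = code_gaps(alignment)
print(gaps)
for row in matrix:
    print(row)
```

This sketch treats leading and trailing gaps like any other gap. In real data those often reflect incomplete sequencing rather than indels, so you may want to handle them as missing data instead.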
