Article
Sequence alignment with an appropriate substitution matrix
Department of Computer Science, Iowa State University, Ames, Iowa 50011-1040, USA.
Journal of Computational Biology (Impact Factor: 1.74). 04/2008; 15(2):129-38. DOI: 10.1089/cmb.2007.0155 Source: PubMed

 "One approach is to take a set of reference alignments, and to derive parameters that generate alignments that best match this reference set, either by matching the substitution parameters to observed statistics [4] [5] [6] [7] [8], or by varying parameters in order to maximise the alignment accuracy with respect to the reference set [9] [10] [11] [12] [13]. An alternative approach is to iteratively align the set of sequences, at each iteration deriving a new matrix from the observed pair frequencies within the aligned dataset [14]. "
ABSTRACT: We outline a procedure for jointly sampling substitution matrices and multiple sequence alignments, according to an approximate posterior distribution, using an MCMC-based algorithm. This procedure provides an efficient and simple method by which to generate alternative alignments according to their expected accuracy, and allows appropriate parameters for substitution matrices to be selected in an automated fashion. In the cases considered here, the sampled alignments with the highest likelihood have an accuracy consistently higher than alignments generated using the standard BLOSUM62 matrix.
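The iterative approach quoted above derives a new matrix from the pair frequencies observed in an aligned dataset. The sketch below shows one common way to turn such counts into scores, a BLOSUM-style log-odds ratio; this is an assumption for illustration, not the cited authors' procedure. The function name `log_odds_matrix`, the `pseudocount` parameter, and the toy input are all hypothetical.

```python
import math
from collections import Counter

def log_odds_matrix(aligned_pairs, pseudocount=1.0):
    """Log-odds scores (in bits) from residue pairs seen in aligned columns.

    aligned_pairs: iterable of (a, b) residue pairs, gaps already excluded.
    pseudocount:   hypothetical smoothing constant so unseen pairs stay finite.
    """
    aligned_pairs = list(aligned_pairs)
    alphabet = sorted({r for pair in aligned_pairs for r in pair})

    # Count every pair in both orientations so the matrix is symmetric.
    counts = Counter()
    for a, b in aligned_pairs:
        counts[(a, b)] += 1
        counts[(b, a)] += 1

    total = sum(counts.values()) + pseudocount * len(alphabet) ** 2
    # q[a, b]: smoothed joint frequency of the ordered pair (a, b).
    q = {(a, b): (counts[(a, b)] + pseudocount) / total
         for a in alphabet for b in alphabet}
    # p[a]: marginal (background) frequency of residue a.
    p = {a: sum(q[(a, b)] for b in alphabet) for a in alphabet}

    return {(a, b): math.log2(q[(a, b)] / (p[a] * p[b]))
            for a in alphabet for b in alphabet}

# Toy data: mostly identities, a few A/C substitutions.
pairs = [("A", "A")] * 8 + [("C", "C")] * 8 + [("A", "C")] * 2
scores = log_odds_matrix(pairs)
```

In an iterative scheme, the sequences would be realigned with `scores` and the counts recomputed until the matrix stabilises; that loop is omitted here because it depends on the aligner used.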
 "The distance d T (i,j) between nodes e i and e j in the tree T is the sum of lengths of all branches on the path between e i and e j . Let S(d) be a substitution matrix at evolutionary distance d in PAM (Point Accepted Mutations) units (Dayhoff et al., 1978; Müller and Vingron, 2000; Huang, 2008). The similarity score s T (i,j) of sequences t i and t j with respect to the tree T is the similarity score of the alignment of t i and t j computed with the substitution matrix S(d T (i,j)). "
ABSTRACT: We present a new formulation of phylogenetic reconstruction named maximum similarity. We describe basic algorithms based on the maximum similarity objective for computing distances between subtrees and for combining two subtrees. We present distance methods for constructing an initial tree and updating the initial tree by incorporating those basic algorithms into the Neighbor Joining (NJ) method and the Nearest-Neighbor Interchange (NNI) framework of the FastME program. The new distance methods have been implemented as a computer program named MS. The time requirement of the MS program is about five times the cost of computing observed sequence distances. The MS program was compared by simulation with four existing programs: NJ, FastME, STC, and Weighbor. Experimental results show that incorporating the maximum similarity objective into existing methods leads to improvements both in topology and in branch length.
Journal of computational biology: a journal of computational molecular cell biology 08/2009; 16(7):887-96. DOI:10.1089/cmb.2008.0232 · 1.74 Impact Factor
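The tree distance used in the excerpt above, the sum of branch lengths on the path between two nodes, can be sketched as follows. This is a minimal illustration, not code from the MS program: the adjacency-dictionary representation, the helper names `build_tree` and `path_distance`, and the toy branch lengths are assumptions.

```python
def build_tree(edges):
    """Build an undirected adjacency map {node: {neighbor: branch_length}}."""
    tree = {}
    for u, v, length in edges:
        tree.setdefault(u, {})[v] = length
        tree.setdefault(v, {})[u] = length
    return tree

def path_distance(tree, start, goal):
    """Sum of branch lengths on the unique path between two nodes of a tree."""
    # Depth-first search; tracking the parent is enough to avoid
    # walking back along an edge, because a tree has no cycles.
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for neighbor, length in tree[node].items():
            if neighbor != parent:
                stack.append((neighbor, node, dist + length))
    raise ValueError("nodes are not connected")

# Toy tree: leaves e1, e2, e3 joined through internal nodes u and v.
edges = [("e1", "u", 0.1), ("e2", "u", 0.2), ("u", "v", 0.3), ("e3", "v", 0.4)]
tree = build_tree(edges)
d_13 = path_distance(tree, "e1", "e3")  # path e1 -> u -> v -> e3
```

Under the definition quoted above, the resulting distance would then select the substitution matrix used to score the alignment of the two leaf sequences.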
ABSTRACT: Pairwise sequence alignment forms the basis of numerous other applications in bioinformatics. The quality of an alignment is gauged by statistical significance rather than by alignment score alone. Therefore, accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, it was shown that pairwise statistical significance does better in practice than database statistical significance, and also provides quicker individual pairwise estimates of statistical significance without having to perform a time-consuming database search. Under an evolutionary model, a substitution matrix can be derived using a rate matrix and a fixed distance. Although commonly used substitution matrices such as BLOSUM62 were not originally derived from a rate matrix under an evolutionary model, the corresponding rate matrices can be back-calculated. Many researchers have derived different rate matrices using different methods and data. In this paper, we show that pairwise statistical significance using rate matrices with a sequence-pair-specific distance performs significantly better than using a fixed distance. Pairwise statistical significance using sequence-pair-specific distanced substitution matrices also outperforms database statistical significance reported by BLAST.
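The abstract notes that a substitution matrix can be derived from a rate matrix and a distance. A minimal sketch of that derivation, assuming the standard construction in which the transition probabilities are $P(d) = \exp(Qd)$ and the scores are log-odds $\log_2(P_{ij}(d)/\pi_j)$: the two-state rate matrix, the uniform frequencies, and the function names below are hypothetical, and the matrix exponential is a truncated Taylor series that is only adequate for small, well-scaled inputs.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, terms=30):
    """Truncated Taylor series for exp(m): I + m + m^2/2! + ..."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term holds m^k / k! after this update.
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def substitution_matrix(rate, freqs, distance):
    """Log-odds scores (bits) at the given distance: log2(P_ij(d) / pi_j)."""
    n = len(rate)
    p = mat_exp([[x * distance for x in row] for row in rate])
    return [[math.log2(p[i][j] / freqs[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical two-state model with uniform equilibrium frequencies.
rate = [[-1.0, 1.0], [1.0, -1.0]]
freqs = [0.5, 0.5]
near = substitution_matrix(rate, freqs, 0.1)  # closely related pair
far = substitution_matrix(rate, freqs, 1.0)   # more divergent pair
```

In this toy model the identity scores shrink and the mismatch penalties soften as the distance grows, which is why a distance chosen per sequence pair can yield a different matrix from one fixed in advance.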