Comparison of the PAM and BLOSUM Amino Acid Substitution Matrices

Cold Spring Harbor Protocols (Impact Factor: 4.63). 06/2008; 2008(6):pdb.ip59. DOI: 10.1101/pdb.ip59
Source: PubMed

ABSTRACT: Introduction. The choice of a scoring system including scores for matches, mismatches, substitutions, insertions, and deletions influences the alignment of both DNA and protein sequences. To score matches and mismatches in alignments of proteins, it is necessary to know how often one amino acid is substituted for another in related proteins. Percent accepted mutation (PAM) matrices list the likelihood of change from one amino acid to another in homologous protein sequences during evolution and thus are focused on tracking the evolutionary origins of proteins. In contrast, the blocks amino acid substitution matrices (BLOSUM) are based on scoring substitutions found over a range of evolutionary periods. There are important differences in the ways that the PAM and BLOSUM scoring matrices were derived. These differences, which are discussed in this article, should be appreciated when interpreting the results of protein sequence alignments obtained with these matrices.
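As a rough illustration of how substitution frequencies become alignment scores, the sketch below computes a generic log-odds score of the kind that both PAM and BLOSUM matrices tabulate. It does not reproduce either derivation. The function name, the example frequencies, and the scale factor of 2 (half-bit units) are assumptions chosen for illustration only.

```python
import math


def log_odds_score(pair_freq, freq_a, freq_b, scale=2.0):
    """Return a rounded log-odds substitution score.

    pair_freq: observed frequency of residues a and b aligned together
               in related proteins (hypothetical value supplied by caller).
    freq_a, freq_b: background frequencies of each residue.
    scale: multiplier applied to the log2 ratio (assumed: 2 = half-bits).
    """
    # Frequency expected if the two residues were paired purely by chance.
    expected = freq_a * freq_b
    # Positive when the pair is seen more often than chance, negative when rarer.
    return round(scale * math.log2(pair_freq / expected))


# Hypothetical frequencies, not taken from any published matrix.
common_pair = log_odds_score(0.02, 0.1, 0.05)    # seen 4x more often than chance
rare_pair = log_odds_score(0.00125, 0.1, 0.05)   # seen 4x less often than chance
```

A pair observed more often than chance receives a positive score and a pair observed less often receives a negative one, which is the sign convention an alignment program relies on when summing scores along an alignment.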

  • Source
    • "Generally, obtaining an efficient multiple alignment looks impossible when the sequences do not have enough similarity between them. Sequence alignment programs use a scoring matrix such as point accepted mutation (PAM) and BLOcks SUbstitution Matrix (BLOSUM) to generate a score for the alignment [11]. Some limitations of alignment-based approaches are [12] as follows. "
    ABSTRACT: Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced protein and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of large amount of newly discovered proteins. The existing classification results are unsatisfactory due to a huge size of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth.
    06/2014; 2014(4):173869. DOI:10.1155/2014/173869
  • Source
    • "In bioinformatics several matrices are available with the most popular being PAMxx and BLOSUMxx (Mount, 2008). PAMxx provides scores based on the observed frequencies of alignments in related proteins (xx meaning up to xx% of divergence between two genes i.e. xx=50), where identities are given the highest scores (frequently observed substitutions are given a positive score and rarely observed substitutions a negative score). "
    ABSTRACT: We present a novel approach to comparing saccadic eye movement sequences based on the Needleman-Wunsch algorithm used in bioinformatics to compare DNA sequences. In the proposed method, the saccade sequence is spatially and temporally binned and then recoded to create a sequence of letters that retains fixation location, time, and order information. The comparison of two letter sequences is made by maximizing the similarity score computed from a substitution matrix that provides the score for all letter pair substitutions and a penalty gap. The substitution matrix provides a meaningful link between each location coded by the individual letters. This link could be distance but could also encode any useful dimension, including perceptual or semantic space. We show, by using synthetic and behavioral data, the benefits of this method over existing methods. The ScanMatch toolbox for MATLAB is freely available online (
    Behavior Research Methods 08/2010; 42(3):692-700. DOI:10.3758/BRM.42.3.692 · 2.12 Impact Factor
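The comparison step described in this abstract (maximizing a similarity score from a substitution matrix and a gap penalty) can be sketched as a standard Needleman-Wunsch global alignment score. This is a minimal illustration, not the ScanMatch toolbox code, which is written for MATLAB. The function names, the uniform match and mismatch scores, and the linear gap penalty of -1 are all assumptions.

```python
def simple_matrix(alphabet, match=1, mismatch=-1):
    """Build a hypothetical substitution matrix: one score for identical
    letters, another for every non-identical pair."""
    return {(a, b): (match if a == b else mismatch)
            for a in alphabet for b in alphabet}


def needleman_wunsch_score(seq_a, seq_b, substitution, gap=-1):
    """Return the best global alignment score of seq_a against seq_b.

    substitution: dict mapping (letter, letter) to a score.
    gap: score added for each letter aligned to a gap (assumed linear).
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per letter.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = substitution[(seq_a[i - 1], seq_b[j - 1])]
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two letters
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[-1][-1]


letters = simple_matrix("ABC")
identical = needleman_wunsch_score("ABC", "ABC", letters)
one_gap = needleman_wunsch_score("ABC", "AC", letters)
```

Replacing `simple_matrix` with a matrix whose entries reflect distance between the coded locations (or any other dimension of interest) changes which substitutions are tolerated without changing the alignment procedure itself.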
  • Source
    ABSTRACT: As a new strain of computer malware is discovered, it triggers a meticulous process of analyzing its behavior and developing appropriate defenses. A systematic process which identifies regions of commonality and variability with known samples can ease the burden of malware analysis. We address this challenge using an interdisciplinary approach which applies biological sequence analysis methods to computer malware. Specifically, we have developed a method which has the goal of classifying a digital artifact (possibly malware) based on its similarity to known digital artifacts (or known malware samples) using methods and tools of bioinformatics. Our approach is analogous to classifications of biological sequences, which are routinely performed using online databases of known biological sequences.
    The 2012 International Conference on Security & Management (SAM2012); 07/2012