Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33:511-518

Bioinformatics Center, Institute for Chemical Research, Kyoto University Uji, Kyoto 611-0011, Japan.
Nucleic Acids Research (Impact Factor: 9.11). 02/2005; 33(2):511-8. DOI: 10.1093/nar/gki198
Source: PubMed


The accuracy of the multiple sequence alignment program MAFFT has been improved. The new version (5.3) of MAFFT offers new iterative
refinement options, H-INS-i, F-INS-i and G-INS-i, in which pairwise alignment information is incorporated into the objective
function. These new options of MAFFT showed higher accuracy than currently available methods, including TCoffee version 2 and
CLUSTAL W, in benchmark tests consisting of alignments of >50 sequences. Like the previously available options, the new options
of MAFFT can handle hundreds of sequences on a standard desktop computer. We also examined the effect of the number of homologues
included in an alignment. For a multiple alignment consisting of ∼8 sequences with low similarity, the accuracy was improved
(2–10 percentage points) when the sequences were aligned together with dozens of their close homologues (E-value < 10^-5 to 10^-20) collected from a database. Such improvement was generally observed for most methods, but was remarkably large for the new options
of MAFFT proposed here. Thus, we made a Ruby script, mafftE.rb, which aligns the input sequences together with their close
homologues collected from SwissProt using NCBI-BLAST.



Available from: Kazutaka Katoh, Jan 15, 2014
  • Source
    • "We approached the alignment of 18S and 28S rRNA genes in two different ways, in order to also explore the effects of the rRNA alignment using its secondary structure. Initially, both genes were aligned using the multiple alignment using fast Fourier transform (MAFFT) program (Katoh et al., 2005, 2009), version 7, which implements iterative refinement methods (Katoh and Standley, 2013). The E-INS-I strategy was chosen because it is optimized for a small-scale alignment and recommended for sequences with multiple conserved domains and long gaps, such as rRNA genes (Katoh et al., 2009)."
    ABSTRACT: The taxonomic rank and phylogenetic relationships of the pipizine flower flies (Diptera: Syrphidae: Pipizini) were estimated based on DNA sequence data from three gene regions (COI, 28S and 18S) and 111 adult morphological characters. Pipizini has been treated as a member of the subfamily Eristalinae based on diagnostic adult morphological characteristics, while the larval feeding mode and morphology is shared with members of the subfamily Syrphinae. We analysed each dataset, both separately and combined, in a total evidence approach under maximum parsimony and maximum likelihood. To evaluate the influence of different alignment strategies of rDNA 28S and 18S genes on the resulting topologies, we compared the topologies inferred from a multiple alignment using fast Fourier transform (MAFFT) program with those topologies resulting from aligning the secondary structure of these rDNA genes. Total evidence analyses resolved pipizines as a sister group of the subfamily Syrphinae. Although the structural alignment and the MAFFT alignment differed in the inferred relationships of some clades and taxa, there was congruence in the placement of pipizines. The homogeneous morphology of the Pipizini clade in combination with their unique combination of characters among the Syrphidae suggest a change of rank to subfamily. Thus, we propose to divide Syrphidae into four subfamilies, including the subfamily Pipizinae stat. rev.
    Full-text · Article · Oct 2015 · Cladistics
  • Source
    • "Multiple sequence alignment was conducted in MAFFT v. 5 (Katoh et al., 2002). Default parameters were used for COI, H3, and EF1-α, for which the alignment is relatively trivial. For the three ribosomal loci, however, the E-INS-I algorithm was used, which is suitable for sequences with large unalignable regions (Katoh et al., 2005)."
    ABSTRACT: The phylogeny of the paper wasp genus Polistes is investigated using morphological and behavioural characters, as well as molecular data from six genes (COI, 12S, 16S, 28S, H3, and EF1-α). The results are used to investigate the following evolutionary hypotheses about the genus: (i) that Polistes first evolved in Southeast Asia, (ii) that dispersal to the New World occurred only once, and (iii) that long-term monogyny evolved as an adaptation to overwintering in a temperate climate. Optimization of distribution records on the recovered tree does not allow unambiguous reconstruction of the ancestral area of Polistes. While the results indicate that Polistes dispersed into the New World from Asia, South America is recovered as the ancestral area for all New World Polistes: Nearctic species groups evolved multiple times from this South American stock. The final tree topology suggests strongly that the genus first arose in a tropical environment, refuting the idea of monogyny as an overwintering adaptation.
    Full-text · Article · Sep 2015 · Cladistics
  • Source
    • "Alignments of the amino acid sequences for each gene were performed by using MAFFT (Katoh et al. 2005) with default settings. Based on the aligned amino acid and the original (non-aligned) nucleotide sequences, we reconstructed nucleotide alignment by substituting each amino acid with the corresponding codon from the nucleotide sequence and by substituting each gap in the amino acid alignment with three gaps in the nucleotide alignment. "
    ABSTRACT: Genetic factors may play an important role in species extinction but their actual effect remains poorly understood, particularly because of a strong and potentially masking effect expected from ecological traits. We investigated the role of genetics in mammal extinction taking both ecological and genetic factors into account. As a proxy for the role of genetics we used the ratio of the rates of nonsynonymous (amino acid changing) to synonymous (leaving the amino acid unchanged) nucleotide substitutions, Ka / Ks. Because most nonsynonymous substitutions are likely to be slightly deleterious and thus selected against, this ratio is a measure of the inefficiency of selection: if large (but less than 1), it implies a low efficiency of selection against nonsynonymous mutations. As a result, nonsynonymous mutations may accumulate and thus contribute to extinction. As a proxy for the role of ecology we used body mass W, with which most extinction-related ecological traits strongly correlate. As a measure of extinction risk we used species’ affiliation with the five levels of extinction threat according to the IUCN Red List of Threatened Species. We calculated Ka / Ks for mitochondrial protein-coding genes of 211 mammalian species, each of which was characterized by body mass and the level of threat. Using logistic regression analysis, we then constructed a set of logistic regression models of extinction risk on ln(Ka / Ks) and lnW. We found that Ka / Ks and body mass are responsible for a 38% and a 62% increase in extinction risk, respectively. Given that the standard error of these values is 13%, the contribution of genetic factors to extinction risk in mammals is estimated to be one-quarter to one-half of the total of ecological and genetic effects. We conclude that the effect of genetics on extinction is significant, though it is almost certainly smaller than the effect of ecological traits.
    Full-text · Article · Aug 2015 · Oikos
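
The codon back-translation described in the Oikos excerpt above (each aligned amino acid replaced by its codon, each gap by three gaps) can be illustrated with a minimal sketch. This is not the cited authors' code: the function name back_translate, the gap character "-", and the toy sequences are assumptions made for illustration, and the sketch assumes the coding sequence starts in frame at the first residue.

```python
def back_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto an aligned protein sequence.

    Hypothetical helper: every residue consumes the next three nucleotides
    of `cds`, and every gap column becomes three gap characters.
    """
    residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(cds) < 3 * residues:
        raise ValueError("coding sequence is too short for the protein")

    codons = []
    pos = 0  # index of the next unread nucleotide in cds
    for ch in aligned_protein:
        if ch == gap:
            codons.append(gap * 3)           # one protein gap -> three nucleotide gaps
        else:
            codons.append(cds[pos:pos + 3])  # one residue -> its original codon
            pos += 3
    # Nucleotides left over after the last residue (e.g. a stop codon) are ignored.
    return "".join(codons)


# Toy data (assumed, not from the cited study): an aligned protein pair
# and the matching unaligned coding sequences.
aligned = {"sp1": "MK-L", "sp2": "MKVL"}
cds = {"sp1": "ATGAAACTG", "sp2": "ATGAAGGTTCTG"}

codon_alignment = {name: back_translate(aligned[name], cds[name]) for name in aligned}
for name, seq in codon_alignment.items():
    print(name, seq)
```

Because codons are copied from the original nucleotide sequence rather than re-derived from the amino acids, synonymous differences between species are preserved, which is what a Ka / Ks calculation needs.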