MAFFT version 5: Improvement in accuracy of multiple sequence alignment

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan.
Nucleic Acids Research (Impact Factor: 9.11). 02/2005; 33(2):511-8. DOI: 10.1093/nar/gki198
Source: PubMed

ABSTRACT The accuracy of the multiple sequence alignment program MAFFT has been improved. The new version (5.3) of MAFFT offers new iterative
refinement options, H-INS-i, F-INS-i and G-INS-i, in which pairwise alignment information is incorporated into the objective
function. These new options of MAFFT showed higher accuracy than currently available methods, including TCoffee version 2 and
CLUSTAL W, in benchmark tests consisting of alignments of >50 sequences. Like the previously available options, the new options
of MAFFT can handle hundreds of sequences on a standard desktop computer. We also examined the effect of the number of homologues
included in an alignment. For a multiple alignment consisting of ∼8 sequences with low similarity, the accuracy was improved
(2–10 percentage points) when the sequences were aligned together with dozens of their close homologues (E-value < 10⁻⁵–10⁻²⁰)
collected from a database. Such improvement was generally observed for most methods, but was remarkably large for the new options
of MAFFT proposed here. Thus, we made a Ruby script, mafftE.rb, which aligns the input sequences together with their close
homologues collected from SwissProt using NCBI-BLAST.
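
The homologue-collection step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' mafftE.rb; it only assumes that BLAST hits are available in the standard 12-column tabular format (subject ID in column 2, E-value in column 11). The function names, the default threshold of 1e-5, and the cap of 50 hits are hypothetical choices made for illustration. The command line uses the flags that current MAFFT releases document for G-INS-i; the invocation in version 5.3 may have differed.

```python
from typing import Iterable, List


def select_homologues(blast_tab_lines: Iterable[str],
                      max_evalue: float = 1e-5,
                      max_hits: int = 50) -> List[str]:
    """Return unique subject IDs whose E-value is below max_evalue.

    Input lines are assumed to be BLAST tabular output (12 tab-separated
    columns). IDs are returned in order of first appearance.
    """
    selected: List[str] = []
    seen = set()
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue < max_evalue and subject_id not in seen:
            seen.add(subject_id)
            selected.append(subject_id)
            if len(selected) >= max_hits:
                break  # hypothetical cap: "dozens" of homologues
    return selected


def mafft_command(fasta_path: str) -> List[str]:
    """Build a G-INS-i command line as documented for current MAFFT.

    fasta_path is assumed to hold the input sequences plus the
    selected homologues in FASTA format.
    """
    return ["mafft", "--globalpair", "--maxiterate", "1000", fasta_path]
```

After alignment, the rows belonging to the original input sequences would be extracted from the larger alignment; that step is omitted here.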

Available from: Kazutaka Katoh, Jan 15, 2014
    • "with coding sequences, the nucleotide alignment was constrained by the amino acid sequences alignment. Although the number of sequences was small in each taxonomic group, we compared the alignments obtained with MUSCLE and submitted the raw amino acid sequences to the GUIDANCE filter (Penn et al. 2010), using the alignment algorithm MAFFT (Katoh et al. 2005). MUSCLE and MAFFT produced very similar outputs and we therefore chose the alignments produced by the former (Additional file 1: Figure S1, Additional file 3: Figure S2, Additional file 4: Figure S3). GUIDANCE provided us alignment scores and regions of the alignments that were not well supported (Additional file 1: Figu"
    ABSTRACT: Multi-domain proteins form the majority of proteins in eukaryotes. During their formation by tandem duplication or gene fusion, new interactions between domains may arise as a result of the structurally-forced proximity of domains. The proper function of the formed proteins likely required the molecular adjustment of these stress zones by specific amino acid replacements, which should be detectable by the molecular signature of selection that governed their changes. We used multi-domain globins from three different invertebrate lineages to investigate the selective forces that acted throughout the evolution of these molecules. In the youngest of these molecules [Branchipolynoe scaleworm; original duplication ca. 60 million years (Ma)], we were able to detect some amino acids under positive selection corresponding to the initial duplication event. In older lineages (didomain globin from bivalve mollusks and nematodes), there was no evidence of amino acid positions under positive selection, possibly the result of accumulated non-adaptive mutations since the original duplication event (165 and 245 Ma, respectively). Some amino acids under positive selection were sometimes detected in later branches, either after speciation events or after the initial duplication event. In Branchipolynoe, the position of the amino acids under positive selection on a 3D model suggests that some of them are located at the interface between two domains, while others are located in the heme pocket.
    SpringerPlus 07/2015; 4:354. DOI:10.1186/s40064-015-1124-2
    • "created by Biomatters:, and MAFFT (Katoh et al. 2005) was used to conduct multiple sequence alignments which contained no indels. A Bayesian inference (BI) phylogenetic analysis was conducted with MrBayes (v. "
    ABSTRACT: The coffee berry borer, Hypothenemus hampei (Coleoptera: Curculionidae: Scolytinae), was first discovered in coffee farms on the Big Island of Hawaii in 2010, after over 200 yr of borer-free coffee production. Because there are multiple pathways by which H. hampei could have entered Hawaii from >50 coffee-producing nations that harbor the pest, determining the invasion route requires genetic analyses. A previous study identified 27 H. hampei cytochrome c oxidase subunit I haplotypes from around the world using phylogenetic analyses to identify putative species. We sequenced cytochrome c oxidase subunit I from specimens collected in Hawaii and conducted phylogenetic and haplotype network analyses to trace the route of invasion. We conducted a network analysis to trace the most likely pathway that H. hampei could have taken to Hawaii and a phylogenetic analysis to assess clade support for broader groupings in the network analysis that are unlikely to have recently hybridized. The Hawaiian haplotype was identical to a haplotype from six Latin American countries, and our network analysis suggests the most likely route of invasion was from Kenya to Uganda to Latin America to Hawaii. Most coffee shipments from Latin America are fumigated, arrive on Oahu, and are processed before being shipped to other islands. Therefore, it is likely that H. hampei was accidentally transported to the Big Island by farm workers or other travelers from Latin America who carried borer-infested seeds in their clothing or luggage, or else by small quantities of illegally imported beans, although improper fumigation of shipments from Latin America remains a possibility.
    Annals of the Entomological Society of America 07/2015; DOI:10.1093/aesa/sav024 · 1.17 Impact Factor
    • "The alignment of C. temensis mtCR sequences was made with the L-INS-i algorithm of MAFFT (Katoh et al. 2005). Unique haplotypes were confirmed from our previous study with DNAsp (Rozas et al. 2003) (considering gaps as a fifth state), which was also used to calculate molecular diversity at each locality. "