Katoh K, Toh H. Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform 9: 286-298

Digital Medicine Initiative, Kyushu University, Fukuoka 812-8582, Japan.
Briefings in Bioinformatics (Impact Factor: 9.62). 08/2008; 9(4):286-98. DOI: 10.1093/bib/bbn013
Source: PubMed


The accuracy and scalability of multiple sequence alignment (MSA) of DNAs and proteins have long been and are still important
issues in bioinformatics. To rapidly construct a reasonable MSA, we developed the initial version of the MAFFT program in
2002. MSA software is now facing greater challenges in both scalability and accuracy than those of 5 years ago. As increasing
amounts of sequence data are being generated by large-scale sequencing projects, scalability is now critical in many situations.
The requirement of accuracy has also entered a new stage since the discovery of functional noncoding RNAs (ncRNAs); the secondary
structure should be considered for constructing a high-quality alignment of distantly related ncRNAs. To deal with these problems,
in 2007, we updated MAFFT to Version 6 with two new techniques: the PartTree algorithm and the Four-way consistency objective
function. The former improved the scalability of progressive alignment and the latter improved the accuracy of ncRNA alignment.
We review these and other techniques that MAFFT uses and suggest possible future directions of MSA software as a basis of
comparative analyses. MAFFT is available at

Available from: Kazutaka Katoh, Jan 04, 2014
    • "The alignment of the cox1 sequences was trivial, as they showed no evidence of indel mutations. The ITS-2 fragments were aligned with the online version of MAFFT (Katoh & Toh, 2008), using the Q-INS-I strategy with default options. "
    ABSTRACT: We designed a comparative study to unravel the phylogeography of two Alpine endemic spiders characterized by a different degree of adaptation to subterranean life: Troglohyphantes vignai (Araneae, Linyphiidae) and Pimoa rupicola (Araneae, Pimoidae), the latter showing minor adaptation to hypogean life. We sampled populations of the model species in caves and other subterranean habitats across their known geographical range in the Western Alps. By combining phylogeographic inferences and Ecological Niche Modeling techniques, we inferred the biogeographic scenario that led to the present day population structure of the two species. According to our divergent time estimates and relative uncertainties, the isolation of T. vignai and P. rupicola from their northern sister groups was tracked back to Middle–Late Miocene. Furthermore, the fingerprint left by Pleistocene glaciations on the population structure revealed by the genetic data, led to the hypothesis that a progressive adaptation to subterranean habitats occurred in T. vignai, followed by strong population isolation. On the other hand, P. rupicola underwent a remarkable genetic bottleneck during the Pleistocene glaciations, that shaped its present population structure. It seems likely that such shallow population structure is both the result of the minor degree of specialization to hypogean life and the higher dispersal ability characterizing this species. The simultaneous study of overlapping spider species showing different levels of adaptation to hypogean life, disclosed a new way to clarify patterns of biological diversification and to understand the effects of past climatic shift on the subterranean biodiversity.
    PeerJ 11/2015; 3:e1384. DOI:10.7717/peerj.1384 · 2.11 Impact Factor
    • "4.9 (Gene Codes, MI, USA) was used to assess the quality of sequence chromatograms and to edit them when necessary. Sequences were aligned using MAFFT version 6 with the E-INS-i strategy (Katoh and Hiroyuki, 2008). Alignments were visualized using MESQUITE version 7.2 (Maddison and Maddison, 2009) and the 5′ and 3′ ends were trimmed to a uniform length. "
    ABSTRACT: It has been shown that the disappearance of, or drastic changes in, ancestral and indigenous (or native) endosymbiotic microbiota can lead to many adverse health consequences. However, the effects of changes in beneficial endosymbionts in plants are poorly known (except for mycorrhizal and rhizobial associations). We sampled and compared endophytes from hundreds of trees belonging to the economically important genus Hevea, the source of natural rubber, in their native range in the Amazon basin and in plantations. We also conducted antagonism tests to determine the potential effects that some of these endophytes may have on selected plant pathogenic fungi. The natural and indigenous endosymbiotic mycota of the rubber tree (Hevea) contains a high diversity of beneficial fungi that may protect against pathogens (protective mutualism). In contrast, plantation trees have a reduced and different diversity of these beneficial fungi. We propose that abundance, and not just presence, of competitive fungal strains and species (i.e., Trichoderma and Tolypocladium) create a protective effect against pathogens in wild trees. This study provides support for the importance of mutualistic endosymbionts in plant health and ecosystem resilience, and calls for awareness of their potential loss by human-related activities.
    Fungal Ecology 10/2015; 17. DOI:10.1016/j.funeco.2015.04.001 · 2.93 Impact Factor
    • "multiple alignment and consensus sequence construction) can be resumed. The S N' single contigs are then aligned using MAFFT (Katoh and Toh, 2008), and a consensus sequence is constructed from the multiple alignment. In this study, we conducted 200 assembly simulations for each N (T = 200). "
    ABSTRACT: Recent improvements in next-generation sequencing technology have made it possible to do whole genome sequencing, on even non-model eukaryote species with no available reference genomes. However, de novo assembly of diploid genomes is still a big challenge because of allelic variation. The aim of this study was to determine the feasibility of utilizing the genome of haploid fish larvae for de novo assembly of whole-genome sequences. We compared the efficiency of assembly using the haploid genome of yellowtail (Seriola quinqueradiata) with that using the diploid genome obtained from the dam. De novo assembly from the haploid and the diploid sequence reads (100 million reads per each datasets) generated by the Ion Proton sequencer (200 bp) was done under two different assembly algorithms, namely overlap-layout-consensus (OLC) and de Bruijn graph (DBG). This revealed that the assembly of the haploid genome significantly reduced (approximately 22% for OLC, 9% for DBG) the total number of contigs (with longer average and N50 contig lengths) when compared to the diploid genome assembly. The haploid assembly also improved the quality of the scaffolds by reducing the number of regions with unassigned nucleotides (Ns) (total length of Ns; 45,331,916 bp for haploids and 67,724,360 bp for diploids) in OLC-based assemblies. It appears clear that the haploid genome assembly is better because the allelic variation in the diploid genome disrupts the extension of contigs during the assembly process. Our results indicate that utilizing the genome of haploid larvae leads to a significant improvement in the de novo assembly process, thus providing a novel strategy for the construction of reference genomes from non-model diploid organisms such as fish.
    Gene 10/2015; DOI:10.1016/j.gene.2015.10.015 · 2.14 Impact Factor