AQUA: Automated quality improvement for multiple sequence alignments

European Molecular Biology Laboratory, Meyerhofstrasse 1, 69012 Heidelberg, Germany.
Bioinformatics (Impact Factor: 4.98). 11/2009; 26(2):263-5. DOI: 10.1093/bioinformatics/btp651
Source: PubMed


Multiple sequence alignment (MSA) is a central tool in most modern biology studies. However, despite generations of valuable
tools, human experts are still able to improve automatically generated MSAs. In an effort to automatically identify the most
reliable MSA for a given protein family, we propose a very simple protocol, named AQUA for ‘Automated quality improvement
for multiple sequence alignments’. Our current implementation relies on two alignment programs (MUSCLE and MAFFT), one refinement
program (RASCAL) and one assessment program (NORMD), but other programs could be incorporated at any of the three steps.
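
The three-step protocol described above (align, refine, assess) can be sketched as a small selection routine. This is a minimal illustration, not the AQUA implementation, which is written in Tcl/Tk. The function name `select_best_alignment` and the callable-based interface are hypothetical. The sketch also assumes that both raw and refined alignments are scored as candidates and that a higher assessment score means a more reliable alignment. The aligners, refiner and scorer are passed in as plain functions, so wrappers around MUSCLE, MAFFT, RASCAL and NORMD (or any other programs) could be plugged in without changing the selection logic.

```python
from typing import Callable, Dict, List, Optional, Tuple

# An alignment is modelled as one gapped string per sequence, all the same length.
Alignment = List[str]


def select_best_alignment(
    sequences: List[str],
    aligners: Dict[str, Callable[[List[str]], Alignment]],
    refiner: Optional[Callable[[Alignment], Alignment]],
    scorer: Callable[[Alignment], float],
) -> Tuple[str, Alignment, float]:
    """Return (label, alignment, score) of the highest-scoring candidate.

    Assumptions (not taken from the source): every aligner output and, if a
    refiner is given, its refined version are all candidates; the scorer
    returns a number where higher is better.
    """
    if not aligners:
        raise ValueError("at least one aligner is required")

    candidates: Dict[str, Alignment] = {}
    for name, align in aligners.items():
        raw = align(sequences)                # step 1: alignment
        candidates[name] = raw
        if refiner is not None:
            candidates[name + "+refined"] = refiner(raw)  # step 2: refinement

    # Step 3: assessment. Score every candidate and keep the best one.
    scores = {label: scorer(aln) for label, aln in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best], scores[best]
```

Keeping the three steps behind plain callables mirrors the statement that other programs could be incorporated at any step: adding an aligner is one more dictionary entry, and swapping the assessment program only changes the `scorer` argument.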

Availability: AQUA is implemented in Tcl/Tk and runs from the command line on all platforms. The source code is available under the GNU GPL.
Source code, README and Supplementary data are available at

Contact: muller{at}, bork{at}

    • "The 56 telomere associated genes for this study [14] were clustered into 54 gene families across 26 placental mammals and 4 outgroup species (Monodelphis domestica, Ornithorhynchus anatinus, Taeniopygia guttata and Gallus gallus). Multiple sequence alignments (MSAs) were generated using both distance and evolutionary aware methods [38,39] ensuring a comprehensive exploration of alignment space. Sequences with less than 60% coverage over the entire length of the MSA, or individual columns that did not have 60% minimum coverage across a position, were removed using trimAl [40], giving a final dataset of 52 gene family alignments for further analyses. "
    ABSTRACT: Placental mammals display a huge range of life history traits, including size, longevity, metabolic rate and germ line generation time. Although a number of general trends have been proposed between these traits, there are exceptions that warrant further investigation. Species such as naked mole rat, human and certain bat species all exhibit extreme longevity with respect to body size. It has long been established that telomeres and telomere maintenance have a clear role in ageing but it has not yet been established whether there is evidence for adaptation in telomere maintenance proteins that could account for increased longevity in these species. Here we carry out a molecular investigation of selective pressure variation, specifically focusing on telomere associated genes across placental mammals. In general we observe a large number of instances of positive selection acting on telomere genes. Although these signatures of selection overall are not significantly correlated with either longevity or body size we do identify positive selection in the microbat species Myotis lucifugus in functionally important regions of the telomere maintenance genes DKC1 and TERT, and in naked mole rat in the DNA repair gene BRCA1. These results demonstrate the multifarious selective pressures acting across the mammal phylogeny driving lineage-specific adaptations of telomere associated genes. Our results show that regardless of the longevity of a species, these proteins have evolved under positive selection thereby removing increased longevity as the single selective force driving this rapid rate of evolution. However, evidence of molecular adaptations specific to naked mole rat and Myotis lucifugus highlight functionally significant regions in genes that may alter the way in which telomeres are regulated and maintained in these longer-lived species.
    Full-text · Article · Nov 2013 · BMC Evolutionary Biology
    • "We built a phylogenetic species tree based on the NCBI taxonomy, which is known to be accurate for most taxa (Benson et al, 2010; Sayers et al, 2011), and inferred branch lengths. To assess the branch lengths, we generated alignments of 40 ubiquitous, single copy marker genes (Ciccarelli et al, 2006) for 853 different species using AQUA (Muller et al, 2010a) and combined the tree topology of the NCBI taxonomy tree with them using PhyML (Guindon et al, 2010). The resulting tree was manually curated and genomes that had an erroneous placement in the NCBI taxonomy tree were removed. "
    ABSTRACT: Various post-translational modifications (PTMs) fine-tune the functions of almost all eukaryotic proteins, and co-regulation of different types of PTMs has been shown within and between a number of proteins. Aiming at a more global view of the interplay between PTM types, we collected modifications for 13 frequent PTM types in 8 eukaryotes, compared their speed of evolution and developed a method for measuring PTM co-evolution within proteins based on the co-occurrence of sites across eukaryotes. As many sites are still to be discovered, this is a considerable underestimate, yet, assuming that most co-evolving PTMs are functionally associated, we found that PTM types are vastly interconnected, forming a global network that comprises in human alone >50,000 residues in about 6000 proteins. We predict substantial PTM type interplay in secreted and membrane-associated proteins and in the context of particular protein domains and short-linear motifs. The global network of co-evolving PTM types implies a complex and intertwined post-translational regulation landscape that is likely to regulate multiple functional states of many if not all eukaryotic proteins.
    Full-text · Article · Jul 2012 · Molecular Systems Biology
    • "To assess the performance of the aligners used in this study, we used the sum-of-pairs score (SP) (22) to compare the alignments produced by the aligner with the reference alignments. The SP score corresponds to the proportion of pairs of residues aligned the same in both alignments. "
    ABSTRACT: Multiple sequence alignment (MSA) is a cornerstone of modern molecular biology and represents a unique means of investigating the patterns of conservation and diversity in complex biological systems. Many different algorithms have been developed to construct MSAs, but previous studies have shown that no single aligner consistently outperforms the rest. This has led to the development of a number of ‘meta-methods’ that systematically run several aligners and merge the output into one single solution. Although these methods generally produce more accurate alignments, they are inefficient because all the aligners need to be run first and the choice of the best solution is made a posteriori. Here, we describe the development of a new expert system, AlexSys, for the multiple alignment of protein sequences. AlexSys incorporates an intelligent inference engine to automatically select an appropriate aligner a priori, depending only on the nature of the input sequences. The inference engine was trained on a large set of reference multiple alignments, using a novel machine learning approach. Applying AlexSys to a test set of 178 alignments, we show that the expert system represents a good compromise between alignment quality and running time, making it suitable for high throughput projects. AlexSys is freely available from∼aniba/alexsys.
    Full-text · Article · Oct 2010 · Nucleic Acids Research