Article

Improvement of Phylogenies after Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments

Department of Physiology, Institute of Molecular Biology of Barcelona, Barcelona, Spain.
Systematic Biology (Impact Factor: 11.53). 09/2007; 56(4):564-77. DOI: 10.1080/10635150701472164
Source: PubMed

ABSTRACT Alignment quality may have as much impact on phylogenetic reconstruction as the phylogenetic methods used. Not only the alignment algorithm, but also the method used to deal with the most problematic alignment regions, may have a critical effect on the final tree. Although some authors remove such problematic regions, either manually or using automatic methods, in order to improve phylogenetic performance, others prefer to keep such regions to avoid losing any information. Our aim in the present work was to examine whether phylogenetic reconstruction improves after alignment cleaning. Using simulated protein alignments with gaps, we tested the relative performance in diverse phylogenetic analyses of the whole alignments versus the alignments with problematic regions removed with our previously developed Gblocks program. We also tested the performance of more or less stringent conditions in the selection of blocks. Alignments constructed with different alignment methods (ClustalW, Mafft, and Probcons) were used to estimate phylogenetic trees by maximum likelihood, neighbor joining, and parsimony. We show that, in most alignment conditions, and for alignments that are not too short, removal of blocks leads to better trees. That is, despite losing some information, there is an increase in the actual phylogenetic signal. Overall, the best trees are obtained by maximum-likelihood reconstruction of alignments cleaned by Gblocks. In general, a relaxed selection of blocks is better for short alignments, whereas a stringent selection is better suited to longer ones. Finally, we show that cleaned alignments produce better topologies although, paradoxically, with lower bootstrap support. This indicates that divergent and problematic alignment regions, when present, may lead to topologies that appear better supported but are in fact more biased.
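To make the idea of block removal concrete, the sketch below filters alignment columns by gap content and conservation, then keeps only runs of retained columns that are long enough to count as a block. It is a simplified illustration and not the Gblocks algorithm: Gblocks applies a more elaborate set of rules, and the function names (`select_columns`, `trim_alignment`), the thresholds, and the toy sequences are all hypothetical. Raising `min_conserved` or `min_block_len` mimics a more stringent selection; lowering them mimics a more relaxed one.

```python
from collections import Counter


def select_columns(seqs, min_conserved=0.5, max_gap_fraction=0.0, min_block_len=2):
    """Return indices of alignment columns kept by a simple block filter.

    Illustrative only; these thresholds are assumptions, not Gblocks defaults.
    """
    n_seqs = len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # Step 1: flag each column as acceptable or not.
    keep = []
    for col in range(length):
        column = [s[col] for s in seqs]
        gaps = column.count("-")
        residues = [c for c in column if c != "-"]
        # Count of the most frequent residue in this column (0 if all gaps).
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        ok = gaps / n_seqs <= max_gap_fraction and top / n_seqs >= min_conserved
        keep.append(ok)

    # Step 2: keep only contiguous runs of acceptable columns that are
    # at least min_block_len long. The trailing False closes the last run.
    selected = []
    start = None
    for col, ok in enumerate(keep + [False]):
        if ok and start is None:
            start = col
        elif not ok and start is not None:
            if col - start >= min_block_len:
                selected.extend(range(start, col))
            start = None
    return selected


def trim_alignment(seqs, **kwargs):
    """Return the aligned sequences restricted to the selected columns."""
    cols = select_columns(seqs, **kwargs)
    return ["".join(s[c] for c in cols) for s in seqs]


# Toy alignment (hypothetical data): column 3 contains a gap.
toy = ["MKTAYLV", "MKT-YLV", "MKSQYLV"]
relaxed = trim_alignment(toy)
stringent = trim_alignment(toy, min_conserved=1.0, min_block_len=3)
```

With these toy settings the relaxed call drops only the gapped column, while the stringent call also drops the variable column and the short block left next to it, so it retains fewer positions. This mirrors the trade-off the abstract describes between retained information and alignment reliability.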

Available from: Gerard Talavera, Aug 20, 2015
    • "Using the MEGA interface, gaps in the aligned sequences were manually removed and the sequences were entered into Gblocks (Talavera and Castresana, 2007). This removed poorly aligned and divergent regions within the alignment, creating a more suitable alignment for the phylogenetic analysis."
    ABSTRACT: The Pseudoroegneria species are perennial grasses in the Triticeae tribe, whose St genome has been linked to several important polyploid species. Due to frequent hybridization and complex genetic mechanism, the relationships within Pseudoroegneria, and within the Triticeae have been heavily disputed. Using the chloroplast rbcL gene we estimated the nucleotide diversity of 8 Pseudoroegneria species. We also examined the phylogenetic relationships within Pseudoroegneria and of Pseudoroegneria within the Triticeae. The estimates of nucleotide diversity indicated that Pseudoroegneria tauri and Pseudoroegneria spicata species had the highest diversity, while Pseudoroegneria gracillima had the lowest diversity. The phylogenetic analysis of Pseudoroegneria placed all P. spicata species into a clade separate from the other Pseudoroegneria species, while the relationship of the other Pseudoroegneria species could not be determined. Due to the groupings of Pseudoroegneria with the polyploid Elymus, our results strongly supported Pseudoroegneria as the maternal genome donor to Elymus. There was also weak support that P. spicata may be the maternal donor to the StH Elymus species.
    Biochemical Systematics and Ecology 08/2015; DOI:10.1016/j.bse.2015.07.038 · 1.17 Impact Factor
    • "to align the sequences. We used the online GBlocks server v. 0.91b (Castresana 2002), using the option 'Allow gap positions within the final blocks', to detect alignment-ambiguous sites that were subsequently excluded from the analysis (Gatesy, DeSalle & Wheeler 1993; Castresana 2000; Talavera & Castresana 2007). The gene partitions were concatenated using Mesquite v. 2.75 (Maddison & Maddison 2008)."
    ABSTRACT: Antonbruunia sociabilis sp. nov., an abundant endosymbiont of Thyasira scotiae from a putative sulphidic ‘seep’ in the Hatton-Rockall Basin (1187–1200 m), North-East Atlantic Ocean, is described. The new species is compared with A. viridis and A. gerdesi from the West Indian Ocean and South-East Pacific Ocean respectively. The three species can be distinguished using a suite of morphological characters, and are associated with geographically separated chemosynthetic bivalve molluscs from different families (Thyasiridae, Lucinidae, Vesicomyidae) living in sediments at different depths. New morphological features are recognized for Antonbruunia and a re-assessment of its systematic affinities indicates a close relationship with the Pilargidae. Previous suggestions of an affiliation with the Nautiliniellidae, recently incorporated into the Calamyzinae (Chrysopetalidae), were not supported. The apparent morphological similarities between the two groups are indicative of convergence related to their shared relationships with chemosynthetic bivalves. The first molecular analyses of Antonbruunia (16S and 18S rDNA) clearly indicate that a close relationship to Pilargidae (represented by Ancistrosyllis sp. and Sigambra sp.) is more likely than an affinity to Calamyzinae (represented by Calamyzas amphictenicola, Natushima sp., and Vigtorniella sp.).
    Zootaxa 08/2015; 3995(1):20-36. DOI:10.11646/zootaxa.3995.1.4 · 1.06 Impact Factor
    • "Each protein family was individually aligned using Clustal Omega (Sievers et al. 2011). The aligned protein sequences were then trimmed using Gblocks 0.91b (Castresana 2000) with relaxed parameters (Talavera and Castresana 2007). The trimmed alignments were concatenated to create a combined dataset that was 55 890 amino acid residues long."
    ABSTRACT: The phylum Chlamydiae contains nine ecologically and genetically diverse families all placed within a single order. In this work, we have completed a comprehensive comparative analysis of 36 sequenced Chlamydiae genomes in order to identify shared molecular characteristics, namely conserved signature insertions/deletions (CSIs) and conserved signature proteins (CSPs), which can serve as distinguishing characteristics of supra-familial clusters within the phylum Chlamydiae. Our analysis has led to the identification of 32 CSIs which are specific to clusters within the phylum Chlamydiae at various phylogenetic depths. Importantly, 17 CSIs and 98 CSPs were found to be specific for the family Chlamydiaceae while another 3 CSI variants and 15 CSPs were specific for a grouping of the families Criblamydiaceae, Parachlamydiaceae, Simkaniaceae and Waddliaceae. These two clusters were also found to be distinguishable in 16S rRNA based phylogenetic trees, concatenated protein based phylogenetic trees, character compatibility based phylogenetic analyses, and on the basis of 16S rRNA gene sequence identity and average amino acid identity values. On the basis of the identified molecular characteristics, branching in phylogenetic trees, and the genetic distance between the two clusters within the phylum Chlamydiae we propose a division of the class Chlamydiia into two orders: an emended order Chlamydiales, containing the family Chlamydiaceae and the closely related Candidatus family Clavichlamydiaceae, and the novel order Parachlamydiales ord. nov. containing the families Parachlamydiaceae, Simkaniaceae and Waddliaceae and the Candidatus families Criblamydiaceae, Parilichlamydiaceae, Piscichlamydiaceae, and Rhabdochlamydiaceae. We also include a brief discussion of the reunification of the genera Chlamydia and Chlamydophila.
    Antonie van Leeuwenhoek 07/2015; 108(3). DOI:10.1007/s10482-015-0532-1 · 2.14 Impact Factor