Improvement of Phylogenies after Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments

Department of Physiology, Institute of Molecular Biology of Barcelona, Barcelona, Spain.
Systematic Biology (Impact Factor: 14.39). 09/2007; 56(4):564-77. DOI: 10.1080/10635150701472164
Source: PubMed

ABSTRACT Alignment quality may have as much impact on phylogenetic reconstruction as the phylogenetic methods used. Not only the alignment algorithm, but also the method used to deal with the most problematic alignment regions, may have a critical effect on the final tree. Although some authors remove such problematic regions, either manually or using automatic methods, in order to improve phylogenetic performance, others prefer to keep such regions to avoid losing any information. Our aim in the present work was to examine whether phylogenetic reconstruction improves after alignment cleaning. Using simulated protein alignments with gaps, we tested the relative performance in diverse phylogenetic analyses of the whole alignments versus the alignments with problematic regions removed with our previously developed Gblocks program. We also tested the performance of more or less stringent conditions in the selection of blocks. Alignments constructed with different alignment methods (ClustalW, Mafft, and Probcons) were used to estimate phylogenetic trees by maximum likelihood, neighbor joining, and parsimony. We show that, in most alignment conditions, and for alignments that are not too short, removal of blocks leads to better trees. That is, despite losing some information, there is an increase in the actual phylogenetic signal. Overall, the best trees are obtained by maximum-likelihood reconstruction of alignments cleaned by Gblocks. In general, a relaxed selection of blocks is better for short alignments, whereas a stringent selection is more adequate for longer ones. Finally, we show that cleaned alignments produce better topologies although, paradoxically, with lower bootstrap support. This indicates that divergent and problematic alignment regions may lead, when present, to apparently better supported although, in fact, more biased topologies.

Available from: Gerard Talavera, Sep 29, 2015
  • Source
    • "Because tests of positive selection are sensitive to sequencing, annotation and alignment errors (Talavera and Castresana 2007; Scheinfeldt et al. 2009), we used highly stringent criteria to filter our alignments. First, unreliably aligned regions were removed using Gblocks version 0.91b (Talavera and Castresana 2007), with default parameters. Additionally, we used an ad-hoc filtering procedure in order to remove annotation errors, including the following steps: 1) Identification of unique amino acid replacements (i.e., amino acids that are unique to a given species in a certain alignment column); 2) identification of alignment regions with a very high incidence of unique substitutions in the same species; in particular, we used a sliding window approach to identify regions of 15 amino acids containing ten or more unique substitutions in the same sequence, as well as regions of five amino acids containing five unique substitutions in the same sequence; these patterns are unlikely to represent true Luisi et al. "
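The sliding-window filter described in the excerpt above can be sketched roughly as follows. This is a minimal illustration, not the cited authors' script: the function names, the treatment of gaps and `X` as never unique, and the choice to report every qualifying window as a half-open column range are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Assumption: gaps and unknown residues never count as unique replacements.
GAP_CHARS = {"-", "X", "?"}


def unique_replacement_flags(alignment: Sequence[str], target: int) -> List[bool]:
    """Flag columns where the target's residue occurs in no other sequence."""
    flags = []
    for col, residue in enumerate(alignment[target]):
        if residue in GAP_CHARS:
            flags.append(False)
            continue
        others = {seq[col] for i, seq in enumerate(alignment) if i != target}
        flags.append(residue not in others)
    return flags


def flagged_windows(flags: Sequence[bool], window: int, min_unique: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges holding at least min_unique flags."""
    hits = []
    count = sum(flags[:window])
    for start in range(len(flags) - window + 1):
        if start > 0:
            # Slide by one column: add the entering flag, drop the leaving one.
            count += flags[start + window - 1] - flags[start - 1]
        if count >= min_unique:
            hits.append((start, start + window))
    return hits


def suspicious_regions(alignment, target, rules=((15, 10), (5, 5))):
    """Apply both window rules from the excerpt: 10 of 15 and 5 of 5."""
    flags = unique_replacement_flags(alignment, target)
    hits = set()
    for window, min_unique in rules:
        hits.update(flagged_windows(flags, window, min_unique))
    return sorted(hits)
```

Calling `suspicious_regions(alignment, target)` with a list of equal-length aligned sequences and the index of the sequence under inspection returns the column ranges that match either rule. Overlapping windows are reported separately here; merging them into contiguous regions would be a natural next step.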
  • Source
    • "Alignments were visually inspected to detect and manually correct misaligned positions. Poorly aligned and highly divergent positions of the HcpR data sets were excluded with the program Gblocks, version 0.91b [28] [29]. Pairwise distances were computed in MEGA, version 6 [30], using p-distance and pairwise deletion, to obtain values that represent the proportion of positions that differ between every two sequences in our alignments. "
    ABSTRACT: Desulfovibrio gigas belongs to the group of sulfate reducing bacteria (SRB). These ubiquitous and metabolically versatile microorganisms are often exposed to reactive nitrogen species (RNS). Nonetheless, the mechanisms and regulatory elements involved in nitrosative stress protection are still poorly understood. The transcription factor HcpR has emerged as a putative regulator of nitrosative stress response among anaerobic bacteria. HcpR is known to orchestrate the expression of the hybrid cluster protein gene, hcp, proposed to be involved in cellular defense against RNS. According to phylogenetic analyses, the occurrence of hcpR paralog genes is a common feature among several Desulfovibrio species. Within the D. gigas genome we have identified two HcpR-related sequences. One of these sequences, hcpR1, was found in the close vicinity of the hcp gene and this finding prompted us to proceed with its functional characterization. We observed that the growth of a D. gigas strain lacking hcpR1 is severely impaired under nitrosative stress. An in silico search revealed several putative targets of HcpR1 that were experimentally validated. The fact that HcpR1 regulates several genes encoding proteins involved in nitrite and nitrate metabolism, together with the sensitive growth phenotype to NO displayed by an hcpR1 mutant strain, strongly supports a relevant role of this factor under nitrosative stress. Moreover, the finding that several Desulfovibrio species possess HcpR paralogs, which have been transmitted vertically in the evolution and diversification of the genus, suggests that these sequences may confer adaptive or survival advantage to these organisms, possibly by increasing their tolerance to nitrosative stress.
    FEBS Open Bio 08/2015; 5:594-604. DOI:10.1016/j.fob.2015.07.001 · 1.52 Impact Factor
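The p-distance with pairwise deletion mentioned in the HcpR excerpt above is the proportion of differing positions among the sites where both sequences have a residue. A minimal sketch of that calculation follows; it is not MEGA's implementation, and the function name and the set of characters treated as missing (`-` and `?`) are assumptions for illustration.

```python
def p_distance(seq_a: str, seq_b: str, missing: str = "-?") -> float:
    """Proportion of differing sites, skipping sites missing in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        # Pairwise deletion: drop a site only for this pair of sequences.
        if a in missing or b in missing:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no sites shared by both sequences")
    return differing / compared
```

For example, `p_distance("ACDE-G", "ACDFHG")` compares five sites (the gapped column is skipped) and finds one difference, giving 0.2. Under complete deletion, by contrast, a column with a gap in any sequence would be dropped for all pairs.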
  • Source
    • "Customized computer scripts were then used to extract the best reciprocal hits from all the strains and to align these protein sequences with Clustal Omega (Sievers et al., 2011). The alignments were then filtered using Gblocks version 0.91b (Talavera & Castresana, 2007) with default options and concatenated. A final alignment of 1,647 concatenated proteins (514,787 amino acids) was used in the phylogenetic analyses. "
    ABSTRACT: Stenotrophomonas maltophilia, a ubiquitous Gram-negative γ-proteobacterium, has emerged as an important opportunistic pathogen responsible for nosocomial infections. A major characteristic of clinical isolates is their high intrinsic or acquired antibiotic resistance level. The aim of this study was to decipher the genetic determinism of antibiotic resistance among strains from different origins (i.e. natural environment and clinical origin) showing various antibiotic resistance profiles. To this end, we selected 3 strains isolated from soil collected in France or Burkina Faso that showed contrasting antibiotic resistance profiles. After whole genome sequencing, the phylogenetic relationships of these 3 strains and 11 strains with available genome sequences were determined. Results showed that the strains' phylogeny did not match their origin or antibiotic resistance profiles. Numerous antibiotic resistance coding genes and efflux pump operons were revealed by the genome analysis, with 57% of the identified genes not previously described. No major variation in the antibiotic resistance gene content was observed between strains irrespective of their origin and antibiotic resistance profiles. Although environmental strains generally carry as many MDR efflux pumps as clinical strains, the absence of RND pumps (i.e. SmeABC) previously described to be specific to S. maltophilia was revealed in two environmental strains (BurA1 and PierC1). Furthermore, the genome analysis of the environmental MDR strain BurA1 showed the absence of SmeABC but the presence of another putative MDR RND efflux pump, named EbyCAB, on a genomic island probably acquired via horizontal gene transfer. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
    Genome Biology and Evolution 08/2015; 7(9). DOI:10.1093/gbe/evv161 · 4.23 Impact Factor
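The concatenation step in the S. maltophilia excerpt above (per-protein alignments, filtered and then joined into one long alignment) can be sketched as below. This is only an illustration under stated assumptions: each filtered alignment is represented as a dict mapping a strain name to its aligned sequence, every strain is present in every alignment (as with the best reciprocal hits described), and the function name is hypothetical.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-protein alignments end to end, one long sequence per strain."""
    if not alignments:
        return {}
    strains = set(alignments[0])
    parts: Dict[str, List[str]] = {strain: [] for strain in sorted(strains)}
    for index, alignment in enumerate(alignments):
        if set(alignment) != strains:
            raise ValueError(f"alignment {index} has a different set of strains")
        # All rows of one alignment must have the same number of columns.
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError(f"alignment {index} has rows of unequal length")
        for strain in parts:
            parts[strain].append(alignment[strain])
    return {strain: "".join(chunks) for strain, chunks in parts.items()}
```

The length of each concatenated sequence is the sum of the filtered alignment lengths, which is how a figure such as the 514,787 amino acids quoted in the excerpt would arise from 1,647 proteins.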