The order Hymenoptera (bees, ants, wasps, sawflies) contains about eight percent of all described species, but no analytical studies have addressed the origins of this richness at the family level or above. To investigate which major subtaxa experienced significant shifts in diversification, we assembled a family-level phylogeny of the Hymenoptera using supertree methods. We used sister-group species-richness comparisons to infer the phylogenetic position of shifts in diversification.
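A sister-group comparison can be illustrated with a classic equal-rates Markov (ERM) test. This is a minimal sketch, not necessarily the statistic used in the study. It assumes that under a constant-rate (Yule) null, the species count of one designated sister clade is uniform on 1..N-1 for a clade of N species. The function name `sister_group_p` and the example counts are hypothetical.

```python
def sister_group_p(n_a: int, n_b: int) -> float:
    """Probability, under an equal-rates (Yule) null, that a split of
    n_a + n_b species is at least as unbalanced as the observed one.

    Under this null the size of one designated sister clade is uniform on
    1..N-1 (N = n_a + n_b). Either clade has <= min(n_a, n_b) species in
    2 * min(n_a, n_b) of the N - 1 equally likely outcomes.
    """
    if n_a < 1 or n_b < 1:
        raise ValueError("each sister clade needs at least one species")
    total = n_a + n_b
    smaller = min(n_a, n_b)
    # Cap at 1: a perfectly balanced split is never "unusually unbalanced".
    return min(1.0, 2 * smaller / (total - 1))
```

For example, hypothetical sister clades of 1 and 99 species give 2/99 (about 0.02). Two clades of 5 species each give 1.0.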
The supertrees best supported by the underlying input trees were produced using matrix representation with compatibility (MRC), from both an all-in and a compartmentalised analysis. Whilst relationships at the tips of the tree tend to be well supported, those along the backbone (e.g. between Parasitica superfamilies) generally are not. Ten significant shifts in diversification (six positive and four negative) are common to both MRC supertrees. The Apocrita (wasps, ants, bees) experienced a positive shift at their origin, accounting for approximately 4,000 species. Other positive shifts within Apocrita include the Vespoidea (vespoid wasps/ants; 24,000 spp.), Anthophila + Sphecidae (bees/thread-waisted wasps; 22,000 spp.), Bethylidae + Chrysididae (bethylid/cuckoo wasps; 5,200 spp.), Dryinidae (dryinid wasps; 1,100 spp.), and Proctotrupidae (proctotrupid wasps; 310 spp.). Four relatively species-poor families (Stenotritidae, Anaxyelidae, Blasticotomidae, Xyelidae) have undergone negative shifts. There are also some two-way shifts in diversification, in which sister taxa have undergone shifts in opposite directions.
Our results suggest that numerous phylogenetically distinctive radiations contribute to the richness of large clades. They also suggest that evolutionary events restricting the subsequent richness of large clades are common. Problematic phylogenetic issues in the Hymenoptera are identified, relating especially to superfamily validity (e.g. "Proctotrupoidea", "Mymarommatoidea"), and deeper apocritan relationships. Our results should stimulate new functional studies on the causes of the diversification shifts we have identified. Possible drivers highlighted for specific adaptive radiations include key anatomical innovations, the exploitation of rich host groups, and associations with angiosperms. Low richness may have evolved as a result of geographical isolation, specialised ecological niches, and habitat loss or competition.
The Solanaceae is a plant family of great economic importance. Despite a wealth of phylogenetic work on individual clades and deep knowledge of particular cultivated species such as tomato and potato, the family still lacks a robust evolutionary framework with a dated molecular phylogeny. Here we investigate molecular divergence times for Solanaceae using a densely sampled species-level phylogeny. We also review the fossil record of the family to derive robust calibration points, and estimate a chronogram using an uncorrelated relaxed molecular clock.
Our densely sampled phylogeny shows strong support for all previously identified clades of Solanaceae and strongly supported relationships between the major clades, particularly within Solanum. The Tomato clade is shown to be sister to section Petota, and the Regmandra clade is the first-branching member of the Potato clade. The minimum age estimates for major splits within the family provided here correspond well with results from previous studies. They indicate that tomato and potato split around 8 million years ago (Ma), with a 95% highest posterior density (HPD) interval of 7-10 Ma. Solanum and Capsicum split c. 19 Ma (95% HPD 17-21 Ma), and Solanum and Nicotiana c. 24 Ma (95% HPD 23-26 Ma).
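A 95% HPD interval is, by definition, the shortest interval containing 95% of the posterior distribution of a node age. The sketch below shows one common way to read such an interval off a list of posterior samples (e.g. MCMC output). The function name and the toy numbers are illustrative and are not taken from the study.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    if not samples or not 0 < mass <= 1:
        raise ValueError("need at least one sample and 0 < mass <= 1")
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(mass * n)  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]
```

With the skewed toy sample `[1, 2, 2, 3, 3, 3, 10]` and `mass=0.5`, the function returns `(2, 3)`. An equal-tailed interval computed from the same samples would instead be pulled toward the long tail.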
Our large time-calibrated phylogeny is a significant step towards a fully sampled species-level phylogeny for Solanaceae, and provides age estimates for the whole family. The chronogram now includes 40% of known species and all but two monotypic genera. It is one of the best-sampled and best-resolved angiosperm family phylogenies published thus far. The increased resolution of the chronogram, combined with the large increase in species sampling, will provide much-needed data for examining many biological questions using Solanaceae as a model system.
Analyzing the regulation of bacteriophage gene expression historically led to the establishment of major paradigms of molecular biology, and may provide important medical applications in the future. Temporal regulation of bacteriophage transcription is commonly analyzed through a labor-intensive combination of biochemical and bioinformatic approaches and macroarray measurements. Here we investigate to what extent the gene expression strategies of lytic phages can be understood by analyzing their genomes directly with bioinformatic methods. We address this question using the recently sequenced lytic bacteriophage 7-11, which infects the bacterium Salmonella enterica.
We identify novel promoters for the bacteriophage-encoded σ factor, and test the predictions through homology with another bacteriophage (phiEco32) that has been experimentally characterized in detail. Interestingly, the standard approach based on multiple local sequence alignment (MLSA) fails to identify the promoters correctly. A simpler procedure based on pairwise alignment of intergenic regions, however, does identify the desired motifs. We argue that this search strategy is more effective for the promoters of bacteriophage-encoded σ factors, because these promoters are typically well conserved but appear in low copy numbers. We verify this on two additional bacteriophage genomes. Together with the more straightforward identification of promoters for the bacterial σ factor, identifying the promoters for the bacteriophage-encoded σ factor allows the genes to be clustered into putative early, middle and late classes. The temporal regulation of bacteriophage gene expression can then be predicted, as we demonstrate for phage 7-11.
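The pairwise strategy can be sketched with Smith-Waterman local alignment of two intergenic regions. A short motif shared by both sequences surfaces as the highest-scoring local alignment, even if it occurs in only a few copies genome-wide. This is an illustration under assumed scoring parameters (`match`, `mismatch`, `gap`) and toy sequences, not the authors' exact procedure.

```python
def local_align(a: str, b: str, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment.

    Returns (best_score, aligned_segment_of_a, aligned_segment_of_b),
    with '-' marking gaps in the aligned segments.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

In the toy regions `"TTTTGGCACGATAAAA"` and `"CCCCGGCACGATCCCC"`, the shared 8-nt block `GGCACGAT` is recovered with score 16. In a real search, motifs found repeatedly across pairs of intergenic regions would be the promoter candidates.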
While MLSA algorithms have proved highly useful in the computational analysis of transcription regulation, we establish here that a simpler procedure is more successful at identifying promoters recognized by bacteriophage-encoded σ factor/RNA polymerase. We used this approach to predict the sequence specificity of a novel (bacteriophage-encoded) σ factor, and thereby to infer the transcription strategy of phage 7-11. Direct analysis of bacteriophage genome sequences is therefore a plausible first-line approach for efficiently inferring phage transcription strategies, and may provide a wealth of information on transcription initiation by diverse σ factors/RNA polymerases.
Ecological character displacement is a process of phenotypic differentiation of sympatric populations caused by interspecific competition. Such differentiation could facilitate speciation by enhancing reproductive isolation between incipient species. Empirical evidence for this at early stages of divergence, when gene flow still occurs between the species, is, however, relatively scarce. Here we studied patterns of morphological variation in sympatric and allopatric populations of two hybridizing species of birds, the Common Nightingale (Luscinia megarhynchos) and the Thrush Nightingale (L. luscinia).
We conducted principal component (PC) analysis of morphological traits and found that, in sympatry, the two nightingale species converged in overall body size (PC1) and diverged in relative bill size (PC3). Closer analysis of morphological variation along geographical gradients revealed that the convergence in body size can be attributed largely to body size increasing with latitude, a phenomenon known as Bergmann's rule. In contrast, interspecific interactions contributed significantly to the observed divergence in relative bill size, even after controlling for the effects of geographical gradients. We suggest that the divergence in bill size most likely reflects segregation of feeding niches between the species in sympatry.
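The sketch below shows how principal components are extracted from standardized morphological traits. It assumes PCA on the trait correlation matrix and uses pure-Python power iteration with deflation to stay dependency-free; a linear-algebra library would normally be used instead. The trait data in the example are synthetic, generated so that all traits scale with a shared body-size factor.

```python
import math
import random


def standardize(columns):
    """Z-score each trait column using the sample standard deviation."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1))
        out.append([(x - mean) / sd for x in col])
    return out


def pca(columns, n_components, iters=500, seed=0):
    """PCA of the trait correlation matrix by power iteration with deflation.

    `columns` holds one list of measurements per trait. Returns
    (eigenvalues, loadings). Eigenvalue / number_of_traits is the share of
    total standardized variance explained by that component.
    """
    z = standardize(columns)
    p, n = len(z), len(z[0])
    corr = [[sum(z[a][k] * z[b][k] for k in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    values, loadings = [], []
    for _ in range(n_components):
        v = [rng.uniform(-1, 1) for _ in range(p)]
        for _ in range(iters):
            w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[a] * corr[a][b] * v[b] for a in range(p) for b in range(p))
        values.append(lam)
        loadings.append(v)
        # Deflate so the next pass finds the next-largest component.
        corr = [[corr[a][b] - lam * v[a] * v[b] for b in range(p)]
                for a in range(p)]
    return values, loadings
```

With size-dominated data like the synthetic example, PC1 loadings all share one sign, giving an overall size axis. Later components capture the remaining variation. In real morphometric data, that remaining variation often reflects relative proportions such as bill size.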
Our results suggest that interspecific competition for food resources can drive species divergence even in the face of ongoing hybridization. Such divergence may enhance reproductive isolation between the species and thus contribute to speciation.
Because of its history of numerous migration events, the Mediterranean basin is a challenging area for population genetic studies. A large number of genetic studies have been carried out in the Mediterranean area using different markers, but no consensus has been reached on the genetic landscape of Mediterranean populations. To investigate the genetics of human Mediterranean populations further, we typed 894 individuals from 11 Mediterranean populations for 25 single-nucleotide polymorphisms (SNPs) located on the X chromosome.
A high overall homogeneity was found among the Mediterranean populations except for the population from Morocco, which seemed to differ genetically from the rest of the populations in the Mediterranean area. A very low genetic distance was found between populations in the Middle East and most of the western part of the Mediterranean Sea. A higher migration rate in females than in males was observed by comparing data from X-chromosome, mtDNA and Y-chromosome SNPs, both in the Mediterranean and in a wider geographic area. Multilocus association was observed among the 25 SNPs on the X chromosome in the populations from Ibiza and Cosenza.
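The distance measure used here is not stated, so, as an assumption, the sketch below shows one common way to quantify differentiation at a single biallelic SNP: Nei's G_ST. This is the proportion of total expected heterozygosity that is attributable to differences between populations. `nei_gst` is a hypothetical helper and the allele frequencies are illustrative.

```python
def nei_gst(freqs, sizes=None):
    """Nei's G_ST for one biallelic SNP from per-population allele frequencies.

    H_S is the mean within-population expected heterozygosity. H_T is the
    expected heterozygosity of the pooled population, weighted by `sizes`
    (equal weights if omitted).
    """
    if sizes is None:
        sizes = [1] * len(freqs)
    total = sum(sizes)
    weights = [s / total for s in sizes]
    h_s = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_t = 2 * p_bar * (1 - p_bar)
    # A SNP monomorphic across all populations carries no information.
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t
```

Identical frequencies in every population give 0, and alternative fixed alleles give 1. Multi-locus estimates usually combine the heterozygosity terms across SNPs before taking the ratio.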
Our results support the hypotheses of (1) a reduced impact of the Neolithic wave and of more recent migration movements in NW Africa, and (2) the importance of the Strait of Gibraltar as a geographic barrier. In contrast, the high genetic homogeneity observed in the Mediterranean area could be interpreted as the result of the Neolithic wave, driven by a large demic diffusion, and/or of more recent migration events. Males and females contributed differently to the genetic landscape of the Mediterranean area, with a higher migration rate in females than in males. A certain level of background linkage disequilibrium in the populations of Ibiza and Cosenza could be attributed to their demographic background.
Some of the most difficult phylogenetic questions in evolutionary biology involve identifying the free-living relatives of parasitic organisms, particularly those of parasitic flowering plants. Consequently, the number of origins of parasitism and the phylogenetic distribution of the heterotrophic lifestyle among angiosperm lineages are unclear.
Here we report the results of a phylogenetic analysis of 102 species of seed plants designed to infer the position of all haustorial parasitic angiosperm lineages using three mitochondrial genes: atp1, coxI, and matR. Overall, the mtDNA phylogeny agrees with independent studies in terms of non-parasitic plant relationships. It reveals at least 11 independent origins of parasitism in angiosperms, eight of which consist entirely of holoparasitic species that lack photosynthetic ability. From these results, it can be inferred that modern-day parasites have evolved disproportionately in certain lineages and that the endoparasitic habit has arisen by convergence in four clades. In addition, reduced-taxon, single-gene analyses revealed multiple horizontal transfers of atp1 from host to parasite lineages, suggesting that parasites may be important vectors of horizontal gene transfer in angiosperms. Furthermore, in Pilostyles we show evidence of a recent host-to-parasite atp1 transfer. The evidence is a chimeric gene sequence, which indicates that multiple historical xenologous gene acquisitions have occurred in this endoparasite. Finally, the phylogenetic relationships inferred for parasites indicate that the origins of parasitism in angiosperms are strongly correlated with horizontal acquisitions of the invasive coxI group I intron.
Collectively, these results indicate that the parasitic lifestyle has arisen repeatedly in angiosperm evolutionary history and results in increasing parasite genomic chimerism over time.
Stearoyl-CoA desaturases (SCDs) are key enzymes in de novo monounsaturated fatty acid synthesis. They catalyze the desaturation of saturated fatty acyl-CoA substrates at the delta-9 position, generating essential components of phospholipids, triglycerides, cholesterol esters and wax esters. The evolutionary history of the SCD gene family in vertebrates, in particular its isoform diversity, origin and function, has yet to be elucidated, although it is crucial for interpreting the roles of SCDs across species. This work aims to contribute to this fundamental effort.
We show here, through comparative genomics and phylogenetics, that the SCD gene family underwent an unexpectedly complex history of duplication and loss events. Paralogy analysis suggests that the SCD1 and SCD5 genes emerged as part of the whole genome duplications (2R) that occurred at the stem of the vertebrate lineage. The SCD1 gene family expanded in rodents, with a parallel loss of SCD5 in the family Muridae. The SCD1 gene expansion is also observed in the Lagomorpha, although without the SCD5 loss. In the amphibian Xenopus tropicalis we find a single SCD1 gene but no SCD5, though this could be due to genome incompleteness. No SCD5 is found in the analysed teleost species, while the surrounding locus, lacking SCD5, is conserved relative to tetrapods. In addition, the teleost SCD1 gene repertoire expanded to two copies as a result of the teleost-specific genome duplication (3R). Finally, we describe clear orthologues of SCD1 and SCD5 in the chondrichthyan Scyliorhinus canicula, a representative of the oldest extant jawed vertebrate clade. Expression analysis in S. canicula shows that whilst SCD1 is ubiquitous, SCD5 is mainly expressed in the brain, a pattern that might indicate an evolutionarily conserved function.
We conclude that the SCD1 and SCD5 genes emerged as part of the 2R genome duplications. We propose that the evolutionarily conserved gene expression between distinct lineages underpins the importance of SCD activity in the brain (and probably the pancreas), in a yet-to-be-defined role. We argue that expression independent of an external stimulus, such as diet-induced activity, emerged as a novel function in vertebrate ancestry. This function was allocated to the SCD5 isoform in various tissues (e.g. brain and pancreas) and was selectively maintained throughout vertebrate evolution.
Antibiotic resistance represents a significant public health problem. When resistance genes are mobile, being carried on plasmids or phages, their spread can be greatly accelerated. Plasmids in particular have been implicated in the spread of antibiotic resistance genes. However, the selective pressures which favour plasmid-carried resistance genes have not been fully established. Here we address this issue with mathematical models of plasmid dynamics in response to different antibiotic treatment regimes.
We show that transmission of plasmids is a key factor influencing plasmid-borne antibiotic resistance, but the dosage and the interval between treatments are also important. Our results also hold when plasmids carrying the resistance gene compete with other plasmids that do not carry it. By altering the interval between antibiotic treatments and the dosage of antibiotic, we show that different treatment regimes can select for either plasmid-carried or chromosome-carried resistance.
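The model equations are not given here. Purely as an illustrative assumption, the sketch below uses a toy two-class model: plasmid-free cells `f` and plasmid-bearing resistant cells `p`. It includes logistic growth, a fitness cost of carriage, conjugative transfer at rate `beta`, segregational loss, and pulsed antibiotic killing of plasmid-free cells. This shows how transmission, dose and treatment interval enter such models. The model omits chromosomal resistance, so it cannot reproduce the plasmid-versus-chromosome result, and all parameter values are invented.

```python
def plasmid_fraction(beta, dose, interval, duration=1.0, t_end=100.0, dt=0.01,
                     r=1.0, cost=0.1, seg=0.001, f0=0.05, p0=0.05):
    """Euler integration of a toy plasmid model; returns the final fraction
    of plasmid-bearing cells.

    Carrying capacity is 1. At the start of every `interval` time units,
    antibiotic is applied for `duration`, killing plasmid-free cells at
    per-capita rate `dose` (dose=0 means no treatment).
    """
    f, p = f0, p0
    for step in range(int(round(t_end / dt))):
        t = step * dt
        kill = dose if (t % interval) < duration else 0.0
        room = 1.0 - (f + p)                 # logistic crowding term
        transfer = beta * f * p              # conjugation converts f into p
        df = r * f * room - transfer + seg * p - kill * f
        dp = r * (1.0 - cost) * p * room + transfer - seg * p
        f = max(f + dt * df, 0.0)
        p = max(p + dt * dp, 0.0)
    return p / (f + p)
```

In this toy setting, a sufficiently high `beta` lets the plasmid spread without any antibiotic. With `beta = 0`, the plasmid declines unless treatment (set by `dose` and `interval`) removes plasmid-free competitors.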
Our research addresses the effect of environmental variation on the evolution of plasmid-carried antibiotic resistance.
Environmental stress can have strong ecological and evolutionary effects on natural populations, but the extent to which it drives adaptive divergence of natural populations has been little explored. We used common garden experiments to study adaptive divergence in embryonic and larval fitness traits (embryonic survival, larval growth, and age and size at metamorphosis). We studied eight populations of the moor frog, Rana arvalis, inhabiting an acidification gradient (breeding pond pH 4.0 to 7.5) in southwestern Sweden. Embryos were raised until hatching at three pH treatments (pH 4.0, 4.3 and 7.5), and larvae until metamorphosis at two (pH 4.3 and 7.5). To gain insight into the putative selective agents along this environmental gradient, we measured relevant abiotic and biotic environmental variables in each breeding pond. We then used linear models to test for phenotype-environment correlations.
We found that acid-origin populations had higher embryonic and larval acid tolerance (survival and larval period were less negatively affected by low pH), higher larval growth but slower larval development rates, and metamorphosed at a larger size. The phenotype-environment correlations revealed that divergence in embryonic acid tolerance and metamorphic size correlated most strongly with breeding pond pH. Divergence in larval period correlated most strongly with latitude, and divergence in larval growth with predator density.
Our results suggest that R. arvalis has diverged in response to pH-mediated selection along this acidification gradient. However, as latitude and pH were closely spatially correlated in this study, further studies are needed to disentangle the specific agents of natural selection along acidification gradients. Our study highlights the need to consider the multiple interacting selective forces that drive adaptive divergence of natural populations along environmental stress gradients.
PCR-based surveys have shown that guppies (Poecilia reticulata) have an unusually large visual-opsin gene repertoire. This has led to speculation that opsin duplication and divergence have enhanced the evolution of elaborate male coloration because they improve spectral sensitivity and/or discrimination in females. However, this conjecture about evolutionary connections between opsin repertoire, vision, mate choice, and male coloration was generated with little data on gene expression. Here, we used RT-qPCR to survey visual-opsin gene expression in the eyes of males, females, and juveniles to better understand color-based sexual selection from the perspective of the visual system.
Juvenile and adult (male and female) guppies express 10 visual opsins at varying levels in the eye. Two opsin genes in juveniles, SWS2B and RH2-2, accounted for > 85% of all visual-opsin transcripts in the eye, excluding RH1. This relative abundance (RA) value dropped to about 65% in adults, as LWS-A180 expression increased from approximately 3% to 20% RA. The juvenile-to-female transition also showed LWS-S180 upregulation from about 1.5% to 7% RA. Finally, we found that expression in guppies' SWS2-LWS gene cluster is negatively correlated with distance from a candidate locus control region (LCR).
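Relative abundance (RA) can be read as each opsin's percentage of all visual-opsin transcripts, excluding the rod opsin RH1. The minimal sketch below assumes transcript quantities have already been estimated; the study's RT-qPCR normalization steps are not reproduced, and the counts in the example are hypothetical.

```python
def relative_abundance(transcripts, exclude=("RH1",)):
    """Percent of total visual-opsin transcripts contributed by each gene,
    leaving out the genes in `exclude` (the rod opsin RH1 by default)."""
    kept = {gene: n for gene, n in transcripts.items() if gene not in exclude}
    total = sum(kept.values())
    return {gene: 100.0 * n / total for gene, n in kept.items()}
```

For instance, hypothetical values of 50, 30 and 20 units for SWS2B, RH2-2 and LWS-A180 give RAs of 50%, 30% and 20%, regardless of how much RH1 is present.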
Selective pressures influencing visual-opsin gene expression appear to differ with age and sex. LWS upregulation in females is implicated in augmenting spectral discrimination of male coloration and courtship displays. In males, enhanced discrimination of carotenoid-rich food, and possibly of rival males, are strong candidate selective pressures driving LWS upregulation. These developmental changes in expression suggest that adults possess better wavelength discrimination than juveniles. Opsin expression within the SWS2-LWS gene cluster appears to be regulated, in part, by a common LCR. Finally, by comparing our RT-qPCR data with microspectrophotometry (MSP) data, we were able to propose the first opsin-to-λmax assignments for all photoreceptor types in the cone mosaic.
Bos primigenius, the aurochs, is the wild ancestor of modern cattle breeds and was formerly widespread across Eurasia and northern Africa. After a progressive decline, the species became extinct in 1627. The origin of modern taurine breeds in Europe is debated. Archaeological and early genetic evidence points to a single Near Eastern origin and a subsequent spread during the diffusion of herding and farming. More recent genetic data are instead compatible with local domestication events, or at least some level of local introgression from the aurochs. Here we present an analysis of the complete mitochondrial genome of a pre-Neolithic Italian aurochs.
In this study, we combined multiplex PCR amplification with 454 pyrosequencing to sequence the complete mitochondrial genome of an 11,450-year-old aurochs specimen from Central Italy. Phylogenetic analysis of the aurochs mtDNA genome supports the conclusions of previous studies of short mtDNA fragments: namely, that Italian aurochsen were genetically very similar to modern cattle breeds, but highly divergent from North-Central European aurochsen.
Complete mitochondrial genome sequences are now available for several modern cattle and two pre-Neolithic mtDNA genomes from very different geographic areas. These data suggest that previously identified sub-groups within the widespread modern cattle mitochondrial T clade are polyphyletic, and they support the hypothesis that modern European breeds have multiple geographic origins.
Seed storage proteins are a major source of dietary protein, and the content of such proteins determines both the quantity and quality of crop yield. Significantly, examination of the protein content in the seeds of crop plants shows a distinct difference between monocots and dicots. Thus, it is expected that there are different evolutionary patterns in the genes underlying protein synthesis in the seeds of these two groups of plants.
Gene duplication, evolutionary rate and positive selection in a major gene family of seed storage proteins (the 11S globulin genes) were compared between dicots and monocots. The results, obtained from five species in each group, show more gene duplications, a higher evolutionary rate and positive selection in this gene family in dicots, which are rich in 11S globulins, but not in monocots.
Our findings provide evidence to support the suggestion that gene duplication and an accelerated evolutionary rate may be associated with higher protein synthesis in dicots as compared to monocots.
Cryptic species are two or more distinct but morphologically similar species that have been classified as a single species. The past two decades have seen exponential growth in publications on cryptic species. Recently published reviews have demonstrated that cryptic species have profound consequences for many biological disciplines. It has been proposed that their distribution is non-random across taxa and biomes.
We analysed a literature database for the taxonomic and biogeographical distribution of cryptic animal species reports. Results from regression analysis indicate that cryptic species are almost evenly distributed among major metazoan taxa and biogeographical regions when corrected for species richness and study intensity.
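To illustrate what "corrected for species richness and study intensity" can mean, the sketch below fits an ordinary least-squares regression of cryptic-species report counts on those two predictors and inspects the residuals. Taxa or regions with large positive residuals would be over-represented relative to expectation. The model form (untransformed OLS), the helper names `ols` and `residuals`, and the example numbers are assumptions, not the study's analysis.

```python
def ols(rows, y):
    """Ordinary least squares with an intercept via the normal equations.

    rows: predictor tuples, e.g. (species_richness, study_intensity).
    Returns coefficients [intercept, b1, b2, ...].
    """
    X = [[1.0, *r] for r in rows]
    k = len(X[0])
    # Augmented system (X^T X | X^T y).
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def residuals(rows, y, coef):
    """Observed minus fitted values; positive means more reports than expected."""
    return [yi - (coef[0] + sum(c * v for c, v in zip(coef[1:], r)))
            for r, yi in zip(rows, y)]
```

Near-zero residuals across all groups would be consistent with reports tracking richness and study effort alone, which is the pattern described above.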
This indicates that morphological stasis represents an evolutionary constant and that cryptic metazoan diversity does predictably affect estimates of Earth's animal diversity. Our findings have direct theoretical and practical consequences for a number of prevailing biological questions with regard to global biodiversity estimates, conservation efforts and global taxonomic initiatives.
The lancelet Asymmetron inferum (subphylum Cephalochordata) was recently discovered on the ocean floor off the southwest coast of Japan at a depth of 229 m. It lives in an anaerobic, sulfide-rich environment created by decomposing bodies of the sperm whale Physeter macrocephalus. This deep, sulfide-rich habitat of A. inferum is unique among lancelets. The distinctive adaptation of this species to such an extraordinary habitat can be considered in a phylogenetic framework. As a first step in reconstructing the evolutionary history of this species, we investigated its phylogenetic position using 11 whole mitochondrial genome sequences. These include newly determined sequences from the whale-fall lancelet A. inferum and two coral-reef congeners.
Our phylogenetic analyses showed that extant lancelets cluster into two major clades, the Asymmetron clade and the Epigonichthys + Branchiostoma clade. A. inferum fell within the former, as the sister group of the A. lucayanum complex. The divergence time between A. inferum and the A. lucayanum complex was estimated at 115 Mya using the penalized likelihood (PL) method, or 97 Mya using the nonparametric rate smoothing (NPRS) method (the middle Cretaceous). Both estimates are far older than the first appearance of large whales (the middle Eocene, 40 Mya). We also discovered that the A. inferum mitogenome (mitochondrial genome) has undergone large-scale gene rearrangements. One rearrangement feature is unique among lancelets, and two are shared with the A. lucayanum complex.
Our study supports the monophyly of the genus Asymmetron, previously assumed on the basis of morphological characters. Furthermore, the features of the A. inferum mitogenome expand our knowledge of variation within cephalochordate mitogenomes, adding a new case of transposition and inversion of the trnQ gene. Our divergence time estimates suggest that A. inferum remained a member of Mesozoic and early Cenozoic large-vertebrate-fall communities before becoming a whale-fall specialist.
The extant squamates (>9400 known species of lizards and snakes) are one of the most diverse and conspicuous radiations of terrestrial vertebrates, but no studies have attempted to reconstruct a phylogeny for the group with large-scale taxon sampling. Such an estimate is invaluable for comparative evolutionary studies, and to address their classification. Here, we present the first large-scale phylogenetic estimate for Squamata.
The estimated phylogeny contains 4161 species, representing all currently recognized families and subfamilies. The analysis is based on up to 12896 base pairs of sequence data per species (average = 2497 bp) from 12 genes, including seven nuclear loci (BDNF, c-mos, NT3, PDC, R35, RAG-1, and RAG-2), and five mitochondrial genes (12S, 16S, cytochrome b, ND2, and ND4). The tree provides important confirmation for recent estimates of higher-level squamate phylogeny based on molecular data (but with more limited taxon sampling), estimates that are very different from previous morphology-based hypotheses. The tree also includes many relationships that differ from previous molecular estimates and many that differ from traditional taxonomy.
We present a new large-scale phylogeny of squamate reptiles that should be a valuable resource for future comparative studies. We also present a revised classification of squamates at the family and subfamily level to bring the taxonomy more in line with the new phylogenetic hypothesis. This classification includes new, resurrected, and modified subfamilies within gymnophthalmid and scincid lizards, and boid, colubrid, and lamprophiid snakes.
The use of mitochondrial DNA data in phylogenetics is controversial, yet studies that combine mitochondrial and nuclear DNA data (mtDNA and nucDNA) to estimate phylogeny are common, especially in vertebrates. Surprisingly, the consequences of combining these data types are largely unexplored, and many fundamental questions remain unaddressed in the literature. For example, how much do trees from mtDNA and nucDNA differ? How are topological conflicts between these data types typically resolved in the combined-data tree? What determines whether a node will be resolved in favor of mtDNA or nucDNA, and are there any generalities that can be made regarding resolution of mtDNA-nucDNA conflicts in combined-data trees? Here, we address these and related questions using new and published nucDNA and mtDNA data for Plethodon salamanders and published data from 13 other vertebrate clades (including fish, frogs, lizards, birds, turtles, and mammals).
We find widespread discordance between trees from mtDNA and nucDNA (30-70% of nodes disagree per clade), but this discordance is typically not strongly supported. Despite often having larger numbers of variable characters, mtDNA data do not typically dominate combined-data analyses, and combined-data trees often share more nodes with trees from nucDNA alone. The proportion of nodes shared between combined-data and mtDNA trees is unrelated to the relative numbers of variable characters or to the levels of homoplasy in the mtDNA and nucDNA data sets. Congruence between trees from mtDNA and nucDNA is higher on branches that are longer and deeper in the combined-data tree. However, whether a conflicting node will be resolved in favor of mtDNA or nucDNA is unrelated to branch length. Conflicts that are resolved in favor of nucDNA tend to occur at deeper nodes in the combined-data tree. In contrast to these overall trends, Plethodon has an unusually large number of strongly supported conflicts between data types. These conflicts are generally resolved in favor of mtDNA in the combined-data tree, despite the large number of nuclear loci sampled.
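Percentages of conflicting nodes like those above can be computed by comparing the clades two trees contain. The minimal sketch below assumes rooted trees written as nested tuples and ignores branch support. The study's own comparison may instead use unrooted bipartitions and support thresholds. The function names and example trees are hypothetical.

```python
def clades(tree):
    """Set of non-trivial clades (frozensets of tip names) in a rooted tree
    given as nested tuples, e.g. (("A", "B"), ("C", "D"))."""
    found = set()

    def walk(node):
        if isinstance(node, str):          # a tip
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    all_tips = walk(tree)
    found.discard(all_tips)  # the root clade is shared by every tree
    return found


def percent_conflicting(tree_a, tree_b):
    """Percent of tree_a's clades that are absent from tree_b."""
    a, b = clades(tree_a), clades(tree_b)
    return 100.0 * len(a - b) / len(a) if a else 0.0
```

For example, `((("A","B"),"C"),"D")` and `((("A","B"),"D"),"C")` share the clade {A, B} but not {A, B, C}, so half of the first tree's nodes conflict.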
Overall, our results from 14 vertebrate clades show that combined-data analyses are not necessarily dominated by the more variable mtDNA data sets. However, given cases like Plethodon, there is also the need for routine checking of incongruence between mtDNA and nucDNA data and its impacts on combined-data analyses.
Maternally transmitted symbionts have evolved a variety of ways to promote their spread through host populations. One strategy is to hamper the reproduction of uninfected females by a mechanism called cytoplasmic incompatibility (CI). CI occurs in crosses between infected males and uninfected females and leads to partial to near-complete infertility. CI-infections are under positive frequency-dependent selection and require genetic drift to overcome the range of low frequencies where they are counter-selected. Given the importance of drift, population sub-division would be expected to facilitate the spread of CI. Nevertheless, a previous model concluded that variance in infection between competing groups of breeding individuals impedes the spread of CI.
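The positive frequency dependence described above is captured by the standard deterministic recursion for CI in a single panmictic population with perfect maternal transmission. It is sketched here as background, not as the deme-structured model developed in this paper. With CI strength `s_h` and fecundity cost `s_f`, infection spreads only above the unstable threshold frequency s_f / s_h. The parameter values in the example are arbitrary.

```python
def next_frequency(p, s_h, s_f=0.0):
    """One generation of the panmictic CI recursion, assuming perfect
    maternal transmission.

    p:   infection frequency among females
    s_h: CI strength (fraction of embryos lost when an uninfected female
         mates with an infected male)
    s_f: relative fecundity cost of infection
    """
    # Mean fitness relative to a compatible, uninfected mating.
    mean_fitness = 1 - s_f * p - s_h * p * (1 - p)
    return p * (1 - s_f) / mean_fitness


def iterate(p, generations, s_h, s_f):
    """Apply the recursion for a fixed number of generations."""
    for _ in range(generations):
        p = next_frequency(p, s_h, s_f)
    return p
```

With s_h = 0.5 and s_f = 0.1, the threshold is 0.2. A starting frequency of 0.1 declines toward loss, whereas 0.3 rises to near fixation. This is why drift is needed to carry a rare infection past the threshold.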
In this paper we derive a model for the spread of CI-infections in populations composed of demes linked by restricted migration. Our model shows that population sub-division facilitates the invasion of CI. Host philopatry (low migration) favours the spread of infection, while deme size has a non-monotonic effect, with CI-invasion being most likely at intermediate deme size. Individual-based simulations confirm these predictions. They also show that high levels of local drift speed up invasion but prevent high levels of prevalence across the entire population. Additional simulations with sex-specific migration rates further show that low migration rates of both sexes are required to facilitate the spread of CI.
Our analyses show that population structure facilitates the invasion of CI-infections. Since some level of sub-division is likely to occur in most natural populations, our results help to explain the high incidence of CI-infections across species of arthropods. Furthermore, our work has important implications for the use of CI-systems in order to genetically modify natural populations of disease vectors.
It has long been known that rates of synonymous substitutions are unusually low in mitochondrial genes of flowering and other land plants. Although two dramatic exceptions to this pattern have recently been reported, it is unclear how often major increases in substitution rates occur during plant mitochondrial evolution and what the overall magnitude of substitution rate variation is across plants.
A broad survey was undertaken to evaluate synonymous substitution rates in mitochondrial genes of angiosperms and gymnosperms. Although most taxa conform to the generality that plant mitochondrial sequences evolve slowly, additional cases of highly accelerated rates were found. We explore in detail one of these new cases, within the genus Silene. A roughly 100-fold increase in synonymous substitution rate is estimated to have taken place within the last 5 million years and involves only one of ten species of Silene sampled in this study. Examples of unusually slow sequence evolution were also identified. Comparison of the fastest and slowest lineages shows that synonymous substitution rates vary by four orders of magnitude across seed plants. In other words, some plant mitochondrial lineages accumulate more synonymous change in 10,000 years than do others in 100 million years. Several perplexing cases of gene-to-gene variation in sequence divergence within a plant were uncovered. Some of these probably reflect interesting biological phenomena, such as horizontal gene transfer, mitochondrial-to-nucleus transfer, and intragenomic variation in mitochondrial substitution rates, whereas others are likely the result of various kinds of errors.
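The step from the "10,000 versus 100 million years" comparison to "four orders of magnitude" is simple arithmetic on synonymous substitution rates r (substitutions per site per year):

```latex
r_{\text{fast}} \times 10^{4}\,\text{yr} \;>\; r_{\text{slow}} \times 10^{8}\,\text{yr}
\quad\Longleftrightarrow\quad
\frac{r_{\text{fast}}}{r_{\text{slow}}} \;>\; \frac{10^{8}}{10^{4}} \;=\; 10^{4}
```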
The extremes of synonymous substitution rates measured here constitute by far the largest known range of rate variation for any group of organisms. These results highlight the utility of examining absolute substitution rates in a phylogenetic context rather than by traditional pairwise methods. Why substitution rates are generally so low in plant mitochondrial genomes yet occasionally increase dramatically remains mysterious.
While genes that are conserved between related bacterial species are usually thought to have evolved along with the species, phylogenetic trees reconstructed for individual genes may contradict this picture and indicate horizontal gene transfer. Individual trees are often not resolved with high confidence, however, and in that case alternative topologies are generally regarded as neither contradicting nor confirming the species tree. Here we conduct an in-depth analysis of 401 protein phylogenetic trees, inferred with varying levels of confidence, for three lactobacilli from the acidophilus complex. At present the relationship between these bacteria, isolated from environments as diverse as the gastrointestinal tract (Lactobacillus acidophilus and Lactobacillus johnsonii) and yogurt (Lactobacillus delbrueckii ssp. bulgaricus), is ambiguous because phenotypic and 16S rRNA-based classifications contradict each other.
Among the 401 phylogenetic trees, those that could be reconstructed with high confidence support the 16S rRNA tree or one alternative topology in an astonishing 3:2 ratio, while the third possible topology is practically absent. Lowering the confidence threshold for trees to be taken into consideration does not significantly affect this ratio, which suggests that gene transfer may have affected as much as 40% of the core genome genes. Gene function bias suggests that the 16S rRNA phylogeny of the acidophilus complex, which indicates that L. acidophilus and L. delbrueckii ssp. bulgaricus are the most closely related of these three species, is correct. A novel approach comparing interspecies protein divergence, employed in this study, indicated that gene transfer most likely took place between the lineages of the two species found in the gastrointestinal tract.
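The 40% figure follows from the 3:2 ratio if the 16S rRNA topology is taken as the species tree and every tree supporting the alternative topology is attributed to gene transfer, which makes it an upper bound rather than a direct measurement. Of every five such trees, about two conflict with the species tree, and the practically absent third topology contributes essentially nothing:

```latex
f_{\mathrm{alt}} \approx \frac{2}{3 + 2} = 0.4 = 40\%
```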
This case study reports an unprecedented level of phylogenetic incongruence, presumably resulting from extensive horizontal gene transfer. The data give a first indication of the large extent of gene transfer that may take place in the gastrointestinal tract and of its accumulated effect. For future studies, our results should encourage careful weighing of data on phylogenetic tree topology, confidence and distribution before drawing conclusions about the presence, absence and extent of horizontal gene transfer.
The timescale of the origins of Daphnia O. F. Mueller (Crustacea: Cladocera) remains controversial. The origin of the two main subgenera has been associated with the breakup of the supercontinent Pangaea. This vicariance hypothesis is supported by reciprocal monophyly, present-day associations with the former Gondwanaland and Laurasia regions, and mitochondrial DNA divergence estimates. However, previous multilocus nuclear DNA sequence divergence estimates of < 10 million years are inconsistent with the breakup of Pangaea. We examined new and existing cladoceran fossils from a Mesozoic Mongolian site in the hope of gaining insight into the timescale of Daphnia evolution.
We describe new fossils of ephippia from the Khotont site in Mongolia associated with the Jurassic-Cretaceous boundary (about 145 MYA) that are morphologically similar to several modern genera of the family Daphniidae, including the two major subgenera of Daphnia, i.e., Daphnia s. str. and Ctenodaphnia. The daphniid fossils co-occurred with fossils of the predaceous phantom midge (Chaoboridae).
Our findings indicate that the main subgenera of Daphnia are likely much older than previously known from fossils (at least 100 MY older) or from nuclear DNA estimates of divergence. The co-occurrence of the main subgenera far from the presumed Laurasia/Gondwanaland dispersal barrier shortly after the barrier formed suggests that vicariance from the breakup of Pangaea is an unlikely explanation for their origin. The fossil impressions also reveal that the coevolution of a dipteran predator (Chaoboridae) with the subgenus Daphnia is much older than previously known, dating back to the Mesozoic.
Speciation often occurs in complex or uncertain temporal and spatial contexts. Processes such as reinforcement, allopatric divergence, and assortative mating can proceed at different rates and with different strengths as populations diverge. The Central American Midas cichlid fish species complex is an important case study for understanding the processes of speciation. Previous analyses have demonstrated that allopatric processes led to species formation among the lakes of Nicaragua, and that sympatric speciation is occurring within at least one crater lake. However, because speciation is an ongoing process and samples of the genetic diversity of such lineages can be biased by the collection scheme or by random factors, it is important to evaluate the robustness of conclusions drawn from individual time samples.
To assess the validity and reliability of inferences based on different genetic samples, we analyzed fish from several lakes in Nicaragua sampled at three different times over 16 years. In addition, this time series allows us to analyze the population genetic changes that have occurred between lakes, where allopatric speciation has operated, as well as between different species within lakes, some of which have originated by sympatric speciation. Focusing on commonly used genetic markers, we analyzed both DNA sequences of the complete mitochondrial control region and nuclear DNA variation at ten microsatellite loci from these populations, sampled three times over the 16-year period, to develop a robust estimate of the population genetic history of these diversifying lineages.
The conclusions from previous work are well supported by our comprehensive analysis. In particular, we find that the genetic diversity of derived crater lake populations is lower than that of the source population regardless of when and how each population was sampled. Furthermore, changes in various estimates of genetic diversity within lakes are minimal and provide no evidence for drastic changes during the last 20 years, supporting the hypothesis that the processes which have resulted in rapid speciation are primarily historical. In contrast, there is some evidence for ongoing evolution, particularly selection, in all lakes except crater Lake Masaya, perhaps reflecting the persistence of speciational processes. Importantly, we find that the crater Lake Apoyo population, for which strong evidence of sympatric speciation has been demonstrated, has lower genetic diversity than other crater lakes and the strongest evidence for ongoing selection.
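Within-population genetic diversity at microsatellite loci is often summarized as expected heterozygosity. The sketch below implements one common estimator, Nei's unbiased gene diversity, He = n/(n-1) x (1 - sum of squared allele frequencies), averaged over loci. The source does not state which diversity statistics the study used, and the function names and allele data are hypothetical, chosen only to show a lower value in a founder-like population.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Nei's unbiased expected heterozygosity (gene diversity) at one locus.

    `alleles` lists the sampled gene copies (two per diploid individual),
    e.g. microsatellite allele sizes in base pairs.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled gene copies")
    sum_p2 = sum((count / n) ** 2 for count in Counter(alleles).values())
    return n / (n - 1) * (1.0 - sum_p2)


def mean_expected_heterozygosity(loci):
    """Average expected heterozygosity across loci given as {locus: alleles}."""
    values = [expected_heterozygosity(a) for a in loci.values()]
    return sum(values) / len(values)


# Hypothetical data for one locus (4 diploid individuals per population):
source = {"locus1": [150, 152, 154, 156, 150, 152, 154, 156]}
crater = {"locus1": [150, 150, 150, 152, 150, 150, 152, 150]}
print(mean_expected_heterozygosity(source), mean_expected_heterozygosity(crater))
```

Here the "source" sample (four equally common alleles) scores 6/7 and the "crater" sample (one dominant allele) scores 3/7; real comparisons would use all ten loci and account for sample size differences.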
Senescence is integral to the flowering plant life-cycle. Senescence-like processes also occur in non-angiosperm land plants, algae and photosynthetic prokaryotes. Increasing numbers of genes have been assigned functions in the regulation and execution of angiosperm senescence. At the same time there has been a large expansion in the number and taxonomic spread of plant sequences in the genome databases. The present paper uses these resources to study the evolutionary origins of angiosperm senescence, based on a survey of the distribution of senescence-related genes across plant and microbial taxa and of their expression.
Phylogenetic analyses were carried out on protein sequences corresponding to genes with demonstrated functions in angiosperm senescence. They include proteins involved in chlorophyll catabolism and its control, homeoprotein transcription factors, metabolite transporters, and enzymes and regulators of carotenoid metabolism and of anthocyanin biosynthesis. Evolutionary timelines for the origins and functions of particular genes were inferred from the taxonomic distribution of sequences homologous to those of angiosperm senescence-related proteins. Turnover of the light energy transduction apparatus is the most ancient element in the senescence syndrome. By contrast, the association of phenylpropanoid metabolism with senescence, and the integration of senescence with development and adaptation mediated by transcription factors, are relatively recent innovations of land plants. An extended range of Arabidopsis senescence-related genes was profiled for coexpression patterns and developmental relationships. This revealed a clear carotenoid metabolism grouping, coordinated expression of genes for anthocyanin and flavonoid enzymes and regulators, and a cluster pattern of genes for chlorophyll catabolism consistent with functional and evolutionary features of the pathway.
The expression and phylogenetic characteristics of senescence-related genes allow a framework of decisive events in the evolution of the senescence syndrome of modern land plants to be constructed. Combining phylogenetic, comparative sequence, gene expression and morphogenetic information leads to the conclusion that biochemical, cellular, integrative and adaptive systems were progressively added to the ancient primary core process of senescence as the evolving plant encountered new environmental and developmental contexts.
Broad-scale phylogeographic studies of freshwater organisms provide not only an invaluable framework for understanding the evolutionary history of species, but also a genetic imprint of the paleo-hydrological dynamics stemming from climatic change. Few such studies have been carried out in Siberia, a vast region over which the extent of Pleistocene glaciation is still disputed. Brachymystax lenok is a salmonid fish distributed throughout Siberia, exhibiting two forms hypothesized to have undergone extensive range expansion, genetic exchange, and multiple speciation. A comprehensive phylogeographic investigation should clarify these hypotheses as well as provide insights on Siberia's paleo-hydrological stability.
Molecular-sequence (mtDNA) based phylogenetic and morphological analysis of Brachymystax throughout Siberia supports the view that sharp- and blunt-snouted lenok are independent evolutionary lineages, with the majority of their variation distributed among major river basins. Their evolutionary independence was further supported through the analysis of 11 microsatellite loci in three areas of sympatry, which revealed little to no evidence of introgression. Phylogeographic structure reflects climatic limitations, especially for blunt-snouted lenok above 56 degrees N during one or more glacial maxima. Presumed glacial refugia as well as interbasin exchange were not congruent for the two lineages, perhaps reflecting differing dispersal abilities and responses to climatic change. Inferred demographic expansions were dated earlier than the Last Glacial Maximum (LGM). Evidence for repeated trans-basin exchange was especially clear between the Amur and Lena catchments. Divergence of sharp-snouted lenok in the Selenga-Baikal catchment may correspond to the isolation of Lake Baikal in the mid-Pleistocene, while older isolation events are apparent for blunt-snouted lenok in the extreme east and sharp-snouted lenok in the extreme west of their respective distributions.
Sharp- and blunt-snouted lenok have apparently undergone a long, independent, and demographically dynamic evolutionary history in Siberia, supporting their recognition as two good biological species. Considering the timing and extent of expansions and trans-basin dispersal, it is doubtful that these historical dynamics could have been generated without major rearrangements in the paleo-hydrological network, stemming from the formation and melting of large-scale glacial complexes much older than the LGM.
Amblyomma cajennense F. is one of the best known and studied ticks in the New World because of its very wide distribution, its economic importance as a pest of domestic ungulates, and its association with a variety of animal and human pathogens. Recent observations, however, have challenged the taxonomic status of this tick and indicated that intraspecific cryptic speciation might be occurring. In the present study, we investigate the evolutionary and demographic history of this tick and examine its genetic structure based on analyses of three mitochondrial (12S rDNA, d-loop, and COII) genes and one nuclear (ITS2) gene. Because A. cajennense is characterized by a typical trans-Amazonian distribution, lineage divergence dating is also performed to establish whether genetic diversity can be linked to dated vicariant events which shaped the topography of the Neotropics.
Total evidence analyses of the concatenated mtDNA and nuclear + mtDNA datasets resulted in well-resolved and fully congruent reconstructions of the relationships within A. cajennense. The phylogenetic analyses consistently found A. cajennense to be monophyletic and to be separated into six genetic units defined by mutually exclusive haplotype compositions and habitat associations. Also, genetic divergence values showed that these lineages are as distinct from each other as recognized separate species of the same genus. The six clades are deeply split and node dating indicates that they started diverging in the middle-late Miocene.
Behavioral differences and the results of laboratory cross-breeding experiments had already indicated that A. cajennense might be a complex of distinct taxonomic units. The combined and congruent mitochondrial and nuclear genetic evidence from this study reveals that A. cajennense is an assembly of six distinct species which have evolved separately from each other since at least 13.2 million years ago (Mya) in the earliest and 3.3 Mya in the latest lineages. The temporal and spatial diversification modes of the six lineages overlap the phylogeographical history of other organisms with similar extant trans-Amazonian distributions and are consistent with the present prevailing hypothesis that Neotropical diversity often finds its origins in the Miocene, after the Andean uplift changed the topography and consequently the climate and ecology of the Neotropics.
Patagonia, with more than 84,000 km of irregular coastline, is an area especially suited to evaluating how historic and contemporary processes influence the distribution and connectivity of shallow marine benthic organisms. The true limpet Nacella magellanica has a wide distribution in this province and represents a suitable model to infer the Quaternary glacial legacy on marine benthic organisms. This species inhabits ice-free rocky ecosystems, has a narrow bathymetric range and consequently should have been severely affected by recurrent glacial cycles during the Quaternary. We performed phylogeographic and demographic analyses of N. magellanica from 14 localities along its distribution in Pacific Patagonia, Atlantic Patagonia, and the Falkland/Malvinas Islands.
Mitochondrial (COI) DNA analyses of 357 individuals of N. magellanica revealed no genetic differentiation along Pacific Patagonia, where the species forms a single genetic unit. However, we detected significant genetic differences among three main groups: Pacific Patagonia, Atlantic Patagonia and the Falkland/Malvinas Islands. Migration rate estimations indicated asymmetrical gene flow, primarily from Pacific Patagonia to Atlantic Patagonia (Nem=2.21) and the Falkland/Malvinas Islands (Nem=16.6). Demographic reconstruction in Pacific Patagonia suggests a recent recolonization process (< 10 ka) supported by neutrality tests, mismatch distribution and the median-joining haplotype genealogy.
The absence of genetic structure, a single dominant haplotype, the lack of correlation between geographic and genetic distance, high estimated migration rates and the signal of recent demographic growth together constitute a large body of evidence supporting the hypothesis of rapid postglacial expansion of this species in Pacific Patagonia. This expansion could have been sustained by larval dispersal following the main current system in this area. Lower levels of genetic diversity in inland sea areas suggest that fjords and channels are the areas most recently colonized by the species. Hence recolonization seems to have proceeded from west to east into areas that were progressively deglaciated. Significant genetic differences among Pacific, Atlantic and Falkland/Malvinas Islands populations may also be explained by disparities in their respective glaciological and geological histories. Rather than representing a glacial refugium for the species, the Falkland/Malvinas Islands seem to constitute a sink area, given the strong asymmetric gene flow detected from the Pacific to the Atlantic sectors. These results suggest that historical and contemporary processes are the main factors shaping the modern biogeography of most shallow marine benthic invertebrates inhabiting the Patagonian Province.
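A correlation between geographic and genetic distance (isolation by distance) is commonly tested with a Mantel test, which correlates two distance matrices and judges significance by permuting the rows and columns of one matrix together. The sketch below is a generic one-sided permutation version in NumPy; the source does not say which test or software was used for N. magellanica, and the function name and example matrices are hypothetical.

```python
import numpy as np


def mantel(dist_x, dist_y, permutations=999, seed=None):
    """One-sided Mantel test for a positive correlation between two
    square, symmetric distance matrices over the same set of sites.

    Returns (r, p): Pearson r between the upper-triangle entries, and a
    permutation p-value = (number of permuted r >= observed r + 1) / (permutations + 1).
    """
    x = np.asarray(dist_x, dtype=float)
    y = np.asarray(dist_y, dtype=float)
    if x.shape != y.shape or x.shape[0] != x.shape[1]:
        raise ValueError("inputs must be square matrices of the same shape")
    n = x.shape[0]
    iu = np.triu_indices(n, k=1)  # each pair of sites counted once
    x_flat = x[iu]
    r_obs = np.corrcoef(x_flat, y[iu])[0, 1]

    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        order = rng.permutation(n)
        # Relabel sites: permute rows and columns together.
        y_perm = y[np.ix_(order, order)]
        if np.corrcoef(x_flat, y_perm[iu])[0, 1] >= r_obs:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value


# Hypothetical example: 8 sites 100 km apart along a coast, with genetic
# distance rising linearly with geographic distance (strong isolation by distance).
positions_km = np.arange(8, dtype=float) * 100.0
geo = np.abs(positions_km[:, None] - positions_km[None, :])
gen = 0.001 * geo
r, p = mantel(geo, gen, permutations=999, seed=1)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

In a real analysis the genetic matrix would hold pairwise genetic distances between localities; a pattern like the one reported for Pacific Patagonia would show r near zero and a large p.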
The Batrachoididae family is a group of marine teleosts that includes several species with notable physiological specializations in their excretory, reproductive, cardiovascular and respiratory systems. Previous studies of the 5S rDNA gene family carried out in four species from the Western Atlantic showed two types of this gene in two species but only one in the other two, under processes of concerted evolution and birth-and-death evolution with purifying selection. Here we present results for the 5S rDNA and two other gene families in Halobatrachus didactylus, an Eastern Atlantic species, and draw evolutionary inferences regarding the gene families. In addition, we have mapped the genes on the chromosomes by two-colour fluorescence in situ hybridization (FISH).
Two types of 5S rDNA were observed, named type α and type β. Molecular analysis of the 5S rDNA indicates that H. didactylus does not share the non-transcribed spacer (NTS) sequences with four other species of the family; therefore, it must have evolved in isolation. The type β-specific primers amplified a specific band in 9 specimens of H. didactylus and 2 of Sparus aurata. Both types showed regulatory regions and a secondary structure which mark them as functional genes. In contrast, the U2 snRNA gene and the ITS-1 sequence each showed a single electrophoretic band and a single sequence type. The U2 snRNA sequence was the most variable of the three multigene families studied. Results from two-colour FISH showed no co-localization of the genes of the three multigene families and provided the first chromosome map of the species.
A highly significant finding emerged from the analysis of the 5S rDNA: two species as distantly related as H. didactylus and Sparus aurata share a 5S rDNA type. This 5S rDNA type has been detected in other species belonging to the Batrachoidiformes and Perciformes orders, but not in the Pleuronectiformes and Clupeiformes orders. Two hypotheses have been outlined: one is the possible vertical persistence of the shared type in some fish lineages, and the other is a horizontal transfer event between ancient species of the Perciformes and Batrachoidiformes orders. This finding opens a new perspective in fish evolution and in the knowledge of the dynamism of the 5S rDNA. Cytogenetic analysis allowed some evolutionary trends to be outlined, such as the progressive change in the U2 snDNA and the organization of (GATA)n repeats, from dispersed to localized in one locus. The accumulation of (GATA)n repeats in one chromosome pair could be implicated in the evolution of a pair of proto-sex chromosomes. If so, H. didactylus would be the most derived member of the Batrachoididae family in terms of sex chromosome evolution.
Owl monkeys, belonging to the genus Aotus, have been extensively used as animal models in biomedical research, but few reports have focused on the taxonomy and phylogeography of this genus. Moreover, the morphological similarity of several Aotus species has led to frequent misidentifications, mainly at the boundaries of their distribution. In this study, sequence data from five mitochondrial regions and the nuclear Y-linked SRY gene were used for species identification and phylogenetic reconstructions using well-characterized specimens of Aotus nancymaae, A. vociferans, A. lemurinus, A. griseimembra, A. trivirgatus, A. nigriceps, A. azarae boliviensis and A. infulatus.
The complete MT-CO1, MT-TS1, MT-TD, MT-CO2 and MT-CYB regions were sequenced in 18 Aotus specimens. ML and Bayesian topologies of concatenated data and separate regions allowed a tentative Aotus phylogeny to be proposed, indicating that Aotus diverged some 4.62 million years before present (MYBP). Similar analyses that included GenBank specimens were useful for checking the species identification of deposited data.
Alternative phylogenetic reconstructions, when compared with karyotypic and biogeographic data, led to the proposition of evolutionary scenarios questioning the conventional view that this genus diversified into monophyletic grey-necked and red-necked groups. Moreover, genetic distance estimates and haplotypic differences were useful for species validations.
The thin-spined porcupine, also known as the bristle-spined rat, Chaetomys subspinosus (Olfers, 1818), the only member of its genus, is listed among Brazil's endangered species. In addition to being threatened, it is poorly known, and even its taxonomic status at the family level has long been controversial. The genus Chaetomys was originally regarded as a porcupine in the family Erethizontidae, but some authors classified it as a spiny rat in the family Echimyidae. Although the dispute seems to be settled in favor of the erethizontid advocates, further discussion of its affinities should be based on a phylogenetic framework. In the present study, we used nucleotide-sequence data from the complete mitochondrial cytochrome b gene and karyotypic information to address this issue. Our molecular analyses included one individual of Chaetomys subspinosus from the state of Bahia in northeastern Brazil, and other hystricognaths.
All topologies recovered in our molecular phylogenetic analyses strongly supported Chaetomys subspinosus as a sister clade of the erethizontids. Cytogenetically, Chaetomys subspinosus showed 2n = 52 and FN = 76. Although the sex chromosome pair could not be identified, we assumed that the X chromosome is biarmed. The karyotype included 13 large to medium metacentric and submetacentric chromosome pairs, one small subtelocentric pair, and 12 small acrocentric pairs. The subtelocentric pair 14 had a terminal secondary constriction in the short arm, corresponding to the nucleolar organizer region (Ag-NOR), similar to the erethizontid Sphiggurus villosus (2n = 42, FN = 76) and different from the echimyids, in which the secondary constriction is interstitial.
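The reported 2n = 52 and FN = 76 can be checked against the described chromosome pairs. This minimal sketch assumes, in line with the text, that the X is biarmed (taken here to be one of the 13 metacentric/submetacentric pairs); it also counts the subtelocentric pair as biarmed and treats FN as the number of autosomal arms. FN conventions differ between authors, so these are our assumptions, and the function name is hypothetical.

```python
# Recompute 2n and FN from the chromosome pairs described above.
# Assumptions (ours): the unidentified sex pair is one of the 13
# metacentric/submetacentric pairs, subtelocentric chromosomes count as
# two arms, and FN counts autosomal arms only.

def karyotype_numbers(pairs, sex_pair_arms=2):
    """Return (2n, FN) from {class: (number_of_pairs, arms_per_chromosome)}.

    `pairs` must include the sex-chromosome pair; `sex_pair_arms` is the arm
    count per sex chromosome, subtracted so that FN covers autosomes only.
    """
    diploid = sum(2 * n_pairs for n_pairs, _ in pairs.values())
    total_arms = sum(2 * n_pairs * arms for n_pairs, arms in pairs.values())
    fundamental_number = total_arms - 2 * sex_pair_arms
    return diploid, fundamental_number


CHAETOMYS = {
    "metacentric/submetacentric": (13, 2),
    "subtelocentric": (1, 2),
    "acrocentric": (12, 1),
}

print(karyotype_numbers(CHAETOMYS))  # prints (52, 76)
```

Under these assumptions the 26 pairs give 2n = 52, and the 80 total arms minus the 4 arms of the biarmed sex pair give FN = 76, matching the reported values.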
Both molecular phylogenies and karyotypical evidence indicated that Chaetomys is closely related to the Erethizontidae rather than to the Echimyidae, although in a basal position relative to the rest of the Erethizontidae. The high levels of molecular and morphological divergence suggest that Chaetomys belongs to an early radiation of the Erethizontidae that may have occurred in the Early Miocene, and should be assigned to its own subfamily, the Chaetomyinae.
Bats of the family Phyllostomidae show a unique diversity in feeding specializations. This taxon includes species that are highly specialized on insects, blood, small vertebrates, fruits or nectar, and pollen. Feeding specialization is accompanied by morphological, physiological and behavioural adaptations. Several attempts were made to resolve the phylogenetic relationships within this family in order to reconstruct the evolutionary transitions accompanied by nutritional specialization. Nevertheless, the evolution of nectarivory remained equivocal.
Phylogenetic reconstructions, based on a concatenated nuclear and mitochondrial data set, revealed a paraphyletic relationship among nectarivorous phyllostomid bats. Our phylogenetic reconstructions indicate that the nectarivorous genera Lonchophylla and Lionycteris are more closely related to mainly frugivorous phyllostomids of the subfamilies Rhinophyllinae, Stenodermatinae and Carolliinae, and to the insectivorous Glyphonycterinae, than to nectarivorous bats of the Glossophaginae. This suggests an independent origin of morphological adaptations to a nectarivorous lifestyle within Lonchophyllinae and Glossophaginae. Molecular clock analysis revealed a relatively short time frame of about ten million years for the divergence of subfamilies.
Our study provides strong support for diphyly of nectarivorous phyllostomids. This is remarkable, since their morphological adaptations to nectar feeding, such as elongated rostrums and tongues, reduced teeth and the ability to hover while feeding, closely resemble each other. However, more detailed examinations of their tongues (e.g. type and structure of papillae and muscular innervation) revealed differences consistent with an independent evolution of nectarivory in these bats.
Owing to its independence from the main Central European drainage systems, the Italian freshwater fauna is characterized by a high degree of endemicity. Three main ichthyogeographic districts have been proposed in Italy. Yet, the validity of these regions has not been confirmed by phylogenetic and population genetic analyses and a phylogeographic scenario for Italy's primary freshwater fish fauna is still lacking. Here, we investigate the phylogeography of the Italian vairone (Telestes muticellus).
We sampled 38 populations representing the species' entire distribution range and covering all relevant drainage systems, and genotyped 509 individuals at eight variable microsatellite loci. Applying various population genetic analyses, we identify five distinct groups of populations that are only partly in agreement with the proposed ichthyogeographic districts. Our group I, which is formed by specimens from Veneto and the Po River system draining into the Adriatic Sea, corresponds to the Padano-Venetian ichthyogeographic district (PV), except for two Middle Adriatic drainages, which we identify as a separate group (III). The Tuscano-Latium district (TL) is equivalent to our group V. A more complex picture emerges for the Ligurian drainages: populations from Central Liguria belong to group I, while populations from West (group II) and East Liguria (group IV) form their own groups, albeit with affinities to PV and TL, respectively.
We propose a phylogeographic scenario for T. muticellus in which an initial T. muticellus stock became isolated from the 'Alpine' clade and survived the various glaciation cycles in several refugia. These were situated in the Upper Adriatic (groups I and II), the Middle Adriatic (group III), (East) Liguria (group IV) and Tuscano-Latium (group V). The population structure in the vairone is, in principle, in agreement with the two main ichthyogeographic districts (PV and TL), except for the two populations in the Middle Adriatic, which we identify as an additional major "district".
Large pelagic fishes are generally thought to have little population genetic structuring because of their cosmopolitan distribution, large population sizes and high dispersal capacities. However, gene flow can be influenced by ecological (e.g. homing behaviour) and physical (e.g. present-day ocean currents, past changes in sea temperature and levels) factors. In this regard, Atlantic bigeye tuna shows an interesting genetic structuring pattern with two highly divergent mitochondrial clades (Clades I and II), which are assumed to have originated during the last Pleistocene glacial maxima. We assess genetic structure patterns of Atlantic bigeye tuna at the nuclear level and compare them with mitochondrial evidence.
We examined allele size variation at nine microsatellite loci in 380 individuals from the Gulf of Guinea, the Canary Islands, the Azores, Canada, the Indian Ocean and the Pacific Ocean. To investigate the temporal stability of genetic structure, three Atlantic Ocean sites were re-sampled in a second year. Hierarchical AMOVA tests, RST pairwise comparisons, isolation by distance (Mantel) tests, Bayesian clustering analyses, and coalescence-based migration rate inferences supported unrestricted gene flow within the Atlantic Ocean at the nuclear level, and therefore interbreeding between individuals belonging to both mitochondrial clades. Moreover, departures from Hardy-Weinberg equilibrium (HWE) at several loci were inferred for the Guinea samples and attributed to a Wahlund effect, supporting the role of this region as a spawning and nursery area. Our microsatellite data supported a single worldwide panmictic unit for bigeye tuna. Despite the strong Agulhas Current, immigration rates seem to be higher from the Atlantic Ocean into the Indo-Pacific Ocean, but the actual number of individuals moving per generation is relatively low compared to the large population sizes inhabiting each ocean basin.
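The Wahlund effect invoked for the Gulf of Guinea samples is a heterozygote deficit that appears when genetically distinct groups are sampled together as if they were one population. A minimal two-allele illustration follows, assuming equal-sized subpopulations that are each in Hardy-Weinberg equilibrium; the function name and allele frequencies are hypothetical, not taken from the study.

```python
def wahlund(subpop_freqs):
    """Two-allele Wahlund effect for equal-sized subpopulations each in HWE.

    Returns (observed_het, expected_het, variance_of_p), where observed_het is
    the heterozygote frequency in the pooled sample and expected_het is the
    HWE expectation computed from the pooled allele frequency.
    """
    k = len(subpop_freqs)
    # Each subpopulation contributes 2p(1-p) heterozygotes; pooling averages them.
    observed_het = sum(2 * p * (1 - p) for p in subpop_freqs) / k
    p_bar = sum(subpop_freqs) / k
    expected_het = 2 * p_bar * (1 - p_bar)
    variance_of_p = sum((p - p_bar) ** 2 for p in subpop_freqs) / k
    return observed_het, expected_het, variance_of_p


# Hypothetical: two spawning groups with allele-A frequencies 0.2 and 0.8,
# sampled together as if they were one population.
obs, exp, var_p = wahlund([0.2, 0.8])
print(f"observed Ho = {obs:.2f}, expected He = {exp:.2f}, deficit = {exp - obs:.2f}")
```

The pooled sample shows 0.32 heterozygotes against an expected 0.50; the deficit always equals twice the variance of allele frequency among the pooled groups, so it vanishes only when the groups do not differ.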
The lack of congruence between mitochondrial and nuclear evidence, which is also found in other species, most likely reflects past events of isolation and secondary contact. Given the inferred relatively low number of immigrants per generation around the Cape of Good Hope, the proportions of the mitochondrial clades in the different oceans may remain stable, and it seems plausible that the presence of individuals belonging to mitochondrial Clade I in the Atlantic Ocean is due to extensive migrations that predated the last glaciation.
Mesoamerica is one of the world's most complex biogeographical regions, mostly due to its complex geological history. This complexity has led to interesting biogeographical processes that have resulted in the current diversity and distribution of fauna in the region. The fish genus Astyanax is a useful model for assessing biogeographical hypotheses because it is one of the most diverse and widely distributed freshwater fish genera in the New World. We used mitochondrial and nuclear DNA to evaluate phylogenetic relationships within the genus in Mesoamerica, and to develop historical biogeographical hypotheses to explain its current distribution.
Analysis of the entire mitochondrial cytochrome b (Cytb) gene in 208 individuals from 147 localities, and of a subset of individuals for three mitochondrial genes (Cytb, 16S, and COI) and a single nuclear gene (RAG1), yielded similar topologies, recovering six major groups with significant phylogeographic structure. Populations from North America and Upper Central America formed a monophyletic group, while Middle Central America showed evidence of rapid radiation with incompletely resolved relationships. Lower Central America lineages showed a fragmented structure, with geographically restricted taxa showing high levels of molecular divergence. All Bramocharax samples grouped with their sympatric Astyanax lineages (in some cases even with allopatric Astyanax populations), with less than 1% divergence between them. These results suggest a homoplasic nature to the trophic specializations associated with Bramocharax ecomorphs, which seem to have arisen independently in different Astyanax lineages. We observed higher taxonomic diversity compared to previous phylogenetic studies of the Astyanax genus. Colonization of Mesoamerica by Astyanax before the final closure of the Isthmus of Panama (3.3 Mya) explains the deep level of divergence detected in Lower Central America. The colonization of Upper Mesoamerica apparently occurred by two independent routes, with lineage turnover over a large part of the region.
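Divergence figures such as "less than 1%" are pairwise distances between aligned sequences. The sketch below computes the simplest such measure, the uncorrected p-distance (the proportion of compared sites that differ). The source does not specify which distance was used (model-corrected distances such as Kimura two-parameter are also common), and the function name and example sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites between two aligned sequences.

    Sites with a gap or an ambiguous base in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            differences += a != b  # True counts as 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical 200-site alignment that differs at a single site.
seq1 = "ACGT" * 50
seq2 = seq1[:-1] + "A"
print(p_distance(seq1, seq2))
```

One difference in 200 sites gives 0.005, i.e. 0.5% divergence, below the 1% threshold mentioned in the text.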
Our results support multiple, independent origins of morphological traits in Astyanax, whereby the morphotype associated with Bramocharax represents a recurrent trophic adaptation. Molecular clock estimates indicate that Astyanax was present in Mesoamerica during the Miocene (approximately 8 Mya), which implies the existence of an incipient land-bridge connecting South America and Central America before the final closure of the Isthmus of Panama (approximately 3.3 Mya).
Studies of the phylogeography of Mexican species are steadily revealing genetic patterns shared by different species, which will help to unravel the complex biogeographic history of the region. Campostoma ornatum is a freshwater fish endemic to montane and semiarid regions in northwest Mexico and southern Arizona. Its wide range of distribution and the previously observed morphological differentiation between populations in different watersheds make this species a useful model to investigate the biogeographic role of the Sierra Madre Occidental and to disentangle the actions of Pliocene tecto-volcanic processes vs Quaternary climatic change. Our phylogeographic study was based on DNA sequences from one mitochondrial gene (cytb, 1110 bp, n=285) and two nuclear gene regions (S7 and RAG1, 1822 bp in total, n=56 and 43, respectively) obtained from 18 to 29 localities, in addition to a morphological survey covering the entire distribution area. Such a dataset allowed us to assess whether any of the populations/lineages sampled deserve to be categorised as an evolutionarily significant unit.
We found two morphologically and genetically well-differentiated groups within C. ornatum. One is located in the northern river drainages (Yaqui, Mayo, Fuerte, Sonora, Casas Grandes, Santa Clara and Conchos) and the other in the southern drainages (Nazas, Aguanaval and Piaxtla). The split between these two lineages took place about 3.9 Mya (CI = 2.1-5.9). Within the northern lineage, there was strong and significant inter-basin genetic differentiation, as well as several secondary dispersal episodes with gene homogenization between drainages. Interestingly, three divergent mitochondrial lineages were found in sympatry in two northern localities of the Yaqui River basin.
Our results indicate that the northern and southern phylogroups have been isolated since the Pliocene, an isolation related to the formation of the ancient Nazas River paleosystem, where the southern group originated. Within groups, a complex reticulate biogeographic history emerges for C. ornatum populations, consistent with the taxon pulse theory and mainly related to Pliocene tecto-volcanic processes. In the northern group, several vicariance events promoted by river or drainage isolation episodes were found, but within both groups the phylogeographic patterns suggest several events of river capture and faunal interchange. The Yaqui River supports the most diverse populations of C. ornatum, with several events of dispersal and isolation within the basin. Based on our genetic results, we defined three ESUs within C. ornatum as a first attempt to promote the conservation of the evolutionary processes that determine the genetic diversity of this species. These ESUs are likely to prove a valuable tool for freshwater conservation policies in northwest Mexico, where many environmental problems concerning water use have arisen rapidly in recent decades.
The large Glycoside Hydrolase family 5 (GH5) groups together a wide range of enzymes acting on β-linked oligo- and polysaccharides, and glycoconjugates from a large spectrum of organisms. The long and complex evolution of this family of enzymes and its broad sequence diversity limit functional prediction. With the objective of improving the differentiation of enzyme specificities in a knowledge-based context, and to obtain new evolutionary insights, we present here a new, robust subfamily classification of family GH5.
About 80% of the current sequences were assigned to 51 subfamilies in a global analysis of all publicly available GH5 sequences and associated biochemical data. Examination of subfamilies with catalytically active members revealed that one third are monospecific (containing a single enzyme activity), although new functions may be discovered through biochemical characterization in the future. Furthermore, twenty subfamilies presently have no characterization whatsoever, and many others have only limited structural and biochemical data. Mapping functional knowledge onto the GH5 phylogenetic tree revealed that this knowledge is far from evenly dispersed across the sequence space of this historically and industrially important family, highlighting targets in need of further study. The analysis also uncovered a number of GH5 proteins that have lost their catalytic machinery, indicating evolution towards novel functions.
Overall, the subfamily division of GH5 provides an actively curated resource for large-scale protein sequence annotation for glycogenomics; the subfamily assignments are openly accessible via the Carbohydrate-Active Enzyme database at
Tectonic, volcanic and climatic events that produce changes in hydrographic systems are the main causes of diversification and speciation in freshwater fishes. Elucidating the evolutionary history of freshwater fishes makes it possible to infer hypotheses about the biotic and geological evolution of a region, which can in turn be applied to understanding processes of population divergence and speciation, and for conservation purposes. The freshwater ecosystems of Central Mexico are characterized by the dynamism of their formation, destruction, and compartmentalization, induced by intense geologic activity and climatic change since the early Miocene. The endangered goodeid Zoogoneticus quitzeoensis is widely distributed across Central Mexico, making it a good model for phylogeographic analyses in this area.
We addressed the phylogeography, evolutionary history and genetic structure of Z. quitzeoensis populations through a sequential approach based on both microsatellites and mitochondrial cytochrome b sequences. Most haplotypes were private to particular localities, and all the populations analysed showed a remarkably high number of haplotypes. Gene diversity within populations was Hd = 0.987 (0.714-1.00). However, nucleotide diversity was generally low, pi = 0.0173 (0.0015-0.0049). Significant genetic structure was found among populations at both the mitochondrial and nuclear level (PhiST = 0.836 and FST = 0.262, respectively). We distinguished two well-defined mitochondrial lineages that separated ca. 3.3 million years ago (Mya). Population expansion was dated to ca. 1.5 Mya for Lineage I and ca. 860,000 years ago for Lineage II. Patterns of genetic differentiation between and within lineages are also described at different historical timescales.
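Gene (haplotype) diversity values such as those above are conventionally computed with Nei's unbiased estimator, Hd = n(1 - Σ p_i²)/(n - 1), where p_i is the frequency of haplotype i among the n sequences sampled from a population. The study does not state which software it used; the following is only a minimal Python sketch of the estimator, and the haplotype labels are hypothetical rather than data from Z. quitzeoensis.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity for one population.

    haplotypes: one haplotype label per sampled sequence.
    Hd = n * (1 - sum(p_i ** 2)) / (n - 1)
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are required")
    # Sum of squared haplotype frequencies (probability two draws are identical)
    sum_sq = sum((count / n) ** 2 for count in Counter(haplotypes).values())
    return n * (1 - sum_sq) / (n - 1)


print(haplotype_diversity(["H1", "H2", "H3", "H4"]))  # 1.0, every haplotype private
print(haplotype_diversity(["H1", "H1", "H1"]))        # 0.0, a single haplotype
```

A population in which every sampled individual carries a distinct haplotype reaches the maximum of 1, so a value like Hd = 0.987 means that almost every sampled sequence is distinct.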
Our mtDNA data indicate that the evolution of the different genetic groups is more closely related to ancient geological and climatic events (Middle Pliocene, ca. 3.3 Mya) than to the current hydrographic configuration of the basins. In general, mitochondrial and nuclear data supported the same relationships between populations, except for some reduced populations in highly polluted basins (Lower Lerma River), where the nuclear and mitochondrial analyses suggest effects of genetic drift. Our findings are also of particular relevance to the conservation of this endangered species.
Tunicates have been recently revealed to be the closest living relatives of vertebrates. Yet, with more than 2500 described species, details of their evolutionary history are still obscure. From a molecular point of view, tunicate phylogenetic relationships have been mostly studied based on analyses of 18S rRNA sequences, which indicate several major clades at odds with the traditional class-level arrangements. Nonetheless, substantial uncertainty remains about the phylogenetic relationships and taxonomic status of key groups such as the Aplousobranchia, Appendicularia, and Thaliacea.
Thirty new complete 18S rRNA sequences were acquired from previously unsampled tunicate species, with special focus on groups presenting high evolutionary rate. The updated 18S rRNA dataset has been aligned with respect to the constraint on homology imposed by the rRNA secondary structure. A probabilistic framework of phylogenetic reconstruction was adopted to accommodate the particular evolutionary dynamics of this ribosomal marker. Detailed Bayesian analyses were conducted under the non-parametric CAT mixture model accounting for site-specific heterogeneity of the evolutionary process, and under RNA-specific doublet models accommodating the occurrence of compensatory substitutions in stem regions. Our results support the division of tunicates into three major clades: 1) Phlebobranchia + Thaliacea + Aplousobranchia, 2) Appendicularia, and 3) Stolidobranchia, but the position of Appendicularia could not be firmly resolved. Our study additionally reveals that most Aplousobranchia evolve at extremely high rates involving changes in secondary structure of their 18S rRNA, with the exception of the family Clavelinidae, which appears to be slowly evolving. This extreme rate heterogeneity precluded resolving with certainty the exact phylogenetic placement of Aplousobranchia. Finally, the best fitting secondary-structure and CAT-mixture models suggest a sister-group relationship between Salpida and Pyrosomatida within Thaliacea.
An updated phylogenetic framework for tunicates is provided based on phylogenetic analyses using the most realistic evolutionary models currently available for ribosomal molecules and an unprecedented taxonomic sampling. Detailed analyses of the 18S rRNA gene allowed a clear definition of the major tunicate groups and revealed contrasting evolutionary dynamics among major lineages. The resolving power of this gene nevertheless appears limited within the clades composed of Phlebobranchia + Thaliacea + Aplousobranchia and Pyuridae + Styelidae, which were delineated as spots of low resolution. These limitations underline the need to develop new nuclear markers in order to further resolve the phylogeny of this keystone group in chordate evolution.
Marker gene studies often use short amplicons spanning one or more hypervariable regions of an rRNA gene to interrogate the community structure of uncultured environmental samples. Target regions are chosen for their discriminatory power, but the limited phylogenetic signal of short high-throughput sequencing reads precludes accurate phylogenetic analysis. This is particularly unfortunate in the study of microscopic eukaryotes, where horizontal gene flow is limited and the rRNA gene is expected to accurately reflect the species phylogeny. A promising alternative to full phylogenetic analysis is phylogenetic placement, in which a reference phylogeny is inferred using the complete marker gene and iteratively extended with the short sequences from a metagenetic sample under study.
Based on the phylogenetic placement approach, we built Séance, a community analysis pipeline focused on the analysis of 18S marker gene data. Séance combines the alignment extension and phylogenetic placement capabilities of the Pagan multiple sequence alignment program with a suite of tools to preprocess, cluster and visualise datasets composed of many samples. We showcase Séance by analysing 454 data from a longitudinal study of intestinal parasite communities in wild rufous mouse lemurs (Microcebus rufus), as well as simulated data. We demonstrate improved OTU picking at higher levels of sequence similarity for 454 data and show that the accuracy of phylogenetic placement is comparable to that of maximum likelihood methods for lower numbers of taxa.
Séance is an open source community analysis pipeline that provides reference-based phylogenetic analysis for rRNA marker gene studies. Whilst in this article we focus on studying nematodes using the 18S marker gene, the concepts are generic, and reference data for alternative marker genes can be easily created. Séance can be downloaded from http://wasabiapp.org/software/seance/.
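OTU picking at a fixed similarity threshold is often done greedily: unique reads are visited in order of decreasing abundance, and each read joins the first existing centroid it matches or else founds a new OTU. The Python sketch below illustrates only that generic idea; it is not Séance's implementation, and it assumes (hypothetically) that reads are already aligned to equal length so that identity can be computed position by position.

```python
from collections import Counter


def identity(a, b):
    """Fraction of identical positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Greedy centroid clustering of reads at an identity threshold.

    Unique reads are visited by decreasing abundance; each is assigned to the
    first centroid it matches at >= threshold, otherwise it becomes a new centroid.
    """
    centroids, assignment = [], {}
    for read, _count in Counter(reads).most_common():
        for centroid in centroids:
            if identity(read, centroid) >= threshold:
                assignment[read] = centroid
                break
        else:  # no centroid was close enough: start a new OTU
            centroids.append(read)
            assignment[read] = read
    return centroids, assignment


reads = ["ACGTACGTAC"] * 3 + ["ACGTACGTAA", "TTTTTTTTTT"]  # hypothetical reads
centroids, assignment = greedy_otus(reads, threshold=0.9)
print(centroids)  # ['ACGTACGTAC', 'TTTTTTTTTT']
```

Processing abundant reads first makes the most frequent sequences act as centroids, which is why greedy clustering results depend on both the threshold and the input ordering.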
The 18S rRNA gene is one of the most important molecular markers, used in diverse applications such as molecular phylogenetic analyses and biodiversity screening. The Mollusca is the second largest phylum in the animal kingdom, and mollusks show an outstandingly high diversity of body plans and ecological adaptations. Although an enormous amount of 18S data is available for higher mollusks, data on some early branching lineages are still limited. Despite some partial success in obtaining these data from Solenogastres, regarded by some as the most "basal" mollusks, this taxon has remained problematic owing to contamination with food organisms and general amplification difficulties.
We report here the first authentic 18S genes of three Solenogastres species (Mollusca), each possessing a unique sequence composition with regions conspicuously rich in guanine and cytosine. For these GC-rich regions, we calculated strong secondary structures. The observed high intra-molecular forces hamper standard amplification and appear to increase the formation of chimeric sequences caused by contaminating foreign DNA from potential prey organisms. In our analyses, contamination was avoided by using RNA as a template. Evidence of contamination in previously published Solenogastres sequences is presented. Detailed phylogenetic analyses were conducted using RNA-specific models that account for compensatory substitutions in stem regions.
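Regions "conspicuously rich in guanine and cytosine" can be located with a simple sliding-window scan of GC content, which is only a proxy: it does not predict secondary structure, and the study does not specify how its GC-rich regions were delimited. In this minimal Python sketch, the window size, step and 0.7 threshold are arbitrary illustrative choices, and the sequence is a made-up toy example.

```python
def gc_fraction(seq):
    """Fraction of G + C among the unambiguous bases (A, C, G, T/U) in seq."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGTU")
    if unambiguous == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / unambiguous


def gc_rich_windows(seq, window=20, step=5, threshold=0.7):
    """Return (start, end, gc) for each window with GC fraction >= threshold.

    Coordinates are 0-based and end-exclusive; window, step and threshold are
    illustrative defaults, not values from the study.
    """
    hits = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if gc >= threshold:
            hits.append((start, start + len(chunk), gc))
    return hits


toy = "AT" * 10 + "GC" * 10 + "AT" * 10  # hypothetical 60-nt sequence
for start, end, gc in gc_rich_windows(toy):
    print(start, end, round(gc, 2))
```

Overlapping windows (step smaller than window) let the scan report the flanks of a GC-rich block as well as its core.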
The extreme morphological diversity of mollusks is mirrored in the molecular 18S data and shows elevated substitution rates mainly in three higher taxa: true limpets (Patellogastropoda), Cephalopoda and Solenogastres. Our phylogenetic tree based on 123 species, including representatives of all mollusk classes, shows limited resolution at the class level but illustrates the pitfalls of artificial groupings formed due to shared biased sequence composition.
The study of organisms with restricted dispersal abilities and a fossil record is particularly well suited to understanding the impact of climate change on the distribution and genetic structure of species. Trochoidea geyeri (Soós 1926) is a land snail restricted to a patchy, insular distribution in Germany and France. Fossil evidence suggests that current populations of T. geyeri are relicts of a much more widespread distribution during more favourable climatic periods in the Pleistocene.
Phylogeographic analysis of mitochondrial 16S rDNA and nuclear ITS-1 sequence variation was used to infer the history of the remnant populations of T. geyeri. Nested clade analysis of both loci suggested that the species originated in Provence, from where it expanded its range first to southwest France and subsequently from there to Germany. Estimated divergence times predating the last glacial maximum (25-17 ka) imply that the northern part of the current species range was colonized during the Pleistocene.
We conclude that T. geyeri was able to persist quite successfully in cryptic refugia during major climatic changes in the past, despite the restricted capacity of individuals to actively avoid unfavourable conditions.
Atlantolacerta andreanskyi is an enigmatic lacertid lizard that, according to the most recent molecular analyses, belongs to the tribe Eremiadini, family Lacertidae. It is a mountain specialist, restricted to areas above 2400 m of the High Atlas Mountains of Morocco with apparently no connection between the different populations. In order to investigate its phylogeography, 92 specimens of A. andreanskyi were analyzed from eight different populations across the distribution range of the species for up to 1108 base pairs of mitochondrial DNA (12S, ND4 and flanking tRNA-His) and 2585 base pairs of nuclear DNA including five loci (PDC, ACM4, C-MOS, RAG1, MC1R).
The results obtained with concatenated and coalescent approaches, as well as with clustering methods, clearly show that all the populations analyzed exhibit very high levels of genetic differentiation in the mitochondrial markers used and are also generally differentiated at the nuclear level.
These results indicate that A. andreanskyi is an additional example of a montane species complex.
The phylogenetic relationships of many taxa remain poorly known because of a lack of appropriate data and/or analyses. Despite substantial recent advances, amphibian phylogeny remains poorly resolved in many instances. The phylogenetic relationships of the Ethiopian endemic monotypic genus Ericabatrachus have thus far been addressed only with phenotypic data and remain contentious.
We obtained fresh samples of the now rare and Critically Endangered Ericabatrachus baleensis and generated DNA sequences for two mitochondrial and four nuclear genes. Analyses of these new data using de novo and constrained-tree phylogenetic reconstructions strongly support a close relationship between Ericabatrachus and Petropedetes, and allow us to reject previously proposed alternative hypotheses of a close relationship with cacosternines or Phrynobatrachus.
We discuss the implications of our results for the taxonomy, biogeography and conservation of E. baleensis, and suggest a two-tiered approach to the inclusion and analyses of new data in order to assess the phylogenetic relationships of previously unsampled taxa. Such approaches will be important in the future given the increasing availability of relevant mega-alignments and potential framework phylogenies.
Enterovirus (EV) 71 is one of the common causative agents of hand, foot, and mouth disease (HFMD). In recent years, the virus has caused several outbreaks with high numbers of deaths and severe neurological complications. Despite the importance of these epidemics, several aspects of the evolutionary and epidemiological dynamics of EV71 remain unclear, including viral nucleotide variation within and between different outbreaks, rates of change in immune-related structural regions vs. non-structural regions, and the forces driving its evolution.
We sequenced four genomic segments, i.e., the 5' untranslated region (UTR), VP1, 2A, and 3C, of 395 EV71 viral strains collected from 1998 to 2003 in Taiwan. The phylogenies derived from different genomic segments revealed different relationships, indicating frequent sequence recombinations as previously noted. In addition to simple recombinations, exchanges of the P1 domain between different species/genotypes of human enterovirus species (HEV)-A were repeatedly observed. Contrasting patterns of polymorphisms and divergences were found between structural (VP1) and non-structural segments (2A and 3C), i.e., the former was less polymorphic within an outbreak but more divergent between different HEV-A species than the latter two. Our computer simulation demonstrated a significant excess of amino acid replacements in the VP1 region implying its possible role in adaptive evolution. Between different epidemic seasons, we observed high viral diversity in the epidemic peaks followed by severe reductions in diversity. Viruses sampled in successive epidemic seasons were not sister to each other, indicating that the annual outbreaks of EV71 were due to genetically distinct lineages.
Based on observations of accelerated amino acid changes and frequent exchanges of the P1 domain, we propose that positive selection and subsequent frequent domain shuffling are two important mechanisms for generating new genotypes of HEV-A. Our viral dynamics analysis suggested that the importation of EV71 from surrounding areas likely contributes to local EV71 outbreaks.
EFL (or elongation factor-like) is a member of the translation superfamily of GTPase proteins. It is restricted to eukaryotes, where it is found in a punctate distribution that is almost mutually exclusive with elongation factor-1 alpha (EF-1alpha). EF-1alpha is a core translation factor previously thought to be essential in eukaryotes, so its relationship to EFL has prompted the suggestion that EFL has spread by horizontal or lateral gene transfer (HGT or LGT) and replaced EF-1alpha multiple times. Among green algae, trebouxiophyceans and chlorophyceans have EFL, but the ulvophycean Acetabularia and the sister group to green algae, land plants, have EF-1alpha. This distribution singles out green algae as a particularly promising group to understand the origin of EFL and the effects of its presence on EF-1alpha.
We have sampled all major lineages of green algae for both EFL and EF-1alpha. EFL is unexpectedly broad in its distribution, being found in all green algal lineages (chlorophyceans, trebouxiophyceans, ulvophyceans, prasinophyceans, and mesostigmatophyceans), except charophyceans and the genus Acetabularia. The presence of EFL in the genus Mesostigma and EF-1alpha in Acetabularia are of particular interest, since the opposite is true of all their closest relatives. The phylogeny of EFL is poorly resolved, but the Acetabularia EF-1alpha is clearly related to homologues from land plants and charophyceans, demonstrating that EF-1alpha was present in the common ancestor of the green lineage.
The distribution of EFL and EF-1alpha in the green lineage is not consistent with the phylogeny of the organisms, indicating a complex history of both genes. Overall, we suggest that after the introduction of EFL (in the ancestor of green algae or earlier), both genes co-existed in green algal genomes for some time before one or the other was lost on multiple occasions.
Elongation factor-1alpha (EF-1alpha) and elongation factor-like (EFL) proteins are functionally homologous to one another, and are core components of the eukaryotic translation machinery. The patchy distribution of the two elongation factor types across global eukaryotic phylogeny is suggestive of a 'differential loss' hypothesis that assumes that EF-1alpha and EFL were present in the most recent common ancestor of eukaryotes followed by independent differential losses of one of the two factors in the descendant lineages. To date, however, just one diatom and one fungus have been found to have both EF-1alpha and EFL (dual-EF-containing species).
In this study, we characterized 35 new EF-1alpha/EFL sequences from phylogenetically diverse eukaryotes. In so doing we identified 11 previously unreported dual-EF-containing species from diverse eukaryote groups including the Stramenopiles, Apusomonadida, Goniomonadida, and Fungi. Phylogenetic analyses suggested vertical inheritance of both genes in each of the dual-EF lineages. In the dual-EF-containing species we identified, the EF-1alpha genes appeared to be highly divergent in sequence and suppressed at the transcriptional level compared to the co-occurring EFL genes.
According to the known EF-1alpha/EFL distribution, the differential loss process should have occurred independently in diverse eukaryotic lineages, and more dual-EF-containing species remain unidentified. We predict that dual-EF-containing species retain the divergent EF-1alpha homologues only for a sub-set of the original functions. As the dual-EF-containing species are distantly related to each other, we propose that independent re-modelling of EF-1alpha function took place in multiple branches in the tree of eukaryotes.
Pyridine-2,6-bis(thiocarboxylic acid) (pdtc) is a small secreted metabolite that has a high affinity for transition metals, increases iron uptake efficiency by 20% in Pseudomonas stutzeri, has the ability to reduce both soluble and mineral forms of iron, and has antimicrobial activity towards several species of bacteria. Six GenBank sequences code for proteins similar in structure to MoeZ, a P. stutzeri protein necessary for the synthesis of pdtc.
Analysis of sequences similar to P. stutzeri MoeZ revealed that it is a member of a superfamily of related but structurally distinct proteins that participate in pathways involved in the transfer of sulfur-containing moieties to metabolites. Members of this family of enzymes are referred to here as MoeB, MoeBR, MoeZ, and MoeZdR. MoeB, the molybdopterin synthase activating enzyme in the molybdopterin cofactor biosynthesis pathway, is the best characterized protein of this family. Remarkably, stretches of 35 to 486 bp sharing more than 73% nucleotide sequence similarity exist between Pseudomonas stutzeri moeZ and genomic sequences found in some Mycobacterium, Mesorhizobium, Pseudomonas, Streptomyces, and cyanobacteria species.
The phylogenetic relationship among moeZ sequences suggests that P. stutzeri may have acquired moeZ through lateral gene transfer from a donor more closely related to mycobacteria and cyanobacteria than to proteobacteria. The importance of this relationship lies in the fact that pdtc, the product of the P. stutzeri pathway that includes moeZ, has an impressive set of capabilities, some of which could make it a potent pathogenicity factor.
Ecological interaction strength may increase under environmental stress such as high temperature. How such stress enhances and interacts with parasite selection is almost unknown. We studied the importance of resistance genes of the major histocompatibility complex (MHC) class II in 14 families of three-spined sticklebacks Gasterosteus aculeatus exposed to their natural macroparasites in field enclosures during the extreme summer of 2003.
After a mass die-off during the 2003 European heatwave, which killed 78% of 277 experimental fish, we found strong differences in survival among and within families. Fewer individuals survived in families with higher average parasite loads. Multivariate analysis revealed that the composition of the infecting parasite fauna was family-specific. Within families, individuals with an intermediate number of MHC class IIB sequence variants survived best and had the lowest parasite load among survivors, suggesting a direct functional link between MHC diversity and fitness. The within-family MHC effects were, however, small compared with between-family effects, suggesting that other genetic components or non-genetic effects were also important.
The correlation between parasite load and mortality that we found at both individual and family level might have appeared only in the extraordinary heatwave of 2003. Due to global warming the frequency of extreme climatic events is predicted to increase, which might intensify costs of parasitism and enhance selection on immune genes.
The emergence of the 2009 H1N1 influenza pandemic followed a multiple reassortment event involving viruses originally circulating in swine and humans, but the adaptive nature of this emergence is poorly understood.
Here we base our analysis on 1180 complete genomes of H1N1 viruses sampled in North America between 2000 and 2010 in swine and human hosts. We show that while transmission to a human host might require an adaptive phase in the HA and NA antigens, the emergence of the 2009 pandemic was essentially nonadaptive. A more detailed analysis of the NA protein shows that the 2009 pandemic sequence is characterized by novel epitopes and by a particular substitution in loop 150, which is responsible for a nonadaptive structural change tightly associated with the emergence of the pandemic.
Because this substitution was not present in the 1918 H1N1 pandemic virus, we posit that the emergence of pandemics is due to epistatic interactions between sites distributed over different segments. Altogether, our results are consistent with population dynamics models that highlight the epistatic and nonadaptive rise of novel epitopes in viral populations, followed by their demise when the resulting virus is too virulent.
The availability of newly sequenced vertebrate genomes, along with more efficient and accurate alignment algorithms, has enabled the expansion of the field of comparative genomics. Large-scale genome rearrangement events modify the order of genes and non-coding conserved regions on chromosomes. While certain large genomic regions have remained intact over much of vertebrate evolution, others appear to be hotspots for genomic breakpoints. The cause of the non-uniformity of breakpoints that occurred during vertebrate evolution is poorly understood.
We describe a machine learning method to distinguish genomic regions where breakpoints would be expected to have deleterious effects (called breakpoint-refractory regions) from those where they are expected to be neutral (called breakpoint-susceptible regions). Our predictor is trained using breakpoints that took place along the human lineage since amniote divergence. Based on our predictions, refractory and susceptible regions have very distinctive features. Refractory regions are significantly enriched for conserved non-coding elements as well as for genes involved in development, whereas susceptible regions are enriched for housekeeping genes, likely to have simpler transcriptional regulation.
We postulate that long-range transcriptional regulation strongly influences chromosome break fixation. In many regions, the fitness cost of altering the spatial association between long-range regulatory regions and their target genes may be so high that rearrangements are not allowed. Consequently, only a limited, identifiable fraction of the genome is susceptible to genome rearrangements.
The quality of multiple sequence alignments plays an important role in the accuracy of phylogenetic inference. It has been shown that removing ambiguously aligned regions, but also other sources of bias such as highly variable (saturated) characters, can improve the overall performance of many phylogenetic reconstruction methods. A current scientific trend is to build phylogenetic trees from a large number of sequence datasets (semi-)automatically extracted from numerous complete genomes. Because these approaches do not allow a precise manual curation of each dataset, there exists a real need for efficient bioinformatic tools dedicated to this alignment character trimming step.
Here we present a new program, BMGE (Block Mapping and Gathering with Entropy), designed to select regions of a multiple sequence alignment that are suited to phylogenetic inference. For each character, BMGE computes a score closely related to an entropy value. Calculation of these entropy-like scores is weighted with BLOSUM or PAM similarity matrices in order to distinguish between biologically expected and unexpected variability for each aligned character. Sets of contiguous characters with a score above a given threshold are considered unsuitable for phylogenetic inference and are removed. Simulation analyses show that the character trimming performed by BMGE produces datasets leading to accurate trees, especially with alignments including distantly related sequences. BMGE also implements trimming and recoding methods aimed at minimizing phylogeny reconstruction artefacts due to compositional heterogeneity.
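The core idea of entropy-based character trimming can be illustrated with a simplified Python sketch. Unlike BMGE, it uses unweighted Shannon entropy (no BLOSUM or PAM weighting), scores single columns rather than contiguous blocks, and adds a separate gap-fraction filter; the thresholds and the toy protein alignment are illustrative assumptions, not BMGE defaults or BMGE's algorithm.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Unweighted Shannon entropy (bits) of the residues in one column; gaps ignored."""
    residues = [c for c in column if c != gap]
    if not residues:
        return float("inf")  # all-gap column: treat as uninformative
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def trim_alignment(seqs, max_entropy=1.0, max_gap_fraction=0.5, gap="-"):
    """Keep columns with entropy <= max_entropy and gap fraction <= max_gap_fraction.

    Returns the trimmed sequences and the 0-based indices of the kept columns.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    keep = []
    for i in range(length):
        column = [s[i] for s in seqs]
        gap_fraction = column.count(gap) / len(column)
        if gap_fraction <= max_gap_fraction and column_entropy(column, gap) <= max_entropy:
            keep.append(i)
    return ["".join(s[i] for i in keep) for s in seqs], keep


alignment = ["MKVLAW", "MKVLCF", "MRVLDY", "MKV-EH"]  # toy alignment
trimmed, kept = trim_alignment(alignment)
print(kept)     # [0, 1, 2, 3]
print(trimmed)  # ['MKVL', 'MKVL', 'MRVL', 'MKV-']
```

The last two columns each contain four different residues (2 bits of entropy) and are removed, while conserved or nearly conserved columns are retained; BMGE's matrix weighting additionally treats substitutions between biochemically similar residues as less surprising.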
BMGE is able to perform biologically relevant trimming on a multiple alignment of DNA, codon or amino acid sequences. Java source code and executable are freely available at ftp://ftp.pasteur.fr/pub/GenSoft/projects/BMGE/.
Long alpha-helical coiled-coil proteins are involved in diverse organizational and regulatory processes in eukaryotic cells. They provide cables and networks in the cyto- and nucleoskeleton, molecular scaffolds that organize membrane systems and tissues, motors, levers, rotating arms, and possibly springs. Mutations in long coiled-coil proteins have been implicated in a growing number of human diseases. Using the coiled-coil prediction program MultiCoil, we have previously identified all long coiled-coil proteins from the model plant Arabidopsis thaliana and have established a searchable Arabidopsis coiled-coil protein database.
Here, we have identified all proteins with long coiled-coil domains from 21 additional fully sequenced genomes. Because regions predicted to form coiled-coils interfere with sequence homology determination, we have developed a sequence comparison and clustering strategy based on masking predicted coiled-coil domains. Comparing and grouping all long coiled-coil proteins from 22 genomes, the kingdom-specificity of coiled-coil protein families was determined. At the same time, a number of proteins with unknown function could be grouped with already characterized proteins from other organisms.
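The masking strategy can be illustrated by overwriting predicted coiled-coil intervals with a placeholder residue before sequences are compared, so that these regions, which interfere with homology determination, no longer influence the comparison. In this minimal Python sketch the (start, end) intervals stand in for MultiCoil predictions, whose output format is not described here, and the protein sequence is invented.

```python
def mask_regions(sequence, regions, mask_char="X"):
    """Return a copy of sequence with each (start, end) region replaced by mask_char.

    Regions are 0-based and end-exclusive; overlapping regions are allowed.
    mask_char must be a single character so the sequence length is preserved.
    """
    if len(mask_char) != 1:
        raise ValueError("mask_char must be a single character")
    residues = list(sequence)
    for start, end in regions:
        if not 0 <= start <= end <= len(residues):
            raise ValueError(f"invalid region {(start, end)} for length {len(residues)}")
        residues[start:end] = mask_char * (end - start)
    return "".join(residues)


protein = "MSTLEELKKQLEELKAG"  # invented sequence
predicted_coils = [(3, 15)]  # hypothetical stand-in for MultiCoil output
print(mask_regions(protein, predicted_coils))  # MSTXXXXXXXXXXXXAG
```

X is the conventional symbol for an unknown amino acid, so the masked positions keep their place in the sequence but contribute little to similarity scores.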
MultiCoil predicts proteins with extended coiled-coil domains (more than 250 amino acids) to be largely absent from bacterial genomes, but present in archaea and eukaryotes. The structural maintenance of chromosomes proteins and their relatives are the only long coiled-coil protein family clearly conserved throughout all kingdoms, indicating their ancient nature. Motor proteins, membrane tethering and vesicle transport proteins are the dominant eukaryote-specific long coiled-coil proteins, suggesting that coiled-coil proteins have gained functions in the increasingly complex processes of subcellular infrastructure maintenance and trafficking control of the eukaryotic cell.