BMC Evolutionary Biology

Published by BioMed Central
Online ISSN: 1471-2148
A stylised summary of hymenopteran relationships. Traditional suborders represented in capital letters. Terminal taxa indicate superfamilies, or those families not assigned to a superfamily. Dashed lines indicate hypothesised sister group relationships.
A consensus supernetwork highlighting the uncertainty of phylogenetic relationships between hymenopteran families.
The method of Davies et al. (2004) explained. In A, taxa a and b have significantly different species richnesses (N). To detect the direction of the shift, we compare Na and Nb with N of their nearest outgroup, c. Because N does not differ significantly between a and c but does differ between b and c, we have detected a significant downshift in species richness associated with b. B shows a more complicated scenario, in which taxon c comprises two taxa, d and e. These have significantly different species richnesses and must themselves be compared with their outgroup (i.e. a + b). However, Na, Nb, Nd and Ne cannot serve as reference values for one another. The pairs (a, b) and (d, e) are each other's outgroups, and the direction of the significant shift has not been determined within either pair. In such circumstances, the combined N of the species-rich taxa (for example a and d) is compared with N of the next outgroup, f. The same applies to the species-poor taxa (for example b and e). In this example, Nb + Ne differs significantly from Nf, so we have detected significant downshifts in taxa b and e.
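The decision logic in panels A and B can be sketched as follows. This is a minimal sketch, not Davies et al.'s code. The legend does not name the significance test, so it is passed in as a hypothetical predicate `differs`. The helper `toy_differs` is a purely illustrative ten-fold threshold, not a statistical test. All species counts are made up.

```python
from typing import Callable, List, Optional

# Hypothetical significance test on two species richnesses; the legend does
# not say which test Davies et al. (2004) applied.
RichnessTest = Callable[[int, int], bool]


def locate_shift(n_a: int, n_b: int, n_c: int, differs: RichnessTest) -> Optional[str]:
    """Panel A: sisters a and b already differ significantly. Compare each with
    the nearest outgroup c; the taxon that differs from c carries the shift."""
    a_differs = differs(n_a, n_c)
    b_differs = differs(n_b, n_c)
    if b_differs and not a_differs:
        return "b"
    if a_differs and not b_differs:
        return "a"
    return None  # direction unresolved at this level


def locate_pooled_shift(rich: List[int], poor: List[int], n_f: int,
                        differs: RichnessTest) -> Optional[str]:
    """Panel B: pool the species-rich taxa and the species-poor taxa, then
    compare each pooled N with the next outgroup f."""
    rich_differs = differs(sum(rich), n_f)
    poor_differs = differs(sum(poor), n_f)
    if poor_differs and not rich_differs:
        return "poor"
    if rich_differs and not poor_differs:
        return "rich"
    return None


def shift_direction(n_shifted: int, n_reference: int) -> str:
    """Upshift if the shifted taxon (or pool) is richer than its reference."""
    return "upshift" if n_shifted > n_reference else "downshift"


def toy_differs(x: int, y: int) -> bool:
    """Purely illustrative stand-in: call a ten-fold difference 'significant'."""
    return max(x, y) >= 10 * min(x, y)


# Panel A with made-up counts: b differs from c, a does not.
print(locate_shift(500, 20, 400, toy_differs), shift_direction(20, 400))
# Panel B with made-up counts: the species-poor pool (b + e) differs from f.
print(locate_pooled_shift([500, 300], [20, 10], 1000, toy_differs))
```

In practice, `toy_differs` would be replaced by whichever richness test the analysis actually uses.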
The extended majority-rule MRC supertree of hymenopteran families from an all-in analysis. Numbers in brackets next to extant families indicate the number of species. Membership of families in non-monophyletic "superfamilies" is indicated as follows: Ch = Chalcidoidea, Cy = Cynipoidea, Ic = Ichneumonoidea, Pr = Proctotrupoidea. Taxa are colour-coded as in previous figures: Symphyta, green; Parasitica, red; Aculeata, blue.
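Species counts like those bracketed in this legend are the input to sister-group species-richness comparisons. The test the study applied is not named here. One widely used example, not necessarily the authors' choice, is the equal-rates Markov test of Slowinski and Guyer (1989). It gives the probability that a split at least as uneven as the observed one arises when both sisters diversify at the same rate. A minimal sketch, with made-up clade sizes:

```python
from fractions import Fraction


def equal_rates_p(n1: int, n2: int) -> Fraction:
    """Probability, under an equal-rates Markov (Yule) model, that a clade of
    n1 + n2 species splits at least as unevenly as observed.

    Every split (i, N - i) with i = 1..N-1 is equally likely under this model,
    so for a smaller clade of r species there are 2r splits at least as uneven
    (or all N - 1 of them when r = N / 2), giving P = 2r / (N - 1) capped at 1.
    """
    if n1 < 1 or n2 < 1:
        raise ValueError("each sister clade needs at least one species")
    total = n1 + n2
    r = min(n1, n2)
    return min(Fraction(1), Fraction(2 * r, total - 1))


# Made-up sister clades of 3 and 97 species.
print(equal_rates_p(3, 97))  # 2/33
```

Exact fractions avoid rounding in small clades. A small P flags a sister pair whose richness difference is hard to explain under equal rates.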
The order Hymenoptera (bees, ants, wasps, sawflies) contains about eight percent of all described species, but no analytical studies have addressed the origins of this richness at family level or above. To investigate which major subtaxa experienced significant shifts in diversification, we assembled a family-level phylogeny of the Hymenoptera using supertree methods. We used sister-group species-richness comparisons to infer the phylogenetic position of shifts in diversification. The supertrees most supported by the underlying input trees were produced using matrix representation with compatibility (MRC), from both an all-in and a compartmentalised analysis. Whilst relationships at the tips of the tree tend to be well supported, those along the backbone of the tree (e.g. between Parasitica superfamilies) generally are not. Ten significant shifts in diversification (six positive and four negative) were common to both MRC supertrees. The Apocrita (wasps, ants, bees) experienced a positive shift at their origin, accounting for approximately 4,000 species. Other positive shifts within Apocrita include the Vespoidea (vespoid wasps/ants; 24,000 spp.), Anthophila + Sphecidae (bees/thread-waisted wasps; 22,000 spp.), Bethylidae + Chrysididae (bethylid/cuckoo wasps; 5,200 spp.), Dryinidae (dryinid wasps; 1,100 spp.) and Proctotrupidae (proctotrupid wasps; 310 spp.). Four relatively species-poor families (Stenotritidae, Anaxyelidae, Blasticotomidae, Xyelidae) have undergone negative shifts. Some shifts are two-way, with sister taxa having undergone shifts in opposite directions. Our results suggest that numerous phylogenetically distinctive radiations contribute to the richness of large clades. They also suggest that evolutionary events restricting the subsequent richness of large clades are common. Problematic phylogenetic issues in the Hymenoptera are identified, relating especially to superfamily validity (e.g. "Proctotrupoidea", "Mymarommatoidea") and deeper apocritan relationships. Our results should stimulate new functional studies on the causes of the diversification shifts we have identified. Possible drivers highlighted for specific adaptive radiations include key anatomical innovations, the exploitation of rich host groups, and associations with angiosperms. Low richness may have evolved as a result of geographical isolation, specialised ecological niches, and habitat loss or competition.
Fossil records of Solanaceae
Supermatrix details
Molecular age estimates
Solanaceae phylogeny. Phylogenetic relationships between major clades of Solanaceae based on a Maximum Likelihood analysis of a 1076-taxon supermatrix (ITS, waxy, ndhF, matK, psbA-trnH, trnS-G, trnL-F) with 10,672 bp of sequence data. Major clades recovered by previous phylogenetic studies [22,43,64] are labelled, as is the M Clade, identified for the first time here. Clades with low bootstrap support (60-79%) are shown in pink, while strongly supported clades (bootstrap support 80-100%) are in black. A. Major clades of Solanaceae. B. Relationships within Solanum.
The Solanaceae is a plant family of great economic importance. Despite a wealth of phylogenetic work on individual clades and a deep knowledge of particular cultivated species such as tomato and potato, a robust evolutionary framework with a dated molecular phylogeny for the family is still lacking. Here we investigate molecular divergence times for Solanaceae using a densely sampled species-level phylogeny. We also review the fossil record of the family to derive robust calibration points, and estimate a chronogram using an uncorrelated relaxed molecular clock. Our densely sampled phylogeny strongly supports all previously identified clades of Solanaceae and the relationships between the major clades, particularly within Solanum. The Tomato clade is shown to be sister to section Petota, and the Regmandra clade is the first-branching member of the Potato clade. The minimum age estimates for major splits within the family provided here correspond well with results from previous studies. They place the split between tomato and potato at around 8 million years ago (Ma; 95% highest posterior density (HPD) 7-10 Ma), between Solanum and Capsicum at c. 19 Ma (95% HPD 17-21 Ma), and between Solanum and Nicotiana at c. 24 Ma (95% HPD 23-26 Ma). Our large time-calibrated phylogeny is a significant step towards a fully sampled species-level phylogeny for Solanaceae, and provides age estimates for the whole family. The chronogram now includes 40% of known species and all but two monotypic genera. In terms of both taxon sampling and resolution, it is one of the best-sampled angiosperm family phylogenies published thus far. The increased resolution of the chronogram, combined with the large increase in species sampling, will provide much-needed data for examining many biological questions using Solanaceae as a model system.
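The 95% HPD intervals quoted above summarise posterior samples of node ages. The sketch below assumes a list of MCMC samples for a single node; the values are invented. It finds the narrowest window of sorted samples that holds the required posterior mass:

```python
import math
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval that contains `mass` of the posterior samples.

    Assumes a unimodal posterior. For a multimodal one, the true HPD region
    may be a union of disjoint intervals, which this window search ignores.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 < mass < 1.0:
        raise ValueError("mass must lie strictly between 0 and 1")
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Invented, right-skewed node-age samples (Ma).
print(hpd_interval([7.2, 7.9, 8.1, 8.4, 15.0], mass=0.7))  # (7.2, 8.4)
```

Unlike an equal-tailed interval, the HPD window shifts towards the dense part of a skewed posterior.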
The organization of the bacteriophage 7-11 genome. Upon cell entry, the genome likely takes a circular form. The two gene clusters are then divergently transcribed and separated by a long intergenic region comprising the 5' and 3' ends of the genome. The two genes of special importance in transcriptional regulation, the σ and anti-σ factor genes, are marked. The groups of genes involved in DNA replication and nucleotide metabolism are also marked.
Comparison of the sequence logos. The first three lines show the sequence logos for, respectively: i) experimentally identified phiEco32 promoters [6], ii) 7-11 long motifs (Table 1) and iii) 7-11 short motifs (Table 1). The sequence logos were aligned, and a 1-bp gap was introduced in the phiEco32 sequence logo so that the specificities can be compared. The logos were constructed with enoLOGOS [24].
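Sequence-logo letter stacks are conventionally scaled by information content. For DNA, each position contributes log2(4) - H bits, where H is the Shannon entropy of the base frequencies at that position. The sketch below computes this without any small-sample or background correction. It does not attempt to reproduce enoLOGOS's own calculation, and the aligned sites are invented.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_information(sites: Sequence[str], alphabet: str = "ACGT") -> List[float]:
    """Information content (bits) at each position of equal-length aligned
    sites: log2(len(alphabet)) minus the Shannon entropy of the observed
    letters. Characters outside `alphabet` (e.g. gaps) are ignored."""
    if not sites:
        raise ValueError("need at least one aligned site")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must all have the same length")
    max_bits = math.log2(len(alphabet))
    info = []
    for pos in range(width):
        counts = Counter(s[pos] for s in sites if s[pos] in alphabet)
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)  # all gaps: no information
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(max_bits - entropy)
    return info


# Invented aligned sites: three conserved positions, one uniform position.
print(column_information(["ACGT", "ACGA", "ACGC", "ACGG"]))  # [2.0, 2.0, 2.0, 0.0]
```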
Comparison of promoter layout and temporal classification for 7-11 and phiEco32. The upper and the lower line correspond to the promoter layout for phiEco32 and 7-11 genomes, respectively. The color code for the promoter and gene temporal classes is indicated in the figure legend.
Analyzing the regulation of bacteriophage gene expression historically led to the establishment of major paradigms of molecular biology, and may provide important medical applications in the future. Temporal regulation of bacteriophage transcription is commonly analyzed through a labor-intensive combination of biochemical and bioinformatic approaches and macroarray measurements. Here we investigate to what extent the gene expression strategies of lytic phages can be understood by directly analyzing their genomes with bioinformatic methods. We address this question for the recently sequenced lytic bacteriophage 7-11, which infects the bacterium Salmonella enterica. We identify novel promoters for the bacteriophage-encoded σ factor, and test the predictions through homology with another bacteriophage (phiEco32) that has been experimentally characterized in detail. Interestingly, the standard approach based on multiple local sequence alignment (MLSA) fails to identify the promoters correctly. In contrast, a simpler procedure based on pairwise alignment of intergenic regions identifies the desired motifs. We argue that such a search strategy is more effective for promoters of bacteriophage-encoded σ factors, which are typically well conserved but appear in low copy numbers, and we verify this on two additional bacteriophage genomes. Promoters for the bacteriophage-encoded σ factor can be identified in this way, and promoters for the bacterial σ factor are more straightforward to identify. Together, these allow the genes to be clustered into putative early, middle and late classes, and hence the temporal regulation of bacteriophage gene expression to be predicted, as we demonstrate for phage 7-11. While MLSA algorithms have proved highly useful in the computational analysis of transcription regulation, we have established here that a simpler procedure is more successful for identifying promoters recognized by the bacteriophage-encoded σ factor/RNA polymerase.
We used this approach to predict the sequence specificity of a novel (bacteriophage-encoded) σ factor and, consequently, to infer the transcription strategy of phage 7-11. Direct analysis of bacteriophage genome sequences is therefore a plausible first-line approach for efficiently inferring phage transcription strategies, and may provide a wealth of information on transcription initiation by diverse σ factors/RNA polymerases.
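The pairwise strategy can be illustrated with a generic Smith-Waterman local alignment between two intergenic regions. A promoter shared by both regions shows up as a high-scoring local match. The abstract does not specify the alignment algorithm or scoring scheme. The scores below (match 2, mismatch -1, linear gap -2) and the sequences are therefore hypothetical.

```python
from typing import Tuple


def local_align(a: str, b: str, match: int = 2, mismatch: int = -1,
                gap: int = -2) -> Tuple[int, str, str]:
    """Smith-Waterman local alignment with a linear gap penalty.
    Returns (best score, aligned segment of a, aligned segment of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Two invented intergenic regions sharing the segment GACGTCA.
print(local_align("TTTTGACGTCATTTT", "CCCGACGTCACCC"))  # (14, 'GACGTCA', 'GACGTCA')
```

Running such a comparison over all pairs of intergenic regions and collecting the high-scoring segments is one way to obtain candidate motifs when each promoter occurs in only a few copies.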
Ecological character displacement is a process of phenotypic differentiation of sympatric populations caused by interspecific competition. Such differentiation could facilitate speciation by enhancing reproductive isolation between incipient species. However, empirical evidence for it at early stages of divergence, when gene flow still occurs between the species, is relatively scarce. Here we studied patterns of morphological variation in sympatric and allopatric populations of two hybridizing bird species, the Common Nightingale (Luscinia megarhynchos) and the Thrush Nightingale (L. luscinia). We conducted a principal component (PC) analysis of morphological traits. In sympatry, the two nightingale species converged in overall body size (PC1) and diverged in relative bill size (PC3). Closer analysis of morphological variation along geographical gradients revealed that the convergence in body size can be attributed largely to body size increasing with latitude, a phenomenon known as Bergmann's rule. In contrast, interspecific interactions contributed significantly to the observed divergence in relative bill size, even after controlling for the effects of geographical gradients. We suggest that the divergence in bill size most likely reflects segregation of feeding niches between the species in sympatry. Our results suggest that interspecific competition for food resources can drive species divergence even in the face of ongoing hybridization. Such divergence may enhance reproductive isolation between the species and thus contribute to speciation.
Because of its history of numerous migration events, the Mediterranean basin represents a challenging area for population genetic studies. A large number of genetic studies have been carried out in the Mediterranean area using different markers, but no consensus has been reached on the genetic landscape of the Mediterranean populations. To investigate the genetics of human Mediterranean populations further, we typed 894 individuals from 11 Mediterranean populations for 25 single-nucleotide polymorphisms (SNPs) located on the X chromosome. High overall homogeneity was found among the Mediterranean populations. The exception was the population from Morocco, which appeared to differ genetically from the rest of the populations in the Mediterranean area. A very low genetic distance was found between populations in the Middle East and most of the western part of the Mediterranean Sea. Comparing data from X-chromosome, mtDNA and Y-chromosome SNPs revealed a higher migration rate in females than in males, both in the Mediterranean and in a wider geographic area. Multilocus association was observed among the 25 X-chromosome SNPs in the populations from Ibiza and Cosenza. Our results support the hypotheses of (1) a reduced impact of the Neolithic wave and of more recent migration movements in NW Africa, and (2) the importance of the Strait of Gibraltar as a geographic barrier. In contrast, the high genetic homogeneity observed in the Mediterranean area could be interpreted as the result of the Neolithic wave caused by a large demic diffusion and/or more recent migration events. Males and females contributed differently to the genetic landscape of the Mediterranean area, with a higher migration rate in females than in males. A certain level of background linkage disequilibrium in the populations of Ibiza and Cosenza could be attributed to their demographic background.
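The abstract does not name its genetic distance measure. As a generic illustration only, Nei's F_ST (G_ST) for a single biallelic SNP compares the mean within-population heterozygosity H_S with the total heterozygosity H_T. Values near 0 indicate homogeneous populations. The sketch weights populations equally and ignores sample-size corrections such as Weir and Cockerham's estimator. The allele frequencies are invented.

```python
from typing import Sequence


def fst_biallelic(freqs: Sequence[float]) -> float:
    """Nei-style F_ST (G_ST) for one biallelic SNP, from the frequency of one
    allele in each population, weighting populations equally:
    F_ST = (H_T - H_S) / H_T."""
    if len(freqs) < 2:
        raise ValueError("need at least two populations")
    if any(not 0.0 <= p <= 1.0 for p in freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    # Mean expected heterozygosity within populations.
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    # Expected heterozygosity of the pooled population.
    p_bar = sum(freqs) / len(freqs)
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


# Invented allele frequencies for three hypothetical populations.
print(round(fst_biallelic([0.30, 0.32, 0.29]), 4))
```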
Some of the most difficult phylogenetic questions in evolutionary biology involve identifying the free-living relatives of parasitic organisms, particularly those of parasitic flowering plants. Consequently, the number of origins of parasitism and the phylogenetic distribution of the heterotrophic lifestyle among angiosperm lineages are unclear. Here we report the results of a phylogenetic analysis of 102 species of seed plants, designed to infer the position of all haustorial parasitic angiosperm lineages using three mitochondrial genes: atp1, coxI and matR. Overall, the mtDNA phylogeny agrees with independent studies in terms of non-parasitic plant relationships. It reveals at least 11 independent origins of parasitism in angiosperms, eight of which consist entirely of holoparasitic species that lack photosynthetic ability. From these results, it can be inferred that modern-day parasites have evolved disproportionately in certain lineages and that the endoparasitic habit has arisen by convergence in four clades. In addition, reduced-taxon, single-gene analyses revealed multiple horizontal transfers of atp1 from host to parasite lineages, suggesting that parasites may be important vectors of horizontal gene transfer in angiosperms. Furthermore, in Pilostyles we show evidence for a recent host-to-parasite atp1 transfer based on a chimeric gene sequence. This sequence indicates that multiple historical xenologous gene acquisitions have occurred in this endoparasite. Finally, the phylogenetic relationships inferred for parasites indicate that the origins of parasitism in angiosperms are strongly correlated with horizontal acquisitions of the invasive coxI group I intron. Collectively, these results indicate that the parasitic lifestyle has arisen repeatedly in angiosperm evolutionary history and results in increasing parasite genomic chimerism over time.
Stearoyl-CoA desaturases (SCDs) are key enzymes in de novo monounsaturated fatty acid synthesis. They catalyze the desaturation of saturated fatty acyl-CoA substrates at the delta-9 position, generating essential components of phospholipids, triglycerides, cholesterol esters and wax esters. The evolutionary history of the SCD gene family in vertebrates is crucial for interpreting the roles of SCDs across species. Nevertheless, this history, in particular the family's isoform diversity, origin and function, has yet to be elucidated. This work aims to contribute to this fundamental effort. We show here, through comparative genomics and phylogenetics, that the SCD gene family underwent an unexpectedly complex history of duplication and loss events. Paralogy analysis suggests that the SCD1 and SCD5 genes emerged as part of the whole-genome duplications (2R) that occurred at the stem of the vertebrate lineage. The SCD1 gene family expanded in rodents, with a parallel loss of SCD5 in the Muridae. SCD1 expansion is also observed in the Lagomorpha, although without loss of SCD5. In the amphibian Xenopus tropicalis we find a single SCD1 gene but no SCD5, though this could be due to genome incompleteness. No SCD5 is found in the teleost species analysed, although the genomic locus where SCD5 would be expected is otherwise conserved relative to tetrapods. In addition, the teleost SCD1 gene repertoire expanded to two copies as a result of the teleost-specific genome duplication (3R). Finally, we describe clear orthologues of SCD1 and SCD5 in the chondrichthyan Scyliorhinus canicula, a representative of the oldest extant jawed vertebrate clade. Expression analysis in S. canicula shows that whilst SCD1 is ubiquitous, SCD5 is mainly expressed in the brain, a pattern that might indicate an evolutionarily conserved function. We conclude that the SCD1 and SCD5 genes emerged as part of the 2R genome duplications.
We propose that the evolutionarily conserved gene expression between distinct lineages underpins the importance of SCD activity in the brain (and probably the pancreas), in a role yet to be defined. We argue that expression independent of external stimuli, such as diet-induced activity, emerged as a novel function in vertebrate ancestry. This function was allocated to the SCD5 isoform in various tissues (e.g. brain and pancreas) and was selectively maintained throughout vertebrate evolution.
Antibiotic resistance represents a significant public health problem. When resistance genes are mobile, carried on plasmids or phages, their spread can be greatly accelerated. Plasmids in particular have been implicated in the spread of antibiotic resistance genes. However, the selective pressures that favour plasmid-carried resistance genes have not been fully established. Here we address this issue with mathematical models of plasmid dynamics in response to different antibiotic treatment regimes. We show that plasmid transmission is a key factor influencing plasmid-borne antibiotic resistance, but that the dosage and the interval between treatments are also important. Our results also hold when plasmids carrying the resistance gene compete with other plasmids that do not carry it. By altering the interval between antibiotic treatments and the dosage of antibiotic, we show that different treatment regimes can select for either plasmid-carried or chromosome-carried resistance. Our research addresses the effect of environmental variation on the evolution of plasmid-carried antibiotic resistance.
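The abstract does not give the model equations. The sketch below is a hypothetical two-type model, not the authors' model. Plasmid-free cells and plasmid-bearing cells share a logistic carrying capacity. Conjugation converts free cells at rate beta × free × plasmid, segregation returns plasmid-bearing cells to the free class, and the plasmid imposes a growth cost. An antibiotic is added as periodic boluses, decays exponentially and kills only plasmid-free cells. Every parameter value is invented. The sketch only shows where the dose and dosing interval enter such a model. It does not reproduce the paper's results and includes neither chromosomal resistance nor competing plasmids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Hypothetical parameter values for a toy plasmid model."""
    r: float = 1.0       # maximal growth rate of plasmid-free cells
    cost: float = 0.1    # fractional growth cost of carrying the plasmid
    beta: float = 0.01   # conjugation (horizontal transmission) rate
    seg: float = 0.02    # segregational loss rate of the plasmid
    kill: float = 2.0    # kill rate of sensitive cells per unit antibiotic
    decay: float = 0.5   # antibiotic decay rate
    K: float = 1.0       # shared carrying capacity


def resistant_fraction(dose: float, interval: float, t_end: float = 100.0,
                       dt: float = 0.01, p: Params = Params()) -> float:
    """Euler-integrate the toy model under periodic bolus dosing and return
    the final fraction of plasmid-bearing (resistant) cells."""
    free, plasmid, drug = 0.5, 0.01, 0.0   # initial densities and drug level
    steps_per_dose = max(1, round(interval / dt))
    for step in range(round(t_end / dt)):
        if step % steps_per_dose == 0:
            drug += dose                       # a new treatment is given
        crowd = 1.0 - (free + plasmid) / p.K   # logistic crowding term
        transfer = p.beta * free * plasmid     # conjugation: free -> plasmid
        d_free = (p.r * free * crowd - transfer + p.seg * plasmid
                  - p.kill * drug * free)      # only free cells are killed
        d_plasmid = (p.r * (1.0 - p.cost) * plasmid * crowd + transfer
                     - p.seg * plasmid)
        free = max(0.0, free + dt * d_free)
        plasmid = max(0.0, plasmid + dt * d_plasmid)
        drug -= dt * p.decay * drug            # exponential decay of the drug
    total = free + plasmid
    return plasmid / total if total > 0 else 0.0


for dose in (0.0, 0.5, 1.0):
    for interval in (5.0, 20.0):
        print(dose, interval, round(resistant_fraction(dose, interval), 3))
```

Sweeping `dose` and `interval` over a grid, as in the loop above, shows how both treatment parameters shape selection in this toy model. Raising `beta` illustrates the role of transmission.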
Mixed model analysis of variance for larval traits.
F tests and AIC values for larval traits from mixed-model analyses of variance including single environmental variables as covariates.
Map of the location of the study and the study ponds. Map of Sweden showing A) the location of the study region (square) and study ponds (black dots) in relation to geographic variation in anthropogenic acidification in 1990 and B) the study region with nine populations and their pond pHs (in brackets). The pond Nitta (*) was only used for environmental variation. (Source: Swedish Environmental Protection Agency:
Effects of the pH treatments on embryonic survival and larval traits for eight R. arvalis populations. Raw data mean ± SE A) embryonic survival, B) metamorphic mass, C) larval period, and D) growth rate. The source pond pH is on the x-axis, and the different pH treatments are pH 7.5 (open circles), pH 4.3 (black circles) and pH 4.0 (black triangles).
linear mixed model of embryonic survival.
Environmental stress can have strong ecological and evolutionary effects on natural populations, but the extent to which it drives their adaptive divergence is little explored. We used common garden experiments to study adaptive divergence in embryonic and larval fitness traits (embryonic survival, larval growth, and age and size at metamorphosis). The study used eight populations of the moor frog, Rana arvalis, inhabiting an acidification gradient (breeding pond pH 4.0 to 7.5) in southwestern Sweden. Embryos were raised until hatching at three pH treatments (4.0, 4.3 and 7.5), and larvae until metamorphosis at two (4.3 and 7.5). To gain insight into the putative selective agents along this environmental gradient, we measured relevant abiotic and biotic environmental variables in each breeding pond and used linear models to test for phenotype-environment correlations. Populations of acid origin had higher embryonic and larval acid tolerance (survival and larval period were less negatively affected by low pH). They also showed higher larval growth but slower larval development, and metamorphosed at a larger size. The phenotype-environment correlations revealed that divergence in embryonic acid tolerance and metamorphic size correlated most strongly with breeding pond pH. Divergence in larval period and larval growth correlated most strongly with latitude and predator density, respectively. Our results suggest that R. arvalis has diverged in response to pH-mediated selection along this acidification gradient. However, as latitude and pH were closely spatially correlated in this study, further studies are needed to disentangle the specific agents of natural selection along acidification gradients. Our study highlights the need to consider the multiple interacting selective forces that drive adaptive divergence of natural populations along environmental stress gradients.
PCR-based surveys have shown that guppies (Poecilia reticulata) have an unusually large visual-opsin gene repertoire. This has led to speculation that opsin duplication and divergence have enhanced the evolution of elaborate male coloration by improving spectral sensitivity and/or discrimination in females. However, this conjecture about evolutionary connections between opsin repertoire, vision, mate choice and male coloration was generated with little data on gene expression. Here, we used RT-qPCR to survey visual-opsin gene expression in the eyes of males, females and juveniles, in order to further understand color-based sexual selection from the perspective of the visual system. Juvenile and adult (male and female) guppies express 10 visual opsins at varying levels in the eye. In juveniles, two opsin genes, SWS2B and RH2-2, accounted for > 85% of all visual-opsin transcripts in the eye, excluding RH1. This relative abundance (RA) value dropped to about 65% in adults, as LWS-A180 expression increased from approximately 3% to 20% RA. The juvenile-to-female transition also showed LWS-S180 upregulation from about 1.5% to 7% RA. Finally, we found that expression in the guppy SWS2-LWS gene cluster is negatively correlated with distance from a candidate locus control region (LCR). Selective pressures influencing visual-opsin gene expression appear to differ with age and sex. LWS upregulation in females is implicated in augmenting spectral discrimination of male coloration and courtship displays. In males, enhanced discrimination of carotenoid-rich food, and possibly of rival males, are strong candidate selective pressures driving LWS upregulation. These developmental changes in expression suggest that adults possess better wavelength discrimination than juveniles. Opsin expression within the SWS2-LWS gene cluster appears to be regulated, in part, by a common LCR.
Finally, by comparing our RT-qPCR data to MSP data, we were able to propose the first opsin-to-λmax assignments for all photoreceptor types in the cone mosaic.
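Relative abundance (RA) here is each opsin's share of the summed visual-opsin transcripts, with RH1 excluded. The sketch below assumes expression proportional to (1 + E)^-Ct for a primer efficiency E; the abstract does not give the exact normalisation. The Ct values and efficiencies are invented, and the gene names other than RH1 are hypothetical.

```python
from typing import Dict, Iterable


def relative_abundance(ct: Dict[str, float], efficiency: Dict[str, float],
                       exclude: Iterable[str] = ()) -> Dict[str, float]:
    """Each transcript's share of the summed expression, with expression
    taken as (1 + E) ** -Ct and genes in `exclude` left out of the total."""
    skip = set(exclude)
    expr = {g: (1.0 + efficiency[g]) ** -c for g, c in ct.items() if g not in skip}
    total = sum(expr.values())
    return {g: v / total for g, v in expr.items()}


# Invented Ct values and efficiencies; RH1 is excluded as in the text.
ct = {"gene_a": 20.0, "gene_b": 21.0, "RH1": 15.0}
eff = {"gene_a": 1.0, "gene_b": 1.0, "RH1": 1.0}
print(relative_abundance(ct, eff, exclude={"RH1"}))
```

With perfect doubling (E = 1), a one-cycle lower Ct means twice the template. In this example, gene_a therefore makes up two thirds of the non-RH1 total.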
Mutations in the Italian Bos primigenius mtDNA genome compared to the Bovine Reference Sequence (BRS) [31]
Haplogroups age estimation
Geographical distribution of mtDNA major clades. Mitochondrial D-loop sequences from ancient aurochsen are shown as green branches on the phylogenies, with the number of separate individuals indicated, along with the current lineage nomenclature (P, E and T). Complete mtDNA genomes of modern cattle breeds are shown as blue branches (lineages T, Q and R) and are numbered similarly. The two black arrows indicate the phylogenetic affiliations of the available aurochs mtDNA genomes ([30]; this study). The figure inset shows the geographic location of the Vado all'Arancio site.
Phylogenetic tree of complete mtDNA genomes. Bayesian consensus phylogenetic tree produced by PHYCAS under a prior model allowing for polytomies. Clusters of sequences linked by posterior probabilities higher than 0.7 have been collapsed. Sequences belonging to cluster T are not collapsed in order to show sub groupings, and the traditional haplogroup nomenclature is shown on the right. Clades R, P, Q and T are monophyletic, but only subclades T2 and T5 are supported as definable groups amongst the previously recognized T subclades. The disparate phylogenetic positions of the Italian and the British aurochsen are indicated. All other tips refer to modern cattle genomes.
Bayesian skyline plot. Bayesian skyline plot constructed using the Italian cattle dataset with the Bos primigenius sample under three different evolutionary rates: 3.3 × 10⁻⁸ (red, based on [35]), 1.6 × 10⁻⁸ (black, based on [27]) and 6.6 × 10⁻⁹ (green, based on [36]). The continuous lines represent the median estimates; dotted lines represent the 95% HPD interval.
Bos primigenius, the aurochs, is the wild ancestor of modern cattle breeds and was formerly widespread across Eurasia and northern Africa. After a progressive decline, the species became extinct in 1627. The origin of modern taurine breeds in Europe is debated. Archaeological and early genetic evidence points to a single Near Eastern origin and a subsequent spread during the diffusion of herding and farming. More recent genetic data are instead compatible with local domestication events, or at least some level of local introgression from the aurochs. Here we present the analysis of the complete mitochondrial genome of a pre-Neolithic Italian aurochs. We applied a combined strategy employing both multiplex PCR amplification and 454 pyrosequencing technology to sequence the complete mitochondrial genome of an 11,450-year-old aurochs specimen from Central Italy. Phylogenetic analysis of the aurochs mtDNA genome supports the conclusions of previous studies of short mtDNA fragments. Italian aurochsen were genetically very similar to modern cattle breeds but highly divergent from North-Central European aurochsen. Complete mitochondrial genome sequences are now available for several modern cattle and for two pre-Neolithic aurochsen from very different geographic areas. These data suggest that previously identified sub-groups within the widespread modern cattle mitochondrial T clade are polyphyletic, and they support the hypothesis that modern European breeds have multiple geographic origins.
Evolutionary rate analysis of 11S globulin family using branch-specific model of PAML
The content of seed 11S globulins and the copy number of their genes
Source of the 11S globulin genes used in this study
The COG of 11S globulin gene family. Solid lines show symmetrical BeTs (the Best Hits) and broken lines show asymmetrical BeTs. Genes from the same species are adjacent. Gene ID is indicated and the prefix "Rc" denotes IDs from Ricinus communis. Among these IDs, At1g03880, At1g03890, At4g28520 and At5g44120 are known to encode CRB, CRU2, CRC and CRA1, respectively; Glyma03g32030, Glyma03g32020, Glyma19g34780, Glyma10g04280, Glyma13g18450 and Glyma19g34770 to encode Gy1-5 and Gy7, respectively; Rc29600.m000561, Rc29600.m000564, Rc30005.m001289, Rc30005.m001290, Rc29611.m000223, Rc29200.m000169, Rc29629.m001355, Rc29709.m001187, Rc29716.m000305, Rc29200.m000167 and Rc30005.m001288 to encode RcLEG1-1 to RcLEG1-5 and RcLEG2-1 to RcLEG2-6, respectively; and Os01g55690, Os10g26060, Os03g31360, Os02g15169, Os02g15178, Os02g15150, Os02g16820, Os02g16830, Os02g14600, Os02g15070, Os02g25640 and Os02g15090 to encode GluA-1, GluA-2, GluA-3, GluB-1a, GluB-1b, GluB-2, GluB-5, GluB-4, GluB-7, GluB-6, GluC-1 and GluD, respectively.
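Symmetrical BeTs are pairs of genes from two species that are each other's best hit, also known as reciprocal best hits. The sketch below assumes precomputed similarity scores in both directions, for example BLAST bit scores; the names and values are invented. Full COG construction also joins BeTs across three or more species, which this pairwise sketch omits.

```python
from typing import Dict, Set, Tuple

# query gene -> {subject gene: similarity score}
Scores = Dict[str, Dict[str, float]]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Best-scoring subject for each query; ties go to the alphabetically
    first subject so the result is deterministic."""
    return {q: min(hits, key=lambda s: (-hits[s], s))
            for q, hits in scores.items() if hits}


def symmetrical_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Set[Tuple[str, str]]:
    """Pairs (x, y) where y is x's best hit and x is y's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {(x, y) for x, y in forward.items() if backward.get(y) == x}


# Invented scores: a1/b1 are mutual best hits; a2 -> b1 is asymmetrical.
a_vs_b = {"a1": {"b1": 90.0, "b2": 40.0}, "a2": {"b1": 80.0, "b2": 30.0}}
b_vs_a = {"b1": {"a1": 88.0, "a2": 79.0}, "b2": {"a1": 35.0, "a2": 25.0}}
print(symmetrical_best_hits(a_vs_b, b_vs_a))  # {('a1', 'b1')}
```

Best hits that appear in `best_hits` but not in the symmetrical set correspond to the asymmetrical BeTs drawn as broken lines.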
Phylogenetic relationships of sequences within the 11S globulin gene family inferred by the neighbor-joining (NJ) method, with bootstrap support above 50% shown at the nodes. Letters A-E indicate the branches used in the analysis of evolutionary rate and positive selection.
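Branch-specific models in PAML are typically compared with a likelihood-ratio test. Twice the difference in log-likelihood between nested models is compared with a χ² distribution whose degrees of freedom equal the difference in free parameters. A minimal sketch for the one-parameter case, with invented log-likelihoods. Models that add several branch-specific ω ratios need the matching degrees of freedom instead.

```python
import math
from typing import Tuple


def lrt_one_df(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """LRT for nested models differing by one free parameter, for example a
    one-ratio model versus a two-ratio model with a separate omega on one
    marked branch. Returns (2 * delta lnL, p-value). The p-value is the upper
    tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Invented log-likelihoods for a null and an alternative branch model.
stat, p = lrt_one_df(-5234.71, -5230.12)
print(round(stat, 2), p)
```

The closed form uses the fact that a 1-df χ² variable is the square of a standard normal variable, so no statistics library is needed.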
Seed storage proteins are a major source of dietary protein, and the content of such proteins determines both the quantity and quality of crop yield. Significantly, examination of the protein content in the seeds of crop plants shows a distinct difference between monocots and dicots. Thus, different evolutionary patterns are expected in the genes underlying protein synthesis in the seeds of these two groups of plants. Gene duplication, evolutionary rate and positive selection in a major gene family of seed storage proteins (the 11S globulin genes) were compared in dicots and monocots. The results, obtained from five species in each group, show more gene duplications, a higher evolutionary rate and positive selection in this gene family in dicots, which are rich in 11S globulins, but not in monocots. Our findings support the suggestion that gene duplication and an accelerated evolutionary rate may be associated with higher protein synthesis in dicots as compared to monocots.
The log10 of cryptic species reports (CSR) as a function of the log10 number of described species in the respective taxon. Deviations from the regression line represent CSR taxon variation. Dashed lines represent 95% confidence intervals.
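The legend defines CSR taxon variation as deviations from the regression line. The fitting method is not stated here. Assuming an ordinary least-squares fit on log10-transformed counts, those deviations are the residuals, sketched here with invented counts:

```python
import math
from typing import List, Sequence, Tuple


def log_log_residuals(described: Sequence[float], reports: Sequence[float]
                      ) -> Tuple[float, float, List[float]]:
    """OLS fit of log10(reports) = a + b * log10(described).
    Returns (a, b, residuals); a positive residual means more cryptic
    species reports than expected from the taxon's described species."""
    if len(described) != len(reports) or len(described) < 2:
        raise ValueError("need two equal-length sequences with >= 2 taxa")
    if min(described) <= 0 or min(reports) <= 0:
        raise ValueError("log10 needs strictly positive counts")
    xs = [math.log10(v) for v in described]
    ys = [math.log10(v) for v in reports]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("described-species counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return intercept, slope, residuals


# Invented counts for four hypothetical taxa.
a, b, res = log_log_residuals([10, 100, 1000, 10000], [1, 10, 300, 1000])
print(round(a, 3), round(b, 3), [round(r, 3) for r in res])
```

The residuals could then be regressed on a study-bias measure, in the way the following legends describe.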
Regression of CSR taxon variation on taxon study bias for 19 metazoan taxa. Dashed lines represent 95% confidence intervals.
The log10 of CSR as a function of the log10 number of described species in the respective region. Deviations from the regression line represent CSR region variation. Dashed lines represent 95% confidence intervals.
Regression of CSR region variation on biogeographical region study bias. Dashed lines represent 95% confidence intervals.
Cryptic species are two or more distinct but morphologically similar species that were classified as a single species. During the past two decades, publications on cryptic species have grown exponentially. Recently published reviews have demonstrated that cryptic species have profound consequences for many biological disciplines. It has been proposed that their distribution is non-random across taxa and biomes. We analysed a literature database for the taxonomic and biogeographical distribution of cryptic animal species reports. Results from regression analysis indicate that cryptic species are almost evenly distributed among major metazoan taxa and biogeographical regions when corrected for species richness and study intensity. This indicates that morphological stasis represents an evolutionary constant and that cryptic metazoan diversity predictably affects estimates of Earth's animal diversity. Our findings have direct theoretical and practical consequences for a number of prevailing biological questions with regard to global biodiversity estimates, conservation efforts and global taxonomic initiatives.
The lancelet Asymmetron inferum (subphylum Cephalochordata) was recently discovered on the ocean floor off the southwest coast of Japan at a depth of 229 m, in an anaerobic and sulfide-rich environment caused by the decomposing bodies of the sperm whale Physeter macrocephalus. This deep sulfide-rich habitat of A. inferum is unique among lancelets. The distinguishing adaptation of this species to such an extraordinary habitat can be considered in a phylogenetic framework. As a first step in reconstructing the evolutionary processes in this species, we investigated its phylogenetic position based on 11 whole mitochondrial genome sequences, including newly determined ones for the whale-fall lancelet A. inferum and two coral-reef congeners. Our phylogenetic analyses showed that extant lancelets cluster into two major clades, the Asymmetron clade and the Epigonichthys + Branchiostoma clade. A. inferum fell within the former, as the sister group to the A. lucayanum complex. The divergence time between A. inferum and the A. lucayanum complex was estimated to be 115 Mya using the penalized likelihood (PL) method or 97 Mya using the nonparametric rate smoothing (NPRS) method (the middle Cretaceous). These estimates are far older than the first appearance of large whales (the middle Eocene, 40 Mya). We also discovered that the A. inferum mitogenome (mitochondrial genome) has been subjected to large-scale gene rearrangements, with one rearrangement feature unique among lancelets and two shared with the A. lucayanum complex. Our study supports the monophyly of the genus Asymmetron, previously assumed on the basis of morphological characters. Furthermore, the features of the A. inferum mitogenome expand our knowledge of variation within cephalochordate mitogenomes, adding a new case of transposition and inversion of the trnQ gene. Our divergence time estimation suggests that A. inferum remained a member of the Mesozoic and early Cenozoic large vertebrate-fall communities before shifting to become a whale-fall specialist.
Background The extant squamates (>9400 known species of lizards and snakes) are one of the most diverse and conspicuous radiations of terrestrial vertebrates, but no studies have attempted to reconstruct a phylogeny for the group with large-scale taxon sampling. Such an estimate is invaluable for comparative evolutionary studies and for addressing their classification. Here, we present the first large-scale phylogenetic estimate for Squamata. Results The estimated phylogeny contains 4161 species, representing all currently recognized families and subfamilies. The analysis is based on up to 12896 base pairs of sequence data per species (average = 2497 bp) from 12 genes, including seven nuclear loci (BDNF, c-mos, NT3, PDC, R35, RAG-1, and RAG-2) and five mitochondrial genes (12S, 16S, cytochrome b, ND2, and ND4). The tree provides important confirmation for recent estimates of higher-level squamate phylogeny based on molecular data (but with more limited taxon sampling), estimates that are very different from previous morphology-based hypotheses. The tree also includes many relationships that differ from previous molecular estimates and many that differ from traditional taxonomy. Conclusions We present a new large-scale phylogeny of squamate reptiles that should be a valuable resource for future comparative studies. We also present a revised classification of squamates at the family and subfamily level to bring the taxonomy more in line with the new phylogenetic hypothesis. This classification includes new, resurrected, and modified subfamilies within gymnophthalmid and scincid lizards, and boid, colubrid, and lamprophiid snakes.
The use of mitochondrial DNA data in phylogenetics is controversial, yet studies that combine mitochondrial and nuclear DNA data (mtDNA and nucDNA) to estimate phylogeny are common, especially in vertebrates. Surprisingly, the consequences of combining these data types are largely unexplored, and many fundamental questions remain unaddressed in the literature. For example, how much do trees from mtDNA and nucDNA differ? How are topological conflicts between these data types typically resolved in the combined-data tree? What determines whether a node will be resolved in favor of mtDNA or nucDNA, and are there any generalities that can be made regarding the resolution of mtDNA-nucDNA conflicts in combined-data trees? Here, we address these and related questions using new and published nucDNA and mtDNA data for Plethodon salamanders and published data from 13 other vertebrate clades (including fish, frogs, lizards, birds, turtles, and mammals). We find widespread discordance between trees from mtDNA and nucDNA (30-70% of nodes disagree per clade), but this discordance is typically not strongly supported. Despite often having larger numbers of variable characters, mtDNA data do not typically dominate combined-data analyses, and combined-data trees often share more nodes with trees from nucDNA alone. There is no relationship between the proportion of nodes shared between combined-data and mtDNA trees and the relative numbers of variable characters or levels of homoplasy in the mtDNA and nucDNA data sets. Congruence between trees from mtDNA and nucDNA is higher on branches that are longer and deeper in the combined-data tree, but whether a conflicting node will be resolved in favor of mtDNA or nucDNA is unrelated to branch length. Conflicts that are resolved in favor of nucDNA tend to occur at deeper nodes in the combined-data tree.
In contrast to these overall trends, we find that Plethodon have an unusually large number of strongly supported conflicts between data types, which are generally resolved in favor of mtDNA in the combined-data tree (despite the large number of nuclear loci sampled). Overall, our results from 14 vertebrate clades show that combined-data analyses are not necessarily dominated by the more variable mtDNA data sets. However, given cases like Plethodon, there is also a need to routinely check for incongruence between mtDNA and nucDNA data and for its impact on combined-data analyses.
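The "proportion of nodes shared" between a combined-data tree and a single-data-type tree can be illustrated by comparing clades. The sketch below assumes rooted trees written as nested tuples of taxon names and treats each internal node as the set of taxa it contains; the helper names `clades` and `proportion_shared` and the toy trees are hypothetical, and the study may have compared trees differently (for example, as unrooted bipartitions or with support thresholds).

```python
def clades(tree):
    """Return the clades of a rooted tree as frozensets of taxon names.

    The tree is a nested tuple, e.g. (("A", "B"), ("C", "D")). Single taxa
    and the root clade (all taxa) are excluded as uninformative.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        taxa = frozenset().union(*(walk(child) for child in node))
        found.add(taxa)
        return taxa

    root = walk(tree)
    found.discard(root)
    return found


def proportion_shared(tree_a, tree_b):
    """Fraction of tree_a's clades that also occur in tree_b."""
    ca, cb = clades(tree_a), clades(tree_b)
    if not ca:
        return 0.0
    return len(ca & cb) / len(ca)


# Hypothetical toy trees for five taxa.
combined = ((("A", "B"), "C"), ("D", "E"))
mt_tree = ((("A", "C"), "B"), ("D", "E"))
nuc_tree = ((("A", "B"), "C"), ("D", "E"))

shared_with_mt = proportion_shared(combined, mt_tree)    # 2 of 3 clades
shared_with_nuc = proportion_shared(combined, nuc_tree)  # all 3 clades
```

Here the combined tree conflicts with the mtDNA tree only over whether A groups with B or with C, so it shares more nodes with the nucDNA tree, the pattern the abstract reports as common.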
Analytical predictions on the effect of population structure on CI-dynamics. The figure shows the threshold cost-to-benefit ratio (C/B) for the invasion of CI-symbionts. A. Threshold ratio as a function of host migration rate for deme sizes of N = 4 (solid line), N = 20 (dashed line), and N = 50 (dotted line). B. Threshold ratio as a function of deme size for host migration rates of m = 0.05 (solid line), m = 0.1 (dashed line), and m = 0.2 (dotted line).
Simulation results on the effect of population structure on CI-invasion frequency. The figure shows A, the frequency of invasion as a function of migration rate for deme sizes of N = 12 (solid line), N = 20 (dashed line), N = 60 (dotted line) and N = 180 (dashed-dotted line) and B, the frequency of invasion as a function of deme size for host migration rates of m = 0.01 (solid line), m = 0.05 (dashed line), and m = 0.1 (dotted line).
Simulation results on the mean population prevalence and between-deme polymorphism in infection. Mean population prevalence (± SE) and the proportion of demes infected as a function of deme size in simulations with equal migration rate in males and females. Results are shown for host migration rates of m = 0.01 (solid line), m = 0.05 (dashed line), m = 0.1 (dotted line), and m = 0.2 (dashed-dotted line). The grey line indicates the equilibrium frequency of pe = 0.94 predicted for a panmictic population of infinite size.
Simulation results on the effect of population structure on CI-invasion frequency in a stepping stone model. The figure shows the frequency of invasion, A, as a function of migration rate for deme sizes of N = 12 (solid line), N = 20 (dashed line), N = 60 (dotted line) and N = 180 (dashed-dotted line) and B, as a function of deme size for host migration rates of m = 0.01 (solid line), m = 0.05 (dashed line), and m = 0.1 (dotted line). See Figure 2 for the results with an island model.
Simulation results on the effect of sex-specific dispersal rates on CI-invasion frequency. Invasion frequency is shown for different combinations of male migration rate (on the abscissa) and female migration rate (mf = 0.01: solid line; mf = 0.05: dashed line; mf = 0.1: dotted line; mf = 0.2: dashed-dotted line). Dots indicate combinations of equal male and female migration.
Maternally transmitted symbionts have evolved a variety of ways to promote their spread through host populations. One strategy is to hamper the reproduction of uninfected females by a mechanism called cytoplasmic incompatibility (CI). CI occurs in crosses between infected males and uninfected females and leads to partial to near-complete infertility. CI-infections are under positive frequency-dependent selection and require genetic drift to overcome the range of low frequencies at which they are counter-selected. Given the importance of drift, population sub-division would be expected to facilitate the spread of CI. Nevertheless, a previous model concluded that variance in infection between competing groups of breeding individuals impedes the spread of CI. In this paper we derive a model of the spread of CI-infections in populations composed of demes linked by restricted migration. Our model shows that population sub-division facilitates the invasion of CI. While host philopatry (low migration) favours the spread of infection, deme size has a non-monotonic effect, with CI-invasion being most likely at intermediate deme sizes. Individual-based simulations confirm these predictions and show that high levels of local drift speed up invasion but prevent high levels of prevalence across the entire population. Additional simulations with sex-specific migration rates further show that low migration rates in both sexes are required to facilitate the spread of CI. Our analyses show that population structure facilitates the invasion of CI-infections. Since some level of sub-division is likely to occur in most natural populations, our results help to explain the high incidence of CI-infections across species of arthropods. Furthermore, our work has important implications for the use of CI-systems to genetically modify natural populations of disease vectors.
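The positive frequency dependence that drift must overcome can be seen in the standard deterministic recursion for CI in a single, infinitely large, panmictic population. The sketch below is that textbook simplification, not the deme-structured model derived in the paper. It assumes perfect maternal transmission, a fecundity cost `s_f` to infected females and a CI strength `s_h` (relative hatch rate 1 - `s_h` in uninfected female × infected male crosses); these parameter names and values are illustrative assumptions, and the figures' cost-to-benefit ratio C/B is not claimed to be defined identically.

```python
def next_freq(p, s_f, s_h):
    """One generation of the panmictic CI recursion (perfect maternal transmission).

    p   : frequency of infected individuals (assumed equal in both sexes)
    s_f : fecundity cost to infected females (relative fecundity 1 - s_f)
    s_h : CI strength; uninfected female x infected male crosses hatch at 1 - s_h
    """
    infected_offspring = p * (1 - s_f)              # infected mothers, compatible with all males
    uninfected_offspring = (1 - p) * (1 - s_h * p)  # uninfected mothers lose eggs to CI
    return infected_offspring / (infected_offspring + uninfected_offspring)


def iterate(p, s_f, s_h, generations):
    """Apply the recursion for a number of generations."""
    for _ in range(generations):
        p = next_freq(p, s_f, s_h)
    return p


def invasion_threshold(s_f, s_h):
    """Unstable equilibrium: the infection spreads only if it starts above s_f / s_h."""
    return s_f / s_h


# Illustrative parameters: threshold = 0.1 / 0.9, roughly 0.11.
below = iterate(0.05, s_f=0.1, s_h=0.9, generations=200)  # declines towards 0
above = iterate(0.20, s_f=0.1, s_h=0.9, generations=200)  # spreads towards fixation
```

Below the threshold the infection is counter-selected deterministically, which is why, in the abstract's argument, local drift in small demes is needed to push a rare infection past this frequency.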
It has long been known that rates of synonymous substitutions are unusually low in mitochondrial genes of flowering and other land plants. Although two dramatic exceptions to this pattern have recently been reported, it is unclear how often major increases in substitution rates occur during plant mitochondrial evolution and what the overall magnitude of substitution rate variation is across plants. A broad survey was undertaken to evaluate synonymous substitution rates in mitochondrial genes of angiosperms and gymnosperms. Although most taxa conform to the generality that plant mitochondrial sequences evolve slowly, additional cases of highly accelerated rates were found. We explore in detail one of these new cases, within the genus Silene. A roughly 100-fold increase in synonymous substitution rate is estimated to have taken place within the last 5 million years and involves only one of ten species of Silene sampled in this study. Examples of unusually slow sequence evolution were also identified. Comparison of the fastest and slowest lineages shows that synonymous substitution rates vary by four orders of magnitude across seed plants. In other words, some plant mitochondrial lineages accumulate more synonymous change in 10,000 years than do others in 100 million years. Several perplexing cases of gene-to-gene variation in sequence divergence within a plant were uncovered. Some of these probably reflect interesting biological phenomena, such as horizontal gene transfer, mitochondrial-to-nucleus transfer, and intragenomic variation in mitochondrial substitution rates, whereas others are likely the result of various kinds of errors. The extremes of synonymous substitution rates measured here constitute by far the largest known range of rate variation for any group of organisms. These results highlight the utility of examining absolute substitution rates in a phylogenetic context rather than by traditional pairwise methods. 
Why substitution rates are generally so low in plant mitochondrial genomes yet occasionally increase dramatically remains mysterious.
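The arithmetic behind "four orders of magnitude" and the 10,000-year versus 100-million-year comparison can be made explicit. This is a minimal sketch: an absolute rate is taken as branch length (synonymous substitutions per site) divided by branch duration, and the rates used below are hypothetical values chosen only to span a 10^4-fold range, not estimates from the survey.

```python
import math


def absolute_rate(branch_length, duration_my):
    """Synonymous substitutions per site per year along one branch.

    branch_length : synonymous substitutions per site on the branch
    duration_my   : branch duration in millions of years
    """
    return branch_length / (duration_my * 1e6)


def years_to_accumulate(divergence, rate_per_year):
    """Years needed to accumulate a given synonymous divergence at a constant rate."""
    return divergence / rate_per_year


def orders_of_magnitude(fast_rate, slow_rate):
    """log10 of the fold difference between two rates."""
    return math.log10(fast_rate / slow_rate)


# Hypothetical rates spanning a 10^4-fold range.
slow, fast = 1e-11, 1e-7
span = orders_of_magnitude(fast, slow)        # 4 orders of magnitude
t_fast = years_to_accumulate(0.001, fast)     # about 10,000 years
t_slow = years_to_accumulate(0.001, slow)     # about 100 million years
```

Because time to accumulate a fixed divergence scales inversely with rate, any 10^4-fold rate difference implies the same 10,000-year versus 100-million-year contrast, whatever the absolute values.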
While genes that are conserved between related bacterial species are usually thought to have evolved along with the species, phylogenetic trees reconstructed for individual genes may contradict this picture and indicate horizontal gene transfer. Individual trees are often not resolved with high confidence, however, and in that case alternative trees are generally not considered to contradict the species tree, although they do not confirm it either. Here we conduct an in-depth analysis of 401 protein phylogenetic trees inferred with varying levels of confidence for three lactobacilli from the acidophilus complex. At present the relationship between these bacteria, isolated from environments as diverse as the gastrointestinal tract (Lactobacillus acidophilus and Lactobacillus johnsonii) and yogurt (Lactobacillus delbrueckii ssp. bulgaricus), is ambiguous due to contradictory phenotypic and 16S rRNA-based classifications. Among the 401 phylogenetic trees, those that could be reconstructed with high confidence support the 16S rRNA tree or one alternative topology in an astonishing 3:2 ratio, while the third possible topology is practically absent. Lowering the confidence threshold for trees to be taken into consideration does not significantly affect this ratio, and therefore suggests that gene transfer may have affected as much as 40% of the core genome genes. Gene function bias suggests that the 16S rRNA phylogeny of the acidophilus complex, which indicates that L. acidophilus and L. delbrueckii ssp. bulgaricus are the most closely related of these three species, is correct. A novel approach of comparing interspecies protein divergence data employed in this study allowed us to determine that gene transfer most likely took place between the lineages of the two species found in the gastrointestinal tract. This case study reports an unprecedented level of phylogenetic incongruence, presumably resulting from extensive horizontal gene transfer.
The data give a first indication of the large extent of gene transfer that may take place in the gastrointestinal tract and its accumulated effect. For future studies, our results should encourage careful weighing of data on phylogenetic tree topology, confidence and distribution before drawing conclusions about the presence, absence and extent of horizontal gene transfer.
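The link between the 3:2 topology ratio and the "as much as 40%" figure is simple arithmetic: if high-confidence gene trees support the 16S rRNA topology and one alternative at 3:2, the alternative accounts for 2/5 of them. The sketch below tallies topologies above a support threshold. The topology labels (A = L. acidophilus, D = L. delbrueckii ssp. bulgaricus, J = L. johnsonii; "AD|J" means A and D are sisters), the support values and the counts are hypothetical, and treating every discordant tree as a transfer gives an upper bound, as the abstract's wording implies.

```python
from collections import Counter


def tally_topologies(trees, min_support):
    """Count topologies among gene trees whose support meets min_support.

    trees : iterable of (topology_label, support) pairs
    """
    return Counter(topology for topology, support in trees if support >= min_support)


def fraction_discordant(counts, species_topology):
    """Share of retained gene trees that disagree with the species topology."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1 - counts[species_topology] / total


# Hypothetical gene trees: 16S-like topology, one alternative, and a rare third one.
trees = [("AD|J", 0.99)] * 30 + [("AJ|D", 0.97)] * 20 + [("DJ|A", 0.40)]

counts = tally_topologies(trees, min_support=0.95)
upper_bound = fraction_discordant(counts, "AD|J")  # 20 / 50 = 0.4
```

Re-running the tally at lower `min_support` values and checking that the ratio stays stable mirrors the robustness check described in the abstract.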
SEMs of ephippial females of extant Daphnia. a-b. Daphnia (Daphnia) pulex, general view of ephippial female and ephippium. c-d. Daphnia (Ctenodaphnia) magna, general view of ephippial female, ephippium and its sculpture. Red lines show the orientation of the egg axes. White scale bars: 1 mm for a, c-d; 0.1 mm for b.
Single-egged ephippial females of Daphnia and Simocephalus. a-b. Daphnia (Ctenodaphnia) pusilla, general view of ephippial female and ephippium. c. Simocephalus exspinosus, ephippial female. d. Simocephalus vetulus, ephippium. Scales: 1 mm for a, c; 0.1 mm for b, d.
SEMs of Mesozoic ephippia of Daphnia from Khotont, Mongolia. a-c. Putative ephippium of Daphnia (Daphnia) from fragment 2046, its dorsal portion and reticulation. Note that the anterior half of the ephippium is deeper than the posterior half, giving a sub-triangular shape. d-f. Ephippium of Daphnia (Ctenodaphnia) from fragment 2018, reticulation and fine sculpture of valve. Red lines show the putative orientation of the egg axes. White scale bars: 0.1 mm for a, d; 0.01 mm for b-c, e-f.
SEMs of Mesozoic daphniid ephippia from Khotont, Mongolia. a-c. Daphnia (Ctenodaphnia) from fragment 2048, its caudal needle and dorsal portion. d. Daphnia (Ctenodaphnia) from fragment 2044. e. Unknown daphniid from fragment 2009. f. Simocephalus from fragment 2026. Scales: 0.1 mm for a, d-f; 0.01 mm for b-c.
Global map showing fossil records of the genus Daphnia and the antiquity of the subgenus Ctenodaphnia in the former Laurasia. Circles indicate fossil records of Daphnia colored by subgenus (Ctenodaphnia is red and Daphnia s. str. is blue). The grey shaded continents indicate the former Gondwanaland regions and the unshaded regions represent the former Laurasia regions. Red shading in North America indicates the present day distribution of the basal Ctenodaphnia from phylogenetic and morphological information.
The timescale of the origins of Daphnia O. F. Mueller (Crustacea: Cladocera) remains controversial. The origin of the two main subgenera has been associated with the breakup of the supercontinent Pangaea. This vicariance hypothesis is supported by reciprocal monophyly, present-day associations with the former Gondwanaland and Laurasia regions, and mitochondrial DNA divergence estimates. However, previous multilocus nuclear DNA sequence divergence estimates of < 10 million years are inconsistent with the breakup of Pangaea. We examined new and existing cladoceran fossils from a Mesozoic Mongolian site, in hopes of gaining insights into the timescale of the evolution of Daphnia. We describe new fossils of ephippia from the Khotont site in Mongolia associated with the Jurassic-Cretaceous boundary (about 145 MYA) that are morphologically similar to several modern genera of the family Daphniidae, including the two major subgenera of Daphnia, i.e., Daphnia s. str. and Ctenodaphnia. The daphniid fossils co-occurred with fossils of the predaceous phantom midge (Chaoboridae). Our findings indicate that the main subgenera of Daphnia are likely much older than previously known from fossils (at least 100 MY older) or from nuclear DNA estimates of divergence. The co-occurrence of the main subgenera far from the presumed Laurasia/Gondwanaland dispersal barrier shortly after its formation suggests that vicariance from the breakup of Pangaea is an unlikely explanation for the origin of the main subgenera. The fossil impressions also reveal that the coevolution of a dipteran predator (Chaoboridae) with the subgenus Daphnia is much older than previously known, dating back to the Mesozoic.
Speciation often occurs in complex or uncertain temporal and spatial contexts. Processes such as reinforcement, allopatric divergence, and assortative mating can proceed at different rates and with different strengths as populations diverge. The Central American Midas cichlid fish species complex is an important case study for understanding the processes of speciation. Previous analyses have demonstrated that allopatric processes led to species formation among the lakes of Nicaragua and that sympatric speciation is occurring within at least one crater lake. However, since speciation is an ongoing process and sampling of the genetic diversity of such lineages can be biased by the collection scheme or by random factors, it is important to evaluate the robustness of conclusions drawn from individual time samples. In order to assess the validity and reliability of inferences based on different genetic samples, we have analyzed fish from several lakes in Nicaragua sampled at three different times over 16 years. In addition, this time series allows us to analyze the population genetic changes that have occurred between lakes, where allopatric speciation has operated, as well as between different species within lakes, some of which have originated by sympatric speciation. Focusing on commonly used genetic markers, we have analyzed both DNA sequences from the complete mitochondrial control region and nuclear DNA variation at ten microsatellite loci from these populations, sampled three times over a 16-year period, to develop a robust estimate of the population genetic history of these diversifying lineages. The conclusions from previous work are well supported by our comprehensive analysis. In particular, we find that the genetic diversity of derived crater lake populations is lower than that of the source population regardless of when and how each population was sampled.
Furthermore, changes in various estimates of genetic diversity within lakes are minimal and provide no evidence for drastic changes during the last 20 years, supporting the hypothesis that the processes which have resulted in rapid speciation are primarily historical. In contrast, there is some evidence for ongoing evolution, particularly selection, in all lakes except crater Lake Masaya, perhaps reflecting the persistence of speciational processes. Importantly, we find that the crater Lake Apoyo population, for which strong evidence of sympatric speciation has been demonstrated, has lower genetic diversity than other crater lakes and the strongest evidence for ongoing selection.
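Comparisons of genetic diversity between source and crater lake populations typically rest on per-locus summary statistics. As one hedged illustration, the sketch below computes expected heterozygosity (Nei's gene diversity, 1 minus the sum of squared allele frequencies, with an optional n/(n-1) sample-size correction) for microsatellite alleles and averages it across loci. The abstract does not say which diversity estimators were used, so this choice, the function names and the allele data are assumptions for illustration only.

```python
from collections import Counter


def gene_diversity(alleles, unbiased=True):
    """Expected heterozygosity (Nei's gene diversity) at one locus.

    alleles  : allele labels observed (two copies per diploid individual)
    unbiased : apply the sample-size correction n / (n - 1), n = allele copies
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele copies")
    freqs = [count / n for count in Counter(alleles).values()]
    h = 1.0 - sum(f * f for f in freqs)
    return h * n / (n - 1) if unbiased else h


def mean_gene_diversity(loci, unbiased=True):
    """Average gene diversity over a list of loci (each a list of alleles)."""
    return sum(gene_diversity(alleles, unbiased) for alleles in loci) / len(loci)


# Hypothetical single-locus samples: a source lake with four common alleles
# and a derived crater lake population carrying only two of them.
source = ["148", "152", "156", "160"] * 3
crater = ["148", "152"] * 6

h_source = mean_gene_diversity([source])
h_crater = mean_gene_diversity([crater])  # lower, as expected after a founder event
```

Comparing such values across the three sampling years, rather than from a single collection, is what lets the time series test whether an observed diversity difference is robust to when the fish were sampled.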
Senescence is integral to the flowering plant life-cycle. Senescence-like processes also occur in non-angiosperm land plants, algae and photosynthetic prokaryotes. Increasing numbers of genes have been assigned functions in the regulation and execution of angiosperm senescence. At the same time there has been a large expansion in the number and taxonomic spread of plant sequences in the genome databases. The present paper uses these resources to study the evolutionary origins of angiosperm senescence, based on a survey of the distribution, across plant and microbial taxa, and expression of senescence-related genes. Phylogenetic analyses were carried out on protein sequences corresponding to genes with demonstrated functions in angiosperm senescence. These include proteins involved in chlorophyll catabolism and its control, homeoprotein transcription factors, metabolite transporters, and enzymes and regulators of carotenoid metabolism and of anthocyanin biosynthesis. Evolutionary timelines for the origins and functions of particular genes were inferred from the taxonomic distribution of sequences homologous to those of angiosperm senescence-related proteins. Turnover of the light energy transduction apparatus is the most ancient element in the senescence syndrome. By contrast, the association of phenylpropanoid metabolism with senescence, and the integration of senescence with development and adaptation mediated by transcription factors, are relatively recent innovations of land plants. An extended range of senescence-related genes of Arabidopsis was profiled for coexpression patterns and developmental relationships and revealed a clear carotenoid metabolism grouping, coordinated expression of genes for anthocyanin and flavonoid enzymes and regulators, and a cluster pattern of genes for chlorophyll catabolism consistent with functional and evolutionary features of the pathway.
The expression and phylogenetic characteristics of senescence-related genes allow a framework to be constructed of decisive events in the evolution of the senescence syndrome of modern land-plants. Combining phylogenetic, comparative sequence, gene expression and morphogenetic information leads to the conclusion that biochemical, cellular, integrative and adaptive systems were progressively added to the ancient primary core process of senescence as the evolving plant encountered new environmental and developmental contexts.
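Coexpression profiling of the kind mentioned above usually starts from pairwise correlations between gene expression profiles. The sketch below is a deliberately simple version, assuming Pearson correlation across samples and a fixed cut-off; the abstract does not specify the measure, threshold or clustering method used, and the gene identifiers and expression values are hypothetical placeholders rather than real Arabidopsis loci.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def coexpressed_pairs(profiles, threshold=0.9):
    """Gene pairs whose profiles correlate at or above threshold, as (g1, g2, r)."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            pairs.append((g1, g2, r))
    return pairs


# Hypothetical expression values across five developmental stages.
profiles = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],  # rises with geneA
    "geneC": [5, 4, 3, 2, 1],   # falls as geneA rises
}
pairs = coexpressed_pairs(profiles)
```

Pairs that pass the cut-off can then be grouped (for example, by linking genes that share a coexpressed partner) into clusters such as the carotenoid and chlorophyll-catabolism groupings the abstract describes.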
Broad-scale phylogeographic studies of freshwater organisms provide not only an invaluable framework for understanding the evolutionary history of species, but also a genetic imprint of the paleo-hydrological dynamics stemming from climatic change. Few such studies have been carried out in Siberia, a vast region over which the extent of Pleistocene glaciation is still disputed. Brachymystax lenok is a salmonid fish distributed throughout Siberia, exhibiting two forms hypothesized to have undergone extensive range expansion, genetic exchange, and multiple speciation. A comprehensive phylogeographic investigation should clarify these hypotheses as well as provide insights into Siberia's paleo-hydrological stability. Molecular-sequence (mtDNA) based phylogenetic and morphological analyses of Brachymystax throughout Siberia support the view that sharp- and blunt-snouted lenok are independent evolutionary lineages, with the majority of their variation distributed among major river basins. Their evolutionary independence was further supported by the analysis of 11 microsatellite loci in three areas of sympatry, which revealed little to no evidence of introgression. Phylogeographic structure reflects climatic limitations, especially for blunt-snouted lenok above 56 degrees N during one or more glacial maxima. Presumed glacial refugia as well as interbasin exchange were not congruent for the two lineages, perhaps reflecting differing dispersal abilities and responses to climatic change. Inferred demographic expansions were dated earlier than the Last Glacial Maximum (LGM). Evidence for repeated trans-basin exchange was especially clear between the Amur and Lena catchments. Divergence of sharp-snouted lenok in the Selenga-Baikal catchment may correspond to the isolation of Lake Baikal in the mid-Pleistocene, while older isolation events are apparent for blunt-snouted lenok in the extreme east and sharp-snouted lenok in the extreme west of their respective distributions.
Sharp- and blunt-snouted lenok have apparently undergone a long, independent, and demographically dynamic evolutionary history in Siberia, supporting their recognition as two good biological species. Considering the timing and extent of expansions and trans-basin dispersal, it is doubtful that these historical dynamics could have been generated without major rearrangements in the paleo-hydrological network, stemming from the formation and melting of large-scale glacial complexes much older than the LGM.
Amblyomma cajennense F. is one of the best known and most studied ticks in the New World because of its very wide distribution, its economic importance as a pest of domestic ungulates, and its association with a variety of animal and human pathogens. Recent observations, however, have challenged the taxonomic status of this tick and indicated that intraspecific cryptic speciation might be occurring. In the present study, we investigate the evolutionary and demographic history of this tick and examine its genetic structure based on analyses of three mitochondrial markers (12S rDNA, d-loop, and COII) and one nuclear marker (ITS2). Because A. cajennense has a typical trans-Amazonian distribution, lineage divergence dating is also performed to establish whether genetic diversity can be linked to dated vicariant events that shaped the topography of the Neotropics. Total evidence analyses of the concatenated mtDNA and nuclear + mtDNA datasets resulted in well-resolved and fully congruent reconstructions of the relationships within A. cajennense. The phylogenetic analyses consistently found A. cajennense to be monophyletic and separated into six genetic units defined by mutually exclusive haplotype compositions and habitat associations. Genetic divergence values also showed that these lineages are as distinct from each other as recognized separate species of the same genus. The six clades are deeply split, and node dating indicates that they started diverging in the middle-late Miocene. Behavioral differences and the results of laboratory cross-breeding experiments had already indicated that A. cajennense might be a complex of distinct taxonomic units. The combined and congruent mitochondrial and nuclear genetic evidence from this study reveals that A. cajennense is an assembly of six distinct species that have evolved separately from each other since at least 13.2 million years ago (Mya) in the earliest and 3.3 Mya in the latest lineages.
The temporal and spatial diversification modes of the six lineages overlap the phylogeographical history of other organisms with similar extant trans-Amazonian distributions and are consistent with the currently prevailing hypothesis that Neotropical diversity often has its origins in the Miocene, after the Andean uplift changed the topography and consequently the climate and ecology of the Neotropics.
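To show how a genetic distance translates into a divergence date, the simplest case is a strict molecular clock, where the time since two lineages split is the pairwise distance divided by twice the per-lineage rate (both lineages accumulate substitutions after the split). This is a swapped-in simplification for illustration: the abstract does not state which dating method or calibration produced its node ages, and the distance and rate below are hypothetical.

```python
def divergence_time_my(pairwise_distance, rate_per_lineage_per_my):
    """Strict-clock divergence time T = d / (2 * r), in millions of years.

    pairwise_distance       : substitutions per site between two lineages
    rate_per_lineage_per_my : substitutions per site per lineage per million years
    """
    if rate_per_lineage_per_my <= 0:
        raise ValueError("rate must be positive")
    # Factor 2: changes accumulate independently on both branches after the split.
    return pairwise_distance / (2 * rate_per_lineage_per_my)


# Hypothetical values: 10% divergence at 1% per lineage per million years.
t_split = divergence_time_my(0.10, 0.01)  # 5 million years
```

Relaxed-clock methods allow the rate to vary among branches, but the same distance-over-rate logic underlies how deeper splits map onto older (here, Miocene) dates.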
Sampling localities of N. magellanica in Patagonia: 1) Puerto Montt (R.F.), 2) Metri (R.F.), 3) Concoto Island (Ch.A.), 4) Puerto Aguirre (Ch.A.), 5) Costa Channel (Ch.A.), 6) Serrano Channel (Ch.A.), 7) London Island (S.M.), 8) Santa Ana (S.M.), 9) Possession Bay (S.M.), 10) Tekenika Bay (C.H.), 11) Orange Bay (C.H.), 12) Virginia Bay (C.H.), 13) Puerto Deseado, 14) Falkland/Malvinas Islands. R.F.=Reloncaví Fjord; Ch.A.=Chonos Archipelago; S.M.=Strait of Magellan; C.H.=Cape Horn. * Significant values after Bonferroni correction.
Haplotype network including 357 Nacella magellanica mtDNA COI sequences. Each haplotype is represented by a colored circle indicating where it was collected; the size of the circle is proportional to its frequency in the whole sample. mv=median vector (theoretical haplotype that has not been collected but should exist).
Pairwise difference distribution (mismatch distribution) for the Cytochrome c oxidase subunit I (COI) in N. magellanica in different areas of Patagonia. A) Pacific Patagonia; B) Puerto Deseado; C) Falkland/Malvinas Islands. R.F.=Reloncaví Fjord; Ch.A.=Chonos Archipelago; S.M.=Strait of Magellan; C.H.=Cape Horn.
Prevailing direction of currents and winds in southern South America and frequency of the dominant haplotypes in each locality. H.C.S.=Humboldt Current System; C.H.C.=Cape Horn Current; M/FC=Falkland/Malvinas Current; P.C.C.=Patagonian Coastal Current. Migration rate measured as effective number of migrants (Nem) among the main areas in Patagonia (Pacific Patagonia, Puerto Deseado and the Falkland/Malvinas Islands).
Historical demographic trends of the effective population size (Ne) constructed using a Bayesian skyline plot approach based on Cytochrome oxidase subunit I (COI) haplotypes of N. magellanica. The y-axis is the product of effective population size (Ne) and generation length on a log scale, while the x-axis is time in thousands of years (10^3 years) before present. The median estimate (black solid line) and 95% highest probability density (HPD) limits (grey) are shown. The thick dashed line represents the time to the most recent common ancestor (TMRCA) and the thin dashed line represents the time of expansion in the species.
Background Patagonia, with more than 84,000 km of irregular coastline, is an area especially well suited to evaluating how historic and contemporary processes influence the distribution and connectivity of shallow marine benthic organisms. The true limpet Nacella magellanica has a wide distribution in this province and represents a suitable model for inferring the Quaternary glacial legacy on marine benthic organisms. This species inhabits ice-free rocky ecosystems and has a narrow bathymetric range, and consequently should have been severely affected by recurrent glacial cycles during the Quaternary. We performed phylogeographic and demographic analyses of N. magellanica from 14 localities along its distribution in Pacific Patagonia, Atlantic Patagonia, and the Falkland/Malvinas Islands. Results Mitochondrial (COI) DNA analyses of 357 individuals of N. magellanica revealed no genetic differentiation along Pacific Patagonia, where the species forms a single genetic unit. However, we detected significant genetic differences among three main groups, named Pacific Patagonia, Atlantic Patagonia and Falkland/Malvinas Islands. Migration rate estimates indicated asymmetrical gene flow, primarily from Pacific Patagonia to Atlantic Patagonia (Nem=2.21) and the Falkland/Malvinas Islands (Nem=16.6). Demographic reconstruction in Pacific Patagonia suggests a recent recolonization process (< 10 ka), supported by neutrality tests, the mismatch distribution and the median-joining haplotype genealogy. Conclusions The absence of genetic structure, a single dominant haplotype, the lack of correlation between geographic and genetic distance, high estimated migration rates and the signal of recent demographic growth together represent a large body of evidence supporting the hypothesis of rapid postglacial expansion of this species in Pacific Patagonia. This expansion could have been sustained by larval dispersal following the main current system in this area.
Lower levels of genetic diversity in inland sea areas suggest that fjords and channels are the areas most recently colonized by the species. Hence, recolonization seems to have followed a west-to-east direction into areas that were progressively deglaciated. Significant genetic differences among Pacific, Atlantic and Falkland/Malvinas Islands populations may also be explained by disparities in their respective glaciological and geological histories. Rather than representing a glacial refugium for the species, the Falkland/Malvinas Islands seem to constitute a sink area, considering the strong asymmetric gene flow detected from the Pacific to the Atlantic sectors. These results suggest that historical and contemporary processes are the main factors shaping the modern biogeography of most shallow marine benthic invertebrates inhabiting the Patagonian Province.
Background The Batrachoididae family is a group of marine teleosts that includes several species with complex physiological characteristics, such as their excretory, reproductive, cardiovascular and respiratory systems. Previous studies of the 5S rDNA gene family in four species from the Western Atlantic showed two types of this gene in two species but only one in the other two, evolving under concerted evolution and birth-and-death evolution with purifying selection. Here we present results for the 5S rDNA and two other gene families in Halobatrachus didactylus, an Eastern Atlantic species, and draw evolutionary inferences regarding these gene families. In addition, we mapped the genes on the chromosomes by two-colour fluorescence in situ hybridization (FISH). Results Two types of 5S rDNA were observed, named type α and type β. Molecular analysis of the 5S rDNA indicates that H. didactylus does not share its non-transcribed spacer (NTS) sequences with the four other species of the family; therefore, it must have evolved in isolation. The type β-specific primers amplified a specific band in 9 specimens of H. didactylus and in two of Sparus aurata. Both types showed regulatory regions and a secondary structure that mark them as functional genes. By contrast, the U2 snRNA gene and the ITS-1 sequence each showed a single electrophoretic band and a single sequence type. The U2 snRNA sequence was the most variable of the three multigene families studied. Two-colour FISH showed no co-localization of the genes from the three multigene families and provided the first chromosome map of the species. Conclusions A highly significant finding emerged from the analysis of the 5S rDNA: two species as distant as H. didactylus and Sparus aurata share a 5S rDNA type.
This 5S rDNA type has been detected in other species belonging to the orders Batrachoidiformes and Perciformes, but not in the Pleuronectiformes or Clupeiformes. Two hypotheses have been outlined. One is the possible vertical persistence of the shared type in some fish lineages. The other is a possible horizontal transfer event between ancient species of the orders Perciformes and Batrachoidiformes. This finding opens a new perspective on fish evolution and on the dynamism of the 5S rDNA. Cytogenetic analysis allowed some evolutionary trends to be outlined, such as the progressive change in the U2 snDNA and the organization of (GATA)n repeats, from dispersed to localized in one locus. The accumulation of (GATA)n repeats in one chromosome pair could be implicated in the evolution of a pair of proto-sex chromosomes. This possibility could place H. didactylus as the most highly evolved member of the family Batrachoididae in terms of sex chromosome biology.
Owl monkeys (genus Aotus) have been extensively used as animal models in biomedical research, but few reports have focused on the taxonomy and phylogeography of this genus. Moreover, the morphological similarity of several Aotus species has led to frequent misidentifications, mainly at the boundaries of their distributions. In this study, sequence data from five mitochondrial regions and the nuclear, Y-linked SRY gene were used for species identification and phylogenetic reconstruction, based on well-characterized specimens of Aotus nancymaae, A. vociferans, A. lemurinus, A. griseimembra, A. trivirgatus, A. nigriceps, A. azarae boliviensis and A. infulatus. The complete MT-CO1, MT-TS1, MT-TD, MT-CO2 and MT-CYB regions were sequenced in 18 Aotus specimens. ML and Bayesian topologies of the concatenated data and of separate regions allowed us to propose a tentative Aotus phylogeny, indicating that Aotus diverged some 4.62 million years before present (MYBP). Similar analyses that included GenBank specimens were useful for assessing the species identification of deposited data. When compared with karyotypic and biogeographic data, alternative phylogenetic reconstructions led to evolutionary scenarios that question the conventional view that this genus diversified into monophyletic grey-necked and red-necked groups. Moreover, genetic distance estimates and haplotypic differences were useful for species validation.
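The abstract does not say which genetic distance was used for species validation. As an illustration only, the Kimura two-parameter (K2P) distance is a common choice for mitochondrial markers such as MT-CO1, and it can be computed from two aligned sequences as sketched below. The function name and the toy sequences are hypothetical. Gaps and ambiguity codes are simply skipped, which is an assumption about how missing data would be handled.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Hypothetical helper: sites with a gap or ambiguous base in either
    sequence are skipped (an assumption, not a documented choice).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    valid = PURINES | PYRIMIDINES
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue  # ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; any other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # math.log raises ValueError if the sequences are too divergent (saturation)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

For example, `k2p_distance("ACGTACGTAC", "ACGTACGTAT")` has one C/T transition in ten sites and returns about 0.1116.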
The thin-spined porcupine, also known as the bristle-spined rat, Chaetomys subspinosus (Olfers, 1818), the only member of its genus, is listed among Brazil's endangered species. In addition to being threatened, it is poorly known, and even its taxonomic status at the family level has long been controversial. The genus Chaetomys was originally regarded as a porcupine in the family Erethizontidae, but some authors classified it as a spiny rat in the family Echimyidae. Although the dispute seems to be settled in favor of the erethizontid advocates, further discussion of its affinities should be based on a phylogenetic framework. In the present study, we used nucleotide-sequence data from the complete mitochondrial cytochrome b gene and karyotypic information to address this issue. Our molecular analyses included one individual of Chaetomys subspinosus from the state of Bahia in northeastern Brazil, and other hystricognaths. All topologies recovered in our molecular phylogenetic analyses strongly supported Chaetomys subspinosus as a sister clade of the erethizontids. Cytogenetically, Chaetomys subspinosus showed 2n = 52 and FN = 76. Although the sex chromosome pair could not be identified, we assumed that the X chromosome is biarmed. The karyotype included 13 large to medium metacentric and submetacentric chromosome pairs, one small subtelocentric pair, and 12 small acrocentric pairs. The subtelocentric pair 14 had a terminal secondary constriction in the short arm, corresponding to the nucleolar organizer region (Ag-NOR). This is similar to the erethizontid Sphiggurus villosus (2n = 42, FN = 76) and different from the echimyids, in which the secondary constriction is interstitial. Both molecular phylogenies and karyotypical evidence indicated that Chaetomys is closely related to the Erethizontidae rather than to the Echimyidae, although it occupies a basal position relative to the rest of the Erethizontidae.
The high levels of molecular and morphological divergence suggest that Chaetomys belongs to an early radiation of the Erethizontidae that may have occurred in the Early Miocene, and should be assigned to its own subfamily, the Chaetomyinae.
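The reported 2n and FN can be checked against the karyotype description. A minimal arithmetic sketch follows. It assumes that FN counts autosomal arms only, that acrocentrics contribute one arm and all other classes two, and that the biarmed sex pair is one of the 13 metacentric/submetacentric pairs. The text itself notes that the sex pair could not be identified. The dictionary layout and function names are hypothetical.

```python
# Hypothetical encoding of the described karyotype:
# class -> (number of chromosome pairs, arms per chromosome).
# Metacentric, submetacentric and subtelocentric chromosomes are biarmed (2 arms);
# acrocentrics are counted as 1 arm, following the usual FN convention.
KARYOTYPE = {
    "metacentric_submetacentric": (13, 2),
    "subtelocentric": (1, 2),
    "acrocentric": (12, 1),
}

def diploid_number(karyotype):
    """2n: two chromosomes per pair."""
    return sum(2 * pairs for pairs, _ in karyotype.values())

def fundamental_number(karyotype, sex_pair_arms=0):
    """Total chromosome arms, minus both copies of the sex pair if requested.

    sex_pair_arms is the arm count of one sex chromosome (2 for a biarmed X);
    passing it yields an autosomal FN under the assumption stated above.
    """
    total = sum(2 * pairs * arms for pairs, arms in karyotype.values())
    return total - 2 * sex_pair_arms

print(diploid_number(KARYOTYPE))                       # 52
print(fundamental_number(KARYOTYPE, sex_pair_arms=2))  # 76
```

The full complement has 80 arms. Removing a biarmed sex pair leaves 76, which is consistent with the stated FN and with the assumption in the text that the X is biarmed.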
Bats of the family Phyllostomidae show a unique diversity of feeding specializations. This taxon includes species that are highly specialized on insects, blood, small vertebrates, fruits, or nectar and pollen. Feeding specialization is accompanied by morphological, physiological and behavioural adaptations. Several attempts have been made to resolve the phylogenetic relationships within this family in order to reconstruct the evolutionary transitions that accompanied nutritional specialization. Nevertheless, the evolution of nectarivory has remained equivocal. Phylogenetic reconstructions based on a concatenated nuclear and mitochondrial data set revealed a paraphyletic relationship of nectarivorous phyllostomid bats. Our phylogenetic reconstructions indicate that the nectarivorous genera Lonchophylla and Lionycteris are more closely related to the mainly frugivorous phyllostomids of the subfamilies Rhinophyllinae, Stenodermatinae and Carolliinae, and to the insectivorous Glyphonycterinae, than to the nectarivorous bats of the Glossophaginae. This suggests an independent origin of morphological adaptations to a nectarivorous lifestyle within Lonchophyllinae and Glossophaginae. Molecular clock analysis revealed a relatively short time frame of about ten million years for the divergence of the subfamilies. Our study provides strong support for diphyly of nectarivorous phyllostomids. This is remarkable, since their morphological adaptations to nutrition closely resemble each other. These include elongated rostrums and tongues, reduced teeth and the ability to hover while feeding. However, more detailed examinations of their tongues (e.g. type and structure of papillae and muscular innervation) revealed differences consistent with an independent evolution of nectarivory in these bats.
Sampling locations of the T. muticellus (and T. souffia; pop. 22) populations analyzed in this study.
The vairone in Italy. (a) Map of northern and central Italy showing the main river systems, the ichthyogeographic districts and the sampling sites. Numbers (and colors) of the sampling sites correspond to Table 1 and Fig. 2. Note that population 22 consisted of specimens of Telestes souffia, the sister species of T. muticellus, and was excluded from some of the analyses. (b) The Italian vairone (Telestes muticellus). (c) During glacial maxima, the Alps were covered by an ice cap and sea levels were markedly lowered. The estuary of the Po River system, which also included rivers that today drain directly into the Adriatic Sea, was situated at the Middle Adriatic Pit. PV, Padano-Venetian ichthyogeographic district; TL, Tuscano-Latium ichthyogeographic district; SI, South Italian ichthyogeographic district.
Population structure in the Italian vairone. (a) Unrooted neighbor-joining population tree based on Cavalli-Sforza & Edwards chord distances (DC) calculated with PHYLIP. The populations from Veneto (pops. 1 and 2), the Po River system (pops. 3 to 19) and Central Liguria (pops. 27 to 29) form a clade that also includes the populations from West Liguria (pops. 23 to 26). The two populations from the Middle Adriatic basins (pops. 20 and 21) form a well-supported monophyletic group, as do the populations from East Liguria (pops. 30 to 33) and from Tuscany (pops. 34 to 39). Numbers above the branches correspond to bootstrap values (10,000 pseudo-replicates over individuals). (b) Results from the population assignment test with STRUCTURE (K = 5). Again, the individuals from Veneto, the Po River system and Central Liguria are clustered together, their genomes being admixtures of two distinct genotype classes. Individuals from the Middle Adriatic basins and from West Liguria fall into two respective clusters; individuals from East Liguria and from Tuscany form a single cluster.
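Panel (a) is based on Cavalli-Sforza & Edwards chord distances computed from microsatellite allele frequencies. The sketch below illustrates what this distance measures, using one common multilocus formulation that averages the per-locus chord distance over loci. PHYLIP's gendist program may scale the measure differently, so absolute values are not guaranteed to match the figure. The function name and the allele-frequency dictionaries are hypothetical.

```python
import math
from typing import Dict, List

AlleleFreqs = Dict[str, float]  # allele label -> frequency at one locus

def chord_distance(pop_x: List[AlleleFreqs], pop_y: List[AlleleFreqs]) -> float:
    """Cavalli-Sforza & Edwards chord distance, averaged over loci.

    pop_x[j] and pop_y[j] hold allele frequencies (each summing to 1) at locus j.
    Per locus: (2 / pi) * sqrt(2 * (1 - sum_i sqrt(x_i * y_i))).
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("populations must share the same, non-empty set of loci")
    per_locus = []
    for fx, fy in zip(pop_x, pop_y):
        alleles = set(fx) | set(fy)
        # overlap is 1 for identical frequency vectors, 0 for disjoint allele sets
        overlap = sum(math.sqrt(fx.get(a, 0.0) * fy.get(a, 0.0)) for a in alleles)
        # clamp guards against tiny negative values from rounding when overlap ~ 1
        per_locus.append((2 / math.pi) * math.sqrt(max(0.0, 2 * (1 - overlap))))
    return sum(per_locus) / len(per_locus)
```

Identical populations give 0. Populations with no shared alleles at a locus give the per-locus maximum, 2√2/π ≈ 0.90. In this sketch, the distance between pools is averaged over loci and is symmetric in the two populations.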
Owing to its independence from the main Central European drainage systems, the Italian freshwater fauna is characterized by a high degree of endemicity. Three main ichthyogeographic districts have been proposed in Italy. Yet, the validity of these regions has not been confirmed by phylogenetic and population genetic analyses and a phylogeographic scenario for Italy's primary freshwater fish fauna is still lacking. Here, we investigate the phylogeography of the Italian vairone (Telestes muticellus). We sampled 38 populations representing the species' entire distribution range and covering all relevant drainage systems, and genotyped 509 individuals at eight variable microsatellite loci. Applying various population genetic analyses, we identify five distinct groups of populations that are only partly in agreement with the proposed ichthyogeographic districts. Our group I, which is formed by specimens from Veneto and the Po River system draining into the Adriatic Sea, corresponds to the Padano-Venetian ichthyogeographic district (PV), except for two Middle Adriatic drainages, which we identify as a separate group (III). The Tuscano-Latium district (TL) is equivalent to our group V. A more complex picture emerges for the Ligurian drainages: populations from Central Liguria belong to group I, while populations from West (group II) and East Liguria (group IV) form their own groups, albeit with affinities to PV and TL, respectively. We propose a phylogeographic scenario for T. muticellus in which an initial T. muticellus stock became isolated from the 'Alpine' clade and survived the various glaciation cycles in several refugia. These were situated in the Upper Adriatic (groups I and II), the Middle Adriatic (group III), (East) Liguria (group IV) and Tuscano-Latium (group V). 
The population structure in the vairone is, in principle, in agreement with the two main ichthyogeographic districts (PV and TL), except for the two populations in the Middle Adriatic, which we identify as an additional major "district".
Large pelagic fishes are generally thought to have little population genetic structuring, based on their cosmopolitan distribution, large population sizes and high dispersal capacities. However, gene flow can be influenced by ecological factors (e.g. homing behaviour) and physical factors (e.g. present-day ocean currents, past changes in sea temperature and sea level). In this regard, Atlantic bigeye tuna shows an interesting pattern of genetic structuring, with two highly divergent mitochondrial clades (Clades I and II) that are assumed to have originated during the last Pleistocene glacial maxima. We assess genetic structure patterns of Atlantic bigeye tuna at the nuclear level and compare them with mitochondrial evidence. We examined allele size variation at nine microsatellite loci in 380 individuals from the Gulf of Guinea, the Canary Islands, the Azores, Canada, the Indian Ocean and the Pacific Ocean. To investigate the temporal stability of genetic structure, three Atlantic Ocean sites were re-sampled in a second year. Hierarchical AMOVA tests, RST pairwise comparisons, isolation-by-distance (Mantel) tests, Bayesian clustering analyses and coalescence-based migration rate inferences supported unrestricted gene flow within the Atlantic Ocean at the nuclear level. This implies interbreeding between individuals belonging to both mitochondrial clades. Moreover, departures from Hardy-Weinberg equilibrium (HWE) at several loci were inferred for the Gulf of Guinea samples. These were attributed to a Wahlund effect, supporting the role of this region as a spawning and nursery area. Our microsatellite data supported a single worldwide panmictic unit for bigeye tunas. Despite the strong Agulhas Current, immigration rates seem to be higher from the Atlantic Ocean into the Indo-Pacific Ocean, but the actual number of individuals moving per generation is relatively low compared with the large population sizes inhabiting each ocean basin.
The lack of congruence between mitochondrial and nuclear evidence, which is also found in other species, most likely reflects past events of isolation and secondary contact. Given the inferred relatively low number of immigrants per generation around the Cape of Good Hope, the proportions of the mitochondrial clades in the different oceans may remain stable. It also seems plausible that the presence of individuals belonging to mt Clade I in the Atlantic Ocean may be due to extensive migrations that predated the last glaciation.
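The Wahlund effect invoked for the Gulf of Guinea samples can be illustrated numerically. Pooling individuals from subpopulations with different allele frequencies produces fewer heterozygotes than the Hardy-Weinberg expectation computed from the pooled allele frequency. The minimal sketch below covers one biallelic locus and equally sized subpopulations, each assumed to be in HWE internally. The allele frequencies are hypothetical and are not estimates for bigeye tuna.

```python
from typing import List, Tuple

def wahlund_deficit(freqs: List[float]) -> Tuple[float, float, float]:
    """Return (observed H, HWE-expected H, heterozygote deficit) for a pooled sample.

    freqs are allele frequencies in equally sized subpopulations, each assumed
    to be in Hardy-Weinberg equilibrium internally.
    """
    k = len(freqs)
    # heterozygote frequency actually present in the pooled sample
    h_obs = sum(2 * p * (1 - p) for p in freqs) / k
    # heterozygote frequency expected if the pool were one random-mating unit
    p_bar = sum(freqs) / k
    h_exp = 2 * p_bar * (1 - p_bar)
    return h_obs, h_exp, h_exp - h_obs

h_obs, h_exp, deficit = wahlund_deficit([0.2, 0.8])
print(round(h_obs, 2), round(h_exp, 2), round(deficit, 2))  # 0.32 0.5 0.18
```

The deficit equals twice the variance of allele frequencies among the pooled subpopulations (here 2 × 0.09 = 0.18). It vanishes when the subpopulations share the same frequency. This is why a heterozygote deficit in a single sample can point to a mixture of distinct breeding groups.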
Mesoamerica is one of the world's most complex biogeographical regions, mostly owing to its complex geological history. This complexity has led to interesting biogeographical processes that have resulted in the current diversity and distribution of fauna in the region. The fish genus Astyanax is a useful model for assessing biogeographical hypotheses because it is one of the most diverse and widely distributed freshwater fish genera in the New World. We used mitochondrial and nuclear DNA to evaluate phylogenetic relationships within the genus in Mesoamerica and to develop historical biogeographical hypotheses explaining its current distribution. We analysed the entire mitochondrial cytochrome b (Cytb) gene in 208 individuals from 147 localities. A subset of individuals was also sequenced for three mitochondrial genes (Cytb, 16S and COI) and a single nuclear gene (RAG1). Both datasets yielded similar topologies, recovering six major groups with significant phylogeographic structure. Populations from North America and Upper Central America formed a monophyletic group, while Middle Central America showed evidence of rapid radiation with incompletely resolved relationships. Lower Central America lineages showed a fragmented structure, with geographically restricted taxa showing high levels of molecular divergence. All Bramocharax samples grouped with their sympatric Astyanax lineages (in some cases even with allopatric Astyanax populations), with less than 1% divergence between them. These results suggest that the trophic specializations associated with Bramocharax ecomorphs are homoplastic, having apparently arisen independently in different Astyanax lineages. We observed higher taxonomic diversity than previous phylogenetic studies of the genus Astyanax. Colonization of Mesoamerica by Astyanax before the final closure of the Isthmus of Panama (3.3 Mya) explains the deep level of divergence detected in Lower Central America.
The colonization of Upper Mesoamerica apparently occurred by two independent routes, with lineage turnover over a large part of the region. Our results support multiple, independent origins of morphological traits in Astyanax, whereby the morphotype associated with Bramocharax represents a recurrent trophic adaptation. Molecular clock estimates indicate that Astyanax was present in Mesoamerica during the Miocene (approximately 8 Mya), which implies the existence of an incipient land-bridge connecting South America and Central America before the final closure of the Isthmus of Panama (approximately 3.3 Mya).
Studies of the phylogeography of Mexican species are steadily revealing genetic patterns shared by different species, which will help to unravel the complex biogeographic history of the region. Campostoma ornatum is a freshwater fish endemic to montane and semiarid regions in northwest Mexico and southern Arizona. Its wide distribution range and the previously observed morphological differentiation between populations in different watersheds make this species a useful model. It can be used to investigate the biogeographic role of the Sierra Madre Occidental and to disentangle the effects of Pliocene tecto-volcanic processes from those of Quaternary climatic change. Our phylogeographic study was based on DNA sequences from one mitochondrial gene (cytb, 1110 bp, n = 285) and two nuclear gene regions (S7 and RAG1, 1822 bp in total, n = 56 and 43, respectively) obtained from 18 to 29 localities. It also included a morphological survey covering the entire distribution area. This dataset allowed us to assess whether any of the sampled populations or lineages deserve to be categorised as an evolutionarily significant unit. We found two morphologically and genetically well-differentiated groups within C. ornatum. One is located in the northern river drainages (Yaqui, Mayo, Fuerte, Sonora, Casas Grandes, Santa Clara and Conchos) and the other in the southern drainages (Nazas, Aguanaval and Piaxtla). The split between these two lineages took place about 3.9 Mya (CI = 2.1-5.9). Within the northern lineage, there was strong and significant inter-basin genetic differentiation. There were also several secondary dispersal episodes with gene homogenization between drainages. Interestingly, three divergent mitochondrial lineages were found in sympatry in two northern localities of the Yaqui River basin.
Our results indicate that the northern and southern phylogroups have been isolated since the Pliocene, which was related to the formation of the ancient Nazas River paleosystem, where the southern group originated. Within groups, a complex reticulate biogeographic history emerges for C. ornatum populations, following the taxon pulse theory and mainly related to Pliocene tecto-volcanic processes. In the northern group, several vicariance events promoted by river or drainage isolation episodes were found. Within both groups, the phylogeographic patterns suggest several events of river capture and fauna interchange. The Yaqui River supports the most diverse populations of C. ornatum, with several events of dispersal and isolation within the basin. Based on our genetic results, we defined three ESUs within C. ornatum as a first attempt to promote the conservation of the evolutionary processes determining the genetic diversity of this species. These ESUs will likely prove to be a valuable tool for freshwater conservation policies in northwest Mexico, where many environmental problems concerning the use of water have rapidly arisen in recent decades.
Newly defined subfamilies within glycoside hydrolase family GH5
Characterized carbohydrate-active enzymes of family GH5 not yet classified into subfamilies
Phylogenetic tree of family GH5. In this circular phylogram, the branches corresponding to subfamilies 1–53 are shown in color and the subfamily numbers are indicated next to the exterior color circle. The branches corresponding to sequences not assigned to subfamilies are in black. A detailed version of this tree is found in Additional file 1: Figure S1.
Examples of modular GH5 proteins. (a) Diverse modular arrangements of putative monofunctional modular enzymes from subfamily GH5_8. (b) The same for putative bifunctional GH5 enzymes containing a subfamily GH5_8 module. (c) Other putative bifunctional enzymes containing at least one GH5 module. (d) Selected examples of proteins containing GH5 modules that have lost one or more catalytic residues. For a given protein, each GH5 module is identified by a number of fields separated by "|". These indicate (i) the organism, with 3 letters for the genus and either 5 letters for the species or the full strain code; (ii) the GenBank protein accession; (iii) if attributed, the subfamily number or other information; and (iv) EC numbers, if available. These individual tags are analogous to those found in Additional file 1: Figure S1. The module types and other protein segments present are:
GHx_y – glycoside hydrolase family x subfamily y (pink);
CEx – carbohydrate esterase module of family x (light brown);
Cip21 – chitin-binding protein type 21 module with putative carbohydrate oxidative cleaving activity, formerly CBM33 (dark gray);
CBMx – carbohydrate binding modules of family x (light green);
FN3 – fibronectin type III modules (dark green);
DOC – cellulosomal dockerin modules (light violet);
EXPN – expansin modules (dark purple);
signal peptides (purple);
transmembrane segments (yellow);
linkers (light blue);
other regions (light gray).
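The pipe-separated module tags described above can be split mechanically into their fields. The sketch below assumes exactly the field order given in the legend: organism, GenBank accession, optional subfamily, and optional EC numbers. Everything after the third separator is kept verbatim as the EC field, because the legend does not specify how multiple EC numbers are delimited. The dataclass, the function name and the example tag are hypothetical illustrations, not entries from Additional file 1.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModuleTag:
    organism: str              # 3-letter genus + 5-letter species code, or strain code
    accession: str             # GenBank protein accession
    subfamily: Optional[str]   # e.g. "GH5_8", if attributed
    ec_numbers: Optional[str]  # EC field kept verbatim, if present

def parse_module_tag(tag: str) -> ModuleTag:
    """Split a '|'-separated GH5 module tag into its four fields (hypothetical format)."""
    # maxsplit=3: anything after the third '|' stays inside the EC field
    fields = [f.strip() for f in tag.split("|", 3)]
    if len(fields) < 2:
        raise ValueError(f"expected at least organism and accession: {tag!r}")
    fields += [""] * (4 - len(fields))  # pad missing optional fields
    organism, accession, subfamily, ec = fields
    return ModuleTag(organism, accession, subfamily or None, ec or None)

# Hypothetical tag for illustration only
print(parse_module_tag("Clotherm|ABC00001|GH5_8|3.2.1.78"))
```

Splitting with a fixed maximum keeps the parser tolerant of short tags, for example those without a subfamily or EC number, without guessing at a sub-delimiter the legend does not define.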
Background The large Glycoside Hydrolase family 5 (GH5) groups together a wide range of enzymes acting on β-linked oligo- and polysaccharides and glycoconjugates from a large spectrum of organisms. The long and complex evolution of this family of enzymes and its broad sequence diversity limit functional prediction. With the objective of improving the differentiation of enzyme specificities in a knowledge-based context, and to obtain new evolutionary insights, we present here a new, robust subfamily classification of family GH5. Results About 80% of the current sequences were assigned to 51 subfamilies in a global analysis of all publicly available GH5 sequences and associated biochemical data. Examination of subfamilies with catalytically active members revealed that one third are monospecific (containing a single enzyme activity), although new functions may be discovered through biochemical characterization in the future. Furthermore, twenty subfamilies presently have no characterized members whatsoever, and many others have only limited structural and biochemical data. Mapping of functional knowledge onto the GH5 phylogenetic tree revealed that this knowledge is far from evenly distributed across the sequence space of this historically and industrially important family, highlighting targets in need of further study. The analysis also uncovered a number of GH5 proteins that have lost their catalytic machinery, indicating evolution towards novel functions. Conclusion Overall, the subfamily division of GH5 provides an actively curated resource for large-scale protein sequence annotation for glycogenomics; the subfamily assignments are openly accessible via the Carbohydrate-Active Enzyme database at
Tectonic, volcanic and climatic events that produce changes in hydrographic systems are the main causes of diversification and speciation of freshwater fishes. Elucidating the evolutionary history of freshwater fishes makes it possible to infer theories about the biotic and geological evolution of a region. These can further be applied to understand processes of population divergence and speciation, and for conservation purposes. The freshwater ecosystems of Central Mexico are characterized by the dynamism of their genesis, destruction and compartmentalization, induced by intense geologic activity and climatic changes since the early Miocene. The endangered goodeid Zoogoneticus quitzeoensis is widely distributed across Central Mexico, making it a good model for phylogeographic analyses in this area. We addressed the phylogeography, evolutionary history and genetic structure of populations of Z. quitzeoensis through a sequential approach, based on both microsatellites and mitochondrial cytochrome b sequences. Most haplotypes were private to particular locations. All the populations analysed showed a remarkable number of haplotypes. The level of gene diversity within populations was Hd = 0.987 (0.714-1.00). However, nucleotide diversity was generally low, π = 0.0173 (0.0015-0.0049). Significant genetic structure was found among populations at the mitochondrial and nuclear levels (ΦST = 0.836 and FST = 0.262, respectively). We distinguished two well-defined mitochondrial lineages that separated ca. 3.3 million years ago (Mya). The time since expansion was ca. 1.5 million years for Lineage I and ca. 860,000 years for Lineage II. Genetic patterns of differentiation between and within lineages are also described at different historical timescales.
Our mtDNA data indicate that the evolution of the different genetic groups is more closely related to ancient geological and climatic events (Middle Pliocene, ca. 3.3 Mya) than to the current hydrographic configuration of the basins. In general, mitochondrial and nuclear data supported the same relationships between populations. The exception was some reduced populations in highly polluted basins (Lower Lerma River), where the different analyses at the nuclear and mitochondrial levels suggest effects of genetic drift. Further, our findings are of special interest for the conservation of this endangered species.
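The gene (haplotype) diversity Hd and nucleotide diversity π reported above are standard summary statistics. A minimal sketch using Nei's unbiased estimator, Hd = n/(n − 1)(1 − Σ xᵢ²), and π taken as the mean proportion of differing sites over all pairs of sequences is given below. It assumes aligned, gap-free sequences. The function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence

def haplotype_diversity(seqs: Sequence[str]) -> float:
    """Nei's unbiased gene (haplotype) diversity, Hd."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]  # haplotype frequencies
    return n / (n - 1) * (1 - sum(x * x for x in freqs))

def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """pi: mean pairwise proportion of differing sites (aligned, gap-free input)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    diffs = [sum(a != b for a, b in zip(s1, s2)) / length
             for s1, s2 in combinations(seqs, 2)]
    return sum(diffs) / len(diffs)

sample = ["AAAA", "AAAA", "AAAT", "AATT"]
print(round(haplotype_diversity(sample), 4), round(nucleotide_diversity(sample), 4))
```

When every sequence is a distinct haplotype, Hd reaches 1.0, matching the upper bound of the reported range. π can remain small at the same time if those haplotypes differ at only a few sites. This is the combination of high Hd and low π described above.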
Estimated number of profiles K under the CAT-GTR+Γ mixture model. Frequencies of each value of K (the number of distinct profiles) sampled during the stationary phase of the MCMC runs for the 110-taxa (green) and 88-taxa (blue) datasets.
Predicted 18S rRNA secondary structure of a divergent aplousobranch sequence (Diplosoma ooru). New predicted structures unique to such divergent Aplousobranchia species (and absent in the conserved Pycnoclavella aff. detorta and Clavelina meridionalis sequences) are boxed in red. Red dotted lines indicate additional loop regions where major elongations occurred in other divergent aplousobranchs.
Analysis of base-composition heterogeneity. Principal component analysis (PCA) of the base composition of 18S rRNA from the 110-taxa dataset, considering all nucleotide sites. The graph shows the first two principal components (PC), which account for 96% and 2% of the total variance, respectively. The first component represents variation along the AT-versus-GC axis, with the AT-rich Appendicularia and the GC-rich Aplousobranchia at the two extremes.
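The PCA of base composition can be sketched as follows. Each sequence is summarized by its A, C, G and T frequencies, and the resulting taxa × 4 matrix is centered and decomposed by singular value decomposition. The use of NumPy, the function names and the toy sequences are assumptions for illustration. The original analysis may have used other software and options, such as scaling.

```python
import numpy as np

BASES = "ACGT"

def base_composition(seqs):
    """Rows: sequences; columns: A, C, G, T frequencies (non-ACGT characters ignored)."""
    rows = []
    for s in seqs:
        s = s.upper()
        counts = np.array([s.count(b) for b in BASES], dtype=float)
        rows.append(counts / counts.sum())  # assumes each sequence has some ACGT bases
    return np.vstack(rows)

def composition_pca(seqs):
    """Return PC scores and the fraction of total variance explained by each PC."""
    x = base_composition(seqs)
    centered = x - x.mean(axis=0)
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                   # projection of each sequence onto the PCs
    explained = s**2 / np.sum(s**2)  # variance share per component
    return scores, explained

# Two AT-rich and two GC-rich toy sequences (hypothetical)
scores, explained = composition_pca(
    ["AATTAATTAC", "AATTATTAGC", "GGCCGGCCGA", "GCGCGGCCTA"])
print(explained.round(3))
```

With strongly AT-biased versus GC-biased sequences, the first component dominates and separates the two groups. This mirrors the AT-versus-GC axis described in the caption. The sign of a principal component is arbitrary, so only the separation is meaningful, not which group scores positive.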
Phylogeny of tunicates inferred from the complete 18S rRNA dataset (110 taxa and 1373 sites). Bayesian majority-rule consensus tree obtained under the CAT-GTR+Γ mixture model implemented in PhyloBayes. Support values at nodes represent: Bayesian Posterior Probabilities (PP) obtained under: 1. PP1 = CAT-GTR+Γ (PhyloBayes)/2. PP2 = RNA6C+Γ+I and GTR+Γ+I (Phase)/3. PP3 = GTR+Γ+I (MrBayes)/4. BP = Maximum likelihood bootstrap percentages (BP) under GTR+Γ+I (PAUP*). Support values are indicated for the main tunicate clades, and within Aplousobranchia, when PP ≥ 0.95 and BP ≥ 65. Newly sequenced Aplousobranchia species are underlined. Among the Stolidobranchia, a newly obtained sequence from Botryllus schlosseri is marked with an asterisk. The red triangle indicates the evolutionary shift in secondary structure of the 18S rRNA molecule within Aplousobranchia.
Phylogeny of tunicates inferred from a reduced 18S rRNA dataset (88 taxa and 1675 sites). Bayesian majority-rule consensus tree obtained under the CAT-GTR+Γ mixture model implemented in PhyloBayes after exclusion of the fast-evolving Aplousobranchia species. Support values obtained using different reconstruction approaches are indicated at nodes in the following order: Bayesian posterior probabilities (PP) under: 1. PP1 = CAT-GTR+Γ (PhyloBayes)/2. PP2 = RNA6A+Γ+I and TN93+Γ+I (Phase)/3. PP3 = GTR+Γ+I (MrBayes)/and 4. BP = Maximum Likelihood bootstrap percentages (BP) under TN93+Γ+I (PAUP*). Support values are displayed when PP ≥ 0.95 and BP ≥ 65. Dots indicate nodes for which all four reconstruction methods agree and provide PP ≥ 0.95 and BP ≥ 65. Newly obtained sequences are underlined, including an additional one from Botryllus schlosseri marked with an asterisk.
Tunicates have recently been revealed to be the closest living relatives of vertebrates. Yet, although more than 2500 species have been described, details of their evolutionary history remain obscure. From a molecular point of view, tunicate phylogenetic relationships have mostly been studied through analyses of 18S rRNA sequences. These indicate several major clades at odds with the traditional class-level arrangements. Nonetheless, substantial uncertainty remains about the phylogenetic relationships and taxonomic status of key groups such as the Aplousobranchia, Appendicularia and Thaliacea. Thirty new complete 18S rRNA sequences were acquired from previously unsampled tunicate species, with special focus on groups showing high evolutionary rates. The updated 18S rRNA dataset was aligned under the homology constraints imposed by the rRNA secondary structure. A probabilistic framework of phylogenetic reconstruction was adopted to accommodate the particular evolutionary dynamics of this ribosomal marker. Detailed Bayesian analyses were conducted under two classes of models. The first was the non-parametric CAT mixture model, which accounts for site-specific heterogeneity of the evolutionary process. The second was RNA-specific doublet models, which accommodate compensatory substitutions in stem regions. Our results support the division of tunicates into three major clades: 1) Phlebobranchia + Thaliacea + Aplousobranchia, 2) Appendicularia, and 3) Stolidobranchia. However, the position of Appendicularia could not be firmly resolved. Our study additionally reveals that most Aplousobranchia evolve at extremely high rates involving changes in the secondary structure of their 18S rRNA. The exception is the family Clavelinidae, which appears to evolve slowly. This extreme rate heterogeneity precluded resolving with certainty the exact phylogenetic placement of Aplousobranchia.
Finally, the best-fitting secondary-structure and CAT-mixture models suggest a sister-group relationship between Salpida and Pyrosomatida within Thaliacea. We provide an updated phylogenetic framework for tunicates. It is based on phylogenetic analyses using the most realistic evolutionary models currently available for ribosomal molecules and an unprecedented taxonomic sampling. Detailed analyses of the 18S rRNA gene allowed a clear definition of the major tunicate groups and revealed contrasting evolutionary dynamics among major lineages. The resolving power of this gene nevertheless appears limited within the clades composed of Phlebobranchia + Thaliacea + Aplousobranchia and of Pyuridae + Styelidae, which were identified as areas of low resolution. These limitations underline the need to develop new nuclear markers to further resolve the phylogeny of this keystone group in chordate evolution.
Background Marker gene studies often use short amplicons spanning one or more hypervariable regions from an rRNA gene to interrogate the community structure of uncultured environmental samples. Target regions are chosen for their discriminatory power, but the limited phylogenetic signal of short high-throughput sequencing reads precludes accurate phylogenetic analysis. This is particularly unfortunate in the study of microscopic eukaryotes, where horizontal gene flow is limited and the rRNA gene is expected to accurately reflect the species phylogeny. A promising alternative to full phylogenetic analysis is phylogenetic placement, where a reference phylogeny is inferred using the complete marker gene and iteratively extended with the short sequences from a metagenetic sample under study. Results Based on the phylogenetic placement approach, we built Séance, a community analysis pipeline focused on the analysis of 18S marker gene data. Séance combines the alignment extension and phylogenetic placement capabilities of the Pagan multiple sequence alignment program with a suite of tools to preprocess, cluster and visualise datasets composed of many samples. We showcase Séance by analysing 454 data from a longitudinal study of intestinal parasite communities in wild rufous mouse lemurs (Microcebus rufus), as well as simulated data. We demonstrate improved OTU picking at higher levels of sequence similarity for 454 data and show that the accuracy of phylogenetic placement is comparable to that of maximum likelihood methods for lower numbers of taxa. Conclusions Séance is an open source community analysis pipeline that provides reference-based phylogenetic analysis for rRNA marker gene studies. Whilst in this article we focus on studying nematodes using the 18S marker gene, the concepts are generic and reference data for alternative marker genes can be easily created. Séance can be downloaded from
Helical regions correspond with missing data or possible foreign DNA. 1a: Possible chimerical patterns in previously published sequences (yellow bars) match regions of double-stranded DNA forming helical regions (blue bars) in newly generated sequences. Epimenia sp. (Ep_sp) and Helicoradomenia sp. (He_sp) sequences have been aligned to the nucleotide positions (represented by the scale bar) of the sequences from this study, Wirenia argentea (Wi_ar) and Simrothiella margaritacea (Si_ma). The simplified schematic alignments (green bars) show high similarity (>90%), whereas other regions (yellow bars) possess lower similarity and point to contamination issues. BLASTn searches of the yellow domains indicate high similarity with polychaetes (PI to III) or cnidarians (C), or resulted in no significant hit, indicated as Helicoradomenia only (H). Detailed BLASTn results are given in additional file 1: Blast result table. Below the scale bar, the double-stranded helical regions of the 18S sequences, which contain GC islands of 66-82%, are indicated. 1b and c: Close-up views of single-stranded regions of helix 1 (1b) and helix 2 (1c) of S. margaritacea 18S. These stems have strong adhesive forces that probably hamper PCR. Secondary structures were calculated using mfold at 72°C. G-C hydrogen bonds are indicated in red.
Secondary structures of double-stranded 18S sequences. Stitch Profiles calculated at 90.7°C showing melted and double-stranded regions for the three new solenogaster 18S sequences and the caudofoveate Scutopus ventrolineatus. One short helical region in the S. ventrolineatus sequence, approximately 100 bp long with 63% GC content, lies at the same position as in the solenogaster sequences.
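The GC-rich helical regions described in the two legends above (66-82% GC islands; a ~100 bp region with 63% GC) can be screened for with a simple sliding-window calculation. The following is a minimal Python sketch, not the procedure used here (the structures themselves were computed with mfold and Stitch Profiles); the function names, the 100-nt window, the 10-nt step and the 0.63 threshold are illustrative assumptions.

```python
from typing import Iterator, List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G + C among unambiguous bases (A, C, G, T or U)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGTU")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def gc_windows(seq: str, window: int = 100, step: int = 10) -> Iterator[Tuple[int, float]]:
    """Yield (0-based start, GC fraction) for every full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


def gc_island_starts(seq: str, window: int = 100, step: int = 10,
                     threshold: float = 0.63) -> List[int]:
    """Start positions of windows whose GC fraction reaches the (hypothetical) threshold."""
    return [start for start, gc in gc_windows(seq, window, step) if gc >= threshold]
```

For example, on a 500-nt toy sequence `"AT" * 100 + "GC" * 50 + "AT" * 100`, whose GC block spans positions 200-299, `gc_island_starts` returns `[170, 180, 190, 200, 210, 220, 230]`, i.e. every sampled window containing at least 63 G/C bases.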
Consensus tree from the Bayesian analysis using a time-heterogeneous model in PHASE 2.0. * indicates full support from the posterior probability (pp: 1.00). The Mollusca (pp: 0.75) and all mollusk classes (pp: 1.00) except Bivalvia were recovered. Solenogastres and Cephalopoda were assigned to different substitution models within this run. The branching pattern grouping taxa with similar sequence composition at nodes one to four is very robust in all analyses (see also table 2). Inserted table: ML bootstrap support (BS) is given for nodes 1-4 using three suggested doublet models (S6A, S7B, S16). We used two different coding schemes for unpaired sites to assess the possible influence of base composition on the results. BS values are given in the order NT/RY: results with the original nucleotide-coded (NT) loops, followed by results with RY-recoded loop regions (G, A: R; T, C: Y).
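The NT/RY comparison above rests on recoding only the unpaired (loop) sites as purines (R) or pyrimidines (Y), while stem sites keep their nucleotides. A minimal Python sketch, assuming the secondary structure is supplied as a dot-bracket string aligned to the sequence, with "." marking unpaired sites (this input format and the function name are assumptions, not taken from the original analysis):

```python
# Purines -> R, pyrimidines -> Y (U treated like T for RNA input).
RY_CODE = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}


def ry_recode_loops(seq: str, structure: str) -> str:
    """Recode unpaired sites ('.') of an aligned sequence as R/Y.

    Paired sites (any other structure symbol, e.g. '(' or ')') keep their
    nucleotide; gaps and ambiguity codes are passed through unchanged."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must be the same length")
    return "".join(
        RY_CODE.get(base, base) if symbol == "." else base
        for base, symbol in zip(seq.upper(), structure)
    )
```

For example, `ry_recode_loops("GGACUUCC", "((....))")` keeps the four stem bases and recodes the loop, giving `"GGRYYYCC"`.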
ML analyses with a modified taxon sampling. A: Here the heterobranchs are represented only by the fast-evolving Aeolidina. All other taxa are identical to the sampling shown in figure 3, except that Patellogastropoda were also excluded. B: Same alignment as used in the inference for figure 3, but excluding all gastropods other than Patellogastropoda. Both analyses were conducted on ten different starting trees with a partitioned GTR + Γ and S16 substitution model, enabling the bootstopping criterion in RAxML (-# autoFC).
The 18S rRNA gene is one of the most important molecular markers, used in diverse applications such as molecular phylogenetic analyses and biodiversity screening. The Mollusca is the second largest phylum within the animal kingdom, and mollusks show an outstandingly high diversity in body plans and ecological adaptations. Although an enormous amount of 18S data is available for higher mollusks, data on some early branching lineages are still limited. Despite some partial success in obtaining these data from Solenogastres, regarded by some as the most "basal" mollusks, this taxon has remained problematic due to contamination with food organisms and general amplification difficulties. We report here the first authentic 18S genes of three Solenogastres species (Mollusca), each possessing a unique sequence composition with regions conspicuously rich in guanine and cytosine. For these GC-rich regions we calculated strong secondary structures. The observed high intra-molecular forces hamper standard amplification and appear to increase the formation of chimerical sequences caused by contaminating foreign DNA from potential prey organisms. In our analyses, contamination was avoided by using RNA as a template. Indications of contamination in previously published Solenogastres sequences are presented. Detailed phylogenetic analyses were conducted using RNA-specific models that account for compensatory substitutions in stem regions. The extreme morphological diversity of mollusks is mirrored in the molecular 18S data, which show elevated substitution rates mainly in three higher taxa: true limpets (Patellogastropoda), Cephalopoda and Solenogastres. Our phylogenetic tree based on 123 species, including representatives of all mollusk classes, shows limited resolution at the class level but illustrates the pitfalls of artificial groupings formed by shared biased sequence composition.
Geographical location of Trochoidea geyeri sampling sites. For abbreviations see Table 1.
Sampled populations, abbreviations used, geographical location and number of sampled individuals.
Statistical parsimony network and associated nested design. Haplotypes are designated by names as defined in table 2. Zeros indicate haplotype states that are necessary intermediates but were not present in the sample. Each line represents a single mutational step connecting two haplotypes. Haplotypes belonging to the same clade level are boxed up to clade level 4-x. Clade level designations are given within each box that contains observed haplotypes. a) Nested cladogram for 16S rDNA and b) for ITS-1 variation.
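In a statistical parsimony network, connected haplotypes differ by single mutational steps. The Python sketch below only illustrates that edge criterion by linking aligned haplotypes that differ at exactly one site; it is a simplification and does not reproduce the full statistical parsimony procedure, which also infers missing intermediate haplotypes (the zeros in the network). The function names, haplotype names and sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of differing sites between two aligned haplotypes (gaps count as states)."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def one_step_edges(haplotypes: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pairs of haplotype names whose sequences differ at exactly one site."""
    return [
        (name1, name2)
        for (name1, seq1), (name2, seq2) in combinations(sorted(haplotypes.items()), 2)
        if hamming(seq1, seq2) == 1
    ]
```

With the toy haplotypes `{"H1": "ACGT", "H2": "ACGA", "H3": "TCGA", "H4": "TTTT"}`, only H1-H2 and H2-H3 are one step apart; H4 stays unconnected, which is where a full analysis would add intermediate states.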
Geographical distribution of 3- and 4-step clades of 16S rDNA variation and inferred events in Trochoidea geyeri. The font size of the inferences decreases with nesting level.
Past gene-flow estimates among populations inferred from 16S rDNA and ITS-1 in Trochoidea geyeri. Estimates of 4Nm ≥ 0.001 are indicated by arrows.
The study of organisms with restricted dispersal abilities and a presence in the fossil record is particularly suited to understanding the impact of climate change on the distribution and genetic structure of species. Trochoidea geyeri (Soós 1926) is a land snail restricted to a patchy, insular distribution in Germany and France. Fossil evidence suggests that current populations of T. geyeri are relicts of a much more widespread distribution during more favourable climatic periods in the Pleistocene. Phylogeographic analysis of mitochondrial 16S rDNA and nuclear ITS-1 sequence variation was used to infer the history of the remnant populations of T. geyeri. Nested clade analysis for both loci suggested that the species originated in Provence, from where it expanded its range first to Southwest France and subsequently from there to Germany. Estimated divergence times predating the last glacial maximum (25-17 ka) imply that the colonization of the northern part of the current species range occurred during the Pleistocene. We conclude that T. geyeri was able to persist quite successfully in cryptic refugia during major climatic changes in the past, despite the restricted capacity of individuals to actively avoid unfavourable conditions.
Background Atlantolacerta andreanskyi is an enigmatic lacertid lizard that, according to the most recent molecular analyses, belongs to the tribe Eremiadini, family Lacertidae. It is a mountain specialist, restricted to areas above 2400 m in the High Atlas Mountains of Morocco, with apparently no connection between the different populations. To investigate its phylogeography, 92 specimens of A. andreanskyi from eight populations across the species' distribution range were analyzed for up to 1108 base pairs of mitochondrial DNA (12S, ND4 and flanking tRNA-His) and 2585 base pairs of nuclear DNA from five loci (PDC, ACM4, C-MOS, RAG1, MC1R). Results The results obtained with both concatenated and coalescent approaches and with clustering methods clearly show that all the populations analyzed present a very high level of genetic differentiation for the mitochondrial markers used and are also generally differentiated at the nuclear level. Conclusions These results indicate that A. andreanskyi is an additional example of a montane species complex.
Alternative hypotheses of the relationships of Ericabatrachus baleensis and its sister groups. The hypotheses are derived from different sources, which were at the time not necessarily presented as trees (see also Table 1). A) Largen [5], B) Dubois [8], C) Scott [9] (based on her Figure 4, the consensus of morphological and molecular analyses and the revised classification in Appendix seven), D) Frost et al. [6], and E) Pyron and Wiens [10].
Phylogenetic relationships of Ericabatrachus baleensis. A) ML tree from the large-scale analysis of Ranoidea. Most frequent placements for Ericabatrachus with the corresponding bootstrap percentages are shown (red arrows). The red square denotes the area where Ericabatrachus joins the tree 99% of the time (see text). B) Close-up view of the position of Ericabatrachus as the sister of Petropedetes. Support values for each branch correspond to (left) the de novo analysis and (right) the constrained analysis.
Small-scale Bayesian tree under GTR model showing the phylogenetic placement of Ericabatrachus baleensis (in bold). Support values for the nodes correspond to posterior probabilities (left) and non-parametric bootstraps (right). Values with “*” represent maximal support (100%), values lower than 40% are denoted by “-”.
Strict consensus of our small-scale Bayesian tree and Pyron and Wiens’ [10] tree. Both trees were restricted to the common taxa. Polytomies represent relations that were in disagreement between the two trees.
Ericabatrachus baleensis and its reported localities. A) Map showing the Bale Mountains National Park in Ethiopia. B) Close-up of the Bale Mountains National Park showing the geographic position of the type locality (Tulla Negesso) and other sites (circles; squares indicate main human settlements). C) A specimen of E. baleensis found in the recent surveys [4], photograph by MM.
The phylogenetic relationships of many taxa remain poorly known because of a lack of appropriate data and/or analyses. Despite substantial recent advances, amphibian phylogeny remains poorly resolved in many instances. The phylogenetic relationships of the Ethiopian endemic monotypic genus Ericabatrachus have been addressed thus far only with phenotypic data and remain contentious. We obtained fresh samples of the now rare and Critically Endangered Ericabatrachus baleensis and generated DNA sequences for two mitochondrial and four nuclear genes. Analyses of these new data using de novo and constrained-tree phylogenetic reconstructions strongly support a close relationship between Ericabatrachus and Petropedetes, and allow us to reject previously proposed alternative hypotheses of a close relationship with cacosternines or Phrynobatrachus. We discuss the implications of our results for the taxonomy, biogeography and conservation of E. baleensis, and suggest a two-tiered approach to the inclusion and analyses of new data in order to assess the phylogenetic relationships of previously unsampled taxa. Such approaches will be important in the future given the increasing availability of relevant mega-alignments and potential framework phylogenies.
Enterovirus (EV) 71 is one of the common causative agents of hand, foot, and mouth disease (HFMD). In recent years, the virus has caused several outbreaks with high numbers of deaths and severe neurological complications. Despite the importance of these epidemics, several aspects of the evolutionary and epidemiological dynamics, including viral nucleotide variations within and between different outbreaks, rates of change in immune-related structural regions vs. non-structural regions, and forces driving the evolution of EV71, are still not clear. We sequenced four genomic segments, i.e., the 5' untranslated region (UTR), VP1, 2A, and 3C, of 395 EV71 viral strains collected from 1998 to 2003 in Taiwan. The phylogenies derived from different genomic segments revealed different relationships, indicating frequent sequence recombinations as previously noted. In addition to simple recombinations, exchanges of the P1 domain between different species/genotypes of human enterovirus species (HEV)-A were repeatedly observed. Contrasting patterns of polymorphisms and divergences were found between structural (VP1) and non-structural segments (2A and 3C), i.e., the former was less polymorphic within an outbreak but more divergent between different HEV-A species than the latter two. Our computer simulation demonstrated a significant excess of amino acid replacements in the VP1 region, implying its possible role in adaptive evolution. Between different epidemic seasons, we observed high viral diversity in the epidemic peaks followed by severe reductions in diversity. Viruses sampled in successive epidemic seasons were not sister to each other, indicating that the annual outbreaks of EV71 were due to genetically distinct lineages. Based on observations of accelerated amino acid changes and frequent exchanges of the P1 domain, we propose that positive selection and subsequent frequent domain shuffling are two important mechanisms for generating new genotypes of HEV-A. 
Our viral dynamics analysis suggested that the importation of EV71 from surrounding areas likely contributes to local EV71 outbreaks.
Bayesian phylogeny of concatenated proteins from green algae and plants with maximum likelihood branch lengths. Numbers at nodes correspond to bootstrap support from protein maximum likelihood methods ProML (top) and PhyML (bottom). Major groups are labeled to the right. Note that the relationships between M. viride and streptophytes and between A. acetabulum and chlorophytes are both recovered and supported by bootstrap. The 'Trebouxiophyte' is a composite of sequences from Chlorella, Prototheca, and Helicosporidium (see Methods for which genes come from each taxon).
Bayesian phylogeny of EFL proteins with maximum likelihood branch lengths. Numbers at nodes correspond to bootstrap support from protein maximum likelihood methods ProML (left) and PhyML (right). Major groups are labeled to the right. Note that the relationships of many of the green algal groups are not well supported, and that they fall into three poorly separated groups.
Bayesian phylogeny of EF-1α proteins with maximum likelihood branch lengths. Numbers at nodes correspond to bootstrap support from protein maximum likelihood methods ProML (left) and PhyML (right). Major groups are labeled to the right. Note that A. acetabulum is sister to the streptophytes and that the charophyceans are sister to plants, in agreement with previous molecular phylogenies of the viridiplantae.
Schematic of evolutionary relationships within the green algal lineage showing the distribution of EFL and EF-1α and possible events explaining the distribution. A phylogeny of the Plantae, with the known presence of either EFL or EF-1α in extant lineages plotted at the tips of each branch (EFL is a square containing "L" and EF-1α is an octagon containing an "α"). All known rhodophytes and glaucophytes contain EF-1α, as do most other eukaryotic lineages, so the ancestor is inferred to have contained EF-1α as well (indicated by a gray octagon at the base of the tree). Also plotted is one possible explanation for the current distribution of the two genes in the green lineage, where EFL was gained once in the ancestor of the green viridiplantae (indicated by the green box), and subsequently either EFL or EF-1α was lost in several lineages each (indicated by red boxes showing which gene would have been lost). Other models that explain this distribution through multiple origins of EFL are also possible, but are not shown for simplicity. The tree of the green lineage is based on Figure 1 and other analyses [18].
EFL (or elongation factor-like) is a member of the translation superfamily of GTPase proteins. It is restricted to eukaryotes, where it is found in a punctate distribution that is almost mutually exclusive with elongation factor-1 alpha (EF-1alpha). EF-1alpha is a core translation factor previously thought to be essential in eukaryotes, so its relationship to EFL has prompted the suggestion that EFL has spread by horizontal or lateral gene transfer (HGT or LGT) and replaced EF-1alpha multiple times. Among green algae, trebouxiophyceans and chlorophyceans have EFL, but the ulvophycean Acetabularia and the sister group to green algae, land plants, have EF-1alpha. This distribution singles out green algae as a particularly promising group to understand the origin of EFL and the effects of its presence on EF-1alpha. We have sampled all major lineages of green algae for both EFL and EF-1alpha. EFL is unexpectedly broad in its distribution, being found in all green algal lineages (chlorophyceans, trebouxiophyceans, ulvophyceans, prasinophyceans, and mesostigmatophyceans), except charophyceans and the genus Acetabularia. The presence of EFL in the genus Mesostigma and EF-1alpha in Acetabularia are of particular interest, since the opposite is true of all their closest relatives. The phylogeny of EFL is poorly resolved, but the Acetabularia EF-1alpha is clearly related to homologues from land plants and charophyceans, demonstrating that EF-1alpha was present in the common ancestor of the green lineage. The distribution of EFL and EF-1alpha in the green lineage is not consistent with the phylogeny of the organisms, indicating a complex history of both genes. Overall, we suggest that after the introduction of EFL (in the ancestor of green algae or earlier), both genes co-existed in green algal genomes for some time before one or the other was lost on multiple occasions.
EF-1α and EFL homologues isolated/identified in this study
Relative copy numbers of EF-1α and EFL transcripts by qRT-PCR
EF-1α phylogeny. The unrooted maximum-likelihood tree was inferred from 79 EF-1α sequences with 400 unambiguously aligned amino acid positions. Bootstrap values less than 70% are not shown except at nodes that are relevant to EF-1α gene evolution in Fungi, diatoms, oomycetes, and Apusomonadida (nodes A to F). The nodes supported by Bayesian posterior probabilities ≥ 0.95 are highlighted by thick lines. Branches leading to the taxa containing both EFL and EF-1α genes are highlighted in red. The lineages comprising both EF-1α-containing and EFL-containing species are highlighted in magenta. The new sequences isolated/identified in this study are indicated by stars.
EFL phylogeny. The unrooted maximum-likelihood tree was inferred from 80 EFL sequences with 407 amino acid positions. Only bootstrap values ≥ 70% are shown. The nodes supported by Bayesian posterior probabilities ≥ 0.95 are highlighted by thick lines. All other details of the figure are as described in the legend to Figure 1.
Scheme for EF-1α/EFL evolution in eukaryotes. A differential loss process from the hypothetical dual-EF-containing ancestor (center; open) produced four descendant types (shaded): (i) an EFL-containing descendant (lower left), (ii) an EF-1α-containing descendant (upper right), (iii) a dual-EF-containing descendant with a transcriptionally suppressed EF-1α gene (lower right), and (iv) a dual-EF-containing descendant with a transcriptionally suppressed EFL gene (upper left). The EF-1α gene is blackened in the descendant shown in the lower right, as this gene is functionally reduced and transcriptionally suppressed; this descendant is likely analogous to the hypothetical intermediate that led to the EFL-containing type that lacks EF-1α. Likewise, the other type of dual-EF-containing descendant (upper left), if it exists, bears a remodeled EFL gene (blackened) and is analogous to the hypothetical intermediate that led to the EF-1α-containing descendants that lack EFL.
Elongation factor-1alpha (EF-1alpha) and elongation factor-like (EFL) proteins are functionally homologous to one another, and are core components of the eukaryotic translation machinery. The patchy distribution of the two elongation factor types across global eukaryotic phylogeny is suggestive of a 'differential loss' hypothesis that assumes that EF-1alpha and EFL were present in the most recent common ancestor of eukaryotes followed by independent differential losses of one of the two factors in the descendant lineages. To date, however, just one diatom and one fungus have been found to have both EF-1alpha and EFL (dual-EF-containing species). In this study, we characterized 35 new EF-1alpha/EFL sequences from phylogenetically diverse eukaryotes. In so doing we identified 11 previously unreported dual-EF-containing species from diverse eukaryote groups including the Stramenopiles, Apusomonadida, Goniomonadida, and Fungi. Phylogenetic analyses suggested vertical inheritance of both genes in each of the dual-EF lineages. In the dual-EF-containing species we identified, the EF-1alpha genes appeared to be highly divergent in sequence and suppressed at the transcriptional level compared to the co-occurring EFL genes. According to the known EF-1alpha/EFL distribution, the differential loss process should have occurred independently in diverse eukaryotic lineages, and more dual-EF-containing species remain unidentified. We predict that dual-EF-containing species retain the divergent EF-1alpha homologues only for a sub-set of the original functions. As the dual-EF-containing species are distantly related to each other, we propose that independent re-modelling of EF-1alpha function took place in multiple branches in the tree of eukaryotes.
Structure of pdtc.
Domain and homology map of MoeB/BR/Z sequences. A. Regions of NA homology between Ps-moeZ and other sequences in this study. Homologous regions between the members of the high-homology group and Ps-moeZ are depicted as solid lines. Regions of Ps-moeZ homology to other strains are shown with dotted lines. The horizontal scale is based on the NA alignments with 0 being the first base of Mle-moeZ. B. Moving average of the ratio of the synonymous to non-synonymous nucleotide substitutions (dS/dN) per codon found among the high homology group using an 18 bp window. Scale is based on the AA alignment with 0 being the first residue of Mle-moeZ. C. Structure of MoeB/BR/Z conceptual proteins. The bars labeled with protein designations indicate the length of each structural class and the inclusion or omission of domains and motifs. ThiF = ThiF family domain; 2X CXXC = MoeB C-terminal domain containing tandem cysteine pairs, MoeBR central domain containing tandem cysteine pairs, or MoeZ central domain with modified regions in place of cysteine pairs; RHOD = Rhodanese-like domain. Locations of CXXC motifs are indicated by vertical arrows. **** = dinucleotide binding motif. (2X CXXC) = modified 2X CXXC domain. Location of E. coli MoeB residue 155 is labeled with substitutions found in each structural class within each bar. PP = Polyproline motif of MoeBR and MoeZ proteins.
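Panel B averages per-codon dS/dN values over an 18 bp window, which corresponds to six codons. A minimal Python sketch, assuming per-codon ratios have already been computed elsewhere (undefined ratios passed as None and skipped) and using a simple forward window; how the window was positioned for the figure is not stated, so that choice and the function name are assumptions:

```python
from typing import List, Optional


def moving_average(values: List[Optional[float]], window_bp: int = 18) -> List[Optional[float]]:
    """Average per-codon values over a window given in base pairs.

    The window is converted to codons (18 bp -> 6 codons). None entries
    (e.g. codons where dS/dN is undefined) are skipped; a window with no
    defined value yields None."""
    if window_bp <= 0 or window_bp % 3:
        raise ValueError("window must be a positive whole number of codons")
    w = window_bp // 3
    averages: List[Optional[float]] = []
    for i in range(len(values) - w + 1):
        defined = [v for v in values[i:i + w] if v is not None]
        averages.append(sum(defined) / len(defined) if defined else None)
    return averages
```

For seven codons with values 1 to 7, the two six-codon windows give `[3.5, 4.5]`.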
Alignment of sequences included in this study. The entire lengths of the MoeB and MoeZdR proteins and all but the rhodanese domains of the MoeBR and MoeZ proteins are included. The presence of rhodanese-like domains (RHOD) is indicated by lines for the sequences that include them. Positions of CXXC motifs are marked with vertical arrows. Positions of Ec-MoeB active-site residues that participate in adenylation reactions are marked with filled circles. Position 155 of Ec-MoeB is highlighted with a star. The polyproline motif found in MoeBRs and MoeZs is boxed.
Maximum likelihood, neighbor joining trees of moeZ ThiF domain and associated SSU sequences used in this study. The incongruent placement of P. stutzeri between the trees is highlighted with a dotted line. Confidence estimates are placed near the branches that they apply to.
Pyridine-2,6-bis(thiocarboxylic acid) (pdtc) is a small secreted metabolite that has a high affinity for transition metals, increases iron uptake efficiency by 20% in Pseudomonas stutzeri, can reduce both soluble and mineral forms of iron, and has antimicrobial activity towards several species of bacteria. Six GenBank sequences code for proteins similar in structure to MoeZ, a P. stutzeri protein necessary for the synthesis of pdtc. Analysis of sequences similar to P. stutzeri MoeZ revealed that it is a member of a superfamily of related but structurally distinct proteins that participate in pathways transferring sulfur-containing moieties to metabolites. Members of this family of enzymes are referred to here as MoeB, MoeBR, MoeZ, and MoeZdR. MoeB, the molybdopterin synthase activating enzyme in the molybdopterin cofactor biosynthesis pathway, is the best characterized protein of this family. Remarkably, stretches of 35 to 486 bp sharing greater than 73% nucleic acid homology exist between Pseudomonas stutzeri moeZ and genomic sequences found in some Mycobacterium, Mesorhizobium, Pseudomonas, Streptomyces, and cyanobacteria species. The phylogenetic relationships among moeZ sequences suggest that P. stutzeri may have acquired moeZ through lateral gene transfer from a donor more closely related to mycobacteria and cyanobacteria than to proteobacteria. This relationship is important because pdtc, the product of the P. stutzeri pathway that includes moeZ, has an impressive set of capabilities, some of which could make it a potent pathogenicity factor.
Ecological interaction strength may increase under environmental stress, including temperature stress. How such stress enhances and interacts with parasite selection is almost unknown. We studied the importance of resistance genes of the major histocompatibility complex (MHC) class II in 14 families of three-spined sticklebacks Gasterosteus aculeatus exposed to their natural macroparasites in field enclosures during the extreme summer of 2003. After a mass die-off during the 2003 European heat wave, which killed 78% of 277 experimental fish, we found strong differences in survival among and within families. In families with higher average parasite load, fewer individuals survived. Multivariate analysis revealed that the composition of the infecting parasite fauna was family specific. Within families, individuals with an intermediate number of MHC class IIB sequence variants survived best and had the lowest parasite load among survivors, suggesting a direct functional link between MHC diversity and fitness. The within-family MHC effects were, however, small compared to between-family effects, suggesting that other genetic components or non-genetic effects were also important. The correlation between parasite load and mortality that we found at both the individual and family level might have appeared only during the extraordinary heat wave of 2003. Due to global warming, the frequency of extreme climatic events is predicted to increase, which might intensify the costs of parasitism and enhance selection on immune genes.
The emergence of the 2009 H1N1 influenza pandemic followed a multiple reassortment event from viruses originally circulating in swine and humans, but the adaptive nature of this emergence is poorly understood. Here we base our analysis on 1180 complete genomes of H1N1 viruses sampled in North America between 2000 and 2010 in swine and human hosts. We show that while transmission to a human host might require an adaptive phase in the HA and NA antigens, the emergence of the 2009 pandemic was essentially nonadaptive. A more detailed analysis of the NA protein shows that the 2009 pandemic sequence is characterized by novel epitopes and by a particular substitution in loop 150, which is responsible for a nonadaptive structural change tightly associated with the emergence of the pandemic. Because this substitution was not present in the 1918 H1N1 pandemic virus, we posit that the emergence of pandemics is due to epistatic interactions between sites distributed over different segments. Altogether, our results are consistent with population dynamics models that highlight the epistatic and nonadaptive rise of novel epitopes in viral populations, followed by their demise when the resulting virus is too virulent.
The availability of newly sequenced vertebrate genomes, along with more efficient and accurate alignment algorithms, has enabled the expansion of the field of comparative genomics. Large-scale genome rearrangement events modify the order of genes and non-coding conserved regions on chromosomes. While certain large genomic regions have remained intact over much of vertebrate evolution, others appear to be hotspots for genomic breakpoints. The cause of the non-uniformity of breakpoints that occurred during vertebrate evolution is poorly understood. We describe a machine learning method to distinguish genomic regions where breakpoints would be expected to have deleterious effects (called breakpoint-refractory regions) from those where they are expected to be neutral (called breakpoint-susceptible regions). Our predictor is trained using breakpoints that took place along the human lineage since amniote divergence. Based on our predictions, refractory and susceptible regions have very distinctive features. Refractory regions are significantly enriched for conserved non-coding elements as well as for genes involved in development, whereas susceptible regions are enriched for housekeeping genes, likely to have simpler transcriptional regulation. We postulate that long-range transcriptional regulation strongly influences chromosome break fixation. In many regions, the fitness cost of altering the spatial association between long-range regulatory regions and their target genes may be so high that rearrangements are not allowed. Consequently, only a limited, identifiable fraction of the genome is susceptible to genome rearrangements.
The quality of multiple sequence alignments plays an important role in the accuracy of phylogenetic inference. It has been shown that removing ambiguously aligned regions, but also other sources of bias such as highly variable (saturated) characters, can improve the overall performance of many phylogenetic reconstruction methods. A current scientific trend is to build phylogenetic trees from a large number of sequence datasets (semi-)automatically extracted from numerous complete genomes. Because these approaches do not allow precise manual curation of each dataset, there is a real need for efficient bioinformatic tools dedicated to this alignment character trimming step. Here we present new software, BMGE (Block Mapping and Gathering with Entropy), designed to select regions in a multiple sequence alignment that are suited for phylogenetic inference. For each character, BMGE computes a score closely related to an entropy value. Calculation of these entropy-like scores is weighted with BLOSUM or PAM similarity matrices in order to distinguish between biologically expected and unexpected variability for each aligned character. Sets of contiguous characters with a score above a given threshold are considered unsuitable for phylogenetic inference and are removed. Simulation analyses show that the character trimming performed by BMGE produces datasets leading to accurate trees, especially with alignments including distantly related sequences. BMGE also implements trimming and recoding methods aimed at minimizing phylogeny reconstruction artefacts due to compositional heterogeneity. BMGE is able to perform biologically relevant trimming on a multiple alignment of DNA, codon or amino acid sequences. Java source code and executable are freely available at
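The entropy-based trimming step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not BMGE's implementation: it scores each column with plain, unweighted Shannon entropy (BMGE additionally weights its entropy-like score with BLOSUM or PAM matrices), averages the scores over a sliding window, and drops columns whose smoothed score exceeds a threshold. The function names, the `window`, `threshold` and `max_gap` parameters and the extra gap-fraction filter are hypothetical choices for this example, and their default values are arbitrary.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(column: str, gap: str = "-") -> float:
    """Unweighted Shannon entropy (bits) of the non-gap residues in one column.

    Simplification: BMGE weights its score with BLOSUM/PAM matrices; this does not.
    """
    residues = [c for c in column.upper() if c != gap]
    if not residues:
        return 0.0  # all-gap columns are handled by the gap filter instead
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def trim_alignment(seqs: List[str], window: int = 3, threshold: float = 0.5,
                   max_gap: float = 0.2, gap: str = "-") -> List[str]:
    """Keep columns whose window-averaged entropy is <= threshold and whose
    gap fraction is <= max_gap (hypothetical parameters, illustrative defaults)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # One string per alignment column.
    columns = ["".join(s[i] for s in seqs) for i in range(length)]
    scores = [shannon_entropy(col, gap) for col in columns]

    half = window // 2
    keep = []
    for i, col in enumerate(columns):
        # Average the scores of the columns in a window centred on column i
        # (the window is truncated at the alignment ends).
        lo, hi = max(0, i - half), min(length, i + half + 1)
        smoothed = sum(scores[lo:hi]) / (hi - lo)
        gap_frac = col.count(gap) / len(col)
        if smoothed <= threshold and gap_frac <= max_gap:
            keep.append(i)

    # Rebuild each sequence from the retained columns only.
    return ["".join(s[i] for i in keep) for s in seqs]


if __name__ == "__main__":
    aln = ["MKVLAW", "MKVLCY", "MKVLDF", "MKIL-H"]
    print(trim_alignment(aln))
```

On these four short aligned sequences, the default call keeps only the first three columns (`MKV` for the first three sequences, `MKI` for the last). The mildly variable third column survives because its neighbours are conserved, while the fully conserved fourth column is dropped because it sits next to highly variable ones. This illustrates why window-based scoring tends to remove contiguous blocks of characters rather than isolated sites.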
Long alpha-helical coiled-coil proteins are involved in diverse organizational and regulatory processes in eukaryotic cells. They provide cables and networks in the cyto- and nucleoskeleton, molecular scaffolds that organize membrane systems and tissues, motors, levers, rotating arms, and possibly springs. Mutations in long coiled-coil proteins have been implicated in a growing number of human diseases. Using the coiled-coil prediction program MultiCoil, we have previously identified all long coiled-coil proteins from the model plant Arabidopsis thaliana and have established a searchable Arabidopsis coiled-coil protein database. Here, we have identified all proteins with long coiled-coil domains from 21 additional fully sequenced genomes. Because regions predicted to form coiled-coils interfere with sequence homology determination, we have developed a sequence comparison and clustering strategy based on masking predicted coiled-coil domains. By comparing and grouping all long coiled-coil proteins from 22 genomes, we determined the kingdom-specificity of coiled-coil protein families. At the same time, a number of proteins with unknown function could be grouped with already characterized proteins from other organisms. MultiCoil predicts proteins with extended coiled-coil domains (more than 250 amino acids) to be largely absent from bacterial genomes, but present in archaea and eukaryotes. The structural maintenance of chromosomes proteins and their relatives are the only long coiled-coil protein family clearly conserved throughout all kingdoms, indicating their ancient nature. Motor proteins, membrane tethering and vesicle transport proteins are the dominant eukaryote-specific long coiled-coil proteins, suggesting that coiled-coil proteins have gained functions in the increasingly complex processes of subcellular infrastructure maintenance and trafficking control of the eukaryotic cell.
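The masking idea described above can be illustrated with a small sketch. It assumes the coiled-coil predictor's output is already available as a list of half-open residue intervals `(start, end)`; that interval format, the function names and the masking character `X` are hypothetical choices for this example, not MultiCoil's actual output format or the authors' code. Masked residues no longer contribute to similarity scores, so a downstream homology search compares only the non-coiled-coil parts of each protein. A second helper applies the "more than 250 amino acids" criterion for an extended coiled-coil domain.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical format: half-open [start, end), 0-based


def mask_regions(seq: str, regions: List[Interval], mask_char: str = "X") -> str:
    """Replace every residue inside the predicted coiled-coil intervals with mask_char."""
    chars = list(seq)
    for start, end in regions:
        if not 0 <= start <= end <= len(seq):
            raise ValueError(f"interval ({start}, {end}) lies outside the sequence")
        chars[start:end] = mask_char * (end - start)
    return "".join(chars)


def has_long_coiled_coil(regions: List[Interval], min_length: int = 250) -> bool:
    """True if any predicted coiled-coil domain is longer than min_length residues."""
    return any(end - start > min_length for start, end in regions)


if __name__ == "__main__":
    print(mask_regions("ABCDEFGH", [(2, 5)]))
```

Here `mask_regions("ABCDEFGH", [(2, 5)])` masks residues 2 to 4 and returns `ABXXXFGH`, while `has_long_coiled_coil([(0, 250)])` is `False` because a domain of exactly 250 residues does not exceed the cut-off.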
Top-cited authors
Alexei J Drummond
  • University of Auckland
Eugene V Koonin
  • National Institutes of Health
Arndt von Haeseler
  • University of Vienna
S. Blair Hedges
  • Pennsylvania State University
Frank T Burbrink
  • American Museum of Natural History