ABSTRACT: The butterfly family Pieridae comprises approximately 1000 described species placed in 85 genera, but the higher classification has not yet been settled. We used molecular data from eight gene regions (one mitochondrial and seven nuclear protein-coding genes) comprising a total of ~6700 bp from 96 taxa to infer a well-supported phylogenetic hypothesis for the family. Based on this hypothesis, we revise the higher classification for all pierid genera. We resurrect the tribe Teracolini stat. rev. in the subfamily Pierinae to include the genera Teracolus, Pinacopteryx, Gideona, Ixias, Eronia, Colotis and most likely Calopieris. We transfer Hebomoia to the tribe Anthocharidini and assign the previously unplaced genera Belenois and Dixeia to the subtribe Aporiina. Three lineages near the base of Pierinae (Leptosia, Elodina and Nepheronia + Pareronia) remain unplaced. For each of these, we describe and delineate new tribes: Elodinini Braby tribus nova, Leptosiaini Braby tribus nova and Nepheroniini Braby tribus nova. The proposed higher classification is based on well-supported monophyletic groups and is likely to remain stable even with the addition of more data.
ABSTRACT: Phylogenomic advances provide more rigorous estimates for the timing of evolutionary divergences than previously available (e.g., Bayesian relaxed-clock estimates with soft fossil constraints). However, because many family-level clades and higher, as well as model species within those clades, have not been included in phylogenomic studies, the literature presents temporal estimates likely harboring substantial errors. Blindly using such dates can substantially retard scientific advancement. We suggest a way forward by conducting analyses that minimize prior assumptions and use large datasets, and demonstrate how using such a phylogenomic approach can lead to significantly more parsimonious conclusions without a good fossil record. We suggest that such an approach calls for research into the biological causes of conflict between molecular and fossil signatures.
Trends in Ecology & Evolution 08/2013; · 15.39 Impact Factor
ABSTRACT: Transcriptome studies of insect herbivory are still rare, yet studies in model systems have uncovered patterns of transcript regulation that appear to provide insights into how insect herbivores attain polyphagy, such as a general increase in expression breadth and regulation of ribosomal, digestion- and detoxification-related genes. We investigated the potential generality of these emerging patterns in the Swedish comma, Polygonia c-album, which is a polyphagous, widely distributed butterfly. Urtica dioica and Ribes uva-crispa are hosts of P. c-album, but Ribes represents a recent evolutionary shift onto a very divergent host. Utilizing the assembled transcriptome for read mapping, we assessed gene expression and found that caterpillar life-history (i.e. 2nd vs. 4th-instar regulation) had a limited influence on gene expression plasticity. In contrast, differential expression in response to host-plant identified genes encoding serine-type endopeptidases, membrane-associated proteins and transporters. Differential regulation of genes involved in nucleic acid binding was also observed, suggesting that polyphagy involves large-scale transcriptional changes. Additionally, transcripts coding for structural constituents of the cuticle were differentially expressed in caterpillars in response to their diet, indicating that the insect cuticle may be a target for plant defence. Our results suggest that emerging patterns of transcript regulation from model species appear relevant in species when placed in an evolutionary context.
ABSTRACT: Oxygen conductance to the tissues determines aerobic metabolic performance in most eukaryotes but has cost/benefit tradeoffs. Here we examine in lowland populations of a butterfly a genetic polymorphism affecting oxygen conductance via the hypoxia-inducible factor (HIF) pathway, which senses intracellular oxygen and controls the development of oxygen delivery networks. Genetically distinct clades of Glanville fritillary (Melitaea cinxia) across a continental scale maintain, at intermediate frequencies, alleles in a metabolic enzyme (succinate dehydrogenase, SDH) that regulates HIF-1α. One Sdhd allele was associated with reduced SDH activity rate, twofold greater cross-sectional area of tracheoles in flight muscle, and better flight performance. Butterflies with less tracheal development had greater post-flight hypoxia signaling, swollen and disrupted mitochondria, and accelerated aging of flight metabolic performance. Allelic associations with metabolic and aging phenotypes were replicated in samples from different clades. Experimentally elevated succinate in pupae increased the abundance of HIF-1α and expression of genes responsive to HIF activation, including tracheal morphogenesis genes. These results indicate that the hypoxia-inducible pathway, even in lowland populations, can be an important axis for genetic variation underlying intraspecific differences in oxygen delivery, physiological performance, and life history.
ABSTRACT: Exon structure is relatively well conserved among orthologs in several large clades of species (e.g. Mammalia, Diptera, Lepidoptera) across evolutionary distances of up to 80 million years. Thus, it should be straightforward to predict the exon structures in novel species based upon the known exon structures of species that have had their genomes sequenced and well assembled. Being able to predict the exon boundaries in the genes of novel species is important given the quickly growing numbers of transcriptome sequencing projects. CEPiNS is a new pipeline for mining exon boundaries of predicted gene sets from model species and then using this information to identify the exon boundaries in a novel species through codon based alignment. The pipeline uses the freeware SPIDEY, an exon boundary prediction tool, and BLAST (BLASTN, BLASTP, TBLASTX), both of which are part of NCBI's toolkit. CEPiNS provides an important tool to analyze the transcriptome of novel species.
ABSTRACT: The macroevolutionary history of the megadiverse insect order Lepidoptera remains little-known, yet coevolutionary dynamics with their angiospermous host plants are thought to have influenced their diversification significantly. We estimate the divergence times of all higher-level lineages of Lepidoptera, including most extant families. We find that the diversification of major lineages in Lepidoptera is approximately equal in age to the crown group of angiosperms and that there appear to have been three significant increases in diversification rates among Lepidoptera over evolutionary time: 1) at the origin of the crown group of Ditrysia about 150 million years ago (mya), 2) at the origin of the stem group of Apoditrysia about 120 mya and finally 3) a spectacular increase at the origin of the stem group of the quadrifid noctuoids about 70 mya. In addition, there appears to be a significant increase in diversification rate in multiple lineages around 90 mya, which is concordant with the radiation of angiosperms. Almost all extant families appear to have begun diversifying soon after the Cretaceous/Paleogene event 65.51 mya.
PLoS ONE 01/2013; 8(11):e80875. · 3.53 Impact Factor
ABSTRACT: BACKGROUND: Marine fish, such as the Atlantic herring (Clupea harengus), often show a low degree of differentiation over large geographical regions. Despite strong environmental gradients (salinity and temperature) in the Baltic Sea, population genetic studies have shown little genetic differentiation among herring in this area, but some evidence for environmentally-induced selection has been uncovered. The mitochondrial genome is a likely target for selection in this system due to its functional role in metabolism. RESULTS: We sequenced whole mitochondrial genomes for herring from throughout the Baltic region (n=98) in order to investigate evidence for geographical structuring, selection, and associations between genetic and environmental variation. Three well-supported clades that predate the formation of the Baltic Sea were identified, but geographic structuring of this variation was weak (PhiST = 0.036). There was evidence for significant positive selection, particularly in the ND2, ND4 and ND5 genes, and amino acids under significant selection in these genes explained some of the clade formation. Despite uncovering evidence for selection, correlations between genetic diversity or differentiation with environmental factors (temperature, salinity, latitude) were weak. CONCLUSIONS: The results indicate that most of the current mtDNA diversity in herring predates the formation of the Baltic Sea, and that little structuring has evolved since. Thus, fisheries management units in this region cannot be determined on the basis of mtDNA variability. Preliminary evidence for selection underlying clade formation indicates that the NADH complex may be useful for examining adaptation and population structuring at a broader geographical scale.
ABSTRACT: The timing of the origin of arthropods in relation to the Cambrian explosion is still controversial, as is the timing of other arthropod macroevolutionary events such as the colonization of land and the evolution of flight. Here we assess the power of a phylogenomic approach to shed light on these major events in the evolutionary history of life on Earth. Analyzing a large phylogenomic dataset (122 taxa, 62 genes) with a Bayesian relaxed molecular clock, we simultaneously reconstructed the phylogenetic relationships and the absolute times of divergences among the arthropods. Simulations were used to test whether our analysis could distinguish between alternative Cambrian explosion scenarios with increasing levels of autocorrelated rate variation. Our analyses support previous phylogenomic hypotheses and simulations indicate a Precambrian origin of the arthropods. Our results provide insights into the three independent colonizations of land by arthropods and suggest that evolution of insect wings happened much earlier than the fossil record indicates, with flight evolving during a period of increasing oxygen levels and impressively large forests. These and other findings provide a foundation for macroevolutionary and comparative genomic study of Arthropoda.
ABSTRACT: How well does RNA-Seq data perform for quantitative whole gene expression analysis in the absence of a genome? This is one unanswered question facing the rapidly growing number of researchers studying non-model species. Using Homo sapiens data and resources, we compared the direct mapping of sequencing reads to predicted genes from the genome with mapping to de novo transcriptomes assembled from RNA-Seq data. Gene coverage and expression analysis was further investigated in the non-model context by using increasingly divergent genomic reference species to group assembled contigs by unique genes.
Eight transcriptome sets, composed of varying amounts of Illumina and 454 data, were assembled and assessed. Hybrid 454/Illumina assemblies had the highest transcriptome and individual gene coverage. Quantitative whole gene expression levels were highly similar between using a de novo hybrid assembly and the predicted genes as a scaffold, although mapping to the de novo transcriptome assembly provided data on fewer genes. Using non-target species as reference scaffolds does result in some loss of sequence and expression data, and bias and error increase with evolutionary distance. However, within a 100 million year window these effect sizes are relatively small.
Predicted gene sets from sequenced genomes of related species can provide a powerful method for grouping RNA-Seq reads and annotating contigs. Gene expression results can be produced that are similar to results obtained using gene models derived from a high quality genome, though biased towards conserved genes. Our results demonstrate the power and limitations of conducting RNA-Seq in non-model species.
ABSTRACT: Little is known about variation in gene expression that affects life history traits in wild populations of outcrossing species. Here, we analyse heritability of larval development traits and associated variation in gene expression in the Glanville fritillary butterfly (Melitaea cinxia) across three ecologically relevant temperatures. We studied the development of final-instar larvae, which is greatly affected by temperature, and during which stage larvae build up most of the resources for adult life. Larval development time and weight gain varied significantly among families sampled from hundreds of local populations, indicating substantial heritable variation segregating in the large metapopulation. Global gene expression analysis using common garden-reared F2 families revealed that 42% of the >8000 genes surveyed exhibited significant variation among families, 39% of the genes showed significant variation between the temperature treatments, and 18% showed a significant genotype-by-environment interaction. Genes with large family and temperature effects included larval serum protein and cuticle-binding protein genes, and the expression of these genes was closely correlated with the rate of larval development. Significant expression variation in these same categories of genes has previously been reported among adult butterflies originating from newly established versus old local populations, supporting the notion of a life history syndrome put forward based on ecological studies and involving larval development and adult dispersal capacity. These findings suggest that metapopulation dynamics in heterogeneous environments maintain heritable gene expression variation that affects the regulation of life history traits.
ABSTRACT: Roche 454 sequencing of the transcriptome has become a standard approach for efficiently obtaining single nucleotide polymorphisms (SNPs) in non-model species. In this chapter, the primary issues facing the development of SNPs from the transcriptome in non-model species are presented: tissue and sampling choices, mRNA preparation, considerations of normalization, pooling and barcoding, how much to sequence, how to assemble the data and assess the assembly, calling transcriptome SNPs, developing these into genomic SNPs, and publishing the work. Discussion also covers the comparison of this approach to RAD-tag sequencing and the potential of using other sequencing platforms for SNP development.
ABSTRACT: Aim: Our study provides a description of the mitogenetic structure of alpine butterflies of the Parnassius phoebus complex throughout their Holarctic distribution. Our analyses extend and reassess population history models for alpine butterflies under an explicit calibration of their mitochondrial DNA (mtDNA) substitution rate. Location: Mountain ranges of the Holarctic region. Methods: A fragment (824 bp) of the mitochondrial cytochrome c oxidase subunit I (COI) gene was sequenced in 203 samples (72 locations), and combined with previously available COI sequences (499 samples), to obtain full coverage of the Holarctic distribution of the P. phoebus complex. A global species distribution model (SDM) was calculated by the maximum entropy (Maxent) approach, allowing assignment of samples into geographically consistent 'operational' units. Phylogenetic and coalescent methods were applied to describe the global mitogenetic structure and estimate population genetics parameters. Geological and palaeoecological evidence was used for internal calibration and validation of a COI substitution rate. Results: Eurasian (including Alaskan) and North American populations form two distinct mitochondrial clades. The mitochondrial time to most recent common ancestor (TMRCA) of the North American clade was estimated at less than 125 ka, and the TMRCA of the Eurasian–Alaskan clade at less than 80 ka, except for a single divergent sequence from Mongolia. Pairwise divergence times between all geographical units within each continent date well within the last 100 ka, and most likely, the last 50–10 ka. Main conclusions: In contrast with its currently scattered distribution within each of Eurasia and North America, the mitogenetic structure of the P. phoebus complex in both continents is shallow and weak, and shows no evidence of geographical structure dating back earlier than the last glacial cycle.
We argue that mtDNA data are consistent with recent (Würm/Wisconsin) range expansion across each of the two continents and with persistent glacial long-range gene flow which ceased during the Holocene. We propose that P. phoebus may represent a model for Holarctic alpine invertebrates with moderate dispersal abilities in that its genetic structure at a continental scale reflects extensive connectivity during the most recent glacial phases.
Journal of Biogeography 01/2012; · 4.86 Impact Factor
ABSTRACT: Until recently, read lengths on the Solexa/Illumina system were too short to reliably assemble transcriptomes without a reference sequence, especially for non-model organisms. However, with read lengths up to 100 nucleotides available in the current version, an assembly without reference genome should be possible. For this study we created an EST data set for the common pond snail Radix balthica by Illumina sequencing of a normalized transcriptome. Performance of three different short read assemblers was compared with respect to: the number of contigs, their length, depth of coverage, their quality in various BLAST searches and the alignment to mitochondrial genes.
A single sequencing run of a normalized RNA pool resulted in 16,923,850 paired end reads with median read length of 61 bases. The assemblies generated by VELVET, OASES, and SeqMan NGEN differed in the total number of contigs, contig length, the number and quality of gene hits obtained by BLAST searches against various databases, and contig performance in the mt genome comparison. While VELVET produced the highest overall number of contigs, a large fraction of these were of small size (< 200bp), and gave redundant hits in BLAST searches and the mt genome alignment. The best overall contig performance resulted from the NGEN assembly. It produced the second largest number of contigs, which on average were comparable to the OASES contigs but gave the highest number of gene hits in two out of four BLAST searches against different reference databases. A subsequent meta-assembly of the four contig sets resulted in larger contigs, less redundancy and a higher number of BLAST hits.
Our results document the first de novo transcriptome assembly of a non-model species using Illumina sequencing data. We show that de novo transcriptome assembly using this approach yields results useful for downstream applications, in particular if a meta-assembly of contig sets is used to increase contig quality. These results highlight the ongoing need for improvements in assembly methodology.
ABSTRACT: In fragmented landscapes, small populations frequently go extinct and new ones are established with poorly understood consequences for genetic diversity and evolution of life history traits. Here, we apply functional genomic tools to an ecological model system, the well-studied metapopulation of the Glanville fritillary butterfly. We investigate how dispersal and colonization select upon existing genetic variation affecting life history traits by comparing common-garden reared 2-day adult females from new populations with those from established older populations. New-population females had higher expression of abdomen genes involved in egg provisioning and thorax genes involved in the maintenance of flight muscle proteins. Physiological studies confirmed that new-population butterflies have accelerated egg maturation, apparently regulated by higher juvenile hormone titer and angiotensin converting enzyme mRNA, as well as enhanced flight metabolism. Gene expression varied between allelic forms of two metabolic genes (Pgi and Sdhd), which themselves were associated with differences in flight metabolic rate, population age and population growth rate. These results identify likely molecular mechanisms underpinning life history variation that is maintained by extinction-colonization dynamics in metapopulations.
ABSTRACT: Transcriptome sequencing provides quick, direct access to the mRNA. With this information, one can design primers for PCR of thousands of different genes, SNP markers, probes for microarrays and qPCR, or just use the sequence data itself in comparative studies. Transcriptome sequencing, while getting cheaper, is still an expensive endeavor, with an examination of data quality and its assembly infrequently performed in depth. Here, we outline many of the important issues we think need consideration when starting a transcriptome sequencing project. We also walk the reader through a detailed analysis of an example transcriptome dataset, highlighting the importance of both within-dataset analysis and comparative inferences. Our hope is that with greater attention focused upon assessing assembly performance, advances in transcriptome assembly will increase as prices continue to drop and new technologies, such as Illumina sequencing, start to be used.
ABSTRACT: As advances in next generation sequencing continue to provide increasing access to the genomics revolution for research systems having few or no genomic resources, transcriptome sequencing will only increase in importance as a fast and direct means of accessing the genes themselves. However, constructing a comprehensive cDNA library for deep sequencing is very difficult, as highly abundant transcripts hamper de novo identification of low-expressed genes, and genes expressed only under very specific conditions will remain elusive. The reduction of variance in gene expression levels to within a tenfold range of differences by cDNA normalization provides an important means of allocating sequencing across a greater fraction of genes, directly translating into a more even coverage across genes. Here, we outline two different normalization methods, addressing many of the important issues we think need consideration when going from RNA isolation to the cDNA material required for sequencing. This will provide coding gene information across thousands of genes from any organism, providing rapid insights into topics such as gene family member identification and genetic variation that may be associated with a studied phenotype.