ABSTRACT: Acorn worms, also known as enteropneust (literally, 'gut-breathing') hemichordates, are marine invertebrates that share features with echinoderms and chordates. Together, these three phyla comprise the deuterostomes. Here we report the draft genome sequences of two acorn worms, Saccoglossus kowalevskii and Ptychodera flava. By comparing them with diverse bilaterian genomes, we identify shared traits that were probably inherited from the last common deuterostome ancestor, and then explore evolutionary trajectories leading from this ancestor to hemichordates, echinoderms and chordates. The hemichordate genomes exhibit extensive conserved synteny with amphioxus and other bilaterians, and deeply conserved non-coding sequences that are candidates for conserved gene-regulatory elements. Notably, hemichordates possess a deuterostome-specific genomic cluster of four ordered transcription factor genes, the expression of which is associated with the development of pharyngeal 'gill' slits, the foremost morphological innovation of early deuterostomes, and is probably central to their filter-feeding lifestyle. Comparative analysis reveals numerous deuterostome-specific gene novelties, including genes found in deuterostomes and marine microbes, but not other animals. The putative functions of these genes can be linked to physiological, metabolic and developmental specializations of the filter-feeding ancestor.
ABSTRACT: Since its release, the high-quality peach genome sequence has fostered studies on genetic diversity, domestication and crop improvement in Prunus and related species, serving as a central resource for comparative genomics. To improve its chromosome-scale assembly and genome annotation, we performed further analyses. Extensive mapping data allowed the improvement of the Peach v2.0 assembly in terms of the fraction of mapped (99.2%) and oriented (97.9%) sequences and the correction of misassembly issues (about 12.2 Mb of wrongly located sequences). Accuracy and contiguity of Peach v2.0 were improved as well: 859 SNPs and 1,347 indels were corrected and 212 gaps were closed. As a result, the contiguity of Peach v2.0 improved by 19.5%, with a contig L50 of 255.4 kb (previously 214.2 kb) and a contig N50 of 250 (previously 294). The repeat annotation was also improved, including low-copy repeats and the complete sequence and location of 1,157 non-autonomous Helitrons. Gene prediction and annotation were improved using transcript assemblies obtained from 2.2 billion peach RNA-seq reads from different tissues and organs. In total, 28,804 protein-coding genes were predicted in the Peach v2.1 annotation, 940 more than those predicted in Peach v1.0. An increased number of transcripts (49,110 vs 28,689) was observed in Peach v2.1, with an average of 1.7 transcripts per locus. The new peach release, with improved assembly and annotation, will be a pivotal resource for comparative genomics in the plant kingdom and will boost studies bridging the gap between genomics and breeding in Prunus and related species.
International Plant and Animal Genome Conference XXIII 2015; 11/2015
ABSTRACT: Coleoid cephalopods (octopus, squid and cuttlefish) are active, resourceful predators with a rich behavioural repertoire. They have the largest nervous systems among the invertebrates and present other striking morphological innovations including camera-like eyes, prehensile arms, a highly derived early embryogenesis and a remarkably sophisticated adaptive colouration system. To investigate the molecular bases of cephalopod brain and body innovations, we sequenced the genome and multiple transcriptomes of the California two-spot octopus, Octopus bimaculoides. We found no evidence for hypothesized whole-genome duplications in the octopus lineage. The core developmental and neuronal gene repertoire of the octopus is broadly similar to that found across invertebrate bilaterians, except for massive expansions in two gene families previously thought to be uniquely enlarged in vertebrates: the protocadherins, which regulate neuronal development, and the C2H2 superfamily of zinc-finger transcription factors. Extensive messenger RNA editing generates transcript and protein diversity in genes involved in neural excitability, as previously described, as well as in genes participating in a broad range of other cellular functions. We identified hundreds of cephalopod-specific genes, many of which showed elevated expression levels in such specialized structures as the skin, the suckers and the nervous system. Finally, we found evidence for large-scale genomic rearrangements that are closely associated with transposable element expansions. Our analysis suggests that substantial expansion of a handful of gene families, along with extensive remodelling of genome linkage and repetitive content, played a critical role in the evolution of cephalopod morphological innovations, including their large and complex nervous systems.
ABSTRACT: Our understanding of vertebrate origins is powerfully informed by comparative morphology, embryology and genomics of chordates, hemichordates and echinoderms, which together make up the deuterostome clade. Striking body-plan differences among these phyla have historically hindered the identification of ancestral morphological features, but recent progress in molecular genetics and embryology has revealed deep similarities in body-axis formation and organization across deuterostomes, at stages before morphological differences develop. These developmental genetic features, along with robust support of pharyngeal gill slits as a shared deuterostome character, provide the foundation for the emergence of chordates.
ABSTRACT: Long-range and highly accurate de novo assembly from short-read data is one
of the most pressing challenges in genomics. Recently, it has been shown that
read pairs generated by proximity ligation of DNA in chromatin of living tissue
can address this problem. These data dramatically increase the scaffold
contiguity of assemblies and provide haplotype phasing information. Here, we
describe a simpler approach ("Chicago") based on in vitro reconstituted
chromatin. We generated two Chicago datasets with human DNA and used a new
software pipeline ("HiRise") to construct a highly accurate de novo assembly
and scaffolding of a human genome with scaffold N50 of 30 Mb. We also
demonstrated the utility of Chicago for improving existing assemblies by
re-assembling and scaffolding the genome of the American alligator. With a
single library and one lane of Illumina HiSeq sequencing, we increased the
scaffold N50 of the American alligator from 508 kb to 10 Mb. Our method uses
established molecular biology procedures and can be used to analyze any genome,
as it requires only about 5 micrograms of DNA as the starting material.
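The scaffold N50 figures quoted in this abstract (30 Mb for the human assembly; 508 kb rising to 10 Mb for the alligator) use the common definition of N50: the length L such that sequences of length at least L together contain at least half of all assembled bases. A minimal Python sketch of that calculation follows. The function name `n50` and the scaffold lengths are hypothetical and chosen only for illustration; they are not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical scaffold lengths (in kb), for illustration only.
before = [100, 80, 60, 40, 20]
after = [180, 100, 20]  # same total bases, fewer and longer scaffolds
print(n50(before), n50(after))
```

Joining scaffolds leaves the total assembly size unchanged but raises N50, which is why the statistic is used to summarise scaffolding improvements.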
ABSTRACT: Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gbp of this assembly to chromosomal locations. The genome representation and accuracy of our assembly are comparable to, or even exceed, those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short-read sequencing technology and is applicable to any species where it is possible to construct a mapping population.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-015-0582-8) contains supplementary material, which is available to authorized users.
ABSTRACT: De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous fragments called reads. We study optimized parallelization of the most time-consuming phases of Meraculous, a state-of-the-art production assembler. First, we present a new parallel algorithm for k-mer analysis, characterized by intensive communication and I/O requirements, and reduce the memory requirements by 6.93×. Second, we efficiently parallelize de Bruijn graph construction and traversal, which necessitates a distributed hash table and is a key component of most de novo assemblers. We provide a novel algorithm that leverages one-sided communication capabilities of Unified Parallel C (UPC) to facilitate the requisite fine-grained parallelism and avoidance of data hazards, while analytically proving its scalability properties. Overall results show unprecedented performance and efficient scaling on up to 15,360 cores of a Cray XC30, on the human genome as well as the challenging wheat genome, with performance improvement from days to seconds.
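For readers unfamiliar with the two phases this abstract names, the following toy, single-process Python sketch shows k-mer counting followed by de Bruijn graph construction. It is not the Meraculous or UPC implementation: the function names, the `min_count` threshold, and the reads are illustrative assumptions, and an ordinary in-memory dict stands in for the distributed hash table.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn(kmer_counts, min_count=2):
    """Build a de Bruijn graph from k-mers seen at least min_count times.

    Nodes are (k-1)-mers; each retained k-mer adds one edge from its
    prefix to its suffix. Rare k-mers are dropped as likely errors
    (the threshold is an assumption made for this sketch).
    """
    graph = defaultdict(set)
    for kmer, count in kmer_counts.items():
        if count >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Hypothetical reads, for illustration only.
reads = ["ACGTAC", "CGTACG", "ACGTTC"]
counts = count_kmers(reads, k=4)
graph = build_de_bruijn(counts, min_count=2)
print(graph)
```

In a distributed setting the dictionary lookups and updates become remote hash-table operations, which is where the fine-grained communication described in the abstract arises.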
ABSTRACT: Assembly and functional annotation of the Salix purpurea L. chloroplast genome provide a valuable resource for phylogenetics, genomics, and shrub willow feedstock breeding efforts. A subset of paired-end NGS reads was filtered and batch assembled into contigs de novo using Velvet and ABySS (mer=19..51,2; cov=5x), mapped to related chloroplast genomes with Bowtie2, extracted and iteratively re-assembled. Consensus sequences were aligned by MAFFT, ambiguities resolved using GapCloser, and then manually annotated using DOGMA. Transcriptomes were assembled de novo and mapped to consensus gene models (id=95, len_cov=80) and verified by a BLASTN/X query. Transcript copy number variation was analyzed in edgeR and summarized using hierarchical clustering to define modules of co-expression within an intraspecific S. purpurea F1 family and between five tissue types. The quadripartite structure is an estimated 156,405 bp in length, comprising an 84,977 bp LSC and a 16,582 bp SSC, separated by a pair of 27,423 bp inverted repeats, containing 90 mRNA (16 introns), 30 tRNA (37 tRNA-Aragorn), and 8 rRNA. Of the RNA-editing sites (qual=99) within 20 CDS, 58% reside in five NADH subunits. The number of editing sites positively correlates with transcript abundance; thus, transcripts are preferentially edited across tissue types. The most significant differentially expressed transcripts (p < 10E-5; logFC > 5) between parent (P) and hybrid (H) cytotypes were PS II (H>P), Cytb6f (P>H), and NADH (H>P) subunits psbI, petN, and ndhK, respectively. Comparative analyses of organellar paralogs in the Salicaceae will offer insight into the contrasting patterns of compensatory post-transcriptional editing and a valuable genomic tool for divergence estimates.
Plant and Animal Genome Conference, San Diego, CA; 01/2015
ABSTRACT: Traditional metazoan phylogeny classifies the Vertebrata as a subphylum of the phylum Chordata, together with two other subphyla, the Urochordata (Tunicata) and the Cephalochordata. The Chordata, together with the phyla Echinodermata and Hemichordata, comprise a major group, the Deuterostomia. Chordates invariably possess a notochord and a dorsal neural tube. Although the origin and evolution of chordates has been studied for more than a century, few authors have intimately discussed taxonomic ranking of the three chordate groups themselves. Accumulating evidence shows that echinoderms and hemichordates form a clade (the Ambulacraria), and that within the Chordata, cephalochordates diverged first, with tunicates and vertebrates forming a sister group. Chordates share tadpole-type larvae containing a notochord and hollow nerve cord, whereas ambulacrarians have dipleurula-type larvae containing a hydrocoel. We propose that an evolutionary occurrence of tadpole-type larvae is fundamental to understanding mechanisms of chordate origin. Protostomes have now been reclassified into two major taxa, the Ecdysozoa and Lophotrochozoa, whose developmental pathways are characterized by ecdysis and trochophore larvae, respectively. Consistent with this classification, the profound dipleurula versus tadpole larval differences merit a category higher than the phylum. Thus, it is recommended that the Ecdysozoa, Lophotrochozoa, Ambulacraria and Chordata be classified at the superphylum level, with the Chordata further subdivided into three phyla, on the basis of their distinctive characteristics.
Proceedings of the Royal Society B: Biological Sciences 11/2014; 281(1794). DOI:10.1098/rspb.2014.1729 · 5.05 Impact Factor
ABSTRACT: The process of plant speciation often involves the evolution of divergent ecotypes in response to differences in soil water availability between habitats. While the same set of traits is frequently associated with xeric/mesic ecotype divergence, it is unknown whether those traits evolve independently or if they evolve in tandem as a result of genetic colocalization either by pleiotropy or genetic linkage.
The self-fertilizing C4 grass species Panicum hallii includes two major ecotypes found in xeric (var. hallii) or mesic (var. filipes) habitats. We constructed the first linkage map for P. hallii by genotyping a reduced representation genomic library of an F2 population derived from an intercross of var. hallii and filipes. We then evaluated the genetic architecture of divergence between these ecotypes through quantitative trait locus (QTL) mapping.
Overall, we mapped QTLs for nine morphological traits that are involved in the divergence between the ecotypes. QTLs for five key ecotype-differentiating traits all colocalized to the same region of linkage group five. Leaf physiological traits were less divergent between ecotypes, but we still mapped five physiological QTLs. We also discovered a two-locus Dobzhansky–Muller hybrid incompatibility.
Our study suggests that ecotype-differentiating traits may evolve in tandem as a result of genetic colocalization.
New Phytologist 09/2014; 205(1). DOI:10.1111/nph.13027 · 7.67 Impact Factor
ABSTRACT: An ordered draft sequence of the 17-gigabase hexaploid bread wheat (Triticum aestivum) genome has been produced by sequencing isolated chromosome arms. We have annotated 124,201 gene loci distributed nearly
evenly across the homeologous chromosomes and subgenomes. Comparative gene analysis of wheat subgenomes and extant diploid
and tetraploid wheat relatives showed that high sequence similarity and structural conservation are retained, with limited
gene loss, after polyploidization. However, across the genomes there was evidence of dynamic gene gain, loss, and duplication
since the divergence of the wheat lineages. A high degree of transcriptional autonomy and no global dominance was found for
the subgenomes. These insights into the genome biology of a polyploid crop provide a springboard for faster gene isolation,
rapid genetic marker development, and precise breeding to meet the needs of increasing food demand worldwide.
ABSTRACT: The allohexaploid bread wheat genome consists of three closely related subgenomes
(A, B, and D), but a clear understanding of their phylogenetic history has been lacking.
We used genome assemblies of bread wheat and five diploid relatives to analyze
genome-wide samples of gene trees, as well as to estimate evolutionary relatedness
and divergence times. We show that the A and B genomes diverged from a common
ancestor ~7 million years ago and that these genomes gave rise to the D genome through
homoploid hybrid speciation 1 to 2 million years later. Our findings imply that the
present-day bread wheat genome is a product of multiple rounds of hybrid speciation
(homoploid and polyploid) and lay the foundation for a new framework for understanding
the wheat genome as a multilevel phylogenetic mosaic.
ABSTRACT: Eucalypts are the world’s most widely planted hardwood trees. Their outstanding diversity, adaptability and growth have made them a global renewable resource of fibre and energy. We sequenced and assembled >94% of the 640-megabase genome of Eucalyptus grandis. Of 36,376 predicted protein-coding genes, 34% occur in tandem duplications, the largest proportion thus far in plant genomes. Eucalyptus also shows the highest diversity of genes for specialized metabolites such as terpenes that act as chemical defence and provide unique pharmaceutical oils. Genome sequencing of the E. grandis sister species E. globulus and a set of inbred E. grandis tree genomes reveals dynamic genome evolution and hotspots of inbreeding depression. The E. grandis genome is the first reference for the eudicot order Myrtales and is placed here sister to the eurosids. This resource expands our understanding of the unique biology of large woody perennials and provides a powerful tool to accelerate comparative biology, breeding and biotechnology.