Article (Literature Review)

Quantification of insect genome divergence

Abstract

The recent sequencing of twelve insect genomes has enabled us to quantify their divergence using synteny conservation and sequence identity of single-copy orthologs. Protein identity correlates well with synteny and is about three times more conserved, an observation consistent with comparisons among vertebrates. The observed distribution of the lengths of synteny blocks follows a power law and differs from the expectations of the currently accepted random breakage model. Our results show that there is only limited selection for conservation of gene order and reveal a few hundred genes, proximity among which seems to be vital.
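The contrast drawn in the abstract between a power-law distribution of synteny block lengths and the random breakage model can be illustrated with a small likelihood comparison. The sketch below is not the authors' procedure, which the abstract does not describe. It assumes that random breakage implies roughly exponentially distributed block lengths, treats block lengths as continuous values, and uses hypothetical function names (`fit_power_law`, `fit_exponential`, `sample_power_law`).

```python
import math
import random


def fit_power_law(lengths, xmin=1.0):
    """Maximum-likelihood fit of a continuous power law p(x) ~ x^(-alpha), x >= xmin.

    Returns (alpha, log_likelihood). Continuous approximation: an assumption.
    """
    xs = [x for x in lengths if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in xs)
    if not xs or log_sum == 0.0:
        raise ValueError("need block lengths that vary above xmin")
    n = len(xs)
    alpha = 1.0 + n / log_sum
    loglik = n * math.log((alpha - 1.0) / xmin) - alpha * log_sum
    return alpha, loglik


def fit_exponential(lengths, xmin=1.0):
    """Maximum-likelihood fit of a shifted exponential p(x) ~ exp(-rate * (x - xmin)).

    Used here as a stand-in for the random breakage expectation (an assumption).
    Returns (rate, log_likelihood).
    """
    xs = [x for x in lengths if x >= xmin]
    excess = sum(x - xmin for x in xs)
    if not xs or excess == 0.0:
        raise ValueError("need block lengths that vary above xmin")
    n = len(xs)
    rate = n / excess
    loglik = n * math.log(rate) - rate * excess
    return rate, loglik


def sample_power_law(n, alpha, xmin=1.0, seed=0):
    """Draw n synthetic block lengths from a power law by inverse-transform sampling."""
    rng = random.Random(seed)
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]
```

On real data, the two log-likelihoods would be compared over the same set of block lengths; a higher value for the power-law fit would be in line with the abstract's observation, although a formal test would need more care than this sketch provides.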

... Yet, previous studies have found extensive conservation of gene order across insects (e.g., Engström et al. 2007). Using protein divergence as a proxy for time, a linear decay of micro-synteny over time has been found in insect genomes (Zdobnov & Bork 2007). ...
... In fact, we did not observe an increase of genome shuffling in eusocial Apocrita. However, contrary to what was previously reported by Zdobnov and colleagues (Zdobnov & Bork 2007), we found a decrease in the rate of synteny loss across divergence times that span more than 240 million years (Fig. 1F). This retention of micro-synteny over large evolutionary distances points to the presence of functional constraints on the preservation of local genomic structures or low rates of nonhomologous recombination and rearrangement. ...
... To investigate Hymenoptera genome evolution on a micro-syntenic level, we utilized the identified single-copy orthologs (SCOs) and the recently published Hymenoptera divergence estimates (Peters et al. 2017). SCOs represent conserved genes that likely evolve under similar constraints (e.g., Ciccarelli et al. 2005) and have consequently been exploited as markers to quantify genome shuffling in insects (e.g., Zdobnov & Bork 2007). Using a custom Perl script (included as File S39), the conservation of micro-synteny was inferred as the fraction of shared SCOs that retain the same neighboring SCO between two species relative to their divergence time (SI II.4.5). ...
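The micro-synteny measure described in this excerpt (the fraction of shared SCOs that retain the same neighboring SCO in two species) can be sketched as follows. This is not the Perl script referenced as File S39, which is not reproduced here. It is a minimal Python illustration under assumed definitions: each genome is a list of scaffolds, each scaffold is an ordered list of SCO identifiers, neighbors are defined after restricting to shared SCOs, and gene orientation is ignored. The function names are hypothetical.

```python
def neighbor_pairs(gene_order):
    """Return the set of unordered adjacent SCO pairs across all scaffolds."""
    pairs = set()
    for scaffold in gene_order:
        for left, right in zip(scaffold, scaffold[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


def microsynteny_fraction(order_a, order_b):
    """Fraction of shared SCOs that keep at least one identical neighbor in both species.

    order_a, order_b: lists of scaffolds, each an ordered list of SCO identifiers.
    """
    shared = {g for s in order_a for g in s} & {g for s in order_b for g in s}
    if not shared:
        return 0.0
    # Restrict each scaffold to shared SCOs so that adjacency is defined among markers.
    filt_a = [[g for g in s if g in shared] for s in order_a]
    filt_b = [[g for g in s if g in shared] for s in order_b]
    conserved = neighbor_pairs(filt_a) & neighbor_pairs(filt_b)
    retained = {g for pair in conserved for g in pair}
    return len(retained) / len(shared)
```

Computing this fraction for many species pairs and plotting it against divergence time (or protein divergence as a proxy) would give the kind of decay curve the excerpt discusses.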
Article
Full-text available
The tremendous diversity of Hymenoptera is commonly attributed to the evolution of parasitoidism in the last common ancestor of parasitoid sawflies (Orussidae) and wasp-waisted Hymenoptera (Apocrita). However, Apocrita and Orussidae differ dramatically in their species richness, indicating that the diversification of Apocrita was promoted by additional traits. These traits have remained elusive due to a paucity of sawfly genome sequences, in particular those of parasitoid sawflies. Here we present comparative analyses of draft genomes of the primarily phytophagous sawfly Athalia rosae and the parasitoid sawfly Orussus abietinus. Our analyses revealed that the ancestral hymenopteran genome exhibited traits that were previously considered unique to eusocial Apocrita (e.g., low transposable element content and activity) and a wider gene repertoire than previously thought (e.g., genes for CO2 detection). Moreover, we discovered that Apocrita evolved a significantly larger array of odorant receptors than sawflies, which could be relevant to the remarkable diversification of Apocrita by enabling efficient detection and reliable identification of hosts.
... It has been shown that applying spore formulations of the plant-beneficial bacterium Bacillus amyloliquefaciens does not affect the composition of rhizosphere microbial community (Chowdhury et al., 2015a). An increasing number of farmers are recognizing the need for other avenues for pest control that are not as damaging to the environment and the land. According to a comprehensive study of BCC Research, global markets for biopesticides will grow from USD54.8 billion in 2013 to USD83.7 billion to 2019 (www.bccresearch.com/market-research/chemicals/biopesticides-chm029e.html). ...
... In addition, an incomplete gene cluster directing immunity against the type B lantibiotic mersacidin was detected (Table 1). In this review we will describe several possibilities offered today by in vitro techniques for enhancing the beneficial action of bioformulations based on B. amyloliquefaciens FZB42, and its close relatives SQR9 and NJN6, isolated by the laboratory of Qirong Chen, Nanjing Agriculture University. ...
... In contrast, a double mutant impaired in non-ribosomal synthesis and bacilysin (RS06 ∆sfp ∆bac) was unable to suppress E. amylovora, indicating that the additional inhibitory effect is due to production of bacilysin (Chen et al., 2009b). A similar study using appropriate mutant strains of FZB42 was performed recently, demonstrating that difficidin and bacilysin are also efficient against two different Xanthomonas oryzae pathovars, causative agents of damaging rice diseases (bacterial blight and bacterial leaf streak). Agar diffusion tests performed with several FZB42 mutant strains (Figure 3) revealed that the inhibitory effect of mutant CH8 (∆dfn) deficient in production of difficidin was clearly reduced compared to wild type FZB42. ...
Article
Full-text available
Biocontrol (BC) formulations prepared from plant-growth-promoting bacteria are increasingly applied in sustainable agriculture. In particular, inoculants prepared from endospore-forming Bacillus strains have proven to be an efficient and environmentally friendly alternative to chemical pesticides due to their long shelf life, which is comparable to that of agrochemicals. However, these first-generation formulations are sometimes hampered in their action and do not always fulfill the expectations of those who apply them. In this review we use the well-known plant-associated Bacillus amyloliquefaciens type strain FZB42 as an example of the successful application of different techniques offered today by comparative, evolutionary and functional genomics, site-directed mutagenesis and strain construction including marker removal, paving the way for a novel generation of BC agents.
... The tree is based on core genomes in order to ensure better statistical precision, as 16S-based approaches can give false results [26]. Multiple copies of ribosomal rRNA genes as well as intragenomic variability can also lead to problems [27]; thus, using the core genomes whenever possible for detailed phylogenetic analyses and maximized sequence support is advantageous [28]. ...
... The calculated tree is based on core genomes, that is, the set of orthologous genes found in all genomes instead of the 16S rRNA genes, as in the traditional approach, as the former provides superior results by combining multiple genes [28,30]. ...
Article
Full-text available
Modern biotechnology benefits from the introduction of novel chassis organisms in remedying the limitations of already-established strains. For this, Paracoccus pantotrophus was chosen for in-depth assessment. Its unique broad metabolism and robustness against abiotic stressors make this strain a well-suited chassis candidate. This study set out to comprehensively overview abiotic influences on the growth performance of five P. pantotrophus strains. These data can aid in assessing the suitability of this genus for chassis development by using the type strain as a preliminary model organism. The five P. pantotrophus strains DSM 2944T, DSM 11072, DSM 11073, DSM 11104, and DSM 65 were investigated regarding their growth on various carbon sources and other nutrients. Our data show a high tolerance against osmotic pressure for the type strain with both salts and organic osmolytes. It was further observed that P. pantotrophus prefers organic acids over sugars. All of the tested strains were able to grow on short-chain alkanes, which would make P. pantotrophus a candidate for bioremediation and the upcycling of plastics. In conclusion, we were able to gain insights into several P. pantotrophus strains, which will aid in further introducing this species, or even another species from this genus, as a candidate for future biotechnological processes.
... Beetles and flies share a last common ancestor about 274-285 million years ago (Savard et al. 2006b, Zdobnov and Bork 2007). The vast knowledge of the dorsoventral GRN in D. melanogaster (described below), enables a direct comparison of these two genetic networks. ...
... In this "traditional" version of insect phylogeny, beetles (Coleoptera) including Tribolium, would occupy a basal position in relation to Hymenoptera and Diptera. Recently, this traditional phylogenetic tree has been questioned by a phylogenomic approach (Figure 2B, Savard et al. 2006b; Zdobnov and Bork 2007). According to this new tree Hymenoptera would be basal to Coleoptera and Diptera. ...
Thesis
Dorsoventral (DV) patterning in Drosophila melanogaster is one of the most well-known gene regulatory networks (GRN) in biology. To investigate if this GRN is conserved during insect evolution, functional analysis of TGF-β and Toll pathways in the short-germ beetle Tribolium castaneum was performed. In the first part, the function of several BMP/Dpp extracellular modulators, including the products of Tolloid (Tld) and Twisted-gastrulation/Crossveinless (Tsg-Cv), was investigated in Tribolium via parental RNAi (pRNAi). While Tc-tld pRNAi knock-down decreases embryonic BMP activity, Tc-tsg(cv) knock-down completely abolishes it. These observations are strikingly different from those in Drosophila, where tsg is only required for a subset of Dpp activity. These results suggest that Tsg/Cv-like proteins are essential for BMP signalling in Tribolium. Since duplicated copies of tsg(cv)- and tld-related genes are present in the Drosophila melanogaster genome, duplication followed by sub-functionalization of these modulators might have changed the BMP/Dpp gradient during evolution of the dipteran Drosophila lineage. In the second part, a functional analysis of the Toll pathway was performed. This analysis addressed the question of why the Tribolium Dorsal nuclear gradient is not stable, but rapidly shrinks and disappears, in contrast to the stable Drosophila gradient. Negative feedback accounts for this dynamic behavior: Tc-Dorsal and one of its target genes (Tc-Twist) activate transcription of the I-κB homolog Tc-cactus, which in turn terminates Dorsal function. Despite its transient role, Tc-Dorsal is strictly required to initiate DV polarity, as in Drosophila. However, unlike Drosophila, embryos lacking Tc-Dorsal display a periodic pattern of DV cell fates along the AP axis, indicating that a self-organizing ectodermal patterning system operates independently of mesoderm or maternal DV polarity cues. 
The presence of self-organizing patterning systems in short-germ insects like Tribolium is in agreement with a regulative type of embryogenesis proposed by classical fragmentation studies on hemimetabolous insects. These results also elucidate how extraembryonic tissues are organized in short-germ embryos, and how patterning information is transmitted from the early embryo to the growth zone. Altogether, the functional analysis of the TGF-β and Toll pathways in Tribolium dorsoventral patterning suggests that extensive changes in this GRN have occurred during insect evolution.
... A phylogenetic tree was constructed with EDGAR version 2.0 [26] from concatenated core genes, which has enhanced phylogenetic signal compared to phylogenies derived from single genes such as 16S rRNA genes [28]. Zdobnov and Bork, (2007) recommended the use of all core genes to reinforce the phylogenetic tree [29]. Each set of orthologous genes was individually aligned with MUSCLE [26] and non-matching parts of the alignment were masked by GBLOCKS prior to concatenation of all core genes. ...
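The concatenation step mentioned in this excerpt (per-gene alignments are masked and then joined into one core-genome alignment) can be sketched in a few lines. The example below does not call EDGAR, MUSCLE or GBLOCKS. It assumes the per-gene alignments already exist as dictionaries mapping a genome name to its aligned sequence, and it substitutes a much cruder masking rule (dropping every column that contains a gap) for the GBLOCKS step. The function names are hypothetical.

```python
def mask_gapped_columns(alignment):
    """Keep only gap-free columns of one gene alignment.

    alignment: dict mapping genome name -> aligned sequence (all of equal length).
    This is a crude stand-in for GBLOCKS-style masking, not an equivalent of it.
    """
    genomes = sorted(alignment)
    if len({len(alignment[g]) for g in genomes}) != 1:
        raise ValueError("aligned sequences must have equal length")
    columns = zip(*(alignment[g] for g in genomes))
    kept = [col for col in columns if "-" not in col]
    return {g: "".join(col[i] for col in kept) for i, g in enumerate(genomes)}


def concatenate_core_alignments(alignments):
    """Mask each core-gene alignment, then concatenate them per genome."""
    if not alignments:
        raise ValueError("at least one alignment is required")
    genomes = sorted(alignments[0])
    for aln in alignments:
        # A core gene must be present in every genome by definition.
        if sorted(aln) != genomes:
            raise ValueError("every core gene must be present in every genome")
    masked = [mask_gapped_columns(aln) for aln in alignments]
    return {g: "".join(m[g] for m in masked) for g in genomes}
```

The resulting dictionary holds one concatenated sequence per genome, which is the kind of input a tree-building program would then take.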
Article
Full-text available
Corynebacterium bovis is an opportunistic bacterial pathogen shown to cause eye and prosthetic joint infections as well as abscesses in humans, mastitis in dairy cattle, and skin disease in laboratory mice and rats. Little is known about the genetic characteristics and genomic diversity of C. bovis because only a single draft genome is available for the species. The overall aim of this study was to sequence and compare the genome of C. bovis isolates obtained from different species, locations, and time points. Whole-genome sequencing was conducted on 20 C. bovis isolates (six human, four bovine, nine mouse and one rat) using the Illumina MiSeq platform and submitted to various comparative analysis tools. Sequencing generated high-quality contigs (over 2.53 Mbp) that were comparable to the only reported assembly using C. bovis DSM 20582T (97.8 ± 0.36% completeness). The number of protein-coding DNA sequences (2,174 ± 12.4) was similar among all isolates. A Corynebacterium genus neighbor-joining tree was created, which revealed Corynebacterium falsenii as the nearest neighbor to C. bovis (95.87% similarity), although the reciprocal comparison shows Corynebacterium jeikeium as closest neighbor to C. falsenii. Interestingly, the average nucleotide identity demonstrated that the C. bovis isolates clustered by host, with human and bovine isolates clustering together, and the mouse and rat isolates forming a separate group. The average number of genomic islands and putative virulence factors were significantly higher (p
... Genomic relatedness between or among the nine L. ruminis strains was assessed for evolutionary relationships using an earlier described phylogenetic tree reconstruction method [35,44]. Here, a core genome phylogeny was inferred using the approach taken by Zdobnov and Bork (2007) [45], wherein orthologous genes in different genomes are identified according to their protein homology predictions. For this, MUSCLE [46] was used to construct multiple genome alignments of the in-common core genes, and subsequent to the concatenation of the alignment blocks, GBLOCKS [47] was used for removing the sequence gaps and misaligned sections. ...
Article
Full-text available
As an ecological niche, the mammalian intestine provides the ideal habitat for a variety of bacterial microorganisms. Purportedly, some commensal genera and species offer a beneficial mix of metabolic, protective, and structural processes that help sustain the natural digestive health of the host. Among these sort of gut inhabitants is the Gram-positive lactic acid bacterium Lactobacillus ruminis, a strict anaerobe with both pili and flagella on its cell surface, but also known for being autochthonous (indigenous) to the intestinal environment. Given that the molecular basis of gut autochthony for this species is largely unexplored and unknown, we undertook a study at the genome level to pinpoint some of the adaptive traits behind its colonization behavior. In our pan-genomic probe of L. ruminis, the genomes of nine different strains isolated from human, bovine, porcine, and equine host guts were compiled and compared for in silico analysis. For this, we conducted a geno-phenotypic assessment of protein-coding genes, with an emphasis on those products involved with cell-surface morphology and anaerobic fermentation and respiration. We also categorized and examined the core and accessory genes that define the L. ruminis species and its strains. Here, we made an attempt to identify those genes having ecologically relevant phenotypes that might support or bring about intestinal indigenousness.
... Despite the lack of a D. plexippus linkage map, the ~80% of genes with identifiable homologs in B. mori facilitated mapping of the majority of scaffolds and revealed high levels of microsynteny with strong colinearity in most of the putative chromosomes, again except for the Z (sex) chromosome (Zhan et al. 2011). This comparison found 75% of mapped genes to be located in microsynteny blocks, compared with 75% quantified previously for Anopheles gambiae and Aedes aegypti (Zdobnov and Bork 2007) and 63% for Apis mellifera and Nasonia vitripennis single-copy orthologs (Werren et al. 2010). Employing ~6,000 H. melpomene -B. ...
... This was true for two different measures of the extent of synteny conservation: 1) the proportion of orthologous anchor genes maintained as neighbors, and 2) synteny block lengths measured as the ratio of the number of pairs of maintained neighbors to the total number of anchor genes maintained as neighbors. Pairwise species comparisons using similar measures across 12 insect species quantified their divergences using synteny conservation (% orthologs in synteny) and sequence identity of single-copy orthologs (Zdobnov and Bork 2007). Among the four groups of insects examined, the Lepidoptera exhibit much higher levels of synteny conservation than would be expected given their levels of molecular evolutionary divergence. ...
... The adjacency of orthologous genes in different species provides reliable information to identify orthology relationships, because the comparison of closely related species revealed an extensive, quasi-integral conservation of gene arrangements along chromosomes [87,88]. However, synteny conservation suffers from large interfamily evolutionary distances, asynchronous to the sequence divergence between orthologous genes, as shown for example in both drosophilids [89] and yeast species [90]. Similarity approaches were associated with conservation of synteny [84,91,92] and neighborhood of genes between species [93,94] to reliably detect orthologs in closely related species. ...
... In closely related species, a large part of genes is still found in short syntenic blocks as for example in fly and in yeast [89,90]. However, at large evolutionary distances, it remains very difficult to accurately define conserved segments, due to frequent events of synteny breakage. Consequently, one has to choose an appropriate set of closely related species for the analysis of synteny conservation, and hence for the reconstruction of ancestral genomes. ...
Article
Full-text available
With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet, the dynamic remodeling of genome content through gain, loss, transfer of genes, and segmental and whole-genome duplication hinders reliable orthology detection. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations present additional difficulties to assess orthology. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing and anticipates open questions in terms of methodology, reliability, and computation. Appropriate taxon sampling together with combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detecting speciation events appears to be the most accurate strategy. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.
... Further studies to unravel the importance of TYLCV CP in virus transmission showed its interaction with a member of the small heat shock protein family (BtHSP16), which was identified using a yeast two-hybrid system screen against TYLCSV CP [132]. Another study recently demonstrated that another heat shock protein, HSP70, interacts with the TYLCV CP in vivo and in vitro, and membrane feeding with anti-HSP70 antibodies resulted in an increase in TYLCV transmission. ...
... By using the same methods and training sets used for within-species CRM discovery [117, 126] but searching the genomes of An. gambiae, T. castaneum, A. mellifera, and N. vitripennis instead of that of D. melanogaster (Fig. 6.3), we were able to rapidly almost double the collective number of in vivo validated CRMs for these species and predict some 7000 more [131]. This is a significant advance given that the genomes of these species are highly diverged (substantially more so than human-to-fish for Diptera-to-Hymenoptera, for example [132]), to the point where alignment of noncoding sequences to the Drosophila genome is for the most part not possible. Successful application of supervised motif-blind CRM discovery therefore suggests that not only is regulatory sequence annotation in diverged insect species an attainable goal but also that it is one that can progress without requiring extensive new experimental data to be generated for each newly sequenced genome. ...
Chapter
Full-text available
The sweetpotato whitefly, Bemisia tabaci, is a devastating cosmopolitan insect pest that inflicts serious damage by direct feeding on plants, secreting honeydew, and vectoring more than 100 plant viruses that belong to different virus genera. The interactions between the whitefly and plant viruses, plants, and environmental factors have been extensively studied. In recent years more than 100,000 expressed sequences tags (ESTs) from the whitefly have been made available to the scientific community by several mass sequencing projects, and a genome sequencing project is underway. Tools for functional analysis of gene expression are being developed for studies in the whitefly. Combining EST and genomic sequences with functional analysis will pave the way for addressing urgent issues in whitefly research and developing better strategies for whitefly control.
... As shown in Figure 10, Hymenoptera was assigned as the sister group to the remaining orders, a result consistent with the viewpoint proposed by many scholars [7-11]. The notion that Mecopterida (Antliophora + Amphiesmenoptera) forms a sister group with Neuropteroidea (Coleopterida + Neuropterida) aligns with the findings of Peters et al. [28], who recently analyzed the transcriptome and morphological data of complete metamorphosis insects. ...
Article
Full-text available
Phylogenetic relationships among Holometabola have been the subject of controversy. The value of the wing base structure in phylogenetic analysis has been demonstrated but remains largely underexplored and scarce in studies of Holometabola. We studied the phylogenetic relationships among Holometabola (excluding Siphonaptera), focusing exclusively on wing base structure. Cladistic assessments were conducted using 53 morphological data points derived from the bases of both the forewing and hindwing. The results of wing base data revealed a sister relationship between Hymenoptera and remaining orders. The sister-group relationships between Strepsiptera and Coleoptera, Mecoptera and Diptera, Trichoptera and Lepidoptera, and Neuropterida and Coleopterida were corroborated. In Neuropterida, our results recovered the sister relationship between Megaloptera and Neuroptera, as well as the monophyly of Megaloptera.
... Simulations and empirical work have suggested the translocation of a few genes per million years [83] and the gradual loss of ancestral gene order over several million years [27]. For example, there is still 99% conservation of synteny in Drosophila species that diverged 35 million years ago, but only approximately 10% conservation between flies and honeybees, which diverged 350 million years ago [84]. Second, studies suggest early onset of functional diversification for some, but not for all genes. ...
... Phylogenetic tree. For comparison of different genomes, a phylogenetic tree was constructed using a slightly modified version proposed by Zdobnov and Bork (2007). The core genome is calculated as described above. ...
Article
Full-text available
Pseudomonas species associated with plants are often found living as parasites or saprophytes on the surfaces or inside of plants. Such plant-associated Pseudomonas may promote plant growth by eliminating pathogenic microbes, synthesizing plant-growth-stimulating hormones, and enhancing disease resistance in plants, and are also used for biological control of plant pathogens and for bioremediation. The present investigation was a comparative genomic study of 14 Pseudomonas strains with biocontrol, PGPR, and bioremediation activities, with P. fluorescens as the reference strain. The study revealed that these strains are fairly closely related based on the parameters examined, and can therefore be used collectively. With the increasing availability of sequences, the complexity of genome alignment and analysis is growing drastically, and the computational requirements of EDGAR 2.0 and Mauve 2.3.1 have risen considerably over the past decade; these tools provide an easy, user-friendly interface for examining evolutionary relationships in terms of gene order, thereby yielding new biological insights into differential gene content.
... [38] In contrast to this echinoid chromosomal stability, we observed an extensive reshuffling of the microsyntenic intrachromosomal gene order, which results in the absence of an observable "colinear" gene order visible as linear segments across pairs of homologous chromosomes, as seen when comparing human and mouse genomes (Figures 3B and 3C). To quantify the rate at which gene collinearity is eroded, we compared the retention of microsynteny with the divergence time for selected sea urchin and vertebrate species (Figure 3A) [51] and showed in this way that intrachromosomal gene order appears to evolve at a much slower pace in vertebrates than in sea urchins (Figure 3A). ...
Article
Full-text available
Sea urchins are emblematic models in developmental biology and display several characteristics that set them apart from other deuterostomes. To uncover the genomic cues that may underlie these specificities, we generated a chromosome-scale genome assembly for the sea urchin Paracentrotus lividus and an extensive gene expression and epigenetic profiles of its embryonic development. We found that, unlike vertebrates, sea urchins retained ancestral chromosomal linkages but underwent very fast intrachromosomal gene order mixing. We identified a burst of gene duplication in the echinoid lineage and showed that some of these expanded genes have been recruited in novel structures (water vascular system, Aristotle's lantern, and skeletogenic micromere lineage). Finally, we identified gene-regulatory modules conserved between sea urchins and chordates. Our results suggest that gene-regulatory networks controlling development can be conserved despite extensive gene order rearrangement.
... In Diptera, which are among the most recently emerged insect orders (Zdobnov and Bork 2007) (Figure 43), the pair of hindwings is reduced to a small mechanosensory organ called the "haltere". Unlike wings, halteres play no direct role in flight but act as gyroscopes that allow dipterans to stabilize their rotations during flight. ...
Thesis
Over the course of evolution, living organisms have developed an astonishing morphological diversity that allows them to adapt to highly varied environments. One of the most striking examples is the radiation of flight appendages in insects from ancestors with two pairs of very similar wings. Several studies show the importance of one particular Hox gene, Ultrabithorax (Ubx), for the formation of the posterior flight organs. In Diptera, these correspond to balancing organs called "halteres". Conversely, the formation of the anterior flight organs is considered to be independent of Hox genes. During my thesis work, we challenged this model and showed that the Hox protein Antennapedia (Antp) is not only produced in the wing primordium but is also required for forewing formation in the fruit fly Drosophila melanogaster. Surprisingly, the dose of Antp is well below that observed for Ubx in the haltere primordium, and artificially increasing this dose is sufficient to transform the wing into a haltere. Conversely, a controlled decrease in the dose of Ubx induces a transformation of the haltere into a wing. These results thus show that it is not the type of Hox protein, here Antp or Ubx, but its dose that controls the wing-type or haltere-type developmental program in Drosophila. Analysis of the role of the transcription factor Homothorax (Hth) also shows that it is required for the activation of Antp and the repression of Ubx in the wing and haltere primordia, respectively, thereby establishing a specific expression profile.
The observation of a correlation between variation in Hox dose and the formation of identical or different wings in other insect lineages allows us to propose a new model in which changes in the expression levels of the Antp and Ubx genes could be more broadly responsible for the morphological diversification of flight organs during insect evolution. How the Ubx protein specifies the haltere developmental program in Drosophila remains poorly understood at the molecular level. In particular, its dose-dependent activity is expected to rely on interaction with other transcriptional partners. I therefore initiated two original, complementary approaches to identify new Ubx cofactors. Although one of the approaches could not be completed, the other identified a number of new partners and revealed a surprising stability of the haltere developmental program.
... As a key feature, they employ gene syntenies to allow for highly sensitive matches (i.e., those with little sequence similarity) if de novo genes appear between the same gene pairs in two or more genomes. Using synteny works well in mammals (Jebb et al., 2020; Vakirlis et al., 2020) or in vertebrates in general, but gene order is less conserved in taxa such as insects (Zdobnov and Bork, 2007). ...
Preprint
Full-text available
Novel genes are essential for evolutionary innovations and differ substantially even between closely related species. Recently, multiple studies across many taxa have suggested that some novel genes arise de novo, i.e. from previously non-coding DNA. In order to characterise the underlying mutations that allowed de novo gene emergence and their order of occurrence, homologous regions must be detected within non-coding sequences in closely related sister genomes. So far, most studies do not detect non-coding homologs of de novo genes due to inconsistent data and long evolutionary distances separating genomes. Here we overcome these issues by searching for proto-genes, the not-yet fixed precursors of de novo genes that emerged within a single species. We sequenced and assembled genomes with long-read technology and the corresponding transcriptomes from inbred lines of Drosophila melanogaster, derived from seven geographically diverse populations. We found line-specific proto-genes in abundance but few proto-genes shared by lines, suggesting a rapid turnover. Gain and loss of transcription is more frequent than the creation of Open Reading Frames (ORFs), e.g. by forming new START- and STOP-codons. Consequently, the gain of ORFs becomes rate limiting and is frequently the initial step in proto-gene emergence. Furthermore, Transposable Elements (TEs) are major drivers for intragenomic duplications of proto-genes, yet TE insertions are less important for the emergence of proto-genes. However, highly mutable genomic regions around TEs provide new features that enable gene birth. In conclusion, proto-genes have a high birth-death rate, are rapidly purged, but surviving proto-genes spread neutrally through populations and within genomes.
... While these sets of species maintain a reasonable amount of alignable sequence, cross-species CRM prediction has also been demonstrated across more highly sequence-diverged species pairs. Minnoye et al. [63] designed a multi-class neural network-based method, DeepMEL, that when trained on human melanoma ATAC-seq data successfully predicted enhancers for two related but distinct cell types across six different species (human, dog, horse, pig, mouse, zebrafish); the latter pairing begins to approach the level of divergence observed in family-level comparisons among the holometabola [64]. Transcription factor binding site clustering, based on Drosophila melanogaster CRMs, has been used to discover CRMs in other holometabolous insects [65][66][67][68], and the SCRMshaw algorithm (described more fully below) has used Drosophila data to successfully identify CRMs in species as distantly diverged as the Hemiptera e.g., [62,69,70] (H.A. and M.S.H., unpublished data), based on statistical similarities in subsequence (kmer) counts among the CRMs. ...
Article
Full-text available
An ever-growing number of insect genomes is being sequenced across the evolutionary spectrum. Comprehensive annotation of not only genes but also regulatory regions is critical for reaping the full benefits of this sequencing. Driven by developments in sequencing technologies and in both empirical and computational discovery strategies, the past few decades have witnessed dramatic progress in our ability to identify cis-regulatory modules (CRMs), sequences such as enhancers that play a major role in regulating transcription. Nevertheless, providing a timely and comprehensive regulatory annotation of newly sequenced insect genomes is an ongoing challenge. We review here the methods being used to identify CRMs in both model and non-model insect species, and focus on two tools that we have developed, REDfly and SCRMshaw. These resources can be paired together in a powerful combination to facilitate insect regulatory annotation over a broad range of species, with an accuracy equal to or better than that of other state-of-the-art methods.
... A modified version of the pipeline published by [18] was used to construct the phylogenic tree. Alignments of the core gene sets were compiled using MUSCLE [19]. ...
Article
Full-text available
For the last 13 years, the fur industry in Europe has suffered from epidemic spouts of a severe necrotizing pyoderma. It affects all species currently farmed for fur and causes animal welfare problems and significant losses to the farmers. The causative agent of this disease was identified as Arcanobacterium phocae. Previously, this bacterium has been isolated from seals and other marine mammals, apparently causing wound and lung infections. Attempts at antibiotic treatment have been unsuccessful and the current advice on preventing the disease is to cull all animals with clinical signs. This poses an urgent question regarding possible vaccine development, as well as the need for further understanding of the pathogenicity of this organism. This study compared the whole genomes of 42 A. phocae strains isolated from seals, blue foxes, finnraccoons, mink and otter. The sequences were created using the Illumina technology and annotations were done using the RAST pipeline. A phylogenetic analysis identified a clear separation between the seal strains and the fur-animal-derived isolates, but also indicated that the bacterium readily adapts to new environments and host species with reasonable diversity. A pan- and core-genome was created and analyzed for proteins. A further analysis identified several virulence factors as well as multiple putative and secreted proteins of special interest for vaccine development.
... Hexapoda, among animal groups, exhibits greatly diversified spermatozoa. This is likely due to the old origin of the Hexapoda [41,42] and to their short lifespan that allows the accumulation of mutations and leads to faster genetic divergence [43]. Moreover, the genes involved in sexual reproduction tend to diverge faster than those codifying for components of non-reproductive tissues [44]. ...
Article
Full-text available
Centrioles are widely conserved barrel-shaped organelles present in most organisms. They are indirectly involved in the organization of the cytoplasmic microtubules both in interphase and during cell division by recruiting the molecules needed for microtubule nucleation. Moreover, the centrioles are required to assemble cilia and flagella by the direct elongation of their microtubule wall. Due to the importance of the cytoplasmic microtubules in several aspects of cell life, any defect in centriole structure can lead to cell abnormalities that in humans may result in significant diseases. Many aspects of centriole dynamics and function have been clarified in recent years, but little attention has been paid to the exceptions in centriole structure that have occasionally appeared within the animal kingdom. Here, we focused our attention on non-canonical aspects of centriole architecture within the Hexapoda. The Hexapoda is one of the major animal groups and represents a good laboratory in which to examine the evolution and the organization of the centrioles. Although these findings represent obvious exceptions to the established rules of centriole organization, they may contribute to advancing our understanding of the formation and the function of these organelles.
... The phylogenetic tree was calculated using a somewhat modified version of the pipeline proposed by Zdobnov and Bork [17]. Alignments of each core gene set are compiled using MUSCLE [18], the numerous resulting multiple alignments were concatenated, and poorly aligned positions were removed using GBLOCKS [19]. ...
Article
Full-text available
Streptococcus halichoeri is an emerging pathogen with a variety of host species and zoonotic potential. It has been isolated from grey seals and other marine mammals as well as from human infections. Beginning in 2010, two concurrent epidemics were identified in Finland, in fur animals and domestic dogs, respectively. The fur animals suffered from a new disease, fur animal epidemic necrotic pyoderma (FENP), and the dogs presented with ear infections with poor treatment response. S. halichoeri was isolated in both studies, albeit among other pathogens, indicating a possible role in the disease etiologies. The aim was to find a possible common origin of the fur animal and dog isolates and to study the virulence factors to assess pathogenic potential. Isolates from seal, human, dogs, and fur animals were obtained for comparison. The whole genomes were sequenced from 20 different strains using the Illumina MiSeq platform and annotated using an automatic annotation pipeline, RAST. The core and pangenomes were formed by comparing the genomes against each other in an all-against-all comparison. A phylogenetic tree was constructed using the genes of the core genome. Virulence factors were assessed using the Virulence Factor Database (VFDB), concentrating on the previously confirmed streptococcal factors. A core genome was formed which encompassed approximately half of the genes in Streptococcus halichoeri. The resulting core was nearly saturated and would not change significantly by adding more genomes. The remaining genes formed the pangenome, which was highly variable and would still evolve after additional genomes. The results highlight the great adaptability of this bacterium, possibly explaining the ease with which it switches hosts and environments. Virulence factors were also analyzed and were found primarily in the core genome. They represented many classes and functions, but the largest single category was adhesins, which again supports the marine origin of this species.
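The core/pangenome split described in the abstract above can be illustrated with plain set operations. This is only a minimal sketch, not the authors' pipeline: it assumes the all-against-all comparison has already assigned every gene to an ortholog group, and the strain names, group IDs, and the function `core_and_pan` are hypothetical. Here "pan" means the union of all groups, and "accessory" is the variable remainder outside the core.

```python
def core_and_pan(genomes):
    """Return (core, pan) for a dict mapping strain name -> set of ortholog-group IDs.

    core: groups present in every strain; pan: groups present in at least one strain.
    """
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan


# Hypothetical toy input: three strains with made-up ortholog-group IDs.
strains = {
    "seal_1": {"og1", "og2", "og3", "og4"},
    "dog_1": {"og1", "og2", "og5"},
    "fox_1": {"og1", "og2", "og3", "og6"},
}
core, pan = core_and_pan(strains)
accessory = pan - core  # variable part of the gene repertoire
```

Adding further strains can only shrink or preserve the core and only grow or preserve the union, which is the sense in which a core can be called "nearly saturated".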
... Trunk/PTTH and Torso seem to have been co-opted from a more ancient role in moulting control (Duncan, Benton et al. 2013; Skelly, Pushparajan et al. 2018). Our data from Nasonia (a hymenopteran insect, the sister group to the rest of the holometabola (Krauss et al., 2004; Savard et al., 2006; Zdobnov and Bork, 2007)) imply that an ancestral role of tsl could have been to ensure VM integrity. ...
Article
Full-text available
Axis specification is a fundamental developmental process. Despite this, the mechanisms by which it is controlled across insect taxa are strikingly different. An excellent example of this is terminal patterning, which in Diptera such as Drosophila melanogaster occurs via the localized activation of the receptor tyrosine kinase Torso. In Hymenoptera, however, the same process appears to be achieved via localized mRNA. How these mechanisms evolved and what they evolved from remains largely unexplored. Here, we show that torso-like, known for its role in Drosophila terminal patterning, is instead required for the integrity of the vitelline membrane in the hymenopteran wasp Nasonia vitripennis. We find that other genes known to be involved in Drosophila terminal patterning, such as torso and Ptth, also do not function in Nasonia embryonic development. These findings extended to orthologues of Drosophila vitelline membrane proteins known to play a role in localizing Torso-like in Drosophila; in Nasonia these are instead required for dorso–ventral patterning, gastrulation and potentially terminal patterning. Our data underscore the importance of the vitelline membrane in insect development, and imply that phenotypes caused by knockdown of torso-like must be interpreted in light of its function in the vitelline membrane. In addition, our data imply that the signalling components of the Drosophila terminal patterning systems were co-opted from roles in regulating moulting, and that co-option into terminal patterning involved the evolution of a novel interaction with the vitelline membrane protein Torso-like. This article has an associated First Person interview with the first author of the paper.
... The patterns observed from inferred ancestral genome contents were further explored using pairwise species comparison approaches, similar to the quantifications of synteny and sequence conservation among 12 insects (Zdobnov and Bork, 2007). Comparing pairwise molecular evolutionary divergences from the species phylogeny ( Fig. 2A) with synteny quantifications between pairs of species from each of the four clades showed an expected decrease in synteny conservation with increasing evolutionary distances (Fig. 2C). ...
Article
Manduca sexta, known as the tobacco hornworm or Carolina sphinx moth, is a lepidopteran insect that is used extensively as a model system for research in insect biochemistry, physiology, neurobiology, development, and immunity. One important benefit of this species as an experimental model is its extremely large size, reaching more than 10 g in the larval stage. M. sexta larvae feed on solanaceous plants and thus must tolerate a substantial challenge from plant allelochemicals, including nicotine. We report the sequence and annotation of the M. sexta genome, and a survey of gene expression in various tissues and developmental stages. The Msex_1.0 genome assembly resulted in a total genome size of 419.4 Mbp. Repetitive sequences accounted for 25.8% of the assembled genome. The official gene set is comprised of 15,451 protein-coding genes, of which 2498 were manually curated. Extensive RNA-seq data from many tissues and developmental stages were used to improve gene models and for insights into gene expression patterns. Genome wide synteny analysis indicated a high level of macrosynteny in the Lepidoptera. Annotation and analyses were carried out for gene families involved in a wide spectrum of biological processes, including apoptosis, vacuole sorting, growth and development, structures of exoskeleton, egg shells, and muscle, vision, chemosensation, ion channels, signal transduction, neuropeptide signaling, neurotransmitter synthesis and transport, nicotine tolerance, lipid metabolism, and immunity. This genome sequence, annotation, and analysis provide an important new resource from a well-studied model insect species and will facilitate further biochemical and mechanistic experimental studies of many biological systems in insects.
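The pairwise synteny quantification mentioned in the citing passage above can be sketched as the fraction of neighboring ortholog pairs in one species that are also neighbors in the other. This is a simplified illustration under stated assumptions, not the method of the cited papers: each genome is treated as a single ordered list of single-copy ortholog IDs, gene orientation and chromosome boundaries are ignored, and the function names and gene IDs are hypothetical.

```python
def adjacent_pairs(order):
    """Unordered pairs of genes that sit next to each other in a gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def synteny_conservation(order_a, order_b):
    """Fraction of adjacent ortholog pairs in A that are also adjacent in B.

    Only orthologs present in both gene orders are considered.
    """
    shared = set(order_a) & set(order_b)
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    pairs_a = adjacent_pairs(a)
    if not pairs_a:
        return 0.0
    return len(pairs_a & adjacent_pairs(b)) / len(pairs_a)


# Hypothetical gene orders: species B carries a local swap of g3 and g4.
species_a = ["g1", "g2", "g3", "g4", "g5"]
species_b = ["g1", "g2", "g4", "g3", "g5"]
fraction = synteny_conservation(species_a, species_b)  # 2 of 4 adjacencies kept
```

Plotting such a fraction against a pairwise molecular distance for many species pairs would give the kind of synteny-versus-divergence comparison the snippet describes.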
... Syntenic blocks usually cover a large part of genomes of recently diversified lineages. Although syntenic blocks can be an effective means of estimating genomic similarity, the size of the blocks can still be underestimated due to fragmentation and misassembly (Proost et al., 2012;Zdobnov and Bork, 2007). ...
Chapter
Parasitic trematodes (flukes) cause substantial mortality and morbidity in humans. The Chinese liver fluke, Clonorchis sinensis, is one of the most destructive parasitic worms in humans in China, Vietnam, Korea and the Russian Far East. Although C. sinensis infection can be controlled relatively well using anthelmintics, the worm is carcinogenic, inducing cholangiocarcinoma and causing major suffering in ~15 million people in Asia. This chapter provides an account of C. sinensis and clonorchiasis research-covering aspects of biology, epidemiology, pathogenesis and immunity, diagnosis, treatment and control, genetics and genomics. It also describes progress in the area of molecular biology (genetics, genomics, transcriptomics and proteomics) and highlights challenges associated with comparative genomics and population genetics. It then reviews recent advances in the sequencing and characterisation of the mitochondrial and nuclear genomes for a Korean isolate of C. sinensis and summarises salient comparative genomic work and the implications thereof. The chapter concludes by considering how advances in genomic and informatics will enable research on the genetics of C. sinensis and related parasites, as well as the discovery of new fluke-specific intervention targets.
... A phylogenomic tree based on the core-genome was constructed using previously described approaches (Kant et al., 2011). Concisely, a tree was calculated using a slightly adapted version of the pipeline proposed by Zdobnov & Bork (2007), in which predicted protein homologies were used to identify possible genes in each of the different genomes. Here, multiple genome alignments of mutually conserved orthologous genes from the core genome were produced with MUSCLE (Edgar, 2004). ...
Article
Full-text available
Non-aureus staphylococci (NAS) are most commonly isolated from subclinical mastitis. Different NAS species may, however, have diverse effects on the inflammatory response in the udder. We determined the genome sequences of 20 staphylococcal isolates from clinical or subclinical bovine mastitis, belonging to the NAS species Staphylococcus agnetis, S. chromogenes, and S. simulans, and focused on the putative virulence factor genes present in the genomes. For comparison we used our previously published genome sequences of four S. aureus isolates from bovine mastitis. The pan-genome and core genomes of the non-aureus isolates were characterized. After that, putative virulence factor orthologues were searched in silico. We compared the presence of putative virulence factors in the NAS species and S. aureus and evaluated the potential association between bacterial genotype and type of mastitis (clinical vs. subclinical). The NAS isolates had far fewer virulence gene orthologues than the S. aureus isolates. One third of the virulence genes were detected only in S. aureus. About 100 virulence genes were present in all S. aureus isolates, compared to about 40 to 50 in each NAS isolate. S. simulans differed the most. Several of the virulence genes detected among NAS were harbored only by S. simulans, but it also lacked a number of genes present both in S. agnetis and S. chromogenes. The type of mastitis was not associated with any specific virulence gene profile. It seems that the virulence gene profiles or cumulative number of different virulence genes are not directly associated with the type of mastitis (clinical or subclinical), indicating that host derived factors such as the immune status play a pivotal role in the manifestation of mastitis.
... This approach has greatly increased the number of potential genes available to study by simply requiring cross-detection and extraction of candidate orthologs with bioinformatic tools, rather than labour-intensive targeted amplification plus sequencing. Zdobnov and Bork (2007) found 2032 candidate single-copy genes to be conserved across the twelve-holometabolan insects then studied (seven of which were Drosophila spp.), a massive increase in the scale of data evaluated elsewhere. Subsequently, Bonasio et al. (2010) reported results from phylogenomic analyses of 1032 single-copy genes for a broader taxon sample of eight holometabolan insect genomes. ...
Chapter
Insecta consists of 29 living orders that are not equivalent by any criteria except taxonomic rank (Davis et al. 2010). Insects demonstrate the greatest biodiversity, accounting for over half of all described eukaryotes, approximately 1 million described species (Grimaldi and Engel 2005) and a global total of anywhere between 5 and 10 million species (Gaston 1991; Raven and Yeates 2007). Although lower-end estimates of species numbers are more likely (Mora et al. 2011), around two-thirds of all insects probably remain to be discovered and described (May 2010), vastly outnumbering the total diversity of other better-studied taxonomic groups like vertebrates and vascular plants. The importance of insects for stable ecosystem functioning also cannot be understated. For example, insects are responsible for the breakdown of organic material, animal and human remains, removal of waste, aeration and turnover of soil, and the vital task of pollination for flowering plants. They also include important predators that control numbers of other pest invertebrates or weed plants, and are an essential food source for many birds, fish, reptiles and amphibians. Understanding the impressive numerical and ecological diversity of insects has long been recognized as an important research goal. To achieve this, it is vital to clarify the evolutionary history and ancestral attributes of lineages. Here we will (1) take stock of our current understanding of insect systematics and the role molecular phylogenetics has played, (2) review the taxonomic diversity of transcriptomes and whole genomes in Insecta and its current bias, (3) discuss the ways that NGS technologies can be used to study insect evolution, and (4) propose strategies for selecting future insects to sequence, for example to maximize genomic diversity and resolve important phylogenetic questions that remain in the field of insect systematics.
... In addition, the function of each enhancer tends to exhibit low levels of pleiotropy (Carroll, 2008), resulting in the accumulation of more evolutionary changes in enhancers. These characteristics, along with the faster rate of genome evolution in insects compared with vertebrates (Zdobnov and Bork, 2007), make the identification of insect enhancers a challenging task. ...
Article
Evolution of cis-properties (such as enhancers) often plays an important role in the production of diverse morphology. However, a mechanistic understanding is often limited by the absence of methods to study enhancers in species outside of established model systems. Here, we sought to establish methods to identify and test enhancer activity in the red flour beetle, Tribolium castaneum. To identify possible enhancer regions, we first obtained genome-wide chromatin profiles from various tissues and stages of Tribolium via FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements)-sequencing. Comparison of these profiles revealed a distinct set of open chromatin regions in each tissue and stage. Second, we established the first reporter assay system that works in both Drosophila and Tribolium, using nubbin in the wing and hunchback in the embryo as case studies. Together, these advances will be useful to study the evolution of cis-language and morphological diversity in Tribolium and other insects.
... For example, significant size variation among gene families and high rates of gene gain and loss were found in a study involving comparative genomics (Hahn et al. 2007). In addition, a wealth of information regarding gene and genome architecture has also come up from several other comparative genomics studies using these 12 sequenced Drosophila genomes when conducted within and with genomes of other eukaryotes (Michael and Manyuan 1999; Zdobnov and Bork 2007; Seetharam and Stuart 2013; Warnefors et al. 2016). However, genome information available is restricted to certain species group or subgroups belonging to limited geographic regions. ...
Article
Full-text available
Comparative analysis of multiple genomes of closely or distantly related Drosophila species undoubtedly creates excitement among evolutionary biologists in exploring the genomic changes with an ecology and evolutionary perspective. We present herewith the de novo assembled whole genome sequences of four Drosophila species, D. bipectinata, D. takahashii, D. biarmipes and D. nasuta of Indian origin using Next Generation Sequencing technology on an Illumina platform along with their detailed assembly statistics. The comparative genomics analysis, e.g. gene predictions and annotations, functional and orthogroup analysis of coding sequences and genome wide SNP distribution were performed. The whole genome of Zaprionus indianus of Indian origin published earlier by us and the genome sequences of previously sequenced 12 Drosophila species available in the NCBI database were included in the analysis. The present work is a part of our ongoing genomics project of Indian Drosophila species.
... Possible synteny for insect Phf7 was also examined but none was found, a finding that was not surprising given the greater divergence between insects and overall lack of gene synteny [19,20]. As for G2e3, there is clear synteny around its genomic location from fish to humans (data not shown). ...
Article
Full-text available
PHD finger protein 7 (Phf7) is a male germline specific gene in Drosophila melanogaster that can trigger the male germline sexual fate and regulate spermatogenesis, and its human homologue can rescue fecundity defects in male flies lacking this gene. These findings prompted us to investigate conservation of reproductive strategies through studying the evolutionary origin of this gene. We find that Phf7 is present only in select species including mammals and some insects, whereas the closely related G2/M-phase specific E3 ubiquitin protein ligase (G2e3) is in the genome of most metazoans. Interestingly, phylogenetic analyses showed that vertebrate and insect Phf7 genes did not evolve from a common Phf7 ancestor but rather through independent duplication events from an ancestral G2e3. This is an example of parallel evolution in which a male germline factor evolved at least twice from a pre-existing template to develop new regulatory mechanisms of spermatogenesis. © 2017 The Author(s) Published by the Royal Society. All rights reserved.
... This hinted at a very high selective pressure to maintain gene structures such as splice sites, intron phase, and splicing isoforms (Ast, 2004) during metazoan evolution. The degree of conservation even allowed those characters to be used for phylogenetic inferences (Krauss et al., 2008;Zdobnov and Bork, 2007). ...
Article
Full-text available
Metazoan evolution encompasses a vast evolutionary time scale spanning over 600 million years. Our ability to infer ancestral metazoan characters, both morphological and functional, is limited by our understanding of the nature and evolutionary dynamics of the underlying regulatory networks. Increasing coverage of metazoan genomes enables us to identify the evolutionary changes of the relevant genomic characters such as the loss or gain of coding sequences, gene duplications, micro- and macro-synteny, and non-coding element evolution in different lineages. In this review we describe recent advances in our understanding of ancestral metazoan coding and non-coding features, as deduced from genomic comparisons. Some genomic changes such as innovations in gene and linkage content occur at different rates across metazoan clades, suggesting some level of independence among genomic characters. While their contribution to biological innovation remains largely unclear, we review recent literature about certain genomic changes that do correlate with changes to specific developmental pathways and metazoan innovations. In particular, we discuss the origins of the recently described pharyngeal cluster which is conserved across deuterostome genomes, and highlight different genomic features that have contributed to the evolution of this group. We also assess our current capacity to infer ancestral metazoan states from gene models and comparative genomics tools and elaborate on the future directions of metazoan comparative genomics relevant to evo-devo studies.
... While comparative genomic sequence analysis has furnished tremendous information regarding genetic factors underlying inter-species divergence (Chinwalla et al., 2002; Kaufman et al., 2002; Kirkness et al., 2003; Zdobnov and Bork, 2007; Arensburger et al., 2010; Bonasio et al., 2010; Werren et al., 2010), an increasing number of studies have applied RNA-seq for this purpose, particularly in species whose genome sequences are unavailable. For example, transcriptomic comparisons have been performed between different aphids, A. pisum vs. Sitobion avenae (Wang et al., 2014), whitefly (Bemisia tabaci) species complexes Middle East-Asia Minor 1 vs. Mediterranean (Wang et al., 2011), ranid frogs Rana chensinensis vs. Rana kukunoris, ornamental primrose species Primula poissonii vs. Primula wilsonii (Zhang L. et al., 2013), and fishes, Erythroculter ilishaeformis vs. Danio rerio (Ren et al., 2014). ...
Article
Full-text available
Green peach aphid (Myzus persicae) and pea aphid (Acyrthosiphon pisum) are two phylogenetically closely related agricultural pests. While pea aphid is restricted to Fabaceae, green peach aphid feeds on hundreds of plant species from more than 40 families. Transcriptome comparison could shed light on the genetic factors underlying the difference in host range between the two species. Furthermore, a large scale study contrasting gene expression between immature nymphs and fully developed adult aphids would fill a previous knowledge gap. Here, we obtained transcriptomic sequences of green peach aphid nymphs and adults, respectively, using Illumina sequencing technology. A total of 2244 genes were found to be differentially expressed between the two developmental stages, many of which were associated with detoxification, hormone production, cuticle formation, metabolism, food digestion, and absorption. When searched against publicly available pea aphid mRNA sequences, 13,752 unigenes were found to have no homologous counterparts. Interestingly, many of these unigenes that could be annotated in other databases were involved in the “xenobiotics biodegradation and metabolism” pathway, suggesting the two aphids differ in their adaptation to secondary metabolites of host plants. Conversely, 3989 orthologous gene pairs between the two species were subjected to calculations of synonymous and nonsynonymous substitutions, and 148 of the genes potentially evolved in response to positive selection. Some of these genes were predicted to be associated with insect-plant interactions. Our study has revealed certain molecular events related to aphid development, and provided some insight into biological variations in two aphid species, possibly as a result of host plant adaptation.
... 18S rDNA paradoxically supports both previously mentioned and novel Hymenoptera hypotheses depending on alignment strategy and taxon sampling [5,15,22]. Our results constitute the tipping point of the compounding evidence (extensive sample of nuclear genes, fossil evidence, wing characters, and introns of elongation factor 1-alpha) that Hymenoptera are the earliest branching lineage of the holometabolan radiation [14,[37][38][39][40][41]. ...
... The patterns observed from inferred ancestral genome contents were further explored using pairwise species comparison approaches, similar to the quantifications of synteny and sequence conservation among 12 insects (Zdobnov and Bork, 2007). Comparing pairwise molecular evolutionary divergences from the species phylogeny ( Fig. 2A) with synteny quantifications between pairs of species from each of the four clades showed an expected decrease in synteny conservation with increasing evolutionary distances (Fig. 2C). ...
Article
Manduca sexta, known as the tobacco hornworm or Carolina sphinx moth, is a lepidopteran insect that is used extensively as a model system for research in insect biochemistry, physiology, neurobiology, development, and immunity. One important benefit of this species as an experimental model is its extremely large size, reaching more than 10 g in the larval stage. M. sexta larvae feed on solanaceous plants and thus must tolerate a substantial challenge from plant allelochemicals, including nicotine. We report the sequence and annotation of the M. sexta genome, and a survey of gene expression in various tissues and developmental stages. The Msex_1.0 genome assembly resulted in a total genome size of 419.4 Mbp. Repetitive sequences accounted for 25.8% of the assembled genome. The official gene set is comprised of 15,451 protein-coding genes, of which 2498 were manually curated. Extensive RNA-seq data from many tissues and developmental stages were used to improve gene models and for insights into gene expression patterns. Genome wide synteny analysis indicated a high level of macrosynteny in the Lepidoptera. Annotation and analyses were carried out for gene families involved in a wide spectrum of biological processes, including apoptosis, vacuole sorting, growth and development, structures of exoskeleton, egg shells, and muscle, vision, chemosensation, ion channels, signal transduction, neuropeptide signaling, neurotransmitter synthesis and transport, nicotine tolerance, lipid metabolism, and immunity. This genome sequence, annotation, and analysis provide an important new resource from a well-studied model insect species and will facilitate further biochemical and mechanistic experimental studies of many biological systems in insects.
... We compared the putative targets of Ubx in Apis and Bombyx to the previously reported lists of targets in Drosophila [7,8] to understand the events downstream of Ubx as a focus for molecular evolution leading to haltere specification in dipterans. While only about 15-20% of the putative targets of Ubx in Apis and Bombyx were common to those in Drosophila, a large proportion of these common targets are known to function during wing development in Zdobnov and Bork (2007) [39]. This traces the divergence of 4 major orders of endopterygote insects for nearly 350 million years. ...
Article
Full-text available
In the fruitfly Drosophila melanogaster, the differential development of wing and haltere is dependent on the function of the Hox protein Ultrabithorax (Ubx). Here we compare Ubx-mediated regulation of wing patterning genes between the honeybee, Apis mellifera, the silkmoth, Bombyx mori and Drosophila. Orthologues of Ubx are expressed in the third thoracic segment of Apis and Bombyx, although they make functional hindwings. When over-expressed in transgenic Drosophila, Ubx derived from Apis or Bombyx could suppress wing development, suggesting evolutionary changes at the level of co-factors and/or targets of Ubx. To gain further insights into such events, we identified direct targets of Ubx from Apis and Bombyx by ChIP-seq and compared them with those of Drosophila. While the majority of the putative targets of Ubx are species-specific, a considerable number of wing-patterning genes have been retained, over the past 300 million years, as targets in all three species. Interestingly, many of these are differentially expressed only between wing and haltere in Drosophila but not between forewing and hindwing in Apis or Bombyx. Detailed bioinformatics and experimental validation of enhancer sequences suggest that, perhaps along with other factors, changes in the cis-regulatory sequences of earlier targets contribute to diversity in Ubx function.
... While phylogenetic analyses were not part of the web server in EDGAR 1.0, a phylogenetic tree of all available genomes is now calculated by default for all EDGAR projects. For that purpose, EDGAR 2.0 uses the phylogenetic analysis pipeline developed on the basis of the ideas of Zdobnov et al. (11) which was described in the use case in (7). This pipeline analyzes the phylogenetic relationships between genomes based on the thousands of orthologous genes in the complete core genome. ...
Article
Full-text available
The rapidly increasing availability of microbial genome sequences has led to a growing demand for bioinformatics software tools that support the functional analysis based on the comparison of closely related genomes. By utilizing comparative approaches on gene level it is possible to gain insights into the core genes which represent the set of shared features for a set of organisms under study. Vice versa, singleton genes can be identified to elucidate the specific properties of an individual genome. Since initial publication, the EDGAR platform has become one of the most established software tools in the field of comparative genomics. Over the last years, the software has been continuously improved and a large number of new analysis features have been added. For the new version, EDGAR 2.0, the gene orthology estimation approach was newly designed and completely re-implemented. Among other new features, EDGAR 2.0 provides extended phylogenetic analysis features like AAI (Average Amino Acid Identity) and ANI (Average Nucleotide Identity) matrices, genome set size statistics and modernized visualizations like interactive synteny plots or Venn diagrams. Thereby, the software supports a quick and user-friendly survey of evolutionary relationships between microbial genomes and simplifies the process of obtaining new biological insights into their differential gene content. All features are offered to the scientific community via a web-based and therefore platform-independent user interface, which allows easy browsing of precomputed datasets. The web server is accessible at http://edgar.computational.bio.
... However, one relevant fact is clear: not all animal genomes are equal in their evolutionary behavior, with some genomes evolving and rearranging at much higher rates than others. This is most clearly exemplified by comparisons of synteny across animals, which reveal that some species exhibit high (statistically significant) levels of conserved synteny across large evolutionary timescales [e.g., between cnidarians, chordates (Putnam et al., 2007, 2008), some arthropods (Chipman et al., 2014), and lophotrochozoans (Simakov et al., 2013)] whilst other lineages show high rates of rearrangements such that little, if any, conserved synteny can be seen even between members of the same phylum [e.g., tunicates (Denoeud et al., 2010) or some insects (Zdobnov and Bork, 2007)]. Consequently, it is clear that this evolutionary diversity must be taken into account and more homeobox linkage data is required from a taxonomically widespread selection of species in order to distinguish generalities from lineage-specific oddities. ...
Article
Full-text available
The Hox gene cluster has been a major focus in evolutionary developmental biology. This is because of its key role in patterning animal development and widespread examples of changes in Hox genes being linked to the evolution of animal body plans and morphologies. Also, the distinctive organization of the Hox genes into genomic clusters in which the order of the genes along the chromosome corresponds to the order of their activity along the embryo, or during a developmental process, has been a further source of great interest. This is known as collinearity, and it provides a clear link between genome organization and the regulation of genes during development, with distinctive changes marking evolutionary transitions. The Hox genes are not alone, however. The homeobox genes are a large super-class, of which the Hox genes are only a small subset, and an ever-increasing number of further gene clusters besides the Hox are being discovered. This is of great interest because of the potential for such gene clusters to help understand major evolutionary transitions, both in terms of changes to development and morphology as well as evolution of genome organization. However, there is uncertainty in our understanding of homeobox gene cluster evolution at present. This relates to our still rudimentary understanding of the dynamics of genome rearrangements and evolution over the evolutionary timescales being considered when we compare lineages from across the animal kingdom. A major goal is to deduce whether particular instances of clustering are primary (conserved from ancient ancestral clusters) or secondary (reassortment of genes into clusters in lineage-specific fashion). The following summary of the various instances of homeobox gene clusters in animals, and the hypotheses about their evolution, provides a framework for the future resolution of this uncertainty.
... 2004). Our analysis was based on the use of all core genes of a set of 42 genomes to maximize the sequence support for the phylogenetic tree (Zdobnov and Bork, 2007) and used the pipeline provided by the EDGAR software (Blom et al., 2009). According to the phylogenomic analysis, B. amyloliquefaciens is clustered into three taxonomic units which could be considered as 'subspecies' (Figure 1): ...
Preprint
Full-text available
The evolution of insects has been marked by the appearance of key body plan innovations and novel organs that promoted the outstanding ability of this lineage to adapt to new habitats, boosting the most successful radiation in animals. To understand the origin and evolution of these new structures, it is essential to investigate which genes and gene regulatory networks participate in the embryonic development of insects. Great efforts have been made to fully understand, from a gene expression and gene regulation point of view, the development of holometabolous insects, in particular Drosophila melanogaster, with the generation of numerous functional genomics resources and databases. Conversely, how hemimetabolous insects develop, and which dynamics of gene expression and gene regulation control their embryogenesis, are still poorly characterized. Therefore, to provide a new platform to study gene regulation in insects, we generated ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) data for the first time during the development of the mayfly Cloeon dipterum. This newly available resource will allow a better understanding of the dynamics of gene regulation during hemimetabolan embryogenesis, since C. dipterum belongs to the paleopteran order of Ephemeroptera, the sister group to all other winged insects. These new datasets include six different time points of its embryonic development and identify accessible chromatin regions corresponding to both general and stage-specific promoters and enhancers. With these comprehensive datasets, we characterised pronounced changes in accessible chromatin between stages 8 and 10 of embryonic development, which correspond to the transition from the last stages of segmentation to organogenesis and appendage differentiation.
The application of ATAC-seq in mayflies has contributed to identifying the epigenetic mechanisms responsible for embryonic development in hemimetabolous insects and it will provide a fundamental resource to understand the evolution of gene regulation in winged insects.
Article
Full-text available
Novel genes are essential for evolutionary innovations and differ substantially even between closely related species. Recently, multiple studies across many taxa showed that some novel genes arise de novo, that is, from previously noncoding DNA. To characterize the underlying mutations that allowed de novo gene emergence and their order of occurrence, homologous regions must be detected within noncoding sequences in closely related sister genomes. So far, most studies do not detect noncoding homologs of de novo genes because of incomplete assemblies and annotations, and long evolutionary distances separating genomes. Here, we overcome these issues by searching for de novo expressed open reading frames (neORFs), the not-yet fixed precursors of de novo genes that emerged within a single species. We sequenced and assembled genomes with long-read technology and the corresponding transcriptomes from inbred lines of Drosophila melanogaster, derived from seven geographically diverse populations. We found line-specific neORFs in abundance but few neORFs shared by lines, suggesting a rapid turnover. Gain and loss of transcription is more frequent than the creation of ORFs, for example, by forming new start and stop codons. Consequently, the gain of ORFs becomes rate limiting and is frequently the initial step in neORFs emergence. Furthermore, transposable elements (TEs) are major drivers for intragenomic duplications of neORFs, yet TE insertions are less important for the emergence of neORFs. However, highly mutable genomic regions around TEs provide new features that enable gene birth. In conclusion, neORFs have a high birth-death rate, are rapidly purged, but surviving neORFs spread neutrally through populations and within genomes.
Article
Full-text available
Many plants produce chemical defense compounds as protection against antagonistic herbivores. However, how beneficial insects such as pollinators deal with the presence of these potentially toxic chemicals in nectar and pollen is poorly understood. Here, we characterize a conserved mechanism of plant secondary metabolite detoxification in the Hymenoptera, an order that contains numerous highly beneficial insects. Using phylogenetic and functional approaches, we show that the CYP336 family of cytochrome P450 enzymes detoxifies alkaloids, a group of potent natural insecticides, in honeybees and other hymenopteran species that diverged over 281 million years. We linked this function to an aspartic acid residue within the main access channel of CYP336 enzymes that is highly conserved within this P450 family. Together, these results provide detailed insights into the evolution of P450s as a key component of detoxification systems in hymenopteran species and reveal the molecular basis of adaptations arising from interactions between plants and beneficial insects.
Article
The development of deep sequencing technologies has led to the discovery of novel transcripts. Many in silico methods have been developed to assess the coding potential of these transcripts to further investigate their functions. Existing methods perform well on distinguishing majority long noncoding RNAs (lncRNAs) and coding RNAs (mRNAs) but poorly on RNAs with small open reading frames (sORFs). Here, we present DeepCPP (deep neural network for coding potential prediction), a deep learning method for RNA coding potential prediction. Extensive evaluations on four previous datasets and six new datasets constructed in different species show that DeepCPP outperforms other state-of-the-art methods, especially on sORF type data, which overcomes the bottleneck of sORF mRNA identification by improving more than 4.31, 37.24 and 5.89% on its accuracy for newly discovered human, vertebrate and insect data, respectively. Additionally, we also revealed that discontinuous k-mer, and our newly proposed nucleotide bias and minimal distribution similarity feature selection method play crucial roles in this classification problem. Taken together, DeepCPP is an effective method for RNA coding potential prediction.
Chapter
The deployment of next‐generation sequencing approaches has caused a rapid increase in the number of completely sequenced genomes. As one result of this development, it is feasible to analyze not only single genomes, but also large groups of related genomes in a comparative approach. Whole genome sequencing of type strain genomes also holds huge potential for obtaining a higher resolution phylogenetic and taxonomic classification. In the past 9 years, the EDGAR platform has become one of the most established software tools in the field of comparative genomics. During this time, the software has been continuously improved, and a large number of new analysis features have been added. In recent years, the use of EDGAR for core‐genome‐based phylogenomic/taxonomic analysis has become a main application field of the software. With a focus on generating genome sequences of all type strains of prokaryotic species, the basic 16S rRNA gene sequence phylogeny can be significantly extended to a higher resolution core‐genome‐based taxonomy, and lab work intensive DNA–DNA hybridization (DDH) can be replaced by genome‐sequence‐based indices, which reflect the species borders in the same manner as the DDH. The web‐based user interface of EDGAR offers all tools required for phylogenomic inter‐ and intraspecies taxonomic analyses as needed for the proposal of novel species. EDGAR calculates core‐genome‐based phylogenetic trees with neighbor‐joining and maximum‐likelihood methods as well as amino acid identity (AAI) and average nucleotide identity (ANI) matrices. Furthermore, it offers convenient visualization features such as Venn diagrams, synteny plots, and a comparative view of the genomic neighborhood of orthologous genes. Recently, the software was extended to include various new features, such as statistical analyses, replicon grouping options, and second‐level analyses of meta gene sets. 
Thus, the software enables a quick survey of evolutionary relationships and simplifies the process of obtaining new biological insights into the differential gene content of kindred genomes. EDGAR also provides public databases with precomputed projects providing comparative genomics and phylogenomic results. The platform provides 322 genus‐based public databases comprising 8,079 complete genomes. Besides those genus‐based projects, in this article, we present 226 new public projects that are clustered on the family level and use type strains genomes, which also include draft genomes. These new public projects comprise a further 4,400 genomes. EDGAR is free for academic use and funded as a service by the German Network for Bioinformatics Infrastructure – de.NBI. EDGAR is available via the public web server http://edgar.computational.bio.
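The amino acid identity (AAI) matrices mentioned in this abstract can be illustrated with a minimal sketch. It assumes that proteins of each core gene have already been aligned and that gaps are written as "-"; the input layout and the function names `percent_identity` and `aai_matrix` are hypothetical and are not EDGAR's own interface or method, which the abstract does not describe in detail.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def aai_matrix(core_alignments):
    """Average amino acid identity for every genome pair.

    core_alignments maps a core-gene name to {genome name: aligned protein}.
    This input layout is hypothetical, chosen only for the illustration.
    """
    genomes = sorted({g for aln in core_alignments.values() for g in aln})
    matrix = {g: {g: 100.0} for g in genomes}
    for a, b in combinations(genomes, 2):
        # Average only over genes present in both genomes.
        values = [
            percent_identity(aln[a], aln[b])
            for aln in core_alignments.values()
            if a in aln and b in aln
        ]
        mean = sum(values) / len(values) if values else float("nan")
        matrix[a][b] = matrix[b][a] = mean
    return matrix


# Toy data: two core genes in two genomes (made-up sequences).
example = {
    "geneA": {"genome1": "MKV-L", "genome2": "MKVAL"},
    "geneB": {"genome1": "MSTA", "genome2": "MSTG"},
}
print(aai_matrix(example)["genome1"]["genome2"])
```

Averaging per-gene identities gives every core gene equal weight regardless of its length; weighting by alignment length would be an equally defensible choice.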
Article
Several hundred insect genome assemblies are already publicly available, and this total grows on a weekly basis. A major challenge now confronting insect science is how best to use genomic data to improve our understanding of insect biology. We consider a framework for genome analysis based on functional affiliation, that is, groups of genes involved in the same biological process or pathway, and explore how such an approach furthers our understanding of several aspects of insect phenotype. We anticipate that this approach will prove useful for future research across the breadth of insect studies, whatever organism or trait it involves.
• Genome assemblies are an as-yet underutilised resource for understanding phenotypic diversity among insects.
• Examining genes that act together, for instance in metabolic or developmental pathways, can improve understanding of the molecular basis of insect biodiversity.
• Elaboration or simplification of genetic networks, particularly at terminal stages, is common in insect evolution.
• Loss of genes may indicate changes to a pathway or development of alternative mechanisms to maintain it (including complementation by genes of endosymbiotic origin).
• Genetic model organisms can be poorly representative of insects, and species with contrasting phenotypes should be prioritised for genome sequencing.
Article
Diptera (true flies) are among the most diverse holometabolan insect orders and were the first eukaryotic order to have a representative genome fully sequenced. 110 fly species have publicly available genome assemblies and many hundreds of population-level genomes have been generated in the model organisms Drosophila melanogaster and the malaria mosquito Anopheles gambiae. Comparative genomics carried out in a phylogenetic context is illuminating many aspects of fly biology, providing unprecedented insight into variability in genome structure, gene content, genetic mechanisms, and rates and patterns of evolution in genes, populations, and species. Despite the rich availability of genomic resources in flies, there remain many fly lineages to which new genome sequencing efforts should be directed. Such efforts would be most valuable in fly families or clades that exhibit multiple origins of key fly behaviors such as blood feeding, phytophagy, parasitism, pollination, and mycophagy.
Chapter
Full-text available
This review summarizes some major events in the evolution of body plans along the backbone of the arthropod tree, with a special focus on the origin of insects. The incompatibility among recent molecular phylogenies motivates a discussion about possible causes for failures: there is a worrisome lack of information in alignments, which can be visualized with spectra of split-supporting positions, and there are systematic errors occurring even when using correct models in maximum likelihood methods (Kück et al., this book). Currently, these problems cannot be avoided. Combining information from the fossil record and from extant arthropods, the morphology-based evolutionary scenario leads from worm-like stem-lineage arthropods via first euarthropods to the crown group of Mandibulata. The evolution of the mandibulate head is well documented in the Cambrian Orsten fossils. The evolution within crustaceans is also the evolution that leads to characters of the bauplan of myriapods and insects. It is argued that morphologically myriapods do not fit to the base of the mandibulatan tree and that this placement is also not plausible from a paleontological point of view. Available morphological evidence suggests that myriapods are the sister-group to Hexapoda and that tracheates evolved from a marine ancestor that was similar in many ways to Remipedia. In the extant fauna, the Remipedia are the sister-group of Tracheata.
Chapter
Insects are the most diverse and ecologically important group of animals in the animal kingdom, with more than a million species described to date. Whole-genome sequencing, which has revolutionized many areas of biological research, carries significant potential for achieving a deeper understanding of insect development, physiology, and evolution, and for facilitating new biotechnological advances in insect management and bio-control. Comprehensive genome annotation, including not only genes but also regulatory regions, is necessary for realizing the full benefits of this sequencing. However, regulatory element discovery in non-model organisms—the majority of insects—remains a major challenge as most regulatory sequences have diverged past the point of recognition by standard sequence alignment methods, even for relatively closely related species such as flies and mosquitoes. We review here some of the advances made in insect regulatory genomics and the methods and resources available for identifying regulatory elements in well-studied model insects such as Drosophila. We discuss recent efforts to extend these approaches to discovering regulatory elements in evolutionarily diverged non-model species and potential applications of the resulting regulatory data.
Article
Full-text available
Bees are haplodiploid organisms: haploid males develop from unfertilized eggs, diploid females from fertilized eggs. Under haplodiploidy, deleterious mutations are effectively purged by purifying selection on haploid males. Therefore, genetic load and inbreeding depression are low in bees, which allows them to exist in very small populations, and facilitates the colonization of new areas and habitats by single fertilized females. Exceptions caused by distinct modes of genetic sex-determination are discussed. Owing to the purifying selection and the higher rate of genetic drift in small populations, the genetic variation of bees is only one third of the variation of diploid insects. As a consequence, bees have less genetic adaptability to environmental change, for which they compensate by exhibiting higher learning ability and greater behavioural plasticity than many other insect taxa. Most bee species need specific microclimatic conditions to perform the proper flight behaviour to provision their nests with larval food. Energy flow and metabolic rates in flight muscles of bees are among the highest ever measured in animal tissue. The temperature dependence of the enzymes which drive the flight muscle metabolism is therefore of critical importance for the functioning of the system. Mutations which change the thermal tolerance range of one of those enzymes might lead to changing habitat requirements, and parapatric or allochronous population divergence. The fact that bees choose their nesting site very carefully already hints at the critical role that temperature and humidity ranges play in bee development. Experiments show a remarkable dependence of learning ability and behaviour on developmental temperatures. Evolutionary and ecological aspects of social behaviour, social and cleptoparasitism, and flower choice in bees are discussed. Possible paths of population divergence and speciation are pointed out.
The reproduction rate of bees is closer to the rates of primates than to that of other insects. Compared to other insects, bees evolve only slowly.
Article
Full-text available
We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
Article
Full-text available
Prior to the Near Earth Asteroid Rendezvous (NEAR) mission, little was known about Eros except for its orbit, spin rate, and pole orientation, which could be determined from ground-based telescope observations. Radar bounce data provided a rough estimate of the shape of Eros. On December 23, 1998, after an engine misfire, the NEAR-Shoemaker spacecraft flew by Eros on a high-velocity trajectory that provided a brief glimpse of Eros and allowed for an estimate of the asteroid's pole, prime meridian, and mass. This new information, when combined with the ground-based observations, provided good a priori estimates for processing data in the orbit phase.
Article
Full-text available
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic, and statistical refinements permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is described for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities.
Article
Full-text available
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
Article
Full-text available
Ab initio gene identification in the genomic sequence of Drosophila melanogaster was obtained using Fgenes (human gene predictor) and Fgenesh programs that have organism-specific parameters for human, Drosophila, plants, yeast, and nematode. We did not use information about cDNA/EST in most predictions to model a real situation for finding new genes because information about complete cDNA is often absent or based on very small partial fragments. We investigated the accuracy of gene prediction on different levels and designed several schemes to predict an unambiguous set of genes (annotation CGG1), a set of reliable exons (annotation CGG2), and the most complete set of exons (annotation CGG3). For 49 genes, protein products of which have clear homologs in protein databases, predictions were recomputed by Fgenesh+ program. The first annotation serves as the optimal computational description of new sequence to be presented in a database. Reliable exons from the second annotation serve as good candidates for selecting the PCR primers for experimental work for gene structure verification. Our results show that we can identify approximately 90% of coding nucleotides with 20% false positives. At the exon level we accurately predicted 65% of exons and 89% including overlapping exons with 49% false positives. Optimizing accuracy of prediction, we designed a gene identification scheme using Fgenesh, which provided sensitivity (Sn) = 98% and specificity (Sp) = 86% at the base level, Sn = 81% (97% including overlapping exons) and Sp = 58% at the exon level and Sn = 72% and Sp = 39% at the gene level (estimating sensitivity on std1 set and specificity on std3 set). In general, these results showed that computational gene prediction can be a reliable tool for annotating new genomic sequences, giving accurate information on 90% of coding sequences with 14% false positives.
However, exact gene prediction (especially at the gene level) needs additional improvement using gene prediction algorithms. The program was also tested for predicting genes of human Chromosome 22 (the last variant of Fgenesh can analyze the whole chromosome sequence). This analysis has demonstrated that 88% of manually annotated exons in Chromosome 22 were among the ab initio predicted exons. The suite of gene identification programs is available through the WWW server of Computational Genomics Group at http://genomic.sanger.ac.uk/gf.html.
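The sensitivity and specificity figures in this abstract can be read with the convention common in gene-prediction benchmarks, where "specificity" is the fraction of predictions that are correct (elsewhere called precision). The sketch below assumes that convention; the function names and the counts are hypothetical and chosen only for illustration, not taken from the paper.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true items (bases, exons or genes) that were predicted."""
    return tp / (tp + fn)


def specificity(tp: int, fp: int) -> float:
    """Fraction of predicted items that are correct (gene-finding convention)."""
    return tp / (tp + fp)


# Hypothetical base-level counts, for illustration only.
tp, fn, fp = 980, 20, 160
print(f"Sn = {sensitivity(tp, fn):.2f}, Sp = {specificity(tp, fp):.2f}")
```

Under this reading, a high Sn with a lower Sp means that most real coding bases are found but a noticeable share of the predicted bases are false positives.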
Article
TREE-PUZZLE is a program package for quartet-based maximum-likelihood phylogenetic analysis (formerly PUZZLE, Strimmer and von Haeseler, Mol. Biol. Evol., 13, 964-969, 1996) that provides methods for reconstruction, comparison, and testing of trees and models on DNA as well as protein sequences. To reduce waiting time for larger datasets the tree reconstruction part of the software has been parallelized using message passing that runs on clusters of workstations as well as parallel computers. Availability: http://www.tree-puzzle.de. The program is written in ANSI C. TREE-PUZZLE can be run on UNIX, Windows and Mac systems, including Mac OS X. To run the parallel version of PUZZLE, a Message Passing Interface (MPI) library has to be installed on the system. Free MPI implementations are available on the Web (cf. http://www.lam-mpi.org/mpi/implementations/).
Article
Comparison of the genomes and proteomes of the two diptera Anopheles gambiae and Drosophila melanogaster, which diverged about 250 million years ago, reveals considerable similarities. However, numerous differences are also observed; some of these must reflect the selection and subsequent adaptation associated with different ecologies and life strategies. Almost half of the genes in both genomes are interpreted as orthologs and show an average sequence identity of about 56%, which is slightly lower than that observed between the orthologs of the pufferfish and human (diverged about 450 million years ago). This indicates that these two insects diverged considerably faster than vertebrates. Aligned sequences reveal that orthologous genes have retained only half of their intron/exon structure, indicating that intron gains or losses have occurred at a rate of about one per gene per 125 million years. Chromosomal arms exhibit significant remnants of homology between the two species, although only 34% of the genes colocalize in small “microsyntenic” clusters, and major interarm transfers as well as intra-arm shuffling of gene order are detected.
Article
Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.
Article
De novo gene predictors are programs that predict the exon-intron structures of genes using the sequences of one or more genomes as their only input. In the past two years, dual-genome de novo predictors, which exploit local rates and patterns of mutation inferred from alignments between two genomes, have led to significant improvements in accuracy. Systems that exploit more than two genomes simultaneously have only recently begun to appear and are not yet competitive on practical tasks, but offer the greatest hope for near-term improvements. Dual-genome de novo prediction for compact eukaryotic genomes such as those of Arabidopsis thaliana and Caenorhabditis elegans is already quite accurate. Although mammalian gene prediction lags behind in accuracy, it is yielding ever more useful results. Coupled with significant improvements in pseudogene detection methods, which have eliminated many false positives, we have reached the point where de novo gene predictions are being used as hypotheses to drive experimental annotation via systematic RT-PCR and sequencing.
Article
We report a draft sequence for the genome of the domesticated silkworm (Bombyx mori), covering 90.9% of all known silkworm genes. Our estimated gene count is 18,510, which exceeds the 13,379 genes reported for Drosophila melanogaster. Comparative analyses to fruitfly, mosquito, spider, and butterfly reveal both similarities and differences in gene content.
Article
FlyBase (http://flybase.org) is the primary repository of genetic and molecular data of the insect family Drosophilidae. For the most extensively studied species, Drosophila melanogaster, a wide range of data are presented in integrated formats. Data types include mutant phenotypes, molecular characterization of mutant alleles and aberrations, cytological maps, wild-type expression patterns, anatomical images, transgenic constructs and insertions, sequence-level gene models and molecular classification of gene product functions. There is a growing body of data for other Drosophila species; this is expected to increase dramatically over the next year, with the completion of draft-quality genomic sequences of an additional 11 Drosophila species.
Article
Homeotic (Hox) genes are usually clustered and arranged in the same order as they are expressed along the anteroposterior body axis of metazoans. The mechanistic explanation for this colinearity has been elusive, and it may well be that a single and universal cause does not exist. The Hox-gene complex (HOM-C) has been rearranged differently in several Drosophila species, producing a striking diversity of Hox gene organizations. We investigated the genomic and functional consequences of the two HOM-C splits present in Drosophila buzzatii. Firstly, we sequenced two regions of the D. buzzatii genome, one containing the genes labial and abdominal A, and another one including proboscipedia, and compared their organization with that of D. melanogaster and D. pseudoobscura in order to map precisely the two splits. Then, a plethora of conserved noncoding sequences, which are putative enhancers, were identified around the three Hox genes closer to the splits. The position and order of these enhancers are conserved, with minor exceptions, between the three Drosophila species. Finally, we analyzed the expression patterns of the same three genes in embryos and imaginal discs of four Drosophila species with different Hox-gene organizations. The results show that their expression patterns are conserved despite the HOM-C splits. We conclude that, in Drosophila, Hox-gene clustering is not an absolute requirement for proper function. Rather, the organization of Hox genes is modular, and their clustering seems the result of phylogenetic inertia more than functional necessity.
Article
Despite the continuous production of genome sequence for a number of organisms, reliable, comprehensive, and cost effective gene prediction remains problematic. This is particularly true for genomes for which there is not a large collection of known gene sequences, such as the recently published chicken genome. We used the chicken sequence to test comparative and homology-based gene-finding methods followed by experimental validation as an effective genome annotation method. We performed experimental evaluation by RT-PCR of three different computational gene finders, Ensembl, SGP2 and TWINSCAN, applied to the chicken genome. A Venn diagram was computed and each component of it was evaluated. The results showed that de novo comparative methods can identify up to about 700 chicken genes with no previous evidence of expression, and can correctly extend about 40% of homology-based predictions at the 5' end. De novo comparative gene prediction followed by experimental verification is effective at enhancing the annotation of the newly sequenced genomes provided by standard homology-based methods.
Article
The rate of molecular evolution varies widely between proteins, both within and among lineages. To what extent is this variation influenced by genome-wide, lineage-specific effects? To answer this question, we assess the rate variation between insect lineages for a large number of orthologous genes. When compared to the beetle Tribolium castaneum, we find that the stem lineage of flies and mosquitoes (Diptera) has experienced on average a 3-fold increase in the rate of evolution. Pairwise gene comparisons between Drosophila and Tribolium show a high correlation between evolutionary rates of orthologous proteins. Gene specific divergence rates remain roughly constant over long evolutionary times, modulated by genome-wide, lineage-specific effects. Among the insects analysed so far, it appears that the Tribolium genes show the lowest rates of divergence. This has the practical consequence that homology searches for human genes yield significantly better matches in Tribolium than in Drosophila. We therefore suggest that Tribolium is better suited for comparisons between phyla than the widely employed dipterans.
Article
Comparative studies require knowledge of the evolutionary relationships between taxa. However, neither morphological nor paleontological data have been able to unequivocally resolve the major groups of holometabolous insects so far. Here, we utilize emerging genome projects to assemble and analyze a data set of 185 nuclear genes, resulting in a fully resolved phylogeny of the major insect model species. Contrary to the most widely accepted phylogenetic hypothesis, bees and wasps (Hymenoptera) are basal to the other major holometabolous orders, beetles (Coleoptera), moths (Lepidoptera), and flies (Diptera). We validate our results by meticulous examination of potential confounding factors. Phylogenomic approaches are thus able to resolve long-standing questions about the phylogeny of insects.
Article
An efficient means for generating mutation data matrices from large numbers of protein sequences is presented here. By means of an approximate peptide-based sequence comparison algorithm, the set of sequences is clustered at the 85% identity level. The most closely related pairs of sequences are aligned, and observed amino acid exchanges tallied in a matrix. The raw mutation frequency matrix is processed in a similar way to that described by Dayhoff et al. (1978), and so the resulting matrices may be easily used in current sequence analysis applications, in place of the standard mutation data matrices, which have not been updated for 13 years. The method is fast enough to process the entire SWISS-PROT databank in 20 h on a Sun SPARCstation 1, and is fast enough to generate a matrix from a specific family or class of proteins in minutes. Differences observed between our 250 PAM mutation data matrix and the matrix calculated by Dayhoff et al. are briefly discussed.
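The tallying step described in this abstract can be illustrated with a minimal sketch. It assumes two sequences that are already aligned (equal length, "-" for gaps) and counts each exchange symmetrically; the function name and the example sequences are hypothetical, and the clustering, alignment and Dayhoff-style post-processing are not reproduced here.

```python
from collections import Counter


def tally_exchanges(seq_a: str, seq_b: str) -> Counter:
    """Count observed amino acid exchanges between two aligned sequences.

    Gap columns and identical residues are skipped; each exchange is stored
    under a sorted residue pair, so K->R and R->K share one counter entry.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-" or a == b:
            continue
        counts[tuple(sorted((a, b)))] += 1
    return counts


# Hypothetical aligned pair, for illustration only.
print(tally_exchanges("MKVLA", "MRVIA"))
```

Summing these counters over many closely related pairs would give a raw exchange frequency matrix of the kind the abstract refers to.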
Article
Linkage relationships of homologous loci in man and mouse were used to estimate the mean length of autosomal segments conserved during evolution. Comparison of the locations of greater than 83 homologous loci revealed 13 conserved segments. Map distances between the outermost markers of these 13 segments are known for the mouse and range from 1 to 24 centimorgans. Methods were developed for using this sample of conserved segments to estimate the mean length of all conserved autosomal segments in the genome. This mean length was estimated to be 8.1 ± 1.6 centimorgans. Evidence is presented suggesting that chromosomal rearrangements that determine the lengths of these segments are randomly distributed within the genome. The estimated mean length of conserved segments was used to predict the probability that certain loci, such as peptidase-3 and renin, are linked in man given that homologous loci are χ centimorgans apart in the mouse. The mean length of conserved segments was also used to estimate the number of chromosomal rearrangements that have disrupted linkage since divergence of man and mouse. This estimate was shown to be 178 ± 39 rearrangements.
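The assumption that rearrangement breakpoints fall at random along the genome can be illustrated with a small simulation. The sketch below is not the estimation procedure of the paper: it simply drops breakpoints uniformly on a single linear map and measures the resulting segment lengths. The function name, the map length and the number of breakpoints are hypothetical values chosen for illustration.

```python
import random


def random_breakage_segments(map_length: float, n_breaks: int, seed: int = 0) -> list[float]:
    """Place n_breaks uniformly at random on [0, map_length] and return segment lengths."""
    rng = random.Random(seed)
    cuts = sorted(rng.uniform(0.0, map_length) for _ in range(n_breaks))
    edges = [0.0] + cuts + [map_length]
    return [right - left for left, right in zip(edges, edges[1:])]


# Hypothetical map length (in centimorgans) and breakpoint count.
segments = random_breakage_segments(map_length=1600.0, n_breaks=199)
mean_length = sum(segments) / len(segments)
print(len(segments), mean_length)
```

With n breakpoints on one linear map there are n + 1 segments, so the mean segment length is the map length divided by n + 1 regardless of where the breakpoints fall; what the random placement determines is the shape of the length distribution, which for many breakpoints is approximately exponential.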
Article
The use of some multiple-sequence alignments in phylogenetic analysis, particularly those that are not very well conserved, requires the elimination of poorly aligned positions and divergent regions, since they may not be homologous or may have been saturated by multiple substitutions. A computerized method that eliminates such positions and at the same time tries to minimize the loss of informative sites is presented here. The method is based on the selection of blocks of positions that fulfill a simple set of requirements with respect to the number of contiguous conserved positions, lack of gaps, and high conservation of flanking positions, making the final alignment more suitable for phylogenetic analysis. To illustrate the efficiency of this method, alignments of 10 mitochondrial proteins from several completely sequenced mitochondrial genomes belonging to diverse eukaryotes were used as examples. The percentages of removed positions were higher in the most divergent alignments. After removing divergent segments, the amino acid composition of the different sequences was more uniform, and pairwise distances became much smaller. Phylogenetic trees show that topologies can be different after removing conserved blocks, particularly when there are several poorly resolved nodes. Strong support was found for the grouping of animals and fungi but not for the position of more basal eukaryotes. The use of a computerized method such as the one presented here reduces to a certain extent the necessity of manually editing multiple alignments, makes the automation of phylogenetic analysis of large data sets feasible, and facilitates the reproduction of the final alignment by other researchers.
Article
The Wnt genes encode a large family of secreted protein growth factors that have been identified in animals from hydra to humans. In humans, 19 WNT proteins have been identified that share 27% to 83% amino-acid sequence identity and a conserved pattern of 23 or 24 cysteine residues. Wnt genes are highly conserved between vertebrate species sharing overall sequence identity and gene structure, and are slightly less conserved between vertebrates and invertebrates. During development, Wnts have diverse roles in governing cell fate, proliferation, migration, polarity, and death. In adults, Wnts function in homeostasis, and inappropriate activation of the Wnt pathway is implicated in a variety of cancers.
Article
The purpose of this chapter is two-fold. First, all available 18S rDNA sequences for the Holometabola will be compiled to reappraise their phylogenetic relationships. Second, these data and analyses will be used to highlight general problems in using molecular data to infer higher-level phylogeny.
Article
Even with the availability of the genome sequences of many different organisms, we are still left wondering about the definition of a true gene. In their Perspective, Snyder and Gerstein (http://www.sciencemag.org/cgi/content/full/300/5617/258) discuss different criteria that can be used to define what a gene is in the era of genomics.
Article
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/.
Article
We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log‐expectation score, and refinement using tree‐dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T‐Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T‐Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
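The idea of estimating distances by kmer counting, without aligning the sequences first, can be sketched as follows. This is a simplified illustration and not MUSCLE's exact formula: the function names, the choice of k = 3 and the normalization by the shorter sequence are assumptions made for the example.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Return 1 minus the fraction of k-mers shared, relative to the shorter sequence."""
    n_kmers_shortest = min(len(seq_a), len(seq_b)) - k + 1
    if n_kmers_shortest <= 0:
        raise ValueError("sequences must be at least k residues long")
    # Counter intersection keeps the minimum count of each shared k-mer.
    shared = sum((kmer_counts(seq_a, k) & kmer_counts(seq_b, k)).values())
    return 1.0 - shared / n_kmers_shortest


# Hypothetical short peptides, for illustration only.
print(kmer_distance("MKVLA", "MKVIA"))
```

Because only k-mer counts are compared, the cost grows with sequence length rather than with the product of the two lengths, which is what makes this kind of estimate attractive as a fast first pass before progressive alignment.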
Article
Orthologous genes that maintain a single-copy status in a broad range of species may indicate a selection against gene duplication. If this is the case, then duplicates of such genes that do survive may have escaped the dosage control by rapid and sizable changes in their function. To test this hypothesis and to develop a strategy for the identification of novel gene functions, we have analyzed 22 primate-specific intrachromosomal duplications of genes with a single-copy ortholog in all other completely sequenced metazoans. When comparing this set to genes not exposed to the single-copy status constraint, we observed a higher tendency of the former to modify their gene structure, often through complex genomic rearrangements. The analysis of the most dramatic of these duplications, affecting approximately 10% of human Chromosome 2, enabled a detailed reconstruction of the events leading to the appearance of a novel gene family. The eight members of this family originated from the highly conserved nucleoporin RanBP2 by several genetic rearrangements such as segmental duplications, inversions, translocations, exon loss, and domain accretion. We have experimentally verified that at least one of the newly formed proteins has a cellular localization different from RanBP2's, and we show that positive selection did act on specific domains during evolution.
Article
Seven distinct genome-wide divergence measures were applied pairwise to the nine sequenced animal genomes of human, mouse, rat, chicken, pufferfish, fruit fly, mosquito, and two nematode worms (Caenorhabditis briggsae and Caenorhabditis elegans). Qualitatively, all of these divergence measures are found to correlate with the estimated time since speciation; however, marked deviations are observed in a few lineages. The distinct genome divergence measures also correlate well among themselves, indicating that most of the processes shaping genomes are dominated by neutral events. The deviations from the clock-like scenario in some lineages are observed consistently by several measures, implicitly confirming their reliability.
Article
Previous genome comparisons have suggested that one important trend in vertebrate evolution has been a sharp rise in intron abundance. By using genomic data and expressed sequence tags from the marine annelid Platynereis dumerilii, we provide direct evidence that about two-thirds of human introns predate the bilaterian radiation but were lost from insect and nematode genomes to a large extent. A comparison of coding exon sequences confirms the ancestral nature of Platynereis and human genes. Thus, the urbilaterian ancestor had complex, intron-rich genes that have been retained in Platynereis and human.
Nadeau, J.H. and Taylor, B.A. (1984) Lengths of chromosomal segments conserved since divergence of man and mouse. Proc. Natl. Acad. Sci. U.S.A. 81, 814-818