Figure I - uploaded by Sebastian Fraune
Content may be subject to copyright.
Percentages of orphan or taxonomically-restricted genes (TRGs) in 30 animal genomes. Orphan genes comprise about 10-20% of every genome independently of the phylogenetic position of a species. The percentage of orphan genes may vary depending on the selected BLAST search criteria (note the alternative calculations for C. elegans, Anopheles gambiae, C. briggsae and Bos taurus). For example, 11% of proteins in C. briggsae have no significant BLASTP hits with E < 10^-10, but the percentage of non-hits decreases to 4.1% if a cut-off value of E < 10^-5 is used [36]. Species are listed according to the date of publication of the corresponding genome sequence paper. References and information regarding BLAST settings are summarized in Supplementary Table 1.
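The dependence of the orphan percentage on the E-value cut-off can be illustrated with a minimal Python sketch. It assumes that, for each protein, the lowest E-value among its BLASTP hits outside the focal taxon has already been extracted. The function name `orphan_fraction`, the input layout and the toy values are hypothetical and are not taken from the source publication or from any genome.

```python
from typing import Mapping, Optional


def orphan_fraction(best_evalue: Mapping[str, Optional[float]], cutoff: float) -> float:
    """Return the fraction of proteins with no hit below `cutoff`.

    `best_evalue` maps a protein ID to the lowest E-value among its hits
    outside the focal taxon, or to None when no hit was reported at all.
    (Hypothetical input layout, chosen for illustration only.)
    """
    if not best_evalue:
        raise ValueError("no proteins supplied")
    # A protein counts as an orphan when it has no hit, or when its best
    # hit is not significant at the chosen cut-off (E >= cutoff).
    orphans = sum(1 for e in best_evalue.values() if e is None or e >= cutoff)
    return orphans / len(best_evalue)


# Toy data (invented): five proteins and their best E-values.
toy = {
    "p1": 1e-50,
    "p2": 1e-12,
    "p3": 1e-7,
    "p4": 1e-6,
    "p5": None,
}

strict = orphan_fraction(toy, 1e-10)   # p3, p4 and p5 count as orphans
relaxed = orphan_fraction(toy, 1e-5)   # only p5 counts as an orphan
```

In this toy set a stricter cut-off classifies more proteins as orphans, which is the same direction of change as the C. briggsae figures quoted in the caption (11% at E < 10^-10 versus 4.1% at E < 10^-5).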


Source publication
Article
Full-text available
Comparative genome analyses indicate that every taxonomic group so far studied contains 10-20% of genes that lack recognizable homologs in other species. Do such 'orphan' or 'taxonomically-restricted' genes comprise spurious, non-functional ORFs, or does their presence reflect important evolutionary processes? Recent studies in basal metazoans such...

Contexts in source publication

Context 1
... consider the implications of the exclusive presence of some genes in one species or animal group. Because these are still early days for functional studies on TRGs, we focus on examples in basal metazoans (Figure 1) where at least some functional data are already available. Specifically, we discuss recent evidence from Hydra which points to roles for taxonomically-restricted genes in the creation of phylum-specific novelties such as cnidocytes, the generation of morphological diversity, and in the innate defence system. ...
Context 2
... 10-20% of every genome is composed of orphan genes only becomes obvious in the context of comparative genomics which, by definition, is strongly dependent on the availability of reliable datasets. The ideal dataset should Figure 1. Phylogeny of the 'lower' Metazoa. ...
Context 3
... independently of the approach and the phylogenetic position of a species, orphan genes comprise about 10-20% of all genes in a genome. The percentages of orphan genes in 30 published genomes are summarized in Figure I. ...
Context 4
... us imagine that we are interested in the comparative genome analysis of nine species representing three genera (see Figure I). Initial sequencing of three species belonging to three genera identifies three orphan genes (A, B and D). ...
Context 5
... analyses from the 12 Drosophila genomes project [38] revealed that 15-30% of genes do not show any significant sequence similarity to D. melanogaster gene models (see Figure I in Box 1; Figure 2 and Supplementary Table 9 in Ref. [38]), suggesting that they are species-specific. Moreover, 2.5% of genes (n = 296) were not detected at the root of the Drosophila genus and therefore most likely arose de novo within this animal group. ...
Context 7
... and the evolution of a cnidarian-specific structure Cnidaria (corals, jellyfishes and polyps) represent the simplest animals at the tissue level of organization (Figure 1). However, despite their morphological simplicity they also possess what might be one of the most sophisticated and complex of all cell types in the animal kingdom -stinging [67,68]; H. magnipapillata lives close to the surface (Habitat A) whereas H. oligactis occupies a deeper habitat (depicted as Habitat B here). ...
Context 8
... Hydra (Figure 2), three different types of nematocytes (stenoteles, desmonemes and isorhiza) are formed as derivatives of the multipotent interstitial stem cell lineage (reviewed in Ref. [41]). All known Hydra species share similar stenoteles and desmonemes whereas differences in a subtype of isorhiza (holotrichous isorhiza; Figure 2f and j) serve as distinguishing characteristics between different hydra groups [42]. ...
Context 9
... Hydra (Figure 2), three different types of nematocytes (stenoteles, desmonemes and isorhiza) are formed as derivatives of the multipotent interstitial stem cell lineage (reviewed in Ref. [41]). All known Hydra species share similar stenoteles and desmonemes whereas differences in a subtype of isorhiza (holotrichous isorhiza; Figure 2f and j) serve as distinguishing characteristics between different hydra groups [42]. What are the relative contributions of 'conserved' and 'novel' genetic components in generating a phylum-specific structure? ...
Context 10
... comparison between two Hydra species (H. magnipapillata and H. oligactis; Figure 1 and Figure 2a) and the distantly related sea anemone Nematostella vectensis revealed that most of the TRGs identified were restricted to Hydra. Homologs of two genes could be identified in the sea anemone genome but not in any other animals outside Cnidaria [41], suggesting that these two genes represent phylum-specific TRGs. ...
Context 12
... address this question the 5′-flanking region of one of the genes investigated, nb001, was used to generate eGFP+ transgenic animals. A reporter construct with 1 kb of the nb001 5′-flanking region faithfully recapitulated the endogenous expression of nb001 in developing cnidocytes (Figure 2k). Interestingly, the nb001 promoter lacks any detectable conserved cis-regulatory elements, pointing to the existence of novel transcription factor binding sites or even novel taxonomically-restricted transcription factors. ...
Context 13
... together, the invention of a new morphological feature such as the cnidocyte seems to be tightly interlinked with the evolution of TRGs. Different Hydra species preferentially live in different habitats and most likely encounter different types of prey (Figure 2b). Thus, TRGs might contribute to the diversity of structures (cnidocytes) that are adapted for different food sources in a particular environment. ...
Context 14
... example, RNAi knockdown experiments of the epitheliopeptide Hym-301, initially identified by the Hydra peptide project [48], provided hints that the Hym-301 gene is involved in tentacle formation [56]. Unbiased SSH screening for genes that differ in sequence and expression between H. magnipapillata and H. oligactis led to the identification of additional Hym301-like genes and demonstrated a correlation between their expression patterns and the modes of tentacle formation in these two species [57] (Figure 3). The expression of Hym301 in the tentacles of H. oligactis correlates with the formation of two long and functional tentacles prior to the appearance of the other tentacles (Figure 3b and e). ...
Context 15
... SSH screening for genes that differ in sequence and expression between H. magnipapillata and H. oligactis led to the identification of additional Hym301-like genes and demonstrated a correlation between their expression patterns and the modes of tentacle formation in these two species [57] (Figure 3). The expression of Hym301 in the tentacles of H. oligactis correlates with the formation of two long and functional tentacles prior to the appearance of the other tentacles (Figure 3b and e). By contrast, H. magnipapillata and H. vulgaris express Hym301 genes in the tentacle formation zone but not in the tentacles themselves, and four or five short tentacles develop simultaneously (Figure 3a and d). ...
Context 16
... expression of Hym301 in the tentacles of H. oligactis correlates with the formation of two long and functional tentacles prior to the appearance of the other tentacles (Figure 3b and e). By contrast, H. magnipapillata and H. vulgaris express Hym301 genes in the tentacle formation zone but not in the tentacles themselves, and four or five short tentacles develop simultaneously (Figure 3a and d). Overexpression of the Hym301 gene in tentacles of H. vulgaris AEP (in the pattern typical of another species, Hydra oligactis) induced changes in morphology that mirror the phenotypic differences observed between species: asymmetric appearance of tentacles during budding and regeneration (Figure 3c and f) [57]. ...
Context 17
... contrast, H. magnipapillata and H. vulgaris express Hym301 genes in the tentacle formation zone but not in the tentacles themselves, and four or five short tentacles develop simultaneously (Figure 3a and d). Overexpression of the Hym301 gene in tentacles of H. vulgaris AEP (in the pattern typical of another species, Hydra oligactis) induced changes in morphology that mirror the phenotypic differences observed between species: asymmetric appearance of tentacles during budding and regeneration (Figure 3c and f) [57]. ...
Context 18
... propose that differences in Hym301 regulation constitute a principal source of molecular variation that can be acted on by natural selection. The way in which natural selection fine-tunes the expression of Hym301 genes or their gene regulators remains to be elucidated (Figure 4). ...
Context 19
... might appear and continuously evolve to mediate these lineage-specific adaptations. When changes in ecological settings favor an individual that is different from the mean individual in a population, TRGs can have a critical effect on adaptation towards the new optimum (Figure 5). In this scenario, over evolutionary time, TRGs seem to have modified major components of the innate immune defence system, have provided important structural components of evolutionary novelties, and have been responsible for the generation of morphological diversity within closely related species. ...
Context 20
... are at least two possible scenarios. First, TRGs might be created Figure 5. A brief outline of landmark processes involved in the emergence of evolutionary novelties. ...
Context 21
... changes, gene duplications and the emergence of genes de novo). Figure 4. TRGs might help to cope with new ecological niches and changing environments. ...
Context 22
... emergence of evolutionary novelties appears to stem from several processes that act simultaneously and cooperatively: cis-regulatory changes, gene duplications, and the emergence of genes de novo (Figure 5). Our understanding of the evolutionary significance of TRGs does not overturn a general consensus concerning the role of deep homology and conserved transcription factors in generating diverse adaptations and in the evolution of novelties. ...


Citations

... This is the latest sequencing technology available and greatly improves the continuity, accuracy and completeness of genome assembly compared to other sequencing technologies, and this is the first genome for mangrove plant using HIFI sequencing. The size of our assembled genome LSGs, also called new genes or orphan genes, with essential roles in driving life processes such as speciation, differentiation, phenotypic characteristics, and adaptation to new environments (Khalturin et al. 2009;Kaessmann 2010;Long et al. 2003). In corals and Daphnia pulex, there are a large number of LSGs that are closely related to adaptation to environmental changes (Voolstra et al. 2011;Colbourne et al. 2011). ...
Article
Full-text available
Main conclusion Whole-genome duplication, gene family and lineage-specific gene analyses based on a high-quality genome reveal the adaptation mechanisms of Avicennia marina to coastal intertidal habitats. Abstract Mangrove plants grow in a complex habitat of coastal intertidal zones with high salinity, hypoxia, etc. Therefore, it is an interesting question how mangroves adapt to the unique intertidal environment. Here, we present a chromosome-level genome of Avicennia marina, a typical true mangrove, with a size of 480.43 Mb, contig N50 of 11.33 Mb and 30,956 annotated protein-coding genes. We identified 621 Avicennia-specific genes that are mainly related to flavonoid and lignin biosynthesis, auxin homeostasis and response to abiotic stimulus. We found that A. marina underwent a novel specific whole-genome duplication, which is in line with a brief era of global warming that occurred during the Paleocene–Eocene maximum. Comparative genomic and transcriptomic analyses outline the distinct evolution and sophisticated regulation of A. marina adaptation to intertidal environments, including expansion of photosynthesis and oxidative phosphorylation gene families, and unique genes and pathways for antibacterial, detoxifying, antioxidant and reactive oxygen species scavenging functions. In addition, we also analyzed salt gland secretion-related genes and those involved in red bark-related flavonoid biosynthesis, with significant expansions of key genes such as NHX, 4CL, CHS and CHI. High-quality genomes in future investigations will facilitate the understanding of mangrove evolution and improve breeding.
... Taxonomically restricted or orphan genes (OGs), which have no homology to genes in other taxa, may contribute to evolutionary novelties and might be responsible for lineage-specific trait origins (Wilson et al., 2005;Khalturin et al., 2009;Tautz and Domazet-Lošo, 2011). Previous comparative genomic studies have estimated that OGs constitute at least 1% of the total genes in a genome, depending on the alignment rate and taxonomic level considered (Khalturin et al., 2009;Arendsee et al., 2014;Prabh and Rödelsperger, 2016). ...
... Taxonomically restricted or orphan genes (OGs), which have no homology to genes in other taxa, may contribute to evolutionary novelties and might be responsible for lineage-specific trait origins (Wilson et al., 2005;Khalturin et al., 2009;Tautz and Domazet-Lošo, 2011). Previous comparative genomic studies have estimated that OGs constitute at least 1% of the total genes in a genome, depending on the alignment rate and taxonomic level considered (Khalturin et al., 2009;Arendsee et al., 2014;Prabh and Rödelsperger, 2016). Comparative genomics applied to investigate OGs has demonstrated that these genes have shorter lengths and intron sizes and that they have higher levels of transposable elements than genes broadly shared across green plants (Tautz and Domazet-Lošo, 2011). ...
Preprint
Full-text available
Orphan genes (OGs) are protein-coding genes that are restricted to particular clades or species and lack homology with genes from other organisms, making their biological function difficult to predict. OGs can rapidly originate and become functional; consequently, they may support rapid adaptation to environmental changes. Extensive spread of mobile elements, and whole genome duplication, occurred in the Saccharum group, which may have contributed to the origin and diversification of OGs in the sugarcane genome. Here, we identified and characterized OGs in sugarcane, examined their expression profiles across tissues and genotypes, and investigated their regulation under varying conditions. We identified 319 OGs in the Saccharum spontaneum genome without detected homology to protein-coding genes in green plants, except those belonging to Saccharinae. Transcriptomic analysis showed 288 sugarcane OGs with detectable expression levels in at least one tissue or genotype. We observed similar expression patterns of OGs in sugarcane genotypes originating from the closest geographical locations. We also observed tissue-specific expression of some OGs, possibly indicating a complex regulatory process for maintaining diverse functional activity of these genes across sugarcane tissues and genotypes. Sixty-six OGs were differentially expressed under stress conditions, especially cold and osmotic stresses. Gene co-expression network and functional enrichment analyses suggested that sugarcane OGs may be involved in several biological mechanisms, including stimulus response and defence mechanisms. These findings provide a valuable genomic resource for sugarcane researchers, especially those interested in selecting stress-responsive genes.
... In this review we will describe a special kind of exception to the shared toolkit: conserved clade-specific genes and their role in the evolution of novelty. As full genome sequences from diverse organisms have accumulated, it has become clear that many genes are not shared across clades, and these novel genes are found at all phylogenetic levels, from species to phyla, to super-phyletic groups (reviewed in, for instance, Khalturin et al. 2009;Kaessmann 2010;Tautz and Domazet-Loso 2011). We refer to such genes as clade-specific genes, which is equivalent to the term lineage-specific genes. ...
... The importance of novel proteins in cnidocyte structure and function has long been appreciated (e.g. Kurz et al. 1991;Koch et al. 1998), and this cell type has become a leading model for understanding the role of novel proteins in a complex novelty (reviewed in Khalturin et al. 2009;Babonis and Martindale 2014). Several recent studies have systematically looked for cnidocyte-specific genes, and found that many are restricted to cnidarians. ...
Article
Full-text available
Clade-specific (a.k.a. lineage-specific) genes are very common and found at all taxonomic levels and in all clades examined. They can arise by duplication of previously existing genes, which can involve partial truncations or combinations with other protein domains or regulatory sequences. They can also evolve de novo from non-coding sequences, leading to potentially truly novel protein domains. Finally, since clade-specific genes are generally defined by lack of sequence homology with other proteins, they can also arise by sequence evolution that is rapid enough that previous sequence homology can no longer be detected. In such cases, where the rapid evolution is followed by constraint, we consider them to be ontologically non-novel but likely novel at a functional level. In general, clade-specific genes have received less attention from biologists but there are increasing numbers of fascinating examples of their roles in important traits. Here we review some selected recent examples, and argue that attention to clade-specific genes is an important corrective to the focus on the conserved developmental regulatory toolkit that has been the habit of evo-devo as a field. Finally, we discuss questions that arise about the evolution of clade-specific genes, and how these might be addressed by future studies. We highlight the hypothesis that clade-specific genes are more likely to be involved in synapomorphies that arose in the stem group where they appeared, compared to other genes.
... It is important to highlight that many genes putatively playing a key role in the development of the bracts of the two species have not been annotated (Additional file 2: Table S1, Additional file 4: Table S3), which means that they have no similarities or detectable homologs in other lineages [88,89]. Two factors could explain this, on the one hand, it could be due to the limited number of genomes currently available for gymnosperms and on the other hand that, these genes could be species specific, or taxonomically restricted genes [90][91][92]. Taxonomically restricted genes are important for the development of specific novelties, generating morphological diversity [90]. Thus, further studies to properly annotate these 'orphan genes' are important to understand the unique bract development in Ephedra. ...
... Two factors could explain this, on the one hand, it could be due to the limited number of genomes currently available for gymnosperms and on the other hand that, these genes could be species specific, or taxonomically restricted genes [90][91][92]. Taxonomically restricted genes are important for the development of specific novelties, generating morphological diversity [90]. Thus, further studies to properly annotate these 'orphan genes' are important to understand the unique bract development in Ephedra. ...
Article
Full-text available
Background Gnetales have a key phylogenetic position in the evolution of seed plants. Among the Gnetales, there is an extraordinary morphological diversity of seeds, the genus Ephedra, in particular, exhibits fleshy, coriaceous or winged (dry) seeds. Despite this striking diversity, its underlying genetic mechanisms remain poorly understood due to the limited studies in gymnosperms. Expanding the genomic and developmental data from gymnosperms contributes to a better understanding of seed evolution and development. Results We performed transcriptome analyses on different plant tissues of two Ephedra species with different seed morphologies. Anatomical observations in early developing ovules, show that differences in the seed morphologies are established early in their development. The transcriptomic analyses in dry-seeded Ephedra californica and fleshy-seeded Ephedra antisyphilitica, allowed us to identify the major differences between the differentially expressed genes in these species. We detected several genes known to be involved in fruit ripening as upregulated in the fleshy seed of Ephedra antisyphilitica. Conclusions This study allowed us to determine the differentially expressed genes involved in seed development of two Ephedra species. Furthermore, the results of this study of seeds with the enigmatic morphology in Ephedra californica and Ephedra antisyphilitica, allowed us to corroborate the hypothesis which suggest that the extra envelopes covering the seeds of Gnetales are not genetically similar to integument. Our results highlight the importance of carrying out studies on less explored species such as gymnosperms, to gain a better understanding of the evolutionary history of plants.
... They are also known as lineage-specific genes, taxonomically restricted genes, species-specific genes, and de novo originated new genes (Xiao et al., 2009;Yang et al., 2009;Tautz and Domazet-Loso, 2011). Identification and functional study of the orphan genes is fundamentally important for understanding the origin of new species, species-specific morphological features, and evolution of immune systems (Khalturin et al., 2009;Chen et al., 2013). However, in-depth study of lineage-specific orphan genes is challenging because it requires high-quality reference genomes for closely related species in a genus (Zhang et al., 2019). ...
Article
Full-text available
All genomes carry lineage-specific orphan genes lacking homology in their closely related species. Identification and functional study of the orphan genes is fundamentally important for understanding lineage-specific adaptations including acquirement of resistance to pathogens. However, most orphan genes are of unknown function due to the difficulties in studying them using helpful comparative genomics. Here, we present a defense-related Oryza-specific orphan gene, Xio1, specifically induced by the bacterial pathogen Xanthomonas oryzae pv. oryzae (Xoo) in an immune receptor XA21-dependent manner. Salicylic acid (SA) and ethephon (ET) also induced its expression, but methyl jasmonic acid (MeJA) reduced its basal expression. C-terminal green fluorescent protein (GFP) tagged Xio1 (Xio1-GFP) was visualized in the nucleus and the cytosol after polyethylene glycol (PEG)-mediated transformation in rice protoplasts and Agrobacterium-mediated infiltration in tobacco leaves. Transgenic rice plants overexpressing Xio1-GFP showed significantly enhanced resistance to Xoo with reduced lesion lengths and bacterial growth, in company with constitutive expression of defense-related genes. However, all of the transgenic plants displayed severe growth retardation and premature death. Reactive oxygen species (ROS) was significantly produced in rice protoplasts constitutively expressing Xio1-GFP. Overexpression of Xio1-GFP in a non-Oryza plant species, Arabidopsis thaliana, failed to induce growth retardation and enhanced resistance to Pseudomonas syringae pv. tomato (Pst) DC3000. Our results suggest that the defense-related orphan gene Xio1 plays an important role in distinctive mechanisms evolved within the Oryza and provides a new source of Oryza-specific genes for crop-breeding programs.
... Such proteins, also known as orphans, comprise about 10-20% of animal genomes, and are suggested to be involved in the evolution of species-specific adaptive traits. 12 In other cases, the obtained rates profile is incomplete due to missing data, i.e. positions in the MSA that only have a few un-gapped amino-acids. Such local divergences in sequence can result from clade-specific insertion events, which act as major drivers of evolution in various species. ...
Article
Measuring evolutionary rates at the residue level is indispensable for gaining structural and functional insights into proteins. State-of-the-art tools for estimating rates take as input a large set of homologous proteins, a probabilistic model of evolution and a phylogenetic tree. However, a gap exists when only few or no homologous proteins can be found, e.g., orphan proteins. In addition, such tools do not take the three-dimensional (3D) structure of the protein into account. The association between the 3D structure and site-specific rates can be learned using machine-learning regression tools from a cohort of proteins for which both the structure and a large set of homologs exist. Here we present EvoRator, a user-friendly web server that implements a machine-learning regression algorithm to predict site-specific evolutionary rates from protein structures. We show that EvoRator outperforms predictions obtained using traditional physicochemical features, such as relative solvent accessibility and weighted contact number. We also demonstrate the application of EvoRator in three common scenarios that arise in protein evolution research: (1) orphan proteins for which no (or few) homologs exist; (2) When homologous sequences exist, our algorithm contrasts structure-based estimates of the evolutionary rates and the phylogeny-based estimates. This allows detecting sites that are likely conserved due to functional rather than structural constraints; (3) Algorithms that only rely on homologous sequence often fail to accurately measure the evolutionary rates of positions in gapped sequence alignments, which frequently occurs as a result of a clade-specific insertion. Our algorithm makes use of training data and known 3D structure of such gapped positions to predict their evolutionary rates. EvoRator is freely available for all users at: https://evorator.tau.ac.il/
... Several well characterized mechanisms like gene duplication, gene fusion, and horizontal gene transfer are responsible for the birth of new genes [41]. These new genes in turn contribute to species specific processes and generate morphological and physiological diversity [42]. Although non-deterministic processes produce genetic variation (on which natural selection acts), many adaptive traits can be exapted through modifications of already pre-existing characters [43]. ...
Article
Full-text available
Background Evolution can occur with surprising predictability when organisms face similar ecological challenges. For most traits, it is difficult to ascertain whether this occurs due to constraints imposed by the number of possible phenotypic solutions or because of parallel responses by shared genetic and regulatory architecture. Exceptionally, oral venoms are a tractable model of trait evolution, being largely composed of proteinaceous toxins that have evolved in many tetrapods, ranging from reptiles to mammals. Given the diversity of venomous lineages, they are believed to have evolved convergently, even though biochemically similar toxins occur in all taxa. Results Here, we investigate whether ancestral genes harbouring similar biochemical activity may have primed venom evolution, focusing on the origins of kallikrein-like serine proteases that form the core of most vertebrate oral venoms. Using syntenic relationships between genes flanking known toxins, we traced the origin of kallikreins to a single locus containing one or more nearby paralogous kallikrein-like clusters. Additionally, phylogenetic analysis of vertebrate serine proteases revealed that kallikrein-like toxins in mammals and reptiles are genetically distinct from non-toxin ones. Conclusions Given the shared regulatory and genetic machinery, these findings suggest that tetrapod venoms evolved by co-option of proteins that were likely already present in saliva. We term such genes ‘toxipotent’—in the case of salivary kallikreins they already had potent vasodilatory activity that was weaponized by venomous lineages. Furthermore, the ubiquitous distribution of kallikreins across vertebrates suggests that the evolution of envenomation may be more common than previously recognized, blurring the line between venomous and non-venomous animals.
... The general routes by which new genes originate include exon shuffling, gene duplication and subsequent divergence, retroposition, the transposition of mobile elements, lateral gene transfer, gene fusion and fission, and de novo origination (Long et al. 2003). De novo originated genes in a genome are usually present in the form of orphan genes (Khalturin et al. 2009). Orphan genes (or taxonomically restricted genes) are phylogenetically restricted, without detectable sequence similarity in the genomes of other organisms and do not encode any previously identified protein domains (Khalturin et al. 2009). ...
... De novo originated genes in a genome are usually present in the form of orphan genes (Khalturin et al. 2009). Orphan genes (or taxonomically restricted genes) are phylogenetically restricted, without detectable sequence similarity in the genomes of other organisms and do not encode any previously identified protein domains (Khalturin et al. 2009). Orphan genes generally have no introns, encode small proteins, evolve more rapidly than other genes in the same genome, and are more likely than non-orphan genes to be expressed under environmental pressure (Guo et al. 2007). ...
Article
Background The rice (Oryza sativa) gene Xa7 has been hypothesized to be a typical executor resistance gene against Xanthomonas oryzae pv. oryzae (Xoo), and has conferred durable resistance in the field for decades. Its identity and the molecular mechanisms underlying this resistance remain elusive.
Results Here, we filled gaps in the genome at the Xa7 mapping locus via BAC library construction, revealing the presence of a 100-kb non-collinear sequence in the line IRBB7 compared with Nipponbare reference genomes. Complementary transformation with sequentially overlapping subclones of the BACs demonstrated that Xa7 is an orphan gene, encoding a small novel protein distinct from any other resistance proteins reported. A 27-bp effector binding element (EBE) in the Xa7 promoter is essential for AvrXa7-inducing expression model. XA7 is anchored in the endoplasmic reticulum membrane and triggers programmed cell death in rice and tobacco (Nicotiana benthamiana). The Xa7 gene is absent in most cultivars, landraces, and wild rice accessions, but highly similar homologs of XA7 were identified in Leersia perrieri, the nearest outgroup of the genus Oryza.
Conclusions Xa7 acts as a trap to perceive AvrXa7 via EBE AvrXa7 in its promoter, leading to the initiation of the resistance reaction. Since EBE AvrXa7 is ubiquitous in the promoter of the rice susceptibility gene SWEET14, the elevated expression of which is conducive to the proliferation of Xoo, this lends a great benefit to the Xoo strains retaining AvrXa7. As a result, varieties harboring Xa7 would show more durable resistance in the field. Analysis of Xa7 alleles suggests that the discovery of new resistance genes could be extended beyond wild rice, to include wild grasses such as Leersia species.
... Lineage-specific genes (LSGs) were first systematically proposed in 1996, when the sequencing of the yeast genome was completed (Dujon 1996). Sequences that cannot be found in any other species by homology search are called the LSGs of that species, sometimes also orphan genes or ORFans (Fischer and Eisenberg 1999); they typically make up 10-20% of all genes in a genome (Khalturin et al. 2009). Since a growing number of genomes have been mapped, LSGs are found in all organisms, including plants (Campbell et al. 2007; Yang et al. 2009; Lin et al. 2010, 2014; Donoghue et al. 2011), microorganisms (Wilson et al. 2005; Fischer 2006, 2008), insects (Domazet-Loso and Tautz 2003; Zhang et al. 2007; Johnson and Tsutsui 2011; Nicola et al. 2014) and primates (Toll-Riera and Albà 2009; Tay et al. 2009; Lindskog et al. 2014; Zhang 2014). ...
... Since a growing number of genomes have been mapped, LSGs are found in all organisms, including plants (Campbell et al. 2007; Yang et al. 2009; Lin et al. 2010, 2014; Donoghue et al. 2011), microorganisms (Wilson et al. 2005; Fischer 2006, 2008), insects (Domazet-Loso and Tautz 2003; Zhang et al. 2007; Johnson and Tsutsui 2011; Nicola et al. 2014) and primates (Toll-Riera and Albà 2009; Tay et al. 2009; Lindskog et al. 2014; Zhang 2014). The functions of most LSGs are unknown, but they are often related to species-specific biological characteristics and adaptations to the environment (Khalturin et al. 2009). High-quality annotated plant genome sequencing projects provide an unprecedented opportunity to explore the role of LSGs in plant adaptation to the environment. ...
... LSGs acquire new biological functions when subjected to external environmental pressures during evolution, and the new functions make the species more adaptable to the external environment and are preserved, and become important genes during the evolutionary process (Long et al. 2003;Khalturin et al. 2009). The orphan gene QQS in Arabidopsis thaliana, for example, controls metabolic pathways that affect the separation of carbon and nitrogen in plants, thereby influencing the component ratios of protein to carbohydrate content in leaves (Li et al. 2015). ...
Article
Lineage-specific genes (LSGs) are genes that have no recognizable homology to any sequences in other species; they are important drivers of new functions and phenotypic changes, and they facilitate species adaptation to the environment. Aegiceras corniculatum is one of the major mangrove plant species adapted to waterlogging and saline conditions, and the exploration of aegiceras-specific genes (ASGs) is important to reveal its adaptation to the harsh environment. Here, we performed a systematic analysis of ASGs, focusing on their sequence characterization, origination and expression patterns. Our results reveal that there are 4823 ASGs in the genome, approximately 11.84% of all protein-coding genes. A high proportion (45.78%) of ASGs originate from gene duplication, and the timing of these duplications is consistent with the two whole-genome duplication (WGD) events that occurred in A. corniculatum, and also coincides with a short period of global warming during the Paleocene–Eocene Thermal Maximum (PETM, 55.5 million years ago). Gene structure analysis showed that ASGs have shorter protein lengths, fewer exons, and a higher isoelectric point. Expression pattern analysis showed that ASGs had low levels of expression and more tissue-specific expression. Weighted gene co-expression network analysis (WGCNA) revealed that 86 ASGs co-expressed gene modules were primarily involved in pathways related to adversity stress, including plant hormone signal transduction, phenylpropanoid biosynthesis, photosynthesis, peroxisome and pentose phosphate pathway. This study provides a comprehensive analysis of the characteristics and potential functions of ASGs and identifies key candidate genes, which will contribute to further investigation of the adaptation of A. corniculatum to intertidal coastal wetland habitats.
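The homology-based definition of lineage-specific genes used in this abstract can be illustrated with a small sketch. It assumes that similarity-search results have already been reduced to (query, species, E-value) records; the `Hit` record, the function name, and the toy data are hypothetical and are not taken from any of the cited studies. It also shows why the reported share of such genes depends on the chosen E-value cut-off.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One similarity-search hit (hypothetical, simplified record)."""
    query: str      # protein from the focal species
    species: str    # species in which the hit was found
    evalue: float   # E-value reported by the search tool


def lineage_specific(queries: Iterable[str], hits: Iterable[Hit],
                     focal_species: str, evalue_cutoff: float) -> set:
    """Return queries with no hit below the cutoff outside the focal species."""
    with_homolog = {
        h.query
        for h in hits
        if h.species != focal_species and h.evalue < evalue_cutoff
    }
    return set(queries) - with_homolog


# Toy data, invented purely for illustration.
queries = ["p1", "p2", "p3", "p4"]
hits = [
    Hit("p1", "species_B", 1e-50),  # clear homolog in another species
    Hit("p2", "species_B", 1e-7),   # weak hit: passes only the looser cutoff
    Hit("p3", "species_A", 1e-80),  # hit within the focal species, ignored
]

strict = lineage_specific(queries, hits, "species_A", 1e-10)
loose = lineage_specific(queries, hits, "species_A", 1e-5)
```

With the stricter cut-off, `p2` is counted as lineage-specific; with the looser one it is not, so the estimated proportion shrinks as the cut-off is relaxed.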
... Human genes in GenAge are sequence orthologs of aging-related genes in model species. Because not all human genes have sequence orthologs in model species [32,33], using GenAge may miss aspects of the aging process that are unique to the human species [34]. So, we consider another source of aging-related knowledge obtained by studying the human species directly (rather than doing so indirectly from model species), namely the down-regulated aging-related genes from genotype-tissue expression (GTEx) project [8], i.e., GTEx-DAG. ...
Article
Background This study focuses on the task of supervised prediction of aging-related genes from -omics data. Unlike gene expression methods for this task that capture aging-specific information but ignore interactions between genes (i.e., their protein products), or protein–protein interaction (PPI) network methods for this task that account for PPIs but the PPIs are context-unspecific, we recently integrated the two data types into an aging-specific PPI subnetwork, which yielded more accurate aging-related gene predictions. However, a dynamic aging-specific subnetwork did not improve prediction performance compared to a static aging-specific subnetwork, despite the aging process being dynamic. This could be because the dynamic subnetwork was inferred using a naive Induced subgraph approach. Instead, we recently inferred a dynamic aging-specific subnetwork using a methodologically more advanced notion of network propagation (NP), which improved upon Induced dynamic aging-specific subnetwork in a different task, that of unsupervised analyses of the aging process.
Results Here, we evaluate whether our existing NP-based dynamic subnetwork will improve upon the dynamic as well as static subnetwork constructed by the Induced approach in the considered task of supervised prediction of aging-related genes. The existing NP-based subnetwork is unweighted, i.e., it gives equal importance to each of the aging-specific PPIs. Because accounting for aging-specific edge weights might be important, we additionally propose a weighted NP-based dynamic aging-specific subnetwork. We demonstrate that a predictive machine learning model trained and tested on the weighted subnetwork yields higher accuracy when predicting aging-related genes than predictive models run on the existing unweighted dynamic or static subnetworks, regardless of whether the existing subnetworks were inferred using NP or the Induced approach.
Conclusions Our proposed weighted dynamic aging-specific subnetwork and its corresponding predictive model could guide with higher confidence than the existing data and models the discovery of novel aging-related gene candidates for future wet lab validation.
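The abstract does not spell out how the Induced approach works. Assuming it refers to the standard graph-theoretic induced subgraph (keep only those interactions whose two proteins are both in a chosen set, for example the genes active at a given age), a minimal sketch could look as follows. The function name, the edge list, and the active set are hypothetical and serve only as illustration.

```python
def induced_subgraph(edges, active_nodes):
    """Keep only the edges whose two endpoints are both in active_nodes."""
    active = set(active_nodes)
    return {(u, v) for u, v in edges if u in active and v in active}


# Toy PPI network and toy set of active genes, invented for illustration.
ppi_edges = {("A", "B"), ("B", "C"), ("C", "D")}
active_at_age = {"A", "B", "D"}

subnetwork = induced_subgraph(ppi_edges, active_at_age)
```

In this toy case only the A-B interaction survives: D is active but none of its partners are. Dropping every edge with an inactive endpoint is one way such an approach could be considered naive, whereas propagation-based methods spread information along the network before choosing what to keep.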