Hindawi Publishing Corporation
Comparative and Functional Genomics
Volume 2011, Article ID 680673, 8 pages
doi:10.1155/2011/680673
Research Article
The Roles and Evolutionary Patterns of Intronless
Genes in Deuterostomes
Ming Zou,1,2 Baocheng Guo,3,4 and Shunping He1
1The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology,
Chinese Academy of Sciences, Wuhan 430072, China
2Institute of Hydrobiology, Graduate University of the Chinese Academy of Sciences, Beijing 100039, China
3Institute of Evolutionary Biology and Environmental Studies, University of Zurich, 8057 Zurich, Switzerland
4The Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015 Lausanne, Switzerland
Correspondence should be addressed to Shunping He, heshunping@gmail.com
Received 23 July 2010; Revised 13 April 2011; Accepted 22 June 2011
Academic Editor: J. Peter W. Young
Copyright © 2011 Ming Zou et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Genes without introns are a characteristic feature of prokaryotes, but a number of intronless genes also exist in eukaryotes.
Studying these eukaryotic genes with prokaryotic architecture could help to clarify the evolutionary patterns of related
genes and genomes. Our analyses revealed a number of intronless genes in 6 deuterostomes (sea urchin, sea squirt,
zebrafish, chicken, platypus, and human). We also determined the conservation of each intronless gene in archaea, bacteria,
fungi, plants, metazoans, and other eukaryotes. The proportions of intronless genes inherited from the common ancestor
of archaea, bacteria, and eukaryotes in these species were consistent with their phylogenetic positions, with higher proportions
of ancient intronless genes residing in more primitive species. In these species, intronless genes are assigned to different cellular roles
and gene ontology (GO) categories, and some of these functions are very basic. Some intronless genes are derived from other
intronless genes or from multiexon genes in each species. In conclusion, we showed that a varying number and proportion of intronless
genes reside in these 6 deuterostomes, and some of them have important functions. These genes are good candidates for subsequent
functional and evolutionary analyses.
1. Introduction
Most eukaryotic genes are interrupted by one or more noncoding sequences called introns, whereas intronless genes are a characteristic feature of prokaryotes. However, studies of intronless genes in eukaryotes have been reported over the past few decades [1–4]. Many human genes, such as G protein-coupled receptor genes, are intronless [5], and the human genome report identified 901 predicted intronless genes [6]. Recently, Tay et al. found that many single-copy primate-specific human transcriptional units are single exon [7]. Moreover, Yang et al. found that species-specific genes in Arabidopsis, Oryza, and Populus are enriched with intronless genes [8]. A retrogene, which is formed by homologous recombination between the genomic copy of a gene and a cDNA [9], is also considered to be intronless, and many retrogenes have been reported in eukaryotic genomes [10–12]. Because of their prokaryotic architecture, intronless genes in eukaryotes provide interesting datasets for comparative genomics and evolutionary studies, and studying them can help to clarify the evolutionary patterns of related genes and genomes. As a result, systematic studies of intronless genes in many species, from mammals to plants, have been reported [13–18]. Several databases of these single exon genes, such as SEGE [19] and Genome SEGE [20], have been set up and are useful for evolutionary and functional studies. However, earlier evolutionary studies of intronless genes have usually been limited to 1 or 2 species, and studies within a phylogenetic framework are rare. With the development of sequencing technology, more and more complete genomes have been sequenced and annotated, which makes a comprehensive comparative analysis of intronless genes possible. The present study was designed to identify and analyse intronless genes in 6 deuterostomes, sea urchin (Strongylocentrotus purpuratus), sea squirt (Ciona intestinalis), zebrafish (Danio rerio), chicken (Gallus gallus), platypus (Ornithorhynchus anatinus), and human (Homo sapiens), which were selected because of their pivotal phylogenetic positions. We compared the functions and conservation of these genes between and within species in an attempt to gain evolutionarily meaningful insights.
2. Materials and Methods
2.1. Data Source of Intronless Genes. The annotated genomes (GenBank Flat File Format) of sea urchin, sea squirt, zebrafish, chicken, platypus, and human were downloaded from the NCBI ftp server (ftp://ftp.ncbi.nih.gov/genomes/, 10 June 2009). Using a customized Perl script, we extracted protein sequences for all intron-containing and intronless genes from each annotated genome. During processing, a gene was classified as intron-containing if the “CDS” line in the FEATURES table contained a “join”; otherwise, it was classified as intronless. Proteins encoded by mitochondrial genomes were removed. To avoid ambiguity, proteins encoded by genes with the symbol “<” or “>” in their annotation (“<” indicates a partial sequence at the 5′ end and “>” indicates a partial sequence at the 3′ end) were also discarded.
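The classification rule described above can be sketched in a few lines. This is only an illustration in Python, not the authors' Perl script: the function name classify_cds and the example location strings are assumptions, and a real pipeline would read the locations from the CDS lines of the GenBank flat files.

```python
def classify_cds(location: str) -> str:
    """Classify one GenBank CDS location string (hypothetical helper).

    Returns 'partial', 'intron-containing', or 'intronless'.
    """
    loc = location.strip()
    # '<' or '>' marks a feature that is partial at the 5' or 3' end;
    # genes annotated this way were discarded in the study.
    if "<" in loc or ">" in loc:
        return "partial"
    # A 'join' operator means the CDS is assembled from several segments.
    if "join" in loc:
        return "intron-containing"
    return "intronless"


# Made-up location strings, for illustration only.
examples = [
    "1021..2340",
    "join(100..250,400..720)",
    "complement(<5..310)",
    "complement(join(10..90,150..400))",
]
labels = [classify_cds(loc) for loc in examples]
```

Checking the partial markers first means that a partial, spliced CDS is dropped rather than counted as intron-containing, which matches the order in which the filtering is described.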
2.2. Functional Assignment and Category. ProtFun is an online procedure designed to produce ab initio predictions of protein function from sequence; it combines 14 different sequence-based functional prediction methods. Rather than relying on sequence similarity, as other protein function prediction procedures do, ProtFun queries a large number of other feature prediction servers to obtain information on various posttranslational and localisation aspects of the protein and uses these to predict its function [21, 22]. We therefore made functional assignments of intronless genes with the ProtFun webserver (http://www.cbs.dtu.dk/services/ProtFun/), and sequences were grouped according to their predicted cellular roles and gene ontology (GO) categories.
2.3. Distribution, Conservation, and Paralogue Identification of Intronless Genes. Genes (both intronless and intron-containing) in archaea, bacteria, fungi, plants, metazoans, and other eukaryotes that are homologous to our intronless genes (BLAST score greater than 100) were identified on the basis of sequence similarity using BLink (BLAST Link), a tool available at NCBI that displays the precomputed results of BLAST searches completed for every protein sequence in the Entrez proteins data domain [24].
CD-HIT is a program for clustering the entries of a large protein database according to sequence identity (with a high identity threshold). CD-HIT can remove redundant sequences and generate a database of only the representatives [25]. To determine which intronless genes are conserved among the 6 deuterostome species in this study, we clustered all of our intronless genes using CD-HIT. To determine the relationships among intronless genes, we clustered them within each genome and identified the nonredundant intronless genes. We also clustered intron-containing genes to produce nonredundant multiexon genes in each genome. We then clustered the nonredundant intronless genes with the nonredundant multiexon genes of the same genome, producing a list of corresponding intronless and intron-containing genes from which the relationships between the two gene classes were determined. All of these steps were performed with CD-HIT.
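The cluster bookkeeping behind these steps can be sketched as follows. The sketch assumes the clustering has already been run and parsed into (cluster, gene, class) records; the records, gene names, and the summarize function are hypothetical, and for brevity a single clustering stands in for the separate intronless-only and intronless-plus-multiexon runs described above.

```python
from collections import defaultdict

# Hypothetical cluster assignments: (cluster_id, gene_id, gene_class).
assignments = [
    (0, "gA", "intronless"),
    (0, "gB", "intronless"),
    (1, "gC", "intronless"),
    (1, "gD", "multiexon"),
    (2, "gE", "intronless"),
    (3, "gF", "multiexon"),
]


def summarize(assignments):
    """Count cluster types among clusters that hold an intronless gene."""
    counts = defaultdict(lambda: {"intronless": 0, "multiexon": 0})
    for cluster_id, _gene_id, gene_class in assignments:
        counts[cluster_id][gene_class] += 1
    with_intronless = [c for c in counts.values() if c["intronless"] > 0]
    return {
        # clusters containing at least one intronless gene
        "NR": len(with_intronless),
        # clusters representing more than one intronless gene
        "R": sum(1 for c in with_intronless if c["intronless"] > 1),
        # clusters holding both an intronless and a multiexon gene
        "C": sum(1 for c in with_intronless if c["multiexon"] > 0),
    }


summary = summarize(assignments)
```

The keys NR, R, and C follow the row labels used for these quantities in Table 1.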
3. Results and Discussion
3.1. Intronless Genes in Deuterostomes. Sea urchin, sea squirt, zebrafish, chicken, platypus, and human were selected to represent the major groups of deuterostomes, and the intronless genes in their genomes were identified. The GI number and protein sequence of each intronless gene in each species were obtained by processing the annotated genomes. Intronless genes are abundant in each of the 6 deuterostome genomes. The number of intronless genes in each species is given in Table 1, and details are given in the supplementary material online at doi:10.1155/2011/680673. Among the selected species, human has the largest number of intronless genes (6229) and platypus the smallest (930), a nearly sevenfold difference. However, the differences among the numbers of intronless genes in sea urchin (2482), zebrafish (2169), chicken (1659), and sea squirt (1448) are not significant, and these numbers should increase and become more accurate when well-annotated genomes are available. Because a few previous studies reported somewhat lower numbers of intronless genes [16, 20], we compared the protein numbers (encoded by intron-containing and intronless genes) from Ensembl (Table 1) with ours and found that the former were always larger. In contrast to the absolute numbers, the proportions of total genes accounted for by intronless genes do not differ significantly, and the largest is about twice the smallest (Table 1). In fact, earlier work reported that 11109, 5846, and 5085 intronless genes reside in the rice, Arabidopsis, and mouse genomes, accounting for 19.9%, 21.7%, and 18.9% of genes, respectively [13]. Given that total gene numbers and annotation quality differ between species, these data may indicate that although the number of intronless genes varies considerably between species, the proportion of total genes that they account for is nearly constant. However, neither the number nor the percentage of intronless genes correlates with genome size (P > 0.6, |r| < 0.3, Spearman’s test). The human genome has the largest number of intronless genes, which might be due to the following reasons. First, human has the most complete expression data, which could result in more annotated genes than in other species during genome annotation. Second, the human genome has many more retrogenes than other species [26, 27]. Third, duplications of intronless genes are common in the human genome (see Section 3.6). The abundance of intronless genes in the 6 deuterostomes indicates that they may play important roles in deuterostome evolution. Earlier, Jain et al. found that intronless genes have a strong bias towards encoding shorter proteins [13]. Here we confirmed that the average length of intronless genes is significantly shorter than that of multiexon genes in all the selected species (P < 0.001, Mann-Whitney test). The average length of intronless genes in sea urchin, sea squirt, zebrafish, chicken, platypus, and human is 341.75 bp, 389.34 bp, 378.78 bp, 259.53 bp, 294.43 bp, and 241.26 bp, and that of intron-containing genes is 530.59 bp, 540.58 bp, 553.04 bp, 541.31 bp, 528.11 bp, and 503.31 bp, respectively.

Table 1: The C-values and statistics for genes (intron-containing and intronless) in each species.

Species        Sea urchin    Sea squirt    Zebrafish     Chicken       Platypus     Human
C-value (pg)   0.89          0.20          1.75          1.25          3.06         3.5
N                            19858         40585         22194         26836        88237
I              2482 (8.6)    1448 (11.2)   2169 (8.6)    1659 (10.2)   930 (7.9)    6229 (16.7)
NR             676           1263          856           1029          516          2290
NRI            8792          8502          10620         9502          7840         12823
R∗∗            621 (92)      110 (9)       274 (32)      156 (15)      86 (17)      1321 (58)
C∗∗            212 (31)      191 (15)      269 (31)      186 (18)      133 (26)     665 (29)

C-values were obtained from http://www.genomesize.com/.
N: number of proteins encoded by intron-containing and intronless genes, obtained from Ensembl (http://www.ensembl.org/index.html).
I: number of intronless genes in each genome; numbers in parentheses are the percentages of total genes accounted for by intronless genes.
NR: number of nonredundant intronless gene clusters.
NRI: number of nonredundant intron-containing gene clusters.
R: number of nonredundant clusters that represent more than one intronless gene.
C: number of clusters that represent both a nonredundant intronless and a nonredundant intron-containing gene.
Clustered using CD-Hit (identity = 0.3).
∗∗Numbers in parentheses are the percentages of all nonredundant intronless gene clusters.
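The rank correlation between intronless gene counts and genome size reported in this section can be illustrated with the Table 1 values. The sketch below is a plain-Python Spearman coefficient, not the authors' statistical software, and it assumes that the C-value is the genome-size measure used in the test; the function names are introduced here for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Table 1, in the order sea urchin, sea squirt, zebrafish, chicken, platypus, human.
c_value_pg = [0.89, 0.20, 1.75, 1.25, 3.06, 3.5]
intronless = [2482, 1448, 2169, 1659, 930, 6229]
rho = spearman_rho(c_value_pg, intronless)
```

With these six pairs, rho is about 0.26, which is consistent with the |r| < 0.3 quoted in the text; a P value would additionally require a significance test for n = 6, which is omitted here.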
Among the selected species, chromosomes are well assembled in human, chicken, and zebrafish. To study the distribution of intronless genes in these genomes, we counted the number of intronless genes on each chromosome (Figure 1). Spearman’s test showed that the number of intronless genes is significantly correlated with chromosome length in human (P < 0.001, r = 0.721) and chicken (P < 0.001, r = 0.712). The correlation did not reach significance in zebrafish (P = 0.119, r = 0.320), although it may still exist given that nonparametric tests have less power to detect a significant relationship. These results suggest that the distribution of intronless genes in human and chicken (and possibly zebrafish) is stochastic, as previously reported for mouse, rice, and Arabidopsis [13, 18]. However, several clusters of intronless genes exist on certain chromosomes, and some of these clusters have been reported, for example, the olfactory receptor gene clusters on human chromosome 17 and the odorant receptor genes in the zebrafish genome [28, 29].
3.2. Functional Assignment of Intronless Genes. The distribution of intronless human genes across molecular function categories has been shown to be nonrandom [17]. To study the functional categories of intronless genes in the 6 selected species, their cellular roles and GO categories were predicted using ProtFun (available via the webserver http://www.cbs.dtu.dk/services/ProtFun/). Figure 2 shows the distribution of intronless genes among cellular roles in the 6 species. As in plants, intronless genes that functionally belong to translation and energy metabolism are the most common in most species, followed by cell envelope and amino acid biosynthesis [13]. Furthermore, in these 6 deuterostomes, transport and binding, followed by regulatory functions and central intermediary metabolism, are also well represented compared with other functional categories. The percentage of intronless genes with a given cellular role varies significantly between species (Figure 2). For example, the 7% of intronless genes assigned to transport and binding in sea squirt is significantly lower than the proportion in other species. The numbers of genes assigned to cellular roles such as amino acid biosynthesis and central intermediary metabolism are quite similar in sea urchin, sea squirt, zebrafish, and human, but this is not the case for chicken or platypus. GO categories can be assigned to more than 70% of intronless genes in every species except sea squirt (more than 60%), and the distribution of genes across GO categories is shown in Figure 3. As in plants, proteins associated with the GO categories growth factor, transcription regulation, transport, immune response, and structural protein are overrepresented in these species [13]. Furthermore, proteins associated with the GO category transcription, which might differ between plants and animals, are well represented in deuterostomes. The percentage of intronless genes whose proteins fall in a certain GO category, such as growth factor or transporter, varies significantly among these species. In terms of cellular roles and GO categories, the functional distribution of intronless genes in each selected genome is very similar to those reported for rice and Arabidopsis [13]. This result might indicate that the biological mechanisms related to intronless genes are common across biological kingdoms. On the basis of earlier work and this analysis, we conclude that most plant and deuterostome intronless genes share the same characteristics, but deuterostomes still have some lineage-specific and species-specific functional intronless genes.
Figure 1: The numbers of intronless genes on each chromosome in human, chicken, and zebrafish. Both numbers and percentages are shown. Panels: (a) human, (b) chicken, (c) zebrafish; x-axis: chromosome number; y-axes: number of genes and genes (%).
3.3. Taxonomic Distribution. To study the evolutionary patterns of intronless genes across major taxonomic groups, we used BLink, a tool that displays the precomputed results of BLAST searches for every protein sequence in the Entrez proteins data domain [24], to identify proteins that are evolutionarily conserved among different taxonomic groups (archaea, bacteria, fungi, metazoans, plants, and other eukaryotes). The numbers of intronless genes with homologues in each taxonomic group are given in Table 2; these numbers will change as more genome sequences become available. We divided these genes into 7 combination types according to their conservation among archaea (A), bacteria (B), and eukaryotes (E); the distributions are shown in Figure 4. The majority of intronless genes in each species have homologues only in eukaryotes (E), suggesting that most intronless genes emerged after the eukaryotes diverged from prokaryotes. Another important category is ABE, in which intronless genes are conserved in all major biological kingdoms; these genes are considered to be functionally important and slowly evolving [30]. Intronless genes belonging to ABE account for 39% of the total intronless genes in sea squirt and 30% in sea urchin, which together form the first class. The second class contains zebrafish (23%) and chicken (22%), and the third class includes human (16%) and platypus (14%). Given their phylogenetic positions, the first class is more primitive than the second class, which is more primitive than the third class. These data show that a higher percentage of intronless genes is inherited from the common ancestor of archaea, bacteria, and eukaryotes in primitive species than in higher species. More than 20% of intronless genes are conserved in bacteria and eukaryotes (BE) in each species, but less than 5% are conserved in archaea and eukaryotes (AE). This could be because archaea have lost more of the homologues they shared with eukaryotes than bacteria have, or because bacteria have gained more. Moreover, the percentage of total intronless genes that are conserved in bacteria and eukaryotes is significantly higher in zebrafish and chicken than in other species, suggesting that these 2 species have a greater percentage of intronless genes inherited from the common ancestor of bacteria and eukaryotes. No gene is conserved only in archaea and/or bacteria except one human gene conserved in bacteria, which might be an example of lateral gene transfer (LGT) from bacteria to human.

Figure 2: Distribution of intronless genes among different cellular roles in each species (x-axis: cellular role; y-axis: genes (%)). su: sea urchin; sq: sea squirt; z: zebrafish; c: chicken; p: platypus; h: human. AAB: amino acid biosynthesis; BOC: biosynthesis of cofactors; CE: cell envelope; CP: cellular processes; CIM: central intermediary metabolism; EM: energy metabolism; FAM: fatty acid metabolism; PAP: purines and pyrimidines; RF: regulatory functions; RAT: replication and transcription; T: translation; TAB: transport and binding.

Figure 3: Distribution of intronless genes among different kinds of gene ontology (GO) categories in each species (x-axis: GO category; y-axis: genes (%)). su: sea urchin; sq: sea squirt; z: zebrafish; c: chicken; p: platypus; h: human. ST: signal transducer; R: receptor; H: hormone; SP: structural protein; T: transporter; IC: ion channel; VGIC: voltage-gated ion channel; CC: cation channel; TR: transcription; TRR: transcription regulation; SR: stress response; IR: immune response; GF: growth factor; MIT: metal ions transport.

Figure 4: Distribution of intronless genes specific to different taxonomic group combinations for each species (x-axis: domain combination; y-axis: genes (%)). su: sea urchin; sq: sea squirt; z: zebrafish; c: chicken; p: platypus; h: human. A: archaea; B: bacteria; E: eukaryote; AB: archaea and bacteria; AE: archaea and eukaryote; BE: bacteria and eukaryote; ABE: archaea, bacteria and eukaryote; ORFans: homologs not found in other organisms.
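The A/B/E grouping used in this section can be sketched as a small labelling function. The sketch assumes that the best BLAST score per taxonomic group has already been extracted for each gene; the dictionary layout, gene names, and scores are made up, and only the cutoff of 100 comes from Section 2.3.

```python
SCORE_CUTOFF = 100  # homologues need a BLAST score above this value


def domain_combination(best_scores):
    """Map best BLAST scores per group to a label such as 'E', 'BE', or 'ABE'.

    best_scores: dict with optional keys 'archaea', 'bacteria', 'eukaryotes'.
    Genes with no qualifying homologue in any group are labelled 'ORFan'.
    """
    label = ""
    for group, letter in (("archaea", "A"), ("bacteria", "B"), ("eukaryotes", "E")):
        if best_scores.get(group, 0) > SCORE_CUTOFF:
            label += letter
    return label or "ORFan"


# Hypothetical genes with made-up scores, for illustration only.
genes = {
    "gene1": {"archaea": 250, "bacteria": 310, "eukaryotes": 900},
    "gene2": {"bacteria": 140, "eukaryotes": 620},
    "gene3": {"eukaryotes": 480},
    "gene4": {"eukaryotes": 60},
}
labels = {name: domain_combination(scores) for name, scores in genes.items()}
```

Three groups give 2^3 - 1 = 7 non-empty combinations, which matches the 7 combination types described above; the same pattern applied to four eukaryotic groups gives 2^4 - 1 = 15.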
More than 30% of intronless genes are eukaryote specific in all these species, with the highest proportion in platypus (61.6%). To investigate their distribution across eukaryotic groups, we divided these proteins according to whether they have homologues in fungi (F), metazoans (M), other eukaryotes (O), and plants (P), forming 15 combination types (Figure 5). In general, the majority of genes have homologues in the combination MO (metazoans and other eukaryotes) in each species except sea squirt, in which only 13% of eukaryote-specific intronless genes are of this kind. Less than 10% of genes are metazoan specific (M) in many species, but in chicken and human the proportions are 26% and 29%, respectively. Genes conserved in fungi, metazoans, other eukaryotes, and plants (FMOP), including histones and ribosomal proteins, are thought to be highly conserved because they are essential for the survival of all eukaryotes [30]. The number of FMOP genes is similar in sea urchin (314), sea squirt (263), and zebrafish (228), but their percentage of total eukaryote-specific intronless genes is much greater in sea squirt (54.7%) than in sea urchin (27.8%) and zebrafish (26.9%). Apart from the cases mentioned above, very few genes are conserved in other taxonomic combinations. Moreover, some genes in some species have homologues in fungi, plants, or other eukaryotes but not in metazoans. These genes might be examples of lateral gene transfer (LGT) between eukaryotes, which has been demonstrated recently [31, 32]. Because fungi and plants diverged from metazoans earlier than other eukaryotes did, the distribution pattern of eukaryote-specific intronless genes in these species can be explained by the loss of many more homologues in fungi and plants, together with the gain of many others after their divergence. However, many essential genes (FMOP) have been preserved. Therefore, the distribution pattern of eukaryote-specific intronless genes and the gain and loss patterns found in this work are in accord with earlier reports [13, 15, 33].

Figure 5: Distribution of eukaryote-specific intronless genes specific to different eukaryotic taxonomic group combinations for each species (x-axis: domain combination; y-axis: genes (%)). su: sea urchin; sq: sea squirt; z: zebrafish; c: chicken; p: platypus; h: human. F: fungi; M: metazoans; O: other eukaryotes; P: plants; FM: fungi and metazoans; FO: fungi and other eukaryotes; FP: fungi and plants; MO: metazoans and other eukaryotes; MP: metazoans and plants; OP: other eukaryotes and plants; FMO: fungi, metazoans and other eukaryotes; FMP: fungi, metazoans and plants; FOP: fungi, other eukaryotes and plants; MOP: metazoans, other eukaryotes and plants; FMOP: fungi, metazoans, other eukaryotes and plants.
The predicted cellular roles for each combination type are shown in the supplementary material. Amino acid biosynthesis, cell envelope, energy metabolism, translation, and transport and binding are usually well represented. Furthermore, basic functional categories, such as amino acid biosynthesis, energy metabolism, and translation, are overrepresented in intronless genes conserved in all major biological kingdoms (ABE) or in all eukaryotic groups (FMOP) compared with the others.
Table 2: Number of intronless genes with homologous genes in other taxonomic groups.

Taxonomic group     Sea urchin   Sea squirt   Zebrafish   Chicken   Platypus   Human
Archaea             764          625          524         385       143        1184
Bacteria            1303         873          1161        920       345        2524
Fungi               1492         1149         1067        817       413        2513
Plants              1480         1209         1070        809       441        2617
Metazoans           2447         1403         2027        1575      918        4807
Other eukaryotes    2337         1343         1897        1388      867        4055
ORFans              35           39           136         80        7          1416

ORFans [23]: homologues not found in other species.
Figure 6: Distribution of ORFans according to their functional categories in each species (x-axis: cellular role; y-axis: genes (%)). The descriptions of functional categories and species are the same as those given for Figure 2.

Figure 7: Functional distribution of conserved intronless genes in vertebrates (x-axis: cellular role; y-axis: genes (%)). The description of functional categories is the same as that given for Figure 2.
3.4. ORFans. Protein sequences that have no homologue in other species are termed ORFans [23]. These proteins could be responsible for some species-specific characteristics, and most of them might have evolved faster than other proteins [34]; they are among the most interesting parts of the genome. Thus, it is important to characterize these proteins experimentally or to use more sensitive bioinformatic approaches to understand their roles and functions [15]. We found very few ORFans in these species except human (Table 2), in which about 22.7% of intronless genes are ORFans; this might be related to the complexity of humans as viviparous mammals. Most of these proteins are annotated as hypothetical; however, the majority of ORFans in all species except platypus have mRNA or EST support according to their annotations. Thus, most of these ORFans are unlikely to be misannotated. Figure 6 shows the predicted cellular role distribution of ORFans in each species. Interestingly, translation and energy metabolism are the most frequently represented cellular roles in these species. The pattern is similar to earlier reports for plants and human [13, 15], suggesting that most species-specific intronless genes in plants and animals have the same functions and that even components of the basic cellular machinery might evolve to perform species-specific functions in all these species [13, 15]. Moreover, we found that the cell envelope role is well represented in sea urchin ORFans and that more than half of the intronless genes with the cellular role of fatty acid metabolism are ORFans.
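As a quick arithmetic check, the ORFan share of intronless genes can be recomputed from the published counts. The snippet below simply divides the ORFans row of Table 2 by the intronless gene counts (row I) of Table 1; the variable names are introduced here for illustration.

```python
species = ["sea urchin", "sea squirt", "zebrafish", "chicken", "platypus", "human"]
intronless = [2482, 1448, 2169, 1659, 930, 6229]  # row I of Table 1
orfans = [35, 39, 136, 80, 7, 1416]               # ORFans row of Table 2

# Percentage of intronless genes that are ORFans, per species.
orfan_percent = {
    name: round(100 * n_orfan / n_total, 1)
    for name, n_orfan, n_total in zip(species, orfans, intronless)
}
highest = max(orfan_percent, key=orfan_percent.get)
```

For human this gives 1416 / 6229, about 22.7%, in agreement with the figure quoted in the text, and every other species stays below 7%.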
3.5. Conserved Intronless Genes in Deuterostomes. To examine the conservation of intronless genes in deuterostomes, we clustered the intronless genes of all 6 species together using CD-Hit (identity = 0.3). Only 6 nonredundant sequences (NRs) were shared by all these species, and they might perform pivotal functions in deuterostomes. The predicted cellular roles were translation for 4 NRs, regulatory functions for one NR, and energy metabolism for one NR. The GO categories of these NRs were associated with transcription regulation, growth factor, and transport. When we compared the NRs shared between any two species, we found that sea urchin and sea squirt each share fewer than 100 NRs with any other species, whereas any two vertebrate species share more than 200 (data not shown), suggesting that significantly more intronless genes are shared among vertebrates than among deuterostomes as a whole. As expected, we found 125 NRs shared by the vertebrates, and Figure 7 shows the distribution of their predicted functions. Most of these proteins are involved in basic cellular processes, such as transport and binding, cell envelope, and translation, and these genes could be one of the important factors in the emergence of vertebrates.
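Finding the clusters shared by a group of species reduces to a subset test once each cluster is annotated with the species that contribute a member. The sketch below assumes that annotation already exists; the cluster ids, memberships, and the shared_by function are hypothetical and do not reproduce the study's data.

```python
# Hypothetical clusters: cluster id -> set of species contributing a member.
clusters = {
    "c1": {"sea urchin", "sea squirt", "zebrafish", "chicken", "platypus", "human"},
    "c2": {"zebrafish", "chicken", "platypus", "human"},
    "c3": {"chicken", "human"},
    "c4": {"sea urchin", "human"},
}

VERTEBRATES = {"zebrafish", "chicken", "platypus", "human"}
DEUTEROSTOMES = VERTEBRATES | {"sea urchin", "sea squirt"}


def shared_by(clusters, species_group):
    """Return ids of clusters that contain every species in species_group."""
    return sorted(
        cid for cid, members in clusters.items() if species_group <= members
    )


shared_all = shared_by(clusters, DEUTEROSTOMES)
shared_vertebrates = shared_by(clusters, VERTEBRATES)
shared_pair = shared_by(clusters, {"chicken", "human"})
```

Because a smaller species group imposes a weaker requirement, the pairwise count can never be lower than the vertebrate count, which in turn can never be lower than the count for all 6 species, the same ordering seen in the reported numbers (more than 200, 125, and 6).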
3.6. Paralogues of Intronless Genes. Intronless genes in eukar-
yotic genomes have many origins other than inheritance
Comparative and Functional Genomics 7
from ancient prokaryotes, such as duplication (whole ge-
nome duplication or tandem duplication) of existing intron-
less genes and retroposition of intron-containing genes
(retroduplicated genes). Also, there is evidence that ancient
intronless genes were the origin of multiexon genes [35,36].
To investigate these latter patterns, we clustered intronless
genes and nonredundant intronless genes with multiexon
genes using CD-Hit (identity =0.3), and the results were
shown in Ta b le 1 . It shows that about 92% of sea urchin
nonredundant clusters have more than 1 intronless gene and
the value is 58% in human, 32% in zebrafish, and only 9%
in sea squirt. These data suggest that most intronless genes
may originate from other intronless genes in sea urchin,
human, and zebrafish, but much fewer intronless genes
havethesameorigininotherspecies.Ta bl e 1 also shows
the frequency of correspondence between nonredundant
intronless genes and nonredundant intron-containing genes.
About 30% of nonredundant intronless gene clusters have
corresponding nonredundant intron-containing genes in
each species, but in sea squirt and chicken, the proportion
is only 15% and 18%, respectively. This might be due to
the activity of LINE retrotransposable elements in their
genomes. Active LINE retrotransposons that can reversibly
transcribe polyadenylated mRNAs are thought to be the
main reason for the emergence of retrogenes [37,38]. More
than 20% of the human genome is composed of LINE
retrotransposable elements [39], and many studies have
suggested a high rate of retroposition in human [26,40,41],
which might result in the emergence of intronless genes.
In chicken, only about 8% of the genome is comprised of
the CR1 (chicken repeat 1) [42] and this kind of LINE-1 is
not thought to reversibly transcribe polyadenylated mRNAs
[42]. Majority of intronless genes that have intronless or
intron-containing homologs are associated with cellular roles
transport and binding, cell envelope, and translation. It
has long been believed that duplicated genes (including
retroduplicated genes) provide material for the evolution
of genes with new functions [43], but there is evidence
that retrogenes substitute for their parent genes during
meiotic X chromosome inactivation in spermatogenesis in
mammals [12] and in the fruit fly [10]. Thus, the selective
advantage of retention of these duplicated intronless genes
might be that these genes can evolve new functions or
help to buffer crucial functions, similar to earlier reports on
duplicated genes in angiosperms [44].
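The cluster proportions discussed above can be tallied from clustering output. The following is a minimal sketch, not the authors' pipeline: it assumes the standard CD-HIT `.clstr` text layout, assumes a single clustering that mixes intronless and multiexon proteins (the paper reports two separate clusterings), and uses invented identifiers (`il_*`, `me_*`) and hypothetical helper names (`parse_clstr`, `cluster_shares`).

```python
import re
from collections import defaultdict

# Matches the sequence name in a CD-HIT member line such as
# "0<TAB>283aa, >geneA... *"
_MEMBER = re.compile(r">(\S+?)\.\.\.")


def parse_clstr(lines):
    """Return {cluster_id: [sequence names]} from CD-HIT .clstr lines."""
    clusters = defaultdict(list)
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current]  # register the cluster even if it stays empty
        elif line and current is not None:
            match = _MEMBER.search(line)
            if match:
                clusters[current].append(match.group(1))
    return dict(clusters)


def cluster_shares(clusters, intronless_ids):
    """Summarise clusters that contain at least one intronless gene.

    Returns a pair:
      - share of those clusters with more than one intronless member
      - share of those clusters that also hold an intron-containing member
    """
    multi = mixed = total = 0
    for members in clusters.values():
        n_intronless = sum(1 for m in members if m in intronless_ids)
        if n_intronless == 0:
            continue  # cluster made only of multiexon genes
        total += 1
        multi += int(n_intronless > 1)
        mixed += int(n_intronless < len(members))
    if total == 0:
        return 0.0, 0.0
    return multi / total, mixed / total


# Hypothetical toy input; identifiers are invented for illustration only.
example = (
    ">Cluster 0\n"
    "0\t120aa, >il_a... *\n"
    "1\t118aa, >il_b... at 45.00%\n"
    ">Cluster 1\n"
    "0\t300aa, >me_1... *\n"
    "1\t140aa, >il_c... at 38.00%\n"
    ">Cluster 2\n"
    "0\t95aa, >il_d... *\n"
    ">Cluster 3\n"
    "0\t410aa, >me_2... *\n"
).splitlines()

clusters = parse_clstr(example)
multi_share, mixed_share = cluster_shares(
    clusters, {"il_a", "il_b", "il_c", "il_d"}
)
print(f"multiple intronless members: {multi_share:.2f}")
print(f"with intron-containing partner: {mixed_share:.2f}")
```

Applied per species, the first value corresponds to the kind of figure quoted for clusters with more than 1 intronless gene, and the second to the share of intronless clusters with corresponding intron-containing genes.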
4. Conclusion
Both this and earlier studies indicate that the evolutionary
patterns of intronless genes among deuterostomes, as well
as between deuterostomes and plants, share many
characteristics that might apply to all major
eukaryotic kingdoms. However, there are still some
lineage-specific and species-specific characteristics in the evolution
of intronless genes, and these might be one of the contributors
to biodiversity. As more genomes
are sequenced and gene annotations become more exhaustive
and accurate, the evolutionary patterns of intronless
genes will become clearer, providing insight into
the mechanisms underlying gene and
genome evolution in eukaryotes.
Acknowledgments
The authors are thankful to four anonymous reviewers and
M. Yu for their critical reading of this manuscript and helpful
comments and suggestions that greatly improved the paper.
This work was supported by a grant from the Major State
Basic Research Development Program of China (973 Program,
no. 2007CB411601).
References
[1] K. B. Gatermann, A. Hoffmann, G. H. Rosenberg, and N. F.
Kaufer, “Introduction of functional artificial introns into the
naturally intronless ura4 gene of Schizosaccharomyces
pombe,” Molecular and Cellular Biology, vol. 9, no. 4, pp. 1526–
1535, 1989 (English).
[2] B. Bhandari, W. J. Roesler, K. D. DeLisio, D. J. Klemm, N. S.
Ross, and R. E. Miller, “A functional promoter flanks an in-
tronless glutamine synthetase gene,” Journal of Biological
Chemistry, vol. 266, no. 12, pp. 7784–7792, 1991 (English).
[3] A. V. Makeyev, A. N. Chkheidze, and S. A. Liebhaber, “A set
of highly conserved RNA-binding proteins, alpha CP-1 and
alpha CP-2, implicated in mRNA stabilization, are coexpressed
from an intronless gene and its intron-containing paralog,”
Journal of Biological Chemistry, vol. 274, no. 35, pp. 24849–
24857, 1999 (English).
[4] A. Sugiyama, K. Noguchi, C. Kitanaka et al., “Molecular cloning
and chromosomal mapping of mouse intronless myc gene
acting as a potent apoptosis inducer,” Gene, vol. 226, no. 2, pp.
273–283, 1999 (English).
[5] A. J. Gentles and S. Karlin, “Why are human G-protein-coupled
receptors predominantly intronless?” Trends in Genetics,
vol. 15, no. 2, pp. 47–49, 1999 (English).
[6] J. C. Venter, “The sequence of the human genome,” Science, vol.
292, no. 5507, pp. 1304–1351, 2001 (English).
[7] S. K. Tay, J. Blythe, and L. Lipovich, “Global discovery of
primate-specific genes in the human genome,” Proceedings of the
National Academy of Sciences of the United States of America,
vol. 106, no. 29, pp. 12019–12024, 2009 (English).
[8] X. Yang, S. Jawdy, T. J. Tschaplinski, and G. A. Tuskan,
“Genome-wide identification of lineage-specific genes in
Arabidopsis, Oryza and Populus,” Genomics, vol. 93, no. 5, pp.
473–480, 2009 (English).
[9] G. R. Fink, “Pseudogenes in yeast?” Cell, vol. 49, no. 1, pp. 5–6,
1987 (English).
[10] E. Betrán, K. Thornton, and M. Long, “Retroposed new genes
out of the X in Drosophila,” Genome Research, vol. 12, no. 12,
pp. 1854–1859, 2002.
[11] Y. Zhang, Y. Wu, Y. Liu, and B. Han, “Computational identification
of 69 retroposons in Arabidopsis,” Plant Physiology, vol.
138, no. 2, pp. 935–948, 2005.
[12] J. J. Emerson, H. Kaessmann, E. Betrán, and M. Long, “Extensive
gene traffic on the mammalian X chromosome,” Science,
vol. 303, no. 5657, pp. 537–540, 2004.
[13] M. Jain, P. Khurana, A. K. Tyagi, and J. P. Khurana,
“Genome-wide analysis of intronless genes in rice and Arabidopsis,”
Functional and Integrative Genomics, vol. 8, no. 1, pp. 69–78,
2008 (English).
[14] S. M. Agarwal, “Evolutionary rate variation in eukaryotic
lineage specific human intronless proteins,” Biochemical and
Biophysical Research Communications, vol. 337, no. 4, pp.
1192–1197, 2005 (English).
[15] S. M. Agarwal and J. Gupta, “Comparative analysis of human
intronless proteins,” Biochemical and Biophysical Research
Communications, vol. 331, no. 2, pp. 512–519, 2005 (English).
[16] M. K. Sakharkar et al., “Computational prediction of SEG
(single exon gene) function in humans,” Frontiers in Bioscience,
vol. 10, pp. 1382–1395, 2005 (English).
[17] A. E. Hill and E. J. Sorscher, “The non-random distribution of
intronless human genes across molecular function categories,”
FEBS Letters, vol. 580, no. 18, pp. 4303–4305, 2006 (English).
[18] K. R. Sakharkar, M. K. Sakharkar, C. T. Culiat, V. T. K.
Chow, and S. Pervaiz, “Functional and evolutionary analyses
on expressed intronless genes in the mouse genome,” FEBS
Letters, vol. 580, no. 5, pp. 1472–1478, 2006 (English).
[19] M. K. Sakharkar, P. Kangueane, D. A. Petrov, A. S. Kolaskar,
and S. Subbiah, “SEGE: a database on ‘intron less/single exonic’
genes from eukaryotes,” Bioinformatics, vol. 18, no. 9, pp.
1266–1267, 2002 (English).
[20] M. K. Sakharkar and P. Kangueane, “Genome SEGE: a
database for ‘intronless’ genes in eukaryotic genomes,” BMC
Bioinformatics, vol. 5, article 67, 2004 (English).
[21] M. Punta and Y. Ofran, “The rough guide to in silico function
prediction, or how to use sequence and structure information
to predict protein function,” PLoS Computational Biology, vol.
4, no. 10, Article ID e1000160, 2008 (English).
[22] L. J. Jensen, R. Gupta, N. Blom et al., “Prediction of human
protein function from post-translational modifications and
localization features,” Journal of Molecular Biology, vol. 319,
no. 5, pp. 1257–1265, 2002 (English).
[23] D. Fischer and D. Eisenberg, “Finding families for genomic
ORFans,” Bioinformatics, vol. 15, no. 9, pp. 759–762, 1999
(English).
[24] D. L. Wheeler, T. Barrett, D. A. Benson et al., “Database
resources of the National Center for Biotechnology Information,”
Nucleic Acids Research, vol. 33, pp. D39–D45, 2005
(English).
[25] W. Li, L. Jaroszewski, and A. Godzik, “Clustering of highly
homologous sequences to reduce the size of large protein
databases,” Bioinformatics, vol. 17, no. 3, pp. 282–283, 2001
(English).
[26] A. C. Marques, I. Dupanloup, N. Vinckenbosch, A. Reymond,
and H. Kaessmann, “Emergence of young human genes after
a burst of retroposition in primates,” PLoS Biology, vol. 3, no.
11, article e357, pp. 1970–1979, 2005.
[27] Z. Yu, D. Morais, M. Ivanga, and P. M. Harrison, “Analysis of
the role of retrotransposition in gene evolution in vertebrates,”
BMC Bioinformatics, vol. 8, article 308, 2007.
[28] N. Ben-Arie, D. Lancet, C. Taylor et al., “Olfactory receptor
gene cluster on human chromosome 17: possible duplication
of an ancestral receptor repertoire,” Human Molecular Genet-
ics, vol. 3, no. 2, pp. 229–235, 1994 (English).
[29] J. C. Dugas and J. Ngai, “Analysis and characterization of
an odorant receptor gene cluster in the zebrafish genome,”
Genomics, vol. 71, no. 1, pp. 53–65, 2001 (English).
[30] I. King Jordan, I. B. Rogozin, Y. I. Wolf, and E. V. Koonin,
“Essential genes are more evolutionarily conserved than are
nonessential genes in bacteria,” Genome Research, vol. 12, no.
6, pp. 962–968, 2002 (English).
[31] R. Kamikawa, Y. Inagaki, and Y. Sako, “Direct phylogenetic
evidence for lateral transfer of elongation factor-like gene,”
Proceedings of the National Academy of Sciences of the United
States of America, vol. 105, no. 19, pp. 6965–6969, 2008
(English).
[32] M. E. Rumpho, J. M. Worful, J. Lee et al., “Horizontal gene
transfer of the algal nuclear gene psbO to the photosynthetic
sea slug Elysia chlorotica,” Proceedings of the National Academy
of Sciences of the United States of America, vol. 105, no. 46, pp.
17867–17871, 2008 (English).
[33] E. V. Koonin, N. D. Fedorova, J. D. Jackson et al., “A
comprehensive evolutionary classification of proteins encoded in
complete eukaryotic genomes,” Genome Biology, vol. 5, no. 2,
p. R7, 2004 (English).
[34] T. Domazet-Loso and D. Tautz, “An evolutionary analysis of
orphan genes in Drosophila,” Genome Research, vol. 13, no.
10, pp. 2213–2219, 2003 (English).
[35] A. Lecharny, N. Boudet, I. Gy, S. Aubourg, and M. Kreis,
“Introns in, introns out in plant gene families: a genomic
approach of the dynamics of gene structure,” Journal of Structural
and Functional Genomics, vol. 3, no. 1–4, pp. 111–116, 2003
(English).
[36] N. Boudet, S. Aubourg, C. Toffano-Nioche, M. Kreis, and
A. Lecharny, “Evolution of intron/exon structure of DEAD
helicase family genes in Arabidopsis, Caenorhabditis and
Drosophila,” Genome Research, vol. 11, no. 12, pp. 2101–2114,
2001 (English).
[37] E. V. Gogvadze and A. A. Buzdin, “New mechanism of
retrogene formation in mammalian genomes: in vivo
recombination during RNA reverse transcription,” Molekulyarnaya
Biologiya, vol. 39, no. 3, pp. 364–373, 2005 (Russian).
[38] C. Esnault, J. Maestre, and T. Heidmann, “Human LINE retro-
transposons generate processed pseudogenes,” Nature Genet-
ics, vol. 24, no. 4, pp. 363–367, 2000.
[39] E. S. Lander, L. M. Linton, B. Birren et al., “Initial sequencing
and analysis of the human genome,” Nature, vol. 409, no. 6822,
pp. 860–921, 2001 (English).
[40] Z. Zhang, P. M. Harrison, Y. Liu, and M. Gerstein, “Millions
of years of evolution preserved: a comprehensive catalog of
the processed pseudogenes in the human genome,” Genome
Research, vol. 13, no. 12, pp. 2541–2558, 2003 (English).
[41] K. Ohshima, M. Hattori, T. Yada, T. Gojobori, Y. Sakaki,
and N. Okada, “Whole-genome screening indicates a possible
burst of formation of processed pseudogenes and Alu repeats
by particular L1 subfamilies in ancestral primates,” Genome
Biology, vol. 4, no. 11, article R74, 2003.
[42] N. B. Haas, J. M. Grabowski, A. B. Sivitz, and J. B. E. Burch,
“Chicken repeat 1 (CR1) elements, which define an ancient
family of vertebrate non-LTR retrotransposons, contain two
closely spaced open reading frames,” Gene, vol. 197, no. 1-2,
pp. 305–309, 1997.
[43] A. Wagner, “The fate of duplicated genes: loss or new func-
tion?” BioEssays, vol. 20, no. 10, pp. 785–788, 1998 (English).
[44] B. A. Chapman, J. E. Bowers, F. A. Feltus, and A. H. Paterson,
“Buffering of crucial functions by paleologous duplicated
genes may contribute cyclicality to angiosperm genome
duplication,” Proceedings of the National Academy of Sciences of the
United States of America, vol. 103, no. 8, pp. 2730–2735, 2006
(English).
... These differences can seriously affect the ability of MsWRKY proteins to bind to W-box elements, which in turn affects the www.nature.com/scientificreports/ sequences from RNA into the genome), duplication of existing intron-free genes, and horizontal gene transfer 39 . Differences in the intron size of MsWRKY genes may result from gene duplication, inversion, and/or fusion events 40 . ...
Article
Full-text available
Miscanthus is an emerging sustainable bioenergy crop whose growing environment is subject to many abiotic and biological stresses. WRKY transcription factors play an important role in stress response and growth of biotic and abiotic. To clarify the distribution and expression of the WRKY genes in Miscanthus, it is necessary to classify and phylogenetically analyze the WRKY genes in Miscanthus. The v7.1 genome assembly of Miscanthus was analyzed by constructing an evolutionary tree. In Miscanthus, there are 179 WRKY genes were identified. The 179 MsWRKYs were classified into three groups with conserved gene structure and motif composition. The tissue expression profile of the WRKY genes showed that MsWRKY genes played an essential role in all growth stages of plants. At the early stage of plant development, the MsWRKY gene is mainly expressed in the rhizome of plants. In the middle stage, it is mainly expressed in the leaf. At the end stage, mainly in the stem. According to the results, it showed significant differences in the expression of the MsWRKY in different stages of Miscanthus sinensis. The results of the study contribute to a better understanding of the role of the MsWRKY gene in the growth and development of Miscanthus.
... Subsequently, duplication events may have occurred in pre-existing intronless genes. Lastly, intron-containing genes could have undergone retroposition (Zou et al., 2011). Research has shown that the similarities among Hsp70 family members are greater from different organisms than that from the same species in some cases (Lindquist and Craig, 1988). ...
Article
Aspongopus chinensis Dallas 1851, an insect of important economic value, faces challenges in artificial breeding due to mandatory diapause and limited access to wild resources. Heat shock proteins (Hsps) are thought to influence diapause in insects, but little is known about their role in A. chinensis during diapause. This study used genomic methods to identify 25 Hsp genes in A. chinensis, including two Hsp90, 14 Hsp70, four Hsp60 and five small Hsp genes, were located on seven chromosomes, respectively. The gene structures among the same families are relatively conserved. Meanwhile, the motif compositions and secondary structures of A. chinensis Hsps (AcHsps) were predicted. RNA-seq data and fluorescence quantitative PCR analysis showed that there were differences in the expression patterns of AcHsps in diapause and non-diapause stages, and AcHsp70-5 was significantly differentially expressed in both analysis, which was enriched in the pathway of response to hormone. All the results showed that Hsps play an important role in the diapause mechanism of A. chinensis. Our observations highlight the molecular evolution of the Hsp gene and their effect on diapause in A. chinensis.
... Various hypotheses have been suggested for the existence of intronless genes, including inheritance from ancient prokaryotes, gene duplications, retroposition, and other mechanisms (Yan et al. 2016). The intronless genes can effectively express and this may be a reason for the importance of the functional role of these genes in maintaining cell processes (Zou and Guo 2011;Gentles and Karlin 1999). Search for conserved functional domains showed that the CBD domain has repeated one to four times in some barley heveins, while this domain was associated with another domain called LYZ in some other. ...
Article
Full-text available
Heveins are one of the most important groups of plant antimicrobial peptides. So far, various roles in plant growth and development and in response to biotic and abiotic stresses have reported for heveins. The present study aimed to identify and characterize the hevein genes in two-row and six-row cultivars of barley. In total, thirteen hevein genes were identified in the genome of two-row and six-row cultivars of barley. The identified heveins were identical in two-row and six-row cultivars of barley and showed a high similarity with heveins from other plant species. The hevein coding sequences produced open reading frames (ORFs) ranged from 342 to 1002 bp. Most of the identified hevein genes were intronless, and the others had only one intron. The hevein ORFs produced proteins ranged from 113 to 333 amino acids. Search for conserved functional domains showed CBD and LYZ domains in barley heveins. All barley heveins comprised extracellular signal peptides ranged from 19 to 35 amino acids. The phylogenetic analysis divided barley heveins into two groups. The promoter analysis showed regulatory elements with different frequencies between two-row and six-row cultivars. These cis-acting elements included elements related to growth and development, hormone response, and environmental stresses. The expression analysis showed high expression level of heveins in root and reproductive organs of both two-row and six-row cultivars. The expression analysis also showed that barley heveins is induced by both biotic and abiotic stresses. The results of antimicrobial activity prediction showed the highest antimicrobial activity in CBD domain of barley heveins. The findings of the current study can improve our knowledge about the role of hevein genes in plant and can be used for future studies.
... These intronless genes indicate that these genes might be conserved in all major biological kingdoms, and these genes are considered to be functionally significant and evolved slowly, and the activity of LINE retrotransposable elements is thought to be the main reason for Fig. 10 A Inhibition of acetylcholinesterase activity assay using neostigmine bromide (Nst) B Endogenous acetylcholinesterase activity and C Relative gene expression of AChE gene in untreated and 150 mM salt treated tomato seedlings. Bars (mean ± SE, n = 3) with significant difference at p < 0.05 the emergence of retrogenes (Zou et al. 2011). In addition, some genes contain more than 10 introns (5.74%), indicating intron gain. ...
Article
Full-text available
In human, acetylcholinesterase (AChE) is a cholinergic enzyme involved in the hydrolysis of neurotransmitter acetylcholine (ACh) into its constituents, choline, and acetate. In plants, the biological functions of AChE are lacking and its existence has been recognized by indirect evidence of its activity. Therefore, in the present investigation, a systematic analysis of the AChE gene family in tomato was performed by integrating structural features, phylogenetic analysis, and its enzyme activity. Using the computational approach, we have identified 87 SlAChE genes containing GDSL lipase/acylhydrolase domain in tomato. In silico expression analysis of SlAChE genes showed up-and down regulation under salinity stress condition. The activity of the AChE enzyme was further confirmed using Ellman assay. Promoter analysis of SlAChE genes using PlantCARE showed the presence of several cis-acting elements including abiotic stress, light, and hormone regulatory elements. In silico screening indicated that tomato AChE homologs are widely distributed in plants. Syntenic analysis revealed several gene pairs between tomato and other species. Interestingly, the deduced amino acid sequence of human AChE showed no similarity with that of tomato AChE sequence. However, the binding energy of SlAChE enzyme to agonists and antagonists was almost identical to that of human AChE. This preliminary study of ChE-like activity in plants may open the way for additional research in non-neuronal role in plants. The studies provide a theoretical basis for further elucidating the functions of the AChE gene family at the molecular level.
... Nonetheless, distinct exon-intron structures have formed in several GRAS genes, indicating that they have likely acquired new specialized roles to adapt to their environment. According to previous studies, the plant GRAS gene family evolved from a prokaryotic genome by horizontal gene transfer, followed by duplication events 80 . Many TaGRAS genes have major outliers with more than 5 introns, demonstrating the TaGRAS gene high degree of divergence. ...
Article
Full-text available
The GRAS transcription factors are multifunctional proteins involved in various biological processes, encompassing plant growth, metabolism, and responses to both abiotic and biotic stresses. Wheat is an important cereal crop cultivated worldwide. However, no systematic study of the GRAS gene family and their functions under heat, drought, and salt stress tolerance and molecular dynamics modeling in wheat has been reported. In the present study, we identified the GRAS gene in Triticum aestivum through systematically performing gene structure analysis, chromosomal location, conserved motif, phylogenetic relationship, and expression patterns. A total of 177 GRAS genes were identified within the wheat genome. Based on phylogenetic analysis, these genes were categorically placed into 14 distinct subfamilies. Detailed analysis of the genetic architecture revealed that the majority of TaGRAS genes had no intronic regions. The expansion of the wheat GRAS gene family was proven to be influenced by both segmental and tandem duplication events. The study of collinearity events between TaGRAS and analogous orthologs from other plant species provided valuable insights into the evolution of the GRAS gene family in wheat. It is noteworthy that the promoter regions of TaGRAS genes consistently displayed an array of cis-acting elements that are associated with stress responses and hormone regulation. Additionally, we discovered 14 miRNAs that target key genes involved in three stress-responsive pathways in our study. Moreover, an assessment of RNA-seq data and qRT-PCR results revealed a significant increase in the expression of TaGRAS genes during abiotic stress. These findings highlight the crucial role of TaGRAS genes in mediating responses to different environmental stresses. Our research delved into the molecular dynamics and structural aspects of GRAS domain-DNA interactions, marking the first instance of such information being generated. 
Overall, the current findings contribute to our understanding of the organization of the GRAS genes in the wheat genome. Furthermore, we identified TaGRAS27 as a candidate gene for functional research, and to improve abiotic stress tolerance in the wheat by molecular breeding.
... Interestingly, we found that 52.5% of oat GRAS genes were intronless, indicating that the structure of GRAS genes is highly conserved in oat. It is hypothesized that a large number of intronless genes in plants originate from prokaryotes and are replicated in the plant genome to produce [67,68]. Another study reported that the GRAS gene originated in the bacterial genome [69], which may be the reason why the GRAS family contains a large number of intronless genes. ...
Article
Full-text available
The GRAS protein family is involved in plant growth and development, plant disease resistance, and abiotic stress response. Although the GRAS protein family has been systematically studied and reported in many plants, it has not been reported in oat, an excellent foodstuff crop of Gramineae. We identified 90 AsGRAS genes and all of the AsGRAS genes were randomly distributed on 21 chromosomes with 6 tandem duplicated genes and 49 pairs of segmental duplications, which may be the main reason for the expansion of the GRAS gene family. According to the phylogenetic tree, 90 AsGRASs were classified into 10 distinct subfamilies. Gene structure revealed introns varying from zero to seven, and all genes have conserved motifs and GRAS structure domain. Protein–protein interaction and miRNA prediction analysis showed that AsGRAS proteins mainly interacted with GA signalling, cell division, etc., and that the AsGRAS genes were targeted by miRNA171. RNA-seq and qRT–PCR data showed that GRAS genes were expressed at different growth and developmental stages and under different abiotic stresses in oat, indicating the potential role of GRAS genes in promoting growth and stress tolerance in oat. Overall, our evolutionary and expression analysis of AsGRAS genes contributes to the elucidation of a theoretical basis for the GRAS gene family. Moreover, it helped reveal gene function and laid the foundation for future agricultural improvement of oats based on functional properties.
... Several studies have found that their shorter evolutionary time of OGs is the main reason for shorter gene length (Neme and Tautz 2013;Ma et al. 2021). Zou et al. (2011) believed that intron-less genes may be the key reason for the existence of biological diversity and are related to some species-specific characteristics during species evolution. ...
Article
Full-text available
Key message Brassica orphan gene BrFLM, identified by two allelic mutants, was involved in leafy head formation in Chinese cabbage. Abstract Leafy head formation is a unique agronomic trait of Chinese cabbage that determines its yield and quality. In our previous study, an EMS mutagenesis Chinese cabbage mutant library was constructed using the heading Chinese cabbage double haploid (DH) line FT as the wild-type. Here, we screened two extremely similar leafy head deficiency mutants lfm-1 and lfm-2 with geotropic growth leaves from the library to investigate the gene(s) related to leafy head formation. Reciprocal crossing results showed that these two mutants were allelic. We utilized lfm-1 to identify the mutant gene(s). Genetic analysis showed that the mutated trait was controlled by a single nuclear gene Brlfm. Mutmap analysis showed that Brlfm was located on chromosome A05, and BraA05g012440.3C or BraA05g021450.3C were the candidate gene. Kompetitive allele-specific PCR analysis eliminated BraA05g012440.3C from the candidates. Sanger sequencing identified an SNP from G to A at the 271st nucleotide on BraA05g021450.3C. The sequencing of lfm-2 detected another non-synonymous SNP (G to A) located at the 266st nucleotide on BraA05g021450.3C, which verified its function on leafy head formation. We blasted BraA05g021450.3C on database and found that it belongs to a Brassica orphan gene encoding an unknown 13.74 kDa protein, named BrLFM. Subcellular localization showed that BrLFM was located in the nucleus. These findings reveal that BrLFM is involved in leafy head formation in Chinese cabbage.
Article
Full-text available
Zebra and quagga mussels ( Dreissena spp. ) are invasive freshwater biofoulers that perpetrate devastating economic and ecological impact. Their success depends on their ability to anchor onto substrates with protein-based fibers known as byssal threads. Yet, compared to other mussel lineages, little is understood about the proteins comprising their fibers or their evolutionary history. Here, we investigated the hierarchical protein structure of Dreissenid byssal threads and the process by which they are fabricated. Unique among bivalves, we found that threads possess a predominantly β -sheet crystalline structure reminiscent of spider silk. Further analysis revealed unexpectedly that the Dreissenid thread protein precursors are mechanoresponsive α -helical proteins that are mechanically processed into β -crystallites during thread formation. Proteomic analysis of the byssus secretory organ and byssus fibers revealed a family of ultrahigh molecular weight (354 to 467 kDa) asparagine-rich (19 to 20%) protein precursors predicted to form α -helical coiled coils. Moreover, several independent lines of evidence indicate that the ancestral predecessor of these proteins was likely acquired via horizontal gene transfer. This chance evolutionary event that transpired at least 12 Mya has endowed Dreissenids with a distinctive and effective fiber formation mechanism, contributing significantly to their success as invasive species and possibly, inspiring new materials design.
Article
Full-text available
The plant-specific family of GRAS transcription factors has been wide implicated in the regulation of transcriptional reprogramming associated with a diversity of biological functions ranging from plant development processes to stress responses. Functional analyses of GRAS transcription factors supported by in silico structural and comparative analyses are emerging and clarifying the regulatory networks associated with their biological roles. In this review, a detailed analysis of GRAS proteins´ structure and biochemical features as revealed by recent discoveries indicated how these characteristics may impact subcellular location, molecular mechanisms, and function. Nomenclature issues associated with GRAS classification into different subfamilies in diverse plant species even in the presence of robust genomic resources are discussed, in particular how it affects assumptions of biological function. Insights into the mechanisms driving evolution of this gene family and how genetic and epigenetic regulation of GRAS contributes to subfunctionalization are provided. Finally, this review debates challenges and future perspectives on the application of this complex but promising gene family for crop improvement to cope with challenges of environmental transition.
Preprint
Full-text available
Heveins are one of the most important groups of plant antimicrobial peptides. So far, various roles in plant growth and development and in response to biotic and abiotic stresses have reported for heveins. The present study aimed to identify and characterize the hevein genes in barley. In total, thirteen hevein genes identified in barley genome. The identified heveins showed a high similarity with heveins from other plant species in terms of structural and functional characteristics. The hevein coding sequences produced open reading frames (ORFs) ranged from 342 to 1002 bp. Most of the identified hevein genes were intronless, and the others had only one intron. The hevein ORFs produced proteins ranged from 113 to 333 amino acids. Search for conserved functional domains showed ChtBD1 and Lyz-like domains in barley heveins. All barley heveins comprised extracellular signal peptides ranged from 19 to 35 amino acids. The phylogenetic analysis divided barley heveins into two groups. The promoter analysis identified cis-acting elements related to growth and development, hormone response, and environmental stresses in the promoter of barley hevein genes. The expression analysis showed high expression level of heveins in root and reproductive organs of barley. The expression analysis also showed that barley heveins is induced by both biotic and abiotic stresses. The results of antimicrobial activity prediction showed the highest antimicrobial activity in ChtBD1 domain of barley heveins. The findings of the current study can improve our knowledge about the role of hevein genes in plant and can be used for future studies.
Article
Full-text available
We present a fast and flexible program for clustering large protein databases at different sequence identity levels. It takes less than 2 h for the all-against-all sequence comparison and clustering of the non-redundant protein database of over 560000 sequences on a high-end PC. The output database, including only the representative sequences, can be used for more efficient and sensitive database searches. Availability: The program is available from http://bioinformatics.burnham-inst.org/cd-hi Contact: liwz@sdsc.edu or adam@burnham-inst.org * To whom correspondence should be addressed.
Article
Full-text available
The sea slug Elysia chlorotica acquires plastids by ingestion of its algal food source Vaucheria litorea. Organelles are sequestered in the mollusc's digestive epithelium, where they photosynthesize for months in the absence of algal nucleocytoplasm. This is perplexing because plastid metabolism depends on the nuclear genome for >90% of the needed proteins. Two possible explanations for the persistence of photosynthesis in the sea slug are (i) the ability of V. litorea plastids to retain genetic autonomy and/or (ii) more likely, the mollusc provides the essential plastid proteins. Under the latter scenario, genes supporting photosynthesis have been acquired by the animal via horizontal gene transfer and the encoded proteins are retargeted to the plastid. We sequenced the plastid genome and confirmed that it lacks the full complement of genes required for photosynthesis. In support of the second scenario, we demonstrated that a nuclear gene of oxygenic photosynthesis, psbO, is expressed in the sea slug and has integrated into the germline. The source of psbO in the sea slug is V. litorea because this sequence is identical from the predator and prey genomes. Evidence that the transferred gene has integrated into sea slug nuclear DNA comes from the finding of a highly diverged psbO 3′ flanking sequence in the algal and mollusc nuclear homologues and gene absence from the mitochondrial genome of E. chlorotica. We demonstrate that foreign organelle retention generates metabolic novelty (“green animals”) and is explained by anastomosis of distinct branches of the tree of life driven by predation and horizontal gene transfer. • symbiosis • Vaucheria litorea • evolution • plastid • stramenopile
Article
Choosing the right function prediction tools
The vast majority of known proteins have not yet been characterized experimentally, and very little is known about their function. New unannotated sequences are added to the databases at a pace that far exceeds the pace at which they are annotated in the lab. Computational biology offers tools that can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history, and their association with other proteins. In this contribution, we attempt to provide a framework that will enable biologists and computational biologists to decide which type of computational tool is appropriate for the analysis of their protein of interest, and what kind of insights into its function these tools can provide. In particular, we describe computational methods for predicting protein function directly from sequence or structure, focusing mainly on methods for predicting molecular function. We do not discuss methods that rely on sources of information that are beyond the protein itself, such as genomic context [1], protein–protein interaction networks [2], or membership in biochemical pathways [3]. When choosing a tool for function prediction, one would typically want to identify the best performing tool. However, a quantitative comparison of different tools is a tricky task. While most developers report their own assessment of their tool, in most cases there are no standard datasets and generally agreed-upon measures and criteria for benchmarking function prediction methods. In the absence of independent benchmarks, comparing the figures reported by the developers is almost always comparing oranges and apples (for discussion of this problem see [4]). Therefore, we refrain from reporting numerical assessments of specific methods. For those cases in which independent assessment of performance is available, we refer the reader to the original publications. Finally, we discuss only methods that are either accessible as Web servers or freely available for download (relevant Web links can be found in Table S1).
Article
The genomic basis of primate phenotypic uniqueness remains obscure, despite increasing genome and transcriptome sequence data availability. Although factors such as segmental duplications and positive selection have received much attention as potential drivers of primate phenotypes, single-copy primate-specific genes are poorly characterized. To discover such genes genomewide, we screened a catalog of 38,037 human transcriptional units (TUs), compiled from EST and cDNA sequences in conjunction with the FANTOM3 transcriptome project. We identified 131 TUs from transcribed sequences residing within primate-specific insertions in 9-species sequence alignments and outside of segmental duplications. Exons of 120 (92%) of the TUs contained interspersed repeats, indicating that repeat insertions may have contributed to primate-specific gene genesis. Fifty-nine (46%) primate-specific TUs may encode proteins. Although primate-specific TU transcript lengths were comparable to known human gene mRNA lengths overall, 92 (70%) primate-specific TUs were single-exon. Thirty-two (24%) primate-specific TUs were localized to subtelomeric and pericentromeric regions. Forty (31%) of the TUs were nested in introns of known genes, indicating that primate-specific TUs may arise within older, protein-coding regions. Primate-specific TUs were preferentially expressed in reproductive organs and tissues (P < 0.011), consistent with the expectation that emergence of new, lineage-specific genes may accompany speciation or reproduction. Of the 33 primate-specific TUs with human Affymetrix microarray probe support, 21 were differentially expressed in human teratozoospermia. In addition to elucidating the likely functional relevance of primate-specific TUs to reproduction, we present a set of primate-specific genes for future functional studies, and we implicate nonduplicated pericentromeric and subtelomeric regions in gene genesis.
Article
Protein sequences were compared among Arabidopsis, Oryza and Populus to identify differential gene (DG) sets that are present in one genome but not in the other two. The DG sets were screened against a plant transcript database, the NR protein database and six newly sequenced genomes (Carica, Glycine, Medicago, Sorghum, Vitis and Zea) to identify a set of species-specific (SS) genes. Gene expression, protein motif and intron number were examined. In Arabidopsis, Oryza and Populus, 165, 638 and 109 SS genes were identified, respectively. Some SS genes were preferentially expressed in flowers, roots, xylem and cambium or up-regulated by stress. Six conserved motifs in Arabidopsis and Oryza SS proteins were found in other distant lineages. The SS gene sets were enriched with intronless genes. The results reflect functional and/or anatomical differences between monocots and eudicots or between herbaceous and woody plants. The Populus-specific genes are candidates for carbon sequestration and biofuel research.
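The "in one but not the other two genomes" step is, at its core, a set difference. The sketch below assumes that the protein comparison has already assigned each gene to a homology family; the family identifiers (`F1` to `F5`), the `differential_sets` function and the example data are all hypothetical and are not taken from the study.

```python
from __future__ import annotations


def differential_sets(families: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each genome, keep the families found in that genome and in none
    of the others (a plain set difference)."""
    result: dict[str, set[str]] = {}
    for species, fams in families.items():
        # Union of the families seen in every other genome.
        others = set().union(*(f for s, f in families.items() if s != species))
        result[species] = fams - others
    return result


# Hypothetical family assignments per genome.
families = {
    "Arabidopsis": {"F1", "F2", "F3"},
    "Oryza": {"F2", "F4"},
    "Populus": {"F2", "F3", "F5"},
}
dg = differential_sets(families)
```

In the study, such candidate sets were then screened against further transcript, protein and genome resources before being called species-specific; that filtering is not modelled here.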