
The Roles and Evolutionary Patterns of Intronless Genes in Deuterostomes

January 2011
Comparative and Functional Genomics 2011(4):680673


DOI:10.1155/2011/680673

Source
PubMed

License
CC BY

Authors:

Ming Zou

Huazhong Agricultural University

Baocheng Guo

Chinese Academy of Sciences

Shunping He

Chinese Academy of Sciences

Genes without introns are a characteristic feature of prokaryotes, but a number of intronless genes are also found in eukaryotes. Studying these eukaryotic genes with prokaryotic architecture could help to understand the evolutionary patterns of related genes and genomes. Our analyses revealed a number of intronless genes in 6 deuterostomes (sea urchin, sea squirt, zebrafish, chicken, platypus, and human). We also determined the conservation of each intronless gene in archaea, bacteria, fungi, plants, metazoans, and other eukaryotes. The proportions of intronless genes inherited from the common ancestor of archaea, bacteria, and eukaryotes were consistent with the phylogenetic positions of these species, with higher proportions of ancient intronless genes in more primitive species. In these species, intronless genes are assigned to a range of cellular roles and gene ontology (GO) categories, and some of these functions are very basic. Some intronless genes in each species are derived from other intronless genes or from multiexon genes. In conclusion, we showed that the number and proportion of intronless genes vary among these 6 deuterostomes and that some of these genes have important functions. These genes are good candidates for subsequent functional and evolutionary analyses.



Hindawi Publishing Corporation

Comparative and Functional Genomics

Volume 2011, Article ID 680673, 8 pages

doi:10.1155/2011/680673

Research Article

The Roles and Evolutionary Patterns of Intronless Genes in Deuterostomes

Ming Zou,1,2 Baocheng Guo,3,4 and Shunping He1

1 The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China

2 Institute of Hydrobiology, Graduate University of the Chinese Academy of Sciences, Beijing 100039, China

3 Institute of Evolutionary Biology and Environmental Studies, University of Zurich, 8057 Zurich, Switzerland

4 The Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015 Lausanne, Switzerland

Correspondence should be addressed to Shunping He, heshunping@gmail.com

Received 23 July 2010; Revised 13 April 2011; Accepted 22 June 2011

Academic Editor: J. Peter W. Young

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


1. Introduction

Most eukaryotic genes are interrupted by one or more noncoding sequences called introns, whereas intronless genes are a characteristic feature of prokaryotes. However, studies of intronless genes in eukaryotes have been reported over the past few decades [1–4]. Many human genes, such as G protein-coupled receptor genes, are intronless [5], and the human genome report identified 901 predicted intronless genes [6]. Recently, Tay et al. found that many single-copy primate-specific human transcriptional units are single exon [7]. Moreover, Yang et al. found that species-specific genes in Arabidopsis, Oryza, and Populus are enriched with intronless genes [8]. A retrogene, which is formed by homologous recombination between the genomic copy of a gene and a cDNA [9], is also considered to be intronless, and many retrogenes have been reported in eukaryotic genomes [10–12]. Because of their prokaryotic architecture, intronless genes in eukaryotes provide interesting datasets for comparative genomics and evolutionary studies, and studying them can help to understand the evolutionary patterns of related genes and genomes. As a result, systematic studies of intronless genes in many species, from mammals to plants, have been reported [13–18]. Several databases of these single-exon genes, such as SEGE [19] and Genome SEGE [20], have been set up and are useful for evolutionary and functional studies. However, earlier evolutionary studies of intronless genes have usually been limited to 1 or 2 species, and studies within a phylogenetic framework are rare. With the development of sequencing technology, more and more complete genomes have been sequenced and annotated, which makes comprehensive comparative analysis of intronless genes possible. The present study was designed to identify and analyse intronless genes in 6 deuterostomes, sea urchin (Strongylocentrotus purpuratus), sea squirt (Ciona intestinalis), zebrafish (Danio rerio), chicken (Gallus gallus), platypus (Ornithorhynchus anatinus), and human (Homo sapiens), which were selected because of their pivotal phylogenetic positions. We compared the functions and conservation of these genes between and within species in an attempt to gain evolutionarily meaningful insights.

2. Materials and Methods

2.1. Data Source of Intronless Genes. The annotated genomes (GenBank flat file format) of sea urchin, sea squirt, zebrafish, chicken, platypus, and human were downloaded from the NCBI ftp server (ftp://ftp.ncbi.nih.gov/genomes/, 10 Jun 2009). Using a customized Perl script, we extracted protein sequences for all the intron-containing and intronless genes from each annotated genome. During processing, a gene was classified as intron-containing if the “CDS” line in the FEATURES table contained a “join”; otherwise, it was classified as an intronless gene. Proteins encoded by mitochondrial genomes were removed. To avoid any ambiguity, proteins encoded by genes that have the symbol “<” or “>” in their annotation (“<” indicates partial on the 5′ end and “>” indicates partial on the 3′ end) were also discarded.

2.2. Functional Assignment and Category. ProtFun is an online procedure designed to produce ab initio predictions of protein function from sequence; it combines 14 different sequence-based functional prediction methods. Rather than relying on sequence similarity, as other protein function prediction procedures do, ProtFun queries a large number of other feature prediction servers to obtain information on various posttranslational and localisation aspects of the protein and uses these to predict protein function [21, 22]. Therefore, functional assignments of intronless genes in our study were made with the ProtFun web server (http://www.cbs.dtu.dk/services/ProtFun/), and sequences were grouped according to their cellular roles and gene ontology (GO) categories.

2.3. Distribution, Conservation, and Paralogue Identification of Intronless Genes. Genes (both intronless and intron-containing) in archaea, bacteria, fungi, plants, metazoans, and other eukaryotes that are homologous to our intronless genes (BLAST score greater than 100) were determined on the basis of sequence similarity using BLink (BLAST Link), a tool available at NCBI that displays the precomputed results of BLAST searches completed for every protein sequence in the Entrez proteins data domain [24].

CD-HIT is a program for clustering the entries in a large protein database according to sequence identity (with a high threshold of identity). CD-HIT can remove redundant sequences and generate a database of only the representatives [25]. To determine which intronless genes are conserved among the 6 deuterostome species in this study, we clustered all of our intronless genes using CD-HIT. To determine the relationships among the intronless genes within a genome, we clustered them and identified nonredundant intronless genes in each genome. We also clustered intron-containing genes to produce nonredundant multiexon genes in each genome. We then clustered the nonredundant intronless genes with the nonredundant multiexon genes of the same genome and produced a list of corresponding intronless and intron-containing genes to determine the relationships between the two. All of this data handling was done with CD-HIT.
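As an illustration of how such clustering output might be summarised, the sketch below counts nonredundant clusters and the share of clusters that hold more than one sequence. It is an assumption-laden example rather than the authors' pipeline: it assumes the plain-text `.clstr` layout that CD-HIT normally writes (a `>Cluster N` header followed by one member line per sequence), and the function names `parse_clstr` and `redundancy_summary` are hypothetical.

```python
def parse_clstr(text):
    """Parse CD-HIT .clstr text into a list of clusters (lists of sequence names).

    Assumed member-line layout: '0<TAB>341aa, >name... *'
    """
    clusters = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append([])          # start a new cluster
        else:
            # The sequence name sits between '>' and the trailing '...'.
            name = line.split(">", 1)[1].split("...", 1)[0]
            clusters[-1].append(name)
    return clusters


def redundancy_summary(clusters):
    """Return (number of clusters, clusters with more than one member, percentage)."""
    total = len(clusters)
    multi = sum(1 for members in clusters if len(members) > 1)
    percent = round(100 * multi / total, 1) if total else 0.0
    return total, multi, percent
```

Applied to a per-genome clustering of intronless genes, the three returned values would correspond to a cluster count, a count of clusters representing more than one gene, and the percentage of such clusters.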

3. Results and Discussion

3.1. Intronless Genes in Deuterostomes. Sea urchin, sea squirt, zebrafish, chicken, platypus, and human were selected to represent the major groups of deuterostomes, and the intronless genes in their genomes were identified. The GI number and protein sequence of each intronless gene in each species were obtained by processing the annotated genomes. Intronless genes are abundant in each of the 6 deuterostome genomes. The number of intronless genes in each species is given in Table 1, and details are given in the supplementary material online at doi:10.1155/2011/680673. Among the selected species, human has the largest number of intronless genes (6229) and platypus has the smallest (930), a nearly sevenfold difference. However, the differences among the numbers of intronless genes in sea urchin (2482), zebrafish (2169), chicken (1659), and sea squirt (1448) are not significant, and these numbers should increase and become more accurate when better-annotated genomes are available. Because a few previous studies reported somewhat lower numbers of intronless genes [16, 20], we compared the protein numbers (encoded by intron-containing and intronless genes) from Ensembl (Table 1) with ours and found that the former were always larger. In contrast to the absolute numbers, the proportions of total genes accounted for by intronless genes do not differ significantly, and the largest proportion is about twice the smallest (Table 1). In fact, earlier studies reported that 11109, 5846, and 5085 intronless genes reside in the rice, Arabidopsis, and mouse genomes, accounting for 19.9%, 21.7%, and 18.9% of genes, respectively [13]. Given that total gene numbers and annotation quality differ between species, these data may indicate that although the number of intronless genes varies considerably between species, the proportion of total genes that they account for is nearly constant. However, neither the number nor the percentage of intronless genes correlates with genome size (P > 0.6, |r| < 0.3, Spearman's test). The human genome has the largest number of intronless genes, which might be due to the following reasons. First, human has the most complete expression data, which could result in more annotated genes than in other species during the genome annotation process. Second, the human genome has many more retrogenes than other species [26, 27]. Third, duplications of intronless genes are common in the human genome (see Section 3.6). The abundance of intronless genes in the 6 deuterostomes indicates that they may have played important roles during deuterostome evolution. Earlier, Jain et al. found that intronless genes have a strong bias towards encoding shorter proteins [13]. Here we confirmed that the average length of intronless genes is significantly shorter than that of multiexon genes in all the selected species (P < 0.001, Mann-Whitney test). The average lengths for intronless genes in sea urchin, sea squirt, zebrafish, chicken, platypus, and human are 341.75 bp, 389.34 bp, 378.78 bp, 259.53 bp, 294.43 bp, and 241.26 bp, and those for intron-containing genes are 530.59 bp, 540.58 bp, 553.04 bp, 541.31 bp, 528.11 bp, and 503.31 bp, respectively.

Table 1: The C-values and statistics for genes (intron-containing and intronless) in each species.

Species        Sea urchin    Sea squirt    Zebrafish     Chicken       Platypus      Human
C-value (pg)   0.89          0.20          1.75          1.25          3.06          3.5
N              n/a           19858         40585         22194         26836         88237
I              2482 (8.6)    1448 (11.2)   2169 (8.6)    1659 (10.2)   930 (7.9)     6229 (16.7)
NR*            676           1263          856           1029          516           2290
NRI*           8792          8502          10620         9502          7840          12823
R**            621 (92)      110 (9)       274 (32)      156 (15)      86 (17)       1321 (58)
C**            212 (31)      191 (15)      269 (31)      186 (18)      133 (26)      665 (29)

C-value: values are obtained from http://www.genomesize.com/.
N: number of proteins encoded by intron-containing and intronless genes, obtained from Ensembl (http://www.ensembl.org/index.html).
I: number of intronless genes in each genome; numbers in parentheses are the percentages of total genes accounted for by intronless genes.
NR: number of nonredundant intronless gene clusters.
NRI: number of nonredundant intron-containing gene clusters.
R: number of nonredundant clusters that represent more than one intronless gene.
C: number of clusters that represent both a nonredundant intronless and a nonredundant intron-containing gene.
*Clustered using CD-HIT (identity = 0.3).
**Numbers in parentheses are the percentages of all nonredundant intronless gene clusters.
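As a rough check on the genome-size comparison in Section 3.1, Spearman's rank correlation can be recomputed from the C-values, intronless gene counts, and percentages listed in Table 1. The sketch below is a plain-Python illustration with hypothetical helper names (`average_ranks`, `spearman_rho`); it computes only the correlation coefficient, not the P value, and may differ in detail from the statistical software the authors used.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1          # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Values from Table 1, in the order: sea urchin, sea squirt, zebrafish,
# chicken, platypus, human.
c_values = [0.89, 0.20, 1.75, 1.25, 3.06, 3.5]
counts = [2482, 1448, 2169, 1659, 930, 6229]
percents = [8.6, 11.2, 8.6, 10.2, 7.9, 16.7]

rho_counts = spearman_rho(c_values, counts)
rho_percent = spearman_rho(c_values, percents)
print(round(rho_counts, 3), round(rho_percent, 3))
```

For the gene counts this gives a coefficient of about 0.26, which is consistent with the reported |r| < 0.3.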

Among the selected species, chromosomes are well assembled in human, chicken, and zebrafish. To study the distribution of intronless genes in these genomes, we counted the number of intronless genes on each chromosome (Figure 1). Spearman's test showed that the number of intronless genes is significantly correlated with chromosome length in human (P < 0.001, r = 0.721) and chicken (P < 0.001, r = 0.712). In zebrafish the correlation did not reach significance (P = 0.119, r = 0.320), although nonparametric tests have less power to detect a significant relationship. These results suggest that intronless genes are distributed stochastically across chromosomes in human and chicken (and possibly in zebrafish), as previously reported for mouse, rice, and Arabidopsis [13, 18]. However, several clusters of intronless genes exist on certain chromosomes, and some of these clusters have been reported, for example, the olfactory receptor gene clusters on human chromosome 17 and the odorant receptor genes in the zebrafish genome [28, 29].

3.2. Functional Assignment of Intronless Genes. It has been shown that the distribution of intronless human genes across molecular function categories is nonrandom [17]. To study the molecular function categories of intronless genes in the 6 selected species, their cellular roles and GO categories were predicted using ProtFun (available via the web server http://www.cbs.dtu.dk/services/ProtFun/). Figure 2 shows the distribution of intronless genes among the cellular roles in the 6 species. As in plants, intronless genes assigned to translation and energy metabolism are the most common in most species, followed by cell envelope and amino acid biosynthesis [13]. Furthermore, in these 6 deuterostomes, transport and binding, followed by regulatory functions and central intermediary metabolism, are also well represented compared with other functional categories. The percentage of intronless genes with a given cellular role varies significantly between species (Figure 2). For example, only 7% of intronless genes in sea squirt are assigned to transport and binding, significantly fewer than in other species. The representation of some cellular roles, such as amino acid biosynthesis and central intermediary metabolism, is quite similar in sea urchin, sea squirt, zebrafish, and human; however, this is not the case for chicken or platypus. GO categories can be assigned to more than 70% of intronless genes in every species except sea squirt (more than 60%), and the distribution of genes across GO categories is shown in Figure 3. As in plants, proteins associated with the GO categories growth factor, transcription regulation, transport, immune response, and structural protein are overrepresented in these species [13]. Furthermore, proteins associated with the GO category transcription, which might differ between plants and animals, are well represented in deuterostomes. The percentage of intronless genes assigned to a given GO category, such as growth factor or transporter, varies significantly among these species. In terms of cellular roles and GO categories, the functional distribution of intronless genes in each selected genome is very similar to those reported for rice and Arabidopsis [13]. This result might indicate that the biological mechanisms related to intronless genes are common across kingdoms. On the basis of earlier work and this analysis, we conclude that most plant and deuterostome intronless genes share the same characteristics, but deuterostomes still have some lineage-specific and species-specific functional intronless genes.

Figure 1: The numbers of intronless genes on each chromosome in human, chicken, and zebrafish. Both numbers and percentages are shown. Panels: (a) human; (b) chicken; (c) zebrafish. Axes: chromosome number against number of genes and genes (%).

3.3. Taxonomic Distribution. To study the evolutionary patterns of intronless genes in major taxonomic groups, we used BLink, a tool that displays the precomputed results of BLAST searches for every protein sequence in the Entrez proteins data domain [24], to determine which proteins are evolutionarily conserved among different taxonomic groups (archaea, bacteria, fungi, metazoans, plants, and other eukaryotes). The numbers of intronless genes with homologues in each taxonomic group are given in Table 2; these will change as more genome sequences become available. We divided these genes into 7 types of combination according to their conservation among archaea (A), bacteria (B), and eukaryotes (E), and the distributions are shown in Figure 4. In each species, the majority of intronless genes have homologues only in eukaryotes (E), suggesting that most intronless genes emerged after the eukaryotes diverged from prokaryotes. Another important category is ABE, in which intronless genes are conserved in all major biological kingdoms; these genes are considered to be functionally important and to have evolved slowly [30]. Intronless genes belonging to ABE account for 39% of the total intronless genes in sea squirt and 30% in sea urchin, which together form the first class. The second class contains zebrafish (23%) and chicken (22%), and the third class includes human (16%) and platypus (14%). Given their phylogenetic positions, the first class is more primitive than the second class, which is more primitive than the third class. These data show that a higher percentage of intronless genes in primitive species than in higher species are inherited from the common ancestor of archaea, bacteria, and eukaryotes. More than 20% of intronless genes are conserved in bacteria and eukaryotes (BE) in each species, but less than 5% are conserved in archaea and eukaryotes (AE). This could be because archaea have lost more homologues shared with eukaryotes than bacteria have, or because bacteria have gained more. Moreover, the percentage of intronless genes conserved in bacteria and eukaryotes is significantly higher in zebrafish and chicken than in other species, suggesting that these 2 species have a greater percentage of intronless genes inherited from the common ancestor of bacteria and eukaryotes. No gene is conserved only in archaea and/or bacteria, except one human gene with homologues only in bacteria, which might be an example of lateral gene transfer (LGT) from bacteria to human.

Figure 2: Distribution of intronless genes among different cellular roles in each species (x-axis: cellular role; y-axis: genes (%)). su: sea urchin; sq: sea squirt; z: zebrafish; c: chicken; p: platypus; h: human. AAB: amino acid biosynthesis; BOC: biosynthesis of cofactors; CE: cell envelope; CP: cellular processes; CIM: central intermediary metabolism; EM: energy metabolism; FAM: fatty acid metabolism; PAP: purines and pyrimidines; RF: regulatory functions; RAT: replication and transcription; T: translation; TAB: transport and binding.

Figure 3: Distribution of intronless genes among different kinds of gene ontology (GO) categories in each species (x-axis: GO category; y-axis: genes (%)). su: sea urchin; sq: sea squirt; z: zebrafish; c: chicken; p: platypus; h: human. ST: signal transducer; R: receptor; H: hormone; SP: structural protein; T: transporter; IC: ion channel; VGIC: voltage-gated ion channel; CC: cation channel; TR: transcription; TRR: transcription regulation; SR: stress response; IR: immune response; GF: growth factor; MIT: metal ions transport.

Figure 4: Distribution of intronless genes specific to different taxonomic group combinations for each species (x-axis: domain combination; y-axis: genes (%)). su: sea urchin; sq: sea squirt; z: zebrafish; c: chicken; p: platypus; h: human. A: archaea; B: bacteria; E: eukaryote; AB: archaea and bacteria; AE: archaea and eukaryote; BE: bacteria and eukaryote; ABE: archaea, bacteria and eukaryote; ORFans: homologs not found in other organisms.
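The grouping of genes into domain combinations described in Section 3.3 can be illustrated with a short sketch. It is a hypothetical reconstruction, not the authors' code: it assumes that, for each intronless gene, the best BLAST score against homologues from other organisms in each domain has already been collected (for example from BLink output), and it applies the score threshold of 100 stated in Section 2.3. The names `domain_combination` and `combination_percentages` are invented for this example.

```python
from collections import Counter

DOMAINS = ("A", "B", "E")  # archaea, bacteria, eukaryotes


def domain_combination(best_scores, threshold=100):
    """Label one gene by the domains in which it has a homologue.

    best_scores maps a domain letter to the best BLAST score of a hit in
    another organism of that domain; missing domains count as no hit.
    """
    label = "".join(d for d in DOMAINS if best_scores.get(d, 0) > threshold)
    return label or "ORFans"    # no homologue anywhere else


def combination_percentages(genes):
    """Percentage of genes falling into each combination label."""
    labels = Counter(domain_combination(scores) for scores in genes)
    total = sum(labels.values())
    return {label: 100 * n / total for label, n in labels.items()}


# Hypothetical scores for four genes.
example = [
    {"A": 150, "B": 300, "E": 500},   # conserved in all three domains
    {"E": 250},
    {"E": 410, "B": 80},              # bacterial hit below the threshold
    {},                               # no hits at all
]
print(combination_percentages(example))
```

The same pattern would extend to the eukaryotic groups by replacing `DOMAINS` with the four letters F, M, O, and P.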

More than 30% of intronless genes are eukaryote specific in all these species, and the proportion is highest in platypus (61.6%). To investigate their distribution among eukaryotic groups, we divided these proteins according to their homology in fungi (F), metazoans (M), other eukaryotes (O), and plants (P), which gives 15 types of combination (Figure 5). Generally, the majority of genes have homologues in the combination MO (metazoans and other eukaryotes) in each species except sea squirt, in which only 13% of eukaryote-specific intronless genes are of this kind. Less than 10% of genes are metazoan-specific (M) in many species, but in chicken and human the proportions are 26% and 29%, respectively. Genes conserved in fungi, metazoans, other eukaryotes, and plants (FMOP), including histones and ribosomal proteins, are thought to be highly conserved because they are essential for the survival of all eukaryotes [30]. The number of FMOP genes is similar in sea urchin (314), sea squirt (263), and zebrafish (228), but their percentage of the total eukaryote-specific intronless genes is much greater in sea squirt (54.7%) than in sea urchin (27.8%) and zebrafish (26.9%). Apart from the cases mentioned above, very few genes are conserved in other taxonomic combinations. Moreover, some genes in some species have homologues in fungi, plants, or other eukaryotes but not in metazoans. These genes might be examples of lateral gene transfer (LGT) between eukaryotes, which has been demonstrated recently [31, 32]. Since fungi and plants diverged from metazoans earlier than other eukaryotes did, the distribution pattern of eukaryote-specific intronless genes in these species can be explained if many more homologues have been lost in fungi and plants and many others have been gained after their divergence. However, many essential genes (FMOP) have still been preserved. Therefore, the distribution pattern of eukaryote-specific intronless genes and the gain and loss patterns in this work are in accord with earlier reports [13, 15, 33].

Figure 5: Distribution of eukaryote-specific intronless genes specific to different eukaryotic taxonomic group combinations for each species (x-axis: domain combination; y-axis: genes (%)). su: sea urchin; sq: sea squirt; z: zebrafish; c: chicken; p: platypus; h: human. F: fungi; M: metazoans; O: other eukaryotes; P: plants; FM: fungi and metazoans; FO: fungi and other eukaryotes; FP: fungi and plants; MO: metazoans and other eukaryotes; MP: metazoans and plants; OP: other eukaryotes and plants; FMO: fungi, metazoans and other eukaryotes; FMP: fungi, metazoans and plants; FOP: fungi, other eukaryotes and plants; MOP: metazoans, other eukaryotes and plants; FMOP: fungi, metazoans, other eukaryotes and plants.

The predicted cellular roles for each kind of combination are shown in the supplementary material. Amino acid biosynthesis, cell envelope, energy metabolism, translation, and transport and binding are usually well represented. Furthermore, basic functional categories, such as amino acid biosynthesis, energy metabolism, and translation, are overrepresented in intronless genes conserved in all major biological kingdoms (ABE) or in all eukaryotic groups (FMOP) compared with the other combinations.

Table 2: Number of intronless genes with homologous genes in other taxonomic groups.

Taxonomic group    Sea urchin   Sea squirt   Zebrafish   Chicken   Platypus   Human
Archaea            764          625          524         385       143        1184
Bacteria           1303         873          1161        920       345        2524
Fungi              1492         1149         1067        817       413        2513
Plants             1480         1209         1070        809       441        2617
Metazoans          2447         1403         2027        1575      918        4807
Other eukaryotes   2337         1343         1897        1388      867        4055
ORFans             35           39           136         80        7          1416

ORFans [23]: homologues not found in other species.

Figure 6: Distribution of ORFans according to their functional categories in each species (x-axis: cellular role; y-axis: genes (%)). The description of functional categories and species is the same as that given for Figure 2.

Figure 7: Functional distribution of conserved intronless genes in vertebrates (x-axis: cellular role; y-axis: genes (%)). The description of functional categories is the same as that given in Figure 2.

3.4. ORFans. Protein sequences that have no homologue in other species are termed ORFans [23]. These proteins could be responsible for some species-specific characteristics, and most of them might have evolved faster than other proteins [34]; they are among the most interesting genome content. Thus, it is important to characterize these proteins experimentally or to use more sensitive bioinformatic approaches to understand their roles and functions [15]. We found very few ORFans in these species except in human (Table 2), in which about 22.7% of intronless genes are ORFans; this might be related to the complexity of human as a viviparous mammal. Most of these proteins are annotated as hypothetical; however, when we checked their annotations, the majority of ORFans in all species except platypus had mRNA or EST support. Thus, most of these ORFans are unlikely to be misannotated. Figure 6 shows the predicted cellular role distribution of ORFans in each species. It is interesting to note that translation and energy metabolism are the most frequently represented cellular roles in these species. The pattern is similar to earlier reports for plants and human [13, 15], suggesting that most species-specific intronless genes in plants and animals have the same functions and that even the components of basic cellular machinery might evolve to perform species-specific functions in all these species [13, 15]. Moreover, we found that the cell envelope role is well represented in sea urchin ORFans and that more than half of the intronless genes with the cellular role of fatty acid metabolism are ORFans.
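The ORFan proportion quoted for human can be checked directly from the counts in Tables 1 and 2. The short sketch below simply divides the ORFan count by the number of intronless genes for each species; the dictionary and function names are chosen for this example only.

```python
# Number of intronless genes per species (row I of Table 1).
intronless = {
    "sea urchin": 2482, "sea squirt": 1448, "zebrafish": 2169,
    "chicken": 1659, "platypus": 930, "human": 6229,
}

# Number of ORFans per species (last row of Table 2).
orfans = {
    "sea urchin": 35, "sea squirt": 39, "zebrafish": 136,
    "chicken": 80, "platypus": 7, "human": 1416,
}


def orfan_percentages(orfan_counts, totals):
    """Percentage of intronless genes that are ORFans, per species."""
    return {sp: round(100 * orfan_counts[sp] / totals[sp], 1) for sp in totals}


percentages = orfan_percentages(orfans, intronless)
for species, pct in percentages.items():
    print(f"{species}: {pct}%")
```

Human is the only species in which the proportion exceeds 10%, in line with the statement that ORFans are rare in the other five genomes.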

3.5. Conserved Intronless Genes in Deuterostomes. To examine the conservation of intronless genes in deuterostomes, we clustered the intronless genes of these 6 species together using CD-HIT (identity = 0.3). Only 6 nonredundant sequences (NRs) were shared by all 6 species, and they might perform pivotal functions in deuterostomes. The predicted cellular roles were translation for 4 NRs, regulatory function for one NR, and energy metabolism for one NR. The GO categories of these NRs were associated with transcription regulation, growth factor, and transport. When we compared the NRs shared between any two species, we found that sea urchin and sea squirt each share fewer than 100 NRs with any other species, whereas more than 200 are shared between any two vertebrate species (data not shown), suggesting that significantly more intronless genes are shared among vertebrates than among deuterostomes as a whole. As expected, we found 125 NRs shared by vertebrates, and Figure 7 shows the distribution of their predicted functions. Most of these proteins are involved in basic cellular processes, such as transport and binding, cell envelope, and translation, and these genes could have been among the important factors in the emergence of vertebrates.
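Counting shared nonredundant sequences of this kind reduces to set operations on cluster membership. The sketch below is a hypothetical illustration: it assumes each cluster is available as a list of (species, gene identifier) pairs, and the function names and toy data are invented here rather than taken from the study.

```python
from itertools import combinations


def species_sets(clusters):
    """Reduce each cluster to the set of species that contribute a member."""
    return [frozenset(species for species, _gene in members) for members in clusters]


def shared_by_all(clusters, species):
    """Number of clusters containing at least one gene from every listed species."""
    wanted = set(species)
    return sum(1 for present in species_sets(clusters) if wanted <= present)


def pairwise_shared(clusters, species):
    """Number of clusters shared by each pair of species."""
    sets = species_sets(clusters)
    return {
        (a, b): sum(1 for present in sets if a in present and b in present)
        for a, b in combinations(species, 2)
    }


# Toy clusters with made-up gene identifiers.
toy_clusters = [
    [("su", "g1"), ("sq", "g2"), ("z", "g3")],
    [("z", "g4"), ("c", "g5")],
    [("su", "g6")],
    [("z", "g7"), ("c", "g8"), ("su", "g9"), ("sq", "g10")],
]
toy_species = ["su", "sq", "z", "c"]
print(shared_by_all(toy_clusters, toy_species))
print(pairwise_shared(toy_clusters, toy_species))
```

Restricting the species list to the vertebrates would give the analogous vertebrate-only count.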

3.6. Paralogues of Intronless Genes. Intronless genes in eukaryotic genomes have many origins other than inheritance from ancient prokaryotes, such as duplication (whole genome duplication or tandem duplication) of existing intronless genes and retroposition of intron-containing genes (retroduplicated genes). Also, there is evidence that ancient intronless genes were the origin of multiexon genes [35,36].

To investigate these latter patterns, we clustered intronless genes with one another, and nonredundant intronless genes with multiexon genes, using CD-Hit (identity = 0.3); the results are shown in Table 1. About 92% of sea urchin nonredundant clusters contain more than one intronless gene; the value is 58% in human, 32% in zebrafish, and only 9% in sea squirt. These data suggest that most intronless genes may originate from other intronless genes in sea urchin, human, and zebrafish, but that far fewer intronless genes have the same origin in the other species. Table 1 also shows the frequency of correspondence between nonredundant intronless genes and nonredundant intron-containing genes. About 30% of nonredundant intronless gene clusters have corresponding nonredundant intron-containing genes in most species, but in sea squirt and chicken the proportion is only 15% and 18%, respectively. This might be due to differences in the activity of LINE retrotransposable elements in these genomes. Active LINE retrotransposons that can reverse transcribe polyadenylated mRNAs are thought to be the main reason for the emergence of retrogenes [37,38]. More than 20% of the human genome is composed of LINE retrotransposable elements [39], and many studies have suggested a high rate of retroposition in human [26,40,41], which might result in the emergence of intronless genes. In chicken, only about 8% of the genome is composed of CR1 (chicken repeat 1) elements [42], and this kind of LINE is not thought to reverse transcribe polyadenylated mRNAs [42]. The majority of intronless genes that have intronless or intron-containing homologs are associated with the cellular roles of transport and binding, cell envelope, and translation. It has long been believed that duplicated genes (including retroduplicated genes) provide material for the evolution of genes with new functions [43], but there is evidence that retrogenes take over the function of their parent genes during meiotic X chromosome inactivation in spermatogenesis in mammals [12] and in the fruit fly [10]. Thus, the selective advantage of retaining these duplicated intronless genes might be that they can evolve new functions or help to buffer crucial functions, similar to earlier reports on duplicated genes in angiosperms [44].
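The two kinds of proportions discussed in this section can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure used to build Table 1: clusters are assumed to be already parsed into plain Python lists, the intronless flag on each gene is assumed to be known, and the function names and gene identifiers are hypothetical.

```python
def fraction_multi_member(intronless_clusters):
    """Fraction of intronless-only clusters holding more than one gene.

    intronless_clusters: list of clusters, each a list of gene IDs.
    """
    if not intronless_clusters:
        return 0.0
    multi = sum(1 for cluster in intronless_clusters if len(cluster) > 1)
    return multi / len(intronless_clusters)


def fraction_with_multiexon_partner(mixed_clusters):
    """Fraction of intronless-containing clusters that also hold a
    multiexon (intron-containing) gene.

    mixed_clusters: list of clusters, each a list of
    (gene_id, is_intronless) tuples.
    """
    # Only clusters with at least one intronless gene are relevant.
    relevant = [c for c in mixed_clusters if any(flag for _, flag in c)]
    if not relevant:
        return 0.0
    paired = sum(1 for c in relevant if any(not flag for _, flag in c))
    return paired / len(relevant)


if __name__ == "__main__":
    # Toy data: four clusters of intronless genes, two with several members.
    intronless_only = [["a1", "a2", "a3"], ["b1"], ["c1", "c2"], ["d1"]]
    print(fraction_multi_member(intronless_only))

    # Toy data: nonredundant intronless genes clustered with multiexon genes.
    mixed = [
        [("a1", True), ("m1", False)],
        [("b1", True)],
        [("c1", True)],
        [("m2", False), ("d1", True)],
        [("m3", False)],
    ]
    print(fraction_with_multiexon_partner(mixed))
```

Keeping the two measures in separate functions reflects the two separate clusterings described in the text: one among intronless genes, and one between nonredundant intronless genes and multiexon genes.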

4. Conclusion

Both this and earlier studies indicate that the evolutionary patterns of intronless genes among deuterostomes, as well as between deuterostomes and plants, have many common characteristics that might hold for all major eukaryotic kingdoms. However, there are still some lineage-specific and species-specific characteristics in the evolution of intronless genes, and this might be one of the reasons for the existence of biodiversity. As more genomes are sequenced and gene annotations become more exhaustive and accurate, the evolutionary patterns of intronless genes will become clearer, providing insights into the evolutionary mechanisms underlying gene and genome evolution in eukaryotes.

Acknowledgments

The authors are thankful to four anonymous reviewers and M. Yu for their critical reading of this manuscript and for helpful comments and suggestions that greatly improved the paper. This work was supported by a grant from the Major State Basic Research Development Program of China (973 Program, no. 2007CB411601).

References

[1] K. B. Gatermann, A. Hoffmann, G. H. Rosenberg, and N. F. Kaufer, “Introduction of functional artificial introns into the naturally intronless ura4 gene of Schizosaccharomyces pombe,” Molecular and Cellular Biology, vol. 9, no. 4, pp. 1526–1535, 1989 (English).

[2] B. Bhandari, W. J. Roesler, K. D. DeLisio, D. J. Klemm, N. S. Ross, and R. E. Miller, “A functional promoter flanks an intronless glutamine synthetase gene,” Journal of Biological Chemistry, vol. 266, no. 12, pp. 7784–7792, 1991 (English).

[3] A. V. Makeyev, A. N. Chkheidze, and S. A. Liebhaber, “A set of highly conserved RNA-binding proteins, alpha CP-1 and alpha CP-2, implicated in mRNA stabilization, are coexpressed from an intronless gene and its intron-containing paralog,” Journal of Biological Chemistry, vol. 274, no. 35, pp. 24849–24857, 1999 (English).

[4] A. Sugiyama, K. Noguchi, C. Kitanaka et al., “Molecular cloning and chromosomal mapping of mouse intronless myc gene acting as a potent apoptosis inducer,” Gene, vol. 226, no. 2, pp. 273–283, 1999 (English).

[5] A. J. Gentles and S. Karlin, “Why are human G-protein-coupled receptors predominantly intronless?” Trends in Genetics, vol. 15, no. 2, pp. 47–49, 1999 (English).

[6] J. C. Venter et al., “The sequence of the human genome,” Science, vol. 291, no. 5507, pp. 1304–1351, 2001 (English).

[7] S. K. Tay, J. Blythe, and L. Lipovich, “Global discovery of primate-specific genes in the human genome,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 29, pp. 12019–12024, 2009 (English).

[8] X. Yang, S. Jawdy, T. J. Tschaplinski, and G. A. Tuskan, “Genome-wide identification of lineage-specific genes in Arabidopsis, Oryza and Populus,” Genomics, vol. 93, no. 5, pp. 473–480, 2009 (English).

[9] G. R. Fink, “Pseudogenes in yeast?” Cell, vol. 49, no. 1, pp. 5–6, 1987 (English).

[10] E. Betrán, K. Thornton, and M. Long, “Retroposed new genes out of the X in Drosophila,” Genome Research, vol. 12, no. 12, pp. 1854–1859, 2002.

[11] Y. Zhang, Y. Wu, Y. Liu, and B. Han, “Computational identification of 69 retroposons in Arabidopsis,” Plant Physiology, vol. 138, no. 2, pp. 935–948, 2005.

[12] J. J. Emerson, H. Kaessmann, E. Betrán, and M. Long, “Extensive gene traffic on the mammalian X chromosome,” Science, vol. 303, no. 5657, pp. 537–540, 2004.

[13] M. Jain, P. Khurana, A. K. Tyagi, and J. P. Khurana, “Genome-wide analysis of intronless genes in rice and Arabidopsis,” Functional and Integrative Genomics, vol. 8, no. 1, pp. 69–78, 2008 (English).

[14] S. M. Agarwal, “Evolutionary rate variation in eukaryotic lineage specific human intronless proteins,” Biochemical and Biophysical Research Communications, vol. 337, no. 4, pp. 1192–1197, 2005 (English).

[15] S. M. Agarwal and J. Gupta, “Comparative analysis of human intronless proteins,” Biochemical and Biophysical Research Communications, vol. 331, no. 2, pp. 512–519, 2005 (English).

[16] M. K. Sakharkar et al., “Computational prediction of SEG (single exon gene) function in humans,” Frontiers in Bioscience, vol. 10, pp. 1382–1395, 2005 (English).

[17] A. E. Hill and E. J. Sorscher, “The non-random distribution of intronless human genes across molecular function categories,” FEBS Letters, vol. 580, no. 18, pp. 4303–4305, 2006 (English).

[18] K. R. Sakharkar, M. K. Sakharkar, C. T. Culiat, V. T. K. Chow, and S. Pervaiz, “Functional and evolutionary analyses on expressed intronless genes in the mouse genome,” FEBS Letters, vol. 580, no. 5, pp. 1472–1478, 2006 (English).

[19] M. K. Sakharkar, P. Kangueane, D. A. Petrov, A. S. Kolaskar, and S. Subbiah, “SEGE: a database on ‘intron less/single exonic’ genes from eukaryotes,” Bioinformatics, vol. 18, no. 9, pp. 1266–1267, 2002 (English).

[20] M. K. Sakharkar and P. Kangueane, “Genome SEGE: a database for ‘intronless’ genes in eukaryotic genomes,” BMC Bioinformatics, vol. 5, article 67, 2004 (English).

[21] M. Punta and Y. Ofran, “The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function,” PLoS Computational Biology, vol. 4, no. 10, Article ID e1000160, 2008 (English).

[22] L. J. Jensen, R. Gupta, N. Blom et al., “Prediction of human protein function from post-translational modifications and localization features,” Journal of Molecular Biology, vol. 319, no. 5, pp. 1257–1265, 2002 (English).

[23] D. Fischer and D. Eisenberg, “Finding families for genomic ORFans,” Bioinformatics, vol. 15, no. 9, pp. 759–762, 1999 (English).

[24] D. L. Wheeler, T. Barrett, D. A. Benson et al., “Database resources of the National Center for Biotechnology Information,” Nucleic Acids Research, vol. 33, pp. D39–D45, 2005 (English).

[25] W. Li, L. Jaroszewski, and A. Godzik, “Clustering of highly homologous sequences to reduce the size of large protein databases,” Bioinformatics, vol. 17, no. 3, pp. 282–283, 2001 (English).

[26] A. C. Marques, I. Dupanloup, N. Vinckenbosch, A. Reymond, and H. Kaessmann, “Emergence of young human genes after a burst of retroposition in primates,” PLoS Biology, vol. 3, no. 11, article e357, pp. 1970–1979, 2005.

[27] Z. Yu, D. Morais, M. Ivanga, and P. M. Harrison, “Analysis of the role of retrotransposition in gene evolution in vertebrates,” BMC Bioinformatics, vol. 8, article 308, 2007.

[28] N. Ben-Arie, D. Lancet, C. Taylor et al., “Olfactory receptor gene cluster on human chromosome 17: possible duplication of an ancestral receptor repertoire,” Human Molecular Genetics, vol. 3, no. 2, pp. 229–235, 1994 (English).

[29] J. C. Dugas and J. Ngai, “Analysis and characterization of an odorant receptor gene cluster in the zebrafish genome,” Genomics, vol. 71, no. 1, pp. 53–65, 2001 (English).

[30] I. King Jordan, I. B. Rogozin, Y. I. Wolf, and E. V. Koonin, “Essential genes are more evolutionarily conserved than are nonessential genes in bacteria,” Genome Research, vol. 12, no. 6, pp. 962–968, 2002 (English).

[31] R. Kamikawa, Y. Inagaki, and Y. Sako, “Direct phylogenetic evidence for lateral transfer of elongation factor-like gene,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 19, pp. 6965–6969, 2008 (English).

[32] M. E. Rumpho, J. M. Worful, J. Lee et al., “Horizontal gene transfer of the algal nuclear gene psbO to the photosynthetic sea slug Elysia chlorotica,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 46, pp. 17867–17871, 2008 (English).

[33] E. V. Koonin, N. D. Fedorova, J. D. Jackson et al., “A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes,” Genome Biology, vol. 5, no. 2, p. R7, 2004 (English).

[34] T. Domazet-Loso and D. Tautz, “An evolutionary analysis of orphan genes in Drosophila,” Genome Research, vol. 13, no. 10, pp. 2213–2219, 2003 (English).

[35] A. Lecharny, N. Boudet, I. Gy, S. Aubourg, and M. Kreis, “Introns in, introns out in plant gene families: a genomic approach of the dynamics of gene structure,” Journal of Structural and Functional Genomics, vol. 3, no. 1–4, pp. 111–116, 2003 (English).

[36] N. Boudet, S. Aubourg, C. Toffano-Nioche, M. Kreis, and A. Lecharny, “Evolution of intron/exon structure of DEAD helicase family genes in Arabidopsis, Caenorhabditis and Drosophila,” Genome Research, vol. 11, no. 12, pp. 2101–2114, 2001 (English).

[37] E. V. Gogvadze and A. A. Buzdin, “New mechanism of retrogene formation in mammalian genomes: in vivo recombination during RNA reverse transcription,” Molekulyarnaya Biologiya, vol. 39, no. 3, pp. 364–373, 2005 (Russian).

[38] C. Esnault, J. Maestre, and T. Heidmann, “Human LINE retrotransposons generate processed pseudogenes,” Nature Genetics, vol. 24, no. 4, pp. 363–367, 2000.

[39] E. S. Lander, L. M. Linton, B. Birren et al., “Initial sequencing and analysis of the human genome,” Nature, vol. 409, no. 6822, pp. 860–921, 2001 (English).

[40] Z. Zhang, P. M. Harrison, Y. Liu, and M. Gerstein, “Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome,” Genome Research, vol. 13, no. 12, pp. 2541–2558, 2003 (English).

[41] K. Ohshima, M. Hattori, T. Yada, T. Gojobori, Y. Sakaki, and N. Okada, “Whole-genome screening indicates a possible burst of formation of processed pseudogenes and Alu repeats by particular L1 subfamilies in ancestral primates,” Genome Biology, vol. 4, no. 11, article R74, 2003.

[42] N. B. Haas, J. M. Grabowski, A. B. Sivitz, and J. B. E. Burch, “Chicken repeat 1 (CR1) elements, which define an ancient family of vertebrate non-LTR retrotransposons, contain two closely spaced open reading frames,” Gene, vol. 197, no. 1-2, pp. 305–309, 1997.

[43] A. Wagner, “The fate of duplicated genes: loss or new function?” BioEssays, vol. 20, no. 10, pp. 785–788, 1998 (English).

[44] B. A. Chapman, J. E. Bowers, F. A. Feltus, and A. H. Paterson, “Buffering of crucial functions by paleologous duplicated genes may contribute cyclicality to angiosperm genome duplication,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 8, pp. 2730–2735, 2006 (English).

Available via license: CC BY

Content may be subject to copyright.

Genome-wide identification of WRKY transcription factor family members in Miscanthus sinensis (Miscanthus sinensis Anderss)

Article

Full-text available

Mar 2024

Miscanthus is an emerging sustainable bioenergy crop whose growing environment is subject to many abiotic and biological stresses. WRKY transcription factors play an important role in stress response and growth of biotic and abiotic. To clarify the distribution and expression of the WRKY genes in Miscanthus, it is necessary to classify and phylogenetically analyze the WRKY genes in Miscanthus. The v7.1 genome assembly of Miscanthus was analyzed by constructing an evolutionary tree. In Miscanthus, there are 179 WRKY genes were identified. The 179 MsWRKYs were classified into three groups with conserved gene structure and motif composition. The tissue expression profile of the WRKY genes showed that MsWRKY genes played an essential role in all growth stages of plants. At the early stage of plant development, the MsWRKY gene is mainly expressed in the rhizome of plants. In the middle stage, it is mainly expressed in the leaf. At the end stage, mainly in the stem. According to the results, it showed significant differences in the expression of the MsWRKY in different stages of Miscanthus sinensis. The results of the study contribute to a better understanding of the role of the MsWRKY gene in the growth and development of Miscanthus.

HSP gene superfamily in Aspongopus chinensis Dallas: unravelling identification, characterisation and expression patterns during diapause and non-diapause stages

Article

Mar 2024
B ENTOMOL RES

Aspongopus chinensis Dallas 1851, an insect of important economic value, faces challenges in artificial breeding due to mandatory diapause and limited access to wild resources. Heat shock proteins (Hsps) are thought to influence diapause in insects, but little is known about their role in A. chinensis during diapause. This study used genomic methods to identify 25 Hsp genes in A. chinensis, including two Hsp90, 14 Hsp70, four Hsp60 and five small Hsp genes, were located on seven chromosomes, respectively. The gene structures among the same families are relatively conserved. Meanwhile, the motif compositions and secondary structures of A. chinensis Hsps (AcHsps) were predicted. RNA-seq data and fluorescence quantitative PCR analysis showed that there were differences in the expression patterns of AcHsps in diapause and non-diapause stages, and AcHsp70-5 was significantly differentially expressed in both analysis, which was enriched in the pathway of response to hormone. All the results showed that Hsps play an important role in the diapause mechanism of A. chinensis. Our observations highlight the molecular evolution of the Hsp gene and their effect on diapause in A. chinensis.

Molecular Identification and Characterization of Hevein Antimicrobial Peptide Genes in Two-Row and Six-Row Cultivars of Barley (Hordeum vulgare L.)

Article

Full-text available

Feb 2024
BIOCHEM GENET

Heveins are one of the most important groups of plant antimicrobial peptides. So far, various roles in plant growth and development and in response to biotic and abiotic stresses have reported for heveins. The present study aimed to identify and characterize the hevein genes in two-row and six-row cultivars of barley. In total, thirteen hevein genes were identified in the genome of two-row and six-row cultivars of barley. The identified heveins were identical in two-row and six-row cultivars of barley and showed a high similarity with heveins from other plant species. The hevein coding sequences produced open reading frames (ORFs) ranged from 342 to 1002 bp. Most of the identified hevein genes were intronless, and the others had only one intron. The hevein ORFs produced proteins ranged from 113 to 333 amino acids. Search for conserved functional domains showed CBD and LYZ domains in barley heveins. All barley heveins comprised extracellular signal peptides ranged from 19 to 35 amino acids. The phylogenetic analysis divided barley heveins into two groups. The promoter analysis showed regulatory elements with different frequencies between two-row and six-row cultivars. These cis-acting elements included elements related to growth and development, hormone response, and environmental stresses. The expression analysis showed high expression level of heveins in root and reproductive organs of both two-row and six-row cultivars. The expression analysis also showed that barley heveins is induced by both biotic and abiotic stresses. The results of antimicrobial activity prediction showed the highest antimicrobial activity in CBD domain of barley heveins. The findings of the current study can improve our knowledge about the role of hevein genes in plant and can be used for future studies.

Identification of Acetylcholinesterase Like Gene Family and Its Expression Under Salinity Stress in Solanum lycopersicum

Article

Full-text available

Nov 2023
J PLANT GROWTH REGUL

In human, acetylcholinesterase (AChE) is a cholinergic enzyme involved in the hydrolysis of neurotransmitter acetylcholine (ACh) into its constituents, choline, and acetate. In plants, the biological functions of AChE are lacking and its existence has been recognized by indirect evidence of its activity. Therefore, in the present investigation, a systematic analysis of the AChE gene family in tomato was performed by integrating structural features, phylogenetic analysis, and its enzyme activity. Using the computational approach, we have identified 87 SlAChE genes containing GDSL lipase/acylhydrolase domain in tomato. In silico expression analysis of SlAChE genes showed up-and down regulation under salinity stress condition. The activity of the AChE enzyme was further confirmed using Ellman assay. Promoter analysis of SlAChE genes using PlantCARE showed the presence of several cis-acting elements including abiotic stress, light, and hormone regulatory elements. In silico screening indicated that tomato AChE homologs are widely distributed in plants. Syntenic analysis revealed several gene pairs between tomato and other species. Interestingly, the deduced amino acid sequence of human AChE showed no similarity with that of tomato AChE sequence. However, the binding energy of SlAChE enzyme to agonists and antagonists was almost identical to that of human AChE. This preliminary study of ChE-like activity in plants may open the way for additional research in non-neuronal role in plants. The studies provide a theoretical basis for further elucidating the functions of the AChE gene family at the molecular level.

Genome-wide identification and expression analysis of the GRAS gene family under abiotic stresses in wheat (Triticum aestivum L.)

Article

Full-text available

Oct 2023

The GRAS transcription factors are multifunctional proteins involved in various biological processes, encompassing plant growth, metabolism, and responses to both abiotic and biotic stresses. Wheat is an important cereal crop cultivated worldwide. However, no systematic study of the GRAS gene family and their functions under heat, drought, and salt stress tolerance and molecular dynamics modeling in wheat has been reported. In the present study, we identified the GRAS gene in Triticum aestivum through systematically performing gene structure analysis, chromosomal location, conserved motif, phylogenetic relationship, and expression patterns. A total of 177 GRAS genes were identified within the wheat genome. Based on phylogenetic analysis, these genes were categorically placed into 14 distinct subfamilies. Detailed analysis of the genetic architecture revealed that the majority of TaGRAS genes had no intronic regions. The expansion of the wheat GRAS gene family was proven to be influenced by both segmental and tandem duplication events. The study of collinearity events between TaGRAS and analogous orthologs from other plant species provided valuable insights into the evolution of the GRAS gene family in wheat. It is noteworthy that the promoter regions of TaGRAS genes consistently displayed an array of cis-acting elements that are associated with stress responses and hormone regulation. Additionally, we discovered 14 miRNAs that target key genes involved in three stress-responsive pathways in our study. Moreover, an assessment of RNA-seq data and qRT-PCR results revealed a significant increase in the expression of TaGRAS genes during abiotic stress. These findings highlight the crucial role of TaGRAS genes in mediating responses to different environmental stresses. Our research delved into the molecular dynamics and structural aspects of GRAS domain-DNA interactions, marking the first instance of such information being generated. 
Overall, the current findings contribute to our understanding of the organization of the GRAS genes in the wheat genome. Furthermore, we identified TaGRAS27 as a candidate gene for functional research, and to improve abiotic stress tolerance in the wheat by molecular breeding.

Genome-Wide Identification and Expression of the GRAS Gene Family in Oat (Avena sativa L.)

Article

Full-text available

Jul 2023

The GRAS protein family is involved in plant growth and development, plant disease resistance, and abiotic stress response. Although the GRAS protein family has been systematically studied and reported in many plants, it has not been reported in oat, an excellent foodstuff crop of Gramineae. We identified 90 AsGRAS genes and all of the AsGRAS genes were randomly distributed on 21 chromosomes with 6 tandem duplicated genes and 49 pairs of segmental duplications, which may be the main reason for the expansion of the GRAS gene family. According to the phylogenetic tree, 90 AsGRASs were classified into 10 distinct subfamilies. Gene structure revealed introns varying from zero to seven, and all genes have conserved motifs and GRAS structure domain. Protein–protein interaction and miRNA prediction analysis showed that AsGRAS proteins mainly interacted with GA signalling, cell division, etc., and that the AsGRAS genes were targeted by miRNA171. RNA-seq and qRT–PCR data showed that GRAS genes were expressed at different growth and developmental stages and under different abiotic stresses in oat, indicating the potential role of GRAS genes in promoting growth and stress tolerance in oat. Overall, our evolutionary and expression analysis of AsGRAS genes contributes to the elucidation of a theoretical basis for the GRAS gene family. Moreover, it helped reveal gene function and laid the foundation for future agricultural improvement of oats based on functional properties.

Role of Brassica orphan gene BrLFM on leafy head formation in Chinese cabbage (Brassica rapa)

Article

Full-text available

Jul 2023
THEOR APPL GENET

Key message Brassica orphan gene BrFLM, identified by two allelic mutants, was involved in leafy head formation in Chinese cabbage. Abstract Leafy head formation is a unique agronomic trait of Chinese cabbage that determines its yield and quality. In our previous study, an EMS mutagenesis Chinese cabbage mutant library was constructed using the heading Chinese cabbage double haploid (DH) line FT as the wild-type. Here, we screened two extremely similar leafy head deficiency mutants lfm-1 and lfm-2 with geotropic growth leaves from the library to investigate the gene(s) related to leafy head formation. Reciprocal crossing results showed that these two mutants were allelic. We utilized lfm-1 to identify the mutant gene(s). Genetic analysis showed that the mutated trait was controlled by a single nuclear gene Brlfm. Mutmap analysis showed that Brlfm was located on chromosome A05, and BraA05g012440.3C or BraA05g021450.3C were the candidate gene. Kompetitive allele-specific PCR analysis eliminated BraA05g012440.3C from the candidates. Sanger sequencing identified an SNP from G to A at the 271st nucleotide on BraA05g021450.3C. The sequencing of lfm-2 detected another non-synonymous SNP (G to A) located at the 266st nucleotide on BraA05g021450.3C, which verified its function on leafy head formation. We blasted BraA05g021450.3C on database and found that it belongs to a Brassica orphan gene encoding an unknown 13.74 kDa protein, named BrLFM. Subcellular localization showed that BrLFM was located in the nucleus. These findings reveal that BrLFM is involved in leafy head formation in Chinese cabbage.

Invasive mussels fashion silk-like byssus via mechanical processing of massive horizontally acquired coiled coils

Article

Full-text available

Nov 2023
P NATL ACAD SCI USA

Zebra and quagga mussels ( Dreissena spp. ) are invasive freshwater biofoulers that perpetrate devastating economic and ecological impact. Their success depends on their ability to anchor onto substrates with protein-based fibers known as byssal threads. Yet, compared to other mussel lineages, little is understood about the proteins comprising their fibers or their evolutionary history. Here, we investigated the hierarchical protein structure of Dreissenid byssal threads and the process by which they are fabricated. Unique among bivalves, we found that threads possess a predominantly β -sheet crystalline structure reminiscent of spider silk. Further analysis revealed unexpectedly that the Dreissenid thread protein precursors are mechanoresponsive α -helical proteins that are mechanically processed into β -crystallites during thread formation. Proteomic analysis of the byssus secretory organ and byssus fibers revealed a family of ultrahigh molecular weight (354 to 467 kDa) asparagine-rich (19 to 20%) protein precursors predicted to form α -helical coiled coils. Moreover, several independent lines of evidence indicate that the ancestral predecessor of these proteins was likely acquired via horizontal gene transfer. This chance evolutionary event that transpired at least 12 Mya has endowed Dreissenids with a distinctive and effective fiber formation mechanism, contributing significantly to their success as invasive species and possibly, inspiring new materials design.

Network of GRAS transcription factors in plant development, fruit ripening and stress responses

Article

Full-text available

Nov 2023

The plant-specific family of GRAS transcription factors has been wide implicated in the regulation of transcriptional reprogramming associated with a diversity of biological functions ranging from plant development processes to stress responses. Functional analyses of GRAS transcription factors supported by in silico structural and comparative analyses are emerging and clarifying the regulatory networks associated with their biological roles. In this review, a detailed analysis of GRAS proteins´ structure and biochemical features as revealed by recent discoveries indicated how these characteristics may impact subcellular location, molecular mechanisms, and function. Nomenclature issues associated with GRAS classification into different subfamilies in diverse plant species even in the presence of robust genomic resources are discussed, in particular how it affects assumptions of biological function. Insights into the mechanisms driving evolution of this gene family and how genetic and epigenetic regulation of GRAS contributes to subfunctionalization are provided. Finally, this review debates challenges and future perspectives on the application of this complex but promising gene family for crop improvement to cope with challenges of environmental transition.

Molecular identification and characterization of hevein antimicrobial peptide genes in barley (Hordeum vulgare L.)

Preprint

Full-text available

Aug 2023

Heveins are one of the most important groups of plant antimicrobial peptides. So far, various roles in plant growth and development and in response to biotic and abiotic stresses have reported for heveins. The present study aimed to identify and characterize the hevein genes in barley. In total, thirteen hevein genes identified in barley genome. The identified heveins showed a high similarity with heveins from other plant species in terms of structural and functional characteristics. The hevein coding sequences produced open reading frames (ORFs) ranged from 342 to 1002 bp. Most of the identified hevein genes were intronless, and the others had only one intron. The hevein ORFs produced proteins ranged from 113 to 333 amino acids. Search for conserved functional domains showed ChtBD1 and Lyz-like domains in barley heveins. All barley heveins comprised extracellular signal peptides ranged from 19 to 35 amino acids. The phylogenetic analysis divided barley heveins into two groups. The promoter analysis identified cis-acting elements related to growth and development, hormone response, and environmental stresses in the promoter of barley hevein genes. The expression analysis showed high expression level of heveins in root and reproductive organs of barley. The expression analysis also showed that barley heveins is induced by both biotic and abiotic stresses. The results of antimicrobial activity prediction showed the highest antimicrobial activity in ChtBD1 domain of barley heveins. The findings of the current study can improve our knowledge about the role of hevein genes in plant and can be used for future studies.

Initial sequencing and analysis of the human genome

Article

Full-text available

Jan 2001
NATURE

The sequence of the human genome

Article

Full-text available

Li W, Jaroszewski L, Godzik A.. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17: 282-283

Article

Full-text available

Mar 2001

We present a fast and flexible program for clustering large protein databases at different sequence identity levels. It takes less than 2 h for the all-against-all sequence comparison and clustering of the non-redundant protein database of over 560000 sequences on a high-end PC. The output database, including only the representative sequences, can be used for more efficient and sensitive database searches. Availability: The program is available from http://bioinformatics.burnham-inst.org/cd-hi Contact: liwz@sdsc.edu or adam@burnham-inst.org * To whom correspondence should be addressed.

Horizontal gene transfer of the algal nuclear gene psbO to the photosynthetic sea slug Elysia chlorotica

Article

Full-text available

Dec 2008
P NATL ACAD SCI USA

The sea slug Elysia chlorotica acquires plastids by ingestion of its algal food source Vaucheria litorea. Organelles are sequestered in the mollusc's digestive epithelium, where they photosynthesize for months in the absence of algal nucleocytoplasm. This is perplexing because plastid metabolism depends on the nuclear genome for >90% of the needed proteins. Two possible explanations for the persistence of photosynthesis in the sea slug are (i) the ability of V. litorea plastids to retain genetic autonomy and/or (ii) more likely, the mollusc provides the essential plastid proteins. Under the latter scenario, genes supporting photosynthesis have been acquired by the animal via horizontal gene transfer and the encoded proteins are retargeted to the plastid. We sequenced the plastid genome and confirmed that it lacks the full complement of genes required for photosynthesis. In support of the second scenario, we demonstrated that a nuclear gene of oxygenic photosynthesis, psbO, is expressed in the sea slug and has integrated into the germline. The source of psbO in the sea slug is V. litorea because this sequence is identical from the predator and prey genomes. Evidence that the transferred gene has integrated into sea slug nuclear DNA comes from the finding of a highly diverged psbO 3′ flanking sequence in the algal and mollusc nuclear homologues and gene absence from the mitochondrial genome of E. chlorotica. We demonstrate that foreign organelle retention generates metabolic novelty (“green animals”) and is explained by anastomosis of distinct branches of the tree of life driven by predation and horizontal gene transfer. • symbiosis • Vaucheria litorea • evolution • plastid • stramenopile

The Rough Guide to In Silico Function Prediction, or How To Use Sequence and Structure Information To Predict Protein Function

Article

Nov 2008
PLOS COMPUT BIOL

Choosing the right function prediction tools

The vast majority of known proteins have not yet been characterized experimentally, and there is very little that is known about their function. New unannotated sequences are added to the databases at a pace that far exceeds the one in which they are annotated in the lab. Computational biology offers tools that can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history, and their association with other proteins. In this contribution, we attempt to provide a framework that will enable biologists and computational biologists to decide which type of computational tool is appropriate for the analysis of their protein of interest, and what kind of insights into its function these tools can provide. In particular, we describe computational methods for predicting protein function directly from sequence or structure, focusing mainly on methods for predicting molecular function. We do not discuss methods that rely on sources of information that are beyond the protein itself, such as genomic context [1], protein–protein interaction networks [2], or membership in biochemical pathways [3]. When choosing a tool for function prediction, one would typically want to identify the best performing tool. However, a quantitative comparison of different tools is a tricky task. While most developers report their own assessment of their tool, in most cases there are no standard datasets and generally agreed-upon measures and criteria for benchmarking function prediction methods. In the absence of independent benchmarks, comparing the figures reported by the developers is almost always comparing oranges and apples (for discussion of this problem see [4]). Therefore, we refrain from reporting numerical assessments of specific methods. For those cases in which independent assessment of performance is available, we refer the reader to the original publications.
Finally, we discuss only methods that are either accessible as Web servers or freely available for download (relevant Web links can be found in Table S1).

The fate of duplicated genes: loss or new function?

Article

Oct 1998

Andreas Wagner

Computational prediction of SEG (single exon gene) function in humans

Article

Jan 2005

K. Sakharkar Meena

The sequence of the human genome

Article

Jan 2001

Global discovery of primate-specific genes in the human genome

Article

Aug 2009
P NATL ACAD SCI USA

The genomic basis of primate phenotypic uniqueness remains obscure, despite increasing genome and transcriptome sequence data availability. Although factors such as segmental duplications and positive selection have received much attention as potential drivers of primate phenotypes, single-copy primate-specific genes are poorly characterized. To discover such genes genomewide, we screened a catalog of 38,037 human transcriptional units (TUs), compiled from EST and cDNA sequences in conjunction with the FANTOM3 transcriptome project. We identified 131 TUs from transcribed sequences residing within primate-specific insertions in 9-species sequence alignments and outside of segmental duplications. Exons of 120 (92%) of the TUs contained interspersed repeats, indicating that repeat insertions may have contributed to primate-specific gene genesis. Fifty-nine (46%) primate-specific TUs may encode proteins. Although primate-specific TU transcript lengths were comparable to known human gene mRNA lengths overall, 92 (70%) primate-specific TUs were single-exon. Thirty-two (24%) primate-specific TUs were localized to subtelomeric and pericentromeric regions. Forty (31%) of the TUs were nested in introns of known genes, indicating that primate-specific TUs may arise within older, protein-coding regions. Primate-specific TUs were preferentially expressed in reproductive organs and tissues (P < 0.011), consistent with the expectation that emergence of new, lineage-specific genes may accompany speciation or reproduction. Of the 33 primate-specific TUs with human Affymetrix microarray probe support, 21 were differentially expressed in human teratozoospermia. In addition to elucidating the likely functional relevance of primate-specific TUs to reproduction, we present a set of primate-specific genes for future functional studies, and we implicate nonduplicated pericentromeric and subtelomeric regions in gene genesis.

Genome-wide identification of lineage-specific genes in Arabidopsis, Oryza and Populus

Article

Jun 2009
GENOMICS

Protein sequences were compared among Arabidopsis, Oryza and Populus to identify differential gene (DG) sets that are in one but not the other two genomes. The DG sets were screened against a plant transcript database, the NR protein database and six newly-sequenced genomes (Carica, Glycine, Medicago, Sorghum, Vitis and Zea) to identify a set of species-specific genes (SS). Gene expression, protein motif and intron number were examined. 165, 638 and 109 SS genes were identified in Arabidopsis, Oryza and Populus, respectively. Some SS genes were preferentially expressed in flowers, roots, xylem and cambium or up-regulated by stress. Six conserved motifs in Arabidopsis and Oryza SS proteins were found in other distant lineages. The SS gene sets were enriched with intronless genes. The results reflect functional and/or anatomical differences between monocots and eudicots or between herbaceous and woody plants. The Populus-specific genes are candidates for carbon sequestration and biofuel research.
