Genome-wide Intra-specific Comparison of GATA Transcription Factors Among Nineteen Arabidopsis thaliana Genomes

Authors:
  • InfoBoss, Seoul, South Korea

Abstract

GATA transcription factors (TFs) are widespread eukaryotic regulators whose DNA-binding domain is a class IV zinc finger motif (CX2CX17–20CX2C) followed by a basic region. We identified GATA TFs in the genomes of 19 A. thaliana ecotypes to understand the intra-specific characteristics of A. thaliana GATA TFs. In total, 566 GATA genes (772 GATA TFs) were identified from the 19 genomes and classified into four subfamilies (I to IV) based on the phylogenetic tree of A. thaliana Col0 GATA TFs. Four ecotypes (Hi0, Ler0, Mt0, and Ws0) lack the AtGATA24 gene, which functions in the cryptochrome1-dependent response to excess light. Only the Kn0 ecotype presents alternative splicing forms of the AtGATA15 gene, which differ in the start position of the ORF. This may subtly affect their functions; however, no experimental evidence is available. Of the 2,195 amino acids in the 41 GATA domains, 22 (1.002%) vary across the 19 ecotypes, taking into account that four GATA TFs have heterogeneous nucleotides in their ORFs. The amino acid sequence of each GATA TF has at most four forms among the 19 ecotypes. The Rsch4 and Wu0 genomes have completely identical amino acid sequences in their GATA domains. In comparison with Reyes et al. (2004), three GATA genes differ in length, indicating that improved gene prediction has affected the amino acid sequences of GATA TFs. Taken together, our intra-specific comparative analyses will be a cornerstone for understanding the intra-specific characteristics of GATA TFs in plant genomes, as well as for updating the GATA TFs of Arabidopsis.
Mangi Kim, Hong Xi, Jongsun Park*
1 InfoBoss Research Center, Room 301, 670 Seolleung-ro, Gangnam-gu, Seoul 06088, Republic of Korea
2 InfoBoss Co., Ltd., Room 301, 670 Seolleung-ro, Gangnam-gu, Seoul 06088, Republic of Korea
Amino Acid Patterns of GATA Domain from 19 A. thaliana Genomes
Distribution of GATA Domain Forms of A. thaliana GATA TFs
Plant GATA Transcription Factors Identified Genome-widely
[Figure: bar chart of the number of GATA genes (y-axis, 0 to 70) in each subfamily (Subfamily I to VII) for soybean, A. thaliana Col0, castor bean, tomato, apple, and rice, with species grouped as dicot or monocot.]
- Owing to the many plant genome sequencing projects, six genome-wide analyses of the
GATA gene family have been conducted: in A. thaliana Col0 [1], Glycine max [5], Ricinus communis [6],
Solanum lycopersicum [7], Malus x domestica [8], and Oryza sativa [1].
- The total number of GATA genes differs among these species.
[Figure legend (amino acid forms): color keys for one, two, three, four, five, or six forms of amino acid; one or two forms of amino acid containing hetero type; one form of amino acid containing hetero type; two forms of amino acid containing hetero type; and amino acid only containing Ler0.]
GATA Transcription Factors
[Figure: sequence (left) and ribbon representation (right) of the DNA-binding domain of the fungal GATA factor AreA (Claudio Scazzocchio, 2000, Current Opinion in Microbiology 3:126-131).]
- GATA transcription factors are a class of transcriptional regulators present in
fungi, metazoans, and plants.
- The DNA-binding domains of eukaryotic GATA factors comprise a four-cysteine
Zn finger and an adjacent basic region.
- Plant GATA factors play various roles, including regulation of light and
circadian clock responses [1], control of nitrite reductase genes [1], control of genes
related to low nitrogen stress [2], light-responsive development [3], and
chlorophyll-level regulation [3].
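The class IV zinc finger motif given in the abstract (CX2CX17–20CX2C) can be written as a regular expression. The following is a minimal sketch, not the pipeline used for this poster: the function name `find_gata_zinc_fingers` and the toy sequence are hypothetical, and the basic region that follows the zinc finger is not modeled.

```python
import re

# Class IV zinc finger motif from the abstract: C-X2-C-X17-20-C-X2-C,
# where X is any residue. The adjacent basic region is not modeled.
GATA_ZF = re.compile(r"C.{2}C.{17,20}C.{2}C")


def find_gata_zinc_fingers(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in GATA_ZF.finditer(protein.upper())]


# Synthetic toy sequence (NOT a real GATA protein) with an 18-residue loop.
toy = "MK" + "C" + "AA" + "C" + "G" * 18 + "C" + "AA" + "C" + "RKK"
hits = find_gata_zinc_fingers(toy)

# A loop of only 10 residues is outside the 17-20 range and gives no hit.
too_short = "C" + "AA" + "C" + "G" * 10 + "C" + "AA" + "C"
no_hits = find_gata_zinc_fingers(too_short)

print(hits)
print(no_hits)
```

A pattern match of this kind only flags candidates; a domain database search would normally be needed to confirm a GATA domain.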
Acknowledgements
- This research was fully supported by InfoBoss Grant (IBI-0001).
1. Reyes JC, Muro-Pastor MI, Florencio FJ (2004) The GATA family of transcription factors in Arabidopsis
and rice. Plant Physiology 134(4): 1718-1732
2. Chen H, Shao H, Li K, Zhang D, Fan S, Li Y, Han M (2017) Genome-wide identification, evolution, and
expression analysis of GATA transcription factors in apple (Malus × domestica Borkh.). Gene 627: 460-472
3. Zhang C, Hou Y, Hao Q, Chen H, Chen L, Yuan S, Shan Z, Zhang X, Yang Z, Qiu D, Zhou X, Huang W (2015)
Genome-wide survey of the soybean GATA transcription factor gene family and expression analysis under
low nitrogen stress. PLoS One 10(4): e0125174
4. Shaikhali J, de Dios Barajas-Lopéz J, Ötvös K, Kremnev D, Garcia AS, Srivastava V, Wingsle G, Bako L, Strand Å
(2012) The CRYPTOCHROME1-dependent response to excess light is mediated through the transcriptional
activators ZINC FINGER PROTEIN EXPRESSED IN INFLORESCENCE MERISTEM LIKE1 and ZML2 in Arabidopsis. The
Plant Cell: tpc.112.100099
Comparison of A. thaliana Col0 GATA Gene in Both Studies
[Figure: phylogenetic trees and domain structures of A. thaliana Col0 GATA TFs from the study in 2004 and from the study in 2018, with Subfamilies I to IV marked and support values shown at the nodes. Domain legend: TIFY domain, CCT domain, GATA domain. Scale bar: 0.10.]
Tree leaves (41 GATA TFs): AtGATA3b, AtGATA3a, AtGATA7, AtGATA5b, AtGATA5a, AtGATA6, AtGATA12, AtGATA2, AtGATA4, AtGATA9, AtGATA14, AtGATA1, AtGATA8a, AtGATA8b, AtGATA11a, AtGATA11b, AtGATA10a, AtGATA10b, AtGATA13, AtGATA26c, AtGATA26a, AtGATA26b, AtGATA27, AtGATA18, AtGATA30, AtGATA17, AtGATA15, AtGATA20, AtGATA19, AtGATA24a, AtGATA24b, AtGATA28a, AtGATA28b, AtGATA25a, AtGATA25b, AtGATA25c, AtGATA21, AtGATA22, AtGATA16, AtGATA23, AtGATA29.
References
5. Zhang C, Hou Y, Hao Q, Chen H, Chen L, Yuan S, Shan Z, Zhang X, Yang Z, Qiu D (2015) Genome-wide survey
of the soybean GATA transcription factor gene family and expression analysis under low nitrogen stress.
PLoS One 10: e0125174
6. Tao A, Xiao-Jia L, Wei X, Ai-Zhong L (2015) Identification and Characterization of GATA Gene Family in
Castor Bean (Ricinus communis). Plant Diver. Resour. 37: 453-462
7. Yuan Q, Zhang C, Zhao T, Yao M, Xu X (2018) A Genome-Wide Analysis of GATA Transcription Factor Family
in Tomato and Analysis of Expression Patterns. International Journal of Agriculture and Biology
20: 1274-1282
8. Chen H, Shao H, Li K, Zhang D, Fan S, Li Y, Han M (2017) Genome-wide identification, evolution, and
expression analysis of GATA transcription factors in apple (Malus × domestica Borkh.). Gene 627: 460-472
Arabidopsis thaliana Col0 Genome
TAIR10:
A. thaliana genome length : 119,146,348 bp
# of genes : 34,074 ea (transcripts : 53,167 ea)
Paper (published in Nature, 2000):
A. thaliana genome length : 115 Mbp
# of genes : 25,498 ea
Genome data can be improved as time goes on!
- Plant genomes have been improved as more research has been conducted.
- A genome sequence can be improved with additional sequencing data, by filling gaps,
or by conducting additional experiments (e.g., optical mapping).
- The genome version is an important factor when analyzing plant genomes.
- The number of genes can increase as RNA-Seq data accumulate or as new
elements (e.g., miRNAs) are found.
A. thaliana 19 Eco-Type Genomes
Chromosomal Distribution of A. thaliana Col0 GATA Genes
- Nineteen Arabidopsis genomes were used, whose chromosomes and gene models
were generated by a mapping method.
- The numbers of genes in the genomes other than Col0 are similar to each other, while the number of
transcripts in Col0 is larger than those of the other genomes.
[Figure: genome lengths of the 19 genomes; axis ticks from 116,000,000 to 119,500,000, axis label "(Mb)".]
- The 19 genomes have similar lengths, ranging from 117.3 Mb to
118.9 Mb.
- Gap lengths in these genomes range from 108,963 bp (Oy0) to 185,738 bp (Col0).
- These statistics indicate that no genes should be missed because of gap sequences.
Strain name  Version  # of Contigs  # of Genes  # of Transcripts  # of Proteins
Bur0         0.7      5             27,014      38,717            38,717
Can0         0.7      5             26,949      38,556            38,556
Col0         TAIR10   5             33,321      53,167            48,113
Ct1          0.7      5             27,006      38,930            38,930
Edi0         0.7      5             26,997      38,813            38,813
Hi0          0.7      5             27,052      39,015            39,015
Kn0          0.7      5             27,002      38,908            38,908
Ler0         0.7      5             27,014      38,997            38,997
Mt0          0.7      5             27,002      38,685            38,685
No0          0.7      5             27,018      38,635            38,635
Oy0          0.7      5             27,010      38,596            38,596
Po0          0.7      5             27,045      38,776            38,776
Rsch4        0.7      5             27,031      38,557            38,557
Sf2          0.7      5             26,974      38,513            38,513
Tsu0         0.7      5             27,013      38,701            38,701
Wil2         0.7      5             26,978      38,559            38,558
Ws0          0.7      5             27,010      38,395            38,395
Wu0          0.7      5             27,024      38,704            38,704
Zu0          0.7      5             27,044      38,901            38,901
The table also has columns for # of tRNAs, # of rRNAs, and # of ncRNAs. The only value in these columns is 703 for Col0; which of the three columns it belongs to cannot be recovered.
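The comparison of gene and transcript counts across the 19 genomes can be reproduced from the table values. The sketch below copies the gene and transcript counts from the table; the strain keys Ct1 and Wil2 follow the ecotype labels used elsewhere on the poster, and the variable names are illustrative only.

```python
# (number of genes, number of transcripts) per genome, copied from the table.
stats = {
    "Bur0": (27014, 38717), "Can0": (26949, 38556), "Col0": (33321, 53167),
    "Ct1": (27006, 38930), "Edi0": (26997, 38813), "Hi0": (27052, 39015),
    "Kn0": (27002, 38908), "Ler0": (27014, 38997), "Mt0": (27002, 38685),
    "No0": (27018, 38635), "Oy0": (27010, 38596), "Po0": (27045, 38776),
    "Rsch4": (27031, 38557), "Sf2": (26974, 38513), "Tsu0": (27013, 38701),
    "Wil2": (26978, 38559), "Ws0": (27010, 38395), "Wu0": (27024, 38704),
    "Zu0": (27044, 38901),
}

# All genomes except the Col0 reference (TAIR10).
others = {name: counts for name, counts in stats.items() if name != "Col0"}

gene_counts = [genes for genes, _ in others.values()]
gene_min, gene_max = min(gene_counts), max(gene_counts)

# Relative spread of gene counts among the non-Col0 genomes, in percent.
spread_pct = 100 * (gene_max - gene_min) / gene_min

# Does Col0 have more transcripts than every other genome?
max_other_transcripts = max(transcripts for _, transcripts in others.values())
col0_has_most_transcripts = stats["Col0"][1] > max_other_transcripts

print(f"non-Col0 gene counts: {gene_min} to {gene_max} ({spread_pct:.2f}% spread)")
print(f"Col0 transcripts exceed all others: {col0_has_most_transcripts}")
```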
- Of the 2,195 amino acids in the 41 GATA domains, 22 (1.002%) vary across the 19 ecotypes,
taking into account that four GATA TFs have heterogeneous nucleotides in their ORFs.
- Interestingly, some GATA TFs contain hetero-amino acids because of heterozygous sites in their genes.
- Amino acid variation in the GATA domains is low; most amino acids are conserved.
- Thirteen variable amino acids lie outside the beta sheets and alpha helix, and 12 lie inside them,
so the two counts are similar.
- The most frequent amino acid variation occurs in six of the 19 strains and is located in beta sheet 3;
a variation found in five strains is located at the C-terminus of the GATA domain.
- Subfamilies III and IV have no amino acid variations among the 19 genomes, whereas 6 of 19 GATA TFs
in Subfamily I and 3 of 11 in Subfamily II show amino acid variations, indicating that the
subfamilies may evolve at different rates.
- In only one case (AtGATA10a/b) does a GATA gene with alternative splicing forms show amino acid
variation.
- AtGATA30 (At4g16141.1/NP_680707.4) was newly added relative to the GATA TFs identified in 2004.
- The length of AtGATA13 changed from 315 aa in 2004 to 291 aa in this study.
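The percentages behind the variation statistics in this list can be checked with a few lines of arithmetic. This sketch uses only the counts stated on the poster; the variable names are illustrative.

```python
# Counts stated on the poster.
variable_sites = 22        # amino acids that vary across the 19 ecotypes
total_sites = 2195         # amino acids in the 41 GATA domains
n_domains = 41

# Share of GATA-domain residues that vary, in percent.
variation_pct = 100 * variable_sites / total_sites

# Mean GATA domain length implied by the two totals.
mean_domain_length = total_sites / n_domains

# Share of GATA TFs with amino acid variation in Subfamilies I and II.
subfamily_1_pct = 100 * 6 / 19
subfamily_2_pct = 100 * 3 / 11

print(f"variable residues: {variation_pct:.3f}%")
print(f"mean domain length: {mean_domain_length:.1f} aa")
print(f"Subfamily I: {subfamily_1_pct:.1f}%, Subfamily II: {subfamily_2_pct:.1f}%")
```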
[Figure: chromosomal positions of the 30 A. thaliana Col0 GATA genes (AtGATA1 to AtGATA30) on Chr1 to Chr5; vertical axis: position in Mb, 0 to 32.]
- Chromosomes 1 and 2 carry only three GATA genes, while the remaining three
chromosomes contain many GATA genes.
- Subfamily I members are distributed across all five chromosomes, while Subfamily II members are on
chromosomes 2 to 5. Subfamilies III and IV, which are small, are on chromosomes
1, 3, and 4 and on chromosomes 4 and 5, respectively.
- Based on chromosomal position, the 14 members of Subfamily I can be grouped into 12
pseudo-loci.
- Four ecotypes (Hi0, Ler0, Mt0, and Ws0) lack the AtGATA24 gene, which functions in the
cryptochrome1-dependent response to excess light [4].
Color legend: Subfamily I, Subfamily II, Subfamily III, Subfamily IV.
[Figure: presence or absence of GATA genes in the ecotypes Zu0, No0, Mt0, Ler0, Kn0, Hi0, Bur0, Can0, Edi0, Ct1, Wu0, Ws0, Wil2, Tsu0, Sf2, Rsch4, Po0, and Oy0. Yellow means that there is a GATA gene; otherwise there is no GATA gene.]
- In total, 566 GATA genes (772 GATA TFs) were identified from the 19 genomes
and classified into four subfamilies (I to IV) based on the phylogenetic
tree of A. thaliana Col0 GATA TFs.
- The amino acid sequence of each GATA TF has at most four forms among the
19 ecotypes.
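The totals of 566 genes and 772 TFs are consistent with the other counts on the poster. The sketch below assumes, as a simplification not stated on the poster, that every ecotype has the same genes and isoforms as Col0 (30 genes, 41 TFs) apart from the two reported exceptions: the four ecotypes without AtGATA24 (two isoforms, AtGATA24a and AtGATA24b) and the additional AtGATA15 form in Kn0.

```python
n_genomes = 19
col0_genes = 30                 # AtGATA1 to AtGATA30
col0_tfs = 41                   # leaves of the Col0 phylogenetic tree

ecotypes_missing_gata24 = 4     # Hi0, Ler0, Mt0, Ws0
gata24_isoforms = 2             # AtGATA24a and AtGATA24b
kn0_extra_isoforms = 1          # assumption: one extra AtGATA15 form in Kn0

# Assumption: all other genes and isoforms are present in every ecotype.
expected_genes = n_genomes * col0_genes - ecotypes_missing_gata24
expected_tfs = (
    n_genomes * col0_tfs
    - ecotypes_missing_gata24 * gata24_isoforms
    + kn0_extra_isoforms
)

print(expected_genes, expected_tfs)
```

Agreement with the reported totals supports this reading of the counts but does not prove that no other ecotype-specific differences exist.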
[Figure: phylogenetic tree of the 41 A. thaliana Col0 GATA TFs (AtGATA1 to AtGATA30, including alternative forms a, b, and c), with support values at the nodes, scale bar 0.10, and Subfamilies I to IV marked, annotated with the number of GATA domain forms per TF. Legend: one form, two forms, three forms, or four forms of GATA domain; one or two forms of GATA domain by hetero type.]
- Only the Kn0 ecotype presents alternative splicing forms of the AtGATA15 gene, which
differ in the start position of the ORF.
- This is not a typical alternative splicing event in which transcripts gain or lose exons;
the translation frame is not affected, so the part shared by the
two transcripts encodes the same amino acids, and the GATA domains of the two transcripts are
identical.
- This may subtly affect their functions; however, no experimental evidence is
available.
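The statement that the translation frame is preserved can be checked against the ORF exon coordinates printed in the figure for A. thaliana Kn0 AtGATA15. In this sketch, consecutive coordinates are read as exon start and end positions, which is an assumption about the figure; the names `orf_long` and `orf_short` are illustrative and are not assigned to AtGATA15a or AtGATA15b.

```python
def total_length(exons: list[tuple[int, int]]) -> int:
    """Sum of exon lengths for 1-based, inclusive (start, end) coordinates."""
    return sum(end - start + 1 for start, end in exons)


# ORF exon coordinates as printed in the figure, paired as (start, end).
# The pairing is an assumption about how the figure labels should be read.
orf_long = [(1, 21), (106, 273), (348, 611)]
orf_short = [(1, 150), (225, 488)]

len_long = total_length(orf_long)
len_short = total_length(orf_short)

# Both ORFs are in frame if their lengths are multiples of three, and they
# share a frame if their length difference is also a multiple of three.
both_in_frame = len_long % 3 == 0 and len_short % 3 == 0
codon_difference = (len_long - len_short) // 3

# The last exon has the same length in both ORFs under this pairing.
last_exon_same = total_length(orf_long[-1:]) == total_length(orf_short[-1:])

print(len_long, len_short, both_in_frame, codon_difference, last_exon_same)
```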
Special Alternative Splicing Form of GATA TF
[Figure: exon structures of the transcripts and ORFs of AtGATA15a and AtGATA15b in A. thaliana Kn0, with the ORF start sites and the GATA domain marked. Coordinates shown for the transcripts: 1, 267, 352, 519, 594, 1,059 and 1, 243, 352, 519, 594, 1,059; for the ORFs: 1, 21, 106, 273, 348, 611 and 1, 150, 225, 488.]