All content in this area was uploaded by Jongsun Park on Aug 15, 2020
Genome-wide Identification of GATA Transcription Factors
in Seven Populus Genomes
Mangi Kim, Hong Xi, and Jongsun Park*
1InfoBoss Research Center, 301 room, 670, Seolleung-ro, Gangnam-gu, Seoul, Republic of Korea, 06088
2InfoBoss, Co., Ltd., 301 room, 670, Seolleung-ro, Gangnam-gu, Seoul, Republic of Korea, 06088
Abstract
GATA transcription factors (TFs) are widespread eukaryotic regulators
whose DNA-binding domain is a class IV zinc finger motif
(CX2CX17-20CX2C) followed by a basic region. We identified GATA TFs in
seven Populus genomes to understand the phylogenomic position of Populus
GATA TFs. In total, 262 GATA TFs (389 transcripts) from the seven Populus
genomes were classified into four subfamilies (I to IV), which are common
to both monocots and dicots. Some Populus GATA TFs in subfamily III lack
the CCT domain found in all Arabidopsis GATA TFs of the same subfamily.
Populus-specific conserved amino acids in the GATA domain were identified
in subfamilies I and II in comparison with those of Arabidopsis GATA TFs.
One GATA TF has up to nine alternative transcripts, some of which differ
in both untranslated regions. The phylogenetic tree of Populus GATA TFs
shows that most clades contain similar numbers of GATA TFs from each
Populus species. We established a database within the Populus
Comparative Genomics Database (http://www.populusgenome.info/) to
provide detailed information on the GATA TFs analyzed in this study. Our
analyses, together with the database, will be a cornerstone for
understanding transcription factors among the seven Populus genomes.
GATA Transcription Factors
The sequence (left) and ribbon representation (right) of the DNA-binding
domain of the fungal GATA factor AreA (Claudio Scazzocchio, 2000, Current
Opinion in Microbiology 3:126-131).
- GATA transcription factors are a
class of transcriptional regulators
present in fungi, metazoans and
plants.
- The DNA-binding domains of
eukaryotic GATA factors comprise a
four-cysteine Zn finger and an
adjacent basic region.
- Plant GATA factors play various roles, including regulation of light-
and circadian clock-responsive genes[1], control of nitrite reductase
genes[1], control of genes related to low nitrogen stress[2],
light-responsive development[3], and chlorophyll-level regulation[3].
Identification of Seven Populus Genomes
Species                    Version  # of GATA TFs  # of GATA transcripts  # of Genes  # of Proteins
P. trichocarpa             3.1      39             67                     42,950      63,498
P. pruinosa                1        37             37                     35,131      35,131
P. euphratica              1        40             55                     30,688      49,676
P. deltoides WV94          2.1      38             55                     44,853      57,249
P. tremuloides             1.1      37             44                     36,830      48,320
P. tremula                 1.1      33             60                     35,309      83,720
P. tremula x alba 717-1B4  1.1      38             71                     41,335      73,013
- The P. pruinosa genome does not provide data on alternative splicing
forms (grey color in the above table).
- The number of GATA TFs in the genus Populus ranges from 33 to 40; there
are no significant differences among species.
- Interestingly, the number of alternative transcripts of GATA TFs in
P. tremula x alba is the largest among the seven Populus species, mainly
because this genome was assembled by a re-sequencing method.
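The totals quoted in the abstract can be checked directly against the table above. The following is a minimal sketch that only re-adds the per-species counts from the table and derives the average number of transcripts per GATA TF; the variable names are illustrative and not part of the original analysis.

```python
# Per-species counts copied from the table above:
# (number of GATA TFs, number of GATA transcripts)
counts = {
    "P. trichocarpa": (39, 67),
    "P. pruinosa": (37, 37),
    "P. euphratica": (40, 55),
    "P. deltoides WV94": (38, 55),
    "P. tremuloides": (37, 44),
    "P. tremula": (33, 60),
    "P. tremula x alba 717-1B4": (38, 71),
}

# Totals over the seven genomes
total_tfs = sum(tfs for tfs, _ in counts.values())
total_transcripts = sum(tr for _, tr in counts.values())

# Average number of transcripts per GATA TF in each species
transcripts_per_tf = {sp: tr / tfs for sp, (tfs, tr) in counts.items()}
most_spliced = max(transcripts_per_tf, key=transcripts_per_tf.get)

print(total_tfs, total_transcripts)
for sp, ratio in transcripts_per_tf.items():
    print(f"{sp}: {ratio:.2f} transcripts per GATA TF")
print("Highest ratio:", most_spliced)
```

A ratio of exactly 1 for P. pruinosa reflects the absence of alternative splicing data for that genome, not necessarily the absence of alternative splicing.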
Length of GATA Transcripts
Figure axis: Length (aa). Ar: Arabidopsis; Po: Populus.
Subfamily Identification of GATA TFs
in Various Plants
- Subfamilies V to VII are monocot-specific subfamilies.
- In most dicots, subfamily I has the largest number of GATA TFs, while
subfamily IV contains the smallest number of GATA TFs.
- Owing to many plant genome sequencing projects, six genome-wide
analyses of the GATA gene family have been conducted, in A. thaliana[1],
Glycine max[4], Ricinus communis[5], Solanum lycopersicum[6],
Malus x domestica[7], and Oryza sativa[1].
- The total number of GATA TFs differs among these species.
Transmembrane Helix (TMH) of GATA Gene Family in Seven Populus Genomes
- Membrane-bound transcription factors (MTFs) are located in cellular membranes owing to their
transmembrane domains.
- MTFs are involved in various aspects of plant growth, development, and environmental
responses, such as seed germination[9] and cell division[10].
- To find GATA transcripts with TMHs, we ran the TMHMM program (Version 2.0) against all
identified GATA transcripts; five Populus GATA transcripts from three species (P. trichocarpa,
P. pruinosa, and P. tremula x alba 717-1B4) were identified.
Species name               GATA transcript name  # of TMHs  Subfamily
P. trichocarpa             PtGATA7b              1          I
P. trichocarpa             PtGATA7c              1          I
P. pruinosa                PpGATA21              1          II
P. pruinosa                PpGATA25              1          II
P. tremula x alba 717-1B4  PtaaGATA23            1          II
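The TMH screening step could be scripted roughly as follows. This is a sketch, not the pipeline used for the poster: it assumes TMHMM was run in its short output mode, where each line holds a sequence ID followed by `key=value` fields including `PredHel` (the predicted number of helices). The function name and the example lines are hypothetical.

```python
def transcripts_with_tmh(tmhmm_short_lines, min_helices=1):
    """Return {transcript_id: predicted helix count} for entries with
    at least `min_helices` predicted transmembrane helices.

    Assumes one whitespace-separated line per sequence, e.g.
    ID  len=...  ExpAA=...  First60=...  PredHel=...  Topology=...
    """
    hits = {}
    for line in tmhmm_short_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        seq_id = fields[0]
        # Remaining fields look like key=value, e.g. PredHel=1
        info = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        n_helices = int(info.get("PredHel", 0))
        if n_helices >= min_helices:
            hits[seq_id] = n_helices
    return hits


# Made-up example lines (not real TMHMM results)
example = [
    "transcript_A\tlen=300\tExpAA=22.10\tFirst60=0.00\tPredHel=1\tTopology=o270-292i",
    "transcript_B\tlen=280\tExpAA=0.05\tFirst60=0.00\tPredHel=0\tTopology=o",
]
print(transcripts_with_tmh(example))
```

Filtering on the predicted helix count keeps the screen simple; any transcript reported this way would still need manual inspection of the predicted topology.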
[Figure: bar chart of # of GATA TFs (axis 0 to 30) for P. trichocarpa,
P. pruinosa, P. euphratica, P. deltoides WV94, P. tremuloides, P. tremula,
and P. tremula x alba 717-1B4 (dicot, Populus); A. thaliana, G. max,
R. communis, S. lycopersicum, and M. domestica (dicot); and O. sativa
(monocot)]
Analysis of Alternative Splicing Form of PdGATA6
- Some GATA TFs identified in the genus Populus have several
alternative splicing forms.
- An extreme case in the number of alternative splicing forms of
a GATA TF is PdGATA6, with nine transcripts.
- Interestingly, not all alternative splicing forms of PdGATA6
affect the translated amino acids (ORF regions): among these
nine transcripts, all except PdGATA6f and PdGATA6h present the
same start and end positions of the ORF.
- In the eight transcripts except PdGATA6f, the first exon
containing the start methionine (not including the stop codon)
is of two types, one 627 bp and the other 645 bp in length,
indicating that there are only two types of amino acid
sequences among the eight transcripts.
- This phenomenon shows that this kind of alternative splicing
form in transcription factors needs further study.
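The two first-exon types can be made explicit with a short grouping. This is a sketch based on the ORF coordinates labeled in the PdGATA6 figure; the assignment of lengths to transcript names is read from the figure and should be treated as an assumption, and PdGATA6f is left out because its ORF differs from the others.

```python
from collections import defaultdict

# Length (bp) of the first exon within the ORF, read from the
# "ORF Region" panel of the figure (PdGATA6f omitted).
first_coding_exon_bp = {
    "PdGATA6a": 645, "PdGATA6b": 645, "PdGATA6c": 627, "PdGATA6d": 627,
    "PdGATA6e": 645, "PdGATA6g": 645, "PdGATA6h": 645, "PdGATA6i": 627,
}

# Group transcript names by first-exon length
groups = defaultdict(list)
for name, length in first_coding_exon_bp.items():
    groups[length].append(name)

# Difference between the two exon types, in bp and in codons
lengths = sorted(groups)
diff_bp = lengths[1] - lengths[0]

for length in lengths:
    print(length, "bp:", ", ".join(groups[length]))
print(diff_bp, "bp difference =", diff_bp // 3, "codons;",
      "in frame" if diff_bp % 3 == 0 else "frame-shifting")
```

Because the difference between the two lengths is a multiple of three, the two exon types differ by whole codons and do not shift the reading frame.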
[Figure: structures of the nine alternative transcripts of PdGATA6; boundary coordinates (bp) as labeled in the figure]
GATA name  5', 3' UTRs and ORF Region                                       ORF Region
PdGATA6a   1; 1,748; 3,861; 4,111; 5,244; 5,461                             1; 645; 2,758; 3,008; 4,141; 4,150
PdGATA6b   1; 173; 561; 2,133; 4,246; 4,496; 5,629; 5,846                   1; 645; 2,758; 3,008; 4,141; 4,150
PdGATA6c   1; 165; 553; 782; 1,146; 2,107; 4,238; 4,488; 5,621; 5,838       1; 627; 2,758; 3,008; 4,141; 4,150
PdGATA6d   1; 486; 850; 1,811; 3,942; 4,192; 5,325; 5,542                   1; 627; 2,758; 3,008; 4,141; 4,150
PdGATA6e   1; 399; 1,098; 1,745; 3,858; 4,108; 5,241; 5,501                 1; 645; 2,758; 3,008; 4,141; 4,150
PdGATA6f   1; 399; 1,098; 3,747; 3,858; 4,108; 5,241; 5,501                 1; 699
PdGATA6g   1; 329; 696; 1,675; 3,788; 4,038; 5,171; 5,388                   1; 645; 2,758; 3,008; 4,141; 4,150
PdGATA6h   1; 1,142; 3,255; 3,505; 3,865; 4,277; 4,638; 4,855               1; 645; 2,758; 3,008; 3,368; 3,377
PdGATA6i   1; 1,124; 3,255; 3,505; 4,638; 4,855                             1; 627; 2,758; 3,008; 4,141; 4,150
GATA transcripts without the GATA Domain
Phylogenetic Tree of Populus GATA Domains (with Arabidopsis)
- 262 GATA TFs (389 transcripts) from seven Populus genomes
were classified into four subfamilies (I to IV), which are
common to both monocots and dicots, based on the phylogenetic
tree.
- Based on the bootstrap value of each node (>90), two nodes
(orange circles) containing Populus and Arabidopsis GATA TFs
together were identified in subfamilies II, III, and IV,
indicating that subfamily I has diverged more between the two
genera than the remaining subfamilies.
- The subfamily II clade contains several Populus GATA
transcripts that are not clustered into the major clades (blue
arrows), indicating that more events have occurred during
evolution.
- Except for P. euphratica, more than one GATA transcript from
each Populus species lacks the CCT and/or TIFY domains in
subfamily III.
- 9 GATA transcripts without alternative splicing forms, from
five Populus species, have lost the CCT and/or TIFY domains.
- 5 GATA transcripts with alternative splicing forms, from
three Populus species, show that the CCT and/or TIFY domains
were lost through alternative splicing events.
[Figure: phylogenetic tree of Populus and Arabidopsis GATA domains,
grouped into Subfamily I, Subfamily II, Subfamily III, and Subfamily IV;
scale bar 0.050; bootstrap values shown at nodes. Arabidopsis GATA
domains (AtGATA1 to AtGATA30, including alternative forms) and five
Populus GATA domains (PpGATA21, PpGATA23, PeGATA19, PeGATA23, and
PdGATA18) are shown individually; the other Populus GATA domains are
collapsed into groups labeled with X(Y) counts for each species. Color
key for species: P. trichocarpa, P. pruinosa, P. euphratica,
P. deltoides WV94, P. tremuloides, P. tremula, and
P. tremula x alba 717-1B4.]
Conserved GATA Domain Sequences Along with Subfamilies and Genera
Conserved GATA domain sequences & amino acid forms in each position
Acknowledgements
- This research was fully supported by InfoBoss Grant (IBI-0001).
- It was also supported by the endless sacrifice of the high-end
servers housed in the InfoBoss DataCenter.
1. Reyes JC, Muro-Pastor MI, Florencio FJ (2004) The GATA family of transcription factors in Arabidopsis and
rice. Plant Physiology 134(4): 1718-1732
2. Chen H, Shao H, Li K, Zhang D, Fan S, Li Y, Han M (2017) Genome-wide identification, evolution, and expression
analysis of GATA transcription factors in apple (Malus × domestica Borkh.). Gene 627: 460-472
3. Zhang C, Hou Y, Hao Q, Chen H, Chen L, Yuan S, Shan Z, Zhang X, Yang Z, Qiu D, Zhou X, Huang W (2015)
Genome-wide survey of the soybean GATA transcription factor gene family and expression analysis under low
nitrogen stress. PLoS One 10(4): e0125174
4. Zhang C, Hou Y, Hao Q, Chen H, Chen L, Yuan S, Shan Z, Zhang X, Yang Z, Qiu D (2015) Genome-wide survey
of the soybean GATA transcription factor gene family and expression analysis under low nitrogen stress.
PLoS One 10: e0125174
5. Tao A, Xiao-Jia L, Wei X, Ai-Zhong L (2015) Identification and characterization of GATA gene family
in castor bean (Ricinus communis). Plant Diver. Resour. 37: 453-462
6. Yuan Q, Zhang C, Zhao T, Yao M, Xu X (2018) A genome-wide analysis of GATA transcription factor family in
tomato and analysis of expression patterns. International Journal of Agriculture and Biology 20: 1274-1282
7. Chen H, Shao H, Li K, Zhang D, Fan S, Li Y, Han M (2017) Genome-wide identification, evolution, and expression
analysis of GATA transcription factors in apple (Malus × domestica Borkh.). Gene 627: 460-472
8. Park J-S, Kim H-J, Kim S-O, Kong S-H, Park J-J, Kim S-R, Han H-Y, Park B-S, Jung K-Y, Lee Y-H (2006) A comparative
genome-wide analysis of GATA transcription factors in fungi. Genomics & Informatics 4: 147-160
9. Park J, Kim Y, Kim S, Jung J, Woo J, Park C (2011) Integration of auxin and salt signals by the NAC transcription
factor NTM2 during seed germination in Arabidopsis. Plant Physiology 156: 537-549
10. Kim Y, Kim S, Park J, Park H, Lim M, Chua N, Park C (2006) A membrane-bound NAC transcription factor regulates
cell division in Arabidopsis. The Plant Cell 18: 3132-3144
11. Seo PJ (2014) Recent advances in plant membrane-bound transcription factor research: Emphasis on intracellular
movement. Journal of Integrative Plant Biology 56: 334-342
Color key of the figure: amino acid conserved at 100%; at over 80%; at
over 60%; at over 40%; at over 20%; at less than 20%.
The number below each amino acid is the total number of amino acids in
that section of each subfamily.
- Populus-specific 100% conserved amino acids in the GATA domain
were identified in subfamilies I and II in comparison with those
of A. thaliana GATA transcripts.
- In subfamily IV, there are 100% conserved amino acids at the
same positions in Populus and A. thaliana, but none of these
amino acids is common to the two species.
- Position 22 is 100% conserved as Trp (W) in subfamilies I, II,
and IV, but as Met (M) in subfamily III.
- The CX2CX17-20CX2C region, which binds to DNA, has few amino
acid forms, whereas the regions before and after the Zn loop
show more varied amino acid forms than the Zn loop region.
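The conservation categories used in the figure can be computed per alignment column along the following lines. This is a minimal sketch, assuming the GATA domains are already aligned to equal length; the function names and the toy alignment are illustrative, and placing boundary values (for example exactly 80%) in the lower category is an assumption, since the legend does not say where they belong.

```python
from collections import Counter

# Cutoffs from the figure legend, highest first
CUTOFFS = ((0.8, "over 80%"), (0.6, "over 60%"),
           (0.4, "over 40%"), (0.2, "over 20%"))


def conservation_bin(fraction):
    """Map a conservation fraction (0..1) to a legend category.
    Assumption: a value exactly on a cutoff falls into the lower category."""
    if fraction >= 1.0:
        return "100%"
    for cutoff, label in CUTOFFS:
        if fraction > cutoff:
            return label
    return "less than 20%"


def column_conservation(aligned_seqs):
    """For each alignment column, return
    (most common residue, its fraction, legend category)."""
    n = len(aligned_seqs)
    result = []
    for column in zip(*aligned_seqs):
        residue, count = Counter(column).most_common(1)[0]
        fraction = count / n
        result.append((residue, fraction, conservation_bin(fraction)))
    return result


# Toy alignment (made-up sequences, not real GATA domains)
toy = ["CWHC", "CWHC", "CMHC", "CWQC", "CWHA"]
for pos, (res, frac, label) in enumerate(column_conservation(toy), start=1):
    print(pos, res, f"{frac:.0%}", label)
```

Running the same function separately on the Populus and Arabidopsis sequences of one subfamily would show positions that are fully conserved in one genus but not in the other.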
References
Populus name               GATA TF name  Number of GATA transcripts without the GATA domain  Subfamily
P. tremula x alba 717-1B4  PtaaGATA13    1                                                   I
P. tremula x alba 717-1B4  PtaaGATA21    1                                                   II
P. tremula x alba 717-1B4  PtaaGATA23    1                                                   II
P. tremula x alba 717-1B4  PtaaGATA27    1                                                   III
P. tremula x alba 717-1B4  PtaaGATA29    1                                                   III
P. tremula x alba 717-1B4  PtaaGATA31    1                                                   III
P. tremula                 PtaGATA28     5                                                   III
P. tremula                 PtaGATA30     1                                                   III
P. tremula                 PtaGATA33     2                                                   IV
P. tremuloides             PtsGATA31     1                                                   III
P. tremuloides             PtsGATA33     1                                                   III
P. tremuloides             PtsGATA35     1                                                   III
- Interestingly, only P. tremuloides, P. tremula, and P. tremula x alba
717-1B4 have GATA transcripts that lack the GATA domain, and these are
found across all four subfamilies.
- Transcripts that lost the GATA domain through alternative splicing
are no longer GATA transcripts because they cannot bind to DNA, so they
were excluded from further analyses.
Identification of Populus GATA Subfamilies
                                  Subfamily I        Subfamily II       Subfamily III      Subfamily IV
Major domain type                 Type IVb           Type IVb           Type IVc           Type IVb
GATA domain position              Mostly C-terminal  Mostly C-terminal  Mostly C-terminal  N-terminal
Total number of GATA transcripts  185                78                 105                21
P. trichocarpa                    35                 12                 15                 5
P. pruinosa                       17                 11                 7                  2
P. euphratica                     20                 14                 19                 2
P. deltoides WV94                 31                 9                  13                 2
P. tremuloides                    19                 10                 12                 3
P. tremula                        34                 9                  16                 1
P. tremula x alba 717-1B4         29                 13                 23                 6
- There are no specific features of the Populus GATA subfamilies in
comparison with those of Arabidopsis.
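The per-subfamily transcript counts in the table above can be cross-checked against the reported totals and summarized as shares. This is a small consistency sketch using only the numbers from the table; the variable names are illustrative.

```python
# Transcripts per subfamily (I, II, III, IV), copied from the table above
subfamily_transcripts = {
    "P. trichocarpa": (35, 12, 15, 5),
    "P. pruinosa": (17, 11, 7, 2),
    "P. euphratica": (20, 14, 19, 2),
    "P. deltoides WV94": (31, 9, 13, 2),
    "P. tremuloides": (19, 10, 12, 3),
    "P. tremula": (34, 9, 16, 1),
    "P. tremula x alba 717-1B4": (29, 13, 23, 6),
}
reported_totals = (185, 78, 105, 21)  # "Total number of GATA transcripts" row

# Column sums (per subfamily) and row sums (per species)
column_totals = tuple(sum(col) for col in zip(*subfamily_transcripts.values()))
row_totals = {sp: sum(row) for sp, row in subfamily_transcripts.items()}

# Share of all GATA transcripts that falls into each subfamily
grand_total = sum(column_totals)
shares = {
    name: 100 * total / grand_total
    for name, total in zip(("I", "II", "III", "IV"), column_totals)
}

print("Column totals match the table:", column_totals == reported_totals)
for name, share in shares.items():
    print(f"Subfamily {name}: {share:.1f}% of {grand_total} transcripts")
```

The row sums give the number of GATA transcripts per species, so they can also be compared with the transcript counts in the genome summary table.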
- The overall length distributions of Populus and Arabidopsis
GATA transcripts are similar to each other.
- However, there are several extreme cases among Populus GATA
transcripts: the longest GATA transcripts (red circle) and the
shortest GATA transcripts (blue circle) in each subfamily.
- Most of these GATA transcripts contain intact GATA domains.
[Figure: length (aa, axis 0 to 900) of Populus (Po) and Arabidopsis (Ar)
GATA transcripts in Subfamily I, Subfamily II, Subfamily III, and
Subfamily IV]
Classification of GATA Domains
Motif Type  Pattern          Plant  Fungi  Metazoa  REFs
Type IVa    CX2CX17CX2C      X      O      O        [8]
Type IVb    CX2CX18CX2C      O      O      O        [1] [8]
Type IVc    CX2CX19-21CX2C   O      O      O        [1] [8]
Type IV4*   CX4CX18CX2C      O      -      -        [5]
Type IVp    CX2CX?or X?CX2C  O      O      -        [3] [8]
The phylogenetic trees of GATA transcripts and GATA domains were constructed with the Neighbor-Joining method with the bootstrap option (10,000 repeats) in ClustalW 2.1.
Black triangles on the tree indicate groups of Populus GATA domains.
In the label X(Y), X is the number of GATA TFs and Y is the number of GATA transcripts from the Populus species indicated by the background color.
- The five GATA transcripts with a TMH suggest that the genus Populus may have GATA TFs similar to MTFs.
- PtGATA7 has alternative splicing forms (PtGATA7a, 7d, and 7e) that do not have a TMH,
indicating that alternative splicing can cause loss of the TMH.
- All five GATA transcripts have only one TMH.
- MTFs in the genus Populus were identified only in subfamilies I and II.
[Figure: domain structures (CCT domain, TIFY domain, and GATA domain) of
PtaaGATA29a, PtaaGATA29b, and PtaaGATA29c. The TIFY domain was lost by an
alternative splicing event.]
1: NTL6 is localized in the plasma membrane under normal
conditions.
2: Under stress conditions, the NTL6 protein is processed by
an as-yet-unidentified intramembrane protease.
3: The SnRK2.8 kinase is responsible for phosphorylation of
NTL6 and facilitates its nuclear import.
4: Transcription factor controls expression of target
transcripts in the nucleus.
Mechanisms of Membrane-bound Transcription Factor (MTF)[11]
[Figure: schematic with steps 1 to 4 showing the membrane-bound
transcription factor (MTF), the intramembrane protease, the protein
kinase SnRK2.8 with phosphorylation (P), the cytoplasm, the nucleus, and
control of expression of transcripts]
- Type IV4 is defined in this study because there is no existing type
definition for this pattern.
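The motif patterns in the classification table can be expressed as regular expressions, where X stands for any residue. The following is a minimal sketch, not the search procedure used in this study: the function name and the toy sequences are hypothetical, and Type IVp is left out because its pattern is not fully specified in the table.

```python
import re

# Patterns from the classification table; "." stands for X (any residue).
# Type IVp is omitted because its pattern is not fully specified.
PATTERNS = {
    "Type IVa": re.compile(r"C.{2}C.{17}C.{2}C"),
    "Type IVb": re.compile(r"C.{2}C.{18}C.{2}C"),
    "Type IVc": re.compile(r"C.{2}C.{19,21}C.{2}C"),
    "Type IV4": re.compile(r"C.{4}C.{18}C.{2}C"),
}


def classify_zinc_finger(protein_seq):
    """Return the names of all motif types found anywhere in the sequence.
    More than one type can be reported, because "." also matches cysteine."""
    return [name for name, pattern in PATTERNS.items()
            if pattern.search(protein_seq)]


# Toy sequences built only to exercise the patterns (not real proteins)
toy_ivb = "MK" + "C" + "AA" + "C" + "A" * 18 + "C" + "AA" + "C" + "KR"
toy_iv4 = "C" + "AAAA" + "C" + "A" * 18 + "C" + "AA" + "C"

print(classify_zinc_finger(toy_ivb))
print(classify_zinc_finger(toy_iv4))
print(classify_zinc_finger("MKAAAA"))
```

A pattern match alone does not establish a functional GATA domain; it only flags candidate zinc finger motifs that would still need to be confirmed by domain analysis.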