Constraints on the evolution of toxin-resistant Na,K-ATPases have limited dependence on sequence divergence

Abstract

A growing body of theoretical and experimental evidence suggests that intramolecular epistasis is a major determinant of rates and patterns of protein evolution and imposes a substantial constraint on the evolution of novel protein functions. Here, we examine the role of intramolecular epistasis in the recurrent evolution of resistance to cardiotonic steroids (CTS) across diverse groups of tetrapods, which occurs via specific amino acid substitutions to the α-subunit family of Na,K-ATPases (ATP1A). After identifying a series of recurrent substitutions at two key sites of ATP1A that are predicted to confer CTS resistance in diverse tetrapods, we performed protein engineering experiments to test the functional consequences of introducing these substitutions onto divergent species backgrounds. In line with previous results, we find that substitutions at these sites can have substantial background-dependent effects on CTS resistance. Globally, however, these substitutions also have pleiotropic effects that are consistent with additive rather than background-dependent effects. Moreover, the magnitude of a substitution’s effect on activity does not depend on the overall extent of ATP1A sequence divergence between species. Our results suggest that epistatic constraints on the evolution of CTS-resistant forms of Na,K-ATPase likely depend on a small number of sites, with little dependence on overall levels of protein divergence. We propose that this dependence on a limited number of sites may account for the parallel CTS resistance substitutions observed among taxa with highly divergent Na,K-ATPases.

Significance Statement

Individual amino acid residues within a protein work in concert to produce a functionally coherent structure that must be maintained even as orthologous proteins in different species diverge over time. Given this dependence, we expect identical mutations to have more similar effects on protein function in more closely related species. We tested this hypothesis by performing protein-engineering experiments on ATP1A, an enzyme mediating target-site insensitivity to cardiotonic steroids (CTS) in diverse animals. These experiments reveal that the phenotypic effects of substitutions can sometimes be background-dependent, but also that the magnitude of these phenotypic effects does not correlate with overall levels of ATP1A sequence divergence. Our results suggest that epistatic constraints are determined by states at a small number of sites, potentially explaining the frequent parallel CTS resistance substitutions among Na,K-ATPases of highly divergent taxa.
Main Manuscript for

Epistasis is not a strong constraint on the recurrent evolution of toxin-resistant Na+,K+-ATPases among tetrapods.

Shabnam Mohammadi1,2,*, Lu Yang3,§,*, Santiago Herrera-Álvarez4,5,*, María del Pilar Rodríguez-Ordoñez4,#, Karen Zhang3, Jay F. Storz1, Susanne Dobler2, Andrew J. Crawford4 & Peter Andolfatto6

1School of Biological Sciences, University of Nebraska, Lincoln, NE, USA
2Molecular Evolutionary Biology, Institute of Zoology, Universität Hamburg, Hamburg, Germany
3Department of Ecology and Evolution, Princeton University, Princeton, NJ, USA
4Department of Biological Sciences, Universidad de los Andes, Bogotá, 111711, Colombia
5Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
6Department of Biological Sciences, Columbia University, New York, NY, USA

*Co-first authorship
§ Current address: Wellcome Sanger Institute, Cambridge, United Kingdom
# Current address: Université Paris-Saclay Evry, Evry, France

Email: andrew@dna.ac, pa2543@columbia.edu

SM: 0000-0003-3450-6424, LY: 0000-0002-2694-1189, SHA: 0000-0002-0793-7811, MPRO: 0000-0002-0856-1297, KZ: 0000-0003-4406-9977, JFS: 0000-0001-5448-7924, SD: 0000-0002-0635-7719, AJC: 0000-0003-3153-6898, PA: 0000-0003-3393-4574

Classification
Biological Sciences; Evolution

Keywords
Epistasis, protein evolution, cardiotonic steroids, toxin resistance, adaptation

Author Contributions
PA and AJC conceived of and oversaw the project; SM, JFS, SD, AJC and PA designed experiments; KZ, LY, MPRO, SHA, SM collected data; SM, SHA and PA performed evolutionary and statistical analyses; SM, SHA, and PA wrote the paper; All authors edited the manuscript.

This PDF file includes:
Main Text
Figures 1 to 6
bioRxiv preprint doi: https://doi.org/10.1101/2021.11.29.470343; this version posted November 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Abstract

Comparative genomic studies reveal a global decline in rates of convergent amino acid substitution as a function of evolutionary distance. This pattern has been attributed to epistatic constraints on protein evolution, the idea being that mutations tend to confer the same fitness effects on more similar genetic backgrounds, so convergent substitutions are more likely to occur in closely related species. However, this hypothesis lacks experimental validation. We tested this model in the context of the recurrent evolution of resistance to cardiotonic steroids (CTS) across diverse groups of tetrapods, which occurs via specific amino acid substitutions to the α-subunit family of Na+,K+-ATPases (ATP1A). After identifying a series of recurrent substitutions at two key sites of ATP1A1 predicted to confer CTS resistance, we performed protein engineering experiments to test the functional consequences of introducing these substitutions onto divergent species backgrounds. While we find that substitutions at these sites can have substantial background-dependent effects on CTS resistance, we also find no evidence for background-dependent effects on protein activity. We further show that the magnitude of a substitution’s effect on activity does not depend on the overall extent of ATP1A1 sequence divergence between species. More generally, a global analysis of substitution patterns across ATP1A orthologs and paralogs reveals that the probability of convergent substitution protein-wide is not predicted by sequence divergence. Together, these findings suggest that intramolecular epistasis is not an important constraint on the evolution of ATP1A CTS resistance in tetrapods.
Significance Statement

Individual amino acid residues within a protein work in concert to produce a functionally coherent structure that must be maintained even as orthologous proteins in different species diverge over time. Given this dependence, we expect identical mutations to have more similar effects on protein function in more closely related species. We tested this hypothesis by performing protein-engineering experiments on ATP1A, an enzyme mediating target-site insensitivity to cardiotonic steroids (CTS) in diverse animals. These experiments reveal that although the phenotypic effects of substitutions can sometimes be background-dependent, the magnitude of these effects does not correlate with ATP1A1 sequence divergence. This implies that the genetic background across the ATP1A protein does not strongly limit the evolution of CTS resistance in animals.
Main Text

Introduction

Patterns of molecular parallelism and convergence represent a useful paradigm to examine the factors that limit the rate of adaptation and the extent to which adaptive evolutionary paths are predictable (1, 2). In the context of protein evolution, patterns of parallelism and convergence are influenced by pleiotropy (the effect of a given mutation on multiple phenotypes) and intramolecular epistasis (nonadditive interactions between mutant sites in the same protein) (3–11). If the phenotypic and fitness effects of mutations depend on the genetic background on which they arise (i.e. epistasis), a given mutation is expected to have more similar effects in orthologs from closely related species. Therefore, the probability of parallel or convergent substitution resulting in sequence divergence between species is expected to decrease with divergence time. Consistent with this expectation, there is evidence for such a decline in broad-scale phylogenetic comparisons of mitochondrial (12) and nuclear (13, 14) proteins. However, this hypothesis has not been tested experimentally to date.

To address the question of how changes in the genetic background alter the phenotypic effects of new mutations, we focus on the test case of the repeated evolution of resistance to cardiotonic steroids (CTS) in animals. CTS are potent inhibitors of Na+,K+-ATPase (NKA), a protein that plays a critical role in maintaining membrane potential and is consequently vital for the maintenance of many physiological processes and signaling pathways in animals (15). NKA (Fig. 1A) is a heterodimeric transmembrane protein that consists of a catalytic α-subunit (ATP1A) and a glycoprotein β-subunit (ATP1B) (16). CTS inhibit NKA function by binding to a highly conserved domain of ATP1A and blocking the exchange of Na+ and K+ ions (15). NKA is thus often the target of parallel evolution of CTS resistance in insect herbivores that feed on toxic plants (17, 18) as well as vertebrate predators that feed on toxic prey (19–22). Functional investigations of CTS resistance-conferring substitutions in Drosophila (23, 24) and Neotropical grass frogs (25) revealed associated negative pleiotropic effects on protein function and showed that substitutions elsewhere in the protein mitigate these effects. However, despite these examples, the generality of these patterns, and specifically the predicted dependence on evolutionary distance, remain poorly understood given the limited availability of comparative functional data.

Broad phylogenetic comparisons in vertebrates have focused primarily on the H1-H2 extracellular loop of ATP1A proteins, a subset of the CTS-binding domain that contains two sites (111 and 122) known to underlie CTS resistance in rats and toad-eating frogs (25, 26). Most vertebrates possess three paralogs of the α-subunit gene (ATP1A) that have different tissue-specific expression profiles and are associated with distinct physiological roles (Fig. 1B) (15, 27). Mammals possess a fourth paralog that is expressed predominantly in testes (28). A major limitation of studies to date is that the H1-H2 extracellular loop has been inconsistently surveyed among vertebrate taxa, with previous studies focusing on ATP1A3 in reptiles (20, 21, 29, 30), ATP1A1 and/or ATP1A2 in birds and mammals (30, 31), and either ATP1A1 or ATP1A3 in amphibians (19, 30). We therefore lack a comprehensive survey of amino acid variation in the ATP1A protein family across vertebrates.

To bridge this gap, we first surveyed variation in near full-length coding sequences of the three NKA α-subunit paralogs (ATP1A1, ATP1A2, ATP1A3) that are shared across major extant tetrapod groups (mammals, birds, non-avian reptiles, and amphibians), and identified substitutions that occur repeatedly among divergent lineages. Focusing on two key sites implicated in CTS resistance across animals (111 and 122), we tested whether substitutions at these sites have increasingly distinct phenotypic effects on more divergent genetic backgrounds. Specifically, we engineered several common substitutions at sites 111 and 122 of ATP1A1 that differ between species to reveal potential cryptic epistasis (8, 32). By quantifying the level of CTS resistance conferred by these substitutions, as well as their effects on enzyme function, we evaluate the extent to which pleiotropy and epistasis have constrained the evolution of CTS-resistant forms of ATP1A1 across tetrapods.

Results

Patterns of ATP1A sequence evolution across species and paralogs.

To obtain a more comprehensive portrait of ATP1A amino acid variation among tetrapods, we created multiple sequence alignments for near full-length ATP1A proteins for the three ATP1A paralogs shared among vertebrates. In addition to publicly available data, we generated new RNA-seq data for 27 non-avian reptiles (PRJNA754197) (Table S1-S2). We then de novo assembled full-length transcripts of all ATP1A paralogs using these and RNA-seq data from 18 anuran species (25) (PRJNA627222) to achieve better representation for these groups. In total, this dataset comprises 429 species for ATP1A1, 197 species for ATP1A2 and 204 species for ATP1A3 (831 sequences total; Supplemental Dataset 1, Fig. S1).

Our survey reveals numerous substitutions at sites implicated in CTS resistance of NKA (Fig. 2; Supplementary Dataset 2; for comparison to insects, see Supplemental file 1 of ref. (23)). As anticipated from studies of full-length sequences in insects (17, 18, 23), most amino acid variation among species and paralogs is concentrated in the H1-H2 extracellular loop (residues 111-122; Fig 1A). Despite harboring just 28% of 43 sites previously implicated in CTS resistance (33), the H1-H2 extracellular loop contains 81.4% of all substitutions identified among the three ATP1A paralogs (Fig. S2).

Our survey reveals several clade- and paralog-specific patterns. Notably, ATP1A1 exhibits more variation among species at sites implicated in CTS resistance than ATP1A2 or ATP1A3 (Fig. 2). Most of the variation in ATP1A2 at these sites is restricted to squamate reptiles and ATP1A3 lacks substitutions at site 122 altogether, despite the well-known potential for substitutions at this site to confer CTS resistance (25, 26). Looking across species and paralogs, the extent of parallelism at sites 111 and 122 is remarkable (Figs. 2-3): for example, the substitutions Q111E, Q111T, Q111H, Q111L, and Q111V all occur in parallel in multiple species of both insects and vertebrates. N122H and N122D also frequently occur in parallel in both of these major clades. The frequent parallel changes from CTS-sensitive (i.e. Q111 and N122) to CTS-resistant states at these sites have been interpreted as evidence for adaptive significance of these substitutions (17–20), but may also reflect mutation biases (34) and the nature of physicochemical constraints (13, 35).

In contrast, some parallelism is restricted to specific clades: for example, Q111R occurs in parallel across tetrapods but has not been observed in insects. Similarly, the combination Q111R+N122D has evolved three times independently in ATP1A1 of tetrapods but is not observed in insects. Conversely, insects have evolved Q111V+N122H independently four times, but this combination has never been observed in tetrapods. These patterns suggest that the fitness effects of some CTS-resistant substitutions depend on genetic background, with the result that CTS-resistance evolved via different mutational pathways in different lineages.

Beyond known CTS-resistant substitutions at sites 111 and 122, some taxa have evolved other paths to CTS resistance. For example, the Pacman frog (genus Ceratophrys) is known to prey on CTS-containing toads (36) and its ATP1A1 harbors a known CTS-resistant substitution at site 121 (D121N, Supplementary Dataset 2). This substitution is rare among vertebrates but has been previously reported in CTS-adapted milkweed bugs (17, 18). Similarly, the known CTS resistance substitution C104Y is observed among many natricid snakes (Supplementary Dataset 2) and CTS-adapted milkweed weevils (18). Chinchilla (Chinchilla lanigera) and yellow-throated sandgrouse (Pterocles gutturalis) show distinct single-amino acid insertions in the H1–H2 extracellular loop, a characteristic that has been previously associated with CTS resistance in pyrgomorphid grasshoppers (33, 37). Further, in lieu of variation at site 122, ATP1A3 of tetrapods harbors frequent parallel substitutions at site 120 (G120R). Interestingly, this site also shows substantial parallel substitution in the ATP1A1 paralog of birds (where N120K occurs eight times independently) but is mostly invariant in ATP1A1 of other tetrapods.

Context-dependent CTS resistance for substitutions at sites 111 and 122

The clade- and paralog-specific patterns of substitution among ATP1A paralogs outlined above suggest that the evolution of CTS resistance may be highly dependent on sequence context. However, the functional effects of the vast majority of these substitutions on the diverse genetic backgrounds in which they occur remain largely unknown (25, 26, 29). Given the diversity and broad phylogenetic distribution of parallel substitutions at sites 111 and 122, and the documented effects of some of these substitutions on CTS resistance, we experimentally tested the extent to which functional effects of substitutions at these sites are background-dependent.

We focused functional experiments on ATP1A1, because it is the most ubiquitously expressed paralog and exhibits both the most sequence diversity and the broadest phylogenetic distribution of parallel substitutions. Specifically, we considered ATP1A1 orthologs from nine representative tetrapod species that possess different combinations of wild-type amino acids at 111 and 122 (Fig. 4A). Our taxon sampling includes two lizards, two snakes, two birds, two mammals and previously published data for one amphibian (Fig. S4; Fig. S5; Table S3). The ancestral amino acid states of sites 111 and 122 in tetrapods are Q and N, respectively. We found that the sum of the number of derived states at positions 111 and 122 is a strong predictor of the level of CTS-resistance (Fig 4B, IC50, Spearman’s rS=0.85, p=0.001). Nonetheless, we also found greater than 10-fold variation in CTS-resistance among enzymes that had identical paired states at 111 and 122 (e.g., compare chinchilla (CHI) versus red-necked keelback snakes (KEE) or compare rat (RAT) versus the resistant paralog of grass frogs (GRAR)). These differences suggest that substitutions at other sites also contribute to CTS resistance.
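
The rank-correlation analysis described above can be illustrated with a minimal Python sketch. It computes Spearman's rank correlation between the number of derived states at sites 111 and 122 and log10(IC50). The data values are hypothetical placeholders rather than the measurements reported in Table S4, the function names are illustrative, and the software actually used for the published analysis is not stated in this section.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical example: nine enzymes, each with 0, 1 or 2 derived states
# at sites 111 and 122, and a made-up log10(IC50) value for each.
derived_states = [0, 0, 1, 1, 1, 2, 2, 2, 2]
log10_ic50 = [-7.1, -6.8, -6.0, -5.2, -6.5, -4.0, -3.1, -4.6, -5.5]

rho = spearman_rho(derived_states, log10_ic50)
print(f"Spearman's rho = {rho:.2f}")
```

Because the predictor takes only three values, ties are unavoidable, which is why the sketch assigns average ranks instead of using the simpler formula that assumes no ties.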

To test for epistatic effects of common CTS-resistant substitutions at sites 111 and 122, we used site-directed mutagenesis to introduce 15 substitutions (nine at position 111 and six at position 122) in the wildtype ATP1A1 backgrounds of 9 different species (Fig. S4). The specific substitutions chosen were phylogenetically broadly distributed parallel substitutions and/or divergent substitutions that distinguish closely related clades of species. We expressed each of the resulting 24 ATP1A1 constructs (9 wild-type and 15 mutant) with an appropriate species-specific ATP1B1 protein (Table S3). For each recombinant NKA protein complex, we characterized its level of CTS resistance (IC50) and we estimated enzyme activity as the rate of ATP hydrolysis in the absence of CTS (Table S4).

The 12 substitutions for which IC50 could be measured had a 15-fold effect on average (Fig. 4C, Table S4) and were equally likely to increase or decrease IC50. To assess the background-dependence of specific substitutions, we examined five cases in which a given substitution (e.g., E111H), or the reverse substitution (e.g., H111E), could be evaluated on two or more backgrounds. In the absence of intramolecular epistasis, the effect of a substitution in different backgrounds should remain unchanged and the magnitude of the effect of the reverse substitution should also be the same but with opposite sign. This analysis revealed substantial background dependence for IC50 in two of the five informative cases (Fig. 4E; Table S5). In one case, the N122D substitution results in a 200-fold larger increase in IC50 when added to the chinchilla (CHI) background compared to the grass frog (GRA) background (p=1.2e-3 by ANOVA). In the other case, the E111H substitution and the reverse substitution (H111E) produced effects in the same direction (reducing CTS-resistance) when added to different backgrounds (false fer-de-lance (FER) and red-necked keelback (KEE) snakes, respectively, p=1e-7 by ANOVA). Overall, these results suggest that the effect of a given substitution on IC50 can be strongly dependent on the background on which it occurs. The remaining three substitutions (H111T, Q111R and H122D) showed no significant change in the magnitude of the effect on IC50 when introduced into different species backgrounds. These results suggest that, while some substitutions can have strong background-dependent effects, strong intramolecular epistasis with respect to CTS resistance is not universal.
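
The null expectation used above (equal effects across backgrounds, and equal magnitude with opposite sign for a reverse substitution) can be made concrete with a small Python sketch. It assumes, for illustration only, that effects are expressed as log10 fold changes in IC50; the IC50 values and function names are hypothetical, and the sketch yields point estimates only, not the ANOVA-based significance tests reported in the text.

```python
from math import log10

def log_effect(ic50_background, ic50_mutant):
    """Effect of a substitution as the log10 fold change in IC50."""
    return log10(ic50_mutant / ic50_background)

def deviation_from_additivity(effect_a, effect_b, b_is_reverse=False):
    """Observed effect on background A minus the effect expected from
    background B if there were no epistasis. When B carries the reverse
    substitution, the expected effect is the negative of B's effect."""
    expected = -effect_b if b_is_reverse else effect_b
    return effect_a - expected

# Hypothetical IC50 values in arbitrary concentration units.
forward_on_a = log_effect(1.0, 100.0)  # X->Y raises IC50 100-fold on background A
forward_on_b = log_effect(1.0, 10.0)   # X->Y raises IC50 10-fold on background B
reverse_on_c = log_effect(100.0, 1.0)  # Y->X lowers IC50 100-fold on background C

# Same substitution on two backgrounds: a non-zero value indicates
# background dependence (here a tenfold difference in effect).
print(deviation_from_additivity(forward_on_a, forward_on_b))

# Forward versus reverse substitution: a value near zero is what the
# no-epistasis expectation predicts.
print(deviation_from_additivity(forward_on_a, reverse_on_c, b_is_reverse=True))
```

Working on a log scale is a design choice of this sketch: it makes a given fold change and its reversal symmetric around zero, so that the sign-flip expectation for reverse substitutions becomes a simple negation.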

Pleiotropic effects on NKA activity exhibit little evidence for background-dependence.

We next tested whether substitutions at sites 111 and 122 have pleiotropic effects on ATPase activity. Because ion transport across the membrane is a primary function of NKA and its disruption can have severe pathological effects (38), mutations that compromise this function are likely to be under strong purifying selection. As suggested by previous work (23–25), CTS-resistant substitutions at sites 111 and 122 can decrease enzyme activity. We evaluated the generality of these effects by comparing enzyme activity of the 15 mutant NKA proteins to their corresponding wild-type proteins.

Interestingly, the wild-type enzymes themselves exhibit substantial variation in activity, from 3 to 18 nmol/mg*min (P = 6e-7 by ANOVA, Fig 4D; Table S4). On average, substitutions at sites 111 and 122 changed enzyme activity by 60% (Fig 4D; Fig S4). In two cases, amino acid substitutions at position 122 (N122H and H122D) nearly inactivate lizard NKAs and, in one case, a substitution at position 111 (Q111T) resulted in low expression of the recombinant protein in the transfected cells (Fig S5; Fig. S6). A test of uniformity of pairwise t-test p-values across substitutions suggests a significant enrichment of low p-values (Fig 4D inset; p=2.5e-4, chi-squared test of uniformity). Thus, globally, this set of substitutions has significant effects on NKA activity, but they were not significantly more likely to decrease than increase activity (10 decrease: 5 increase, p>0.3, binomial test, Fig. 4D, Table S5).
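
The binomial comparison of 10 decreases versus 5 increases can be checked by direct enumeration. The sketch below assumes a two-sided exact test against equal probabilities of increase and decrease; the text does not state which variant of the binomial test was used, and the function name is illustrative.

```python
from math import comb

def two_sided_sign_test(n_decrease, n_increase):
    """Exact two-sided binomial test against p = 0.5.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. With p = 0.5 every outcome with k decreases has
    probability comb(n, k) / 2**n, so integer counts can be compared."""
    n = n_decrease + n_increase
    observed = comb(n, n_decrease)
    extreme = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return extreme / 2 ** n

p_value = two_sided_sign_test(10, 5)
print(round(p_value, 4))  # 0.3018
```

Under these assumptions the enumeration gives p ≈ 0.302, in agreement with the reported p>0.3.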

We next asked to what extent pleiotropic effects of CTS-resistant substitutions at positions 111 and 122 are dependent on genetic background. This question is motivated by recent studies in insects which revealed that deleterious pleiotropic effects of some resistance-conferring substitutions at sites 111 and 122 are background-dependent (23, 24). Likewise, recent work on ATP1A1 of toad-eating grass frogs showed that effects of Q111R and N122D on NKA activity are background-dependent (25). In contrast, among the five informative cases in which we compared the same substitution (or the reverse substitution) on two or more backgrounds, there is little evidence for background dependence (Fig 4E; Table S5). For example, N122D has similar effects on NKA activity in grass frog and chinchilla despite the substantial divergence between the species’ proteins (8.4% protein sequence divergence; Fig. 4D). Similarly, the effects of Q111R in ostrich or the reverse substitution R111Q in sandgrouse were not significantly different from the effect of Q111R in grass frog (7.5% and 8% protein sequence divergence, respectively).
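
For readers unfamiliar with percent protein sequence divergence, the following Python sketch shows one simple way such a figure can be obtained from two aligned sequences. It assumes an uncorrected pairwise difference that ignores gapped columns. This is an assumption for illustration only: the section does not state how the percentages quoted above were computed, and the toy sequences are hypothetical.

```python
def percent_divergence(seq_a, seq_b, gap="-"):
    """Percentage of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * differing / compared

# Hypothetical toy alignment of 10 columns with one gapped column.
ortholog_1 = "QAAEGDEPNN"
ortholog_2 = "RAAEG-EPND"
print(f"{percent_divergence(ortholog_1, ortholog_2):.1f}% divergence")
```

Model-based distances, such as those expressed in substitutions per site, correct for multiple substitutions at the same position and will generally exceed this uncorrected value for divergent pairs.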

To further examine the evidence for background dependence, we tested whether changes to the same amino acid state (regardless of starting state) at 111 and 122 produce different changes in NKA activity (e.g., R111E on the rat background versus H111E on the false fer-de-lance background). If epistasis is important, we expect that the difference in effects of substitutions to a given amino acid state should increase with increasing sequence divergence compared to ATP1A1 backgrounds in which that state is wild-type. However, across the 11 possible comparisons, we found no relationship between the difference in the effect of substitutions to the same state and the extent of amino acid divergence between the orthologous proteins (Fig. 5). This pattern suggests that, while pleiotropic effects can be background dependent (23, 25), these effects are not pervasive across species and do not correlate with overall sequence divergence.

The overall rate of convergence across ATP1A proteins does not depend on sequence divergence.

If intramolecular epistasis is pervasive, we would predict that rates of convergent substitution should decrease as a function of overall sequence divergence (12–14). In contrast to this expectation, our experiments suggest that, for ATP1A1, the extent of background sequence divergence is a poor predictor of the magnitude of effects of substitutions at sites 111 and 122 on CTS resistance and enzyme activity. Since our experiments were necessarily limited in scope, we carried out a broad phylogenetic analysis to evaluate how well our findings align with global estimates of rates of convergence for the ATP1A family beyond ATP1A1 and beyond sites implicated in CTS resistance.

Using a multisequence alignment of 831 ATP1A protein sequences, including the three ATP1A paralogs shared among tetrapods (i.e., amphibians, non-avian reptiles, birds, and mammals), we inferred a maximum likelihood phylogeny of the gene family (Fig. S1). We then used ancestral sequence reconstruction to infer the history of substitution events on all branches in the tree and counted the number of convergent amino acid substitutions along the protein per site (see Materials and Methods). Convergent substitutions are defined as substitutions on two branches at the same site resulting in the same amino acid state. Interestingly, we do not detect a correlation between the relative number of convergent substitutions and background ATP1A divergence across the tree (Fig. 6A). This result also holds true when considering only substitutions to the key CTS resistance sites 111 and 122 (Fig. S5).
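
The definition of a convergent substitution given above lends itself to a short illustration. The Python sketch below takes a list of inferred substitution events and enumerates pairs of events on different branches that reach the same amino acid state at the same site, regardless of starting state. The event tuples, branch labels and function name are hypothetical; this is a simplified sketch of the counting step only, not the pipeline described in Materials and Methods.

```python
from collections import defaultdict
from itertools import combinations

def convergent_pairs(events):
    """Return (site, derived_state, branch_1, branch_2) for every pair of
    branches with a substitution to the same state at the same site.

    Each event is a tuple (branch, site, ancestral_state, derived_state)."""
    branches_by_target = defaultdict(set)
    for branch, site, ancestral, derived in events:
        if ancestral != derived:  # ignore entries that are not substitutions
            branches_by_target[(site, derived)].add(branch)
    pairs = []
    for (site, derived), branches in sorted(branches_by_target.items()):
        for b1, b2 in combinations(sorted(branches), 2):
            pairs.append((site, derived, b1, b2))
    return pairs

# Hypothetical substitution events inferred on five branches.
events = [
    ("branch_1", 111, "Q", "R"),
    ("branch_2", 111, "Q", "R"),
    ("branch_3", 111, "H", "R"),  # different starting state, same end state
    ("branch_4", 122, "N", "D"),
    ("branch_5", 122, "N", "H"),
]

for pair in convergent_pairs(events):
    print(pair)
```

In an analysis like the one described, each such pair would then be associated with the sequence divergence between the two backgrounds on which the substitutions occurred, so that the number of convergent events can be examined as a function of divergence.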

To gain more insight into the factors that determine convergent evolution in ATP1A, we looked more closely at patterns of individual convergent substitutions at sites 111 and 122 by extracting each convergent substitution and visualizing its distribution along the sequence divergence axis (Fig. 6B). Under the expectation that rates of convergence should tend to decrease as a function of sequence divergence, the distribution of pairwise convergent events along the sequence divergence axis should be right-skewed, with a peak towards lower sequence divergence. In contrast to this expectation, the distribution is bimodal, with one peak at 0.33 and the other at 0.69 substitutions/site (Fig. 6B bottom panel). Parallel and convergent substitutions have occurred almost across the full range of protein divergence estimates. For example, if X is any starting state, the substitution X111R has occurred independently in 13 tetrapod lineages and X111L independently in 20 lineages. Both substitutions have a broad phylogenetic distribution, suggesting that their effects do not strongly depend on overall genetic background. Interestingly, however, the distributions for X111H and X111E substitutions are relatively right-skewed, in line with epistasis for CTS resistance that we observed in experiments for H111E/E111H (Fig 4E). Overall, the results of these analyses align well with our functional experiments but run contrary to expectations based on previously reported proteome-wide evolutionary trends (12–14).

Discussion

Previous work has suggested that rates of convergent amino-acid substitution generally decline as a function of time, a pattern that can potentially be explained by epistatic constraints. According to this view, the higher the level of sequence divergence between a given pair of homologs, the higher the probability that the same mutation will have different fitness effects on the two backgrounds (12). In that respect, our broad survey of the ATP1A gene family in tetrapods, in combination with previous work, reveals two striking and seemingly contradictory patterns. The first is that some substitutions underlying CTS resistance in tetrapods are broadly distributed phylogenetically and even shared with insects (e.g., N122H is widespread among snakes and found in the monarch butterfly and other insects; see Fig. 3 for more examples). Patterns like these suggest that epistatic constraints have a limited role in the evolution of CTS resistance, as the same mutation can be favored on highly divergent genetic backgrounds. On the other hand, there is also substantial diversity in resistance-conferring states at sites 111 and 122, and some combinations of these substitutions appear to be phylogenetically restricted. For example, the CTS-resistant combination of Q111R+N122D has evolved multiple times in tetrapods but is absent in insects, whereas the CTS-resistant combination Q111V+N122H evolved multiple times in insects but is absent in tetrapods (Fig 3). Additionally, some substitutions also appear to be paralog-specific in tetrapods (Fig 3). These phylogenetic signatures suggest at least some role for epistasis as a source of contingency in the evolution of ATP1A-mediated CTS resistance in animals (i.e., the fitness effects of substitutions depend on the order in which they occur). How can these disparate patterns be reconciled? To what extent do genetic background and contingency limit the evolution of CTS resistance in animals?
336
337
In our survey of putative CTS-resistant substitutions at sites 111 and 122, we find that derived substitutions have largely predictable effects on CTS resistance, with notable exceptions that tend to differ in magnitude rather than direction (Fig. 4C and 4E). While derived states at sites 111 and 122 are generally a reliable predictor of CTS resistance (Fig. 4A), they do not always predict the effect size of particular substitutions (e.g., Q111R contributes to CTS resistance on many species backgrounds, but not on that of sandgrouse; Fig. 4C). It is also notable that species with identical paired states at 111 and 122 can vary in CTS resistance by more than an order of magnitude. Both patterns point to background determinants of CTS resistance that may be additive rather than epistatic. Yet there are some broadly phylogenetically distributed substitutions, such as N122D, that nonetheless exhibit background-dependent effects on CTS resistance (Fig. 4C and 4E).

While epistasis is likely to be a pervasive feature of protein evolution, many mutational effects on structural and functional properties of proteins appear to be purely additive (e.g., 39–41). In line with this, our experimental results revealed that the phenotypic effects of individual substitutions on ATPase activity are likely to be additive in general. We also found no correlation between the marginal effect of a substitution and background genetic divergence. Specifically, mutating to the same amino acid state (irrespective of the initial state) does not result in larger effects on more distant backgrounds. Under additivity, the rate of convergence is expected to be uncorrelated with background genetic distance because the phenotypic effect of a mutation does not depend on the amino acid states at other sites in the protein. Our phylogenetic and experimental results align with this expectation.

While the extent to which changes in CTS resistance are favorable to an organism depends on physiological constraints and the specific ecological context (e.g., in which tissues NKA is expressed and the presence of dietary CTS), changes in enzyme activity associated with these substitutions are most likely detrimental to organismal fitness. It follows that changes to the ATP1A1 background would be required to offset such changes in enzyme activity. Surprisingly, we found that, with rare exceptions, CTS-resistant substitutions at sites 111 and 122 tend to exhibit little or no pleiotropy with respect to enzyme activity. In addition, amino acid substitutions were not significantly more likely to decrease than to increase activity. Interestingly, the activity of wild-type ATP1A1 enzymes varies 6-fold among the species surveyed (Fig. 4E), suggesting either that most species are robust to changes in NKA activity or that changes have occurred in other genes (including other ATP1A paralogs) that compensate for changes in activity. Thus, it may be that protein activity itself is not an important pleiotropic constraint on the evolution of ATP1A CTS resistance, or that constraint depends not just on the protein background but also on the background at higher levels (e.g., other interacting proteins). A further possibility is that detrimental effects of CTS-resistant substitutions depend on few sites, and these sites are also highly convergent (e.g., A119S among insect herbivores; see refs. 23 and 24).

We conclude that intramolecular epistasis in ATP1A, at the level of protein activity, is unlikely to represent a substantial constraint on the evolution of CTS resistance. However, the lack of evidence of epistasis at the level of protein function does not preclude an important role for epistasis at higher levels. For example, our results are also consistent with a scenario of nonspecific (or global) epistasis, where mutations have additive effects on molecular phenotypes (e.g., ATPase activity) but nonadditive effects on fitness due to a nonlinear relationship between phenotype and fitness (7, 40, 42). Nonspecific epistasis predicts a many-to-one relationship with respect to genetic backgrounds and specific mutations (7, 42), such that many genetic backgrounds can compensate for the deleterious effects of a given mutation. Thus, nonspecific epistasis of this form could explain why CTS-resistant substitutions at sites 111 and 122 exhibit broad phylogenetic distributions.

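As a purely illustrative sketch of the nonspecific (global) epistasis scenario described above, the following Python snippet applies the same additive change in enzyme activity to two backgrounds and passes the result through a nonlinear fitness function. The Gaussian fitness function, its optimum and width, and the activity values are hypothetical choices made for illustration; they are not estimates from this study.

```python
import math

def fitness(activity, optimum=1.0, width=0.5):
    """Hypothetical Gaussian fitness function centered on an optimal activity."""
    return math.exp(-((activity - optimum) ** 2) / (2 * width ** 2))

def fitness_effect(background_activity, delta):
    """Fitness change caused by adding `delta` to the background's activity."""
    return fitness(background_activity + delta) - fitness(background_activity)

# The substitution has the same additive effect on activity on every background.
delta = -0.2

# Hypothetical backgrounds: one at the optimum, one above it.
effect_at_optimum = fitness_effect(1.0, delta)     # moves away from the optimum
effect_above_optimum = fitness_effect(1.4, delta)  # moves towards the optimum

print(effect_at_optimum, effect_above_optimum)
```

Under these assumptions the substitution is deleterious on the first background and beneficial on the second, even though its effect on the molecular phenotype is identical, which is the sense in which additive phenotypic effects can coexist with nonadditive fitness effects.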
Dependence on few sites, or the many-to-one nature of nonspecific epistasis, may also account for the weak signature of decreasing convergence with increasing divergence for ATP1A. Our study suggests that, while intramolecular epistasis may be pervasive across proteomes, it does not always represent a substantial constraint on the evolution of adaptive traits, as we show here for CTS resistance in tetrapods. Further evaluation of epistasis at higher levels than enzyme activity (e.g., whole-organism neural function, CTS tolerance, or viability; refs. 23, 24) may elucidate the extent to which nonspecific epistasis constrains protein evolution in these cases.

Materials and Methods

Sample collection and data sources.
To carry out a comprehensive survey of vertebrate ATP1A paralogs, we collated a total of 831 protein sequences for this study (Supplementary Dataset 1). In addition to publicly available data, we generated RNA-seq data for 27 species of non-avian reptiles (Table S1; PRJNA754197) to achieve better representation of some previously underrepresented lineages. These included field-caught and museum-archived specimens as well as animals purchased from commercial pet vendors. Purchased animals were processed following the procedures specified in IACUC Protocol No. 2057-16 (Princeton University) and implemented by a research veterinarian at Princeton University. Wild-caught animals were collected under Colombian umbrella permit resolución No. 1177 granted by the Autoridad Nacional de Licencias Ambientales to the Universidad de los Andes and handled according to protocols approved by the Institutional Committee on the Care and Use of Laboratory Animals (abbreviated CICUAL in Spanish) of the Universidad de los Andes. In all cases, fresh tissues (brain, stomach, and muscle) were taken, preserved in RNAlater (Invitrogen), and stored at -80 °C until used.

Reconstruction of ATP1A paralogs.
RNA-seq libraries were either prepared using the TruSeq RNA Library Prep Kit v2 (Illumina) and sequenced on an Illumina HiSeq2500 (Genomics Core Facility, Princeton, NJ, USA) or prepared using the NEBNext Ultra RNA Library Preparation Kit (NEB) and sequenced on an Illumina HiSeq4000 (Genewiz, South Plainfield, NJ, USA) (Table S2). All raw RNA-seq data generated for this study have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under BioProject PRJNA754197. Together with SRA datasets downloaded from public databases, reads were trimmed to Phred quality 20 and length 20 and then assembled de novo using Trinity v2.2.0 (43). Sequences of ATP1A paralogs 1, 2, and 3 were extracted with BLAST searches (blast-v2.26), individually curated, and then aligned using ClustalW. Complete alignments of ATP1A1/2/3 can be found in Supplementary Dataset 1.

Character state mapping and parameter estimation for the ATP1A1-3 paralogs.
Protein sequences from ATP1A1 (N=429), ATP1A2 (N=197), and ATP1A3 (N=205), including the main tetrapod groups (amphibians, non-avian reptiles, birds, and mammals) and lungfish+coelacanth as outgroups, were aligned using ClustalW with default parameters. The best-fit amino acid substitution model for phylogenetic reconstruction was selected based on the Akaike Information Criterion (AIC) as implemented in ModelTest-NG v.0.1.5 (44) and was inferred to be JTT+G4+F. An initial phylogeny was inferred using RAxML HPC v.8 (45) under the JTT+GAMMA model with empirical amino acid frequencies. Branch lengths and node support (aLRS) were further refined using PhyML v.3.1 (46) with empirical amino acid frequencies and maximum likelihood estimates of the rate heterogeneity parameters, I and G. Phylogeny visualization and mapping of character states for each paralog were done using the R package ggtree (47).

Ancestral sequence reconstruction and convergence calculations.
Ancestral sequence reconstruction (ASR) was performed in PAML using codeml (48) under the JTT+G4+F substitution model. Statistical confidence in each position’s reconstructed state for each ancestor was determined from the posterior probability (PP), and only states with PP>0.8 were considered. Ancestral sequences from all nodes in the ATP1A phylogeny were retrieved from the codeml output, resulting in an alignment of 1,660 ATP1A proteins (831 extant species and 829 inferred ancestral sequences; Fig. S1). For each branch in the tree, we determined the occurrence of substitutions from the ancestral and derived amino acid states at each site, using only states with PP>0.8. All branch pairs were compared, except sister branches and ancestor-descendant pairs (12, 13). When comparing substitutions on two distinct branches at the same site, substitutions to the same amino acid state were counted as convergences, while substitutions away from a common amino acid were counted as divergences. An alignment of 1,040 amino acids was used to calculate the number of molecular convergences and divergences, excluding a putative 30 amino acid-long alternatively spliced region (positions 834-864). Model-based estimates of sequence divergence, number of convergences, number of divergences, and total number of substitutions since the common ancestor were recorded for each pairwise comparison. We calculated the proportion of observed convergent events per branch as (number of convergences + 1) / (number of divergences + 1). The line describing the trend was calculated as a running average with a window size of 0.05 substitutions/site. We calculated 95% confidence intervals based on 100 bootstrap replicates per window, resampling only variable sites.

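The convergence ratio and running average described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study: the function names, the 0.01 step between successive windows, and the example branch pairs are assumptions introduced here, and the bootstrap confidence intervals are not included.

```python
from statistics import mean

def convergence_ratio(n_convergences, n_divergences):
    """(convergences + 1) / (divergences + 1), as given in the text."""
    return (n_convergences + 1) / (n_divergences + 1)

def running_average(points, window=0.05, step=0.01):
    """Average the ratio over sliding windows along the divergence axis.

    points: list of (divergence, ratio) tuples, one per branch pair.
    window: window width in substitutions/site (0.05 in the text).
    step:   offset between successive windows (an assumption made here).
    Returns (window_center, mean_ratio) for every non-empty window.
    """
    if not points:
        return []
    lo = min(d for d, _ in points)
    hi = max(d for d, _ in points)
    n_steps = int((hi - lo) / step) + 1
    trend = []
    for k in range(n_steps):
        start = lo + k * step
        in_window = [r for d, r in points if start <= d < start + window]
        if in_window:
            trend.append((start + window / 2, mean(in_window)))
    return trend

# Hypothetical branch pairs: (divergence, convergences, divergences)
pairs = [(0.10, 3, 7), (0.12, 1, 3), (0.40, 0, 9)]
points = [(d, convergence_ratio(c, v)) for d, c, v in pairs]
trend = running_average(points)
print(trend[0])
```

Adding 1 to both counts keeps the ratio defined for branch pairs with no observed divergences.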
For sites 111 and 122, molecular convergence was coded as “1” when the substitution along branch i was to the same amino acid state as the substitution along branch j, and “0” if the substitutions were to different states. Model-based estimates of sequence divergence, amino acid state, and convergence event were recorded for each pairwise comparison (when convergence was “0”, amino acid state was set to “NA”). A logistic regression of molecular convergence (0 or 1) on genetic distance was used to test for an association between the two variables (Fig. 6B; Fig. S3).

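The coding scheme and regression described above can be illustrated with a small, dependency-free Python sketch. It fits a one-predictor logistic regression by Newton-Raphson; the helper names and the example distances and outcomes are hypothetical, and this is not the software used for the published analysis.

```python
import math

def code_convergence(state_i, state_j):
    """Return 1 if both branches substituted to the same amino acid, else 0."""
    return 1 if state_i == state_j else 0

def fit_logistic(x, y, n_iter=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Accumulate the gradient (g) and information matrix (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve the 2x2 system h * step = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical pairwise comparisons: genetic distance and convergence (0/1).
distances = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
converged = [1, 1, 0, 1, 0, 1, 0, 0]
intercept, slope = fit_logistic(distances, converged)
print(intercept, slope)
```

A negative slope would indicate that convergence becomes less likely as genetic distance increases; a slope near zero is what additivity predicts.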
Construction of expression vectors.
ATP1A1 and ATP1B1 wild-type sequences for the eight selected tetrapod species (Fig. 4) were synthesized by Invitrogen™ GeneArt. The β1-subunit genes were inserted into pFastBac Dual expression vectors (Life Technologies) at the p10 promoter with XhoI and PaeI (FastDigest, Thermo Scientific™) and then control sequenced. The α1-subunit genes were inserted at the PH promoter of vectors already containing the corresponding β1-subunit genes using the In-Fusion® HD Cloning Kit (Takara Bio USA, Inc.) and control sequenced. All resulting vectors had the α1-subunit gene under the control of the PH promoter and a β1-subunit gene under the p10 promoter. The resulting eight vectors were then subjected to site-directed mutagenesis (QuikChange II XL Kit; Agilent Technologies, La Jolla, CA, USA) to introduce the codons of interest. In total, 21 vectors were produced (Table S3).

Generation of recombinant viruses and transfection into Sf9 cells.
Escherichia coli DH10Bac cells harboring the baculovirus genome (bacmid) and a transposition helper vector (Life Technologies) were transformed according to the manufacturer’s protocol with expression vectors containing the different gene constructs. Recombinant bacmids were selected through PCR screening, grown, and isolated. Subsequently, Sf9 cells (4 × 10⁵ cells/ml) in 2 ml of Insect-Xpress medium (Lonza, Walkersville, MD, USA) were transfected with recombinant bacmids using Cellfectin reagent (Life Technologies). After a three-day incubation period, recombinant baculoviruses were isolated (P1) and used to infect fresh Sf9 cells (1.2 × 10⁶ cells/ml) in 10 ml of Insect-Xpress medium (Lonza, Walkersville, MD, USA) with 15 mg/ml gentamycin (Roth, Karlsruhe, Germany) at a multiplicity of infection of 0.1. Five days after infection, the amplified viruses were harvested (P2 stock).

Preparation of Sf9 membranes.
For production of recombinant NKA, Sf9 cells were infected with the P2 viral stock at a multiplicity of infection of 10³. The cells (1.6 × 10⁶ cells/ml) were grown in 50 ml of Insect-Xpress medium (Lonza, Walkersville, MD, USA) with 15 mg/ml gentamycin (Roth, Karlsruhe, Germany) at 27 °C in 500 ml flasks (35). After 3 days, Sf9 cells were harvested by centrifugation at 20,000 × g for 10 min. The cells were stored at -80 °C and then resuspended at 0 °C in 15 ml of homogenization buffer (0.25 M sucrose, 2 mM EDTA, and 25 mM HEPES/Tris; pH 7.0). The resuspended cells were sonicated at 60 W (Bandelin Electronic Company, Berlin, Germany) for three 45 s intervals at 0 °C. The cell suspension was then centrifuged for 30 min at 10,000 × g (J2-21 centrifuge, Beckman Coulter, Krefeld, Germany). The supernatant was collected and further centrifuged for 60 min at 100,000 × g at 4 °C (Ultra-Centrifuge L-80, Beckman Coulter) to pellet the cell membranes. The pelleted membranes were washed once, resuspended in ROTIPURAN® p.a., ACS water (Roth), and stored at -20 °C. Protein concentrations were determined by Bradford assays using bovine serum albumin as a standard. Three biological replicates were produced for each NKA construct.

Verification by SDS-PAGE/western blotting.
For each biological replicate, 10 µg of protein was solubilized in 4x SDS-polyacrylamide gel electrophoresis sample buffer and separated on SDS gels containing 10% acrylamide. Subsequently, the proteins were blotted onto nitrocellulose membrane (HP42.1, Roth). To block non-specific binding sites after blotting, the membrane was incubated with 5% dried milk in TBS-Tween 20 for 1 h. After blocking, the membranes were incubated overnight at 4 °C with the primary monoclonal antibody α5 (Developmental Studies Hybridoma Bank, University of Iowa, Iowa City, IA, USA). Since only membrane proteins were isolated from transfected cells, detection of the α subunit also indicates the presence of the β subunit. The primary antibody was detected using a goat anti-mouse secondary antibody conjugated with horseradish peroxidase (Dianova, Hamburg, Germany). The precipitated polypeptide-antibody complexes were stained by adding 60 mg 4-chloro-1-naphthol (Sigma-Aldrich, Taufkirchen, Germany) in 20 ml ice-cold methanol to 100 ml phosphate-buffered saline (PBS) containing 60 µl 30% H2O2. See Fig. S6.

Ouabain inhibition assay.
To determine the sensitivity of each NKA construct to cardiotonic steroids (CTS), we used the water-soluble cardiac glycoside ouabain (Acrōs Organics) as our representative CTS. We pipetted 100 µg of each protein into each well in a nine-well row on a 96-well microplate (Fisherbrand) containing stabilizing buffers (see buffer formulas in (49)). The wells in each row were exposed to exponentially decreasing concentrations of ouabain (10⁻³ M, 10⁻⁴ M, 10⁻⁵ M, 10⁻⁶ M, 10⁻⁷ M, 10⁻⁸ M, dissolved in distilled H2O), distilled water only (experimental control), and a combination of an inhibition buffer lacking KCl and 10⁻² M ouabain to measure background protein activity (49). The proteins were incubated at 37 °C and 200 rpm for 10 minutes on a microplate shaker (Quantifoil Instruments, Jena, Germany). Next, ATP (Sigma-Aldrich) was added to each well and the proteins were incubated again at 37 °C and 200 rpm for 20 minutes. The activity of NKA following ouabain exposure was determined by quantification of inorganic phosphate (Pi) released from enzymatically hydrolyzed ATP. Reaction Pi levels were measured according to the procedure described in Taussky and Shorr (50) (see Petschenka et al. (49)). All assays were run in duplicate and the average of the two technical replicates was used for subsequent statistical analyses. Absorbance for each well was measured at 650 nm with a plate absorbance reader (BioRad Model 680 spectrophotometer and software package). See Table S4.

ATP hydrolysis assay.
To determine the functional efficiency of different NKA constructs, we calculated the amount of Pi released from ATP per mg of protein per minute. The measurements were obtained from the same assay as described above. In brief, absorbance from the experimental control reactions, in which 100 µg of protein was incubated without any inhibiting factors (i.e., ouabain or buffer excluding KCl), was measured and converted to mM Pi using a standard curve that was run in parallel (1.2 mM Pi, 1 mM Pi, 0.8 mM Pi, 0.6 mM Pi, 0.4 mM Pi, 0.2 mM Pi, 0 mM Pi). See Table S4.

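The conversion from absorbance to specific activity outlined above can be sketched as follows. This is an illustrative Python example under stated assumptions: the standard-curve absorbances, the sample absorbance, and the 100 µl reaction volume are hypothetical values (the reaction volume is not given in this section), while the 100 µg of protein and the 20 minute incubation with ATP are taken from the text. Background subtraction is omitted for brevity.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def absorbance_to_mM(absorbance, slope, intercept):
    """Invert the standard curve to get the Pi concentration in mM."""
    return (absorbance - intercept) / slope

def specific_activity(pi_mM, volume_ul, protein_mg=0.1, minutes=20.0):
    """Convert mM Pi to nmol Pi / mg protein / min.

    1 mM equals 1 nmol/µl, so concentration (mM) times volume (µl) gives nmol.
    """
    return pi_mM * volume_ul / protein_mg / minutes

# Pi standards from the text (mM) with hypothetical absorbance readings.
standards_mM = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
standard_abs = [0.05 + 0.5 * c for c in standards_mM]
slope, intercept = fit_line(standards_mM, standard_abs)

sample_mM = absorbance_to_mM(0.35, slope, intercept)   # hypothetical reading
activity = specific_activity(sample_mM, volume_ul=100.0)  # assumed volume
print(sample_mM, activity)
```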
Statistical analyses of functional data.
Background phosphate absorbance levels from reactions with inhibiting factors were used to calibrate phosphate absorbance in wells measuring ouabain inhibition and in the control wells measuring non-inhibited NKA activity (49). For ouabain sensitivity measurements, calibrated absorbance values were converted to percentage non-inhibited NKA activity based on measurements from the control wells (49). These data were plotted and log IC50 values were obtained for each biological replicate from nonlinear fitting using a four-parameter logistic curve, with the top asymptote set to 100 and the bottom asymptote set to zero. Curve fitting was performed with the nlsLM function of the minpack.lm library in R. For comparisons of recombinant protein activity, the calculated Pi concentrations of 100 µg of protein assayed in the absence of ouabain were converted to nmol Pi/mg protein/min. IC50 values were log-transformed. We used pairwise t-tests with Bonferroni corrections to identify significant differences between constructs with and without engineered substitutions. We used a two-way ANOVA to test for background dependence of substitutions (i.e., interaction between background and amino acid substitution) with respect to ouabain resistance (log IC50) and protein activity. Specifically, we tested whether the effects of a substitution X->Y are equal on different backgrounds (null hypothesis: X->Y (background 1) = X->Y (background 2)). We further assumed that the effect of a substitution X->Y should exactly match that of Y->X. All statistical analyses were implemented in R. Data were plotted using the ggplot2 package in R.

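The dose-response fit described above can be illustrated with a small Python sketch. It is not the R code used in the study (which relied on nlsLM): here a hand-written, damped Gauss-Newton (Levenberg-Marquardt style) loop with a numerical Jacobian stands in for that function. With the top and bottom asymptotes fixed at 100 and 0, only log IC50 and the Hill slope are estimated. The exact arithmetic in `percent_activity`, the starting values, and the example data are assumptions made for illustration.

```python
def percent_activity(abs_well, abs_background, abs_control):
    """Background-corrected activity as a percentage of the uninhibited control."""
    return 100.0 * (abs_well - abs_background) / (abs_control - abs_background)

def logistic4(log_conc, log_ic50, hill, top=100.0, bottom=0.0):
    """Four-parameter logistic curve on a log10 concentration scale."""
    exponent = max(-300.0, min(300.0, hill * (log_conc - log_ic50)))  # avoid overflow
    return bottom + (top - bottom) / (1.0 + 10.0 ** exponent)

def sse(log_ic50, hill, xs, ys):
    """Sum of squared residuals for the curve with fixed asymptotes."""
    return sum((y - logistic4(x, log_ic50, hill)) ** 2 for x, y in zip(xs, ys))

def fit_log_ic50(xs, ys, start=(-5.0, 1.0), n_iter=200):
    """Estimate (log IC50, Hill slope) with top = 100 and bottom = 0 held fixed."""
    m, h = start
    lam, eps = 1e-3, 1e-6
    best = sse(m, h, xs, ys)
    for _ in range(n_iter):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            f = logistic4(x, m, h)
            jm = (logistic4(x, m + eps, h) - f) / eps  # d f / d log_ic50
            jh = (logistic4(x, m, h + eps) - f) / eps  # d f / d hill
            r = y - f
            a11 += jm * jm
            a12 += jm * jh
            a22 += jh * jh
            g1 += jm * r
            g2 += jh * r
        # Damped normal equations: (J'J + lam * diag(J'J)) * step = J'r
        d11, d22 = a11 * (1.0 + lam), a22 * (1.0 + lam)
        det = d11 * d22 - a12 * a12
        if det == 0.0:
            break
        dm = (d22 * g1 - a12 * g2) / det
        dh = (d11 * g2 - a12 * g1) / det
        if abs(dm) < 1e-12 and abs(dh) < 1e-12:
            break
        trial = sse(m + dm, h + dh, xs, ys)
        if trial < best:  # accept the step and relax the damping
            m, h, best = m + dm, h + dh, trial
            lam = max(lam / 10.0, 1e-12)
        else:             # reject the step and increase the damping
            lam = min(lam * 10.0, 1e12)
    return m, h

# Ouabain concentrations from the text (log10 M) with hypothetical, noise-free
# responses generated from log IC50 = -5.5 and Hill slope = 1.2.
log_conc = [-8.0, -7.0, -6.0, -5.0, -4.0, -3.0]
activity = [logistic4(x, -5.5, 1.2) for x in log_conc]
log_ic50_hat, hill_hat = fit_log_ic50(log_conc, activity)
print(log_ic50_hat, hill_hat)
```

Fixing the asymptotes reduces the four-parameter curve to two free parameters, which helps when only six concentrations are available per replicate.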
Acknowledgments

We thank C. Natarajan, P. Kowalski, M. Winter, and V. Wagschal for assistance in the laboratory, and D.A. Gómez-Sánchez for assistance in the field. We thank J. Oaks for providing tissue from a ring-necked snake. Funding: This study was funded by grants from the National Institutes of Health to PA (R01-GM115523), JFS (R01-HL087216), and SM (F32HL149172), the National Science Foundation (OIA-1736249) to JFS, the Deutsche Forschungsgemeinschaft (Do 517/10-1) to SD, and the Alexander von Humboldt Foundation to SM.

References

1. D. L. Stern, The genetic causes of convergent evolution. Nat. Rev. Genet. 14, 751–764 (2013).
2. J. F. Storz, Causes of molecular convergence and parallelism in protein evolution. Nat. Rev. Genet. 17, 239 (2016).
3. D. L. Stern, Evolution, development, & the predictable genome (Roberts and Co. Publishers, 2011).
4. P. C. Phillips, Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat. Rev. Genet. 9, 855–867 (2008).
5. W. Fitch, Rate of change of concomitantly variable codons. J. Mol. Evol. 1, 84–96 (1971).
6. B. Callahan, R. A. Neher, D. Bachtrog, P. Andolfatto, B. I. Shraiman, Correlated evolution of nearby residues in Drosophilid proteins. PLoS Genet. 7, e1001315 (2011).
7. T. N. Starr, J. W. Thornton, Epistasis in protein evolution. Protein Sci. 25, 1204–1218 (2016).
8. J. F. Storz, Compensatory mutations and epistasis for protein function. Curr. Opin. Struct. Biol. 50, 18–25 (2018).
9. D. D. Pollock, G. Thiltgen, R. A. Goldstein, Amino acid coevolution induces an evolutionary Stokes shift. Proc. Natl. Acad. Sci. 109, E1352 (2012).
10. P. Shah, D. M. McCandlish, J. B. Plotkin, Contingency and entrenchment in protein evolution under purifying selection. Proc. Natl. Acad. Sci. 112, E3226 (2015).
11. V. O. Pokusaeva, et al., An experimental assay of the interactions of amino acids from orthologous sequences shaping a complex fitness landscape. PLoS Genet. 15, e1008079 (2019).
12. R. A. Goldstein, S. T. Pollard, S. D. Shah, D. D. Pollock, Nonadaptive amino acid convergence rates decrease over time. Mol. Biol. Evol. 32, 1373–1381 (2015).
13. Z. Zou, J. Zhang, Are convergent and parallel amino acid substitutions in protein evolution more prevalent than neutral expectations? Mol. Biol. Evol. 32, 2085–2096 (2015).
14. Z. Zou, J. Zhang, Gene tree discordance does not explain away the temporal decline of convergence in mammalian protein sequence evolution. Mol. Biol. Evol. 34, 1682–1688 (2017).
15. J. B. Lingrel, The physiological significance of the cardiotonic steroid/ouabain-binding site of the Na, K-ATPase. Annu. Rev. Physiol. 72, 395–412 (2010).
16. A. Koksoy, Na+, K+-ATPase: a review. J Ank. Med Sch 24, 73–82 (2002).
17. S. Dobler, S. Dalla, V. Wagschal, A. A. Agrawal, Community-wide convergent evolution in insect adaptation to toxic cardenolides by substitutions in the Na, K-ATPase. Proc. Natl. Acad. Sci. 109, 13040–13045 (2012).
18. Y. Zhen, M. L. Aardema, E. M. Medina, M. Schumer, P. Andolfatto, Parallel molecular evolution in an herbivore community. Science 337, 1634–1637 (2012).
19. D. J. Moore, D. C. Halliday, D. M. Rowell, A. J. Robinson, J. S. Keogh, Positive Darwinian selection results in resistance to cardioactive toxins in true toads (Anura: Bufonidae). Biol. Lett. 5, 513–516 (2009).
20. B. Ujvari, et al., Widespread convergence in toxin resistance by predictable molecular evolution. Proc. Natl. Acad. Sci. 112, 11911–11916 (2015).
21. S. Mohammadi, et al., Toxin-resistant isoforms of Na+/K+-ATPase in snakes do not closely track dietary specialization on toads. Proc. R. Soc. B Biol. Sci. 283, 20162111 (2016).
22. S. Mohammadi, L. Yang, M. Bulbert, H. M. Rowland, The evolutionary and behavioural ecology of cardiotonic steroid resistance in predators (In prep).
23. A. M. Taverner, et al., Adaptive substitutions underlying cardiac glycoside insensitivity in insects exhibit epistasis in vivo. eLife 8, e48224 (2019).
24. M. Karageorgi, et al., Genome editing retraces the evolution of toxin resistance in the monarch butterfly. Nature 574, 409–412 (2019).
25. S. Mohammadi, et al., Concerted evolution reveals co-adapted amino acid substitutions in frogs that prey on toxic toads. Curr. Biol. 31, 2530-2538.e10 (2021).
26. E. M. Price, J. B. Lingrel, Structure-function relationships in the sodium-potassium ATPase α subunit: site-directed mutagenesis of glutamine-111 to arginine and asparagine-122 to aspartic acid generates a ouabain-resistant enzyme. Biochemistry 27, 8400–8408 (1988).
27. K. J. Sweadner, et al., Genotype-structure-phenotype relationships diverge in paralogs ATP1A1, ATP1A2, and ATP1A3. Neurol. Genet. 5, e303 (2019).
28. A. Mobasheri, et al., Na+, K+-ATPase isozyme diversity; comparative biochemistry and physiological implications of novel functional interactions. Biosci. Rep. 20, 51–91 (2000).
29. B. Ujvari, et al., Isolation breeds naivety: island living robs Australian varanid lizards of toad-toxin immunity via four-base-pair mutation. Evol. Int. J. Org. Evol. 67, 289–294 (2013).
30. B. M. Marshall, et al., Widespread vulnerability of Malagasy predators to the toxins of an introduced toad. Curr. Biol. 28, R654–R655 (2018).
31. S. Groen, N. Whiteman, Convergent evolution of cardiac-glycoside resistance in predators and parasites of milkweed herbivores. Curr. Biol. 31, R1465–R1466 (2021).
32. M. Lunzer, G. B. Golding, A. M. Dean, Pervasive cryptic epistasis in molecular evolution. PLoS Genet. 6, e1001162 (2010).
33. L. Yang, et al., Predictability in the evolution of Orthopteran cardenolide insensitivity. Philos. Trans. R. Soc. B 374, 20180246 (2019).
34. A. Stoltzfus, D. M. McCandlish, Mutational Biases Influence Parallel Adaptation. Mol. Biol. Evol. 34, 2163–2172 (2017).
35. J. Zhang, S. Kumar, Detection of convergent and parallel evolution at the amino acid sequence level. Mol. Biol. Evol. 14, 527–536 (1997).
36. L. F. Toledo, R. Ribeiro, C. F. Haddad, Anurans as prey: an exploratory analysis and size relationships between predators and their prey. J. Zool. 271, 170–177 (2007).
37. S. Dobler, et al., New ways to acquire resistance: imperfect convergence in insect adaptations to a potent plant toxin. Proc. R. Soc. B 286, 20190883 (2019).
38. M. V. Clausen, F. Hilbers, H. Poulsen, The structure and function of the Na, K-ATPase isoforms in health and disease. Front. Physiol. 8, 371 (2017).
39. J. A. Wells, Additivity of mutational effects in proteins. Biochemistry 29, 8509–8517 (1990).
40. M. Lunzer, S. P. Miller, R. Felsheim, A. M. Dean, The biochemical architecture of an ancient adaptive landscape. Science 310, 499–501 (2005).
41. L. I. Gong, M. A. Suchard, J. D. Bloom, Stability-mediated epistasis constrains the evolution of an influenza protein. eLife 2, e00631 (2013).
42. P. Nosil, et al., Ecology shapes epistasis in a genotype–phenotype–fitness map for stick insect colour. Nat. Ecol. Evol. 4, 1673–1684 (2020).
43. B. J. Haas, et al., De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
44. D. Darriba, et al., ModelTest-NG: a new and scalable tool for the selection of DNA and protein evolutionary models. Mol. Biol. Evol. 37, 291–294 (2020).
45. A. Stamatakis, RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
46. S. Guindon, et al., New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprintthis version posted November 30, 2021. ; https://doi.org/10.1101/2021.11.29.470343doi: bioRxiv preprint
21
47. G. Yu, D. K. Smith, H. Zhu, Y. Guan, T. T. Lam, ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).
48. Z. Yang, PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
49. G. Petschenka, et al., Stepwise evolution of resistance to toxic cardenolides via genetic substitutions in the Na+/K+-ATPase of milkweed butterflies (Lepidoptera: Danaini). Evolution 67, 2753–2761 (2013).
50. H. H. Taussky, E. Shorr, A microcolorimetric method for the determination of inorganic phosphorus. J. Biol. Chem. 202, 675–685 (1953).
Figures and Tables
Figure 1. Na+,K+-ATPase structure and phylogenetic relationships of ATP1A paralogs among vertebrates. (A) Crystal structure of an Na+,K+-ATPase (NKA) with the representative CTS bufalin bound, shown in blue (PDB 4RES). The zoomed-in panel shows the H1-H2 extracellular loop, highlighting two amino acid positions (111 and 122, in red) that have been implicated repeatedly in CTS resistance. Key examples of convergence in amino acid substitutions at sites in the H1-H2 extracellular loop associated with CTS resistance are highlighted in Fig. 3. (B) Phylogenetic relationships among ATP1A paralogs of vertebrates and ATPα of insects.
[Figure 1 image not reproduced. Panel A labels: extracellular, membrane, intracellular, sites 111 and 122. Panel B tip labels: insect ATPα, ATP1A4, ATP1A3, ATP1A2, ATP1A1.]
Figure 2. Patterns of molecular evolution in the α(M1–M2) extracellular loop of ATP1A paralogs shared among tetrapods. (A) Maximum likelihood phylogeny of tetrapod ATP1A1, (B) ATP1A2, and (C) ATP1A3. The character states for eight sites relevant to CTS resistance in and near the H1-H2 loop of the NKA protein are shown at the tips. Yellow internal nodes indicate ancestral sequences reconstructed to infer derived amino acid states across clades to ease visualization; nodes reconstructed: MRCA of mammals, reptiles, and amphibians. Top right, each semi-circle indicates the site mapped in the main phylogeny with the inferred ancestral amino acid state for each of the three yellow nodes (posterior probability >0.8). In ATP1A1, site 119 was inferred as Q119 for amphibians and mammals, and N119 for reptiles (Table S6); in ATP1A2-3, site 119 was inferred as A119 for amphibians and reptiles, and S119 for mammals (Table S6). Site numbering corresponds to the pig (Sus scrofa) reference sequence. The higher number and greater variety of substitutions in ATP1A1 stand out in comparison to the other paralogs.
[Figure 2 image not reproduced. Panels: (A) ATP1A1, (B) ATP1A2, (C) ATP1A3. Tip annotations cover sites 108, 111, 112, 115, 116, 119, 120 and 122. The legend groups amino acids as acidic (D, E), basic (H, K, R), polar (S, N, Q, T) and hydrophobic (A, G, I, P, F, L, V, Y), with ancestral states marked separately.]
Figure 3. Parallel and divergent patterns of CTS-resistant substitutions across ATPα1 of insects and the shared ATP1A paralogs of tetrapods. (A) Examples of convergence in ATPα1 across insects. Convergence in the (B) ATP1A1, (C) ATP1A2, and (D) ATP1A3 paralogs, respectively, across tetrapods. Numbers indicate the number of independent substitutions in each major clade depicted. For ATP1A3, resistance-conferring amino acid substitutions have been identified at site 120, and not 122. A full list of amino acid substitutions can be found in Supplementary Dataset 2 for tetrapods, and Taverner et al. (23) for insects.
[Figure 3 image not reproduced. Panels: (A) ATPα1, (B) ATP1A1, (C) ATP1A2, (D) ATP1A3. Each panel marks the extracellular and intracellular sides and labels substitutions at sites 111, 120 and 122 (including Q111L, Q111T, Q111V, Q111E, Q111R, Q111H, N122H, N122D, N120K, N120R, G120R and G120N), with counts of independent substitutions per clade.]
Figure 4. Functional properties of wild-type and engineered ATP1A1. (A) Cladogram relating the surveyed species. GRA: Grass Frog (Leptodactylus); RAT: Rat (Rattus); CHI: Chinchilla (Chinchilla); OST: Ostrich (Struthio); SNG: Sandgrouse (Pterocles); MON: Monitor lizard (Varanus); TEG: Tegu lizard (Tupinambis); FER: False fer-de-lance (Xenodon); KEE: Red-necked keelback snake (Rhabdophis). Two-letter codes underneath each avatar indicate native amino acid states at sites 111 and 122, respectively. Data for grass frog from Mohammadi et al. (2021). (B) Levels of CTS resistance (IC50) among wild-type enzymes. The x-axis distinguishes among ATP1A1 with 0, 1 or 2 derived states at sites 111 and 122. The subscripts S and R refer to the CTS-sensitive and CTS-resistant paralogs, respectively. (C) Effects of changing the number of substitutions at 111 or 122 on CTS resistance (IC50). Substitutions result in predictable changes to resistance except in the reversal R111Q in Sandgrouse (SNG). GRA(S) represents Q111R+N122D on the sensitive paralog background. (D) Effects of single substitutions on Na+,K+-ATPase (NKA) activity. Each modified ATP1A1 is compared to the wild-type enzyme for that species. The inset shows the distribution of t-test p-values for all 15 substitutions, with the dotted line indicating the expectation. (E) Evidence for epistasis for CTS resistance (IC50, upper panel) and lack of such effects for enzyme activity (lower panel). Each line compares the same substitution (or the reverse substitution) tested on at least two backgrounds. Thicker lines correspond to substitutions with significant sequence-context dependent effects (Bonferroni-corrected ANOVA p-values < 0.05, Table S5).
[Figure 4 image not reproduced. Panels A to E. Recovered axis labels: (B, C) log10(IC50) against the number of substitutions at sites 111+122, with points keyed by two-letter state (QN, EN, RN, TN, QH, HH, QD, ED, RD); (D) % change in activity by substitution, with an inset of p-value frequency; (E) log10(IC50) and activity for wt versus mutant on each background, keyed by mutation (H111E, H111T, H122D, N122D, Q111R).]
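The x-axis grouping used in Figure 4B and 4C (0, 1 or 2 derived states at sites 111 and 122) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `count_derived` is hypothetical, and the ancestral reference "QN" (Q111, N122) is an assumption taken from the figure labels rather than a value stated in the caption.

```python
# Assumed ancestral states at sites 111 and 122 (Q111, N122); this reference
# is an assumption based on the figure labels, not a stated parameter.
ANCESTRAL = "QN"


def count_derived(code, ancestral=ANCESTRAL):
    """Count how many positions in a two-letter code differ from the ancestral code.

    `code` holds the amino acid states at sites 111 and 122, in that order.
    """
    if len(code) != len(ancestral):
        raise ValueError("code and ancestral reference must have the same length")
    return sum(1 for observed, reference in zip(code, ancestral) if observed != reference)


# Two-letter codes that appear in the Figure 4 legends.
groups = {code: count_derived(code) for code in ["QN", "EN", "RN", "TN", "QH", "HH", "RD"]}
```

Under the assumed reference, "QN" falls in group 0, single-site codes such as "RN" or "QH" in group 1, and "HH" or "RD" in group 2.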
Figure 5. No relationship between the effect of substitution to a given amino acid state on activity and the extent of divergence between ATP1A1 orthologs. Each point represents a comparison between the effect (% change in activity relative to the wild-type enzyme) of a given amino acid state (e.g., 122D) on two different genetic backgrounds. For example, the effect of 122D between chinchilla and false fer-de-lance is measured as % change [chinchilla vs. chinchilla+N122D] minus the % change [false fer-de-lance vs. false fer-de-lance+H122D]. Comparisons were measured as the difference between the two effects. In total, 11 comparisons were possible. The x-axis represents the number of amino acid differences between the two ATP1A1 proteins being compared. If intramolecular epistasis for protein function were prevalent, a positive correlation would be predicted. However, no such relationship is observed (Spearman's correlation, rS = -0.42, p = 0.19).
[Figure 5 image not reproduced. x-axis: number of pairwise amino acid differences between ATP1A1 orthologs. Recovered y-axis labels: "Difference in effect on two backgrounds (%)" and "Magnitude of change (LOG |% difference of mean protein activities|)". Points are keyed by amino acid state: 111E, 111H, 111R, 111T, 122D, 122H.]
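The comparison described in the Figure 5 caption (the difference between the % change caused by the same amino acid state on two backgrounds, then a rank correlation against sequence divergence) can be sketched as follows. This is a minimal sketch, not the authors' analysis code: all function names and the example numbers are hypothetical, Spearman's coefficient is computed from average ranks with the standard library only, and no p-value is calculated.

```python
from itertools import combinations


def percent_change(wild_type, mutant):
    """Percent change in activity of a mutant relative to its wild-type enzyme."""
    return 100.0 * (mutant - wild_type) / wild_type


def pairwise_effect_differences(effects):
    """Difference in effect for every pair of backgrounds.

    `effects` maps a background label to the % change caused by substituting
    to the same amino acid state (e.g. 122D) on that background.
    """
    return {(a, b): effects[a] - effects[b]
            for a, b in combinations(sorted(effects), 2)}


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical activities for illustration only (not data from the study).
effects_122d = {"CHI": percent_change(10.0, 12.0), "FER": percent_change(8.0, 6.0)}
differences = pairwise_effect_differences(effects_122d)
```

In a full analysis, each difference would be paired with the number of amino acid differences between the two orthologs, and `spearman_rho` would be applied to the two resulting lists.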
Figure 6. Rate of convergence across ATP1A sequences as a function of increasing sequence divergence. (A) Change in the rate of convergence (protein-wide) over time for the ATP1A protein family. The proportion of convergent (C) over divergent (D) substitutions along the entire protein sequence was estimated for all pairs of branches in the ATP1A phylogeny, except for sister branches or ancestor-descendant pairs. Color scale shows the density of dots for both axes. The distance between branches corresponds to the expected number of amino acid substitutions per site between protein pairs being compared (under the JTT+G4+F model). The red line shows a running average with a window size of 0.05 substitutions/site. Dashed lines show the 95% confidence interval based on 100 bootstrap replicates per window. (B) For each derived amino acid state at sites 111 and 122, the histograms show the distribution of pairwise convergent events along the sequence divergence axis (expected number of substitutions per site). Substitutions are color coded as in Figure 2. The histogram at the bottom shows the combined distribution of pairwise convergent events for both sites.
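The running average and bootstrap interval described for Figure 6A can be illustrated with a short Python sketch. It assumes a window centered on a chosen distance and a simple percentile bootstrap; the caption does not specify how windows were placed or how the interval was derived, so the function name, the centering, the percentile indexing and the fixed seed are all assumptions made here for illustration.

```python
import random


def window_mean_with_ci(distances, ratios, center, width=0.05, n_boot=100, seed=0):
    """Mean C/D ratio in one window, with an approximate 95% percentile bootstrap CI.

    distances: branch-pair distances (expected substitutions per site)
    ratios:    C/D ratio for each branch pair, in the same order
    center:    midpoint of the window (an assumed windowing scheme)
    Returns (mean, lower, upper), or None if the window holds no points.
    """
    half = width / 2.0
    values = [r for d, r in zip(distances, ratios) if center - half <= d <= center + half]
    if not values:
        return None
    mean = sum(values) / len(values)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Resample the window with replacement and record each replicate's mean.
    boot_means = sorted(
        sum(rng.choices(values, k=len(values))) / len(values) for _ in range(n_boot)
    )
    lower = boot_means[int(0.025 * (n_boot - 1))]
    upper = boot_means[int(0.975 * (n_boot - 1))]
    return mean, lower, upper


def running_average(distances, ratios, centers, width=0.05, n_boot=100):
    """Apply the windowed estimate at each center; empty windows are skipped."""
    results = {}
    for c in centers:
        estimate = window_mean_with_ci(distances, ratios, c, width, n_boot)
        if estimate is not None:
            results[c] = estimate
    return results
```

Calling `running_average` over a grid of centers yields the points for the averaged line and its dashed bounds; sliding the grid in small steps gives a smoother curve at the cost of overlapping windows.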
... Most vertebrates possess three paralogs of the NKA subunit α gene (ATP1A1-3) that have tissue-specific expression profiles and are associated with distinct physiological roles. Most amino acid variation among species and paralogs is concentrated in the first extracellular loop (residues 111-122; H1-H2 loop), which makes up part of the CTS binding domain and shows clade- and paralog-specific patterns of variability but also shows remarkable patterns of convergence, parallelism and divergence [125]. Amino acid substitutions at sites 111 and 122 in particular have been found to be key in the evolution of TSI in insect and vertebrate species [21] and have evolved in snakes [63,126], frogs [127,128] and other vertebrates [125]. ...
... Understanding the evolutionary history and potential for coevolution of a trait requires some knowledge of the patterns of variation among individuals, populations and species [132]. Where functional tests of TSI substitutions have been performed, there can be greater than 10-fold variation in TSI among enzymes that have identical paired states at 111 and 122 [125], as well as significant variation in enzyme activity, which together suggest that substitutions at other sites also contribute to CTS resistance through intramolecular epistasis and can be subject to selection [31,133]. Enzyme function, however, is but a proxy for predicting effects on organismal fitness, and research exploring how the effects of adaptive mutations at the protein level cascade to the whole-organism fitness, and how they match the defences of prey in different populations and locations will be necessary to understand the potential for coevolution. ...
Predator–prey interactions have long served as models for the investigation of adaptation and fitness in natural environments. Anti-predator defences such as mimicry and camouflage provide some of the best examples of evolution. Predators, in turn, have evolved sensory systems, cognitive abilities and physiological resistance to prey defences. In contrast to prey defences which have been reviewed extensively, the evolution of predator counter-strategies has received less attention. To gain a comprehensive view of how prey defences can influence the evolution of predator counter-strategies, it is essential to investigate how and when selection can operate. In this review we evaluate how predators overcome prey defences during (i) encounter, (ii) detection, (iii) identification, (iv) approach, (v) subjugation, and (vi) consumption. We focus on prey that are protected by cardiotonic steroids (CTS)—defensive compounds that are found in a wide range of taxa, and that have a specific physiological target. In this system, coevolution is well characterized between specialist insect herbivores and their host plants but evidence for coevolution between CTS-defended prey and their predators has received less attention. Using the predation sequence framework, we organize 574 studies reporting predators overcoming CTS defences, integrate these counter-strategies across biological levels of organization, and discuss the costs and benefits of attacking CTS-defended prey. We show that distinct lineages of predators have evolved dissecting behaviour, changes in perception of risk and of taste perception, and target-site insensitivity. We draw attention to biochemical, hormonal and microbiological strategies that have yet to be investigated as predator counter-adaptations to CTS defences. We show that the predation sequence framework will be useful for organizing future studies of chemically mediated systems and coevolution.
Genetic interactions such as epistasis are widespread in nature and can shape evolutionary dynamics. Epistasis occurs due to nonlinearity in biological systems, which can arise via cellular processes that convert genotype to phenotype and via selective processes that connect phenotype to fitness. Few studies in nature have connected genotype to phenotype to fitness for multiple potentially interacting genetic variants. Thus, the causes of epistasis in the wild remain poorly understood. Here, we show that epistasis for fitness is an emergent and predictable property of nonlinear selective processes. We do so by measuring the genetic basis of cryptic colouration and survival in a field experiment with stick insects. We find that colouration shows a largely additive genetic basis but with some effects of epistasis that enhance differentiation between colour morphs. In terms of fitness, different combinations of loci affecting colouration confer high survival in one host-plant treatment. Specifically, nonlinear correlational selection for specific combinations of colour traits in this treatment drives the emergence of pairwise and higher-order epistasis for fitness at loci underlying colour. In turn, this results in a rugged fitness landscape for genotypes. In contrast, fitness epistasis was dampened in another treatment, where selection was weaker. Patterns of epistasis that are shaped by ecologically based selection could be common and central to understanding fitness landscapes, the dynamics of evolution and potentially other complex systems.
Identifying the genetic mechanisms of adaptation requires the elucidation of links between the evolution of DNA sequence, phenotype, and fitness¹. Convergent evolution can be used as a guide to identify candidate mutations that underlie adaptive traits2,3,4, and new genome editing technology is facilitating functional validation of these mutations in whole organisms1,5. We combined these approaches to study a classic case of convergence in insects from six orders, including the monarch butterfly (Danaus plexippus), that have independently evolved to colonize plants that produce cardiac glycoside toxins6,7,8,9,10,11. Many of these insects evolved parallel amino acid substitutions in the α-subunit (ATPα) of the sodium pump (Na⁺/K⁺-ATPase)7,8,9,10,11, the physiological target of cardiac glycosides¹². Here we describe mutational paths involving three repeatedly changing amino acid sites (111, 119 and 122) in ATPα that are associated with cardiac glycoside specialization13,14. We then performed CRISPR–Cas9 base editing on the native Atpα gene in Drosophila melanogaster flies and retraced the mutational path taken across the monarch lineage11,15. We show in vivo, in vitro and in silico that the path conferred resistance and target-site insensitivity to cardiac glycosides¹⁶, culminating in triple mutant ‘monarch flies’ that were as insensitive to cardiac glycosides as monarch butterflies. ‘Monarch flies’ retained small amounts of cardiac glycosides through metamorphosis, a trait that has been optimized in monarch butterflies to deter predators17,18,19. The order in which the substitutions evolved was explained by amelioration of antagonistic pleiotropy through epistasis13,14,20,21,22. Our study illuminates how the monarch butterfly evolved resistance to a class of plant toxins, eventually becoming unpalatable, and changing the nature of species interactions within ecological communities2,6,7,8,9,10,11,15,17,18,19.
Predicting how species will respond to selection pressures requires understanding the factors that constrain their evolution. We use genome engineering of Drosophila to investigate constraints on the repeated evolution of unrelated herbivorous insects to toxic cardiac glycosides, which primarily occurs via a small subset of possible functionally-relevant substitutions to Na+,K+-ATPase. Surprisingly, we find that frequently observed adaptive substitutions at two sites, 111 and 122, are lethal when homozygous and adult heterozygotes exhibit dominant neural dysfunction. We identify a phylogenetically correlated substitution, A119S, that partially ameliorates the deleterious effects of substitutions at 111 and 122. Despite contributing little to cardiac glycoside-insensitivity in vitro, A119S, like substitutions at 111 and 122, substantially increases adult survivorship upon cardiac glycoside exposure. Our results demonstrate the importance of epistasis in constraining adaptive paths. Moreover, by revealing distinct effects of substitutions in vitro and in vivo, our results underscore the importance of evaluating the fitness of adaptive substitutions and their interactions in whole organisms.
ModelTest-NG is a re-implementation from scratch of jModelTest and ProtTest, two popular tools for selecting the best-fit nucleotide and amino acid substitution models, respectively. ModelTest-NG is one to two orders of magnitude faster than jModelTest and ProtTest but equally accurate, and introduces several new features, such as ascertainment bias correction, mixture, and free-rate models, or the automatic processing of single partitions. ModelTest-NG is available under a GNU GPL3 license at https://github.com/ddarriba/modeltest.
Evolution of insensitivity to the toxic effects of cardiac glycosides has become a model in the study of convergent evolution, as five taxonomic orders of insects use the same few similar amino acid substitutions in the otherwise highly conserved Na,K-ATPase α. We show here that insensitivity in pyrgomorphid grasshoppers evolved along a slightly divergent path. As in other lineages, duplication of the Na,K-ATPase α gene paved the way for subfunctionalization: one copy maintains the ancestral, sensitive state, while the other copy is resistant. Nonetheless, in contrast with all other investigated insects, the grasshoppers' resistant copy shows length variation by two amino acids in the first extracellular loop, the main part of the cardiac glycoside-binding pocket. RT-qPCR analyses confirmed that this copy is predominantly expressed in tissues exposed to the toxins, while the ancestral copy predominates in the nervous tissue. Functional tests with genetically engineered Drosophila Na,K-ATPases bearing the first extracellular loop of the pyrgomorphid genes showed the derived form to be highly resistant, while the ancestral state is sensitive. Thus, we report convergence in gene duplication and in the gene targets for toxin insensitivity; however, the means to the phenotypic end have been novel in pyrgomorphid grasshoppers.
The repeated evolutionary specialization of distantly related insects to cardenolide-containing host plants provides a stunning example of parallel adaptation. Hundreds of herbivorous insect species have independently evolved insensitivity to cardenolides, which are potent inhibitors of the alpha-subunit of Na⁺,K⁺-ATPase (ATPα). Previous studies investigating ATPα-mediated cardenolide insensitivity in five insect orders have revealed remarkably high levels of parallelism in the evolution of this trait, including the frequent occurrence of parallel amino acid substitutions at two sites and recurrent episodes of duplication followed by neo-functionalization. Here we add data for a sixth insect order, Orthoptera, which includes an ancient group of highly aposematic cardenolide-sequestering grasshoppers in the family Pyrgomorphidae. We find that Orthopterans exhibit largely predictable patterns of evolution of insensitivity established by sampling other insect orders. Taken together the data lend further support to the proposal that negative pleiotropic constraints are a key determinant in the evolution of cardenolide insensitivity in insects. Furthermore, analysis of our expanded taxonomic survey implicates positive selection acting on site 111 of cardenolide-sequestering species with a single-copy of ATPα, and sites 115, 118 and 122 in lineages with neo-functionalized duplicate copies, all of which are sites of frequent parallel amino acid substitution. This article is part of the theme issue ‘Convergent evolution in the genomics era: new insights and directions’.
Characterizing the fitness landscape, a representation of fitness for a large set of genotypes, is key to understanding how genetic information is interpreted to create functional organisms. Here we determined the evolutionarily-relevant segment of the fitness landscape of His3, a gene coding for an enzyme in the histidine synthesis pathway, focusing on combinations of amino acid states found at orthologous sites of extant species. Just 15% of amino acids found in yeast His3 orthologues were always neutral while the impact on fitness of the remaining 85% depended on the genetic background. Furthermore, at 67% of sites, amino acid replacements were under sign epistasis, having both strongly positive and negative effect in different genetic backgrounds. 46% of sites were under reciprocal sign epistasis. The fitness impact of amino acid replacements was influenced by only a few genetic backgrounds but involved interaction of multiple sites, shaping a rugged fitness landscape in which many of the shortest paths between highly fit genotypes are inaccessible.
Objective: We tested the assumption that closely related genes should have similar pathogenic variants by analyzing >200 pathogenic variants in a gene family with high neurologic impact and high sequence identity, the Na,K-ATPases ATP1A1, ATP1A2, and ATP1A3. Methods: Data sets of disease-associated variants were compared. Their equivalent positions in protein crystal structures were used for insights into pathogenicity and correlated with the phenotype and conservation of homology. Results: Relatively few mutations affected the corresponding amino acids in 2 genes. In the membrane domain of ATP1A3 (primarily expressed in neurons), variants producing milder neurologic phenotypes had different structural positions than variants producing severe phenotypes. In ATP1A2 (primarily expressed in astrocytes), membrane domain variants characteristic of severe phenotypes in ATP1A3 were absent from patient data. The known variants in ATP1A1 fell into 2 distinct groups. Sequence conservation was an imperfect indicator: it varied among structural domains, and some variants with demonstrated pathogenicity were in low conservation sites. Conclusions: Pathogenic variants varied between genes despite high sequence identity, and there is a genotype-structure-phenotype relationship in ATP1A3 that correlates with neurologic outcomes. The absence of "severe" pathogenic variants in ATP1A2 patients predicts that they will manifest either in a different tissue or by death in utero and that new ATP1A1 variants will produce additional phenotypes. It is important that some variants in poorly conserved amino acids are nonetheless pathogenic and could be incorrectly predicted to be benign.
The community of plant-feeding insects (herbivores) that specialize on milkweeds (Apocynaceae) form a remarkable example of convergent evolution across levels of biological organization¹. In response to toxic cardiac glycosides produced by these plants, the monarch butterfly (Danaus plexippus) and other specialist herbivores have evolved parallel substitutions in the alpha subunit (ATPA) of the Na⁺/K⁺-ATPase. These substitutions render the pump insensitive to cardiac glycosides²,³, allowing the monarch and other specialists, from aphids to beetles, to sequester cardiac glycosides, which in turn provide defense against attacks by enemies from the third trophic level⁴. The evolution of ‘target-site-insensitivity’ substitutions in these herbivores poses a fundamental biological question: have predators and parasitoids that feed on cardiac-glycoside-sequestering insects also evolved Na⁺/K⁺-ATPases that are similarly insensitive to cardiac glycosides (as predicted by Whiteman and Mooney)⁵? In other words, can plant toxins cause evolutionary cascades that reach the third trophic level? Here we show that at least four enemies of the monarch and other milkweed herbivores have indeed evolved amino-acid substitutions associated with target-site insensitivity to cardiac glycosides. These attackers represent four major animal clades, implicating cardiac glycosides as keystone molecules⁶ and establishing ATPalpha, which encodes ATPA, as a keystone gene with effects that reverberate within ecological communities⁷.
Although gene duplication is an important source of evolutionary innovation, the functional divergence of duplicates can be opposed by ongoing gene conversion between them. Here, we report on the evolution of a tandem duplication of Na+,K+-ATPase subunit α1 (ATP1A1) shared by frogs in the genus Leptodactylus, a group of species that feeds on toxic toads. One ATP1A1 paralog evolved resistance to toad toxins although the other retained ancestral susceptibility. Within species, frequent non-allelic gene conversion homogenized most of the sequence between the two copies but was counteracted by strong selection on 12 amino acid substitutions that distinguish the two paralogs. Protein-engineering experiments show that two of these substitutions substantially increase toxin resistance, whereas the additional 10 mitigate their deleterious effects on ATPase activity. Our results reveal how examination of neo-functionalized gene duplicate evolution can help pinpoint key functional substitutions and interactions with the genetic backgrounds on which they arise.