Content uploaded by Susanne Dobler
Author content
All content in this area was uploaded by Susanne Dobler on Dec 09, 2021
Content may be subject to copyright.
Main Manuscript for

Epistasis is not a strong constraint on the recurrent evolution of toxin-resistant Na+,K+-ATPases among tetrapods.

Shabnam Mohammadi1,2,*, Lu Yang3,§,*, Santiago Herrera-Álvarez4,5,*, María del Pilar Rodríguez-Ordoñez4,#, Karen Zhang3, Jay F. Storz1, Susanne Dobler2, Andrew J. Crawford4 & Peter Andolfatto6

1School of Biological Sciences, University of Nebraska, Lincoln, NE, USA
2Molecular Evolutionary Biology, Institute of Zoology, Universität Hamburg, Hamburg, Germany
3Department of Ecology and Evolution, Princeton University, Princeton, NJ, USA
4Department of Biological Sciences, Universidad de los Andes, Bogotá, 111711, Colombia
5Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
6Department of Biological Sciences, Columbia University, New York, NY, USA

*Co-first authorship
§ Current address: Wellcome Sanger Institute, Cambridge, United Kingdom
# Current address: Université Paris-Saclay Evry, Evry, France

Email: andrew@dna.ac, pa2543@columbia.edu

SM: 0000-0003-3450-6424, LY: 0000-0002-2694-1189, SHA: 0000-0002-0793-7811, MPRO: 0000-0002-0856-1297, KZ: 0000-0003-4406-9977, JFS: 0000-0001-5448-7924, SD: 0000-0002-0635-7719, AJC: 0000-0003-3153-6898, PA: 0000-0003-3393-4574

Classification
Biological Sciences; Evolution

Keywords
Epistasis, protein evolution, cardiotonic steroids, toxin resistance, adaptation

Author Contributions
PA and AJC conceived of and oversaw the project; SM, JFS, SD, AJC and PA designed experiments; KZ, LY, MPRO, SHA, SM collected data; SM, SHA and PA performed evolutionary and statistical analyses; SM, SHA, and PA wrote the paper; all authors edited the manuscript.

This PDF file includes:
Main Text
Figures 1 to 6

bioRxiv preprint doi: https://doi.org/10.1101/2021.11.29.470343; this version posted November 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Abstract

Comparative genomic studies reveal a global decline in rates of convergent amino acid substitution as a function of evolutionary distance. This pattern has been attributed to epistatic constraints on protein evolution, the idea being that mutations tend to confer the same fitness effects on more similar genetic backgrounds, so convergent substitutions are more likely to occur in closely related species. However, this hypothesis lacks experimental validation. We tested this model in the context of the recurrent evolution of resistance to cardiotonic steroids (CTS) across diverse groups of tetrapods, which occurs via specific amino acid substitutions to the α-subunit family of Na+,K+-ATPases (ATP1A). After identifying a series of recurrent substitutions at two key sites of ATP1A1 predicted to confer CTS resistance, we performed protein engineering experiments to test the functional consequences of introducing these substitutions onto divergent species backgrounds. While we find that substitutions at these sites can have substantial background-dependent effects on CTS resistance, we also find no evidence for background-dependent effects on protein activity. We further show that the magnitude of a substitution’s effect on activity does not depend on the overall extent of ATP1A1 sequence divergence between species. More generally, a global analysis of substitution patterns across ATP1A orthologs and paralogs reveals that the probability of convergent substitution protein-wide is not predicted by sequence divergence. Together, these findings suggest that intramolecular epistasis is not an important constraint on the evolution of ATP1A CTS resistance in tetrapods.

Significance Statement

Individual amino acid residues within a protein work in concert to produce a functionally coherent structure that must be maintained even as orthologous proteins in different species diverge over time. Given this dependence, we expect identical mutations to have more similar effects on protein function in more closely related species. We tested this hypothesis by performing protein-engineering experiments on ATP1A, an enzyme mediating target-site insensitivity to cardiotonic steroids (CTS) in diverse animals. These experiments reveal that although the phenotypic effects of substitutions can sometimes be background-dependent, the magnitude of these effects does not correlate with ATP1A1 sequence divergence. This implies that the genetic background across the ATP1A protein does not strongly limit the evolution of CTS resistance in animals.

Main Text

Introduction

Patterns of molecular parallelism and convergence represent a useful paradigm to examine the factors that limit the rate of adaptation and the extent to which adaptive evolutionary paths are predictable (1, 2). In the context of protein evolution, patterns of parallelism and convergence are influenced by pleiotropy (the effect of a given mutation on multiple phenotypes) and intramolecular epistasis (nonadditive interactions between mutant sites in the same protein) (3–11). If the phenotypic and fitness effects of mutations depend on the genetic background on which they arise (i.e., epistasis), a given mutation is expected to have more similar effects in orthologs from closely related species. Therefore, the probability of parallel or convergent substitution resulting in sequence divergence between species is expected to decrease with divergence time. Consistent with this expectation, there is evidence for such a decline in broad-scale phylogenetic comparisons of mitochondrial (12) and nuclear (13, 14) proteins. However, this hypothesis has not been tested experimentally to date.

To address the question of how changes in the genetic background alter the phenotypic effects of new mutations, we focus on the test case of the repeated evolution of resistance to cardiotonic steroids (CTS) in animals. CTS are potent inhibitors of Na+,K+-ATPase (NKA), a protein that plays a critical role in maintaining membrane potential and is consequently vital for the maintenance of many physiological processes and signaling pathways in animals (15). NKA (Fig. 1A) is a heterodimeric transmembrane protein that consists of a catalytic α-subunit (ATP1A) and a glycoprotein β-subunit (ATP1B) (16). CTS inhibit NKA function by binding to a highly conserved domain of ATP1A and blocking the exchange of Na+ and K+ ions (15). NKA is thus often the target of parallel evolution of CTS resistance in insect herbivores that feed on toxic plants (17, 18) as well as vertebrate predators that feed on toxic prey (19–22). Functional investigations of CTS resistance-conferring substitutions in Drosophila (23, 24) and Neotropical grass frogs (25) revealed associated negative pleiotropic effects on protein function and showed that substitutions elsewhere in the protein mitigate these effects. However, despite these examples, the generality of these patterns, and specifically the predicted dependence on evolutionary distance, remain poorly understood given the limited availability of comparative functional data.

Broad phylogenetic comparisons in vertebrates have focused primarily on the H1-H2 extracellular loop of ATP1A proteins, a subset of the CTS-binding domain that contains two sites (111 and 122) known to underlie CTS resistance in rats and toad-eating frogs (25, 26). Most vertebrates possess three paralogs of the α-subunit gene (ATP1A) that have different tissue-specific expression profiles and are associated with distinct physiological roles (Fig. 1B) (15, 27). Mammals possess a fourth paralog that is expressed predominantly in testes (28). A major limitation of studies to date is that the H1-H2 extracellular loop has been inconsistently surveyed among vertebrate taxa, with previous studies focusing on ATP1A3 in reptiles (20, 21, 29, 30), ATP1A1 and/or ATP1A2 in birds and mammals (30, 31), and either ATP1A1 or ATP1A3 in amphibians (19, 30). We therefore lack a comprehensive survey of amino acid variation in the ATP1A protein family across vertebrates.

To bridge this gap, we first surveyed variation in near full-length coding sequences of the three NKA α-subunit paralogs (ATP1A1, ATP1A2, ATP1A3) that are shared across major extant tetrapod groups (mammals, birds, non-avian reptiles, and amphibians), and identified substitutions that occur repeatedly among divergent lineages. Focusing on two key sites implicated in CTS resistance across animals (111 and 122), we tested whether substitutions at these sites have increasingly distinct phenotypic effects on more divergent genetic backgrounds. Specifically, we engineered several common substitutions at sites 111 and 122 of ATP1A1 that differ between species to reveal potential ‘cryptic’ epistasis (8, 32). By quantifying the level of CTS resistance conferred by these substitutions, as well as their effects on enzyme function, we evaluate the extent to which pleiotropy and epistasis have constrained the evolution of CTS-resistant forms of ATP1A1 across tetrapods.

Results

Patterns of ATP1A sequence evolution across species and paralogs.

To obtain a more comprehensive portrait of ATP1A amino acid variation among tetrapods, we created multiple sequence alignments for near full-length ATP1A proteins for the three ATP1A paralogs shared among vertebrates. In addition to publicly available data, we generated new RNA-seq data for 27 non-avian reptiles (PRJNA754197) (Table S1-S2). We then de novo assembled full-length transcripts of all ATP1A paralogs using these and RNA-seq data from 18 anuran species (25) (PRJNA627222) to achieve better representation for these groups. In total, this dataset comprises 429 species for ATP1A1, 197 species for ATP1A2 and 204 species for ATP1A3 (831 sequences total; Supplemental Dataset 1, Fig. S1).

Our survey reveals numerous substitutions at sites implicated in CTS resistance of NKA (Fig. 2; Supplementary Dataset 2; for comparison to insects, see Supplemental file 1 of ref. (23)). As anticipated from studies of full-length sequences in insects (17, 18, 23), most amino acid variation among species and paralogs is concentrated in the H1-H2 extracellular loop (residues 111-122; Fig. 1A). Despite harboring just 28% of 43 sites previously implicated in CTS resistance (33), the H1-H2 extracellular loop contains 81.4% of all substitutions identified among the three ATP1A paralogs (Fig. S2).

Our survey reveals several clade- and paralog-specific patterns. Notably, ATP1A1 exhibits more variation among species at sites implicated in CTS resistance than the other two paralogs (Fig. 2). Most of the variation in ATP1A2 at these sites is restricted to squamate reptiles, and ATP1A3 lacks substitutions at site 122 altogether, despite the well-known potential for substitutions at this site to confer CTS resistance (25, 26). Looking across species and paralogs, the extent of parallelism at sites 111 and 122 is remarkable (Figs. 2-3): for example, the substitutions Q111E, Q111T, Q111H, Q111L, and Q111V all occur in parallel in multiple species of both insects and vertebrates. N122H and N122D also frequently occur in parallel in both of these major clades. The frequent parallel transitions from CTS-sensitive states (i.e., Q111 and N122) to CTS-resistant states at these sites have been interpreted as evidence for the adaptive significance of these substitutions (17–20), but may also reflect mutation biases (34) and the nature of physicochemical constraints (13, 35).

In contrast, some parallelism is restricted to specific clades: for example, Q111R occurs in parallel across tetrapods but has not been observed in insects. Similarly, the combination Q111R+N122D has evolved three times independently in ATP1A1 of tetrapods but is not observed in insects. Conversely, insects have evolved Q111V+N122H independently four times, but this combination has never been observed in tetrapods. These patterns suggest that the fitness effects of some CTS-resistant substitutions depend on genetic background, with the result that CTS-resistance evolved via different mutational pathways in different lineages.

Beyond known CTS-resistant substitutions at sites 111 and 122, some taxa have evolved other paths to CTS resistance. For example, the Pacman frog (genus Ceratophrys) is known to prey on CTS-containing toads (36) and its ATP1A1 harbors a known CTS-resistant substitution at site 121 (D121N, Supplementary Dataset 2). This substitution is rare among vertebrates but has been previously reported in CTS-adapted milkweed bugs (17, 18). Similarly, the known CTS resistance substitution C104Y is observed among many natricid snakes (Supplementary Dataset 2) and CTS-adapted milkweed weevils (18). Chinchilla (Chinchilla lanigera) and yellow-throated sandgrouse (Pterocles gutturalis) show distinct single-amino acid insertions in the H1–H2 extracellular loop, a characteristic that has been previously associated with CTS resistance in pyrgomorphid grasshoppers (33, 37). Further, in lieu of variation at site 122, ATP1A3 of tetrapods harbors frequent parallel substitutions at site 120 (G120R). Interestingly, this site also shows substantial parallel substitution in the ATP1A1 paralog of birds (where N120K occurs eight times independently) but is mostly invariant in ATP1A1 of other tetrapods.

Context-dependent CTS resistance for substitutions at sites 111 and 122

The clade- and paralog-specific patterns of substitution among ATP1A paralogs outlined above suggest that the evolution of CTS resistance may be highly dependent on sequence context. However, the functional effects of the vast majority of these substitutions on the diverse genetic backgrounds in which they occur remain largely unknown (25, 26, 29). Given the diversity and broad phylogenetic distribution of parallel substitutions at sites 111 and 122, and the documented effects of some of these substitutions on CTS resistance, we experimentally tested the extent to which functional effects of substitutions at these sites are background-dependent.

We focused functional experiments on ATP1A1, because it is the most ubiquitously expressed paralog and exhibits both the most sequence diversity and the broadest phylogenetic distribution of parallel substitutions. Specifically, we considered ATP1A1 orthologs from nine representative tetrapod species that possess different combinations of wild-type amino acids at 111 and 122 (Fig. 4A). Our taxon sampling includes two lizards, two snakes, two birds, two mammals and previously published data for one amphibian (Fig. S4; Fig. S5; Table S3). The ancestral amino acid states of sites 111 and 122 in tetrapods are Q and N, respectively. We found that the sum of the number of derived states at positions 111 and 122 is a strong predictor of the level of CTS-resistance (Fig. 4B, IC50, Spearman’s rS=0.85, p=0.001). Nonetheless, we also found greater than 10-fold variation in CTS-resistance among enzymes that had identical paired states at 111 and 122 (e.g., compare chinchilla (CHI) versus red-necked keelback snakes (KEE) or compare rat (RAT) versus the resistant paralog of grass frogs (GRAR)). These differences suggest that substitutions at other sites also contribute to CTS resistance.

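The rank correlation reported above can be illustrated with a minimal sketch. It assumes only the standard definition of Spearman's coefficient (the Pearson correlation of rank-transformed values, with ties given their average rank); the function names and the input values are hypothetical and are not the measurements in Table S4.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# HYPOTHETICAL illustration values (not data from this study):
# number of derived states at sites 111 + 122, and log10(IC50) per enzyme.
derived_states = [0, 0, 1, 1, 1, 2, 2, 2, 2]
log10_ic50 = [-6.5, -6.1, -5.8, -5.0, -6.0, -4.2, -3.9, -4.8, -3.5]

rho = spearman_rho(derived_states, log10_ic50)
print(f"Spearman rho = {rho:.2f}")
```

Because the predictor takes only the values 0, 1 and 2, ties are unavoidable, which is why the sketch uses average ranks rather than the tie-free shortcut formula.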
To test for epistatic effects of common CTS-resistant substitutions at sites 111 and 122, we used site-directed mutagenesis to introduce 15 substitutions (nine at position 111 and six at position 122) in the wild-type ATP1A1 backgrounds of nine different species (Fig. S4). The specific substitutions chosen were phylogenetically broadly distributed parallel substitutions and/or divergent substitutions that distinguish closely related clades of species. We expressed each of these 24 ATP1A1 constructs (15 mutant and 9 wild-type) with an appropriate species-specific ATP1B1 protein (Table S3). For each recombinant NKA protein complex, we characterized its level of CTS resistance (IC50) and we estimated enzyme activity as the rate of ATP hydrolysis in the absence of CTS (Table S4).

Of the 12 substitutions for which IC50 could be measured, substitutions had a 15-fold effect on average (Fig. 4C, Table S4) and were equally likely to increase or decrease IC50. To assess the background-dependence of specific substitutions, we examined five cases in which a given substitution (e.g., E111H), or the reverse substitution (e.g., H111E), could be evaluated on two or more backgrounds. In the absence of intramolecular epistasis, the effect of a substitution in different backgrounds should remain unchanged and the magnitude of the effect of the reverse substitution should also be the same but with opposite sign. This analysis revealed substantial background dependence for IC50 in two of the five informative cases (Fig. 4E; Table S5). In one case, the N122D substitution results in a 200-fold larger increase in IC50 when added to the chinchilla (CHI) background compared to the grass frog (GRA) background (p=1.2e-3 by ANOVA). In the other case, the E111H substitution and the reverse substitution (H111E) produced effects in the same direction (reducing CTS-resistance) when added to different backgrounds (false fer-de-lance (FER) and red-necked keelback (KEE) snakes, respectively, p=1e-7 by ANOVA). Overall, these results suggest that the effect of a given substitution on IC50 can be strongly dependent on the background on which it occurs. The remaining three substitutions (H111T, Q111R and H122D) showed no significant change in the magnitude of the effect on IC50 when introduced into different species’ backgrounds. These results suggest that, while some substitutions can have strong background-dependent effects, strong intramolecular epistasis with respect to CTS resistance is not universal.

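The no-epistasis expectation described above can be written down as a short sketch. It assumes, for illustration only, that effects are measured as log10 fold changes in IC50; the function names and IC50 values are hypothetical and do not reproduce the ANOVA used in the study.

```python
import math


def log_effect(ic50_wild_type, ic50_mutant):
    """Effect of a substitution on resistance, as a log10 fold change in IC50."""
    return math.log10(ic50_mutant / ic50_wild_type)


def epistasis_same_direction(effect_on_a, effect_on_b):
    """Same substitution on two backgrounds: additivity predicts a difference of 0."""
    return effect_on_a - effect_on_b


def epistasis_reverse(forward_effect, reverse_effect):
    """Forward vs. reverse substitution: additivity predicts the effects sum to 0."""
    return forward_effect + reverse_effect


# HYPOTHETICAL IC50 values in molar units (not data from this study).
forward_a = log_effect(1e-6, 1e-4)  # X->Y on background A: +2 log units
forward_b = log_effect(2e-6, 2e-4)  # X->Y on background B: +2 log units
reverse_c = log_effect(5e-4, 5e-6)  # Y->X on background C: -2 log units

print(epistasis_same_direction(forward_a, forward_b))  # near 0 under additivity
print(epistasis_reverse(forward_a, reverse_c))         # near 0 under additivity

# Forward and reverse substitutions that BOTH reduce resistance (as reported
# for E111H and H111E) give a sum far from 0, which signals epistasis.
print(epistasis_reverse(-1.0, -1.0))
```

In practice the deviations would be compared against measurement error across biological replicates before being called significant.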
Pleiotropic effects on NKA activity exhibit little evidence for background-dependence.

We next tested whether substitutions at sites 111 and 122 have pleiotropic effects on ATPase activity. Because ion transport across the membrane is a primary function of NKA and its disruption can have severe pathological effects (38), mutations that compromise this function are likely to be under strong purifying selection. As suggested by previous work (23–25), CTS-resistant substitutions at sites 111 and 122 can decrease enzyme activity. We evaluated the generality of these effects by comparing enzyme activity of the 15 mutant NKA proteins to their corresponding wild-type proteins.

Interestingly, the wild-type enzymes themselves exhibit substantial variation in activity, from 3 to 18 nmol/mg*min (P = 6e-7 by ANOVA, Fig. 4D; Table S4). On average, substitutions at sites 111 and 122 changed enzyme activity by 60% (Fig. 4D; Fig. S4). In two cases, amino acid substitutions at position 122 (N122H and H122D) nearly inactivate lizard NKAs and, in one case, a substitution at position 111 (Q111T) resulted in low expression of the recombinant protein in the transfected cells (Fig. S5; Fig. S6). A test of uniformity of pairwise t-test p-values across substitutions suggests a significant enrichment of low p-values (Fig. 4D inset; p=2.5e-4, chi-squared test of uniformity). Thus, globally, this set of substitutions has significant effects on NKA activity, but they were not significantly more likely to decrease than increase activity (10 decrease: 5 increase, p>0.3, binomial test, Fig. 4D, Table S5).

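The binomial comparison of 10 decreases against 5 increases can be checked with a minimal sketch of an exact two-sided binomial test. The sketch assumes a null probability of 0.5 and the common "sum of outcomes no more likely than the observed one" definition of the two-sided p-value; the manuscript does not state which variant was used, and the function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(successes, trials, p_null=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed outcome under the null hypothesis.
    """
    def pmf(k):
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    # Small relative tolerance so that outcomes tied with the observed
    # probability are included despite floating-point rounding.
    return sum(
        pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9)
    )


# 10 of 15 substitutions decreased activity; 5 increased it.
p_value = binomial_two_sided_p(10, 15)
print(f"two-sided p = {p_value:.3f}")
```

Under these assumptions the p-value is approximately 0.30, in line with the reported lack of a significant bias toward decreased activity.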
We next asked to what extent pleiotropic effects of CTS-resistant substitutions at positions 111 and 122 are dependent on genetic background. This question is motivated by recent studies in insects which revealed that deleterious pleiotropic effects of some resistance-conferring substitutions at sites 111 and 122 are background-dependent (23, 24). Likewise, recent work on ATP1A1 of toad-eating grass frogs showed that effects of Q111R and N122D on NKA activity are background-dependent (25). In contrast, among the five informative cases in which we compared the same substitution (or the reverse substitution) on two or more backgrounds, there is little evidence for background dependence (Fig. 4E; Table S5). For example, N122D has similar effects on NKA activity in grass frog and chinchilla despite the substantial divergence between the species’ proteins (8.4% protein sequence divergence; Fig. 4D). Similarly, the effects of Q111R in ostrich or the reverse substitution R111Q in sandgrouse were not significantly different from the effect of Q111R in grass frog (7.5% and 8% protein sequence divergence, respectively).

To further examine the evidence for background dependence, we tested whether changes to the same amino acid state (regardless of starting state) at 111 and 122 produce different changes in NKA activity (e.g., R111E on the rat background versus H111E on the false fer-de-lance background). If epistasis is important, we expect that the difference in effects of substitutions to a given amino acid state should increase with increasing sequence divergence compared to ATP1A1 backgrounds in which that state is wild-type. However, across the 11 possible comparisons, we found no relationship between the difference in the effect of substitutions to the same state and the extent of amino acid divergence between the orthologous proteins (Fig. 5). This pattern suggests that, while pleiotropic effects can be background dependent (23, 25), these effects are not pervasive across species and do not correlate with overall sequence divergence.

The overall rate of convergence across ATP1A proteins does not depend on sequence divergence.

If intramolecular epistasis is pervasive, we would predict that rates of convergent substitution should decrease as a function of overall sequence divergence (12–14). In contrast to this expectation, our experiments suggest that, for ATP1A1, the extent of background sequence divergence is a poor predictor of the magnitude of effects of substitutions at sites 111 and 122 on CTS resistance and enzyme activity. Since our experiments were necessarily limited in scope, we carried out a broad phylogenetic analysis to evaluate how well our findings align with global estimates of rates of convergence for the ATP1A family beyond ATP1A1 and beyond sites implicated in CTS resistance.

Using a multiple sequence alignment of 831 ATP1A protein sequences, including the three ATP1A paralogs shared among tetrapods (i.e., amphibians, non-avian reptiles, birds, and mammals), we inferred a maximum likelihood phylogeny of the gene family (Fig. S1). We then used ancestral sequence reconstruction to infer the history of substitution events on all branches in the tree and counted the number of convergent amino acid substitutions along the protein per site (see Materials and Methods). Convergent substitutions are defined as substitutions on two branches at the same site resulting in the same amino acid state. Interestingly, we do not detect a correlation between the relative number of convergent substitutions and background ATP1A divergence across the tree (Fig. 6A). This result also holds true when considering only substitutions to the key CTS resistance sites 111 and 122 (Fig. S5).

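The definition of a convergent substitution given above can be made concrete with a short sketch that pairs up branches reaching the same derived state at the same site. This is a simplified illustration, not the pipeline described in Materials and Methods: the event tuples, branch labels and function name are hypothetical, and the sketch does not check whether two branches lie on the same lineage.

```python
from collections import defaultdict
from itertools import combinations


def convergent_pairs(substitutions):
    """Return branch pairs that independently reached the same state at a site.

    `substitutions` is an iterable of (branch, site, ancestral, derived)
    tuples, e.g. as read off an ancestral sequence reconstruction.
    """
    branches_by_site_state = defaultdict(list)
    for branch, site, ancestral, derived in substitutions:
        if ancestral != derived:  # ignore non-substitutions
            branches_by_site_state[(site, derived)].append(branch)

    pairs = []
    for (site, derived), branches in sorted(branches_by_site_state.items()):
        # Every unordered pair of distinct branches counts as one event.
        for a, b in combinations(sorted(set(branches)), 2):
            pairs.append((site, derived, a, b))
    return pairs


# HYPOTHETICAL substitution events (not data from this study).
events = [
    ("branch_1", 111, "Q", "R"),
    ("branch_2", 111, "Q", "R"),
    ("branch_3", 111, "E", "R"),  # different starting state, same end state
    ("branch_4", 122, "N", "D"),
    ("branch_5", 122, "N", "H"),
]

pairs = convergent_pairs(events)
for site, state, a, b in pairs:
    print(f"site {site} -> {state}: {a} / {b}")
```

Each pair could then be annotated with the sequence divergence between the two branches' backgrounds to examine how convergent events are distributed along the divergence axis.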
To gain more insight into the factors that determine convergent evolution in ATP1A, we looked more closely at patterns of individual convergent substitutions at sites 111 and 122 by extracting each convergent substitution and visualizing its distribution along the sequence divergence axis (Fig. 6B). Under the expectation that rates of convergence should tend to decrease as a function of sequence divergence, the distribution of pairwise convergent events along the sequence divergence axis should be left-skewed, with a peak towards lower sequence divergence. In contrast to this expectation, the distribution is bimodal, with one peak at 0.33 and the other at 0.69 substitutions/site (Fig. 6B bottom panel). Parallel and convergent substitutions have occurred almost across the full range of protein divergence estimates. For example, if X is any starting state, the substitution X111R has occurred independently in 13 tetrapod lineages and X111L independently in 20 lineages. Both substitutions have a broad phylogenetic distribution, suggesting that their effects do not strongly depend on overall genetic background. Interestingly, however, the distributions for X111H and X111E substitutions are relatively right-skewed, in line with epistasis for CTS resistance that we observed in experiments for H111E/E111H (Fig. 4E). Overall, the results of these analyses align well with our functional experiments but run contrary to expectations based on previously reported proteome-wide evolutionary trends (12–14).

Discussion

Previous work has suggested that rates of convergent amino-acid substitution generally decline as a function of time, a pattern that can potentially be explained by epistatic constraints. According to this view, the higher the level of sequence divergence between a given pair of homologs, the higher the probability that the same mutation will have different fitness effects on the two backgrounds (12). In that respect, our broad survey of the ATP1A gene family in tetrapods, in combination with previous work, reveals two striking and seemingly contradictory patterns. The first is that some substitutions underlying CTS resistance in tetrapods are broadly distributed phylogenetically and even shared with insects (e.g., N122H is widespread among snakes and found in the monarch butterfly and other insects; see Fig. 3 for more examples). Patterns like these suggest that epistatic constraints have a limited role in the evolution of CTS resistance, as the same mutation can be favored on highly divergent genetic backgrounds. On the other hand, there is also substantial diversity in resistance-conferring states at sites 111 and 122, and some combinations of these substitutions appear to be phylogenetically restricted. For example, the CTS-resistant combination of Q111R+N122D has evolved multiple times in tetrapods but is absent in insects, whereas the CTS-resistant combination Q111V+N122H evolved multiple times in insects but is absent in tetrapods (Fig. 3). Additionally, some substitutions also appear to be paralog-specific in tetrapods (Fig. 3). These phylogenetic signatures suggest at least some role for epistasis as a source of contingency in the evolution of ATP1A-mediated CTS resistance in animals (i.e., the fitness effects of substitutions depend on the order in which they occur). How can these disparate patterns be reconciled? To what extent do genetic background and contingency limit the evolution of CTS resistance in animals?

In our survey of putative CTS-resistant substitutions at sites 111 and 122, we find that derived substitutions have largely predictable effects on CTS resistance, with notable exceptions that tend to be in magnitude rather than direction (Fig. 4C and 4E). While derived states at sites 111 and 122 are generally a reliable predictor of CTS resistance (Fig. 4A), they do not always predict the effect size of particular substitutions (e.g., Q111R contributes to CTS resistance on many species’ backgrounds, but not on that of sandgrouse, Fig. 4C). It is also notable that species with identical paired states at 111 and 122 can vary in CTS resistance by more than an order of magnitude. Both patterns point to background determinants of CTS resistance that may be additive rather than epistatic. Yet there are some broadly phylogenetically distributed substitutions, such as N122D, that nonetheless do exhibit background-dependent effects on CTS resistance (Fig. 4C and 4E).

While epistasis is likely to be a pervasive feature in protein evolution, many mutational effects on
349
structural and functional properties of proteins appear to be purely additive (e.g., (39–41). In line
350
with this, our experimental results revealed that the phenotypic effects of individual substitutions
351
on ATPase activity are likely to be additive in general. We also found no correlation between the
352
marginal effect of a substitution with background genetic divergence. Specifically, mutating to the
353
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The copyright holder for this preprintthis version posted November 30, 2021. ; https://doi.org/10.1101/2021.11.29.470343doi: bioRxiv preprint
11
same amino acid state (irrespective of the initial state) doesn’t result in larger effects in more distant
354
backgrounds. Under additivity, the rate of convergence is expected to be uncorrelated with
355
background genetic distance because the phenotypic effect of a mutation does not depend on the
356
amino acid states at other sites in the protein. Our phylogenetic and experimental results align with
357
this expectation.

While the extent to which changes in CTS resistance are favorable to an organism depends on
physiological constraints and the specific ecological context (e.g., in which tissues NKA is
expressed and the presence of dietary CTS), changes in enzyme activity associated with these
substitutions are most likely detrimental to organismal fitness. It follows that changes to the ATP1A1
background would be required to offset such changes in enzyme activity. Surprisingly, we found
that, with rare exceptions, CTS-resistant substitutions at sites 111 and 122 tend to exhibit little or
no pleiotropy with respect to enzyme activity. In addition, amino acid substitutions were not
significantly more likely to decrease rather than increase activity. Interestingly, the activity of
wildtype ATP1A1 enzymes varies 6-fold among the species surveyed (Fig. 4E), suggesting either that
most species are robust to changes in NKA activity or that changes have occurred in other
genes (including other ATP1A paralogs) that compensate for changes in activity. Thus, it may be
either that protein activity itself is not an important pleiotropic constraint on the evolution of ATP1A
CTS resistance or that constraint depends not just on the protein background, but also on the
background at higher levels (e.g., other interacting proteins). A further possibility is that detrimental
effects of CTS-resistant substitutions depend on few sites, and these sites are also highly
convergent (e.g., A119S among insect herbivores, see refs. 23 and 24).

We conclude that intramolecular epistasis in ATP1A, at the level of protein activity, is unlikely to
represent a substantial constraint in the evolution of CTS resistance. However, the lack of evidence
of epistasis at the level of protein function does not preclude an important role for epistasis at higher
levels. For example, our results are also consistent with a scenario of nonspecific (or global)
epistasis, where mutations have additive effects on molecular phenotypes (e.g., ATPase activity)
but have nonadditive effects on fitness due to a nonlinear relationship between phenotype and
fitness (7, 40, 42). Nonspecific epistasis predicts a many-to-one relationship with respect to genetic
backgrounds and specific mutations (7, 42), such that many genetic backgrounds can compensate
for the deleterious effects of a given mutation. Thus, nonspecific epistasis of this form could explain
why CTS-resistant substitutions at sites 111 and 122 exhibit broad phylogenetic distributions.
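
The distinction drawn above between additivity at the phenotypic level and nonadditivity at the fitness level can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the sigmoidal phenotype-to-fitness map, its midpoint and steepness, and the effect sizes are hypothetical and were not estimated in this study.

```python
import math

def fitness(phenotype_value, midpoint=1.0, steepness=4.0):
    """Hypothetical sigmoidal phenotype-to-fitness map (an assumption)."""
    return 1.0 / (1.0 + math.exp(-steepness * (phenotype_value - midpoint)))

def phenotype(background, effects):
    """Additive molecular phenotype: background value plus summed effects."""
    return background + sum(effects)

def fitness_epistasis(background, effect_a, effect_b):
    """Deviation of double-mutant fitness from the additive expectation."""
    w0 = fitness(phenotype(background, []))
    wa = fitness(phenotype(background, [effect_a]))
    wb = fitness(phenotype(background, [effect_b]))
    wab = fitness(phenotype(background, [effect_a, effect_b]))
    return wab - (wa + wb - w0)

# Epistasis at the phenotypic level is zero by construction ...
pheno_eps = (phenotype(0.5, [0.3, 0.4]) - phenotype(0.5, [0.3])
             - phenotype(0.5, [0.4]) + phenotype(0.5, []))

# ... yet the fitness effect of the same substitution differs by background.
effect_low = fitness(phenotype(0.2, [0.3])) - fitness(0.2)
effect_high = fitness(phenotype(2.0, [0.3])) - fitness(2.0)
```

Under these assumed settings, the same additive phenotypic effect translates into a larger fitness change on the low-phenotype background than on the high-phenotype one, which is the signature of nonspecific epistasis.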

Dependence on few sites, or the many-to-one nature of nonspecific epistasis, may also account
for the weak signature of decreasing convergence with increasing divergence for ATP1A. Our study
suggests that, while intramolecular epistasis may be pervasive across proteomes, it does not
always represent a substantial constraint on the evolution of adaptive traits, as we show here for
CTS resistance in tetrapods. Further evaluation of epistasis at higher levels than enzyme activity
(e.g., whole-organism neural function, CTS tolerance or viability, refs. 23, 24) may elucidate the
extent to which nonspecific epistasis constrains protein evolution in these cases.

Materials and Methods

Sample collection and data sources.
To carry out a comprehensive survey of vertebrate ATP1A paralogs, we collated a total of
831 protein sequences for this study (Supplementary Dataset 1). In addition to publicly available
data, we also generated RNA-seq data for 27 species of non-avian reptiles (Table S1;
PRJNA754197) to achieve a better representation of some previously underrepresented lineages.
These included field-caught and museum-archived specimens as well as animals purchased from
commercial pet vendors. Purchased animals were processed following the procedures specified in
the IACUC Protocol No. 2057-16 (Princeton University) and implemented by a research
veterinarian at Princeton University. Wild-caught animals were collected under Colombian umbrella
permit resolución No. 1177 granted by the Autoridad Nacional de Licencias Ambientales to the
Universidad de los Andes and handled according to protocols approved by the Institutional
Committee on the Care and Use of Laboratory Animals (abbreviated CICUAL in Spanish) of the
Universidad de los Andes. In all cases, fresh tissues (brain, stomach, and muscle) were taken,
preserved in RNAlater (Invitrogen), and stored at -80 °C until used.

Reconstruction of ATP1A paralogs.
RNA-seq libraries were prepared either using TruSeq RNA Library Prep Kit v2 (Illumina) and
sequenced on Illumina HiSeq2500 (Genomics Core Facility, Princeton, NJ, USA) or using NEBNext
Ultra RNA Library Preparation Kit (NEB) and sequenced on Illumina HiSeq4000 (Genewiz, South
Plainfield, NJ, USA) (Table S2). All raw RNA-seq data generated for this study have been deposited
in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under
bioproject PRJNA754197. Together with SRA datasets downloaded from public databases, reads
were trimmed to Phred quality ≥ 20 and length ≥ 20 and then assembled de novo using Trinity
v2.2.0 (43). Sequences of ATP1A paralogs 1, 2 and 3 were extracted with BLAST searches
(blast-v2.26), individually curated, and then aligned using ClustalW. Complete alignments of
ATP1A1/2/3 can be found in Supplementary Dataset 1.
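
The read-filtering thresholds above (Phred quality ≥ 20 and length ≥ 20) can be illustrated with a minimal Python sketch. The text does not name the trimming tool or algorithm, so the sketch assumes simple trimming of low-quality bases from the 3' end and Phred+33 quality encoding; the function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality_string]

def trim_read(sequence, quality_string, min_quality=20, min_length=20):
    """Trim low-quality bases from the 3' end; drop reads that end up too short.

    Returns (sequence, quality) after trimming, or None if the trimmed
    read is shorter than min_length. This is an assumed trimming rule,
    not necessarily the one used to produce the published assemblies.
    """
    scores = phred_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    if end < min_length:
        return None
    return sequence[:end], quality_string[:end]

# A 24-base read whose last three bases have quality 2 ('#').
read = "ACGT" * 6
kept = trim_read(read, "I" * 21 + "#" * 3)
dropped = trim_read(read, "I" * 10 + "#" * 14)
```

In this toy case the first read is shortened to 21 bases and retained, while the second falls below the length threshold and is discarded.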

Character state mapping and parameter estimation for the ATP1A1-3 paralogs.
Protein sequences from ATP1A1 (N=429), ATP1A2 (N=197) and ATP1A3 (N=205), including the main
tetrapod groups (amphibians, non-avian reptiles, birds, and mammals) and lungfish+coelacanth as
outgroups, were aligned using ClustalW with default parameters. The optimal parameters for
phylogenetic reconstruction were taken from the best-fit amino acid substitution model based on
the Akaike Information Criterion (AIC) as implemented in ModelTest-NG v.0.1.5 (44), which was
inferred to be JTT+G4+F. An initial phylogeny was inferred using RAxML HPC v.8 (45) under the
JTT+GAMMA model with empirical amino acid frequencies. Branch lengths and node support
(aLRS) were further refined using PhyML v.3.1 (46) with empirical amino acid frequencies and
maximum likelihood estimates of rate heterogeneity parameters, I and G. Phylogeny visualization
and mapping of character states for each paralog were done using the R package ggtree (47).
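
As a reminder of how AIC-based model selection works, the following Python sketch computes AIC = 2k - 2 ln L and picks the candidate with the lowest value. The log-likelihoods are invented purely for illustration and are not the values obtained for the ATP1A alignment; the parameter counts include only substitution-model parameters (one for the gamma shape, 19 for free empirical frequencies), a simplification that leaves AIC differences unchanged when all candidates share the same tree.

```python
def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_parameters - 2 * log_likelihood

def best_model(candidates):
    """Return the name of the candidate with the lowest AIC.

    candidates maps a model name to a (log-likelihood, parameter count) pair.
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))

# Hypothetical log-likelihoods, for illustration only.
candidates = {
    "JTT": (-10500.0, 0),
    "JTT+G4": (-10200.0, 1),
    "JTT+G4+F": (-10150.0, 20),
}
selected = best_model(candidates)
```

With these made-up numbers the richer model is preferred because its gain in likelihood outweighs the penalty for its additional parameters.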

Ancestral sequence reconstruction and convergence calculations.
Ancestral sequence reconstruction (ASR) was performed in PAML using codeml (48) under the
JTT+G4+F substitution model. Statistical confidence in each position’s reconstructed state for each
ancestor was determined from the posterior probability (PP), and only states with PP>0.8 were
considered. Ancestral sequences from all nodes in the ATP1A phylogeny were retrieved from the
codeml output, resulting in an alignment of 1,660 ATP1A proteins (831 extant species and 829
inferred ancestral sequences; Fig. S1). For each branch in the tree, we identified substitutions
by comparing the ancestral and derived amino acid states at each site, using only states
with PP>0.8. All branch pairs were compared, except sister branches and ancestor-descendent
pairs (12, 13). When comparing substitutions on two distinct branches at the same site,
substitutions to the same amino acid state were counted as convergences, while substitutions away
from a common amino acid were counted as divergences. An alignment of 1,040 amino acids was
used to calculate the number of molecular convergences and divergences, excluding a putative 30
amino acid-long alternatively spliced region (positions 834-864). Model-based estimates of sequence
divergence, number of convergences, number of divergences, and total number of substitutions
since the common ancestor were recorded for each pairwise comparison. We calculated the
proportion of observed convergent events per branch as (number of convergences +1) / (number
of divergences +1). The line describing the trend was calculated as a running average with window
size of 0.05 substitutions/site. We calculated 95% confidence intervals based on 100 bootstrap
replicates per window, resampling only variable sites.
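
The counting rule and the (convergences + 1) / (divergences + 1) ratio described above can be sketched in Python as follows. The data structure (one dictionary per branch, mapping a site to its ancestral and derived states) and the function names are hypothetical, and the classification reflects one reading of the text: a shared derived state counts as a convergence, and different derived states arising from a shared ancestral state count as a divergence.

```python
def count_events(branch_a, branch_b):
    """Count convergences and divergences between two branches.

    Each branch maps site -> (ancestral_state, derived_state) and is
    assumed to contain only substitutions that passed the PP > 0.8 filter.
    """
    convergences = divergences = 0
    for site in branch_a.keys() & branch_b.keys():
        anc_a, der_a = branch_a[site]
        anc_b, der_b = branch_b[site]
        if der_a == der_b:
            convergences += 1      # same derived amino acid state
        elif anc_a == anc_b:
            divergences += 1       # away from a common amino acid
    return convergences, divergences

def convergence_ratio(convergences, divergences):
    """Ratio with +1 added to each count, as defined in the text."""
    return (convergences + 1) / (divergences + 1)

# Toy branches, for illustration only.
branch_i = {111: ("Q", "R"), 122: ("N", "D"), 300: ("A", "S")}
branch_j = {111: ("Q", "R"), 122: ("N", "H"), 415: ("L", "M")}
events = count_events(branch_i, branch_j)
ratio = convergence_ratio(*events)
```

Adding one to both counts keeps the ratio defined for branch pairs with no divergences, which would otherwise require division by zero.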

For sites 111 and 122, molecular convergence was coded as “1” when the substitution along
branch i was to the same amino acid state as the substitution along branch j, and “0” if substitutions
were to different states. Model-based estimates of sequence divergence, amino acid state, and
convergence event were recorded for each pairwise comparison (when convergence was “0”,
amino acid state was set to “NA”). A logistic regression between molecular convergence (0 or 1)
and genetic distance was used to test for the correlation between variables (Fig. 6B; Fig. S3).
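
A logistic regression of a binary convergence indicator on genetic distance can be sketched in plain Python as below. This is a minimal Newton-Raphson fit written for illustration, not the R code used for the analyses, and the toy data are invented; the function name is hypothetical.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented data: genetic distance (substitutions/site) and convergence (0/1).
distance = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
converged = [1, 1, 0, 1, 0, 1, 0, 0]
intercept, slope = fit_logistic(distance, converged)
```

A negative slope would indicate that convergence becomes less likely as genetic distance increases; a slope near zero is what additivity predicts. Note that Newton-Raphson does not converge when the two outcome classes are perfectly separated by distance.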

Construction of expression vectors.
ATP1A1 and ATP1B1 wild-type sequences for the eight selected tetrapod species (Fig. 4) were
synthesized by Invitrogen™ GeneArt. The β1-subunit genes were inserted into pFastBac Dual
expression vectors (Life Technologies) at the p10 promoter with XhoI and PaeI (FastDigest Thermo
Scientific™) and then control sequenced. The α1-subunit genes were inserted at the PH promoter
of vectors already containing the corresponding β1-subunit genes using In-Fusion® HD Cloning
Kit (Takara Bio, USA Inc.) and control sequenced. All resulting vectors had the α1-subunit gene
under the control of the PH promoter and a β1-subunit gene under the p10 promoter. The resulting
eight vectors were then subjected to site-directed mutagenesis (QuikChange II XL Kit; Agilent
Technologies, La Jolla, CA, USA) to introduce the codons of interest. In total, 21 vectors were
produced (Table S3).

Generation of recombinant viruses and transfection into Sf9 cells.
Escherichia coli DH10Bac cells harboring the baculovirus genome (bacmid) and a transposition
helper vector (Life Technologies) were transformed according to the manufacturer’s protocol with
expression vectors containing the different gene constructs. Recombinant bacmids were selected
through PCR screening, grown, and isolated. Subsequently, Sf9 cells (4 x 10⁵ cells/ml) in 2 ml of
Insect-Xpress medium (Lonza, Walkersville, MD, USA) were transfected with recombinant bacmids
using Cellfectin reagent (Life Technologies). After a three-day incubation period, recombinant
baculoviruses were isolated (P1) and used to infect fresh Sf9 cells (1.2 x 10⁶ cells/ml) in 10 ml of
Insect-Xpress medium (Lonza, Walkersville, MD, USA) with 15 mg/ml gentamycin (Roth, Karlsruhe,
Germany) at a multiplicity of infection of 0.1. Five days after infection, the amplified viruses were
harvested (P2 stock).

Preparation of Sf9 membranes.
For production of recombinant NKA, Sf9 cells were infected with the P2 viral stock at a multiplicity
of infection of 10³. The cells (1.6 x 10⁶ cells/ml) were grown in 50 ml of Insect-Xpress medium
(Lonza, Walkersville, MD, USA) with 15 mg/ml gentamycin (Roth, Karlsruhe, Germany) at 27 °C in
500 ml flasks (35). After 3 days, Sf9 cells were harvested by centrifugation at 20,000 x g for 10 min.
The cells were stored at -80 °C and then resuspended at 0 °C in 15 ml of homogenization buffer
(0.25 M sucrose, 2 mM EDTA, and 25 mM HEPES/Tris; pH 7.0). The resuspended cells were
sonicated at 60 W (Bandelin Electronic Company, Berlin, Germany) for three 45 s intervals at 0 °C.
The cell suspension was then subjected to centrifugation for 30 min at 10,000 x g (J2-21 centrifuge,
Beckman Coulter, Krefeld, Germany). The supernatant was collected and further centrifuged for
60 min at 100,000 x g at 4 °C (Ultra-Centrifuge L-80, Beckman Coulter) to pellet the cell
membranes. The pelleted membranes were washed once and resuspended in ROTIPURAN® p.a.,
ACS water (Roth) and stored at -20 °C. Protein concentrations were determined by Bradford assays
using bovine serum albumin as a standard. Three biological replicates were produced for each
NKA construct.

Verification by SDS-PAGE/western blotting.
For each biological replicate, 10 µg of protein was solubilized in 4x SDS-polyacrylamide gel
electrophoresis sample buffer and separated on SDS gels containing 10% acrylamide.
Subsequently, the proteins were blotted onto a nitrocellulose membrane (HP42.1, Roth). To block non-specific
binding sites after blotting, the membrane was incubated with 5% dried milk in TBS-Tween 20 for
1 h. After blocking, the membranes were incubated overnight at 4 °C with the primary monoclonal
antibody α5 (Developmental Studies Hybridoma Bank, University of Iowa, Iowa City, IA, USA).
Since only membrane proteins were isolated from transfected cells, detection of the α subunit also
indicates the presence of the β subunit. The primary antibody was detected using a goat-anti-mouse
secondary antibody conjugated with horseradish peroxidase (Dianova, Hamburg,
Germany). The staining of the precipitated polypeptide-antibody complexes was performed by
addition of 60 mg 4-chloro-1-naphthol (Sigma-Aldrich, Taufkirchen, Germany) in 20 ml ice-cold
methanol to 100 ml phosphate buffered saline (PBS) containing 60 µl 30% H2O2. See Fig. S6.

Ouabain inhibition assay.
To determine the sensitivity of each NKA construct to cardiotonic steroids (CTS), we used the
water-soluble cardiac glycoside, ouabain (Acrōs Organics), as our representative CTS. 100 µg of
each protein was pipetted into each well in a nine-well row on a 96-well microplate (Fisherbrand)
containing stabilizing buffers (see buffer formulas in (49)). The wells in each nine-well row were
exposed to exponentially decreasing concentrations (10⁻³ M, 10⁻⁴ M, 10⁻⁵ M, 10⁻⁶ M, 10⁻⁷ M, 10⁻⁸ M,
dissolved in distilled H2O) of ouabain, distilled water only (experimental control), or a combination
of an inhibition buffer lacking KCl and 10⁻² M ouabain to measure background protein activity (49).
The proteins were incubated at 37 °C and 200 rpm for 10 minutes on a microplate shaker
(Quantifoil Instruments, Jena, Germany). Next, ATP (Sigma-Aldrich) was added to each well and
the proteins were incubated again at 37 °C and 200 rpm for 20 minutes. The activity of NKA
following ouabain exposure was determined by quantification of inorganic phosphate (Pi) released
from enzymatically hydrolyzed ATP. Reaction Pi levels were measured according to the procedure
described in Taussky and Shorr (50) (see Petschenka et al. (49)). All assays were run in duplicate
and the average of the two technical replicates was used for subsequent statistical analyses.
Absorbance for each well was measured at 650 nm with a plate absorbance reader (BioRad Model
680 spectrophotometer and software package). See Table S4.
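
Data from a dilution series of this kind are summarized as a log IC50 by fitting a logistic dose-response curve whose asymptotes are fixed at 100 and 0, leaving the log IC50 and a slope as free parameters. The Python sketch below illustrates the idea with a plain grid search in place of the Levenberg-Marquardt routine (nlsLM) used for the analyses; the function names, the grid limits, and the background-subtraction form of the calibration step are assumptions, and the example data are synthetic.

```python
def percent_of_control(absorbance, background, control):
    """Percent non-inhibited activity, assuming calibration by subtraction."""
    return 100.0 * (absorbance - background) / (control - background)

def percent_activity(log_conc, log_ic50, hill):
    """Logistic dose-response curve with asymptotes fixed at 100 and 0."""
    return 100.0 / (1.0 + 10.0 ** (hill * (log_conc - log_ic50)))

def fit_log_ic50(log_concs, activities):
    """Least-squares grid search over log IC50 and slope (illustrative only)."""
    best = None
    for i in range(-900, -199):        # log IC50 from -9.00 to -2.00
        log_ic50 = i / 100.0
        for j in range(4, 61):         # slope from 0.20 to 3.00
            hill = j / 20.0
            sse = sum((percent_activity(c, log_ic50, hill) - a) ** 2
                      for c, a in zip(log_concs, activities))
            if best is None or sse < best[0]:
                best = (sse, log_ic50, hill)
    return best[1], best[2]

# Synthetic example: log10 molar ouabain concentrations from the assay design.
log_concs = [-3.0, -4.0, -5.0, -6.0, -7.0, -8.0]
activities = [percent_activity(c, -5.5, 1.0) for c in log_concs]
estimated_log_ic50, estimated_slope = fit_log_ic50(log_concs, activities)
```

A grid search is slow and coarse compared with a proper nonlinear least-squares routine, but it makes the objective being minimized explicit.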

ATP hydrolysis assay.
To determine the functional efficiency of different NKA constructs, we calculated the amount of Pi
hydrolyzed from ATP per mg of protein per minute. The measurements were obtained from the
same assay as described above. In brief, absorbance from the experimental control reactions, in
which 100 µg of protein was incubated without any inhibiting factors (i.e., ouabain or buffer
excluding KCl), was measured and translated to mM Pi from a standard curve that was run in
parallel (1.2 mM Pi, 1 mM Pi, 0.8 mM Pi, 0.6 mM Pi, 0.4 mM Pi, 0.2 mM Pi, 0 mM Pi). See Table
S4.
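
The conversion from absorbance to specific activity can be sketched in Python as follows. The sketch assumes a linear standard curve fitted by ordinary least squares; the reaction volume is not stated in the text and is therefore a hypothetical parameter, as are the function names and the example absorbance values. The 100 µg protein amount and the 20-minute incubation with ATP are taken from the assay description.

```python
def fit_standard_curve(pi_mM, absorbance):
    """Least-squares line: absorbance = slope * [Pi] + intercept."""
    n = len(pi_mM)
    mean_x = sum(pi_mM) / n
    mean_y = sum(absorbance) / n
    sxx = sum((x - mean_x) ** 2 for x in pi_mM)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pi_mM, absorbance))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def absorbance_to_mM(absorbance, slope, intercept):
    """Invert the standard curve to obtain the Pi concentration in mM."""
    return (absorbance - intercept) / slope

def specific_activity(pi_mM, volume_ul, protein_mg=0.1, minutes=20.0):
    """nmol Pi per mg protein per minute (1 mM in 1 µl equals 1 nmol)."""
    return pi_mM * volume_ul / protein_mg / minutes

# Standard concentrations from the text; absorbances are invented.
standards = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
readings = [0.05 + 0.5 * c for c in standards]
slope, intercept = fit_standard_curve(standards, readings)
sample_mM = absorbance_to_mM(0.35, slope, intercept)
activity = specific_activity(sample_mM, volume_ul=100.0)  # volume is assumed
```

With the assumed 100 µl reaction volume, a sample reading equivalent to 0.6 mM Pi corresponds to 30 nmol Pi/mg protein/min.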

Statistical analyses of functional data.
Background phosphate absorbance levels from reactions with inhibiting factors were used to
calibrate phosphate absorbance in wells measuring ouabain inhibition and in the control wells
measuring non-inhibited NKA activity (49). For ouabain sensitivity measurements, calibrated
absorbance values were converted to percentage non-inhibited NKA activity based on
measurements from the control wells (49). These data were plotted and log IC50 values were
obtained for each biological replicate from nonlinear fitting using a four-parameter logistic curve,
with the top asymptote set to 100 and the bottom asymptote set to zero. Curve fitting was performed
with the nlsLM function of the minpack.lm library in R. For comparisons of recombinant protein
activity, the calculated Pi concentrations of 100 µg of protein assayed in the absence of ouabain
were converted to nmol Pi/mg protein/min. IC50 values were log-transformed. We used pairwise
t-tests with Bonferroni corrections to identify significant differences between constructs with and
without engineered substitutions. We used a two-way ANOVA to test for background dependence
of substitutions (i.e., interaction between background and amino acid substitution) with respect to
ouabain resistance (log IC50) and protein activity. Specifically, we tested whether the effects of a
substitution X->Y are equal on different backgrounds (null hypothesis: X->Y (background 1) =
X->Y (background 2)). We further assumed that the effects of a substitution X->Y should exactly
match those of Y->X. All statistical analyses were implemented in R. Data were plotted using the
ggplot2 package in R.
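
The interaction term tested by the two-way ANOVA can be made concrete with a short Python sketch for a balanced design. It computes only the F statistic for the background-by-substitution interaction; obtaining a P-value would additionally require the F distribution, which is omitted here. The function name, the data layout, and the replicate values are hypothetical, and the analyses themselves were run in R.

```python
from statistics import mean

def interaction_f(cells):
    """F statistic for the interaction in a balanced two-way design.

    cells maps (background, state) -> list of replicate measurements,
    with the same number of replicates in every cell.
    """
    backgrounds = sorted({b for b, _ in cells})
    states = sorted({s for _, s in cells})
    n = len(next(iter(cells.values())))
    cell_mean = {key: mean(values) for key, values in cells.items()}
    grand = mean([x for values in cells.values() for x in values])
    row = {b: mean([cell_mean[(b, s)] for s in states]) for b in backgrounds}
    col = {s: mean([cell_mean[(b, s)] for b in backgrounds]) for s in states}
    ss_int = n * sum((cell_mean[(b, s)] - row[b] - col[s] + grand) ** 2
                     for b in backgrounds for s in states)
    ss_err = sum((x - cell_mean[key]) ** 2
                 for key, values in cells.items() for x in values)
    df_int = (len(backgrounds) - 1) * (len(states) - 1)
    df_err = len(cells) * (n - 1)
    return (ss_int / df_int) / (ss_err / df_err)

# Invented log IC50 replicates: the substitution has the same effect on both
# backgrounds in `additive`, but no effect on background 2 in `dependent`.
additive = {("bg1", "wt"): [1.0, 1.1, 0.9], ("bg1", "mut"): [2.0, 2.1, 1.9],
            ("bg2", "wt"): [3.0, 3.1, 2.9], ("bg2", "mut"): [4.0, 4.1, 3.9]}
dependent = dict(additive)
dependent[("bg2", "mut")] = [3.0, 3.1, 2.9]
f_additive = interaction_f(additive)
f_dependent = interaction_f(dependent)
```

An interaction F statistic near zero is what the null hypothesis of equal effects across backgrounds predicts, whereas a large value signals background dependence.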

Acknowledgments
We thank C. Natarajan, P. Kowalski, M. Winter, and V. Wagschal for assistance in the laboratory,
and D.A. Gómez-Sánchez for assistance in the field. We thank J. Oaks for providing tissue from a
ring-necked snake. Funding: This study was funded by grants from the National Institutes of Health
to PA (R01-GM115523), JFS (R01-HL087216) and SM (F32-HL149172), the National Science
Foundation (OIA-1736249) to JFS, the Deutsche Forschungsgemeinschaft (Do 517/10-1) to SD,
and the Alexander von Humboldt Foundation to SM.

References
1. D. L. Stern, The genetic causes of convergent evolution. Nat. Rev. Genet. 14, 751–764
(2013).
2. J. F. Storz, Causes of molecular convergence and parallelism in protein evolution. Nat. Rev.
Genet. 17, 239 (2016).
3. D. L. Stern, Evolution, development, & the predictable genome (Roberts and Co. Publishers,
2011).
4. P. C. Phillips, Epistasis—the essential role of gene interactions in the structure and
evolution of genetic systems. Nat. Rev. Genet. 9, 855–867 (2008).
5. W. Fitch, Rate of change of concomitantly variable codons. J. Mol. Evol. 1, 84–96 (1971).
6. B. Callahan, R. A. Neher, D. Bachtrog, P. Andolfatto, B. I. Shraiman, Correlated evolution of
nearby residues in Drosophilid proteins. PLoS Genet. 7, e1001315 (2011).
7. T. N. Starr, J. W. Thornton, Epistasis in protein evolution. Protein Sci. 25, 1204–1218
(2016).
8. J. F. Storz, Compensatory mutations and epistasis for protein function. Curr. Opin. Struct.
Biol. 50, 18–25 (2018).
9. D. D. Pollock, G. Thiltgen, R. A. Goldstein, Amino acid coevolution induces an evolutionary
Stokes shift. Proc. Natl. Acad. Sci. 109, E1352 (2012).
10. P. Shah, D. M. McCandlish, J. B. Plotkin, Contingency and entrenchment in protein
evolution under purifying selection. Proc. Natl. Acad. Sci. 112, E3226 (2015).
11. V. O. Pokusaeva, et al., An experimental assay of the interactions of amino acids from
orthologous sequences shaping a complex fitness landscape. PLoS Genet. 15, e1008079
(2019).
12. R. A. Goldstein, S. T. Pollard, S. D. Shah, D. D. Pollock, Nonadaptive amino acid
convergence rates decrease over time. Mol. Biol. Evol. 32, 1373–1381 (2015).
13. Z. Zou, J. Zhang, Are convergent and parallel amino acid substitutions in protein evolution
more prevalent than neutral expectations? Mol. Biol. Evol. 32, 2085–2096 (2015).
14. Z. Zou, J. Zhang, Gene tree discordance does not explain away the temporal decline of
convergence in mammalian protein sequence evolution. Mol. Biol. Evol. 34, 1682–1688
(2017).
15. J. B. Lingrel, The physiological significance of the cardiotonic steroid/ouabain-binding site
of the Na, K-ATPase. Annu. Rev. Physiol. 72, 395–412 (2010).
16. A. Koksoy, Na+, K+-ATPase: a review. J. Ank. Med. Sch. 24, 73–82 (2002).
17. S. Dobler, S. Dalla, V. Wagschal, A. A. Agrawal, Community-wide convergent evolution in
insect adaptation to toxic cardenolides by substitutions in the Na, K-ATPase. Proc. Natl.
Acad. Sci. 109, 13040–13045 (2012).
18. Y. Zhen, M. L. Aardema, E. M. Medina, M. Schumer, P. Andolfatto, Parallel molecular
evolution in an herbivore community. Science 337, 1634–1637 (2012).
19. D. J. Moore, D. C. Halliday, D. M. Rowell, A. J. Robinson, J. S. Keogh, Positive Darwinian
selection results in resistance to cardioactive toxins in true toads (Anura: Bufonidae). Biol.
Lett. 5, 513–516 (2009).
20. B. Ujvari, et al., Widespread convergence in toxin resistance by predictable molecular
evolution. Proc. Natl. Acad. Sci. 112, 11911–11916 (2015).
21. S. Mohammadi, et al., Toxin-resistant isoforms of Na+/K+-ATPase in snakes do not closely
track dietary specialization on toads. Proc. R. Soc. B Biol. Sci. 283, 20162111 (2016).
22. S. Mohammadi, L. Yang, M. Bulbert, H. M. Rowland, The evolutionary and behavioural
ecology of cardiotonic steroid resistance in predators (In prep).
23. A. M. Taverner, et al., Adaptive substitutions underlying cardiac glycoside insensitivity in
insects exhibit epistasis in vivo. eLife 8, e48224 (2019).
24. M. Karageorgi, et al., Genome editing retraces the evolution of toxin resistance in the
monarch butterfly. Nature 574, 409–412 (2019).
25. S. Mohammadi, et al., Concerted evolution reveals co-adapted amino acid substitutions in
frogs that prey on toxic toads. Curr. Biol. 31, 2530-2538.e10 (2021).
26. E. M. Price, J. B. Lingrel, Structure-function relationships in the sodium-potassium ATPase
α subunit: site-directed mutagenesis of glutamine-111 to arginine and asparagine-122
to aspartic acid generates a ouabain-resistant enzyme. Biochemistry 27, 8400–8408
(1988).
27. K. J. Sweadner, et al., Genotype-structure-phenotype relationships diverge in paralogs
ATP1A1, ATP1A2, and ATP1A3. Neurol. Genet. 5, e303 (2019).
28. A. Mobasheri, et al., Na+, K+-ATPase isozyme diversity; comparative biochemistry and
physiological implications of novel functional interactions. Biosci. Rep. 20, 51–91 (2000).
29. B. Ujvari, et al., Isolation breeds naivety: island living robs Australian varanid lizards of
toad-toxin immunity via four-base-pair mutation. Evolution 67, 289–294
(2013).
30. B. M. Marshall, et al., Widespread vulnerability of Malagasy predators to the toxins of an
introduced toad. Curr. Biol. 28, R654–R655 (2018).
31. S. Groen, N. Whiteman, Convergent evolution of cardiac-glycoside resistance in predators
and parasites of milkweed herbivores. Curr. Biol. 31, R1465–R1466 (2021).
32. M. Lunzer, G. B. Golding, A. M. Dean, Pervasive cryptic epistasis in molecular evolution.
PLoS Genet. 6, e1001162 (2010).
33. L. Yang, et al., Predictability in the evolution of Orthopteran cardenolide insensitivity.
Philos. Trans. R. Soc. B 374, 20180246 (2019).
34. A. Stoltzfus, D. M. McCandlish, Mutational Biases Influence Parallel Adaptation. Mol. Biol.
Evol. 34, 2163–2172 (2017).
35. J. Zhang, S. Kumar, Detection of convergent and parallel evolution at the amino acid
sequence level. Mol. Biol. Evol. 14, 527–536 (1997).
36. L. F. Toledo, R. Ribeiro, C. F. Haddad, Anurans as prey: an exploratory analysis and size
relationships between predators and their prey. J. Zool. 271, 170–177 (2007).
37. S. Dobler, et al., New ways to acquire resistance: imperfect convergence in insect
adaptations to a potent plant toxin. Proc. R. Soc. B 286, 20190883 (2019).
38. M. V. Clausen, F. Hilbers, H. Poulsen, The structure and function of the Na, K-ATPase
isoforms in health and disease. Front. Physiol. 8, 371 (2017).
39. J. A. Wells, Additivity of mutational effects in proteins. Biochemistry 29, 8509–8517 (1990).
40. M. Lunzer, S. P. Miller, R. Felsheim, A. M. Dean, The biochemical architecture of an ancient
adaptive landscape. Science 310, 499–501 (2005).
41. L. I. Gong, M. A. Suchard, J. D. Bloom, Stability-mediated epistasis constrains the evolution
of an influenza protein. eLife 2, e00631 (2013).
42. P. Nosil, et al., Ecology shapes epistasis in a genotype–phenotype–fitness map for stick
insect colour. Nat. Ecol. Evol. 4, 1673–1684 (2020).
43. B. J. Haas, et al., De novo transcript sequence reconstruction from RNA-seq using the
Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
44. D. Darriba, et al., ModelTest-NG: a new and scalable tool for the selection of DNA and
protein evolutionary models. Mol. Biol. Evol. 37, 291–294 (2020).
45. A. Stamatakis, RAxML version 8: a tool for phylogenetic analysis and post-analysis of large
phylogenies. Bioinformatics 30, 1312–1313 (2014).
46. S. Guindon, et al., New algorithms and methods to estimate maximum-likelihood
phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
47. G. Yu, D. K. Smith, H. Zhu, Y. Guan, T. T. Lam, ggtree: an R package for visualization and
annotation of phylogenetic trees with their covariates and other associated data. Methods
Ecol. Evol. 8, 28–36 (2017).
48. Z. Yang, PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24,
1586–1591 (2007).
49. G. Petschenka, et al., Stepwise evolution of resistance to toxic cardenolides via genetic
substitutions in the Na+/K+-ATPase of milkweed butterflies (Lepidoptera: Danaini).
Evolution 67, 2753–2761 (2013).
50. H. H. Taussky, E. Shorr, A microcolorimetric method for the determination of inorganic
phosphorus. J. Biol. Chem. 202, 675–685 (1953).

Figures and Tables
Figure 1. Na+,K+-ATPase structure and phylogenetic relationships of ATP1A paralogs among
vertebrates. (A) Crystal structure of an Na+,K+-ATPase (NKA) with the representative CTS
bufalin bound (in blue; PDB 4RES). The zoomed-in panel shows the H1-H2 extracellular loop, highlighting
two amino acid positions (111 and 122 in red) that have been implicated repeatedly in CTS
resistance. We highlight key examples of convergence in amino acid substitutions at sites in the
H1-H2 extracellular loop associated with CTS resistance in Fig. 3. (B) Phylogenetic relationships
among ATP1A paralogs of vertebrates and ATPα of insects.
Panel labels: (A) extracellular, membrane, intracellular; sites 111 and 122. (B) insect ATPα,
ATP1A4, ATP1A3, ATP1A2, ATP1A1.

Figure 2. Patterns of molecular evolution in the α(M1–M2) extracellular loop of ATP1A
paralogs shared among tetrapods. (A) Maximum likelihood phylogeny of tetrapod ATP1A1, (B)
ATP1A2, and (C) ATP1A3. The character states for eight sites relevant to CTS resistance in and
near the H1-H2 loop of the NKA protein are shown at the node tips. Yellow internal nodes indicate
ancestral sequences reconstructed to infer derived amino acid states across clades to ease
visualization; nodes reconstructed: MRCA of mammals, reptiles, and amphibians. Top right, each
semi-circle indicates the site mapped in the main phylogeny with the inferred ancestral amino acid
state for each of the three yellow nodes (posterior probability >0.8). In ATP1A1, site 119 was
inferred as Q119 for amphibians and mammals, and N119 for reptiles (Table S6); in ATP1A2-3 site
119 was inferred as A119 for amphibians and reptiles, and S119 for mammals (Table S6). Site
numbers correspond to the pig (Sus scrofa) reference sequence. The higher number and variation of
substitutions in ATP1A1 stand out in comparison to the other paralogs.
714
715
[Figure 2 image. Panels A-C: ATP1A1, ATP1A2, and ATP1A3. Sites shown: 108, 111, 112, 115, 116, 119, 120, 122. Legend: amino acid states grouped as acidic (D, E), basic (H, K, R), polar (S, N, Q, T), and hydrophobic (A, G, I, P, F, L, V, Y); legend entries prefixed with "a" (e.g., aA, aE, aN, aQ, aY) accompany the "Ancestral state" label.]
Figure 3. Parallel and divergent patterns of CTS-resistant substitutions across ATPα1 of insects and the shared ATP1A paralogs of tetrapods. (A) Examples of convergence in ATPα1 across insects. Convergence in the (B) ATP1A1, (C) ATP1A2, and (D) ATP1A3 paralogs across tetrapods. Numbers indicate the number of independent substitutions in each major clade depicted. For ATP1A3, resistance-conferring amino acid substitutions have been identified at site 120, and not 122. A full list of amino acid substitutions can be found in Supplementary Dataset 2 for tetrapods, and Taverner et al. (23) for insects.
[Figure 3 image. Panels A-D: ATPα1, ATP1A1, ATP1A2, and ATP1A3, each drawn on a membrane topology (extracellular, intracellular) with the ancestral states at sites 111 and 120 or 122 marked. Labeled substitutions include Q111E, Q111H, Q111L, Q111R, Q111T, Q111V, N122D, N122H, N120K, N120R, G120N, and G120R, with counts of independent substitutions per clade.]
Figure 4. Functional properties of wild-type and engineered ATP1A1. (A) Cladogram relating the surveyed species. GRA: Grass Frog (Leptodactylus); RAT: Rat (Rattus); CHI: Chinchilla (Chinchilla); OST: Ostrich (Struthio); SNG: Sandgrouse (Pterocles); MON: Monitor lizard (Varanus); TEG: Tegu lizard (Tupinambis); FER: False fer-de-lance (Xenodon); KEE: Red-necked keelback snake (Rhabdophis). Two-letter codes underneath each avatar indicate native amino acid states at sites 111 and 122, respectively. Data for grass frog from Mohammadi et al. (2021). (B) Levels of CTS resistance (IC50) among wild-type enzymes. The x-axis distinguishes among ATP1A1 with 0, 1 or 2 derived states at sites 111 and 122. The subscripts S and R refer to the CTS-sensitive and CTS-resistant paralogs, respectively. (C) Effects of changing the number of substitutions at 111 or 122 on CTS resistance (IC50). Substitutions result in predictable changes to resistance except in the reversal R111Q in Sandgrouse (SNG). GRA_S represents Q111R+N122D on the sensitive paralog background. (D) Effects of single substitutions on Na+,K+-ATPase (NKA) activity. Each modified ATP1A1 is compared to the wild-type enzyme for that species. The inset shows the distribution of t-test p-values for all 15 substitutions, with the dotted line indicating the expectation. (E) Evidence for epistasis for CTS resistance (IC50, upper panel) and lack of such effects for enzyme activity (lower panel). Each line compares the same substitution (or the reverse substitution) tested on at least two backgrounds. Thicker lines correspond to substitutions with significant sequence-context-dependent effects (Bonferroni-corrected ANOVA p-values < 0.05, Table S5).
[Figure 4 image. Panels A-E. Axes: log10(IC50) against number of substitutions at sites 111+122 (B, C); % change in activity by substitution, with an inset histogram of p-value frequency (D); log10(IC50) and activity against background, wt vs. mutant (E). Legends list amino acid states at sites 111 and 122 (QN, EN, RN, TN, QH, HH, RD, QD, ED), mutations (H111E, H111T, H122D, N122D, Q111R), and species backgrounds.]
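The Bonferroni correction mentioned for panel E can be illustrated with a minimal sketch. The p-values below are hypothetical placeholders, not values from Table S5, and the function name `bonferroni` is ours; the sketch only shows how a raw p-value is scaled by the number of tests before being compared with the 0.05 threshold.

```python
def bonferroni(p_values):
    """Scale each raw p-value by the number of tests, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw ANOVA p-values, one per substitution tested
# on two or more backgrounds (illustrative only).
raw = [0.004, 0.03, 0.2, 0.6, 0.9]
adjusted = bonferroni(raw)

# A substitution would be drawn with a thicker line only if its
# corrected p-value stays below 0.05.
significant = [p < 0.05 for p in adjusted]
```

With these placeholder inputs, only the first test remains below the threshold after correction, even though the second raw p-value is below 0.05.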
Figure 5. No relationship between the effect of substitution to a given amino acid state on activity and the extent of divergence between ATP1A1 orthologs. Each point represents a comparison between the effect (% change in activity relative to the wild-type enzyme) of a given amino acid state (e.g., 122D) on two different genetic backgrounds. For example, the effect of 122D between chinchilla and false fer-de-lance is measured as % change [chinchilla vs. chinchilla+N122D] minus the % change [false fer-de-lance vs. false fer-de-lance+H122D]. Comparisons were measured as the difference between the two effects. In total, 11 comparisons were possible. The x-axis represents the number of amino acid differences between the two ATP1A1 proteins being compared. If intramolecular epistasis for protein function is prevalent, a positive correlation is predicted. However, no such relationship is observed (Spearman's correlation, r_S = -0.42, p = 0.19).
[Figure 5 image. x-axis: number of pairwise amino acid differences between ATP1A1 orthologs; y-axis: difference in effect on two backgrounds (%); points labeled by amino acid state: 111E, 111H, 111R, 111T, 122D, 122H.]
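The comparison described in the Figure 5 caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis code: the effect sizes and divergence counts are hypothetical placeholders, the helper names (`effect_differences`, `spearman`) are ours, and taking the absolute value of the difference between two effects is an assumption about how the plotted quantity was derived.

```python
from itertools import combinations

# Hypothetical % change in activity (mutant vs. wild type) for one derived
# state on three backgrounds. Placeholder numbers, not data from the study.
effects = {
    ("122D", "CHI"): -12.0,
    ("122D", "FER"): 30.0,
    ("122D", "OST"): 5.0,
}

# Hypothetical counts of amino acid differences between ortholog pairs.
divergence = {
    frozenset(("CHI", "FER")): 80,
    frozenset(("CHI", "OST")): 70,
    frozenset(("FER", "OST")): 55,
}

def effect_differences(effects, divergence):
    """Pair up backgrounds sharing a derived state; return (x, y) lists."""
    by_state = {}
    for (state, background), effect in effects.items():
        by_state.setdefault(state, []).append((background, effect))
    xs, ys = [], []
    for items in by_state.values():
        for (bg1, e1), (bg2, e2) in combinations(items, 2):
            xs.append(divergence[frozenset((bg1, bg2))])
            ys.append(abs(e1 - e2))  # assumption: magnitude of the difference
    return xs, ys

def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

xs, ys = effect_differences(effects, divergence)
rho = spearman(xs, ys)
```

Under prevalent intramolecular epistasis, `rho` would be expected to be positive, because more divergent backgrounds should yield more different effects of the same state.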
Figure 6. Rate of convergence across ATP1A sequences as a function of increasing sequence divergence. (A) Change in the rate of convergence (protein wide) over time for the ATP1A protein family. The proportion of convergent (C) over divergent (D) substitutions along the entire protein sequence was estimated for all pairs of branches in the ATP1A phylogeny, except for sister branches or ancestor-descendant pairs. Color scale shows the density of dots for both axes. The distance between branches corresponds to the expected number of amino acid substitutions per site between protein pairs being compared (under the JTT+G4+F model). The red line shows a running average with a window size of 0.05 substitutions/site. Dashed lines show the 95% confidence interval based on 100 bootstrap replicates per window. (B) For each derived amino acid state at sites 111 and 122, the histograms show the distribution of pairwise convergent events along the sequence divergence axis (expected number of substitutions per site). Substitutions are color coded as in Figure 2. The histogram at the bottom shows the combined distribution of pairwise convergent events for both sites.
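The windowed average and bootstrap interval described for panel A can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input points are hypothetical, the function name `windowed_cd` is ours, non-overlapping windows are assumed (the caption does not say whether windows overlap), and the interval is a simple percentile bootstrap.

```python
import random

def windowed_cd(points, window=0.05, n_boot=100, seed=0):
    """Summarize C/D ratios in bins of sequence divergence.

    points: list of (distance, c_over_d) pairs, one per branch pair.
    Returns a list of (window_start, mean, ci_low, ci_high) tuples.
    """
    rng = random.Random(seed)
    if not points:
        return []
    n_windows = int(max(d for d, _ in points) / window) + 1
    summary = []
    for w in range(n_windows):
        start = w * window
        vals = [r for d, r in points if start <= d < start + window]
        if not vals:
            continue  # no branch pairs fall in this window
        mean = sum(vals) / len(vals)
        # Percentile bootstrap: resample the window's values with replacement.
        boots = sorted(
            sum(rng.choices(vals, k=len(vals))) / len(vals)
            for _ in range(n_boot)
        )
        low = boots[int(0.025 * n_boot)]
        high = boots[min(n_boot - 1, int(0.975 * n_boot))]
        summary.append((start, mean, low, high))
    return summary

# Hypothetical (distance, C/D) values for three branch pairs.
example = [(0.01, 0.2), (0.03, 0.4), (0.07, 0.1)]
result = windowed_cd(example)
```

Each returned tuple corresponds to one point on the red line together with its dashed confidence bounds; a declining mean across windows would indicate that convergence becomes rarer as sequences diverge.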