Quantifying evolvability in small biological networks
Andrew Mugler∗
Department of Physics, Columbia University, New York, NY 10027
Etay Ziv†
College of Physicians and Surgeons, Columbia University, New York, NY 10027
Ilya Nemenman‡
Computer, Computational and Statistical Sciences Division,
and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545
Chris H. Wiggins§
Department of Applied Physics and Applied Mathematics,
Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027
(Dated: November 18, 2008)
arXiv:0811.2834v1 [q-bio.MN] 18 Nov 2008
We introduce a quantitative measure of the capacity of a small biological network to evolve. We
apply our measure to a stochastic description of the experimental setup of Guet et al. (Science
296:1466, 2002), treating chemical inducers as functional inputs to biochemical networks and the
expression of a reporter gene as the functional output. We take an information-theoretic approach,
allowing the system to set parameters that optimize signal processing ability, thus enumerating
each network’s highest-fidelity functions. We find that all networks studied are highly evolvable
by our measure, meaning that change in function has little dependence on change in parameters.
Moreover, we find that each network’s functions are connected by paths in the parameter space
along which information is not significantly lowered, meaning a network may continuously change
its functionality without losing it along the way. This property further underscores the evolvability
of the networks.
Many signals in cells are processed using a network of
interacting genes: exogenous signals affect expression of
genes coding for transcription factor proteins, which in
turn regulate the expression of other genes. Although
early works have suggested that the connectivity of such
regulatory networks dictates their function [1, 2, 3], re-
cent studies offer evidence that a network with fixed con-
nectivity can change its function simply by varying its
biochemical parameters [4, 5, 6, 7]. The diversity of a
network’s achievable functions and the ease with which
it can realize them are central to its capacity to evolve
epigenetically, without slow and costly modifications to
the genetic code, and thus central to the evolutionary
capacity of the organism as a whole.
The evolvability of a regulatory network has been a
topic of much discussion in recent literature [7, 8, 9, 10,
11, 12], but little has been done to quantify the con-
cept in a principled way. Here we propose a quantitative
measure of network evolvability, and we apply it to a
set of small regulatory networks, such that a principled
comparison can be made across networks. Networks are
taken from the experimental setup of Guet et al. [4] and
modeled stochastically. We find biochemical parameters
that optimize the information flow between a chemical
“input” signal and a particular “output” gene, and we
indeed find that a single network performs different func-
tions at different sets of optimal parameters. We argue
that a more evolvable network will be able to access a
richer diversity of its functions with smaller changes in
its parameters, and as such we quantify evolvability us-
ing a measure of anti-correlation between parametric and
functional change.
We find that while there are small differences among
networks’ evolvability scores, all are highly evolvable,
meaning that the magnitude of a functional change has
little dependence on the parametric change required to
produce it. Moreover, we find that transitions among a
network’s optimally informative functions can be made
without significant loss of the input-output information
along the way. By proposing and demonstrating a princi-
pled evolvability measure, we reveal these features quan-
titatively; both features suggest a high capacity of the
studied regulatory networks to evolve.
METHODS
First we briefly outline the methods used to develop
a quantitative measure of evolvability; each step is dis-
cussed in more detail in the sections that follow. The sys-
tem of interest is a small (4-gene) transcriptional regula-
tory network. As in Guet et al. [4], we treat the presence
or absence of chemical inducers (small molecules that af-
fect the efficacy of the transcription factors) as the inputs
to the network, and the expression of a particular gene as
the output. We use a stochastic model to find expression
level distributions at a steady-state, and we search for the
model parameters which maximize the mutual informa-
tion between input and output. We characterize the func-
tion that the network performs by the order in which the
output distributions corresponding to each input state
are encoded [13]; a single network can perform different
functions at different parameter settings. We define the
evolvability of the network as the ability to perform a
diverse set of functions with only small changes in pa-
rameters, and we quantify evolvability accordingly using
a measure of anti-correlation between pair-wise function
distance and parameter distance.
Model
Following the experimental setup of Guet et al. [4], we
study all networks that can be built out of three genes
A, B, and C, in which each gene is regulated by one
other gene, and regulation edges can be up-regulating
or down-regulating. Additionally, as in the experiment,
gene C down-regulates a “reporter” gene G (e.g., GFP),
whose expression we treat as the functional output of the
network. This yields a total of 24 networks, as shown on
the horizontal axis of Figure 2. Also in analogy to the
experiment, the efficacy of each transcription factor can
be inhibited by the presence of a chemical inducer s, a
small molecule that binds to the transcription factor and
lowers its affinity for its binding site. The presence or
absence of the chemical inducers sA, sB, and sC corre-
sponding to each transcription factor A, B, and C defines
the functional input state of the network. The inhibitory
effect of each inducer is illustrated for an example net-
work in the top panel of Figure 1A, and the eight possible
input states i, determined by the presence or absence of
the inducers, are listed in the bottom panel.
For a typical regulatory network inside a cell, intrin-
sic noise arising from fluctuations in the small numbers of
species [14, 15] is the primary factor limiting transmission
of information from a chemical input to a genetic out-
put [5, 13, 16, 17]. This observation has two important
consequences for modeling: (a) a realistic model should
capture not just mean protein concentrations, but proba-
bility distributions over numbers of proteins [18, 19], and
(b) the most biologically relevant model parameters will
retain an optimal flow of information in the presence of
this noise [5, 20, 21].
The first consequence is most fully addressed by solv-
ing the chemical master equation [22], which describes
the time-evolution of the joint probability distribution
for the numbers of all molecules in the system given the
elementary reactions (which are ultimately a function of
the network topology). For our systems, the master equa-
tion is not analytically solvable. Progress can be made
either by Monte-Carlo simulation of the master equation
[23], or by approximating the master equation, e.g. with
the linear-noise approximation (LNA) [5, 13, 22, 24, 25].
Since it does not rely on sampling, the LNA is much more
computationally efficient (and thus more amenable to a
search for high-fidelity model parameters), and in previ-
ous work [5] we found the distributions obtained via the
LNA were practically indistinguishable from those ob-
tained via stochastic simulation for copy numbers above
10–20.
In the LNA, the reaction rates in the master equation
are linearized, and the steady-state solution is a multi-
dimensional Gaussian distribution [5, 13, 22]. The indi-
vidual species’ marginal distributions are thus described
at the level of Gaussian fluctuations, with means given
by the steady-state solution to the deterministic dynam-
ical system describing mean protein numbers. Mean ex-
pression has been modeled with remarkable success by
combining transcription and translation into one step
[26, 27, 28], and accordingly, for each of our networks,
we use the following dynamical system in which species
are directly coupled to one another,
dXj/dt = αj(Xπj/sπj) − RjXj,   (1)
where the Xj ∈ {A, B, C, G} are the expression levels (in
units of proteins per cell) of the four genes, the Rj are
degradation rates, and the αj are production rates which
depend on the expression level Xπj and a scale factor sπj
of the parent πj of gene j (where the parent-child con-
nectivity is determined by the network topology). The
scale factors, sπj ∈ {sA, sB, sC} ≥ 1, incorporate the in-
hibitory effect of each chemical inducer (when present)
by reducing the effective transcription factor concentra-
tion; when an inducer is absent, its scale factor is set to
1. Regulation is modeled using Hill functions,
αj(x) = a0 + aj x^n/(x^n + (Kj)^n),     up-regulating,
αj(x) = a0 + aj (Kj)^n/(x^n + (Kj)^n),  down-regulating,   (2)
where a0 (kept the same for all genes) is the basal pro-
duction rate, a0 + aj is the maximal production rate, Kj
is the binding constant or the protein number at which
production is half-maximal, and n is the cooperativity,
which we set to 2. Note from Eqns. (1-2) that increas-
ing the scale factor can be equivalently interpreted as
increasing the effective binding constant or lowering the
binding affinity. Steady states of Eqn. (1) are found by
solving (using MATLAB’s roots) the polynomial equa-
tions that result from setting the left-hand side to zero,
and keeping only those solutions for which the Jacobian
matrix J of Eqn. (1) has eigenvalues whose real parts are
all negative.
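As a sketch of this procedure (in Python rather than MATLAB, with hypothetical parameter values), consider a single self-repressing gene under Eqns. (1-2): setting dX/dt = 0 and clearing the Hill-function denominator yields a polynomial whose stable roots are the steady states:

```python
import numpy as np

# Hypothetical parameter values (proteins/cell and 1/s); cooperativity n = 2.
a0, aj, K, R, n = 1.0, 10.0, 100.0, 0.01, 2

# For a self-repressing gene, dX/dt = a0 + aj*K^n/(X^n + K^n) - R*X.
# Setting this to zero and multiplying through by (X^n + K^n) gives the
# cubic -R*X^3 + a0*X^2 - R*K^2*X + (a0 + aj)*K^2 = 0.
roots = np.roots([-R, a0, -R * K**2, (a0 + aj) * K**2])

steady = []
for x in roots:
    if abs(x.imag) < 1e-9 and x.real > 0:
        x = x.real
        # Scalar Jacobian: the fixed point is stable if d(dX/dt)/dX < 0.
        jac = -n * aj * K**n * x**(n - 1) / (x**n + K**n)**2 - R
        if jac < 0:
            steady.append(x)

# A self-repressing gene has a single stable steady state.
print(steady)
```

For the full networks the same bookkeeping applies gene by gene, with the Jacobian a 4×4 matrix whose eigenvalues must all have negative real parts.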
The variances of the marginal distributions are the di-
agonal entries in the covariance matrix Ξ, which under
the LNA satisfies a Lyapunov equation,
JΞ + ΞJ^T + D = 0,   (3)
where D is a diagonal matrix with, for the system in Eqn.
(1), the jth entry equal to αj(Xπj/sπj) + RjXj. Eqn.
(3) is solved using a standard Lyapunov solver (MAT-
LAB’s lyap). Since the steady-state distributions are
Gaussian under the LNA, the solution is fully specified
by the means and the variances. The distributions of par-
ticular functional importance are P(G|i), the probability
of expressing G reporter proteins per cell given that the
chemical inducers are in state i.
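The Lyapunov solve can be sketched as follows (Python/SciPy in place of MATLAB's lyap; the matrices J and D here are illustrative stand-ins, not actual model values):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Stand-in stable Jacobian J and diagonal diffusion matrix D; in the model,
# the jth diagonal entry of D is alpha_j(X_pij/s_pij) + R_j*X_j at steady state.
J = np.array([[-0.01, 0.0],
              [0.002, -0.01]])
D = np.diag([2.0, 3.0])

# Solve J*Xi + Xi*J^T + D = 0 [Eqn. (3)] for the covariance matrix Xi.
# solve_continuous_lyapunov(a, q) solves a*x + x*a^T = q, hence q = -D.
Xi = solve_continuous_lyapunov(J, -D)

# Marginal variances of the Gaussian steady-state distributions.
variances = np.diag(Xi)
print(variances)
```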
To address the second consequence, that biologically
relevant solutions often optimize information flow in the
presence of intrinsic noise [20, 21], as in previous work
[5] we allow the system to set parameters that maximize
the mutual information I [29] between input state i and
output expression G, where
I = Σ_i ∫ dG P(i,G) log2[ P(i,G) / (P(i)P(G)) ]   (4)
  = (1/|i|) Σ_{i=1}^{|i|} ∫ dG P(G|i) log2[ |i| P(G|i) / Σ_{i′=1}^{|i|} P(G|i′) ].   (5)
Here I is measured in bits, and the second step uses
P(i,G) = P(G|i)P(i), P(G) = Σ_{i′} P(i′,G), and an as-
sertion that each input state occurs with equal likelihood
(i.e., P(i) = 1/|i|, where |i| = 8 is the number of input
states) to write I entirely in terms of the model solutions
P(G|i).
Two computationally trivial ways for the system to
maximize I are (a) to use an unbounded number of re-
porter proteins G to encode the signal, and (b) to set
degradation rates such that G responds on a timescale
much longer than that of the upstream genes (called a
“stiff” system), which has the effect of averaging out the
upstream noise. In contrast, in cells, protein production
requires energy, which sets a limit on the number of pro-
teins that a cell can produce, and most protein degra-
dation rates are comparable. Therefore we seek model
parameters θ∗ that optimize a constrained objective func-
tion,
θ∗ = argmax_θ [ I − λ⟨Xj⟩ − γ⟨Rπj⟩/RG ],   (6)
where the constants λ and γ are a metabolic cost and a
constraint against stiffness, respectively, the average ⟨Xj⟩
is taken over all genes, and the average ⟨Rπj⟩ is taken over
upstream genes A, B, and C. Optimization is performed
using a simplex algorithm (MATLAB’s fminsearch) in
a 15-dimensional parameter space, as
θ = {a0, aA, aB, aC, aG, KA, KB, KC, KG,
     RA, RB, RC, sA, sB, sC}   (7)
(RG was fixed at 4×10^−4 s^−1 to set a biologically realistic
degradation rate scale).
Varying initial parameters yields many local optima θ∗
at which the input signal may be encoded differently in
the output distributions P(G|i). For example, two opti-
mally informative solutions are shown in Figure 1B for
the network in Figure 1A. Intuitively, maximizing mu-
tual information has resulted in sets of distributions that
are well separated, such that knowledge of the output
G would leave little ambiguity about the original input
state i. We point out, however, that the ordering of the
output distributions is different between the two solu-
tions, meaning that the network is performing two differ-
ent functions at two different points in parameter space.
The relationship between diversity of such functions and
exploration of parameters is crucial to the discussion of
evolvability; in the next section we develop a quantitative
measure of evolvability in the context of this system.
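Once the P(G|i) are in hand, Eqn. (5) can be evaluated by direct numerical integration. A minimal Python sketch, using made-up Gaussian means and variances as stand-ins for actual model solutions:

```python
import numpy as np

def mutual_information(means, variances):
    """I in bits between a uniform input state i and Gaussian outputs,
    Eqn. (5): I = (1/|i|) sum_i int dG P(G|i) log2[|i| P(G|i) / sum_i' P(G|i')]."""
    G = np.linspace(0.0, 600.0, 4000)
    P = np.array([np.exp(-(G - m)**2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)
                  for m, v in zip(means, variances)])  # rows are P(G|i)
    total = P.sum(axis=0) + 1e-300
    integrand = np.where(P > 0, P * np.log2(len(means) * P / total + 1e-300), 0.0)
    return integrand.sum(axis=1).mean() * (G[1] - G[0])

# Eight well-separated output distributions approach the 3-bit maximum;
# eight identical distributions carry no information.
I_sep = mutual_information(30 + 70 * np.arange(8), [25.0] * 8)
I_overlap = mutual_information([300.0] * 8, [25.0] * 8)
print(I_sep, I_overlap)
```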
Quantifying evolvability
As seen in several experimental and numerical stud-
ies [4, 5, 6, 7], and in data from the model described
above, a single regulatory network can perform different
functions simply by varying its biochemical parameters.
Intuitively, a network should be deemed more evolvable
if it is able to access a richer diversity of its functions
with smaller changes in its parameters. Quantification of
this concept requires definitions of both parametric and
functional change.
As in Barkai et al. [30], we characterize the magnitude
of the parametric change in going from one model solu-
tion to another by calculating fold-changes in the model
parameters. Specifically, we define a parameter distance
∆θ between two solutions as the Euclidean distance in
the logs of the parameters,
∆θ = |log2 θ∗1 − log2 θ∗2| = √( Σ_{k=1}^{|k|} [log2(θ∗1,k/θ∗2,k)]² ),   (8)
where |k| = 15 is the number of parameters. Under this
definition, equal fold-changes in each parameter consti-
tute equal contributions to ∆θ (for scale, the doubling of
one parameter corresponds to ∆θ = 1).
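In code, the distance of Eqn. (8) is one line (Python sketch; the parameter vectors are stand-ins):

```python
import numpy as np

def parameter_distance(theta1, theta2):
    """Euclidean distance in log2-parameter space, Eqn. (8)."""
    return np.linalg.norm(np.log2(np.asarray(theta1) / np.asarray(theta2)))

# Scale check: doubling a single parameter gives delta-theta = 1.
theta1 = np.ones(15)
theta2 = np.ones(15)
theta2[0] = 2.0
print(parameter_distance(theta1, theta2))  # -> 1.0
```

Because only ratios enter, equal fold-changes in any parameter contribute equally, regardless of the parameter's absolute scale.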
As in previous work [13] and in the original exper-
iment of Guet et al. [4], we define the function of a
network analogously to logic in electrical circuits (AND,
OR, XOR, etc.), in which the function is determined by
the magnitude of the output’s response to each input
state (for example, with two inputs, AND would be de-
fined by a “high” output in response to the [++] state,
and a “low” output in response to the [−−], [−+], and
[+−] states). Since, in our setup, optimizing information
produces well-separated output distributions P(G|i) (see
Figure 1B), we extend this idea beyond a simple “high”
or “low” output classification, and characterize function
by the order of the P(G|i). Specifically, we record a vec-
tor r of ranks of the P(G|i); for example, in the top
panel of Figure 1B, the first output distribution (i = 1)
is ranked 4th, the second (i = 2) is ranked 7th, the third
(i = 3) is ranked 1st, and so on, so the rank vector is
r = (4, 7, 1, ...). We then define the function distance
∆f between two solutions in terms of the vector distance
between their rank vectors,
∆f = (1/2)|r1 − r2|² = (1/2) Σ_{i=1}^{|i|} (r1,i − r2,i)²   (9)
(for scale, the swapping of two adjacent output distribu-
tions corresponds to ∆f = 1) [40]. Other function dis-
tances, including other permutation distances between
the rank vectors, and a continuous distance measure de-
fined by averaging the Jensen-Shannon divergence [31]
between corresponding output distributions in the solu-
tion pair, produced similar results, as discussed in the
Results section.
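For example (Python sketch, using the two rank vectors shown in Figure 1B):

```python
import numpy as np

def function_distance(r1, r2):
    """Half the squared distance between rank vectors, Eqn. (9)."""
    return 0.5 * np.sum((np.asarray(r1) - np.asarray(r2))**2)

# Rank vectors of the two solutions in Figure 1B.
r_top = (4, 7, 1, 3, 6, 8, 2, 5)
r_bottom = (2, 5, 1, 4, 7, 8, 3, 6)
print(function_distance(r_top, r_bottom))  # -> 6.0

# Scale check: swapping two adjacent output distributions gives delta-f = 1.
r_swapped = (5, 7, 1, 3, 6, 8, 2, 4)
assert function_distance(r_top, r_swapped) == 1.0
```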
It is now clear that, if a network is better able to ex-
plore its function set with smaller changes in its parame-
ters (i.e., is more evolvable by our definition), then it will
exhibit less correlation between ∆f and ∆θ than other
networks. Therefore we define an evolvability score E for
a given network as a measure of anti-correlation between
∆f and ∆θ, calculated for every pair of its optimal model
solutions [41]. Specifically,
E = 1 − (τ + 1)/2,   (10)
where τ is Kendall’s tau [32], a nonparametric measure of
correlation between all pairwise ∆f and ∆θ; we rescale τ
such that 0 ≤ E ≤ 1 and take its complement to obtain
an anti-correlation. Using a nonparametric correlation
statistic has the advantage that our evolvability measure
remains invariant upon any monotonic rescaling in the
definitions of either ∆θ or ∆f. Additionally, we note
that E can be thought of as the probability that a pair of
solutions drawn at random have a larger ∆f than another
pair given that the first pair had a smaller ∆θ, or as the
fraction of discordant pairs of (∆θ,∆f) data points [42].
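Eqn. (10) can be sketched with SciPy's kendalltau (the distance arrays below are synthetic stand-ins, not model output); the last line also demonstrates the invariance of E under monotonic rescaling of either distance:

```python
import numpy as np
from scipy.stats import kendalltau

def evolvability_score(dtheta, df):
    """E = 1 - (tau + 1)/2, Eqn. (10)."""
    tau, _ = kendalltau(dtheta, df)
    return 1.0 - (tau + 1.0) / 2.0

rng = np.random.default_rng(0)
dtheta = rng.random(500)
df = rng.random(500)

# Perfect correlation gives E = 0; perfect anti-correlation gives E = 1;
# statistically independent distances give E near 0.5.
assert evolvability_score(dtheta, dtheta) == 0.0
assert evolvability_score(dtheta, -dtheta) == 1.0

# Since Kendall's tau depends only on orderings, E is unchanged by any
# monotonic rescaling of either distance.
assert evolvability_score(dtheta, df) == evolvability_score(dtheta**3, np.log(df))
```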
Function distance ∆f vs. parameter distance ∆θ for all
pairs of model solutions is plotted in Figure 1C for the
example network in Figure 1A. The evolvability score
calculated from these data is E = 0.482 which, since
there is little correlation (or anti-correlation) between ∆f
and ∆θ in this case, is near the middle value E = 0.5.
We obtain a fairer estimate of E and an estimate of its
error by subsampling. Specifically, in the spirit of Strong
et al. [33], we compute the mean Ē and standard error
δE in E values calculated on randomly drawn subsets
of a given size n (from the full data set of size N). We
then repeat for various n, plot Ē ± δE vs. N/n, and
fit with a line (all plots generated were roughly linear).
The value and uncertainty of the N/n = 0 intercept give
an estimate of E, extrapolated to infinite data, and a
measure of sampling error, respectively. The sampling
error estimated in this way for the data in Figure 1C is
0.001.
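A minimal sketch of this extrapolation (Python; the statistic and data are stand-ins — here the sample mean on synthetic data, whose infinite-data value is known):

```python
import numpy as np

def extrapolate(statistic, data, sizes, repeats=50, seed=1):
    """Compute a statistic on random subsets of size n, average over repeats,
    and linearly extrapolate the subsample means vs. N/n to the N/n = 0
    intercept (the infinite-data estimate)."""
    rng = np.random.default_rng(seed)
    N = len(data)
    x = [N / n for n in sizes]
    y = [np.mean([statistic(rng.choice(data, size=n, replace=False))
                  for _ in range(repeats)]) for n in sizes]
    slope, intercept = np.polyfit(x, y, 1)
    return intercept

# The sample mean is unbiased, so the intercept should recover the full-data
# mean (49.5 here) up to sampling noise.
data = np.arange(100.0)
print(extrapolate(np.mean, data, sizes=[20, 25, 50, 100]))
```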
RESULTS
All networks studied are evolvable
Using the methods described above, between 200 and
500 optimally informative model solutions were obtained,
and an evolvability score E was calculated for each of the
24 networks shown on the horizontal axis of Fig. 2. The
constraints were set to λ = 0.01 or 0.005, for an average
protein count of ∼100–200, and γ = 0.001, allowing a
maximum of about 3 orders of magnitude between up-
stream and reporter degradation rates. Solutions with
mutual information values below I = 2 bits were dis-
carded as not transmitting high enough information (for
scale, a solution with perfectly overlapping output dis-
tributions would have I = 0 bits, and a solution with 8
perfectly non-overlapping output states would have I = 3
bits).
Networks’ evolvability scores are shown in Fig. 2. All
24 networks had E values within 5% of 0.5 (recall that
E is bounded by 0 ≤ E ≤ 1), which means that, in all
cases, there is little correlation between change in func-
tion and change in parameters, suggesting that all net-
works studied are evolvable. Using other function dis-
tances, including other permutation distances between
the rank vectors, and a continuous distance measure de-
fined by averaging the Jensen-Shannon divergence [31]
between corresponding output distributions in the solu-
tion pair, produced similar results: E scores were very
near 0.5, indicating little correlation between functional
and parametric distances.
The claim that function has little dependence on pa-
rameters can be tested more rigorously by comparison
with a null hypothesis. The null hypothesis that func-
tion is independent of parameters was implemented in
two ways. First, given each network’s solution set, loca-
tions of solutions in parameter space were kept the same,
but the functions associated with each solution were ran-
domly permuted. Second, locations of solutions in pa-
rameter space were again kept the same, but functions
were drawn randomly from the set of possible functions
for each network [43]. In each case, the function reassign-
ment was performed many times, and the E value was
computed each time to produce a distribution of null E
scores. There was no correlation between the means or
variances of the networks’ null distributions and their ac-
tual E scores, so the individual null distributions were av-
eraged across networks. Averaged null distributions from
each of the two implementations are qualitatively similar,
and both are shown in Figure 2. All networks’ actual E
values lie well within both null distributions (the small-
est p-value is 0.023, and, with 24 networks, we expect at
least one to attain a p-value lower than 1/24 ≈ 0.04 sim-
ply by chance). This means that none of the networks’
solution sets significantly differ from a set in which the
[Figure 1 appears here. Panel A: the sample regulatory network and the table
of the 8 input states i defined by the presence (+) or absence (−) of the
inducers sA, sB, and sC. Panel B: the conditional distributions P(G|i) vs.
reporter expression G (proteins/cell) for two maximally informative solutions,
with rank vectors r = (4, 7, 1, 3, 6, 8, 2, 5) and r = (2, 5, 1, 4, 7, 8, 3, 6).
Panel C: function distance ∆f vs. parameter distance ∆θ for all solution pairs.]
FIG. 1: Defining evolvability. A Top: a sample regulatory network (see Figure 2 for diagrams of all 24 networks studied). A,
B, and C are genes whose transcription factors regulate each other’s expression according to the given network topology, and G
is a “reporter” gene, such as GFP. Sharp arrows indicate up-regulation, while blunt arrows indicate down-regulation (all arrows
are blunt in this network). sA, sB, and sC are chemical inducers that reduce the efficacy of the corresponding transcription
factors. Bottom: Table showing the 8 input states i that are defined by the presence or absence of each chemical inducer in
the cell (+ indicates presence, − indicates absence). In the model, the sA, sB, and sC are scale factors that are free parameters
(greater than 1, to effect an interference with transcriptional regulation) if the inducer is present, and are set to 1 if the inducer
is absent. B: Two maximally informative functions performed by the sample network at two different parameter settings.
Function is characterized by the order of the output distributions P(G|i), the probability of expressing G proteins per cell given
that the system is in input state i. Specifically, the function is quantified by the vector r of ranks of the P(G|i), as shown for
each function in the upper right corner. For example, in the top function, the first output distribution (i = 1) is ranked 4th,
the second (i = 2) is ranked 7th, the third (i = 3) is ranked 1st, and so on, so the rank vector is r = (4,7,1,...). C: Plot
of function distance ∆f [Eqn. (9)] vs. parameter distance ∆θ [Eqn. (8)] for all pairs of maximally informative model solutions
(340 solutions were used for this network). Function distance is scaled such that the swapping of two adjacent output states
from solution one to solution two gives ∆f = 1, and parameter distance is scaled such that the doubling of one parameter from
solution one to solution two gives ∆θ = 1. The point corresponding to the pair of functions in B is circled. The evolvability
score for this network, calculated from these data via Eqn. (10), is E = 0.482 ± 0.001.
function performed is independent of the setting of the
parameters.
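The first null implementation can be sketched as follows (Python; the solution set is synthetic stand-in data — random points in log-parameter space with random rank vectors as their functions):

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(2)

# Stand-in solution set: 40 points in 15-dimensional log-parameter space,
# each assigned a rank vector over the 8 input states as its "function".
log_theta = rng.random((40, 15))
ranks = np.array([rng.permutation(8) + 1 for _ in range(40)])

def score(log_theta, ranks):
    """E from all pairwise distances, Eqns. (8)-(10)."""
    ii, jj = np.triu_indices(len(log_theta), k=1)
    dtheta = np.linalg.norm(log_theta[ii] - log_theta[jj], axis=1)
    df = 0.5 * np.sum((ranks[ii] - ranks[jj])**2, axis=1)
    tau, _ = kendalltau(dtheta, df)
    return 1.0 - (tau + 1.0) / 2.0

actual = score(log_theta, ranks)

# Null: hold the parameter-space locations fixed and permute which function
# is assigned to which solution, recomputing E each time.
null = [score(log_theta, ranks[rng.permutation(len(ranks))]) for _ in range(100)]
p = np.mean(np.abs(np.array(null) - 0.5) >= abs(actual - 0.5))
print(actual, p)
```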
Even though all E values lie within the null distribu-
tion, only two lie above the null mean of 0.5; the prob-
ability of this happening by chance is 2 × 10−5. For a
network with E much larger than 0.5, the parameter
and the functional distances would be anti-correlated,
and the network function would evolve dramatically with
very small parameter changes. Thus the vast majority of
the networks studied show a statistically significant, yet
unintuitively small, positive correlation among the func-
tional and the parametric distance.
Despite the fact that the E values lie in a narrow range,
sampling errors are small (see Fig. 2), meaning that the
networks can be ranked with some confidence accord-
ing to their evolvability. We asked statistically whether
this ranking was correlated with any topological features
of the network, including the sign of the regulation of
each gene, the length and net sign of the feedback cy-
cle, and the total number of activators and repressors
in the network, both in and out of the cycle. Correla-
tion was tested for features with categorical values using
a Wilcoxon rank-sum test [34, 35] (for two categories)
or a Kruskal-Wallis H-test [36] (for more than two cate-
gories), and for features with real values using Kendall’s τ
[32]. No topological feature significantly correlated with
E. The lowest p-value was 0.04, and, since many correla-
tions were tested for at once, a Bonferroni correction [37]
showed that the likelihood of obtaining a p-value this low
simply by chance was 0.33. Thus we identified no topo-
logical aspect that significantly imparted higher or lower
evolvability to the networks.
Changing functions without losing functionality
As described in the previous section, we have found
that the networks studied organize their optimally infor-
mative solutions in parameter space in such a way that
change in function is largely independent of change in
parameters. We further demonstrate here that the net-
works can change from one function to another in param-
eter space without significant loss of the input-output in-
formation along the way. This further underscores the
evolvability of these networks, since it shows that ran-
dom steps in parameter space not only explore the full
variety of a network’s functions, but do so without sig-
nificant loss of fidelity. In the context of electric logical
circuits, such evolvability would correspond to an ability
to continuously change a logic gate from performing one
logical function to another while remaining a functional
gate in the interim.
[Figure 2 appears here: evolvability scores E (vertical axis, ∼0.47–0.53) for
the 24 networks, alongside the null distributions p(E).]
FIG. 2: Left: Evolvability scores E for all 24 regulatory net-
works studied. Networks are shown along the horizontal axis,
ranked by E (sharp arrows denote up-regulation, and blunt
arrows denote down-regulation). E values are calculated via
Eqn. (10), with error bars showing the sampling error, calcu-
lated as described in the text. Right: Two null distributions
generated according to the null hypothesis that the function
distance is independent of the parameter distance. The solid
line is the distribution of E scores calculated from solution
sets in which the locations in parameter space were held fixed,
and the function assignments were randomly permuted. The
dotted line is the distribution of E scores calculated from so-
lution sets in which the locations in parameter space were
held fixed, and the function assignments were drawn ran-
domly from the set of possible functions for the given network.
Both distributions are averages over the individual distribu-
tions for each network, as there was no correlation between
the means or variances of the individual distributions and the
networks’ E scores.
For each network, mutual information I [Eqn. (5)] was
calculated along straight-line paths in parameter space
between all solution pairs within a randomly chosen sub-
set of its optimally informative solutions. Examples of
these paths are shown in Figure 3A, for 10 solutions from
the inset network. The solutions at either end are local
maxima in I, and the paths show the loss in information
capacity the network would suffer if it were to move from
one solution to the other along a straight line in parame-
ter space. Some information loss is unavoidable: chang-
ing function requires reordering the output distributions
(see Figure 1B), which means overlapping at least two
of them at a time, and with 8 distributions the shift of
two distributions from fully separated to fully overlapped
incurs a minimum loss of 0.25 bits. Seven of the 10
functions corresponding to the 10 solutions in Figure 3A
are unique; at least 91% of the plotted paths involve a
change in function.
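The path computation can be sketched as follows (Python). Evaluating the true mutual information requires the full LNA model, so info_fn below is a toy stand-in with two 3-bit optima; only the bookkeeping — stepping along the straight line between two parameter sets and recording the minimum I0 — matches the procedure in the text:

```python
import numpy as np

def min_information_on_path(info_fn, theta1, theta2, n_steps=51):
    """Minimum of info_fn along the straight line in parameter space
    between two solutions theta1 and theta2."""
    return min(info_fn((1.0 - s) * theta1 + s * theta2)
               for s in np.linspace(0.0, 1.0, n_steps))

# Toy information landscape: two optima of 3 bits each, with information
# falling off linearly in log-parameter distance from the nearer optimum.
theta_a, theta_b = np.ones(15), 4.0 * np.ones(15)

def toy_info(theta):
    d = min(np.linalg.norm(np.log2(theta) - np.log2(theta_a)),
            np.linalg.norm(np.log2(theta) - np.log2(theta_b)))
    return max(3.0 - 0.5 * d, 0.0)

I0 = min_information_on_path(toy_info, theta_a, theta_b)
print(I0)  # the path dips below 3 bits between the two optima
```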
Nonetheless, we find that the loss in information suf-
fered in going between optimal solutions is surprisingly
minimal. The right panel of Figure 3A shows the dis-
tribution of minimal mutual information values I0 along
the paths for the inset network, and Figure 3B shows the
means and the standard deviations of I0 distributions
for all networks. For only a few networks do a significant
portion of the paths drop below 1.5 bits, and almost no
paths drop below 1 bit. We note in passing that the net-
works in Figure 3B are shown as in Figure 2, i.e. ranked
by evolvability score E, and so Figure 3B also demon-
strates that there is no significant correlation between I0
and E.
We emphasize that Figure 3B represents a lower bound
on minimum mutual information encountered in transi-
tioning between solutions. It is by no means necessary
(and is most likely biologically unrealistic) for a func-
tional change to proceed via such uniform changes in
biochemical parameters. It is more likely that there exist
transition paths that are better than the straight-
line paths, and that the optimal I0 distributions are
actually shifted higher in information than those gener-
ated here. Thus it is quite nontrivial (and it is further
testament to their evolvability) that even along direct
paths between optimal solutions these networks in most
cases do not drop below 1.5 bits of processing ability,
considering that the solutions themselves operate in the
range of ∼2–2.8 bits. A network can be evolving and
functional at the same time.
DISCUSSION
We have quantified the concept of evolvability in the
context of regulatory networks by introducing an in-
terpretable measure, and by probing the space of the
networks’ most informative functions. Our measure is
an anti-correlation between the amount of functional
change experienced by a network and the parametric
change required to effect it, such that more evolvable net-
works explore more diverse functions with smaller varia-
tion in their biochemical parameters. We have fully de-
fined functional and parametric distances (as well as the
characterization of ‘function’ itself) in the context of a
stochastic description of the experimental setup of Guet
et al. [4], and we have chosen a correlation measure that
is invariant to monotonic transformations in either defi-
nition.
We have found that all networks studied share the
property that functional change is largely independent of
parametric change, meaning that they are highly evolv-
able by our measure. This property holds for several dif-
ferent definitions of function distance. This means that
high-information functions are not organized in param-
eter space in such a way that similar functions are near
each other; instead nearby solutions are approximately as
likely to be similar in function as they are to be different
in function.
Furthermore, we have found that all networks studied
can transition among their maximally informative func-
[Figure 3 appears here. Panel A: mutual information I (bits) vs. normalized
distance along straight lines in parameter space from θ1 to θ2, with the
distribution p(I0) of path minima at right. Panel B: minimum mutual
information I0 (bits) for all networks studied.]
FIG. 3: Changing function without losing information. A
Left: Mutual information I along straight-line paths in pa-
rameter space between pairs of 10 randomly chosen optimally
informative model solutions for a particular network (inset).
For each path, the starting and ending solution’s locations in
parameter space are denoted θ1 and θ2, respectively, on the
horizontal axis. The minimum mutual information I0 along
each path is marked with a triangle. A specific function is
performed at each of the 10 solutions (as characterized in
Methods); 7 of the 10 functions are unique. Right: Distribu-
tion of I0values built from paths between 37 randomly chosen
solutions for the inset network, of which the 10 solutions used
for the left plot are a subset. B: Means (circles) and stan-
dard deviations (error bars) of I0 distributions like that in A
(right), for all networks studied; 37 randomly chosen solutions
were used to build each network’s distribution. Networks are
shown on the horizontal axis, in the same order as in Figure
2, i.e. ranked by evolvability score E.
tions without significant loss of information in the pro-
cess. Along straight-line paths in parameter space be-
tween functions (with mutual information values in the
range ∼2 − 2.8 bits), mutual information remains above
∼2 bits on average and very rarely drops below 1 bit.
Moreover, these values represents a lower bound, since
transition paths need not be straight. This suggests that
the networks can evolve without losing functionality in
the process, which resonates with the idea from evolu-
tionary biology that evolution happens not by crossing
high fitness barriers (low-information solutions in our
case), but by finding neutral paths [38].
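The minimum-information-along-a-path computation just described can be sketched compactly. The `info_fn` below is a hypothetical stand-in for the model's mutual information computation at a given parameter vector; the sketch samples I along the straight line between two solutions and returns the path minimum I0.

```python
import numpy as np

def min_information_on_path(theta1, theta2, info_fn, n_steps=101):
    """Sample mutual information along the straight line from theta1
    to theta2 in parameter space and return the minimum value, I0."""
    ts = np.linspace(0.0, 1.0, n_steps)
    # Linearly interpolate between the two parameter vectors
    return min(info_fn((1.0 - t) * theta1 + t * theta2) for t in ts)
```

Because the line is only one of many possible transition paths, the I0 it yields is a lower bound on the information retained by the best path between the two solutions.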
Ultimately we have uncovered two important properties of the regulatory networks described by our model: (a) high-information solutions do not cluster by function, and (b) transitions among solutions are possible without significant loss of fidelity. Both of these properties underscore the high evolvability of the networks studied. It is possible that these properties are general characteristics of a class of systems extending beyond small transcriptional regulatory networks, particularly systems governed by a large number of tunable parameters. However, we argue that these properties are especially relevant here, as they are critical to a quantitative description of the capacity of biological networks to evolve.
We are grateful to the organizers, participants, and sponsors of The Second q-bio Conference in Santa Fe, New Mexico, where a preliminary version of this work was presented. AM was supported by NSF Grant DGE-0742450. IN was supported by DOE under Contract No. DE-AC52-06NA25396 and by NSF Grant No. ECS-0425850.
∗Electronic address: ajm2121@columbia.edu
†Electronic address: ez87@columbia.edu
‡Electronic address: nemenman@lanl.gov
§Electronic address: chris.wiggins@columbia.edu
[1] S. S. Shen-Orr, R. Milo, S. Mangan, and U. Alon, Nat
Genet 31, 64 (2002).
[2] S. Mangan and U. Alon, Proc Natl Acad Sci USA 100,
11980 (2003).
[3] M. Kollmann, L. Løvdok, K. Bartholomé, J. Timmer,
and V. Sourjik, Nature 438, 504 (2005).
[4] C. C. Guet, M. B. Elowitz, W. Hsing, and S. Leibler,
Science 296, 1466 (2002).
[5] E. Ziv, I. Nemenman, and C. H. Wiggins, PLoS ONE 2,
e1077 (2007).
[6] M. E. Wall, M. J. Dunlop, and W. S. Hlavacek, J Mol
Biol 349, 501 (2005).
[7] C. A. Voigt, D. M. Wolf, and A. P. Arkin, Genetics 169,
1187 (2005).
[8] M. Ptashne and A. Gann, Curr Biol 8, R897 (1998).
[9] H. Kitano, Nat Rev Genet 5, 826 (2004).
[10] N. E. Buchler, U. Gerland, and T. Hwa, Proc Natl Acad
Sci USA 100, 5136 (2003).
[11] S. Braunewell and S. Bornholdt, Phys. Rev. E 77, 60902
(2008).
[12] N. Kashtan and U. Alon, Proc Natl Acad Sci USA 102,
13773 (2005).
[13] A. Mugler, E. Ziv, I. Nemenman, and C. Wiggins, IET
Systems Biol 2, 313 (2008).
[14] M. B. Elowitz, A. J. Levine, E. D. Siggia, and P. S. Swain,
Science 297, 1183 (2002).
[15] M. Thattai and A. van Oudenaarden, Proc Natl Acad
Sci USA 98, 8614 (2001).
[16] M. Acar, A. Becskei, and A. van Oudenaarden, Nature
435, 228 (2005).
[17] J. M. Pedraza and A. van Oudenaarden, Science 307,
1965 (2005).
[18] V. Shahrezaei and P. S. Swain, Proc Natl Acad Sci USA
(2008).
[19] J. E. Hornos, D. Schultz, G. C. Innocentini, J. Wang,
A. M. Walczak, J. N. Onuchic, and P. G. Wolynes, Phys.
Rev. E 72, 51907 (2005).
[20] G. Tkacik, C. G. Callan, and W. Bialek, Proc Natl Acad
Sci USA 105, 12265 (2008).
[21] T. Doan, A. Mendez, P. B. Detwiler, J. Chen, and
F. Rieke, Science 313, 530 (2006).
[22] N. G. van Kampen, Stochastic processes in physics and
chemistry (Amsterdam: North-Holland, 1992).
[23] D. T. Gillespie, J Phys Chem 81, 2340 (1977).
[24] J. Paulsson, Nature 427, 415 (2004).
[25] J. Elf and M. Ehrenberg, Genome Res 13, 2475 (2003).
[26] M. B. Elowitz and S. Leibler, Nature 403, 335 (2000).
[27] T. S. Gardner, C. R. Cantor, and J. J. Collins, Nature
403, 339 (2000).
[28] J. Hasty, D. McMillen, F. Isaacs, and J. J. Collins, Nat
Rev Genet 2, 268 (2001).
[29] C. E. Shannon, Proc IRE 37, 10 (1949).
[30] N. Barkai and S. Leibler, Nature 387, 913 (1997).
[31] J. Lin, IEEE Trans Inf Theory 37, 145 (1991).
[32] M. G. Kendall, Biometrika 30, 81 (1938).
[33] S. P. Strong, R. Koberle, R. R. de Ruyter van Steveninck,
and W. Bialek, Phys Rev Lett 80, 197 (1998).
[34] H. B. Mann and D. R. Whitney, Ann Math Stat 18, 50
(1947).
[35] F. Wilcoxon, Biometrics Bull 1, 80 (1945).
[36] W. H. Kruskal and W. A. Wallis, J Am Stat Assoc 47,
583 (1952).
[37] N. Salkind, Encyclopedia of Measurement and Statistics
(Thousand Oaks, CA: Sage, 2007).
[38] E. van Nimwegen and J. P. Crutchfield, Bull Math Biol
62, 799 (2000).
[39] M. G. Kendall, Rank Correlation Methods (Charles Griffin, 1990).
[40] In networks in which the overall sign of the feedback cycle is negative, there can exist parameter values that support multiple stable fixed points. This would correspond to one or more of the output distributions being multimodal. Since we effectively minimize overlap of output states by optimizing information transmission, such solutions are rare (13% occurrence in all negative-feedback networks). When they do occur, we equally weight each fixed point in constructing the multimodal Gaussian output, and continue to define the function vector r by the ranks of the means of the output distributions.
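The equal weighting of fixed points in [40] amounts to a uniform Gaussian mixture. Below is a minimal sketch of that construction; the component means and variances are hypothetical inputs standing in for the model's stable fixed points and their fluctuations.

```python
import numpy as np

def multimodal_gaussian(g, means, variances):
    """Equally weighted Gaussian mixture output distribution:
    one component per stable fixed point, each weighted 1/K."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Density of each component evaluated at output value g
    comps = (np.exp(-(g - means) ** 2 / (2.0 * variances))
             / np.sqrt(2.0 * np.pi * variances))
    # Uniform mixture: average the K component densities
    return float(comps.mean())
```

With a single fixed point this reduces to an ordinary Gaussian; with several, each mode contributes equally regardless of its basin of attraction.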
[41] If two solutions from the same local information maximum are treated as distinct, they will have the same function but (slightly) different parameters; this will artificially lower E. To correct for this effect, we merge (at their mean parameter location) nearest neighbors whose functions are the same until all nearest neighbors have different functions. This procedure reduced networks' solution sets by at most ∼10%.
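The merging procedure in [41] can be sketched as a simple loop. The representation below (parameter vector paired with a hashable function label) is an assumption for illustration, not the paper's data structure.

```python
import numpy as np

def merge_same_function_neighbors(solutions):
    """solutions: list of (theta, function) pairs, theta a 1-D ndarray.
    Merge nearest-neighbor pairs that perform the same function (at
    their mean parameter location) until every solution's nearest
    neighbor performs a different function."""
    sols = list(solutions)
    while len(sols) > 1:
        merged = False
        for i in range(len(sols)):
            # Find solution i's nearest neighbor j in parameter space
            d, j = min((np.linalg.norm(sols[i][0] - sols[k][0]), k)
                       for k in range(len(sols)) if k != i)
            if sols[i][1] == sols[j][1]:
                # Same function: replace the pair by its midpoint
                pair = (0.5 * (sols[i][0] + sols[j][0]), sols[i][1])
                sols = [s for k, s in enumerate(sols) if k not in (i, j)]
                sols.append(pair)
                merged = True
                break
        if not merged:
            break  # all nearest neighbors now differ in function
    return sols
```

Each merge strictly shrinks the solution set, so the loop terminates; duplicates of one local maximum collapse to a single representative.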
[42] Many sources (including MATLAB's built-in corr) use an adjustment to the calculation of τ in the case of tied data (see e.g. [39]). In keeping with the interpretation of our statistic as a probability, we do not introduce an adjustment; we simply count each tied pair as neither concordant nor discordant (i.e. if, for example, in computing the fraction of concordant pairs, we assigned each concordant pair a 1 and each discordant pair a 0, a tied pair would count as 0.5).
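The tie convention in [42] is concrete enough to state in code. The sketch below computes Kendall's τ over all n(n−1)/2 pairs with no tie adjustment: each tied pair contributes 0.5 to the concordant fraction, keeping that fraction interpretable as a probability.

```python
from itertools import combinations

def kendall_tau_unadjusted(x, y):
    """Kendall's tau with no tie adjustment: a tied pair counts as
    neither concordant nor discordant, i.e. it contributes 0.5 to
    the fraction of concordant pairs."""
    idx = list(combinations(range(len(x)), 2))
    conc = 0.0
    for i, j in idx:
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            conc += 1.0   # concordant pair
        elif s == 0:
            conc += 0.5   # tied pair: split evenly
        # discordant pairs contribute 0
    frac = conc / len(idx)    # fraction of concordant pairs (a probability)
    return 2.0 * frac - 1.0   # map [0, 1] onto the usual [-1, 1] scale
```

Because ties add 0.5 rather than rescaling the denominator, this equals (C − D)/(n(n−1)/2), unlike the tie-adjusted τ-b that MATLAB's corr returns.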
[43] Not all 8! rankings of the output distributions are allowed functions for a given network. As shown in previous work [13], the topology of the network constrains the set of possible steady-state functions. Specifically, since each gene is regulated by one other gene, allowed functions are "direct" functions: those in which the output distribution responds to a change in inducer concentration according to the direct path from inducer to reporter (i.e., ignoring feedback pathways). For example, for the network in Fig. 1A, in going from state [− − −] (i = 1) to [− + −] (i = 3), sB increases; the direct path from sB to G consists of a repression–repression–repression chain, which is net repressive, so the output distribution must decrease (as it does in both panels of Fig. 1B). With 3 inducers, there are 48 direct functions for each network; this is the set from which functions are randomly drawn in the second implementation of the null hypothesis.