Measuring Causal Specificity
Paul E. Griffiths1, Arnaud Pocheville1, Brett Calcott2, Karola Stotz3,
Hyunju Kim4 and Rob Knight5
1Department of Philosophy and Charles Perkins Centre, University of Sydney, NSW
2006, Australia
2ASU/SFI Center for Complex Biosocial Systems School of Life Sciences, Arizona State
University, Tempe, AZ 85287, USA
3Department of Philosophy, Macquarie University, NSW 2109, Australia
4Beyond Center for Fundamental Concepts in Science, Arizona State University, Tempe,
AZ 85287, USA
5Howard Hughes Medical Institute, and Departments of Chemistry & Biochemistry and
Computer Science, and BioFrontiers Institute, University of Colorado at Boulder,
Boulder, CO 80309, USA
March 18, 2015
Philosophy of Science, in press
Abstract
Several authors have argued that causes differ in the degree to which they
are ‘specific’ to their effects. Woodward has used this idea to enrich his
influential interventionist theory of causal explanation. Here we propose a
way to measure causal specificity using tools from information theory. We
show that the specificity of a causal variable is not well-defined without a
probability distribution over the states of that variable. We demonstrate the
tractability and interest of our proposed measure by measuring the specificity
of coding DNA and other factors in a simple model of the production of
mRNA.
Acknowledgements
This publication was made possible through the support of a grant from
the Templeton World Charity Foundation. The opinions expressed in this
publication are those of the author(s) and do not necessarily reflect the views
of the Templeton World Charity Foundation. Brett Calcott was supported
by Joshua Epstein’s NIH Director’s Pioneer Award, Number DP1OD003874
from the Office of the Director, National Institutes of Health. The paper is
the result of a workshop held at the University of Colorado, Boulder, CO
with support from Templeton World Charity Foundation. BC, PG, AP and
KS wrote the manuscript, and all authors agreed on the final content. We
would like to thank two anonymous referees for their helpful comments.
1 Causal Specificity
Several authors have argued that causes differ in the degree to which they
are ‘specific’ to their effects. The existing literature on causal specificity is
mostly qualitative and recognizes that the idea is not yet adequately precise
(e.g. Waters, 2007; Weber, 2006, 2013; Woodward, 2010). Marcel Weber has
suggested that the next step should be a quantitative measure of specificity
(2006, 606). In this article we examine how to measure specificity using tools
from information theory.
Causal specificity is often introduced by contrasting the tuning dial and
the on/off switch of a radio. Hearing the news is equally dependent on the
dial (or digital tuner) taking the value ‘576’ and on the switch taking the
value ‘ON’. But the dial seems to have a different kind of causal relationship
with the news broadcast than the switch. The switch is a non-specific cause,
whereas the dial (or digital tuner) is a specific cause. The difference has
something to do with the range of alternative effects that can be produced
by manipulating the tuner, as opposed to manipulating the switch.
Another widely discussed example of specific and non-specific causes con-
trasts a coding sequence of DNA with other factors involved in DNA tran-
scription and translation (e.g. Waters, 2007). But this example has to be
carefully tailored to produce the desired intuition about specificity (Griffiths
& Stotz, 2013). In Section 5 we will show that the causal specificity of coding
sequences of DNA differs dramatically in different cases.
Like most of the recent literature, our account of causal specificity makes
use of Woodward’s interventionist theory of causal explanation (Woodward
2003). We will give only the briefest summary of Woodward’s theory here,
since it should be well known to the presumptive audience for this paper and
Woodward has provided a succinct and readily accessible summary online
(Woodward 2012). Woodward construes causation as a relationship between
variables in a scientific representation of a system. There is a causal relationship between variables X and Y if it is possible to manipulate the value of Y by intervening to change the value of X. ‘Intervention’ here is a technical notion with various restrictions. For example, changing a third variable Z that simultaneously changes X and Y does not count as ‘intervening’ on X. Causal relationships between variables differ in how ‘invariant’ they are. Invariance is a measure of the range of values of X and Y across which the relationship between X and Y holds. But even relationships with very small ranges of invariance are causal relationships.
Both Kenneth Waters (2007) and Woodward (2010) have suggested that
causal specificity is related to ‘causal influence’ (Lewis 2000 and see Sec-
tion 2). A causal variable has ‘influence’ on an effect variable if a range of
values of the cause produces a range of values of the effect, as in the example
of the tuner. However, whilst Lewis proposed that ‘influence’ distinguishes
causes from non-causes, for Woodward it merely marks out causes that are
particularly apt for intervention.
Although Woodward (2010) gives the most complete account of speci-
ficity to date there remains much to be done, as he recognizes. Marcel We-
ber has suggested that causal specificity is merely a variety of Woodward’s
invariance. A variable is a more specific cause of some other variable, Weber
suggests, to the extent that the causal relationship between cause and effect
variables is invariant across the range of values of both variables, and to the
extent that the two variables have large ranges of values (Weber, 2006, 606).
Woodward disagrees, arguing that a causal relationship with these proper-
ties may fail to meet some of the other conditions we discuss below, such as
being a bijective1 function from cause variable to effect variable (Woodward, 2010 fn17). An attempt to quantify specificity is one obvious way to move discussion forward. As we will see below, the points that Weber and Woodward are making become much clearer when expressed using a quantitative measure.

1 A function mapping causes to effects will be injective if no effect has more than one cause; surjective if every effect has at least one cause; and bijective if it is both injective and surjective, so that every effect has one and only one cause, and vice versa.
A skeptical reader may wonder why the apparently elusive notion of
causal specificity deserves such effort. Our motivation is the same as that of
Waters and Weber: clarifying the notion of causal specificity may elucidate
the notion of biological specificity, and facilitate the study of specificity in
actual biological systems. The term ‘specificity’ entered biology in the 1890s
in response to the extraordinary precision of biochemical reactions, such as
the ability to produce an immune response to a single infective agent, or the
ability of an enzyme to interact with just one substrate. By the 1940s biolog-
ical specificity had come to be identified with the precision of stereochemical
relationships between biomolecules. In 1958, however, Francis Crick’s theo-
retical breakthrough in understanding protein synthesis introduced a com-
plementary conception of specificity, sometimes referred to as ‘informational specificity’. Stereochemical specificity results from the unique, complex three-dimensional structure of a molecule that allows some molecules but not others to bind to it and interact. In contrast, informational specificity is produced
by exploiting combinatorial complexity within a linear sequence, which can
be done with a relatively simple and homogenous molecule such as DNA (see
Griffiths & Stotz 2013, Ch3).
The notion of causal specificity in philosophy of science was not intro-
duced with any a priori assumption that it is the same thing as biological
specificity. However, Waters has used the idea of causal specificity to argue
that DNA encodes biological specificity for gene products, unlike other fac-
tors involved in making those products (Waters, 2007). In contrast, Stotz
and Griffiths have used causal specificity to argue that the biological speci-
ficity for a gene product is distributed across several of these factors (Griffiths
& Stotz, 2013; Stotz, 2006).
A merely intuitive approach to causal specificity is unlikely to be helpful
in settling disputes like this. In Section 5 we show that a quantitative ap-
proach may allow a more definitive resolution. At the very least, it makes
clear which assumptions are driving the different conclusions reached by the
protagonists.
2 Specificity and Information
Causal specificity has been characterized by Woodward as a property of the
mapping between causes and effects:
My proposal is that, other things being equal, we are inclined to think of C as having more rather than less influence on E (and as a more rather than less specific cause of E) to the extent that it is true that:

(INF) There are a number of different possible states of C (c1 ... cn), a number of different possible states of E (e1 ... em) and a mapping F from C to E such that for many states of C each such state has a unique image under F in E (that is, F is a function or close to it, so that the same state of C is not associated with different states of E, either on the same or different occasions), not too many different states of C are mapped onto the same state of E and most states of E are the image under F of some state of C. (Woodward, 2010, 305)
We propose to quantify Woodward’s proposal that a cause becomes more
specific as the mapping of cause to effect resembles a bijection.
We start from the simple idea that the more specific the relationship
between a cause variable and an effect variable, the more information we
will have about the effect after we perform an intervention on the cause.
Starting from this idea, we can apply the tools of information theory to
measure some properties of causal mappings that relate values of the cause
to values of the effect. For simplicity, we restrict ourselves to variables that
take nominal values, with no obvious metric relating the diverse values.2

2 Variants of our approach to causal specificity are possible for metric variables. The analysis of variance, for example, gives measures that are respectively equivalent to entropy, conditional entropy and mutual information. The information theoretic approach taken here is more general, but the analysis of variance retains more information about the metric (see Garner & McGill 1956 for a comparison). Information theoretic variants have also been developed to deal with continuous variables (e.g. Reshef et al. 2011; Ross 2014).
One property we can measure in this way is Woodward’s INF. Rather than
describing a relationship as injective or bijective, information theory allows
us to express the tendency towards a bijective relationship as a continuous
variable. Thus, our informational measure of specificity will preserve the
essence of Woodward’s proposal while allowing this desirable flexibility.
We use the term ‘information’ in the classic sense of a reduction of un-
certainty (Shannon & Weaver 1949). In information theory, the uncertainty
about an event can be measured by the entropy of the probability distribu-
tion of events belonging to the same class (see Box 1). Uncertainty about
the outcome of throwing a die is measured by the entropy of the probability
distribution of the six possible outcomes. Maximum entropy occurs when all
six faces of the die have equal probabilities. If the die is loaded, the entropy
is smaller and there is less uncertainty about the outcome, because one side
is more probable than the others.
Applying this framework to a causal relationship allows one to measure
how much knowing the value set by an intervention on a causal variable re-
duces one’s uncertainty about the value of an effect variable. We can measure
this reduction of uncertainty by comparing the entropy of the probability dis-
tribution of the value of the effect before and after knowing the value of the
cause set by an intervention. The more the difference in entropies, the more
our uncertainty has been reduced. The maximum reduction of uncertainty
occurs when we start from complete ignorance (i.e., maximum entropy) and
when, after knowing the value of the cause set by an intervention, we end up
with a completely specified value for the effect (null entropy), for instance when a die is so heavily loaded that it always comes up 6.
Box 1. A primer on information theory
Information theory provides us with tools to measure uncertainty, and to mea-
sure the reduction of that uncertainty. Importantly, for our purposes, it tells
us how information about the value of one variable can reduce the uncertainty
about the value of another, related, variable.
The simplest case occurs when a discrete variable has only two values, which can then be known by answering a single question (e.g. by yes or no). The answer is said to convey one unit of information (a bit). If the set of possible values for the variable now contains 2^n equally likely elements, we can remark that n dichotomous questions (n bits) are needed to determine the actual value of the variable. The quantity of information contained in knowing the actual value is thus n = \log_2(2^n). If we adopt a probabilistic framework where each possible value has equal probability p = 1/2^n, we can say that knowing any actual value of the variable brings -\log_2 p bits of information. When the values are not equiprobable, the average information gained by knowing an actual value of the variable is measured as an average over the probabilities of the different values. This quantity is the entropy of the probability distribution of the variable, defined as:

H(X) = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i)

where the x_i represent values of the variable X and N is the number of different values. Entropy measures the uncertainty about the value of the variable and is always non-negative. Uncertainty is maximised (maximum entropy) when each value is equiprobable. Departing from uniformity will always make one (or more) values more probable, and so decrease uncertainty. In a similar way, increasing the number of possible values will increase uncertainty. All of the above can be generalized to cases where the number of possible values is not a power of 2.

If X and Y are two random variables (with respectively N and M different values, noted x_i, y_j), we can define the entropy of the couple (X, Y):

H(X, Y) = -\sum_{i=1}^{N} \sum_{j=1}^{M} p(x_i, y_j) \log_2 p(x_i, y_j)

This enables us to define the conditional entropy, representing the amount of uncertainty remaining on Y when we already know X:

H(Y | X) = H(X, Y) - H(X) = -\sum_{i=1}^{N} p(x_i) \sum_{j=1}^{M} p(y_j | x_i) \log_2 p(y_j | x_i)

In a similar way, the mutual information, that is, the amount of redundant information present in X and Y, is obtained by:

I(X; Y) = H(X) + H(Y) - H(X, Y) = \sum_{i=1}^{N} \sum_{j=1}^{M} p(x_i, y_j) \log_2 \frac{p(x_i, y_j)}{p(x_i) p(y_j)}

Mutual information can be thought of as the amount of information that one variable, X, contains about the other, Y (normalized variants of mutual information are available).

Conditional entropy is null, and mutual information is maximal, when Y is completely determined by X. Note that conditional entropy is generally asymmetric while mutual information is always symmetric:

H(X | Y) \neq H(Y | X)
I(X; Y) = I(Y; X)

The relationships between these three different measures are represented in figure 1. See Cover and Thomas (2012) for more detail.

Figure 1: Diagram of the relationships between the different informational measures: entropy H(X), conditional entropy H(X|Y) and mutual information I(X;Y).
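These definitions are straightforward to compute. The following minimal Python sketch (our own illustration; the helper names entropy, conditional_entropy and mutual_information are ours) implements them for a discrete joint distribution given as a table p(x_i, y_j):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of any table of probabilities (zeros ignored)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                      # convention: 0 log 0 = 0
    return -np.sum(p * np.log2(p))

def conditional_entropy(pxy):
    """H(Y|X) = H(X,Y) - H(X), where pxy[i, j] = p(x_i, y_j)."""
    pxy = np.asarray(pxy, dtype=float)
    return entropy(pxy) - entropy(pxy.sum(axis=1))

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    pxy = np.asarray(pxy, dtype=float)
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy)

# The dice example: a fair die has maximum entropy, a loaded die less.
print(entropy([1/6] * 6))                          # ~2.585 bits
print(entropy([0.5, 0.1, 0.1, 0.1, 0.1, 0.1]))     # ~2.161 bits
```

Because entropy flattens its argument, the same function gives the joint entropy H(X, Y) when applied to the whole table, which is all the other two measures need.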
These ideas can be illustrated with simple diagrams showing how different
values of a causal variable (C) map to different values of an effect variable
(E). We draw the reader’s attention to the fact that these diagrams are
causal mappings rather than conventional causal graphs. Nodes represent
values of variables, rather than variables, as they would in a causal graph.
Likewise, arrows do not represent causal connections between variables, as
they would in a causal graph. An arrow connecting a value of a cause to
a value of an effect means that interventions which set the cause to that
value will lead to the effect having that value, with some probability. For
instance, the arrow stemming from c_i and pointing to e_j corresponds to the joint event (\hat{c}_i, e_j) with probability p(\hat{c}_i, e_j). The hat in the formula means that the value c_i is fixed by an ‘atomic’ intervention (see Box 2).
For ease of presentation, we will make some simplifying assumptions:
1. We consider only cases where we start from complete ignorance about
the effect (maximum entropy).
2. We assume that all causal values, arrows, and effect values are equiprobable.
3. We consider only cases relating one cause and one effect, ruling out the
possibility of confounding factors. However, the same measures could
be used in cases with confounding factors, as atomic interventions on
the causal value will break the confounding influence of such factors
on the association between values of the cause and values of the effect.
The simplest case is a bijection, where each value of the cause corresponds
to one value of the effect and vice versa (see figure 2). Here, complete
ignorance (maximum entropy) obtains when each value of the effect has a probability of 1/2 before knowing the value set by the intervention on the cause:

H(E) = -\sum_{j=1}^{2} p(e_j) \log_2 p(e_j) = -\sum_{j=1}^{2} \frac{1}{2} \log_2 \frac{1}{2} = 1 bit

After knowing the value of the cause set by the intervention (say, \hat{c}_1), the effect is now fully specified (it is e_1 with probability 1), and the conditional entropy is:

H(E | \hat{C}) = -\sum_{i=1}^{2} p(\hat{c}_i) \sum_{j=1}^{2} p(e_j | \hat{c}_i) \log_2 p(e_j | \hat{c}_i) = -\sum_{i=1}^{2} \frac{1}{2} \{ 1 \log_2(1) + 0 \log_2(0) \} = 0 bits

The information gained by knowing the cause can be obtained by measuring the difference between the entropy before and the entropy after knowing the value set for the cause by the intervention. This quantity is the mutual information between E and \hat{C}:

I(E; \hat{C}) = H(E) - H(E | \hat{C}) = 1 bit

These three quantities, H(E), H(E | \hat{C}), and I(E; \hat{C}), characterize interesting properties of the causal mapping above. The entropy H(E) measures how large and even the repertoire of possible effects is. It is the amount of information that can be gained by totally specifying an effect among a set of possible effects (here, this is one bit). The conditional entropy H(E | \hat{C}) characterizes the remaining uncertainty about an effect when the value set for the cause is known (here the effect is fully specified, so the uncertainty is 0 bits). Finally, the mutual information I(E; \hat{C}) measures the extent to which knowing the value set for the cause specifies the value of the effect (here, knowing the value of the cause brings 1 bit of information).
Another simple case is where any value of the cause can lead to any value
of the effect (see figure 3). We only present this as a limiting case, because
Box 2: Causal modeling
Causal modeling provides us with the tools to track the effects of interventions
on a system. Where statistical modeling would look at statistical associations
between supposed causes and supposed effects, causal modeling introduces the
requirement of intervening on the system to compute the causal effect. More
precisely, consider a causal model consisting of:
1. a set of functional relationships x_i = f(pa_i, u_i), i = 1 ... n, where x_i is the value of the variable X_i being caused by X_i's parent variables pa_i, according to some function f, given some background conditions u_i;

2. a joint distribution function P(u) on the background factors.

Then the simplest ‘atomic’ intervention consists in forcing X_i to take some value x_i irrespective of the value of the parent variables pa_i, keeping everything else unchanged. Such an intervention can be written formally with the do() operator. As Pearl writes: “Formally, this atomic intervention, which we denote by do(X_i = x_i) or do(x_i) [or \hat{x}_i] for short, amounts to removing the equation x_i = f(pa_i, u_i) from the model and substituting X_i = x_i in the remaining equations. The new model, when solved for the distribution of X_j, yields the causal effect of X_i on X_j, which is denoted P(x_j | \hat{x}_i).” (Pearl 2009, 70)

The causal effect P(x_j | \hat{x}_i) is to be contrasted with the observational conditional probability P(x_j | x_i), which can be affected by confounding factors leading to spurious associations or spurious independence.

Other recent works in mathematics and computer sciences have brought information theory together with causal modeling to study information processing in complex systems (Ay & Polani, 2008; Lizier & Prokopenko, 2010). These works also build on Pearl (2009), and are consistent with the work presented here. However, our approach and measures are significantly different, reflecting the fact that we start from a concern with ‘causal selection’ in a context of intervention and control. The differences between these approaches will be explored in a future paper.

See Pearl (2009, esp. chapter 3) for more details.
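The contrast between the observational probability P(x_j | x_i) and the causal effect P(x_j | \hat{x}_i) can be made vivid with a toy simulation (our own illustration, not Pearl's): a background factor U drives both X and Y, while X itself has no effect on Y.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

def simulate(do_x=None):
    """Toy structural model: U -> X and U -> Y, with no arrow from X to Y.
    Passing do_x forces X to that value (an atomic intervention), which
    removes the equation X := U while leaving the rest of the model intact."""
    u = rng.integers(0, 2, N)                      # background factor
    x = u if do_x is None else np.full(N, do_x)    # X := U, unless intervened on
    y = u                                          # Y := U (X plays no role)
    return x, y

# Observational conditionals: confounding by U creates a spurious association.
x, y = simulate()
for v in (0, 1):
    print("P(y=1 | x=%d)     ~ %.2f" % (v, y[x == v].mean()))

# Interventional probabilities: identical for both settings of X.
for v in (0, 1):
    _, y_do = simulate(do_x=v)
    print("P(y=1 | do(x=%d)) ~ %.2f" % (v, y_do.mean()))
```

Observationally Y appears to track X perfectly; under do() the dependence disappears, which is the sense in which X is not a cause of Y on the interventionist account.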
Figure 2: Bijection between causal values and effect values.
Figure 3: Any value of the cause can lead to any value of the effect.
manipulating the value of C between c_1 and c_2 would have no effect on the value of E, and so C is not a cause of E on the interventionist account. In this case, as in the previous case:

H(E) = -\sum_{j=1}^{2} \frac{1}{2} \log_2 \frac{1}{2} = 1 bit

Because in this case knowing the value set by an intervention on C gives no information about the value of E, the conditional entropy H(E | \hat{C}) is equal to H(E) (our uncertainty is unchanged):

H(E | \hat{C}) = -\sum_{i=1}^{2} \frac{1}{2} \sum_{j=1}^{2} \frac{1}{2} \log_2 \frac{1}{2} = 1 bit

Thus, the information gained by knowing the value set for C is nil (C is entirely non-specific):

I(E; \hat{C}) = H(E) - H(E | \hat{C}) = 0 bits

Notice that we can approach this null mutual information as a limit of a genuine cause whose different values make decreasingly small differences
Figure 4: A single value of the cause can lead to more than one
value of the effect.
as regards the value of the effect. This implies that specificity and the
interventionist criterion of causation are not fully independent.
These two cases, bijection (figure 2) and exhaustive connection (figure 3)
illustrate limit cases of Woodward’s ‘degree of bijectivity’ of causal mappings.
We can go further by examining two slightly more complicated cases.
The first is where each value of a cause leads to a proper set of values of the
effect (see figure 4). In this case the maximum uncertainty about the effect
is larger:
H(E) = -\sum_{j=1}^{4} \frac{1}{4} \log_2 \frac{1}{4} = 2 bits

Furthermore, knowing the cause less than fully specifies the effect. Assuming equiprobability between the two effect values that can be produced by a single value of the cause, the conditional entropy H(E | \hat{C}) is:

H(E | \hat{C}) = -\sum_{i=1}^{2} p(\hat{c}_i) \sum_{j=1}^{4} p(e_j | \hat{c}_i) \log_2 p(e_j | \hat{c}_i) = -\sum_{i=1}^{2} \frac{1}{2} \left\{ 2 \left( \frac{1}{2} \log_2 \frac{1}{2} \right) + 2 \left( 0 \log_2 0 \right) \right\} = 1 bit

Thus, the information about the effect gained by knowing the cause is:

I(E; \hat{C}) = H(E) - H(E | \hat{C}) = 1 bit
Figure 5: Different values of the cause lead to the same outcome.
Notice that knowing the value of the cause provides as much information
about the effect as in figure 2, but because the repertoire of effects is larger,
the remaining uncertainty H(E | \hat{C}) is no longer null. The repertoire
of effects will be larger if, for instance, we increase the level of detail when
describing effects (compare a game of dice based on odd versus even outcomes
to a game based on the values of the six individual faces).
Let us now consider the symmetric case (see figure 5). As in figure 2 and
figure 3, if we suppose complete ignorance of the effects:
H(E) = 1 bit

Although in figure 5 two values of the cause can lead to the same effect, knowing the value of the cause fully specifies the value of the effect just as effectively as it does in figure 2. Thus:

H(E | \hat{C}) = 0 bits

Therefore, the difference in uncertainty about the effect between not knowing the value of the cause and knowing it is:

I(E; \hat{C}) = 1 bit
Here again, knowing the cause provides as much information about the
effects as in figure 2, but because the repertoire of states of the causal variable
is now larger, some values lead to the same effects (this can happen if we
increase the level of detail in our description of the cause). Notice that this
will not matter if we are interested in controlling the value of the effect:
applying c_1 or c_2 will deterministically lead to e_1.
Furthermore, we can distinguish between figure 2 and figure 5 if we introduce a fourth quantity, that is, the entropy characterizing the repertoire of the cause, which in these two cases is the maximum entropy. In figure 2 the entropy H(\hat{C}) = 1 bit, whereas in figure 5, H(\hat{C}) = -\sum_{i=1}^{4} \frac{1}{4} \log_2 \frac{1}{4} = 2 bits.
Thus, both the conditional entropy H(E | \hat{C}) and the mutual information I(E; \hat{C}) capture aspects of the intuition that causes differ in ‘specificity’. Because the prior uncertainty H(E) is not constant (it depends in particular on the size of the repertoire of effects), both measures are needed to understand how much a cause specifies an effect (this is given by I(E; \hat{C})) and how much an effect is specified when knowing the value set for the cause (given by H(E | \hat{C})).
In the cases considered here, if H(E | \hat{C}) = 0 then manipulating C provides complete control over E. This corresponds to Woodward’s observation (2010, 305) that it is more important that the mapping from C to E is a surjective function than that it is also bijective. Woodward’s notion of fine-grained control, however, would be better represented using H(E) and I(E; \hat{C}). That is, fine-grained control requires that the repertoire of effects is large and that a cause screens off many of them (recall that we are currently dealing only with nominal variables). In the ideal case, H(E) would tend toward infinity and I(E; \hat{C}) would tend toward H(E).
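The four causal mappings discussed in this section can be checked numerically with the same machinery. In this sketch (ours; the joint tables simply encode the equiprobable intervention values and equiprobable arrows assumed above), rows index values of \hat{C} and columns index values of E:

```python
import numpy as np

def H(p):
    """Shannon entropy in bits; zero cells are ignored."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

mappings = {
    "figure 2 (bijection)":   np.array([[1, 0], [0, 1]]) / 2,
    "figure 3 (any-to-any)":  np.array([[1, 1], [1, 1]]) / 4,
    "figure 4 (one-to-many)": np.array([[1, 1, 0, 0], [0, 0, 1, 1]]) / 4,
    "figure 5 (many-to-one)": np.array([[1, 0], [1, 0], [0, 1], [0, 1]]) / 4,
}

for name, p_ce in mappings.items():
    h_e = H(p_ce.sum(axis=0))                     # H(E)
    h_e_given_c = H(p_ce) - H(p_ce.sum(axis=1))   # H(E | C-hat)
    i_ec = h_e - h_e_given_c                      # I(E; C-hat)
    h_c = H(p_ce.sum(axis=1))                     # H(C-hat)
    print(f"{name}: H(E)={h_e:.0f}  H(E|C)={h_e_given_c:.0f}  I(E;C)={i_ec:.0f}  H(C)={h_c:.0f}")
```

The output reproduces the values derived above, including the fourth quantity H(\hat{C}) that distinguishes figure 2 (1 bit) from figure 5 (2 bits).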
3 Comparing Two Variables
We now have a proposal for a measure of causal specificity:
SPEC: the specificity of a causal variable is obtained by measur-
ing how much mutual information interventions on that causal
variable carry about the effect variable.
It is important to note that, whilst mutual information is a symmetric
measure, I(X; Y) = I(Y; X), the mutual information between an intervention and its effect is not symmetrical, because the fact that interventions on C change E does not imply that interventions on E will change C: in general, I(\hat{C}; E) \neq I(\hat{E}; C).
Recall that the aim of producing a measure of causal specificity was to
use it to compare different causes of the same effect. So we need to look at
a case where an effect depends on more than one upstream causal variable,
and compare the mutual information they carry. To do so we will explore
some increasingly complex cases involving gene transcription. In each case
we focus on (messenger) RNA as the effect variable, and look at the relative
specificity of different upstream causal variables.
We begin with a simple case that has already been discussed in the
literature, namely comparing the causal contributions of RNA polymerase
and DNA coding sequences to the structure of a messenger RNA (Waters,
2007). Both are causes of RNA, since manipulating either makes a difference
to the RNA. Polymerase is like the radio on/off button, and the DNA is like
the channel tuner, with a number of settings.3

3 Because we do not impose an order on the values of the DNA variable, it is more like a digital tuner, to which any combination of digits can be entered, than an analogue tuning dial.

Figure 6: Causal mapping and probability distributions for DNA and RNA (left) and POL and RNA (right).

We can formalise this in the following way (figure 6). There are two causal variables, DNA and POL, and one effect variable, RNA. Each variable can take on a number of values. Assume, for now, that there are four possible
DNA sequences (d1,d2,d3,d4), and that the RNA polymerase is either
present or absent. Our effect variable can thus take on five values—four
correspond to the RNA sequences (r1,r2,r3,r4) transcribed from the DNA,
and one is a state we will call r0, that occurs when there is no transcription.
In order to calculate the mutual information, we need to assign each of the
values a probability, and these must sum to 1. We begin by simply assigning
uniform probabilities over the causal variables, DNA and POL. What does
our specificity measure tell us about the two causal variables in this simple
scenario?
When we do the calculation (see Supplementary Online Materials §1),
interventions on either DNA or POL carry the same amount of mutual in-
formation:
I(\widehat{DNA}; RNA) = p(\widehat{POL} = present) \times H(\widehat{DNA}) = 0.5 \times 2 = 1 bit
I(\widehat{POL}; RNA) = H(\widehat{POL}) = 1 bit

They are (given our working assumptions) equally causally specific. That might seem odd, as the DNA sequences can take on four different values, and the polymerase is simply ‘present’ or ‘absent’. Our measurement seems to be saying there is no difference between on/off switches and tuning knobs.
What has gone wrong?
To understand why this happens, recall that mutual information measures how much information on average we get by looking at a causal variable. Notice that the value of \widehat{DNA} is irrelevant if \widehat{POL} = absent, and our uniform distribution sets the probability of this at 0.5. So half the time, when we look at the value of \widehat{DNA}, we learn nothing about the system. When \widehat{POL} = present, knowing the value of \widehat{DNA} is useful: it delivers 2 bits of information. In short, half the time \widehat{DNA} gives us 0 bits of information, and the other half of the time 2 bits. Hence, 1 bit on average.
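The calculation can be reproduced directly from the joint distribution of the model in figure 6 (a sketch of ours, using the uniform priors assumed in the text rather than the code in the Supplementary Online Materials):

```python
import numpy as np
from itertools import product

def H(p):
    """Shannon entropy in bits; zero cells are ignored."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint distribution p(dna, pol, rna) under uniform, independent interventions.
# dna in {0,1,2,3}; pol in {0: absent, 1: present}; rna in {0: r0, 1..4: r1..r4}.
p = np.zeros((4, 2, 5))
for dna, pol in product(range(4), range(2)):
    rna = dna + 1 if pol == 1 else 0        # transcription only if polymerase is present
    p[dna, pol, rna] = (1 / 4) * (1 / 2)    # uniform priors over DNA and POL

h_rna = H(p.sum(axis=(0, 1)))
i_dna = H(p.sum(axis=(1, 2))) + h_rna - H(p.sum(axis=1))   # I(RNA; DNA-hat)
i_pol = H(p.sum(axis=(0, 2))) + h_rna - H(p.sum(axis=0))   # I(RNA; POL-hat)
print(i_dna, i_pol)   # both equal 1.0 bit
```

Both mutual informations come out at 1 bit: \widehat{DNA} is worth 2 bits in the half of cases where the polymerase is present and 0 bits otherwise, exactly as the averaging argument above says.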
What this shows is that our proposed measure for causal specificity is sen-
sitive to the probability distribution of the causal variables. This means that
either our specificity measure is incorrect, or Woodward’s INF (Section 2) is
missing something, because that condition makes no mention of the proba-
bility distributions over the variables. In the next section we will see that
this dilemma corresponds to two different approaches to causal specificity.
4 Specific Actual Difference Making
The suggestion that the actual probability distributions of the causal vari-
ables matter when assessing which causes are significant is an idea we have
heard before. Waters argues that in order to pick out the significant causes,
you need to know the actual difference makers. For example, even when it is
possible to manipulate POL (which identifies it as a potential cause), if there
is no actual difference in POL in a population of cells, as Waters assumes,
then it is not a significant cause. Waters notion of an “actual difference
maker” (Waters, 2007, 567) can be related to our specificity measure.
Waters treats the question of whether a variable exhibits actual variation
as though it were a binary choice, but it makes sense to treat it as continuous.
The ‘actual variation’ is the entropy of the variable.
[Figure 7: x-axis p(\widehat{POL} = present) from 0 to 1; y-axis entropy in bits; curves for H(RNA), I(RNA; \widehat{DNA}) and I(RNA; \widehat{POL}).]

Figure 7: Effects of changing the probability of \widehat{POL} = present on several informational measures: the entropy of RNA (the effect), the mutual information between RNA and \widehat{DNA}, and the mutual information between RNA and intervening on the presence of polymerase. It can be shown that H(RNA) = I(RNA; \widehat{DNA}) + I(RNA; \widehat{POL}) = p(\widehat{POL} = present) \times H(\widehat{DNA}) + H(\widehat{POL}) (see Supplementary Online Materials §1). The variation in the effect can thus here be decomposed into the respective contributions of the causes.
To show how this idea fits into our specificity measure, consider how the mutual information (specificity) of each of our two variables \widehat{DNA} and \widehat{POL} with RNA changes as we vary the probability distribution of \widehat{POL} (which, in turn, varies its entropy). In figure 7, each value on the X axis represents a different case. These range from cases where the probability of ‘present’ is 0 (polymerase is never around) to systems where the probability of ‘present’ is 1 (polymerase is always around). In these extreme cases, the variable has become a fixed background factor and doesn’t actually vary, and thus the entropy H(\widehat{POL}) is 0. When the probability of ‘present’ is 0.5, \widehat{POL} is maximally variable, and has maximum entropy. The mutual information between \widehat{POL} and RNA is also maximized at this point. Notice also that, as we increase p(present) to 1, the mutual information between \widehat{DNA} and RNA increases. When \widehat{POL} = present all the time, the full 2 bits of information about RNA can be found in \widehat{DNA}.
Our proposed measure of specificity captures two things: the extent to
which a relationship approaches a bijection (Woodward’s INF) and the de-
gree to which the cause is an actual difference maker (i.e. the cause also
has high entropy). So the mutual information measure appears to capture
the degree to which a cause is a ‘specific actual difference maker’, or SAD
(Waters, 2007).
Within our information theoretic framework there is a clear difference
between the SAD concept and Woodward’s INF. SAD uses the actual prob-
ability distribution over the values of a causal variable in some population.
INF makes no distinction between the states of a causal variable. We will
represent this by supposing that the variable has maximum entropy: all its
states are equiprobable. This makes sense when we recall that for Wood-
ward causal variables are sites of intervention. For an idealised external
agent intervening on the system, the value of a causal variable is whatever
they choose to make it.
It is possible to find different scientific contexts in which biologists seem
to approach causal relationships in ways that correspond to SAD and INF
respectively. Waters argues that classical genetics of the Morgan school
was only concerned to characterize causes which actually varied in their
laboratory populations (Waters, 2007). Griffiths and Stotz argue that some
work in behavioral development and much work in systems biology sets out
to characterize the effect on the system of forcing all causal variables through
their full range of potential variation (Griffiths & Stotz, 2013, 198-9). This
kind of research, they argue, is done with the aim of discovering new ways to
intervene in complex systems. The information theoretic framework allows
us to distinguish between the specificity of potential (INF) and actual (SAD)
difference-makers.
Our measure of causal specificity sheds light on another issue that we dis-
cussed in our introduction. Weber proposed that the specificity of a causal
relationship is simply the range of values of the variables across which a
causal relationship holds, or what Woodward calls the “range of invariance”
(Woodward 2003, 254). Woodward rejected this idea because a causal rela-
tionship might hold across a large range of invariance but fail to be bijective.
Our information theoretic framework captures both why Weber makes this
suggestion and why Woodward’s additional condition is needed. Weber’s
point corresponds to the fact that mutual information between cause and ef-
fect variables will typically be greater when these variables have more values,
simply because the entropy of both variables is higher. Woodward’s caveat
corresponds to the fact that it will not do to increase the number of values of
a cause variable unless the additional values of the cause map onto distinct
values of the effect. Increasing the entropy of the cause variable will not in-
crease mutual information when no additional entropy in the effect variable
is captured. This is why the mutual information between the variables is
the same in figure 2 and in figure 5. In terms of the diagram in Box 1, such
an increase in the size of region H(X) would be confined to the sub-region H(X|Y), with no increase in sub-region I(X;Y). The same point, of course,
holds mutatis mutandis for the effect variable.
In addition to the SAD and INF conceptions of specificity, there is a third
option corresponding to a suggestion by Weber that causal specificity should
be assessed on the assumption that causal variables are neither restricted
to their actual variation in some population, nor allowed to vary freely, but
instead restricted to their ‘biologically normal’ range of variation: “What we
need is a distinction between relevant and irrelevant counterfactuals, where
relevant counterfactuals are such that they describe biologically normal possible interventions” (Weber, 2013, 7, his italics). We will call this REL. Weber
tells us that a biologically normal intervention must (1) involve a naturally
occurring causal process and (2) not kill the organism. More work is ob-
viously needed to make this idea precise, but we will see in Section 5 that
even in this crude form REL provides a useful framework for modeling actual
cases. At a practical level, we interpret REL as assessing causal specificity
with uniform probability distributions within the range of variation in the
variable that would be produced by known mechanisms acting on relevant
timescales for the causal processes we are trying to model.
5 Distributed Causal Specificity
We have suggested that causal specificity can be measured by the amount
of mutual information between variables representing cause and effect. This
implies that the degree of specificity of a causal relationship depends on the
probability distributions over the two variables, and we have argued that this
relates to Waters’ claim that significant causes are specific actual difference
makers. We have also taken on board Weber’s point that it may be more
interesting to explore, not the strictly actual variation, but the ‘biologically
normal’ variation (REL). In this section we apply our measure to a more
complex case than the roles of RNA polymerase and DNA in the production
of RNA, namely the role of splicing factors and DNA in the production of
alternatively spliced mRNA. Importantly, we shall also attempt to fill out
these measures with realistic values.
In contemporary molecular biology the image of the gene as a simple
sequence of coding DNA with an adjacent promoter region is very much a
special case. This image remains important in the practice of annotating
genomes with ‘nominal genes’ regions that resemble reasonably closely the
textbook image (Burian, 2004; Fogle, 2000; Griffiths & Stotz, 2007; Grif-
fiths & Stotz 2013). But a more representative image of the gene, at least
in eukaryotes, is a complex region of DNA whose structure is best under-
stood top-down in light of how that DNA can be used in transcription and
translation to make a range of products. Multiple promoter regions allow
transcripts of different lengths to be produced from a single region. This
and other mechanisms allow the same region to be transcribed with differ-
ent reading frames. mRNA editing allows single bases in a transcript to be
changed before translation. Trans-splicing allows different DNA regions to
contribute components to a single mRNA. Here, however, we will concen-
trate on the most ubiquitous of these mechanisms, alternative cis-splicing, a
process known to occur, for example, in circa 95% of human genes (nominal
genes)4.

4 For more detail on all these processes, see (Griffiths and Stotz, 2013). It may be useful to know that the prefix trans- denotes processes involving a different region of the DNA, whilst the prefix cis- denotes processes involving the same or an immediately adjacent region.
Genes are annotated with two kinds of regions, exons and introns. The
typically much larger introns are cut out of the corresponding mRNA and
discarded. In alternative cis-splicing (hereafter just ‘splicing’) there is more
than one way to do this, giving rise to a number of different proteins or
functional RNAs. For simplicity, we will ignore mechanisms such as exon
repetition or reversal, and the fact that exon/intron boundaries may vary,
and treat this process as if it were simply a matter of choosing to include or
omit each of a determinate set of exons in the final transcript.
With alternative splicing, the final product is co-determined by the cod-
ing region from which the transcript originates and some combination of
trans-acting factors which bind to the transcript to determine whether cer-
tain exons will be included or excluded. These factors are transcribed from
elsewhere in the genome, and their presence at their site of action requires the
activation of those regions and correct processing, transport and activation
of the product. The entire process thus exemplifies the themes of ‘regulated
recruitment and combinatorial control’ characteristic of much recent work on
the control of genome expression (Griffiths & Stotz 2013; Ptashne & Gann
2002). We will simplify this by representing alternative splicing as a sin-
gle variable each of whose values correspond to a set of trans-acting factors
sufficient to determine a unique splice-variant.
The role of alternative splicing is well known, but recent work on causal
specificity does not treat this issue with much care. Weber states that, “De-
pending on what protein factors are present, a cell can make a considerable
variety of different polypeptides from the same gene. Thus we have some
causal specificity, but it is no match for the extremely high number of differ-
ent protein sequences that may result by substituting nucleic acids” (Weber,
2006; endorsed by Waters, 2007, fn28). Here Weber seems to be making a
problematic comparison of the actual range of splicing variants present in
a single organism with the possible genetic variants that could be produced
by mutation. Recently, Weber has explicitly defended this comparison, arguing that only ‘biologically normal’ interventions should be considered and
that variation in DNA coding sequences is biologically normal. He concludes
that DNA and RNA deserve a unique status amongst biological causes be-
cause their biologically normal ability to vary in a way that influences the
structure of gene products is “vastly higher (i.e., many orders of magnitude)
than that of any other causal variables that bear the relation INF to protein
sequences (e.g., splicing agents)” (Weber, 2013, 31).
We are not convinced that it is a meaningful comparison to take, for example, the Drosophila DSCAM gene5, with 38,016 splice variants, all or most of which are found in any actual population of flies, and say that alternative splicing has negligible causal specificity because this number of variants is much less than the number of variants possible by mutation of the DSCAM coding sequence with no limit on the number of mutational steps away from the actual sequence (Weber, 2013, 19). This seems to be a classic example
of the way in which philosophers are unable to sustain parity of reasoning
(Oyama 2000, 200ff) when thinking about DNA. The principle that only
‘biologically normal’ variation should be counted is rigorously enforced for
non-genetic causes but not for genetic causes. An anonymous reviewer has
pointed out that even when variation in the coding DNA sequence is re-
stricted to a small (and thus ‘biologically normal’) number of mutational
steps, the number of possible variants expands very rapidly because of the
sheer number of nucleotides (about 6000 in DSCAM). Which ranges of varia-
tion in splicing agents and coding sequences it is meaningful to compare will
depend on the biological question being addressed, as we will now discuss.
5In the Drosophila receptor DSCAM (Down Syndrome Cell Adhesion Molecule), 4 of
the 24 exons of the Dscam gene are arranged in large tandem arrays, whose regulation is an
example of mutually exclusive splicing. One block has 2 exons - leading to 1 of 2 alternative
transmembrane segments; the others contain respectively 12, 48 and 33 alternative exons, leading to 19,008 different ecto-domains. Neuronal cells differ not only with respect to which one of the 38,016 variants (in a genome of about 15,000 genes) they express, but also in the exact ratio in which they express up to 50 variants at a time. Each block of exons seems to possess a unique mechanism that ensures that only one of the alternative exons is included in the final transcript. For details and references, see Supplementary
Online Materials §3.
To make a meaningful comparison between splicing agents and coding
sequences it is also necessary to specify a population of entities across which
they produce variation. Waters (2007) focuses on two examples in which
most of the actual variation is caused by variation in DNA. The first is the
population of phenotypic Drosophila mutants in a classical genetics labora-
tory. The second is the population of RNA transcripts at one point in time
in a bacterial cell in which there is no alternative splicing. Obviously, neither
of these cases is a useful one with which to evaluate the causal specificity of
splicing agents, but they do exemplify two important classes of comparisons
we might make. First, we might compare the variation between individuals
in an evolving population and seek to determine if variation in DNA coding
sequences is the sole or main specific difference maker. Second, we might
consider the transcriptome (population of transcripts) in a single cell, either
at a time or across time, and ask whether variation in DNA coding sequences
is the sole or main specific difference maker between these transcripts. We-
ber also considers examples of these two kinds. However, neither Waters nor
Weber considers a third important case, which is the variation between cells
in an organism, both spatial and temporal. This is the kind of variation
that needs to be explained to understand development, the context in which
controversy over the causal roles of genes and other factors most often arises.
Both actual and relevant (‘biologically normal’) variation in genes or
splicing agents will be different in each of these three cases. In the case of
an evolving population, mutation is a biologically normal source of variation,
but without any limit on the number of mutational steps from the current
sequence, let alone variation in genome size or ploidy, the values of the DNA
variable would simply be every possible genome, which would be both un-
manageable and biologically meaningless. It might seem natural to exclude
any other sources of variation on the grounds that they are not heritable,
but a number of evolutionary theorists would hotly dispute this (e.g. Bon-
duriansky, 2012; Jablonka & Lamb, 2005; Uller, 2012). Furthermore, the
machinery of splicing also changes over evolutionary time, so in the evolu-
tionary case the ‘biologically normal’ variation in splicing is greater than the
amount of variation observed in any actual population. These are very com-
plex issues, and we cannot undertake the extensive work of establishing the
relevant ranges of variation of genetic and other variables in the evolutionary
case in this paper.
Instead, we will examine the simpler case suggested by Waters, the pop-
ulation of RNA transcripts in a single cell at one time. But while Waters
considers only cells with no splicing, we will consider cells with splicing, so
as to make a comparison possible. For the transcriptome of a single cell at
a time, the relevant values of the DNA variable are the different sequences
that can be transcribed by the polymerase. If we ignore complexities such
as multiple promoters, we can set this equal to the nominal gene count in
the genome, so that realistic figures are available. The values of the DNA
variable will be weighted by the probability of each gene being expressed.
The values of the splicing variable can be set equal to the number of splicing
variants from each gene, weighted by the probability of each splice variant.
We now propose a quantification of the respective causal specificity of the
DNA and splicing variables for this very simple case. To further simplify the
exposition we assume that the polymerase is always present (an assumption
which can be relaxed easily, see Supplementary Online Materials §2). We
focus on the mutual information measure outlined above, but we need to
take a slightly different approach to compare the specificity of splicing with
the specificity of DNA, for we assume that splicing factors are recruited only
after a given strand of DNA has been transcribed. We do this because, in
reality, it is not the case that any set of splicing factors can be combined
with any gene. If we were to model splicing in this way, then the outcome
of most combinations of genes and sets of splicing factors would be that the
system fails to produce any biologically meaningful outcome. So it is both
simpler and more biologically realistic to represent the process sequentially,
as the transcription of an mRNA followed by the recruitment of a set of
splicing factors. In other words, the transcription of a given DNA strand
opens a set of possibilities among a proper set of the possible combinations
of splicing factors (figure 8). This entails that the information in splicing factors, measured by H(\hat{S}), contains all the information in DNA, measured by H(\hat{D}):6

H(\hat{D}, \hat{S}) = H(\hat{S})

6 In the following equations, D and R are the variables DNA and RNA (see figure 7) and S is the splicing variable.
Because the entropy in the DNA variable is conserved in the entropy of the
splicing variable, the mutual information between RNA and splicing will
also conserve the mutual information between RNA and DNA. Thus, we will
need a way to decompose our causal specificity measure into two components,
isolating the separate contributions of DNA and splicing.
As mentioned above, we treat the splicing process as if it were simply
a matter of choosing to include or omit each of a determinate set of exons
in the final transcript. Each value of our splicing variable corresponds to
a set of trans-acting factors sufficient to determine a unique splice-variant
of the RNA. In other words, we consider a bijective relationship between
sets of splicing factors (once recruited) and RNA variants. This bijection en-
tails that the mutual information between RNA and interventions on splicing
IR;b
Sis simply equal to the so-called self-information of splicing, Ib
S;b
S,
which is itself equal to the entropy of splicing Hb
S. We can then decom-
pose the entropy of splicing according to well-known chain rules:
IR;b
S=Ib
S;b
S=Hb
S=Hb
D, b
S=Hb
S|b
D+Hb
D
Noting that I(R; \hat{D}) = H(\hat{D}) when the polymerase is always present (see Section 5), and that I(R; \hat{S} | \hat{D}) = H(\hat{S} | \hat{D}) (see Supplementary Online Materials §2), we can rewrite the equation as:

I(R; \hat{S}) = I(R; \hat{S} | \hat{D}) + I(R; \hat{D})

Figure 8: The simplified relationship between DNA (D), splicing (S) and RNA (R) variables, assumed in the models in Section 5. Selection of a value for DNA opens a proper set of possibilities of Splicing. There is a bijective relationship between Splicing and RNA.
This equation provides a decomposition of the mutual information between RNA and splicing, I(R; \hat{S}), into two components: the mutual information between RNA and DNA, I(R; \hat{D}), and the mutual information between RNA and splicing conditional on DNA, I(R; \hat{S} | \hat{D}). Because I(R; \hat{S} | \hat{D}) \geq 0, this entails that I(R; \hat{S}) \geq I(R; \hat{D}). If we simply proceed as before, taking mutual information as a measure of causal specificity, we find that the specificity of splicing is always greater than or equal to the specificity of DNA. As we mentioned above, however, we need to account for the fact that all the information contributed by DNA to RNA is conserved in the splicing variable. Fortunately, we can decompose the mutual information in splicing to obtain two terms which represent the contribution from the DNA and the contribution from the splicing process. The term H(\hat{D}) in the decomposition of I(R; \hat{S}) represents the amount of information which is preserved in the splicing process but originates in the DNA. The variation in RNA properly coming from the splicing process is represented by the term H(\hat{S} | \hat{D}), a term that, roughly, reflects the number of splicing variants per DNA strand. Thus, if one wants to compare the causal specificity of splicing and DNA, one needs to know which of these two terms, H(\hat{D}) and H(\hat{S} | \hat{D}), makes the greatest contribution to I(R; \hat{S}).
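The decomposition can be checked on a toy joint distribution (a hypothetical example of ours, not the model in the Supplementary Online Materials): two equiprobable genes, one with three equiprobable splice variants and one with a single variant, where each (gene, splicing) combination yields a unique RNA, so that I(R; \hat{S}) = H(\hat{S}) = H(\hat{D}, \hat{S}):

```python
import numpy as np

def H(p):
    """Shannon entropy in bits; zero cells are ignored."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint p(D, S): rows are genes, columns are sets of splicing factors.
# Gene d1 can be spliced three ways, gene d2 only one way.
p_ds = np.array([[1/6, 1/6, 1/6, 0.0],
                 [0.0, 0.0, 0.0, 1/2]])

H_D = H(p_ds.sum(axis=1))      # H(D-hat): the contribution preserved from DNA (1 bit)
H_DS = H(p_ds)                 # H(D-hat, S-hat) = H(S-hat) = I(R; S-hat)
H_S_given_D = H_DS - H_D       # H(S-hat | D-hat): the contribution proper to splicing
print(H_D, H_S_given_D, H_DS)  # 1.0, ~0.79 and ~1.79 bits; the first two sum to the third
```

Here I(R; \hat{D}) = H(\hat{D}) = 1 bit and I(R; \hat{S} | \hat{D}) = H(\hat{S} | \hat{D}) is roughly 0.79 bits, and the two components sum to I(R; \hat{S}), as the chain rule requires.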
The answer will crucially depend on the biological system. In drosophila,
an important determinant of neuronal diversity is the single Dscam gene with
38,016 splice variants (see Supplementary Online Materials §3). This gives a maximum entropy of circa log2(38016) = 15.2 bits for H(\hat{S} | \hat{D}), compared with 0 bits for H(\hat{D}). The diversity of this class of transcripts in drosophila is entirely explained by post-transcriptional processing.7

7 Our decision to use actual figures for genes and isoforms but assume equiprobability (maximum entropy) for each variable can be justified in this particular case on both the INF and REL approaches (Section 4). The data required for Waters’ SAD approach are not available, but there is no reason to suppose it would give qualitatively different results.
The homologs of this gene in humans, Dscam and Dscam-like, present a very different picture. The number of splicing variants per gene appears to be no greater than 3. Assuming that the transcription of each of these two DNA regions is equiprobable, this gives a maximum entropy of circa 1.6 bits for H(\hat{S} | \hat{D}), to be compared with 1 bit for H(\hat{D}). DNA and splicing are roughly equal determinants of diversity in this class of transcripts.
A more meaningful comparison to the Dscam case in drosophila, however,
may be other classes of vertebrate cell-surface proteins. Generalising from
real cases8 we might imagine a class of transcripts that derives from, say 100
related genes, each of which has 150 splicing variants. Assuming once again that the transcription of any of these DNA regions is equiprobable, this gives circa 7.2 bits for H(\hat{S} | \hat{D}), to be compared with circa 6.6 bits for H(\hat{D}). Both DNA and splicing variables are important determinants of diversity in this class of transcripts.
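Under the equiprobability assumptions just stated, these figures are simply logarithms of the counts; a quick check (ours, with the counts quoted above):

```python
import math

scenarios = {
    # name: (number of genes, splice variants per gene), as assumed in the text
    "Drosophila Dscam":    (1, 38016),
    "Human Dscam(-like)":  (2, 3),
    "Hypothetical class":  (100, 150),
}
for name, (genes, variants) in scenarios.items():
    H_D = math.log2(genes)              # H(D-hat): entropy from the choice of gene
    H_S_given_D = math.log2(variants)   # H(S-hat | D-hat): entropy from splicing per gene
    print(f"{name}: H(D) = {H_D:.1f} bits, H(S|D) = {H_S_given_D:.1f} bits")
# Drosophila: 0.0 and 15.2; human: 1.0 and 1.6; hypothetical: 6.6 and 7.2
```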
Assigning specificity to the causes of transcript diversity in a single cell
at a time is relatively tractable. The analyses just given could, in principle,
be extended to the entire transcriptome at one stage in the life-cycle of a
well-studied system such as yeast. But this would be of limited interest.
What is at stake in disagreements over the relative causal roles of coding
regions of DNA and other factors in gene expression would be better rep-
resented by comparing the transcriptome in a cell at different times in its
life-cycle, or comparing transcriptomes between different cell-types in an or-
ganism. These comparisons are both ways of thinking about development
the process by which regulated genome expression produces an organism
and its life-cycle. In comparing the same cell across times, a critical feature
is that which genes are transcribed and how their products are processed de-
pends on transcription and processing at earlier times. For the population of
cells in an organism, somatic mutations that could arise during development
become relevant, leading to the need to say something about the number of
mutational steps that counts as a ‘biologically realistic’ intervention on this
variable. We hope to confront these complexities in future work.
8 Dscam is homologous between almost all animals, but in vertebrates the two homologous genes, Dscam and DscamL1, do not encode multiple isoforms. There are, however, several hundred cell adhesion and surface receptor genes in vertebrates: the Ig superfamily, as well as integrins, cadherins, and selectins. This genetic diversity is combined with complex regulatory patterns, albeit not on the scale of the Dscam expression in Drosophila. The three neurexin genes display extensive alternative splicing, a process that can potentially generate thousands of neurexin isoforms alone. For details and references, see Supplementary Online Materials §3.
6 Conclusion
Causal specificity is the label given to an intuitive distinction amongst the
many conditions that are necessary to produce an effect. The specific causes
are those variables that can be used for fine-grained control of an effect
variable. It has been suggested that a specific relationship between two
variables is one that resembles a bijective mapping between the values of the
two variables (Woodward, 2010). The concept of causal specificity can be
clarified considerably by going a step further and attempting to measure it.
Our quantitative measure of specificity starts from the simple idea that
the more specific the relationship between a cause variable and an effect vari-
able, the more information we will have about the effect after we perform
an intervention on the cause. Section 2 used information theoretic measures
to express this idea. We found that if the conditional entropy of the effect
on interventions on the cause $H(E \mid \hat{C}) = 0$, then manipulating $C$ provides complete control over $E$. We argued, however, that the idea of sensitive manipulation, or fine-grained influence (Woodward, 2010), would be better represented by measuring the entropy of the effect, $H(E)$, and the mutual information between cause and effect, $I(E;\hat{C})$. Fine-grained influence requires both that the repertoire of effects is large and that the state of the cause contains a great deal of information about the state of the effect. In the ideal case, $H(E)$ would tend toward infinity and $I(E;\hat{C})$ would tend toward $H(E)$.
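To illustrate how these quantities behave in practice, here is a small Python sketch (ours, not the authors'; the helper names entropy and specificity are our own). It assumes an idealized agent who sets a four-state cause to each of its values with equal probability, and a mechanism that maps the cause bijectively onto the effect, the case the text describes as maximally specific.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zero entries ignored)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def specificity(joint):
    """Return H(E), H(E|C^) and I(E;C^) from a joint p(c^, e).

    Rows index the values the cause is set to by intervention, columns the effect.
    """
    p_c = joint.sum(axis=1)
    p_e = joint.sum(axis=0)
    h_e = entropy(p_e)
    h_e_given_c = sum(p * entropy(row / p) for p, row in zip(p_c, joint) if p > 0)
    return h_e, h_e_given_c, h_e - h_e_given_c       # I(E;C^) = H(E) - H(E|C^)

# Bijective mapping between four equiprobable interventions and four effects.
h_e, h_e_c, spec = specificity(np.eye(4) / 4)
print(f"H(E) = {h_e:.1f} bits, H(E|C^) = {h_e_c:.1f}, I(E;C^) = {spec:.1f}")
```

In this bijective case the measure is maximal: $H(E \mid \hat{C}) = 0$ and $I(E;\hat{C}) = H(E) = 2$ bits; coarsening the mapping or skewing the intervention distribution lowers it.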
Section 3 examined the behavior of $I(E;\hat{C})$ as a measure of causal
specificity (SPEC). The behavior of the measure depends on the probability
distributions over the states of the variables, as well as the structure of the
causal graph. Other things being equal, a variable with many states that
are rarely or never occupied is a less specific cause than one equally likely
to be in any of its states, that is, one with higher entropy. Section 4 showed
that this feature is a strength of our proposed measure. It is in line with the
qualitative reasoning of Waters (2007), who argues that the property which justifies singling out one cause as more significant than another can be its specificity with respect to the actual variation seen in some population, and of Weber (2013), who suggests that we focus on the somewhat wider class of 'biologically normal' variation.
The sensitivity of our measure to the underlying probability distribu-
tions contrasts with presentations of causal specificity where it is assumed
that the value can be inferred from the structure of a causal graph. Our at-
tempt to quantify specificity forces this assumption to become explicit. The
least arbitrary way to represent this assumption in our models would seem
to be to make all values of the causal variables equiprobable. Making this
assumption is probably not appropriate for settling the disputes about the
relative significance of various causal factors in biology with which Waters
and Weber are concerned. However, in the broader context of the interven-
tionist account of causation it may be entirely appropriate, because causal
variables are the sites of voluntary intervention by an idealized agent.
Section 5 used our measure to assess the relative specificity of different
causes that contribute to the same effect. The idea of specificity has been
used to argue that DNA sequences are the most significant causes, because
of their supposedly unrivalled degree of specificity. Our discussion revealed
that this conclusion is completely premature. First, it is necessary to specify the causal
process in question. The causes of individual differences in an evolving pop-
ulation are quite different from the causes of transcript diversity in a single
cell, and different again from the causes of spatial and temporal diversity
amongst the cells of a single organism. We constructed a simple model with
which we were able to quantify the specificity of a DNA coding sequence
and of splicing factors with respect to transcript diversity in a single cell at
a time. We showed that the relative specificity of these two variables can be
very different for different classes of transcripts. The idea that DNA obviously has an unrivalled degree of specificity seems to arise because ear-
lier, qualitative discussions implicitly compared the actual variation in the
splicing variable within cells to the possible variation in the DNA variable
on an evolutionary timescale.
While it seems plausible to us that the specificity of coding DNA as a
cause of evolutionary change is very high, we pointed out that proper explo-
ration of this would require serious thought about which range of variation in
the DNA variable can be meaningfully compared with which range of varia-
tion in other cellular mechanisms. Similar work would be needed before our
measure can be applied to what is arguably the most pressing case, namely
the relative specificity of different causes in development. We hope to focus
on this case in future work.
We believe that the work reported here amply demonstrates the philo-
sophical payoff of developing quantitative measures of causal specificity.
However, a great deal remains to be done. First, although our measure provides information about causal specificity rather than the presence of
causation per se, in future work we hope to provide an information theoretic
statement of the interventionist criterion of causation. Second, our measure
of specificity is only one of several information theoretic measures that can
be used to characterize causal relationships. In future work we hope to ex-
plore the potential of these other measures for the philosophy of causation.
Thirdly, and perhaps most urgently, we gave only minimal attention in this
paper (in Section 4) to the ways in which the relationship between two vari-
ables can be affected by additional variables. In a forthcoming paper we
extend our framework to deal with these interactions.
Supplementary Online Materials can be downloaded from http://philsci-archive.pitt.edu/0123456789
7 References
Bonduriansky, R. (2012). Rethinking heredity, again. Trends in Ecology and
Evolution, 27(6), 330-336.
Burian, R. M. (2004). Molecular Epigenesis, Molecular Pleiotropy, and
Molecular Gene Definitions. History and Philosophy of the Life Sciences,
26(1), 59-80.
Cover, T. M., & Thomas, J. A. (2012). Elements of Information Theory.
Hoboken, NJ: John Wiley & Sons.
Garner, W. R., & McGill, W. (1956). The relation between information
and variance analyses. Psychometrika, 21 (3), 219-228.
Griffiths, P. E., & Stotz, K. (2013). Genetics and Philosophy: An intro-
duction. New York: Cambridge University Press.
Jablonka, E., & Lamb, M. J. (2005). Evolution in Four Dimensions :
Genetic, Epigenetic, Behavioral, and Symbolic Variation in the History of
Life. Cambridge, Mass: MIT Press.
Lewis, D. K. (2000). Causation as influence. Journal of Philosophy, 97,
182-197.
Oyama, S. (2000). The Ontogeny of Information: Developmental systems
and evolution (2nd ed., revised and expanded). Durham, North
Carolina: Duke University Press.
Pearl, J. (2009). Causality. Cambridge University Press.
Ptashne, M., & Gann, A. (2002). Genes and Signals. Cold Spring Har-
bor, NY: Cold Spring Harbor Laboratory Press.
Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean,
G., Turnbaugh, P. J., et al. (2011). Detecting Novel Associations in Large
Data Sets. Science, 334 (6062), 1518-1524.
Ross, B. C. (2014). Mutual Information between Discrete and Continu-
ous Data Sets. PLoS ONE, 9 (2), e87357.
Shannon, C. E., & Weaver, W. (1949). The Mathematical Theory of
Communication. Urbana, Ill.: Univ. of Illinois Press.
Uller, T. (2012). Parental effects in development and evolution. In N. J.
Royle, P. T. Smiseth & M. Kölliker (Eds.), The Evolution of Parental Care
(pp. 247-266). Oxford: Oxford University Press.
Waters, C. K. (2007). Causes that make a difference. Journal of Philos-
ophy, 104 (11), 551-579.
Weber, M. (2006). The Central Dogma as a thesis of causal specificity.
History and Philosophy of the Life Sciences, 28(4), 595-609.
Weber, M. (2013). Causal Selection versus Causal Parity in Biology:
Relevant Counterfactuals and Biologically Normal Interventions, What If?
On the meaning, relevance and epistemology of counterfactual claims and
thought experiments (pp. 1-44). Konstanz: University of Konstanz.
Woodward, J. (2003). Making Things Happen: A Theory of Causal Explanation. New York & Oxford: Oxford University Press.
Woodward, J. (2010). Causation in biology: stability, specificity, and the
choice of levels of explanation. Biology & Philosophy, 25 (3), 287-318.
Woodward, J. (2012). Causation and Manipulability. The Stanford
Encyclopedia of Philosophy (Winter 2012 Edition). Retrieved from http://plato.stanford.edu/archives/win2012/entries/causation-mani/.
8 Supplementary Online Materials (Online Materials to be posted at http://philsci-archive.pitt.edu)
8.1 The effect of transcription probability
Here we derive the equations of the curves in Figure 7 (reproduced below
as Figure 10) describing the effect of transcription probability on several
informational measures of RNA, DNA, and transcription. For ease of presentation, we will ignore splicing.

To ease reading, we will write the variable RNA as $R$ (with values $r_i$), transcription as $T$ (with values $t_h$), and DNA as $D$ (with values $d_j$). As before, hats on variables mean that their values are fixed by a surgical intervention.
8.1.1 The mutual information between RNA and transcription
We suppose that if there is no transcription ($h = 0$), there is no RNA strand produced ($i = 0$), while if there is transcription ($h = 1$), there is one RNA strand produced among $n$ possible variants ($i = 1, \dots, n$). This implies that once a given value for RNA is obtained (either $i = 0$, i.e. absence, or $i = 1, \dots, n$) we also know whether transcription was on or off. In other words, the joint probability for RNA and transcription is given as follows (see Figure 9):

$p(r_i, \hat{t}_h) = \begin{cases} p(r_0), & \text{if } h = 0 \text{ and } i = 0,\\ p(r_i), & \text{if } h = 1 \text{ and } i = 1, 2, \dots, n,\\ 0, & \text{otherwise.} \end{cases}$   (1)

Also, by computing the marginal probability of transcription, $p(\hat{t}_h) = \sum_{i=0}^{n} p(r_i, \hat{t}_h)$, we can obtain that $p(\hat{t}_0) = p(r_0, \hat{t}_0)$ and $p(\hat{t}_1) = \sum_{i=1}^{n} p(r_i, \hat{t}_1)$. Therefore,

$p(\hat{t}_0) = p(r_0) \quad \text{and} \quad p(\hat{t}_1) = \sum_{i=1}^{n} p(r_i)$   (2)
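A minimal numerical rendering of this set-up (our own sketch, using the probabilities shown in Figure 9: four equiprobable DNA strands and an assumed $p(\hat{t}_1) = 0.5$) builds the joint distribution of equation (1) and recovers the marginals of equation (2).

```python
import numpy as np

n, p_t1 = 4, 0.5                   # four RNA/DNA variants; transcription on with prob. 0.5
p_d = np.full(n, 1.0 / n)          # equiprobable DNA strands, as in Figure 9

# Joint p(r_i, t^_h): rows r_0..r_n, columns t^_0, t^_1 (equation (1)).
joint = np.zeros((n + 1, 2))
joint[0, 0] = 1 - p_t1             # transcription off -> null RNA r_0
joint[1:, 1] = p_t1 * p_d          # transcription on  -> RNA i mirrors DNA i

p_t = joint.sum(axis=0)            # marginal over transcription (equation (2))
print(p_t)                         # [0.5 0.5] = [p(r_0), sum_i p(r_i)]
```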
Figure 9: Diagram showing events with non-null probabilities in our model of transcription, when splicing is ignored. Transcription can be either on ($h = 1$), in which case a DNA strand $j$ will deterministically lead to an RNA strand $j$, or off ($h = 0$), in which case any DNA strand will lead to a null RNA. (Probabilities assigned to events are for illustrative purposes only, but notice that $p(t_0)$ and $p(t_1)$ sum to 1; in the diagram the four DNA strands are equiprobable and $p(t_0) = p(t_1) = 0.5$, so that $p(r_0) = 0.5$ and each non-null RNA has probability 0.125.)
Figure 10: Effects of changing the probability of transcription on several informational measures: the entropy of RNA (the effect), $H(RNA)$; the mutual information between RNA and DNA, $I(RNA;\widehat{DNA})$; and the mutual information between RNA and the presence of polymerase, $I(RNA;\widehat{POL})$. (Horizontal axis: $p(\widehat{POL})$ from 0 to 1; vertical axis: entropy in bits.)
Now, using (1) and (2), we can compute the mutual information between
RNA and transcription.
$I(R;\hat{T}) = \sum_{h=0}^{1} \sum_{i=0}^{n} p(r_i, \hat{t}_h) \log \frac{p(r_i, \hat{t}_h)}{p(r_i)\, p(\hat{t}_h)}$   (3)

$= p(r_0, \hat{t}_0) \log \frac{p(r_0, \hat{t}_0)}{p(r_0)\, p(\hat{t}_0)} + \sum_{i=1}^{n} p(r_i, \hat{t}_1) \log \frac{p(r_i, \hat{t}_1)}{p(r_i)\, p(\hat{t}_1)}$   (4)

$= p(r_0) \log \frac{p(r_0)}{p(r_0)\, p(\hat{t}_0)} + \sum_{i=1}^{n} p(r_i) \log \frac{p(r_i)}{p(r_i)\, p(\hat{t}_1)}$   (5)

$= p(r_0) \log \frac{1}{p(\hat{t}_0)} + \sum_{i=1}^{n} p(r_i) \log \frac{1}{p(\hat{t}_1)}$   (6)

$= p(r_0) \log \frac{1}{p(\hat{t}_0)} + \left(\sum_{i=1}^{n} p(r_i)\right) \log \frac{1}{p(\hat{t}_1)}$   (7)

$= p(\hat{t}_0) \log \frac{1}{p(\hat{t}_0)} + p(\hat{t}_1) \log \frac{1}{p(\hat{t}_1)}$   (8)

$= H(\hat{T})$   (9)
That $I(R;\hat{T}) = H(\hat{T})$ simply reflects that there is a bijection between having transcription set to on (respectively off) and obtaining some non-null (respectively null) RNA. In other words, none of the values for transcription lead to convergent results: there is no loss of information about transcription when it occurs (or not).
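Equation (9) can be confirmed numerically on the toy distribution of Figure 9 (our sketch; the helper names are ours): the mutual information between RNA and the transcription switch coincides with the entropy of the switch.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """I(X;Y) in bits, computed as H(X) + H(Y) - H(X,Y)."""
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)

n, p_t1 = 4, 0.5
joint_rt = np.zeros((n + 1, 2))     # rows: r_0..r_n, columns: t^_0, t^_1
joint_rt[0, 0] = 1 - p_t1
joint_rt[1:, 1] = p_t1 / n

print(mutual_information(joint_rt))           # 1.0 bit
print(entropy(joint_rt.sum(axis=0)))          # H(T^) = 1.0 bit, as in equation (9)
```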
8.1.2 The mutual information between RNA and DNA
We suppose that if there is transcription ($h = 1$), a given strand of DNA ($j = 1, \dots, n$) will deterministically lead to a given strand of RNA ($i = 1, \dots, n$). If there is no transcription ($h = 0$), any strand of DNA will lead to no RNA ($i = 0$) (see Figure 11). In other terms, there is a bijection between DNA and RNA if and only if transcription is on; otherwise all values of DNA lead to the same null result. We also suppose that the state of the polymerase and the choice of a DNA strand to transcribe are independent events.

We begin with:

$I(R;\hat{D}) = \sum_{i=0}^{n} \sum_{j=1}^{n} p(r_i, \hat{d}_j) \log \frac{p(r_i, \hat{d}_j)}{p(r_i)\, p(\hat{d}_j)}$   (10)

We will now consider how this measure behaves when we take into account the probability of transcription.

To simplify writing, we first notice that many joint events have null probabilities, which makes them cancel out in the calculation of the mutual information. These joint events are $(r_{i>0}, \hat{d}_{j \neq i})$: it is impossible to get another strand of RNA than the one the DNA strand codes for (whatever the transcription state; see Figure 9).

Thus, without loss of generality, we can write, splitting the cases with non-null ($i > 0$) and null ($i = 0$) RNA:
$I(R;\hat{D}) = \sum_{i=1}^{n} p(r_i, \hat{d}_i) \log \frac{p(r_i, \hat{d}_i)}{p(r_i)\, p(\hat{d}_i)} + \sum_{j=1}^{n} p(r_0, \hat{d}_j) \log \frac{p(r_0, \hat{d}_j)}{p(r_0)\, p(\hat{d}_j)}$   (11)

Using the diagram in Figure 9, we can easily see the following relationships:

(a) $p(\hat{d}_i \mid r_i) = 1$, if $i > 0$.
(b) $p(\hat{d}_j \mid r_0) = p(\hat{d}_j)$, for $j = 1, \dots, n$.
(c) $p(r_i \mid \hat{d}_i) = p(\hat{t}_1)$, if $i > 0$.

Using these relationships, we can simplify $I(R;\hat{D})$ as follows:

$I(R;\hat{D}) = \sum_{i=1}^{n} p(r_i, \hat{d}_i) \log \frac{p(r_i, \hat{d}_i)}{p(r_i)\, p(\hat{d}_i)} + \sum_{j=1}^{n} p(r_0, \hat{d}_j) \log \frac{p(r_0, \hat{d}_j)}{p(r_0)\, p(\hat{d}_j)}$   (12)

$= \sum_{i=1}^{n} p(r_i, \hat{d}_i) \log \frac{p(\hat{d}_i \mid r_i)}{p(\hat{d}_i)} + \sum_{j=1}^{n} p(r_0, \hat{d}_j) \log \frac{p(\hat{d}_j \mid r_0)}{p(\hat{d}_j)}$   (13)
Due to relationships (a) and (b),
$I(R;\hat{D}) = \sum_{i=1}^{n} p(r_i, \hat{d}_i) \log \frac{1}{p(\hat{d}_i)} + \sum_{j=1}^{n} p(r_0, \hat{d}_j) \log \frac{p(\hat{d}_j)}{p(\hat{d}_j)}$   (14)

$= \sum_{i=1}^{n} p(r_i, \hat{d}_i) \log \frac{1}{p(\hat{d}_i)}$   (15)

$= \sum_{i=1}^{n} p(r_i \mid \hat{d}_i)\, p(\hat{d}_i) \log \frac{1}{p(\hat{d}_i)}$   (16)
Due to relationship (c),
$I(R;\hat{D}) = \sum_{i=1}^{n} p(\hat{t}_1)\, p(\hat{d}_i) \log \frac{1}{p(\hat{d}_i)}$   (17)

$= p(\hat{t}_1)\, H(\hat{D})$   (18)
This equation reflects the fact that the informativity of DNA is conditional upon the presence of transcription. If transcription were always on, there would be a bijection between DNA and RNA. However, when transcription is sometimes off, there is a loss of information between DNA and the RNA outputs, as several strands of DNA can lead to the same result (no RNA) when there is no transcription. The information loss is simply that part of the DNA entropy which is not present in the mutual information between DNA and RNA, that is, $H(\hat{D} \mid R)$:

$H(\hat{D} \mid R) = H(\hat{D}) - I(R;\hat{D})$   (19)

$= \left(1 - p(\hat{t}_1)\right) H(\hat{D})$   (20)
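Equations (18)-(20) can be checked on the same toy model (our sketch: four equiprobable DNA strands and an assumed $p(\hat{t}_1) = 0.5$; helper names are ours).

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)

n, p_t1 = 4, 0.5
p_d = np.full(n, 1.0 / n)

# Joint p(r_i, d^_j): rows r_0..r_n, columns d^_1..d^_n.
joint_rd = np.zeros((n + 1, n))
joint_rd[0, :] = (1 - p_t1) * p_d        # transcription off: every strand yields the null RNA
joint_rd[1:, :] = np.diag(p_t1 * p_d)    # transcription on: strand j yields RNA j

h_d = entropy(p_d)                                             # H(D^) = 2 bits
print(mutual_information(joint_rd), p_t1 * h_d)                # both 1.0: equation (18)
print(h_d - mutual_information(joint_rd), (1 - p_t1) * h_d)    # both 1.0: equations (19)-(20)
```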
8.1.3 The entropy of RNA
Here we derive the entropy of RNA in terms of the mutual information between RNA and DNA and the entropy of transcription. We again split between the cases where there is transcription ($\hat{t}_1$) or none ($\hat{t}_0$). We again use the fact that $p(r_0) = p(\hat{t}_0)$ and that $p(r_i) = p(\hat{d}_i)\, p(\hat{t}_1)$ for $i \geq 1$. We also remark that $\sum_{i=1}^{n} p(\hat{d}_i)\, p(\hat{t}_1)$ sums to $p(\hat{t}_1)$.

$H(R) = -\sum_{i=0}^{n} p(r_i) \log p(r_i)$   (21)

$= -\sum_{i=1}^{n} p(\hat{d}_i)\, p(\hat{t}_1) \log\left(p(\hat{d}_i)\, p(\hat{t}_1)\right) - p(r_0) \log p(r_0)$   (22)

$= -p(\hat{t}_1) \left(\sum_{i=1}^{n} p(\hat{d}_i) \log p(\hat{d}_i) + \sum_{i=1}^{n} p(\hat{d}_i) \log p(\hat{t}_1)\right) - p(r_0) \log p(r_0)$   (23)

$= -p(\hat{t}_1) \sum_{i=1}^{n} p(\hat{d}_i) \log p(\hat{d}_i) - p(\hat{t}_1) \log p(\hat{t}_1) - p(\hat{t}_0) \log p(\hat{t}_0)$   (24)

We recognize:

$H(R) = p(\hat{t}_1)\, H(\hat{D}) + H(\hat{T})$   (25)

$= I(R;\hat{D}) + I(R;\hat{T})$   (26)
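These are the relations that generate the curves in Figure 10. A short sketch (ours) recomputes the three quantities across transcription probabilities, assuming the four equiprobable DNA strands of Figure 9.

```python
import numpy as np

def binary_entropy(p):
    """Entropy in bits of a two-valued variable with probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

n = 4
h_dna = np.log2(n)                        # H(D^) for n equiprobable strands

for p_t1 in (0.0, 0.25, 0.5, 0.75, 1.0):
    i_r_dna = p_t1 * h_dna                # equation (18)
    i_r_t = binary_entropy(p_t1)          # equation (9)
    h_rna = i_r_dna + i_r_t               # equations (25)-(26)
    print(f"p(t^_1)={p_t1:.2f}  I(R;D^)={i_r_dna:.2f}  I(R;T^)={i_r_t:.2f}  H(R)={h_rna:.2f}")
```

As in Figure 10, $I(R;\hat{T})$ is the binary entropy of the transcription probability, $I(R;\hat{D})$ grows linearly with it, and $H(R)$ is their sum.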
8.2 The mutual information between RNA and splicing
8.2.1 When transcription is always on
Here we derive the equations for the mutual information between RNA and splicing.

For the sake of simplicity, we shall first ignore transcription probability and assume that $p(\hat{t}_1) = 1$. This amounts to relaxing the conditionalisation upon transcription.

In the model considered here the splicing factor variants are recruited only once a given strand of DNA has been transcribed. In addition, we suppose that the transcription of a given DNA strand opens a set of possibilities among its own set of splicing factors (see Figure 11). This entails that the information in splicing, $H(\hat{S})$, contains all the information in DNA, $H(\hat{D})$:

$H(\hat{D}, \hat{S}) = H(\hat{S})$   (27)
In addition, we consider a bijective relationship between splicing factors and RNA variants. This bijection entails that the mutual information between RNA and splicing is equal to the self-information of splicing (that is, the entropy of splicing). We can then decompose the entropy of splicing according to well known chain rules:

$I(R;\hat{S}) = I(\hat{S};\hat{S})$   (28)

$= H(\hat{S})$   (29)

$= H(\hat{D}, \hat{S})$   (30)

$= H(\hat{S} \mid \hat{D}) + H(\hat{D})$   (31)
From equation (18), we know that $H(\hat{D}) = I(R;\hat{D})$, assuming that transcription always occurs. In addition, the bijection between splicing and RNA (including the null value) entails that the conditional entropy of splicing (conditioned on DNA) is the conditional mutual information of splicing and RNA: $H(\hat{S} \mid \hat{D}) = I(R;\hat{S} \mid \hat{D})$ (as an immediate calculation would show). We thus can rewrite equation (31) as:

$I(R;\hat{S}) = I(R;\hat{S} \mid \hat{D}) + I(R;\hat{D})$   (32)

Figure 11: Diagram of our model of splicing, when transcription is assumed to be on. A DNA strand deterministically leads to its own set of splicing factor variants, each of them deterministically leading to its own RNA strand.
Readers familiar with information theory will recognize the decomposition of the mutual information $I(R;\hat{S}, \hat{D})$, which happens to be, in this particular example, equal to $I(R;\hat{S})$. That is, knowing the value of DNA does not bring us any information about RNA in addition to knowing the value of splicing. Notice that equation (32) also provides a decomposition of the entropy of splicing, that is, $H(\hat{S}) = I(R;\hat{S})$, in virtue of the bijection between RNA and splicing.
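The decomposition in equation (32) can be verified on a toy instance of the model in Figure 11 (our sketch: two equiprobable DNA strands, each recruiting two splicing factor variants of its own, all combinations equiprobable, transcription always on; the helper names are ours).

```python
import itertools
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# States: DNA strand d in {0, 1}; splicing factor s in {0, 1, 2, 3} with s // 2 == d;
# RNA r in bijection with s. All four possible (d, s) pairs are equiprobable.
joint = {}
for d, s in itertools.product(range(2), range(4)):
    if s // 2 == d:                        # each strand recruits its own pair of factors
        joint[(d, s, s)] = 0.25            # r = s: bijection between splicing and RNA

idx = {'d': 0, 's': 1, 'r': 2}
def H(vars_):
    """Joint entropy of the named subset of variables ('d', 's', 'r')."""
    marg = {}
    for key, p in joint.items():
        k = tuple(key[idx[v]] for v in vars_)
        marg[k] = marg.get(k, 0.0) + p
    return entropy(np.array(list(marg.values())))

i_rs = H('r') + H('s') - H('rs')                           # I(R;S^) = 2 bits
i_rd = H('r') + H('d') - H('rd')                           # I(R;D^) = 1 bit
i_rs_given_d = H('rd') + H('sd') - H('d') - H('rsd')       # I(R;S^|D^) = 1 bit
print(i_rs, i_rs_given_d + i_rd)                           # equation (32): both 2.0
```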
8.2.2 When transcription can be either on or off
For the sake of completeness, we now give equation (32) in a version taking into account the probability of transcription. The reasoning is grounded on the hypothesis that a given splicing factor occurs only when there is transcription and a given DNA strand has been chosen. Then, decomposition of $I(R;\hat{S})$ gives:
$I(R;\hat{S}) = I(\hat{S};\hat{S})$   (33)

$= H(\hat{S})$   (34)

$= H(\hat{S} \mid \hat{D}, \hat{T}) + I(\hat{S};\hat{D}, \hat{T})$   (35)

$= H(\hat{S} \mid \hat{D}, \hat{T}) + I(\hat{S};\hat{D} \mid \hat{T}) + I(\hat{S};\hat{T})$   (36)
Again, we take advantage of the bijection between splicing and RNA to replace $H(\hat{S} \mid \hat{D}, \hat{T}) = I(R;\hat{S} \mid \hat{D}, \hat{T})$. We also take advantage of the fact that there is no interaction information between DNA, RNA, and transcription, that is, $I(R;\hat{D} \mid \hat{T}) = I(R;\hat{D})$. This can be shown with the calculation sketched below. We again use relationship (1) to simplify; hence if $i > 0$ we have $p(r_i, \hat{d}_j, \hat{t}_1) = p(r_i, \hat{d}_j)$ and $p(r_i, \hat{t}_1) = p(r_i)$. A similar replacement method would hold for $i = 0$, but we directly simplify this term as it is null.
$I(R;\hat{D} \mid \hat{T}) = \sum_{h=0}^{1} p(\hat{t}_h) \sum_{j=1}^{n} \sum_{i=0}^{m} p(r_i, \hat{d}_j \mid \hat{t}_h) \log \frac{p(r_i, \hat{d}_j \mid \hat{t}_h)}{p(r_i \mid \hat{t}_h)\, p(\hat{d}_j \mid \hat{t}_h)}$   (37)

$= p(\hat{t}_0) \sum_{j=1}^{n} p(r_0, \hat{d}_j \mid \hat{t}_0) \log \frac{p(r_0, \hat{d}_j \mid \hat{t}_0)}{p(r_0 \mid \hat{t}_0)\, p(\hat{d}_j \mid \hat{t}_0)}$   (38)

$\quad + p(\hat{t}_1) \sum_{j=1}^{n} \sum_{i=1}^{m} p(r_i, \hat{d}_j \mid \hat{t}_1) \log \frac{p(r_i, \hat{d}_j \mid \hat{t}_1)}{p(r_i \mid \hat{t}_1)\, p(\hat{d}_j \mid \hat{t}_1)}$   (39)

$= 0 + \sum_{j=1}^{n} \sum_{i=1}^{m} p(r_i, \hat{d}_j, \hat{t}_1) \log \frac{p(r_i, \hat{d}_j, \hat{t}_1)}{p(\hat{t}_1)\, p(r_i \mid \hat{t}_1)\, p(\hat{d}_j)}$   (40)

$= \sum_{j=1}^{n} \sum_{i=1}^{m} p(r_i, \hat{d}_j) \log \frac{p(r_i, \hat{d}_j)}{p(r_i)\, p(\hat{d}_j)}$   (41)

$= I(R;\hat{D})$   (42)
Substituting these terms into equation (36), we obtain:

$I(R;\hat{S}) = H(\hat{S} \mid \hat{D}, \hat{T}) + I(\hat{S};\hat{D}) + I(\hat{S};\hat{T})$   (43)

$= H(\hat{S} \mid \hat{D}, \hat{T}) + p(\hat{t}_1)\, H(\hat{D}) + H(\hat{T})$   (44)

Again, noticing that $H(\hat{S} \mid \hat{D}, \hat{T}) = I(R;\hat{S} \mid \hat{D}, \hat{T})$, we retrieve an equation similar to equation (32):

$I(R;\hat{S}) = I(R;\hat{S} \mid \hat{D}, \hat{T}) + I(R;\hat{D}) + I(R;\hat{T})$   (45)
Readers familiar with information theory will recognize the decomposition of the mutual information $I(R;\hat{S}, \hat{D}, \hat{T})$, which happens to be, in this particular example, equal to $I(R;\hat{S})$. That is, knowing the value of DNA and transcription does not bring us any more information about RNA than just knowing the value of splicing. Notice that, similarly to equation (32) in the case where transcription is always on, equation (45) provides a decomposition of the entropy of splicing, that is, $H(\hat{S}) = I(R;\hat{S})$, in virtue of the bijection between RNA and splicing.

To wrap up, in this model transcription adds variation to the set of splicing factor variants (the absence of any factor now belongs to the set of possibilities), and this added variation is independent of DNA.
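Extending the same toy model with a transcription switch (our sketch: two equiprobable DNA strands, two splicing factor variants per strand, an assumed $p(\hat{t}_1) = 0.5$, and a null splicing value and null RNA when transcription is off) confirms equation (45).

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_t1, n_dna, n_spl = 0.5, 2, 2             # assumed toy parameters

# Joint over (t, d, s, r); s = 0 is the null splicing value, r = 0 the null RNA.
joint = {}
for d in range(n_dna):
    joint[(0, d, 0, 0)] = (1 - p_t1) / n_dna                # transcription off
    for k in range(n_spl):                                   # transcription on
        s = 1 + d * n_spl + k                                # strand d recruits its own factors
        joint[(1, d, s, s)] = p_t1 / (n_dna * n_spl)         # RNA in bijection with splicing

idx = {'t': 0, 'd': 1, 's': 2, 'r': 3}
def H(vars_):
    marg = {}
    for key, p in joint.items():
        k = tuple(key[idx[v]] for v in vars_)
        marg[k] = marg.get(k, 0.0) + p
    return entropy(np.array(list(marg.values())))

i_rs = H('r') + H('s') - H('rs')
i_rd = H('r') + H('d') - H('rd')
i_rt = H('r') + H('t') - H('rt')
i_rs_given_dt = H('rdt') + H('sdt') - H('dt') - H('rsdt')
print(i_rs, i_rs_given_dt + i_rd + i_rt)                     # equation (45): both 2.0
```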
8.3 Alternative splicing in Drosophila Dscam
The Drosophila receptor DSCAM (Down Syndrome Cell Adhesion Molecule),
a member of the immunoglobulin (Ig) superfamily, is a remarkable exam-
ple of homophilic binding specificity that functions in important biological
processes, such as innate immunity and neural wiring. In insects and also
crustaceans (e.g. Daphnia) 4 of the 24 exons of the Dscam gene are ar-
ranged in large tandem arrays, whose regulation is an example of mutually
exclusive splicing. In Drosophila, one block has 2 exons, leading to 1 of 2 alternative transmembrane segments; the others contain respectively 12, 48 and 33 alternative exons, leading to 19,008 different ecto-domains. Together they produce 38,016 alternative protein isoforms, within a genome of 15,016 protein-coding genes [1] (a quick arithmetic check of these counts follows the list below). There are several interesting aspects about this case:
1. For each block of exons there seems to exist a unique mechanism that ensures that only one of the alternative exons is included in the final transcript. Only two of the mechanisms are known in some detail. Researchers have identified specific cis-acting sequences and trans-acting splicing factors that tightly regulate splicing of exon 4.2, but for most others the details are not yet known [4, 3].
2. It is not only the large number of alternative transcripts that allows for a high diversity of functions; in addition, most alternative exons are expressed in neurons and found in many combinations. Neurons express up to 50 variants at a time, which makes for an even larger combinatorial spectrum of neuron differentiation. This ensures that branches from different neurons will share, at most, a few isoforms in common. This diversity of function enables branches of neurons to distinguish between sister branches and branches of other neurons, and also enables the patterning of neural circuits [8].
3. There seem to be distinct ways of regulating isoforms in the two different functions. For self-recognition purposes, neurons seem to express DSCAM isoforms in a stochastic yet biased fashion. Which isoform is expressed in a single neuron is unimportant as long as it is sufficiently different from its neighbour. It might simply be an indirect consequence of the expression of different splicing factors in different neurons that leads to this bias. For appropriate branching patterns, however, the research to date suggests that the expression of Dscam isoforms in some neurons is under tight developmental control. So we find a controlled mix of stochasticity and regulation in the expression of Dscam in Drosophila [9].
4. Dscam is homologous between almost all animals, which places its origin over 600 million years ago, before the split between the deuterostomes and protostomes [2]. But while in vertebrates the two homologous genes, Dscam and DscamL1, do not encode multiple isoforms, in arthropods the single gene is highly enriched with alternative exons. That leads to the interesting hypothesis that while in simple animals cell adhesion and cell recognition are controlled by complex genes, in complex animals this is done by relatively simple genes. This raises the question of how to account for a molecular diversity large enough to provide specificity for the extraordinarily large number of neurons in the more complex vertebrate brains [5].
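For reference, the isoform counts quoted at the start of this section follow directly from the sizes of the mutually exclusive exon blocks; a one-line check (ours), with the entropy figure holding under an assumption of equiprobable isoforms:

```python
import math

# Mutually exclusive exon blocks in Drosophila Dscam: 12, 48 and 33 alternative
# ecto-domain exons, plus 2 alternative transmembrane exons.
ecto_domains = 12 * 48 * 33            # 19,008
isoforms = 2 * ecto_domains            # 38,016
print(isoforms, round(math.log2(isoforms), 1))   # 38016 isoforms, ~15.2 bits if equiprobable
```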
Vertebrates seem to manage their increase in cell recognition specificities through the combinatorial association of different recognition systems, generated by gene duplication and the successive divergence of other loci, and via the graded expression of recognition proteins [9]. There exists a large range
of cell adhesion, recognition and surface receptor genes in vertebrates: the
calcium-independent Ig superfamily, and calcium-dependent integrins, cad-
herins, and selectins. The human immunoglobulins (Ig) are the products
of three unlinked sets of genes: the immunoglobulin heavy (IGH), the immunoglobulin kappa (IGK), and the immunoglobulin lambda (IGL) genes, with a total of
about 150 functional genes. A large number of cadherin superfamily genes
have been identified to date, and most of them seem to be expressed in the
CNS. At least 80 members of the cadherin superfamily have been shown to
be expressed within a single mammalian species. Integrins have two different chains, the alpha and beta subunits, of which mammals possess eighteen and eight respectively, while Drosophila has five and two.

This genetic diversity is combined with complex regulatory patterns. One example is the neurexin and neuroligin proteins in humans, which are all encoded by multiple genes. Neurexin is encoded by three genes, each controlled by two promoters, which produce 6 main forms of neurexin. These genes display relatively extensive alternative splicing, a process that can potentially generate thousands of neurexin isoforms alone [2, 6]. Splice-form diversity is most extensive in the mammalian brain [7].
8.4 References
1. Alicia M. Celotto and Brenton R. Graveley. Alternative splicing of the Drosophila Dscam pre-mRNA is both temporally and spatially regulated. Genetics, 159:599–608, 2001.

2. Mack E. Crayton III, Bradford C. Powell, Todd J. Vision, and Morgan C. Giddings. Tracking the evolution of alternatively spliced exons within the Dscam family. BMC Evolutionary Biology, 6(16), 2006.

3. Brenton R. Graveley. Mutually exclusive splicing of the insect Dscam pre-mRNA directed by competing intronic RNA secondary structures. Cell, 123:65–73, 2005.

4. Jung Woo Park and Brenton R. Graveley. Complex alternative splicing. In Benjamin J. Blencowe and Brenton R. Graveley, editors, Alternative Splicing in the Postgenomic Era, number 623 in Advances in Experimental Medicine and Biology, pages 50–63. Landes Bioscience and Springer Science+Business Media, 2007.

5. Dietmar Schmucker and Brian Chen. Dscam and DSCAM: complex genes in simple animals, complex animals yet simple genes. Genes Dev., 23:147–156, 2009.

6. Barbara Treutlein, Ozgun Gokce, Stephen R. Quake, and Thomas C. Sudhof. Cartography of neurexin alternative splicing mapped by single-molecule long-read mRNA sequencing. PNAS, 111(13):E1291–E1299, 2014.

7. Julian P. Venables, Jamal Tazi, and François Juge. Regulated functional alternative splicing in Drosophila. Nucleic Acids Research, 40(1):1–10, 2012.

8. Woj M. Wojtowicz, John J. Flanagan, S. Sean Millard, S. Lawrence Zipursky, and James C. Clemens. Alternative splicing of Drosophila Dscam generates axon guidance receptors that exhibit isoform-specific homophilic binding. Cell, 118(5):619–633, 2004.

9. S. Lawrence Zipursky, Woj M. Wojtowicz, and Daisuke Hattori. Got diversity? Wiring the fly brain with Dscam. Trends in Biochemical Sciences, 31(10):581–588, 2006.