The Inflation Technique for Causal Inference with Latent Variables

Abstract:
The problem of causal inference is to determine if a given probability distribution on observed variables is compatible with some causal structure. The difficult case is when the causal structure includes latent variables. We here introduce the inflation technique for tackling this problem. An inflation of a causal structure is a new causal structure that can contain multiple copies of each of the original variables, but where the ancestry of each copy mirrors that of the original. To every distribution of the observed variables that is compatible with the original causal structure, we assign a family of marginal distributions on certain subsets of the copies that are compatible with the inflated causal structure. It follows that compatibility constraints for the inflation can be translated into compatibility constraints for the original causal structure. Even if the constraints at the level of inflation are weak, such as observable statistical independences implied by disjoint causal ancestry, the translated constraints can be strong. We apply this method to derive new inequalities whose violation by a distribution witnesses that distribution's incompatibility with the causal structure (of which Bell inequalities and Pearl's instrumental inequality are prominent examples). We describe an algorithm for deriving all such inequalities for the original causal structure that follow from ancestral independences in the inflation. For three observed binary variables with pairwise common causes, it yields inequalities that are stronger in at least some aspects than those obtainable by existing methods. We also describe an algorithm that derives a weaker set of inequalities but is more efficient. Finally, we discuss which inflations are such that the inequalities one obtains from them remain valid even for quantum (and post-quantum) generalizations of the notion of a causal model.
Keywords: causal inference with latent variables, inflation technique, causal compatibility inequalities, marginal problem, Bell inequalities, Hardy paradox, graph symmetries, quantum causal models, GPT causal models, triangle scenario

Given a joint probability distribution of some observed variables, the problem of causal inference is to determine which hypotheses about the causal mechanism can explain the given distribution. Here, a causal mechanism may comprise causal relations among the observed variables, causal relations among these and a number of unobserved variables, and causal relations among unobserved variables only. Causal inference has applications in all areas of science that use statistical data and for which causal relations are important. Examples include determining the effectiveness of medical treatments, sussing out biological pathways, making data-based social policy decisions, and possibly even developing strong machine learning algorithms [1–5]. A closely related type of problem is to determine, for a given set of causal relations, the set of all distributions on observed variables that can be generated from them. A special case of both problems is the following decision problem: given a probability distribution and a hypothesis about the causal relations, determine whether the two are compatible: could the given distribution have been generated by the hypothesized causal relations? This is the problem that we focus on. We develop necessary conditions for a given distribution to be compatible with a given hypothesis about the causal relations.
In the simplest setting, the causal hypothesis consists of a directed acyclic graph (DAG) all of whose nodes correspond to observed variables. In this case, obtaining a verdict on the compatibility of a given distribution with the causal hypothesis is simple: compatibility holds if and only if the distribution is Markov with respect to the DAG, which is to say that the distribution features all of the conditional independence relations that are implied by d-separation relations among variables in the DAG. The DAGs that are compatible with a given distribution can be determined algorithmically [1].¹
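As a concrete illustration of this Markov condition (our own sketch, not part of the paper): for the fully observed chain A → B → C, the only nontrivial d-separation relation is that A and C are d-separated given B, so compatibility amounts to a single conditional independence, which can be checked by brute force for finite distributions. All parameter values below are arbitrary and purely illustrative.

```python
# Brute-force conditional-independence test for finite distributions,
# represented as dicts mapping outcome tuples to probabilities.

def marginal(p, idx):
    """Marginalize the joint p onto the coordinates listed in idx."""
    out = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + prob
    return out

def cond_indep(p, xs, ys, zs, tol=1e-9):
    """Check X independent of Y given Z, i.e. P(xyz)P(z) = P(xz)P(yz)."""
    pz, pxz = marginal(p, zs), marginal(p, xs + zs)
    pyz, pxyz = marginal(p, ys + zs), marginal(p, xs + ys + zs)
    for o, prob in pxyz.items():
        x = o[:len(xs)]
        y = o[len(xs):len(xs) + len(ys)]
        z = o[len(xs) + len(ys):]
        if abs(prob * pz.get(z, 0.0) - pxz.get(x + z, 0.0) * pyz.get(y + z, 0.0)) > tol:
            return False
    return True

# A distribution generated by the chain A -> B -> C (coordinates 0, 1, 2),
# with illustrative parameter values:
pa = {0: 0.3, 1: 0.7}
pb_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
pc_b = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}
joint = {(a, b, c): pa[a] * pb_a[a][b] * pc_b[b][c]
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}

print(cond_indep(joint, (0,), (2,), (1,)))  # A indep. of C given B: True
print(cond_indep(joint, (0,), (1,), ()))    # A indep. of B: False
```

Once latent variables enter, such conditional independence checks are no longer sufficient for compatibility; that is the gap the inflation technique addresses.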
A signicantly more dicult case is when one considers a causal hypothesis which consists of a DAG
some of whose nodes correspond to latent (i. e., unobserved) variables, so that the set of observed variables
corresponds to a strict subset of the nodes of the DAG. This case occurs, e.g., in situations where one needs
to deal with the possible presence of unobserved confounders, and thus is particularly relevant for experi-
mental design in applications. With latent variables, the condition that all of the conditional independence
relations among the observed variables that are implied by d-separation relations in the DAG is still a nec-
essary condition for compatibility of a given such distribution with the DAG, but in general it is no longer
sucient, and this is what makes the problem dicult.
Whenever the observed variables in a DAG have finite cardinality,² one may also restrict the latent variables in the causal hypothesis to be of finite cardinality as well, without loss of generality [6]. As such, the mathematical problem which one must solve to infer the distributions that are compatible with the hypothesis is a quantifier elimination problem for some finite number of variables, as follows: The probability distributions of the observed variables can all be expressed as functions of the parameters specifying the conditional probabilities of each node given its parents, many of which involve latent variables. If one can eliminate these parameters, then one obtains constraints that refer exclusively to the probability distribution of the observed variables. This is a nonlinear quantifier elimination problem. The Tarski–Seidenberg theorem provides an in-principle algorithm for an exact solution, but unfortunately the computational complexity of such quantifier elimination techniques is far too large to be practical, except in particularly simple scenarios [7, 8].³ Most uses of such techniques have been in the service of deriving compatibility conditions that are necessary but not sufficient, for both observational [10–13] and interventionist data [14–16].
Historically, the insufficiency of the conditional independence relations for causal inference in the presence of latent variables was first noted by Bell in the context of the hidden variable problem in quantum physics [17]. Bell considered an experiment for which considerations from relativity theory implied a very particular causal structure, and he derived an inequality that any distribution compatible with this structure, and compatible with certain constraints imposed by quantum theory, must satisfy. Bell also showed that this inequality was violated by distributions generated from entangled quantum states with particular choices of incompatible measurements. Later work by Clauser, Horne, Shimony and Holt (CHSH) derived inequalities without assuming any facts about quantum correlations [18]; this derivation can retrospectively be understood as the first derivation of a constraint arising from the causal structure of the Bell scenario alone [19]. The CHSH inequality was the first example of a compatibility condition that appealed to the strength of the correlations rather than simply the conditional independence relations inherent therein. Since then, many generalizations of the CHSH inequality have been derived for the same sort of causal structure [20]. The idea that such work is best understood as a contribution to the field of causal inference has only recently been put forward [19, 21–23], as has the idea that techniques developed by researchers in the foundations of quantum theory may be usefully adapted to causal inference.⁴
Independently of Bell's work, Pearl later derived the instrumental inequality [31], which provides a necessary condition for the compatibility of a distribution with a causal structure known as the instrumental scenario. This causal structure comes up when considering, for instance, certain kinds of noncompliance in drug trials. More recently, Steudel and Ay [32] derived an inequality which must hold whenever a distribution on n variables is compatible with a causal structure in which no set of more than c variables has a common ancestor, for arbitrary n and c. Recent work has focused specifically on the simplest nontrivial case, with n = 3 and c = 2, a causal structure that has been called the Triangle scenario [21, 33] (Fig. 1).

¹As illustrated by the vast amount of literature on the subject, the problem can still be difficult in practice, for example due to a large number of variables in certain applications or due to finite statistics.
²The cardinality of a variable is the number of possible values it can take.
³Techniques for finding approximate solutions to nonlinear quantifier elimination may help [9].
⁴The current article is another example of this phenomenon [9, 23–30].
Recently, Henson, Lal and Pusey [22] investigated those causal structures for which merely confirming that a given distribution on observed variables satisfies all of the conditional independence relations implied by d-separation relations does not guarantee that this distribution is compatible with the causal structure. They coined the term interesting for causal structures that have this property. They presented a catalogue of all potentially interesting causal structures having six or fewer nodes in [22, App. E], of which all but three were shown to be indeed interesting. Evans has also sought to generate such a catalogue [34]. The Bell scenario, the Instrumental scenario, and the Triangle scenario all appear in the catalogue, together with many others. Furthermore, they provided numerical evidence and an intuitive argument in favour of the hypothesis that the fraction of causal structures that are interesting increases with the total number of nodes. This highlights the need to move beyond a case-by-case consideration of individual causal structures and to develop techniques for deriving constraints beyond conditional independence relations that can be applied to any interesting causal structure. Shannon-type entropic inequalities are an example of such constraints [21, 25, 32, 33, 35]. They can be derived for a given causal structure with relative ease, via exclusively linear quantifier elimination, since conditional independence relations are linear equations at the level of entropies. They also have the advantage that they apply for any finite cardinality of the observed variables. Recent work has also looked at non-Shannon-type inequalities, potentially further strengthening the entropic constraints [26, 36]. However, entropic techniques are still wanting, since the resulting inequalities are often rather weak. For example, they are not sensitive enough to witness some known incompatibilities, in particular for distributions that only arise in quantum but not classical models with a given causal structure [21, 26].⁵
In order to improve this state of affairs, we here introduce a new technique for deriving necessary conditions for the compatibility of a distribution of observed variables with a given causal structure, which we term the inflation technique. This technique is frequently capable of witnessing incompatibility when many other causal inference techniques fail. For example, in Example 2 of Sec. 3.2 we prove that the tripartite “W-type” distribution is incompatible with the Triangle scenario, despite the incompatibility being invisible to other causal inference tools such as conditional independence relations, Shannon-type [25, 33, 35] or non-Shannon-type entropic inequalities [26], or covariance matrices [27].
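For concreteness (our notation, not the paper's): the W-type distribution referred to above is the distribution on three bits that is uniform over the outcomes containing exactly one 1. A quick sketch of it and one of its pairwise marginals:

```python
from itertools import product

# The tripartite "W-type" distribution on bits A, B, C: uniform over the
# three outcomes with exactly one 1, i.e. (1,0,0), (0,1,0), (0,0,1).
P_W = {(a, b, c): (1 / 3 if a + b + c == 1 else 0.0)
       for a, b, c in product((0, 1), repeat=3)}

# One of its two-variable marginals, P_AB, obtained by summing out C:
P_AB = {}
for (a, b, c), p in P_W.items():
    P_AB[(a, b)] = P_AB.get((a, b), 0.0) + p

print(P_AB[(1, 1)])  # 0.0: A and B are never simultaneously 1
```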
The ination technique works roughly as follows. For a given causal structure under consideration, one
can construct many new causal structures, termed inations of this causal structure. An ination duplicates
one or more of the nodes of the original causal structure, while mirroring the form of the subgraph describ-
ing each node’s ancestry. Furthermore, the causal parameters that one adds to the inated causal structure
mirror those of the original causal structure. We show that if marginal distributions on certain subsets of the
observed variables in the original causal structure are compatible with the original causal structure, then
the same marginal distributions on certain copies of those subsets in the inated causal structure are com-
patible with the inated causal structure (Lemma 4). Similarly, we show that any necessary condition for
compatibility of such distributions with the inated causal structure translates into a necessary condition for
compatibility with the original causal structure (Corollary 6). Thus, applying standard techniques for deriving
causal compatibility inequalities to the inated causal structure typically results in new causal compatibility
inequalities for the original causal structure. The reader interested in seeing an example of how our technique
works may want to take a sneak peak at Sec.3.2.
⁵It should be noted that non-standard entropic inequalities can be obtained through a fine-graining of the causal scenario, namely by conditioning on the distinct finite possible outcomes of root variables (“settings”), and these types of inequalities have proven somewhat sensitive to quantum-classical separations [33, 37, 38]. Such inequalities are still limited, however, in that they are only applicable to those causal structures which feature observed root nodes. The potential utility of entropic analysis where fine-graining is generalized to non-root observed nodes is currently being explored by E. W. and Rafael Chaves. Jacques Pienaar has also alluded to similar considerations as a possible avenue for further research [36].
Concretely, we consider causal compatibility inequalities for the inflated causal structure that are obtained as follows. One begins by identifying inequalities for the marginal problem, which is the problem of determining when a given family of marginal distributions on some subsets of variables can arise as marginals of a global joint distribution. One then looks for sets of variables within the inflated causal structure which admit of nontrivial d-separation relations. (We mainly consider sets of variables with disjoint ancestries.) For each such set, one writes down the appropriate factorization of their joint distribution. These factorization conditions are finally substituted into the marginal problem inequalities to obtain causal compatibility inequalities for the inflated causal structure. Although these constraints are extremely weak, the inflation technique turns them into powerful necessary conditions for compatibility with the original causal structure.

We show how to identify all relevant factorization conditions from the structure of the inflated causal structure, and also how to obtain all marginal problem inequalities by enumerating all facets of the associated marginal polytope (Sec. 4.2). Translating the resulting causal compatibility inequalities on the inflated causal structure back to the original causal structure, we obtain causal compatibility conditions in the form of nonlinear (polynomial) inequalities. As a concrete example of our technique, we present all the causal compatibility inequalities that can be derived in this manner from a particular inflation of the Triangle scenario (Sec. 4.3). In general, we also show how to efficiently obtain a partial set of marginal problem inequalities by enumerating transversals of a certain hypergraph (Sec. 4.4).
Besides the entropic techniques discussed above, our method is the first systematic tool for causal inference with latent variables that goes beyond observed conditional independence relations while not assuming any bounds on the cardinality of each latent variable. While our method can be used to systematically generate necessary conditions for compatibility with a given causal structure, we do not know whether the set of inequalities thus generated is also sufficient.

We present our technique primarily as a tool for standard causal inference, but we also briefly discuss applications to quantum causal models [22, 23, 39–43] and causal models within generalized probabilistic theories [22] (Sec. 5.4). In particular, we discuss when our inequalities are necessary conditions for a distribution of observed variables to be compatible with a given causal structure within any generalized probabilistic theory [44, 45] rather than simply within classical probability theory.
2 Causal models
A causal model consists of a pair of objects: a causal structure and a family of causal parameters. We define each in turn. First, recall that a directed acyclic graph (DAG) G consists of a finite set of nodes Nodes(G) and a set of directed edges Edges(G) ⊆ Nodes(G) × Nodes(G), meaning that an edge is an ordered pair of nodes, such that this directed graph is acyclic, which means that there is no way to start and end at the same node by traversing edges forward. In the context of a causal model, each node X ∈ Nodes(G) will be equipped with a random variable that we denote by the same letter X. A directed edge X → Y corresponds to the possibility of a direct causal influence from the variable X to the variable Y. In this way, the edges represent causal relations.

Our terminology for the causal relations between the nodes in a DAG is the standard one. The parents of a node X in G are defined as those nodes from which an outgoing edge terminates at X, i.e. Pa_G(X) := {Y | Y → X}. When the graph G is clear from the context, we omit the subscript. Similarly, the children of a node X are defined as those nodes at which edges originating at X terminate, i.e. Ch_G(X) := {Y | X → Y}. If X is a set of nodes, then we put Pa_G(X) := ⋃_{X∈X} Pa_G(X) and Ch_G(X) := ⋃_{X∈X} Ch_G(X). The ancestors of a set of nodes X, denoted An_G(X), are defined as those nodes which have a directed path to some node in X, including the nodes in X themselves.⁶ Equivalently, An(X) := ⋃_n Pa^n(X), where Pa^n(X) is defined inductively via Pa^0(X) := X and Pa^{n+1}(X) := Pa(Pa^n(X)).
⁶The inclusion of a node itself within the set of its ancestors is contrary to the colloquial use of the term “ancestors”. One uses this definition so that any correlation between two variables can always be attributed to a common “ancestor”. This includes, for instance, the case where one variable is a parent of the other.
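These graph-theoretic definitions translate directly into code. The sketch below is our own toy illustration; in particular, the labelling of the Triangle scenario's latent nodes as X (between A and B), Y (between B and C), and Z (between A and C) is our assumption about Fig. 1.

```python
# A DAG represented as a set of directed edges (parent, child).

def parents(edges, x):
    """Pa(x): nodes with an edge into x."""
    return {u for (u, v) in edges if v == x}

def children(edges, x):
    """Ch(x): nodes with an edge out of x."""
    return {v for (u, v) in edges if u == x}

def ancestors(edges, xs):
    """An(X): iterate the parent map to a fixed point; includes X itself."""
    result = set(xs)
    while True:
        new = {u for x in result for u in parents(edges, x)} - result
        if not new:
            return result
        result |= new

# Triangle scenario: observed A, B, C; latent X, Y, Z (labelling assumed).
triangle = {("X", "A"), ("X", "B"), ("Y", "B"), ("Y", "C"),
            ("Z", "A"), ("Z", "C")}

print(sorted(ancestors(triangle, {"A"})))  # ['A', 'X', 'Z']
```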
A causal structure is a DAG that incorporates a distinction between two types of nodes: the set of observed nodes, and the set of latent nodes.⁷ Following [22], we will depict the observed nodes by triangles and the latent nodes by circles, as in Fig. 1.⁸ Henceforth, we will use G to refer to the causal structure rather than just the DAG, so that G includes a specification of which variables are observed, denoted ObservedNodes(G), and which are latent, denoted LatentNodes(G). Frequently, we will also imagine the causal structure to include a specification of the cardinalities of the observed variables. While these are finite in all of our examples, the inflation technique may apply in the case of continuous variables as well. Although we will not do so in this work, the inflation technique can also be applied in the presence of other types of constraints, e.g., when all variables are assumed to be Gaussian.

The second component of a causal model is a family of causal parameters. The causal parameters specify, for each node X, the conditional probability distribution over the values of the random variable X, given the values of the variables in Pa(X). In the case of root nodes, we have Pa(X) = ∅, and the conditional distribution is an unconditioned distribution. We write P_{Y|X} for the conditional distribution of a variable Y given a variable X, while the particular conditional probability of the variable Y taking the value y given that the variable X takes the value x is denoted⁹ P_{Y|X}(y|x). Therefore, a family of causal parameters has the form

{P_{X|Pa_G(X)} : X ∈ Nodes(G)}.  (1)
Finally, a causal model M consists of a causal structure together with a family of causal parameters,

M = (G, {P_{X|Pa_G(X)} : X ∈ Nodes(G)}).

A causal model specifies a joint distribution of all variables in the causal structure via

P_{Nodes(G)} = ∏_{X ∈ Nodes(G)} P_{X|Pa_G(X)},  (2)

where ∏ denotes the usual product of functions, so that e.g. (P_{Y|X} × P_X)(x, y) = P_{Y|X}(y|x) P_X(x). A distribution P_{Nodes(G)} arises in this way if and only if it satisfies the Markov conditions associated to G [1, Sec. 1.2].

The joint distribution of the observed variables is obtained from the joint distribution of all variables by marginalization over the latent variables,

P_{ObservedNodes(G)} = ∑_{U : U ∈ LatentNodes(G)} P_{Nodes(G)},  (3)

where ∑_U denotes marginalization over the (latent) variable U, so that (∑_U P_{UV})(v) := ∑_u P_{UV}(u, v).
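A minimal sketch of Eqs. (2) and (3) for the Triangle scenario, with every variable binary. The causal parameters used here (uniform latent bits, each observed node the XOR of its two latent parents, and the latent labelling X, Y, Z) are illustrative assumptions of ours, not taken from the paper.

```python
from itertools import product

def p_latent(u):
    return 0.5  # each latent root node is a uniform bit

def p_obs(value, p1, p2):
    return 1.0 if value == p1 ^ p2 else 0.0  # deterministic: XOR of parents

# Eq. (2): the joint over all six nodes is the product of P(node | parents);
# Eq. (3): the observed joint P_ABC sums out the latent values x, y, z.
P_ABC = {}
for x, y, z, a, b, c in product((0, 1), repeat=6):
    w = (p_latent(x) * p_latent(y) * p_latent(z)
         * p_obs(a, x, z) * p_obs(b, x, y) * p_obs(c, y, z))
    P_ABC[(a, b, c)] = P_ABC.get((a, b, c), 0.0) + w

# With these parameters A = X xor Z, B = X xor Y, C = Y xor Z, so the
# parity A xor B xor C is always 0:
print(P_ABC[(1, 0, 0)])  # 0.0 (odd-parity outcomes never occur)
```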
Definition 1 (Compatibility). A given distribution P_{ObservedNodes(G)} is compatible with a given causal structure G if there is some choice of the causal parameters that yields P_{ObservedNodes(G)} via Eqs. (2) and (3). A given family of distributions on a family of subsets of observed variables is compatible with a given causal structure if and only if there exists some P_{ObservedNodes(G)} such that both
1. P_{ObservedNodes(G)} is compatible with the causal structure, and
2. P_{ObservedNodes(G)} yields the given family as marginals.

⁷Pearl [1, Def. 2.3.2] uses the term latent structure when referring to a DAG supplemented by a specification of latent nodes, whereas here that specification is implicit in our term causal structure.
⁸Note that this convention differs from that of [39], where triangles represent classical variables and circles represent quantum systems.
⁹Although our notation suggests that all variables are either discrete or described by densities, we do not make this assumption. All of our equations can be translated straightforwardly into proper measure-theoretic notation.
3 The inflation technique

3.1 Inflations of a causal model
We now introduce the notion of an inflation of a causal model. If a causal model specifies a causal structure G, then an inflation of this model specifies a new causal structure, G′, which we refer to as an inflation of G. For a given causal structure G, there are many causal structures G′ constituting an inflation of G. We denote the set of such causal structures Inflations(G). The particular choice of G′ ∈ Inflations(G) then determines how to map a causal model M on G into a causal model M′ on G′, since the family of causal parameters of M′ will be determined by a function M′ = I_{G→G′}(M) that we define below. We begin by defining when a causal structure G′ is an inflation of G, building on some preliminary definitions.

For any subset of nodes X ⊆ Nodes(G), we denote the induced subgraph on X by SubDAG_G(X). It consists of the nodes X and those edges of G which have both endpoints in X. Of special importance to us is the ancestral subgraph AnSubDAG_G(X), which is the subgraph induced by the ancestry of X, AnSubDAG_G(X) := SubDAG_G(An_G(X)).

In an inflated causal structure G′, every node is also labelled by a node of G. That is, every node of the inflated causal structure G′ is a copy of some node of the original causal structure G, and the copies of a node X of G in G′ are denoted X₁, ..., X_k. The subscript that indexes the copies is termed the copy-index. A copy is classified as observed or latent according to the classification of the original. Similarly, any constraints on cardinality or other types of constraints such as Gaussianity are also inherited from the original. When two objects (e.g. nodes, sets of nodes, causal structures, etc.) are the same up to copy-indices, then we use ∼ to indicate this, as in X_i ∼ X_j ∼ X. In particular, X ∼ X′ for sets of nodes X ⊆ Nodes(G) and X′ ⊆ Nodes(G′) if and only if X′ contains exactly one copy of every node in X. Similarly, SubDAG_{G′}(X′) ∼ SubDAG_G(X) means that in addition to X ∼ X′, an edge is present between two nodes in X′ if and only if it is present between the two associated nodes in X.

In order to be an inflation, G′ must locally mirror the causal structure of G:

Definition 2 (Inflation). The causal structure G′ is said to be an inflation of G, that is, G′ ∈ Inflations(G), if and only if for every V_i ∈ Nodes(G′), the ancestral subgraph of V_i in G′ is equivalent, under removal of the copy-index, to the ancestral subgraph of V in G,

G′ ∈ Inflations(G)  iff  ∀ V_i ∈ Nodes(G′) : AnSubDAG_{G′}(V_i) ∼ AnSubDAG_G(V).  (4)

Equivalently, the condition can be restated wholly in terms of local causal relationships, i.e.

G′ ∈ Inflations(G)  iff  ∀ X_i ∈ Nodes(G′) : Pa_{G′}(X_i) ∼ Pa_G(X).  (5)

In particular, this means that an inflation is a fibration of graphs [46], although there are fibrations that are not inflations.
To illustrate the notion of inflation, we consider the causal structure of Fig. 1, which is called the Triangle scenario (for obvious reasons) and which has been studied recently by a number of authors [22 (Fig. E#8), 19 (Fig. 18b), 21 (Fig. 3), 33 (Fig. 6a), 40 (Fig. 1a), 47 (Fig. 8), 32 (Fig. 1b), 25 (Fig. 4b)]. Different inflations of the Triangle scenario are depicted in Figs. 2 to 6, which will be referred to as the Web, Spiral, Capped, and Cut inflations, respectively.
We now dene the function GG󸀠, that is, we specify how causal parameters are dened for a
given inated causal structure in terms of causal parameters on the original causal structure.
  Consider causal models Mand M󸀠where (M)=Gand (M󸀠)=G󸀠, where G󸀠is an in-
ation of G. Then M󸀠is said to be the GG󸀠ination of M, that is, M󸀠=GG󸀠(M), if and only if
for every node Xiin G󸀠, the manner in which Xidepends causally on its parents within G󸀠is the same as the
manner in which Xdepends causally on its parents within G. Noting that XiXand that G󸀠(Xi)G(X)
by Eq. (5), one can formalize this condition as:
Xi(G󸀠) : PXi|G󸀠(Xi)=PX|G(X).(6)
[Figures 1–6 and their captions are not recoverable from this extraction. They depict the Triangle scenario and its Web, Spiral, Capped, and Cut inflations; the caption text refers to sets of copies such as {AABBCC}, {AABC}, and {ABC}, whose copy-indices were lost.]
For a given triple G, G′, and M, this definition specifies a unique inflation model M′, resulting in a well-defined function I_{G→G′}.

To sum up, the inflation of a causal model is a new causal model where (i) each variable in the original causal structure may have counterparts in the inflated causal structure with ancestral subgraphs mirroring those of the originals, and (ii) the manner in which a variable depends causally on its parents in the inflated causal structure is given by the manner in which its counterpart in the original causal structure depends causally on its parents. The operation of modifying a DAG and equipping the modified version with conditional probability distributions that mirror those of the original also appears in the do-calculus and twin networks of Pearl [1], and moreover bears some resemblance to the adhesivity technique used in deriving non-Shannon-type entropic inequalities (see also Appendix E).
We are now in a position to describe the key property of the inflation of a causal model, the one that makes it useful for causal inference. With notation as in Definition 3, let P_X and P_{X′} denote marginal distributions on some X ⊆ ObservedNodes(G) and X′ ⊆ ObservedNodes(G′), respectively. Then

if X′ ∼ X and AnSubDAG_{G′}(X′) ∼ AnSubDAG_G(X), then P_{X′} = P_X.  (7)

This follows from the fact that the distributions on X′ and X depend only on their ancestral subgraphs and the parameters defined thereon, which by the definition of inflation are the same for X′ and for X. It is useful to have a name for those sets of observed nodes in G′ which satisfy the antecedent of Eq. (7), that is, for which one can find a copy-index-equivalent set in the original causal structure G with a copy-index-equivalent ancestral subgraph. We call such subsets of the observed nodes of G′ injectable sets,

V′ ∈ InjectableSets(G′)  iff  ∃ V ⊆ ObservedNodes(G) : V′ ∼ V and AnSubDAG_{G′}(V′) ∼ AnSubDAG_G(V).  (8)

Similarly, those sets of observed nodes in the original causal structure G which satisfy the antecedent of Eq. (7), that is, for which one can find a corresponding set in the inflated causal structure G′ with a copy-index-equivalent ancestral subgraph, we describe as images of the injectable sets under the dropping of copy-indices,

V ∈ ImagesInjectableSets(G)  iff  ∃ V′ ∈ InjectableSets(G′) : V′ ∼ V and AnSubDAG_{G′}(V′) ∼ AnSubDAG_G(V).  (9)

Clearly, V ∈ ImagesInjectableSets(G) iff ∃ V′ ∈ InjectableSets(G′) such that V ∼ V′.
For example, in the Spiral inflation of the Triangle scenario depicted in Fig. 3, the set {A1B1C1} is injectable because its ancestral subgraph is equivalent up to copy-indices to the ancestral subgraph of {ABC} in the original causal structure, and the set {A2C1} is injectable because its ancestral subgraph is equivalent to that of {AC} in the original causal structure.
A set of nodes in the inflated causal structure can only be injectable if it contains at most one copy of any node from the original causal structure. More strongly, it can only be injectable if its ancestral subgraph contains at most one copy of any observed or latent node from the original causal structure. Thus, in Fig. 3, {A1A2C1} is not injectable because it contains two copies of A, and {A2B1C1} is not injectable because its ancestral subgraph contains two copies of Y.
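To make the injectability test concrete, the sketch below computes the ancestral subgraph of a candidate set, checks that it contains at most one copy of each original node, drops copy-indices, and compares against the ancestral subgraph of the image set, as in Eq. (8). The DAG encodings are hypothetical: the latent labels X, Y, Z and the wiring of the Cut-style inflation are assumptions chosen to reproduce the properties stated in the text, not a transcription of the figures.

```python
def ancestors(dag, nodes):
    """All ancestors of `nodes` in `dag` (a node -> parent-list dict), inclusive."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(dag[n])
    return seen

def is_injectable(inflated, original, vset):
    """Eq. (8): vset is injectable iff its ancestral subgraph, after dropping
    copy-indices, coincides with the ancestral subgraph of its image in G."""
    an = ancestors(inflated, vset)
    if len({name for name, _ in an}) != len(an):   # two copies of one original node
        return False
    dropped = {name: sorted(p for p, _ in inflated[(name, idx)])
               for name, idx in an}
    image_an = ancestors(original, {name for name, _ in vset})
    return dropped == {n: sorted(original[n]) for n in image_an}

# Hypothetical Triangle scenario: latent X between A and B, Y between B and C,
# Z between A and C (an assumed labeling).
TRIANGLE = {"X": [], "Y": [], "Z": [],
            "A": ["X", "Z"], "B": ["X", "Y"], "C": ["Y", "Z"]}

# Hypothetical Cut-style inflation: A2 and B1 receive distinct copies of X,
# so they share no ancestor.
CUT = {("X", 1): [], ("X", 2): [], ("Y", 1): [], ("Z", 1): [],
       ("A", 2): [("X", 1), ("Z", 1)],
       ("B", 1): [("X", 2), ("Y", 1)],
       ("C", 1): [("Y", 1), ("Z", 1)]}

print(is_injectable(CUT, TRIANGLE, {("A", 2), ("C", 1)}))   # True
print(is_injectable(CUT, TRIANGLE, {("B", 1), ("C", 1)}))   # True
print(is_injectable(CUT, TRIANGLE, {("A", 2), ("B", 1)}))   # False: two copies of X
```

The same `ancestors` helper also yields the ancestral-independence test used repeatedly below, since two sets are ancestrally independent exactly when their ancestor sets are disjoint.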
We can now express Eq. (7) in the language of injectable sets,

P_{V′} = P_V  if V′ ∼ V and V′ ∈ InjectableSets(G′).  (10)
In the example of Fig. 3, injectability of the sets {A1B1C1} and {A2C1} thus implies that the marginals on each of these are equal to the marginals on their counterparts, {ABC} and {AC}, in the original causal model, so that P_{A1B1C1} = P_{ABC} and P_{A2C1} = P_{AC}.
3.2 Witnessing incompatibility
Finally, we can explain why inflation is relevant for deciding whether a distribution is compatible with a causal structure. For a distribution P_{ObservedNodes(G)} to be compatible with G, there must be a causal model M that yields it. Per Definition 1, given a P_{ObservedNodes(G)} compatible with G, the family of marginals of P_{ObservedNodes(G)} on the images of the injectable sets of observed variables in G, {P_V : V ∈ ImagesInjectableSets(G)}, is also said to be compatible with G. Looking at the inflation model M′ = Inflation_{G→G′}(M), Eq. (10) implies that the family of distributions on the injectable sets given by {P_{V′} : V′ ∈ InjectableSets(G′)}, where P_{V′} = P_V for V′ ∼ V, is compatible with G′.
The same considerations apply for any family of distributions such that each set of variables in the family corresponds to an injectable set (i.e., when the family of distributions is associated with an incomplete collection of injectable sets). Formally,

Lemma 4. Let the causal structure G′ be an inflation of G. Let 𝕊′ ⊆ InjectableSets(G′) be a collection of injectable sets, and let 𝕊 ⊆ ImagesInjectableSets(G) be the images of this collection under the dropping of copy-indices. If a distribution P_{ObservedNodes(G)} is compatible with G, then the family of distributions {P_V : V ∈ 𝕊} is compatible with G per Definition 1. Furthermore, the corresponding family of distributions {P_{V′} : V′ ∈ 𝕊′}, defined via P_{V′} = P_V for V′ ∼ V, must be compatible with G′.
We have thereby related a question about compatibility with the original causal structure to one about compatibility with the inflated causal structure. If one can show that the new compatibility question on G′ is answered in the negative, then it follows that the original compatibility question on G is answered in the negative as well. Some simple examples serve to illustrate the idea.
Example 1 (Witnessing the incompatibility of perfect three-way correlation with the Triangle scenario). Consider the following causal inference problem. We are given a joint distribution of three binary variables, P_{ABC}, where the marginal on each variable is uniform and the three are perfectly correlated,

P_{ABC} = ([000] + [111])/2,  i.e.,  P_{ABC}(abc) = 1/2 if a = b = c, and 0 otherwise,  (11)
and we would like to determine whether it is compatible with the Triangle scenario (Fig. 1). The notation [abc] in Eq. (11) is shorthand for the deterministic distribution where A, B, and C take the values a, b, and c respectively; in terms of the Kronecker delta, [abc] := δ_{A,a} δ_{B,b} δ_{C,c}.
Since there are no conditional independence relations among the observed variables in the Triangle scenario, there is no opportunity for ruling out the distribution on the grounds that it fails to satisfy the required conditional independences.
To solve the causal inference problem, we consider the Cut inflation (Fig. 5). The injectable sets include {A2C1} and {B1C1}. Their images in the original causal structure are {AC} and {BC}, respectively.
We will show that the distribution of Eq. (11) is not compatible with the Triangle scenario by demonstrating that the contrary assumption of compatibility implies a contradiction. If the distribution of Eq. (11) were compatible with the Triangle scenario, then so too would be its pair of marginals on {AC} and {BC}, which are given by:

P_{AC} = P_{BC} = ([00] + [11])/2.
By Lemma 4, this compatibility assumption would entail that the marginals

P_{A2C1} = P_{B1C1} = ([00] + [11])/2  (12)
are compatible with the Cut inflation of the Triangle scenario. We now show that the latter compatibility cannot hold, thereby obtaining our contradiction. It suffices to note that (i) the only joint distribution that exhibits perfect correlation between A2 and C1 and between B1 and C1 also exhibits perfect correlation between A2 and B1, and (ii) A2 and B1 have no common ancestor in the Cut inflation and hence must be marginally independent in any distribution that is compatible with it.
We have therefore certified that the distribution P_{ABC} of Eq. (11) is not compatible with the Triangle scenario, recovering a result originally proven by Steudel and Ay [32].
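The contradiction in this example can be checked mechanically by support reasoning alone: filter the eight outcomes of (A2, B1, C1) through the zero-probability cells that the marginals of Eq. (12) impose. A minimal sketch:

```python
from itertools import product

# Outcomes of (A2, B1, C1) that survive the marginals of Eq. (12),
# which force A2 = C1 and B1 = C1.
support = [(a2, b1, c1) for a2, b1, c1 in product([0, 1], repeat=3)
           if a2 == c1 and b1 == c1]
print(support)   # [(0, 0, 0), (1, 1, 1)]

# Claim (i): every surviving outcome has A2 = B1, i.e. perfect correlation.
assert all(a2 == b1 for a2, b1, _ in support)

# Claim (ii): marginal independence of A2 and B1 would require
# P(A2 = 0, B1 = 1) = P(A2 = 0) P(B1 = 1) = 1/4 > 0, yet every distribution
# supported on `support` assigns that event probability zero: contradiction.
```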
Example 2 (Witnessing the incompatibility of the W-type distribution with the Triangle scenario). Consider another causal inference problem on the Triangle scenario, namely, that of determining whether the distribution

P_{ABC} = ([100] + [010] + [001])/3,  i.e.,  P_{ABC}(abc) = 1/3 if a + b + c = 1, and 0 otherwise,  (13)
is compatible with it. We call this the W-type distribution.10 To settle this compatibility question, we consider the Spiral inflation of the Triangle scenario (Fig. 3). The injectable sets in this case include {A1B1C1}, {A2C1}, {B2A1}, {C2B1}, {A2}, {B2}, and {C2}.
Therefore, we turn our attention to determining whether the marginals of the W-type distribution on the images of these injectable sets are compatible with the Triangle scenario. These marginals are:

P_{ABC} = ([100] + [010] + [001])/3,  (14)
P_{AC} = P_{BA} = P_{CB} = ([10] + [01] + [00])/3,  (15)
P_A = P_B = P_C = (2/3)[0] + (1/3)[1].  (16)
By Lemma 4, this compatibility holds only if the associated marginals for the injectable sets, namely,

P_{A1B1C1} = ([100] + [010] + [001])/3,  (17)
P_{A2C1} = P_{B2A1} = P_{C2B1} = ([10] + [01] + [00])/3,  (18)
P_{A2} = P_{B2} = P_{C2} = (2/3)[0] + (1/3)[1],  (19)

are compatible with the Spiral inflation (Fig. 3). Eq. (18) implies that C1 = 0 whenever A2 = 1. It similarly implies that A1 = 0 whenever B2 = 1, and that B1 = 0 whenever C2 = 1,

A2 = 1 ⟹ C1 = 0,
B2 = 1 ⟹ A1 = 0,
C2 = 1 ⟹ B1 = 0.  (20)
The Spiral inflation is such that A2, B2 and C2 have no common ancestor and consequently are marginally independent in any distribution compatible with it. Together with the fact that each value of these variables has a nonzero probability of occurrence (by Eq. (19)), this implies that

Sometimes A2 = 1 and B2 = 1 and C2 = 1.  (21)
10 The name stems from the fact that this distribution is reminiscent of the famous quantum state appearing in [48], called the W state.
Finally, Eq. (20) together with Eq. (21) entails

Sometimes A1 = 0 and B1 = 0 and C1 = 0.  (22)

This, however, contradicts Eq. (17). Consequently, the family of marginals described in Eqs. (17)–(19) is not compatible with the causal structure of Fig. 3. By Lemma 4, this implies that the family of marginals described in Eqs. (14)–(16), and therefore the W-type distribution of which they are marginals, is not compatible with the Triangle scenario.
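The possibilistic steps above can also be phrased numerically. The sketch below merely tracks the probabilities that the inflation assumptions assign to the relevant events, using exact arithmetic:

```python
from fractions import Fraction

# Marginals forced by Eqs. (17)-(19) for the W-type distribution.
p_one = Fraction(1, 3)   # P(A2 = 1) = P(B2 = 1) = P(C2 = 1), from Eq. (19)
p_000 = Fraction(0)      # P_{A1B1C1}(000), which vanishes by Eq. (17)

# A2, B2, C2 have no common ancestor in the Spiral inflation, hence are
# mutually independent; the event of Eq. (21) then has probability
p_all_one = p_one ** 3
print(p_all_one)         # 1/27

# Eq. (20) maps that event onto A1 = B1 = C1 = 0, so compatibility would
# require P_{A1B1C1}(000) >= 1/27, contradicting Eq. (17).
assert p_all_one > p_000
```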
To our knowledge, this is a new result. In fact, the incompatibility of the W-type distribution with the Triangle scenario cannot be derived via any of the existing causal inference techniques. In particular:
1. Checking conditional independence relations is not relevant here, as there are no conditional independence relations between any observed variables in the Triangle scenario.
2. The relevant Shannon-type entropic inequalities for the Triangle scenario have been classified, and they do not witness the incompatibility [25, 33, 35].
3. Moreover, no entropic inequality can witness the W-type distribution as unrealizable. Weilenmann and Colbeck [26] have constructed an inner approximation to the entropic cone of the Triangle causal structure, and the entropies of the W-type distribution form a point in this cone. In other words, a distribution with the same entropic profile as the W-type distribution can arise from the Triangle scenario.
4. The newly-developed method of covariance matrix causal inference due to Kela et al. [27], which gives tighter constraints than entropic inequalities for the Triangle scenario, also cannot detect the incompatibility.
Therefore, in this case at least, the inflation technique appears to be more powerful.
We have arrived at our incompatibility verdict by combining inflation with reasoning reminiscent of Hardy's version of Bell's theorem [49, 50]. Sec. 4.4 will present a generalization of this kind of argument and its applications to causal inference.
Example 3 (Witnessing the incompatibility of the PR-box distribution with the Bell scenario). Bell's theorem [17, 18, 20, 51] concerns the question of whether the distribution obtained in an experiment involving a pair of systems that are measured at space-like separation is compatible with a causal structure of the form of Fig. 7. Here, the observed variables are {A, B, X, Y}, and Λ is a latent variable acting as a common cause of A and B. We shall term this causal structure the Bell scenario. While the causal inference formulation of Bell's theorem is not the traditional one, several recent articles have introduced and advocated this perspective [19 (Fig. 19), 22 (Fig. E#2), 23 (Fig. 1), 33 (Fig. 1), 52 (Fig. 2b), 53 (Fig. 2)].
[Figure 7: The causal structure of the Bell scenario, with observed variables A and B, settings X and Y, and latent common cause Λ.]
We consider the distribution P_{ABXY} = P_{AB|XY} P_X P_Y, where P_X and P_Y are arbitrary full-support distributions on {0, 1},11 and

P_{AB|XY} =
  (1/2)([00] + [11])  if x = 0, y = 0,
  (1/2)([00] + [11])  if x = 1, y = 0,
  (1/2)([00] + [11])  if x = 0, y = 1,
  (1/2)([01] + [10])  if x = 1, y = 1,

i.e., P_{AB|XY}(ab|xy) = 1/2 if a ⊕ b = x·y, and 0 otherwise.  (23)
11 In the literature on the Bell scenario, the variables X and Y are termed "settings". Generally, we may think of observed root variables as settings, coloring them light green in the figures. They are natural candidates for variables to condition on.
[Figure 8: An inflation of the Bell scenario.]
This conditional distribution was discovered by Tsirelson [54] and later independently by Popescu and Rohrlich [55, 56]. It has become known in the field of quantum foundations as the PR-box after the latter authors.12
The Bell scenario implies nontrivial conditional independences13 among the observed variables, namely, X ⫫ Y, A ⫫ Y | X, and B ⫫ X | Y, as well as those that can be generated from these by the semi-graphoid axioms [19]. It is straightforward to check that these conditional independence relations are respected by the P_{ABXY} resulting from Eq. (23). It is well-known that this distribution is nonetheless incompatible with the Bell scenario, since it violates the CHSH inequality. Here we present a proof of incompatibility in the style of Hardy's proof of Bell's theorem [49] in terms of the inflation technique, using the inflation of the Bell scenario depicted in Fig. 8.
We begin by noting that {A1B1X1Y1}, {A2B1X2Y1}, {A1B2X1Y2}, {A2B2X2Y2}, {X1}, {X2}, {Y1}, and {Y2} are all injectable sets. By Lemma 4, it follows that any causal model that recovers P_{ABXY} inflates to a model that results in marginals

P_{A1B1X1Y1} = P_{A2B1X2Y1} = P_{A1B2X1Y2} = P_{A2B2X2Y2} = P_{ABXY},  (24)
P_{X1} = P_{X2} = P_X,  P_{Y1} = P_{Y2} = P_Y.  (25)

Using the definition of conditional probability, we infer that

P_{A1B1|X1Y1} = P_{A2B1|X2Y1} = P_{A1B2|X1Y2} = P_{A2B2|X2Y2} = P_{AB|XY}.  (26)
Because {X1}, {X2}, {Y1}, and {Y2} have no common ancestor in the inflated causal structure, these variables must be marginally independent in any distribution compatible with it, so that P_{X1X2Y1Y2} = P_{X1} P_{X2} P_{Y1} P_{Y2}. Given the assumption that the distributions P_X and P_Y have full support, it follows from Eq. (25) that

Sometimes X1 = 0 and X2 = 1 and Y1 = 0 and Y2 = 1.  (27)
On the other hand, from Eq. (26) together with the definition of the PR-box, Eq. (23), we conclude that

X1 = 0, Y1 = 0 ⟹ A1 = B1,
X1 = 0, Y2 = 1 ⟹ A1 = B2,
X2 = 1, Y1 = 0 ⟹ A2 = B1,
X2 = 1, Y2 = 1 ⟹ A2 ≠ B2.  (28)

Combining this with Eq. (27), we obtain

Sometimes A1 = B1 and A1 = B2 and A2 = B1 and A2 ≠ B2.  (29)
No values of A1, A2, B1, and B2 can jointly satisfy these conditions. So we have reached a contradiction, showing that our original assumption of compatibility of P_{ABXY} with the Bell scenario must have been false.
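That no joint assignment satisfies the conditions of Eq. (29) can be confirmed by brute force over the 16 deterministic valuations:

```python
from itertools import product

# Search for values of (A1, A2, B1, B2) satisfying all four conditions of Eq. (29).
solutions = [(a1, a2, b1, b2)
             for a1, a2, b1, b2 in product([0, 1], repeat=4)
             if a1 == b1 and a1 == b2 and a2 == b1 and a2 != b2]
print(solutions)   # []: the chain a2 = b1 = a1 = b2 contradicts a2 != b2
```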
12 The PR-box is of interest because it represents a manner in which experimental observations could deviate from the predictions of quantum theory while still being consistent with relativity.
13 Recall that variables X and Y are conditionally independent given Z if P_{XY|Z}(xy|z) = P_{X|Z}(x|z) P_{Y|Z}(y|z) for all z with P_Z(z) > 0. Such a conditional independence is denoted by X ⫫ Y | Z.
The structure of this argument parallels that of standard proofs of the incompatibility of the PR-box with the Bell scenario. Standard proofs focus on a set of variables {A0A1B0B1}, where Ax is the value of A when X = x and By is the value of B when Y = y. Note that the distribution ∑_Λ P_{A0|Λ} P_{A1|Λ} P_{B0|Λ} P_{B1|Λ} P_Λ is a joint distribution of these four variables for which the marginals on the pairs {A0B0}, {A0B1}, {A1B0} and {A1B1} are those that can arise in the Bell scenario. The existence of such a joint distribution rules out the possibility of having A1 = B1, A1 = B2, A2 = B1 but A2 ≠ B2, and therefore shows that the PR-box distribution is incompatible with the Bell scenario [57, 58]. In light of our use of Eq. (27), the reasoning based on the inflation of Fig. 8 is really the same argument in disguise.
Appendix G shows that the inflation of the Bell scenario depicted in Fig. 8 is sufficient to witness the incompatibility of any distribution that is incompatible with the Bell scenario.
3.3 Deriving causal compatibility inequalities
The inflation technique can be used not only to witness the incompatibility of a given distribution with a given causal structure, but also to derive necessary conditions that a distribution must satisfy to be compatible with the given causal structure. These conditions can always be expressed as inequalities, and we will refer to them as causal compatibility inequalities.14 Formally, we have:
Definition 5. Let G be a causal structure and let 𝕊 be a family of subsets of the observed variables of G, 𝕊 ⊆ 2^{ObservedNodes(G)}. Let I_𝕊 denote an inequality that operates on the corresponding family of distributions, {P_V : V ∈ 𝕊}. Then I_𝕊 is a causal compatibility inequality for the causal structure G whenever it is satisfied by every family of distributions {P_V : V ∈ 𝕊} that is compatible with G.
While violation of a causal compatibility inequality witnesses the incompatibility with the causal structure, satisfaction of the inequality does not guarantee compatibility. This is the sense in which it merely provides a necessary condition for compatibility.
The inflation technique is useful for deriving causal compatibility inequalities because of the following consequence of Lemma 4:
Corollary 6. Suppose that G′ is an inflation of G. Let 𝕊′ ⊆ InjectableSets(G′) be a family of injectable sets and 𝕊 ⊆ ImagesInjectableSets(G) the images of members of 𝕊′ under the dropping of copy-indices. Let I_𝕊′ be a causal compatibility inequality for G′ operating on families {P_{V′} : V′ ∈ 𝕊′}. Define an inequality I_𝕊 as follows: in the functional form of I_𝕊′, replace every occurrence of a term P_{V′} by P_V for the unique V ∈ 𝕊 with V ∼ V′. Then I_𝕊 is a causal compatibility inequality for G operating on families {P_V : V ∈ 𝕊}.
Proof. Suppose that the family {P_V : V ∈ 𝕊} is compatible with G. By Lemma 4, it follows that the family {P_{V′} : V′ ∈ 𝕊′}, where P_{V′} := P_V for V′ ∼ V, is compatible with G′. Since I_𝕊′ is a causal compatibility inequality for G′, it follows that {P_{V′} : V′ ∈ 𝕊′} satisfies I_𝕊′. But by the definition of I_𝕊, its evaluation on {P_V : V ∈ 𝕊} is equal to I_𝕊′ evaluated on {P_{V′} : V′ ∈ 𝕊′}. It therefore follows that {P_V : V ∈ 𝕊} satisfies I_𝕊. Since {P_V : V ∈ 𝕊} was an arbitrary family compatible with G, we conclude that I_𝕊 is a causal compatibility inequality for G.
We now present some simple examples of causal compatibility inequalities for the Triangle scenario that one can derive from the inflation technique via Corollary 6. Some terminology and notation will facilitate their description. We refer to a pair of nodes which do not share any common ancestor as being ancestrally independent. This is equivalent to being d-separated by the empty set [1–4]. Given that the conventional notation for X and Y being d-separated by Z in a DAG is (X ⊥_d Y | Z), we denote X and Y being ancestrally independent within G as X ⊥_d Y. Generalizing to sets, X ⊥_d Y indicates that no node in X shares a common
14 Note that we can include equality constraints for causal compatibility within the framework of causal compatibility inequalities alone; it suffices to note that an equality constraint can always be expressed as a pair of inequalities, i.e., satisfying x = y is equivalent to satisfying both x ≤ y and x ≥ y. The requirement that a distribution must be Markov (or Nested Markov) relative to a DAG is usually formulated as a set of equality constraints.
ancestor with any node in Y within the causal structure G,

X ⊥_d Y  iff  An_G(X) ∩ An_G(Y) = ∅.  (30)

Ancestral independence is closed under union; that is, X ⊥_d Y and X ⊥_d Z implies X ⊥_d (Y ∪ Z). Consequently, pairwise ancestral independence implies joint factorizability; i.e., ∀ i ≠ j : X_i ⊥_d X_j implies that P_{∪_i X_i} = ∏_i P_{X_i}.
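Eq. (30) reduces ancestral independence to a set intersection, which makes it trivial to automate. A minimal sketch over an assumed toy DAG (the node names L1, L2, P, Q, R are hypothetical):

```python
def ancestors(dag, nodes):
    """All ancestors of `nodes` (inclusive) in `dag`, a node -> parent-list dict."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(dag[n])
    return seen

def anc_indep(dag, xs, ys):
    """Eq. (30): X and Y are ancestrally independent iff An(X) and An(Y) are disjoint."""
    return not (ancestors(dag, xs) & ancestors(dag, ys))

# Toy DAG: two latents L1, L2; P <- L1, Q <- L2, R <- L1, L2.
dag = {"L1": [], "L2": [], "P": ["L1"], "Q": ["L2"], "R": ["L1", "L2"]}
print(anc_indep(dag, {"P"}, {"Q"}))        # True
print(anc_indep(dag, {"P"}, {"R"}))        # False: they share the ancestor L1
print(anc_indep(dag, {"P"}, {"Q", "L2"}))  # True: closure under union
```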
Example 4 (Causal compatibility inequalities in terms of correlators). As in Example 1 of the previous subsection, consider the Cut inflation of the Triangle scenario (Fig. 5), where all observed variables are binary. For technical convenience, we assume that they take values in the set {−1, +1}, rather than taking values in {0, 1} as was presumed in the last subsection.
The injectable sets that we make use of are {A2C1}, {B1C1}, {A2}, and {B1}. From Corollary 6, any causal compatibility inequality for the inflated causal structure that operates on the marginal distributions of {A2C1}, {B1C1}, {A2}, and {B1} will yield a causal compatibility inequality for the original causal structure that operates on the marginal distributions on {AC}, {BC}, {A}, and {B}. We begin by noting that for any distribution on three binary variables {A2B1C1}, that is, regardless of the causal structure in which they are embedded, the marginals on {A2C1}, {B1C1} and {A2B1} satisfy the following inequality for expectation values [59–63],

𝔼[A2C1] + 𝔼[B1C1] ≤ 1 + 𝔼[A2B1].  (31)
This is an example of a constraint on pairwise correlators that arises from the presumption that they are consistent with a joint distribution. (The problem of deriving such constraints is the marginal constraint problem, discussed in detail in Sec. 4.)
But in the Cut inflation of the Triangle scenario (Fig. 5), A2 and B1 have no common ancestor and consequently any distribution compatible with this inflated causal structure must make A2 and B1 marginally independent. In terms of correlators, this can be expressed as

A2 ⊥_d B1 ⟹ A2 ⫫ B1 ⟹ 𝔼[A2B1] = 𝔼[A2]𝔼[B1].  (32)

Substituting this into Eq. (31), we have

𝔼[A2C1] + 𝔼[B1C1] ≤ 1 + 𝔼[A2]𝔼[B1].  (33)
This is an example of a simple but nontrivial causal compatibility inequality for the causal structure of Fig. 5. Finally, by Corollary 6, we infer that

𝔼[AC] + 𝔼[BC] ≤ 1 + 𝔼[A]𝔼[B]  (34)

is a causal compatibility inequality for the Triangle scenario. This inequality expresses the fact that as long as A and B are not completely biased, there is a tradeoff between the strength of AC correlations and the strength of BC correlations.
Given the symmetry of the Triangle scenario under permutations and sign flips of A, B and C, it is clear that the image of inequality (34) under any such symmetry is also a valid causal compatibility inequality. Together, these inequalities constitute a type of monogamy15 of correlations in the Triangle scenario with binary variables: if any two observed variables with unbiased marginals are perfectly correlated, then they are both independent of the third.
Moreover, since inequality (31) is valid even for continuous variables with values in the interval [−1, +1], it follows that the polynomial inequality (34) is valid in this case as well.
15 We are here using the term "monogamy" in the same sort of manner in which it is used in the context of entanglement theory [64].
Note that inequality (34) serves as a robust witness certifying the incompatibility of 3-way perfect correlation (described in Eq. (11)) with the Triangle scenario. Inequality (34) is robust in the sense that it demonstrates the incompatibility of distributions close to 3-way perfect correlation.
One might be curious as to how close to perfect correlation one can get while still being compatible with the Triangle scenario. To partially answer this question, we used Eq. (34) to rule out many distributions close to perfect correlation, and we also pursued explicit model-construction to rule in various distributions sufficiently far from perfect correlation. Explicitly, we found that distributions of the form

P_{ABC} = α ([000] + [111])/2 + (1 − α) ∑[else]/6,  i.e.,  P_{ABC}(abc) = α/2 if a = b = c, and (1 − α)/6 otherwise,  (35)

where the sum ranges over the six point distributions [abc] other than [000] and [111], are incompatible for the range 5/8 = 0.625 < α ≤ 1 as a consequence of Eq. (34). On the other hand, we found a family of explicit models allowing us to certify the compatibility of distributions for 0 ≤ α ≤ 1/2.
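The 5/8 threshold quoted above can be recovered by evaluating inequality (34) on the family of Eq. (35): by symmetry every single-variable mean vanishes, and every pairwise correlator equals (4α − 1)/3. A sketch in exact arithmetic:

```python
from fractions import Fraction

def violates_34(alpha):
    """Evaluate E[AC] + E[BC] <= 1 + E[A]E[B] on the family of Eq. (35),
    with outcomes mapped to {-1, +1}."""
    # P(a = c) = alpha + (1 - alpha)/3: the two perfectly correlated outcomes,
    # plus two of the six 'else' outcomes, have a = c.
    e_pair = 2 * (alpha + (1 - alpha) / 3) - 1   # = (4*alpha - 1)/3
    e_single = Fraction(0)                        # all marginals are unbiased
    return 2 * e_pair > 1 + e_single * e_single

print(violates_34(Fraction(1, 2)))    # False: within the certified-compatible range
print(violates_34(Fraction(5, 8)))    # False: the boundary case saturates (34)
print(violates_34(Fraction(51, 80)))  # True: any alpha > 5/8 violates (34)
```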
The presence of this gap between our inner and outer constructions could reflect either the inadequacy of our limited model constructions or the inadequacy of relatively small inflations of the Triangle causal structure to generate suitably sensitive inequalities. We defer closing the gap to future work.16
Example 5 (Causal compatibility inequalities in terms of entropic quantities). One way to derive constraints that are independent of the cardinality of the observed variables is to express these in terms of the mutual information between observed variables rather than in terms of correlators. The inflation technique can also be applied to achieve this. To see how this works in the case of the Triangle scenario, consider again the Cut inflation (Fig. 5).
One can follow the same logic as in the preceding example, but starting from a different constraint on marginals. For any distribution on three variables {A2B1C1} of arbitrary cardinality (again, regardless of the causal structure in which they are embedded), the marginals on {A2C1}, {B1C1} and {A2B1} satisfy the inequality [35, Eq. (29)]

I(A2 : C1) + I(C1 : B1) ≤ H(C1) + I(A2 : B1),  (36)
where H(X) denotes the Shannon entropy of the distribution of X, and I(X : Y) denotes the mutual information between X and Y with respect to the marginal joint distribution on the pair of variables X and Y. The fact that A2 and B1 have no common ancestor in the inflated causal structure implies that in any distribution that is compatible with it, A2 and B1 are marginally independent. This is expressed entropically as the vanishing of their mutual information,

A2 ⊥_d B1 ⟹ A2 ⫫ B1 ⟹ I(A2 : B1) = 0.  (37)

Substituting the latter equality into Eq. (36), we have

I(A2 : C1) + I(C1 : B1) ≤ H(C1).  (38)
This is another example of a nontrivial causal compatibility inequality for the causal structure of Fig. 5. By Corollary 6, it follows that

I(A : C) + I(C : B) ≤ H(C)  (39)

is also a causal compatibility inequality for the Triangle scenario. This inequality was originally derived in [21]. Our rederivation in terms of inflation coincides with the proof found by Henson et al. [22].
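For instance, inequality (39) immediately witnesses the perfectly correlated distribution of Eq. (11): there each variable determines the others exactly, so both mutual informations saturate at H(C) = 1 bit. A quick check:

```python
from math import log2

def H(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)

# For P_{ABC} = ([000] + [111])/2: each marginal is uniform on {0, 1},
# and A determines C exactly, so I(A:C) = H(C) - H(C|A) = H(C); likewise I(C:B).
H_C = H([0.5, 0.5])
I_AC = I_CB = H_C
print(I_AC + I_CB, ">", H_C)   # 2.0 > 1.0: inequality (39) is violated
```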
16 Using the Web inflation of the Triangle scenario as depicted in Fig. 2, we were able to slightly improve the range of certifiably incompatible α, namely, we find that P_{ABC} is incompatible with the Triangle scenario for all α > 3√3/2 − 2 ≈ 0.598. The relevant causal compatibility inequality justifying the improved bound is 6𝔼[_,_] + 𝔼[_,_]² − 4𝔼[_]² ≤ 3, where 𝔼[_,_] := (𝔼[AB] + 𝔼[BC] + 𝔼[AC])/3 and 𝔼[_] := (𝔼[A] + 𝔼[B] + 𝔼[C])/3.
Standard algorithms already exist for deriving entropic causal compatibility inequalities given a causal structure [25, 33, 35]. We do not expect the methodology of causal inflation to offer any computational advantage in the task of deriving entropic inequalities. The advantage of the inflation approach is that it provides a narrative for explaining an entropic inequality without reference to unobserved variables. As elaborated in Sec. 5.4, this consequently has applications to quantum information theory. A further advantage is the potential of the inflation approach to give rise to non-Shannon-type inequalities, starting from Shannon-type inequalities; see Appendix E for further discussion.
Example 6 (Causal compatibility inequalities in terms of joint distributions). Consider the Spiral inflation of the Triangle scenario (Fig. 3) with the injectable sets {A1B1C1}, {A1B2}, {B1C2}, {A2C1}, {A2}, {B2}, and {C2}. We derive a causal compatibility inequality under the assumption that the observed variables are binary, adopting the convention that they take values in {0, 1}.
We begin by noting that the following is a constraint that holds for any joint distribution of {A1B1C1A2B2C2}, regardless of the causal structure,

P_{A2B2C2}(111) ≤ P_{A1B2C2}(111) + P_{B1C2A2}(111) + P_{A2C1B2}(111) + P_{A1B1C1}(000).  (40)
To prove this claim, it suffices to check that the inequality holds for each of the 2⁶ deterministic assignments of outcomes to {A1B1C1A2B2C2}, from which the general case follows by convex linearity. A more intuitive proof will be provided in Sec. 4.4.
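The deterministic check just described is a few lines to automate:

```python
from itertools import product

def eq40_holds_everywhere():
    """Verify the structure-independent inequality (40) on all 2**6 deterministic
    assignments of (A1, B1, C1, A2, B2, C2); convex linearity does the rest."""
    for a1, b1, c1, a2, b2, c2 in product([0, 1], repeat=6):
        lhs = (a2, b2, c2) == (1, 1, 1)
        rhs = (((a1, b2, c2) == (1, 1, 1)) + ((b1, c2, a2) == (1, 1, 1))
               + ((a2, c1, b2) == (1, 1, 1)) + ((a1, b1, c1) == (0, 0, 0)))
        if lhs > rhs:
            return False
    return True

print(eq40_holds_everywhere())   # True
```

The check also makes the intuition visible: whenever the left-hand event occurs, either one of A1, B1, C1 equals 1 (triggering one of the first three terms) or all of them vanish (triggering the last).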
Next, we note that certain sets of variables have no common ancestors with other sets of variables in the inflated causal structure, which implies the marginal independence of these sets. Such independences are expressed in the language of joint distributions as factorizations,

A1B2 ⊥_d C2 ⟹ P_{A1B2C2} = P_{A1B2} P_{C2},
B1C2 ⊥_d A2 ⟹ P_{B1C2A2} = P_{B1C2} P_{A2},
A2C1 ⊥_d B2 ⟹ P_{A2C1B2} = P_{A2C1} P_{B2},
A2 ⊥_d B2 ⊥_d C2 ⟹ P_{A2B2C2} = P_{A2} P_{B2} P_{C2}.  (41)
Substituting these factorizations into Eq. (40), we obtain the polynomial inequality

P_{A2}(1) P_{B2}(1) P_{C2}(1) ≤ P_{A1B2}(11) P_{C2}(1) + P_{B1C2}(11) P_{A2}(1) + P_{A2C1}(11) P_{B2}(1) + P_{A1B1C1}(000).  (42)

This, therefore, is a causal compatibility inequality for the inflated causal structure. Finally, by Corollary 6, we infer that

P_A(1) P_B(1) P_C(1) ≤ P_{AB}(11) P_C(1) + P_{BC}(11) P_A(1) + P_{AC}(11) P_B(1) + P_{ABC}(000)  (43)

is a causal compatibility inequality for the Triangle scenario.
What is distinctive about this inequality is that, through the presence of the term P_{ABC}(000), it takes into account genuine three-way correlations, while the inequalities we derived earlier only depend on the two-variable marginals. This inequality is strong enough to demonstrate the incompatibility of the W-type distribution of Eq. (13) with the Triangle scenario: for this distribution, the right-hand side of the inequality vanishes while the left-hand side does not.
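Evaluating inequality (43) on the W-type distribution makes the violation explicit; a sketch in exact arithmetic:

```python
from fractions import Fraction

# W-type distribution of Eq. (13).
P = {(1, 0, 0): Fraction(1, 3), (0, 1, 0): Fraction(1, 3), (0, 0, 1): Fraction(1, 3)}

def marg(P, keep):
    """Marginalize a distribution over (a, b, c) onto the coordinates in `keep`."""
    out = {}
    for abc, p in P.items():
        key = tuple(abc[i] for i in keep)
        out[key] = out.get(key, Fraction(0)) + p
    return out

PA, PB, PC = marg(P, [0]), marg(P, [1]), marg(P, [2])
PAB, PBC, PAC = marg(P, [0, 1]), marg(P, [1, 2]), marg(P, [0, 2])

lhs = PA[(1,)] * PB[(1,)] * PC[(1,)]
rhs = (PAB.get((1, 1), 0) * PC[(1,)] + PBC.get((1, 1), 0) * PA[(1,)]
       + PAC.get((1, 1), 0) * PB[(1,)] + P.get((0, 0, 0), 0))
print(lhs, ">", rhs)   # 1/27 > 0: inequality (43) is violated
```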
Of the known techniques for witnessing the incompatibility of a distribution with a causal structure or deriving necessary conditions for compatibility, the most straightforward one is to consider the constraints implied by ancestral independences among the observed variables of the causal structure. The constraints derived in the last two sections have all made use of this basic technique, but at the level of the inflated causal structure rather than the original causal structure. The constraints that one thereby infers for the original causal structure reflect facts about it that cannot be expressed in terms of ancestral independences among its observed variables. The inflation technique exposes these facts in the ancestral independences among observed variables of the inflated causal structure.
In the rest of this article, we shall continue to rely only on the ancestral independences among observed variables within the inflated causal structure to derive examples of compatibility constraints on the original causal structure. Nonetheless, it seems plausible that the inflation technique can also amplify the power of other techniques that do not merely consider ancestral independences among the observed variables. We consider some prospects in Sec. 5.
4 Systematizing the inflation technique
This section considers the problem of how to generalize the above examples of causal inference via the inflation technique to a systematic procedure. We start by introducing the crucial concept of an expressible set, which figures implicitly in our earlier examples. By reformulating Example 1, we sketch our general method and explain why solving a marginal problem is an essential subroutine of our method. Subsequently, Sec. 4.1 explains how to systematically identify, for a given inflated causal structure, all of the sets that are expressible by virtue of ancestral independences. Sec. 4.2 describes how to solve any sort of marginal problem. This may involve determining all the facets of the marginal polytope, which is computationally costly (Appendix A). It is therefore useful to also consider relaxations of the marginal problem that are more tractable, by deriving valid linear inequalities which may or may not bound the marginal polytope tightly. We describe one such approach based on possibilistic Hardy-type paradoxes and the hypergraph transversal problem in Sec. 4.4.
As far as causal compatibility inequalities are concerned, we limit ourselves to those expressed in terms of probabilities,17 as these are generally the most powerful. However, essentially the same techniques can be used to derive inequalities expressed in terms of entropies [35], as demonstrated in Example 5.
In the examples from the previous section, the initial inequality (a constraint upon marginals that is independent of the causal structure) involves sets of observed variables that are not all injectable sets. However, the Markov conditions on the inflated causal structures nevertheless allowed us to express the distribution on these sets in terms of the known distributions on the injectable sets. For instance, in Example 4, the set {A2B1} is not injectable, but it can be partitioned into the singleton sets {A2} and {B1}, which are ancestrally independent, so that one has P_{A2B1} = P_{A2} P_{B1} = P_A P_B in every inflated causal model. This motivates us to define the notion of an expressible set of variables in an inflated causal structure as one for which the joint distribution can be expressed as a function of distributions over injectable sets by making repeated use of the conditional independences implied by d-separation relations as well as marginalization. More formally,
Definition 7. Consider an inflation G′ of a causal structure G. Sufficient conditions for a set of variables V′ ⊆ ObservedNodes(G′) to be expressible include V′ ∈ InjectableSets(G′), or if V′ can be obtained from a collection of injectable sets by recursively applying the following rules:
1. For X′, Y′, Z′ ⊆ ObservedNodes(G′), if (X′ ⊥_d Y′ | Z′) and X′ ∪ Z′ and Y′ ∪ Z′ are expressible, then X′ ∪ Y′ ∪ Z′ is also expressible. This follows by constructing

P_{X′Y′Z′}(xyz) = P_{X′Z′}(xz) P_{Y′Z′}(yz) / P_{Z′}(z)  if P_{Z′}(z) > 0,  and 0 if P_{Z′}(z) = 0.

2. If V′ ⊆ ObservedNodes(G′) is expressible, then so is every subset of V′. This follows by marginalization.
An expressible set is maximal if it is not a proper subset of another expressible set.
Expressible sets are important since, in an inflated model, the distribution of the variables making up an expressible set can be computed explicitly from the known distributions on the injectable sets, by repeatedly using the conditional independences implied by d-separation and taking marginals. Appendix D.1 provides a good example.

17 Or, for binary variables, equivalently in terms of correlators, as in the first example of Sec. 3.3.
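Rule 1's construction is easy to verify numerically. The following sketch (hypothetical numbers, not an example from the paper) builds a joint over binary $X, Y, Z$ satisfying $X \perp Y \mid Z$, then reassembles it from the pairwise marginals exactly as in Rule 1:

```python
import numpy as np

# Hypothetical joint over binary X, Y, Z in which X is independent of Y
# given Z, built from a prior on Z and conditionals P(x|z), P(y|z).
P_Z = np.array([0.4, 0.6])
P_X_given_Z = np.array([[0.9, 0.1],   # row z: distribution of X given Z=z
                        [0.2, 0.8]])
P_Y_given_Z = np.array([[0.3, 0.7],
                        [0.5, 0.5]])
P_XYZ = np.einsum('z,zx,zy->xyz', P_Z, P_X_given_Z, P_Y_given_Z)

# Marginals on the (assumed expressible) sets {X,Z} and {Y,Z}.
P_XZ = P_XYZ.sum(axis=1)   # axes: (x, z)
P_YZ = P_XYZ.sum(axis=0)   # axes: (y, z)

# Rule 1: P_{XYZ}(x,y,z) = P_{XZ}(x,z) P_{YZ}(y,z) / P_Z(z) where P_Z(z) > 0.
reconstructed = np.where(P_Z > 0,
                         P_XZ[:, None, :] * P_YZ[None, :, :] / P_Z,
                         0.0)

assert np.allclose(reconstructed, P_XYZ)
```

Because the joint was constructed to satisfy the conditional independence, the reconstruction is exact; for a joint violating the d-separation relation, Rule 1's formula would not recover the true distribution.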
With the exception of Appendix D, in the remainder of this article we will limit ourselves to working with
expressible sets of a particularly simple kind and leave the investigation of more general expressible sets to
future work.
Definition (ai-expressible set). A set of nodes $V' \subseteq \mathsf{ObservedNodes}(G')$ is ai-expressible if it can be written as a union of injectable sets that are ancestrally independent,
$$\exists\, \{X'_i\}_i \text{ with each } X'_i \text{ injectable in } G', \ \text{ s.t. } \ V' = \bigcup_i X'_i \ \text{ and } \ \forall i \neq j :\ X'_i \perp_d X'_j \text{ in } G'. \qquad (44)$$
An ai-expressible set is maximal if it is not a proper subset of another ai-expressible set.
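Ancestral independence of the component sets can be tested mechanically: two sets of nodes are ancestrally independent exactly when their ancestral closures (including latent nodes) are disjoint. A minimal pure-Python sketch; the DAG below is an illustrative fragment exhibiting the same pattern as the Cut inflation, not the paper's full specification of it:

```python
def ancestors(dag, nodes):
    """Ancestral closure of `nodes` (the nodes themselves plus all of their
    ancestors) in a DAG given as a {child: [parents]} dict; latents included."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(dag.get(v, []))
    return seen

def ancestrally_independent(dag, *node_sets):
    """True iff the ancestral closures of the given sets are pairwise disjoint,
    i.e. no two sets share a (possibly latent) common ancestor."""
    closures = [ancestors(dag, s) for s in node_sets]
    return all(closures[i].isdisjoint(closures[j])
               for i in range(len(closures))
               for j in range(i + 1, len(closures)))

# Toy inflation-like DAG (illustrative node names): latent L1 feeds A1 and B1,
# while a separate latent L2 feeds A2 and C1.
dag = {'A1': ['L1'], 'B1': ['L1'], 'A2': ['L2'], 'C1': ['L2']}

assert ancestrally_independent(dag, {'A2'}, {'B1'})      # disjoint ancestries
assert not ancestrally_independent(dag, {'A1'}, {'B1'})  # share the latent L1
```

Checking disjointness of ancestral closures suffices here because, with an empty conditioning set, two sets of nodes are d-separated precisely when they have no common ancestor.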
Because ancestral independence in $G'$ implies statistical independence for any compatible distribution, it follows that if $V'$ is an ai-expressible set with ancestrally independent and injectable components $V'_1, \ldots, V'_n$, then we have the factorization
$$P_{V'} = P_{V'_1} \cdots P_{V'_n} \qquad (45)$$
for any distribution compatible with $G'$. The situation, therefore, is this: for any constraint that one can derive for the marginals on the ai-expressible sets based on the existence of a joint distribution—and hence without reference to the causal structure—one can infer a constraint that does refer to the causal structure by substituting within the derived constraint a factorization of the form of Eq. (45). This results in a causal compatibility inequality on $G'$ of a very weak form that only takes into account the independences between observed variables.
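As a concrete instance of such a joint-existence constraint (a sketch spot-checked on hypothetical random joints, not a derivation from the paper): for any three jointly distributed binary variables $X, Y, Z$, one has $P(X{=}Z) + P(Y{=}Z) \le 1 + P(X{=}Y)$, simply because the events $X{=}Z$ and $Y{=}Z$ together entail $X{=}Y$. This holds with no reference to any causal structure; substituting the factorization of Eq. (45) for the marginal on an ai-expressible pair then turns it into a causal compatibility inequality.

```python
import numpy as np

rng = np.random.default_rng(7)

def p_equal(P, ax1, ax2):
    """P(V_ax1 = V_ax2) for a joint P over three binary variables."""
    other = ({0, 1, 2} - {ax1, ax2}).pop()
    # Marginalize out the third variable; the trace of the resulting
    # 2x2 pairwise marginal is the probability of agreement.
    return float(np.trace(P.sum(axis=other)))

# The constraint P(X=Z) + P(Y=Z) <= 1 + P(X=Y) follows from the mere
# existence of a joint distribution; spot-check it on random joints.
for _ in range(1000):
    P = rng.random((2, 2, 2))
    P /= P.sum()
    assert p_equal(P, 0, 2) + p_equal(P, 1, 2) <= 1 + p_equal(P, 0, 1) + 1e-12
```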
As a build-up to our exposition of a systematic application of the inflation technique, we now revisit Example 1. As before, to demonstrate the incompatibility of the distribution of Eq. (11) with the Triangle scenario, we assume compatibility and derive a contradiction. Given the distribution of Eq. (11), Lemma 4 implies that the marginal distributions on the injectable sets of the Cut inflation of the Triangle scenario are
$$P_{A_2 C_1} = P_{B_1 C_1} = \tfrac{1}{2}[00] + \tfrac{1}{2}[11], \qquad (46)$$
and
$$P_{A_2} = P_{B_1} = \tfrac{1}{2}[0] + \tfrac{1}{2}[1]. \qquad (47)$$
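These marginals can be checked mechanically. A sketch assuming Eq. (11) denotes the perfectly correlated distribution $\tfrac{1}{2}[000] + \tfrac{1}{2}[111]$ (an assumption on our part, consistent with the marginals above):

```python
import numpy as np

# Assumed form of Eq. (11): perfect three-way correlation over binary A, B, C.
P_ABC = np.zeros((2, 2, 2))
P_ABC[0, 0, 0] = P_ABC[1, 1, 1] = 0.5

# Pairwise marginal on {A, C}: this is what Lemma 4 injects as P_{A2C1}
# (and, by the symmetry of the distribution, P_{B1C1}), cf. Eq. (46).
P_AC = P_ABC.sum(axis=1)
assert np.allclose(P_AC, [[0.5, 0.0], [0.0, 0.5]])  # 1/2[00] + 1/2[11]

# Single-variable marginal, cf. Eq. (47).
P_A = P_ABC.sum(axis=(1, 2))
assert np.allclose(P_A, [0.5, 0.5])                 # 1/2[0] + 1/2[1]
```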
From the fact that $A_2$ and $B_1$ are ancestrally independent in the Cut inflation, we also infer that the distribution on the ai-expressible set $\{A_2, B_1\}$ must be
PA2