Revisiting causality using stochastics: 1. Theory
Demetris Koutsoyiannis1, Christian Onof2, Antonis Christofides1 and
Zbigniew W. Kundzewicz3
1 Department of Water Resources and Environmental Engineering, School of Civil Engineering, National
Technical University of Athens (dk@itia.ntua.gr)
2 Department of Civil and Environmental Engineering, Faculty of Engineering, Imperial College London
3 Meteorology Lab, Department of Construction and Geoengineering, Faculty of Environmental
Engineering and Mechanical Engineering, Poznan University of Life Sciences, Poznań, Poland
Abstract Causality is a central concept in science, in philosophy and in life. However,
reviewing various approaches to it over the entire knowledge tree, from philosophy to
science and to scientific and technological applications, we locate several problems, which
prevent these approaches from defining sufficient conditions for the existence of causal
links. We thus choose to determine necessary conditions that are operationally useful in
identifying or falsifying causality claims. Our proposed approach is based on stochastics,
in which events are replaced by processes. Starting from the idea of stochastic causal
systems, we extend it to the more general concept of hen-or-egg causality, which includes
as special cases the classic causal, and the potentially causal and anticausal systems.
Theoretical considerations allow the development of an effective algorithm, applicable to
large-scale open systems, which are neither controllable nor repeatable. The derivation
and details of the algorithm are described in this paper, while in a companion paper we
illustrate and showcase the proposed framework with a number of case studies, some of
which are controlled synthetic examples and others real-world ones arising from
interesting scientific problems.
Keywords causality, causal systems, stochastics, impulse response function, system
identification
ὡς ἐγὼ οὐ νῦν πρῶτον ἀλλὰ καὶ ἀεὶ τοιοῦτος οἷος τῶν ἐμῶν μηδενὶ ἄλλῳ πείθεσθαι τῷ λόγῳ ὃς ἄν
μοι λογιζομένῳ βέλτιστος φαίνηται.
I am not only now but always a man who follows nothing but the reasoning which on consideration
seems to me best.
Plato, Crito, 46b-47d, quoting Socrates
Postprint (2022-06-11) with updates on the analytical solution (p. 22, equations (30)-(37))
and a correction of a typo in equation (27)
1 Introduction
Causality (or causation¹) is a central concept in science, in philosophy and in life, yet its meaning is not clear. Difficulties in disambiguating it extend over the entire knowledge tree, from philosophy to science and to scientific and technological applications. When it
comes to science, an operational framework is well established, which is based on the
notion of controlled and repeatable experiments. However, this framework is not
applicable to large-scale open systems, which are neither controllable nor repeatable. The
difficulties in identifying causality in such systems are amplified. Even though several
algorithms have been proposed to identify causality in such systems, based on
probabilistic considerations and statistical processing of data, they are mostly
problematic.
Apparently, as giants in philosophy and science have not yet resolved these
problems, one should not expect our humble set of two companion papers to do that. On
the other hand, existing knowledge gaps offer us grounds for trying to make some
headway in attempting to locate and elucidate those gaps (Section 2 of this paper) and
propose a different identification framework applicable to open systems (Section 3 of this
paper). We illustrate and showcase the proposed framework by means of a number of
case studies, some of which are controlled synthetic examples and others real-world ones
characterizing interesting scientific problems. To avoid a long paper, we present the case
studies in the companion paper (Koutsoyiannis et al. 2022a).
2 Theoretical background
2.1 Philosophical background
There is a mystery in the concept of cause. While at first glance it seems clear what we
mean with the word, whenever we consider it more closely, we find ourselves unable even
to define it. Aristotle (384–322 BC) seems to have noticed this, for he wrote, among other
things:
"that which when present is the cause of something, when absent we sometimes consider to be the cause of the contrary; for example, we consider the absence of the captain to be the cause of the ship's capsizing, whereas his presence was the cause of the ship's rescue" (ὃ γὰρ παρὸν αἴτιον τοῦδε, τοῦτο καὶ ἀπὸν αἰτιώμεθα ἐνίοτε τοῦ ἐναντίου, οἷον τὴν ἀπουσίαν τοῦ κυβερνήτου τῆς τοῦ πλοίου ἀνατροπῆς, οὗ ἦν ἡ παρουσία αἰτία τῆς σωτηρίας [Bekker number 195a12-14]).

¹ The terms "causation" and "causality" are most of the time, also in this text, used interchangeably as synonyms, meaning the existence of a cause-effect relationship. We note, though, that the two terms have sometimes been used with slightly differing meanings. The former sometimes (e.g. Sion 2010) denotes a deterministic cause-effect relationship (deterministic causality). The latter may also mean the principle that everything has a cause (e.g. Bunge 1979, p. 3).
In some languages, the original meaning of the word appears to be “the one who is
to blame”. According to various dictionaries, the Greek word “αἴτιος”, which comes from
the verb “αἴνυμαι”, “to grab”, must originally have meant “he who has a part”. Likewise,
the Latin word causa is also the origin of "accuse". The German "Ursache", on the other
hand, literally means “the original thing”. This suggests two important aspects of what we
understand as cause: (i) insofar as identifying a cause is identifying what is responsible
for something, it provides an explanation; (ii) insofar as a cause is the origin/ground
leading to some occurrence, it is connected with the idea of some process (physical or
psychological). However, this common understanding of the notion of cause hides certain
philosophical problems.
The Scottish philosopher David Hume (1711-1776) was the first to raise doubts
about the existence of causes. His work has been so influential that almost all modern
philosophical and scientific studies referring to causation start the thread from him. He
pointed out that causal connections are not visible or otherwise directly perceivable by
our senses. When a flame causes heat, we perceive the flame, we perceive the heat, but
the causal connection between the flame and the heat we do not perceive; we deduce it.
From the time of birth, we observe certain regularities in the world. For Hume, when we
say that the flame is the cause and heat is the effect, we only express an observed
regularity, a physical law. Here is a related passage from Hume (1748, Section VI, part I, with modernized spelling and punctuation):
Suppose a person, though endowed with the strongest faculties of reason and
reflection, to be brought on a sudden into this world; he would indeed immediately
observe a continual succession of objects, and one event following another; but he
would not be able to discover anything further […] Suppose, again, that he has
acquired more experience, and has lived so long in the world as to have observed
familiar objects or events to be constantly conjoined together. What is the consequence
of this experience? He immediately infers the existence of one object from the
appearance of the other. Yet he has not, by all his experience, acquired any idea or
knowledge of the secret power by which the one object produces the other; nor is it by
any process of reasoning he is engaged to draw this inference.
Hume's conclusion is that the concept of a cause is merely a way we use to describe
regularities.
Hume's attack on the notion of cause served as a wake-up call for the German philosopher Immanuel Kant (1724–1804), who agreed with Hume's diagnosis but not his conclusion. Rather, he argued that Hume did not go far enough (Kant 1787/1998, B788-9; pp. 653-4; see Gardner 1999, p. 11). This was one of the triggers for Kant's so-called "critical turn", which introduced a revolutionary approach to how we understand
what it is to be an object. While Kant did not doubt that knowledge of objects starts with
the information we receive through our senses, the question of how this sensory input
leads to the experience/knowledge/representation of an object is one that philosophy
had never really provided a satisfactory answer to. Rather than assume, as his
predecessors had done, that our knowledge of objects depends in some way (that has
never been explained; e.g., how does raw sensory information lead to the representation
of an object?) upon objects that are already given to us independently of our cognition, he
proposed that our cognition is partly responsible for constituting such objects. So,
features like space, time and causality are contributed by the subject as ways of
structuring the raw sensory input. The concept of cause and effect in particular, and Kant’s
claim that “All alterations occur in accordance with the law of the connection of cause and
effect” (Kant 1787/1998, B232, p. 304) play a fundamental role in the temporal structure
of objective experience. For our purposes, what is important in Kant’s understanding of
causality is that (a) it is understood in terms of rule-governedness (i.e. that which is
regular has a rule governing its behaviour), and (b) the temporal/causal order is
irreversible (ibid. B237, pp. 306-7). These two characteristics will be used below.
The general question of how to define and identify causes remains, however, and this
is the locus of the contemporary debate around the concept of cause. Many philosophers
and mathematicians have attempted to answer it by developing theories of causation.
These are based on interventions (A causes B if by deliberately creating A, B follows),
counterfactuals (A causes B if B would not have occurred had A been absent), necessary
and sufficient conditions (A causes B if A is necessary and sufficient for B to occur), or
probability (A causes B if the presence of A increases the probability of B; see section 2.2).
Combinations of these approaches have also been proposed. However, no completely
satisfactory characterization has been formulated. Naturally, this philosophical perplexity
is also reflected in the mathematical representation of causality, which defines the scope
of our study. In particular, this paper and its companion are a contribution to the
probabilistic approach to causality.
2.2 Probabilistic theories of causality
While the above considerations show that it is difficult to define what causality is, we have
seen grounds for presupposing it is intimately connected with temporal asymmetry and
irreversibility. Thus, Mehlberg (1983) explained that no causal process (i.e., such that of two consecutive phases, one is always the cause of the other) can be reversible and also presented the causal theory of time, according to which "two events are simultaneous by definition if there can be no causal action between them". The issue of time directionality of causality was also discussed by Kline (1980).
Further, insofar as a causal link is governed by some rule, it is natural to turn to a
mathematical representation of causation. In a deterministic framework, the definition of
what constitutes a causal link can be formalised into a "logic of causation" (e.g. Sion, 2010).
Application of the deterministic framework in describing a natural system is rather easy
if the system is simple and the mechanisms acting on the system are well understood (e.g.
gravity is a cause behind Earth’s rotation around the Sun, as well as for chains of
sequential events, such as the popular example of a falling row of dominoes, depicted on
the cover page of the cited book by Sion). The problem of detecting and establishing
causality becomes challenging when the mechanisms are complex and not well
understood, and when inference by deduction together with empirical causal laws is not
possible. In this case we have to resort to induction based on observations, use
probabilistic logic and model the system by stochastics.
Among the first who connected causality with probability and statistics were Hopf
(1934), and Birkhoff and Lewis Jr. (1935). The latter authors used the term "causal system", for which they noted:
In the practical calculation with actual causal systems, only a limited degree of
accuracy is sought, since the laws of the system are at most an idealization of the actual
laws, and the isolation of the system from other systems, which is always postulated,
can never be more than imperfectly realized.
Later, Wold (1954, 1960), as well as Strotz and Wold (1960) also studied causality in the
framework of econometrics and made again, within this framework, a connection with
probability and statistics.
Probabilistic definitions of causality are based on time asymmetry on the one hand
and conditional probability on the other hand. Thus, Suppes (1970) defined it as follows
An event $B_{t'}$ [occurring at time $t'$] is a prima facie cause of the event $A_t$ [occurring at time $t$] if and only if
(i) $t' < t$,
(ii) $P(B_{t'}) > 0$,
(iii) $P(A_t \mid B_{t'}) > P(A_t)$.
The notion of a "prima facie cause" is discussed below. In plain language, the cause must precede the effect and the conditional probability of the effect under the condition of the cause must exceed the unconditional probability. This definition does not exclude the possibility of more than one cause.
Suppes's third criterion conveys the idea that the presence of the cause raises the probability of occurrence of the effect. This idea is arguably better expressed as an inequality between conditional probabilities (Skyrms 1980):
(iii′) $P(A_t \mid B_{t'}) > P(A_t \mid \bar{B}_{t'})$
where $\bar{B}_{t'}$ is the absence (non-occurrence) of event $B_{t'}$. However, using the obvious relationship $P(A_t) = P(A_t \mid B_{t'})\,P(B_{t'}) + P(A_t \mid \bar{B}_{t'})\,P(\bar{B}_{t'})$, which expresses $P(A_t)$ as a weighted average of the two conditional probabilities, it can easily be shown that the two versions are equivalent. Cox (1992) points out that such a condition still allows for spurious causality. The latter could only be eliminated by adding a condition such as:
(iv) there is no event $C_{t''}$ at time $t'' < t'$ such that $P(A_t \mid B_{t'} C_{t''}) = P(A_t \mid C_{t''})$.
A version of this condition was also defined by Salmon (1998) within his statistical-relevance theory of explanation, as the key to distinguishing between statistical and causal relevance, which he defines as:
(iv′) there is no event $C_{t''}$ at time $t'' < t'$ which screens off $B_{t'}$ from $A_t$, such that $P(A_t \mid B_{t'} C_{t''}) = P(A_t \mid C_{t''})$.
Salmon's example of statistical relevance which is not causal and therefore defines a spurious correlation is if $A_t$, $B_{t'}$ and $C_{t''}$ refer respectively to the occurrence of a storm, a barometer drop and an air pressure drop. However, conditions such as (iv) or (iv′) are pretty much impossible to verify satisfactorily in practice. This places limits upon the practical use of these characterisations of causation.
Another popular definition, given by Granger (1980), is the following: "$Y_n$ is said to cause $X_{n+1}$, if $P(X_{n+1} \in A \mid \Omega_n) \ne P(X_{n+1} \in A \mid \Omega_n - Y_n)$ for some $A$." In this, Granger assumes discrete time, which he denotes as $n$, while he denotes $\Omega_n$ "the knowledge in the universe available at that time" and $Y_n$ the information composed of "the values taken by a variable $Y_t$ up to time $n$, where $t \le n$" (with the last notation best rendered as the set difference $\Omega_n \setminus Y_n$). Further, he provided three axioms, the first of which is equivalent to (i) above and the third highlights the constancy in causality direction throughout time.
In his earlier publication, which has been much more influential², Granger (1969) gave a different version of his definition in an attempt to be statistically testable. With his notation of the later definition stated above, the condition upon which the earlier definition is based reads $\operatorname{var}[X_{n+1} \mid \Omega_n] < \operatorname{var}[X_{n+1} \mid \Omega_n - Y_n]$. That is, the probability of an event here becomes variance and the inequality sign "≠" here becomes "<". Granger (1969) clarified his mathematical expression thus: "$Y$ is causing $X$ if we are better able to predict $X$ using all available information than if the information apart from $Y$ had been used." Furthermore, Granger (1969) defined the feedback in this way (after replacing the notation with that of Granger (1980)): "If $\operatorname{var}[X_{n+1} \mid \Omega_n] < \operatorname{var}[X_{n+1} \mid \Omega_n - Y_n]$ and $\operatorname{var}[Y_{n+1} \mid \Omega_n] < \operatorname{var}[Y_{n+1} \mid \Omega_n - X_n]$, we say that feedback is occurring […], i.e., feedback is said to occur when $Y$ is causing $X$ and also $X$ is causing $Y$".

² 27 000 Google Scholar citations vs. 2300 of Granger (1980).
Granger (1969) also proposed what has later been known as the "Granger causality test". This is based on the improvement in the prediction of a process $\underline{x}_\tau$ by considering the influence of a "causing" process $\underline{y}_\tau$. Notice that, from this point on, we do not follow Granger's original notational conventions; rather we make it clear that $\underline{x}_\tau$ and $\underline{y}_\tau$ are stochastic processes, and we underline stochastic (random) variables and stochastic processes to distinguish them from common variables (representing single real numbers) and deterministic functions, respectively. The prediction equation is the Granger regression model:

$$\underline{x}_\tau = \sum_{j=1}^{J} a_j\, \underline{x}_{\tau-j} + \sum_{j=1}^{J} b_j\, \underline{y}_{\tau-j} + \underline{\varepsilon}_\tau \tag{1}$$

where $a_j$ and $b_j$ are the regression coefficients and $\underline{\varepsilon}_\tau$ is an error term. Notice that equation (1) does not include the term $\underline{y}_\tau$ that is synchronous with $\underline{x}_\tau$ and thus it excludes what Granger calls "instantaneous causality". We note though that in reality this does not indicate "instantaneous causality" but treatment of discrete time as if it were continuous (we discuss this point in sections 3.3 and 4). The test is based on the null hypothesis ($H_0$) that the process $\underline{y}_\tau$ is not actually causing $\underline{x}_\tau$, formally expressed as:

$$H_0:\quad b_1 = b_2 = \dots = b_J = 0 \tag{2}$$
Algorithmic details of the test are given in Gujarati and Porter (2009), among others. The test is quite popular and several software platforms include free applications to execute it³. The rejection of the null hypothesis is commonly interpreted in the literature with the statement "$\underline{y}$ Granger-causes $\underline{x}$".
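To fix ideas, the following minimal Python sketch illustrates one common form of the test, an F test comparing the restricted and unrestricted regressions of equation (1) (as described, e.g., in Gujarati and Porter, 2009). All names, the lag depth and the synthetic series are illustrative only, not part of the original formulation.

```python
import numpy as np

def granger_f_statistic(x, y, J):
    """F statistic for H0 of equation (2): lagged y terms add no
    predictive power for x in the regression of equation (1)."""
    n = len(x)
    rows = n - J
    lagged = lambda s, j: s[J - j : n - j]          # s_{tau-j}, aligned with x_tau
    X_full = np.column_stack(
        [lagged(x, j) for j in range(1, J + 1)]
        + [lagged(y, j) for j in range(1, J + 1)]
        + [np.ones(rows)]
    )
    X_restr = np.column_stack([lagged(x, j) for j in range(1, J + 1)] + [np.ones(rows)])
    target = x[J:]
    rss = lambda A: np.sum((target - A @ np.linalg.lstsq(A, target, rcond=None)[0]) ** 2)
    rss_r, rss_u = rss(X_restr), rss(X_full)
    dof = rows - X_full.shape[1]
    return ((rss_r - rss_u) / J) / (rss_u / dof)

# synthetic check: x is driven by the previous value of y
rng = np.random.default_rng(1)
y = rng.normal(size=2000)
x = np.zeros(2000)
x[1:] = 0.5 * y[:-1] + rng.normal(size=1999)
print(granger_f_statistic(x, y, J=3))   # large F -> H0 of equation (2) rejected
```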
It is clear that Granger's statement "$\underline{y}$ is causing $\underline{x}$ if we are better able to predict $\underline{x}$ …" in reality identifies improvement of predictability with causality, or in other words, statistical association with causation. If this statement is taken together with his regression equation (1), in which the involved parameters are calculated through correlation coefficients, it eventually identifies correlation with causation. But the mantras "association is not causation" and "correlation is not causation" express a widely held opinion which we believe is correct⁴.
Granger (1980) was clearly aware of this:
³ For example, the function GRANGER_TEST is available for Excel by C. Zaiontz (Real Statistics Using Excel, http://www.realstatistics.com/; Real Statistics Examples Workbooks, http://www.real-statistics.com/free-download/real-statistics-examples-workbook/; accessed on 1 September 2020).
⁴ Google counts 348 000 appearances of the former and 533 000 of the latter.
when discussing the interpretation of a correlation coefficient or a regression, most
textbooks warn that an observed relationship does not allow one to say anything about
causation between the variables.
Replacing correlation with probability, as did Granger (1980) and before him
Suppes (1970), does not change the essence of the problem. Perhaps this is the reason
why Suppes used the term “prima facie cause” in his definition given above (the adjective
prima facie, originating from Latin, means based on the first impression; accepted as
correct until proved otherwise). Suppes attributed the expression to J. Hintikka but he did
not explicitly explain it. Furthermore, he discussed spurious causes and eventually defined
the genuine cause as a “prima facie cause that is not spurious”; he also discussed the very
existence of genuine causes. The term "prima facie cause" was also used by Granger. In particular, Granger and Newbold (1986) noted that a cause satisfying a causality test still remains prima facie "because it is always possible that, if a different information set were used, then [it] would fail the new test". This is in line with the inductive, rather than
deductive, character of statistical tests, insofar as the conclusion is never the confirmation
of a hypothesis but only its non-rejection.
Despite these caveats, the term “Granger causality” is very popular, particularly in
the expression “Granger causality test” (e.g., Gujarati and Porter, 2009). This terminology
has misled many to understand the test as identifying causality and resolving the
“correlation is not causation” problem. In fact, all it detects is correlation, not genuine
causality.
Cohen (2014) clearly saw the problem when he suggested replacing the term
“Granger causality” with “Granger prediction” after correctly pointing out that:
Results from Granger causality analyses neither establish nor require causality.
Granger causality results do not reveal causal interactions, although they can provide
evidence in support of a hypothesis about causal interactions.
The ambition to identify genuine causes with statistical tools and thereby overcome
the concern that “correlation is not causation” has motivated others to find statistics other
than the correlation coefficients to characterize causality. For example, Liang (2016) used
the so-called information flow (or information transfer) between two processes, while in
later works this method has been called "Liang causality" (Stips et al., 2016). He asserted that "causation implies correlation, but not vice versa" (Liang, 2016) and that "causality actually can be rigorously derived in terms of information flow from first principles" (Liang, 2018).
On the other hand, Koutsoyiannis and Kundzewicz (2020) asserted that:
[The] vanity [of this approach] to determine genuine causality is easy to infer: It
suffices to consider the case where the two processes for which causality is studied are
jointly Gaussian. It is well known that in any multivariate Gaussian process, the
covariance matrix (or the correlation matrix along with the variances) fully determines all properties of the multivariate distribution of any order. For example, the mutual information in a bivariate Gaussian process is (Papoulis, 1991)

$$\mathrm{I}[\underline{x}, \underline{y}] = -\tfrac{1}{2}\ln\left(1 - r^2\right) \tag{3}$$

where σ and r denote standard deviation and correlation, respectively. Thus, using
any quantity related to entropy (or information) is virtually identical to using
correlation. Furthermore, in Gaussian processes, whatever statistic is used in
describing causality is readily reduced to correlation. This is evident even in Liang
(2016), where, e.g., in his Equation (102), the information flow turns out to be the
correlation coefficient multiplied by a constant.
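The quoted point is easy to verify numerically. Below is a minimal sketch (ours, for illustration; sample size and target correlation are arbitrary) estimating the correlation of a simulated bivariate Gaussian pair and mapping it, through the standard Gaussian mutual-information formula, to an entropy-based statistic that is a fixed one-to-one function of $r$ alone.

```python
import numpy as np

rng = np.random.default_rng(42)
r = 0.7  # target correlation (illustrative)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=200_000)
r_hat = np.corrcoef(z[:, 0], z[:, 1])[0, 1]

# for a bivariate Gaussian pair, the mutual information depends on r only
mi = -0.5 * np.log(1.0 - r_hat**2)
print(f"estimated r = {r_hat:.3f}, implied mutual information = {mi:.3f} nats")
```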
In a similar vein, Verbitsky et al. (2019) used a technique of distances of multivariate
vectors to reconstruct the system dynamics. To do so, they assumed that each time series "is a variable produced by its hypothetical low dimensional system of dynamical equations".
But if indeed the system dynamics were of low dimensionality, it would be preferable to
model the system by deduction, rather than induction based upon doubtful statistical
techniques. As pointed out by Koutsoyiannis and Kundzewicz (2020) (also referring to
Koutsoyiannis, 2006),
such assumptions and techniques are good for simple toy models but, when real world
systems are examined, low dimensionality appears as a statistical artifact because the
reconstruction actually needs an incredibly high number of observations to work,
which are hardly available. The fact that the sums of multivariate vectors of distances
is a statistical estimator with huge uncertainty is often missed in studies of this type,
which treat data as deterministic quantities, thereby obtaining unreliable results. We
do not believe that the Earth system and Earth processes […] are of low dimensionality.
A more satisfactory framework was proposed by Hannart et al. (2016), based on the
works by Pearl (2009) and Pearl et al. (2016). In it they used the so-called causal graph
reflecting the assumed dependencies among the studied variables along with the notion
of exogeneity (perhaps borrowed from Wold (1960), and Strotz and Wold (1960)). To
define the latter, they stated that "a sufficient condition for X to be exogenous wrt any variable is to be a top node of a causal graph". But importantly, this assumes that we already have a causal graph, i.e., a way of identifying causes.
Further, central to the framework of Hannart et al. (2016) is the notion of
intervention of an experimenter (perhaps again borrowed from Strotz and Wold (1960)).
But clearly, while experimentation is feasible in laboratory experiments, it is infeasible in
natural (e.g. geophysical) processes. To bypass this fundamental obstacle, Hannart et al.
resorted to the so-called "in silico" experimentation. While this is indeed an impressive
name, it simply means experimentation with a mathematical model that represents the
process. It is trivial to note that models, however sophisticated, are not identical to the
real world. Hence, objectively the technique examines a hypothetical “causality” that is
incorporated in the model rather than natural causality. Arguably, calculating
probabilities by model simulations is inferior to inspecting the model’s equations or code.
The latter method would be more appropriate to reveal what causality is incorporated
into the model through its construction.
Hannart et al. (2016) studied the probability of occurrence of an event Y, conditional upon the two-valued (binary) variable X, which indicates whether or not a forcing f is present, for which they stated:
The probability $p_1 = P(Y = 1 \mid X = 1)$ of the event occurring in the real world, with f present, is referred to as factual, while $p_0 = P(Y = 1 \mid X = 0)$ is referred to as counterfactual. […] The so-called fraction of attributable risk (FAR) is then defined as

$$\mathrm{FAR} := 1 - \frac{p_0}{p_1} \tag{4}$$

The FAR is interpreted as the fraction of the likelihood of an event that is attributable to the external forcing.
They showed that, under some conditions, FAR is a probability, which they denoted PN and called probability of necessary causality. They stressed that "it is important to distinguish between necessary and sufficient causality" and they associated PN with "the first facet of causality, that of necessity". They claimed to have "introduced its second facet, that of sufficiency", which is associated with the symmetric quantity $1 - (1 - p_1)/(1 - p_0)$; they denoted it as PS and called it probability of sufficient causality.
However, the framework has several drawbacks and can fail, as illustrated by the
following counter-example by Koutsoyiannis and Kundzewicz (2020): When the
atmospheric temperature is high people wear light clothes and also sweat much more
than when it is cold. Thus, the weight of clothes improves the prediction of the sweat
quantity. Koutsoyiannis and Kundzewicz (2020) used three two-valued stochastic variables to model the states of temperature, clothes weight and sweat, respectively, and
assumed a hypothetical “artificial intelligence entity” (AIE) which decides on causality
based upon the probability rules of Hannart et al. (2016). After assigning plausible values
to the conditional probabilities of high sweat for the four conditions of cold/hot and
heavy/light clothes, and following detailed numerical calculations of PN and PS, they
obtained the absurd result that the AIE will decide that there is all necessary and sufficient
evidence that light clothes cause high sweat. Hannart et al. (2016) might protest that the
absurd result occurs because of improper assignment of the exogenous variable. But how
could the AIE know that? How do we know the chain of causation a priori in order to
create the causal graph?
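The arithmetic behind such an absurd conclusion is easy to reproduce. The sketch below uses hypothetical conditional probabilities, chosen by us purely for illustration (they are not the values used by Koutsoyiannis and Kundzewicz, 2020), together with the PN and PS formulas quoted above.

```python
# Hypothetical conditional probabilities (illustrative only):
p1 = 0.8   # factual: P(high sweat | light clothes)
p0 = 0.1   # counterfactual: P(high sweat | heavy clothes)

PN = 1 - p0 / p1               # "probability of necessary causality"
PS = 1 - (1 - p1) / (1 - p0)   # "probability of sufficient causality"
print(f"PN = {PN:.3f}, PS = {PS:.3f}")   # PN = 0.875, PS = 0.778
# Both are large, so a rule-following AIE would "conclude" that light
# clothes cause high sweat -- the absurd result described in the text.
```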
The above critical summary of some recent and earlier studies on causality
strengthens what was stated by Koutsoyiannis and Kundzewicz (2020), i.e., that "identifying genuine causality is not a problem of choosing the best algorithm to establish a statistical relationship (including its directionality) between two variables" and, ultimately, that the big philosophical problem of causality cannot be resolved by technical tricks.
Therefore, here we focus on simpler problems, such as falsifying an assumed
genuine causality and adding statistical evidence, in an inductive context, for potential
causality and its direction.
3 Proposed framework
3.1 From seeking a definition to defining necessary conditions
Coming back to the probabilistic definitions of causality summarized in section 2.2, we
may remark that they do not have the clarity and unambiguousness required in science.
The only clear element, at least in a classical physical framework, is the time precedence
of the cause from the effect. The conditional probability element of the definition or the
related axioms do not help clarify real causality, if we assume that such a thing really
exists. If the inequality $P(A_t \mid B_{t'}) > P(A_t)$ entails (prima facie) causality, then the opposite one, $P(A_t \mid B_{t'}) < P(A_t)$, also does, because it can be written as $P(A_t \mid \bar{B}_{t'}) > P(A_t)$. Indeed, using standard probability calculus and noting that $P(A_t \mid B_{t'}) < P(A_t)$ implies $P(A_t B_{t'}) < P(A_t)\,P(B_{t'})$, we find:

$$P(A_t \mid \bar{B}_{t'}) = \frac{P(A_t \bar{B}_{t'})}{P(\bar{B}_{t'})} = \frac{P(A_t) - P(A_t B_{t'})}{1 - P(B_{t'})} > \frac{P(A_t) - P(A_t)\,P(B_{t'})}{1 - P(B_{t'})} = P(A_t)$$

Therefore, the only case where (prima facie) causality is excluded is stochastic independence, in which $P(A_t \mid B_{t'}) = P(A_t)$ and $P(A_t B_{t'}) = P(A_t)\,P(B_{t'})$. It is thus understandable why Granger (1980) generalized the inequality order ">" in Suppes's (1970) definition, replacing it with "≠".
A similar argument can be applied by reversing the time inequality $t' < t$ to $t' > t$, and stating that in the latter case, provided that the events $A_t$ and $B_{t'}$ are not independent, $A_t$ (prima facie) causes $B_{t'}$.
Thus, in effect the existing definitions assert that any two events that are neither
synchronous nor independent establish a causal relationship, with the direction of
causality determined by the time order. This is too general to have any usefulness. Also, it
is rather unnecessary, as it does not add anything important to the well-defined notion of
(in)dependence.
There are additional problems with the usefulness of the above definitions, related
to the estimation of probabilities from real-world data. One may assume that the notions
of experimentation and its repeatability tacitly lie behind these definitions. And indeed,
these are possible in experimental physics in which laboratory experiments are used. A
laboratory case represents a closed system where one can exclude influences from all
kinds of external factors, which may even be very distant (cf. quantum entanglement, for
instance). This, however, cannot be the case in open systems. Thus, in geophysics (a
particular case of an open system) there is no repeatability because of the influence of
these ever-changing external factors and the impossibility of controlling such large
systems. The temporal evolution of a geophysical system is unique and unrepeatable, so
that we cannot have observed samples. We can only have time series that cannot be
regarded as a sample because consecutive measurements are never independent. Details
about the differences between random samples and time series can be found in
Koutsoyiannis (2021).
For these reasons, here we abandon the use of the notion of events and we
reformulate the notion of causation on the basis of stochastic processes, which are
families of (infinitely many) stochastic variables indexed by time. A series of observations
from a natural process is termed a time series and is regarded as a single (and unique in
geophysical processes) realization of a stochastic process.
Furthermore, given the philosophical problems that, as we have seen, characterise attempts to give a definition of causality, we limit the scope of our investigation to providing necessary (and not sufficient) conditions of causality. We stress that necessary
conditions are particularly useful in falsifying hypotheses of causality, rather than
confirming it. An obvious necessary condition which we retain from all existing
definitions of causality is the time precedence of the cause with respect to the effect. Other
conditions are studied below. As the necessary conditions can hardly confirm causality,
we use the term potential causality (cf. the Aristotelian notion of δύναμις—Latin: potentia;
English: potency or potentiality).
3.2 Basic concepts and definitions
Let $\underline{x}(t)$ and $\underline{y}(t)$ denote two stochastic processes in continuous time $t$. We recall from stochastics (e.g. Papoulis, 1991, pp. 405, 508) that the two processes form a causal system, with $\underline{x}(t)$ being the cause and $\underline{y}(t)$ the effect, if they are related by

$$\underline{y}(t) = \int_{0}^{\infty} g(h)\, \underline{x}(t - h)\, \mathrm{d}h \tag{6}$$

Here the deterministic function $g(h)$ is termed the impulse response function (IRF), with $h$ being a time lag. In the case of a causal system (sometimes also called a nonanticipative system), $g(h) = 0$ for any $h < 0$. Noticeably, Papoulis did not provide a definition of causality per se, but used the concept of a causal system, defined through equation (6). The property characterizing a causal system is precisely defined by the zero values of the IRF for negative lags (Papoulis, 1991, p. 405). Notice that all values of $\underline{x}(t - h)$ that contribute to $\underline{y}(t)$ through the integral of the right-hand side of equation (6) correspond to a time period earlier than $t$, i.e. to the past. Also notice that, in theory, the entire past matters, hence the infinity in the upper limit of the integral. In practice the function $g(h)$, if determined from observations, has to be assumed zero beyond a certain value, i.e. the upper limit of the integral becomes finite. This, for example, has been the case in the application of the idea in hydrology, namely in the notion of the unit hydrograph (an implementation of the IRF in precipitation-runoff), even though its pioneers (Nash, 1959; Dooge, 1959) also used the full (infinite) range. Finally, notice the linearity of the relationship, which is discussed further below.
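For intuition, a discretized rendering of equation (6) is simply a one-sided (causal) convolution. The following minimal Python sketch uses an exponential IRF chosen by us purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 1.0                                   # discretization step (illustrative)
h = np.arange(50) * D                     # nonnegative lags only: a causal system
g = 0.4 * np.exp(-0.4 * h) * D            # an arbitrary nonnegative IRF

x = rng.normal(size=2000)                 # the cause
y = np.convolve(x, g)[: len(x)]           # y(t) = sum over h >= 0 of g(h) x(t-h)
```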
The theory of causal systems has been based upon a pioneering work by
Kolmogorov (1941) followed by works by Wold (1948) and Wiener (1949). Notably, Wold
(1938, 1948), influenced by Kolmogorov (see his interview by Hendry and Morgan, 1994),
introduced the celebrated Wold decomposition, proving that any stochastic process can
be decomposed into a regular process (i.e., a process linearly equivalent to a white noise
process) and a predictable process (i.e., a process that can be expressed deterministically
in terms of its past values). In none of these works did these pioneers use the term “causal
system”, nor did they explicitly speak about causality. However, each of them studied a
form of the linear filter that was later to be called causal. The objective of these works was
to enable stochastic prediction based on the past, a prediction which Kolmogorov and
Wiener called "extrapolation". A little later, Bode and Shannon⁵ (1950), drawing upon Kolmogorov's and Wiener's works, made the connection with causality, stating:
How is it possible to predict at all the future behavior of a function when all that is
known is a perturbed version of its past history? This question is closely associated with
the problems of causality and induction in philosophy and with the significance of
physical laws.
The connection with causality is also mentioned by Robbins (1959). In the 1960s,
the term causal system, earlier used with a different meaning by Birkhoff and Lewis Jr.
(1935) as mentioned above, was connected with Kolmogorov’s and Wiener’s
“extrapolation” filter (essentially our equation (6)), particularly in the literature of
communication engineering (Drenick, 1963; Post, 1963; Sharnoff, 1964; Masani, 1966;
Keats, 1967; Parzen, 1968; Clifton, 1968). But it was perhaps the book by Papoulis (1991,
first edition 1965), that disseminated the concept of a “causal system”.
⁵ It is relevant to note that two years earlier, Shannon (1948) had introduced the modern definition of entropy, while Wiener (1948), in his famous book Cybernetics, had used essentially the same definition (albeit with a negative sign) for information.

The relationship of equation (6) is an ideal that we can hardly meet, in a precise fashion, in a natural process. In fact, it can only be valid in a mathematical process that is defined by equation (6). Therefore, if we keep equation (6) as a definition of a causal process, we will exclude causality in natural processes. Instead, here we call the system defined by equation (6) a classic causal system and we will relax the requirements for calling a system causal. What is meant by "classic" is that the effect is (1) fully explained (2) by one well-identified cause. This condition is implicitly assumed in the absence of other additive terms in equation (6), either random, $\underline{v}(t)$ (i.e., $\underline{y}(t) = \int_{0}^{\infty} g(h)\,\underline{x}(t-h)\,\mathrm{d}h + \underline{v}(t)$), or causal from a second cause $\underline{w}(t)$ (i.e., $\underline{y}(t) = \int_{0}^{\infty} g(h)\,\underline{x}(t-h)\,\mathrm{d}h + \int_{0}^{\infty} g_2(h)\,\underline{w}(t-h)\,\mathrm{d}h$).
We recall that, given any two stationary stochastic processes $\underline{x}(t)$ and $\underline{y}(t)$, we can write an equation relating them of the form (cf. Papoulis, 1991, equation (14.12)):

$$\underline{y}(t) = \int_{-\infty}^{\infty} g(h)\, \underline{x}(t - h)\, \mathrm{d}h + \underline{v}(t) \tag{7}$$

where, compared to equation (6), in the lower limit of the integral we have replaced 0 with $-\infty$ and also added a third stochastic process, $\underline{v}(t)$, assumed to be uncorrelated with $\underline{x}(t)$. The function $g(h)$ is no longer unique; rather, infinitely many such functions exist. The most interesting among them is the one that corresponds to the minimum variance of $\underline{v}(t)$, typically called the least-squares solution. Since this is a general property of any two processes, we can also write it in the reverse direction, i.e.,

$$\underline{x}(t) = \int_{-\infty}^{\infty} g(h)\, \underline{y}(t - h)\, \mathrm{d}h + \underline{v}(t) \tag{8}$$

where again the most interesting of the infinitely many solutions is the one yielding the minimum variance of $\underline{v}(t)$. Note that $g(h)$ in equation (8) is different from $g(h)$ in (7); there is no symmetry. Likewise, the process $\underline{v}(t)$, which is now uncorrelated to the process $\underline{y}(t)$, is different from $\underline{v}(t)$ in (7). Naturally, the selection of the optimally applicable equation between equations (7) and (8) depends upon which of them gives the minimum variance in relative terms, i.e. as a proportion of the variance of $\underline{y}(t)$ or $\underline{x}(t)$, respectively. In what follows we will exclusively use equation (7), which denotes a causality direction $\underline{x}(t) \to \underline{y}(t)$. When we examine the reverse direction, $\underline{y}(t) \to \underline{x}(t)$, instead of explicitly using equation (8), we interchange the processes ($\underline{x}(t) \leftrightarrow \underline{y}(t)$) and again use equation (7). Equivalently, the final choice of the direction $\underline{x}(t) \to \underline{y}(t)$ or $\underline{y}(t) \to \underline{x}(t)$ depends upon which maximizes the explained variance ratio, defined as

$$e := 1 - \frac{\operatorname{var}[\underline{v}(t)]}{\operatorname{var}[\underline{y}(t)]} \tag{9}$$

We will discuss some additional desiderata for the two IRFs below, which also define additional criteria for the selection of the best solution.
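As an aside, the direction selection just described can be sketched numerically. The Python fragment below (ours, for illustration) fits a discretized, unconstrained least-squares version of equation (7) over lags $-J$ to $J$ and returns the ratio of equation (9); the smoothness and nonnegativity desiderata discussed below are not yet imposed here.

```python
import numpy as np

def explained_variance_ratio(cause, effect, J=10):
    """Unconstrained least-squares fit of the discretized equation (7)
    over lags -J..J, returning the ratio of equation (9)."""
    n = len(effect)
    rows = n - 2 * J
    cols = [cause[J - j : J - j + rows] for j in range(-J, J + 1)]  # cause_{tau-j}
    X = np.column_stack(cols + [np.ones(rows)])                     # plus an intercept
    target = effect[J : n - J]
    resid = target - X @ np.linalg.lstsq(X, target, rcond=None)[0]
    return 1 - np.var(resid) / np.var(target)

# the retained direction is the one yielding the larger ratio:
# e_xy = explained_variance_ratio(x, y); e_yx = explained_variance_ratio(y, x)
```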
Further explanations on the motivation for the use of equation (7) as necessary
condition for causation are provided in Supplementary Information (Section SI1.2),
including a justification for its linear form. The linearity of the equation is kept from the
original definition of a causal system by Papoulis (1991) (equation (6)). Certainly,
linearity could be regarded by many as a limitation of our approach and possible future
nonlinear extensions thereof could be considered. However, it is our opinion that linearity
may suffice for most problems, for the following reasons:
• We use a stochastic approach, in which the meaning of linearity vs. nonlinearity is dramatically different from that in deterministic approaches, something not often recognized in the literature. In stochastics, linearity is a powerful characteristic enabling the study of demanding problems, rather than a limitation. For example, stochastic dynamics need not be nonlinear to produce realistic trajectories and change. Conversely, in a deterministic system with linear dynamics, any perturbation of initial conditions dies off, as does the potential for change; hence the importance of nonlinearity in deterministic approaches (Koutsoyiannis 2014a, 2021; Koutsoyiannis and Dimitriadis, 2021).
• In stochastics, linearity is not an (over)simplification of the dynamics but has some sound justification, as indicated by the already mentioned Wold decomposition, in which the stochastic component (the regular process) is linearly equivalent to a white noise process (i.e. a linear combination of white noise terms; Wold, 1938, 1948; Papoulis 1991).
• In addition, linearity in a stochastic description results from maximum entropy considerations (under plausible conditions; e.g. Papoulis, 1991) and hence it is related to the most powerful mathematical and physical principle of maximum entropy (Jaynes, 1991; Koutsoyiannis et al. 2008; Koutsoyiannis 2014b).
• In a stochastic approach, a deviation from linearity can be conveniently incorporated through an error term, which is already included in our proposed equation (7), in order to generalize Papoulis' (1991) original equation (6).
• The fact that linearity is not regarded as a severe limitation in causality assessment is indirectly reflected in the popularity of Granger's (1969) approach, which is also linear (equation (1)).
• In the companion paper (Koutsoyiannis et al., 2022a, and its Supplementary Information), we show that the linear form of the framework effectively captures the important characteristics of causality, even in cases where the true dynamics is a priori known to be nonlinear.
We further note that our proposed bivariate approach to causality could allow for the possibility of "spurious" causality, where changes in both variables are affected by another cause, possibly with different time delays and response functions. This is not a drawback, insofar as our framework detects necessary, rather than sufficient, conditions. But further, the inclusion of an error term in it allows for such more remote causes to be represented in the framework. Additional clarifications on multiple causes are provided in the Supplementary Information (Section SI1.2).
Following the above considerations, and assuming that a least-squares solution of equation (7) has been determined for the system ($\underline{x}(t)$, $\underline{y}(t)$), we will call that system:
1. potentially causal if $g(h) = 0$ for any $h < 0$, while the explained variance is non-negligible;
2. potentially anticausal if $g(h) = 0$ for any $h > 0$, while the explained variance is non-negligible (this means that the system ($\underline{y}(t)$, $\underline{x}(t)$) is potentially causal);
3. potentially hen-or-egg (HOE) causal if $g(h) \ne 0$ for some $h > 0$ and some $h < 0$, while the explained variance is non-negligible;
4. noncausal if the explained variance is negligible.
These cases are graphically illustrated in Figure 1. We note that the term “negligible” can
be quantified in statistical terms, e.g. by invoking statistical significance. However, here
we will treat this in a practical manner leaving the related theoretical reflections for future
research. In the hypothetical case that the explained variance reaches its upper limit, i.e.,
1 (100%), it may be justified to replace the term "potentially" with "classic", in accordance
with the definition given above for a classic causal system. Further, we note that in some
texts the term noncausal is used for systems which here we call potentially HOE causal.
In other texts, the potentially HOE causal systems are treated as causal systems with
feedback.
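This classification is mechanical once an IRF and an explained variance ratio are available. A minimal Python sketch follows (ours, for illustration; the tolerance for "zero" and the threshold for "negligible" are arbitrary choices, since, as noted above, we treat "negligible" in a practical manner):

```python
import numpy as np

def classify(lags, g, e, tol=1e-8, e_min=0.1):
    """Classify a fitted IRF into the four types defined above; `tol` and
    `e_min` are illustrative choices for "zero" and "negligible"."""
    if e < e_min:
        return "noncausal"
    if not np.any(np.abs(g[lags < 0]) > tol):
        return "potentially causal"          # g(h) = 0 for all h < 0
    if not np.any(np.abs(g[lags > 0]) > tol):
        return "potentially anticausal"      # g(h) = 0 for all h > 0
    return "potentially HOE causal"
```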
In this respect, in a HOE causal system, earlier realizations of $\underline{x}(t)$ affect the current realization of $\underline{y}(t)$, but also earlier realizations of $\underline{y}(t)$ affect the current realization of $\underline{x}(t)$. Thus, each one of the processes $\underline{x}(t)$ and $\underline{y}(t)$ is correlated to both the past and the
future of the other one. This may seem paradoxical in terms of a conventional way of
thinking about causality, but it is not more paradoxical than the expression “hen-or-egg”,
first used by Plutarch (Moralia, Quaestiones convivales, B, Question III). Clearly, Plutarch
(and subsequent users of this expression) did not mean one particular hen and one
particular egg; in this case the existence or not of a causal relationship would be easy to
tell. Rather, he meant the sequences of all hens and all eggs, something similar to what
the abstract term “process” used here represents. For further explanation of the term
hen-or-egg see Koutsoyiannis and Kundzewicz (2020).
Figure 1 Explanatory sketch for the definition of the different potential causality types (four panels: potentially causal, potentially anticausal, potentially hen-or-egg causal, noncausal; each panel plots the IRF against the time lag). For each graph the mean is also plotted with a dashed line.
It is often stated that in causal systems the present of the process $\underline{y}(t)$ does not stochastically depend on the future of $\underline{x}(t)$ (i.e., on $\underline{x}(t+h)$ for $h > 0$). This may be intuitive, but it is also clearly wrong: the intuition involves a confusion between causal and stochastic dependence. To see this, we define the autocovariance $c_{xx}(H)$ of the process $\underline{x}(t)$,

$$c_{xx}(H) := \operatorname{cov}[\underline{x}(t), \underline{x}(t+H)] = \mathrm{E}\!\left[(\underline{x}(t) - \mu_x)(\underline{x}(t+H) - \mu_x)\right] \tag{10}$$

where $\operatorname{cov}$ denotes the covariance of any two stochastic variables and $\mu_x$ denotes the mean of $\underline{x}(t)$. Likewise, we define the autocovariance $c_{yy}(H)$ of the process $\underline{y}(t)$. The cross-covariance of the two processes is

$$c_{xy}(H) := \operatorname{cov}\!\left[\underline{x}(t), \underline{y}(t+H)\right] \tag{11}$$

It is easily seen that the autocovariance is an even function of the lag $H$, i.e., $c_{xx}(-H) = c_{xx}(H)$, and that this does not hold for the cross-covariance, $c_{xy}(H)$, which is generally an asymmetric function of $H$; here the symmetry appears if we change the order of the variables, i.e., $c_{xy}(H) = c_{yx}(-H)$. Furthermore, it is shown in the Supplementary Information (Section SI1.3; see also Papoulis, 1991) that the autocovariance and cross-covariance functions are related by

$$c_{xy}(H) = \int_{-\infty}^{\infty} g(h)\, c_{xx}(H - h)\, \mathrm{d}h \tag{12}$$
Now, in a potentially causal system the latter equation takes the form
󰇛󰇜󰇛󰇜󰇛󰇜
(13)
The covariance of 󰇛󰇜 with the future variable 󰇛󰇜, is 󰇣󰇛󰇜󰇛󰇜󰇤
󰇣󰇛󰇜󰇛󰇜󰇤󰇛󰇜 and is determined as
󰇛󰇜󰇛󰇜󰇛󰇜
󰇛󰇜󰇛󰇜
(14)
This is clearly nonzero, which proves that independence of the current value 󰇛󰇜 from
the future of 󰇛󰇜 does not hold. There is only one trivial exception, i.e., when 󰇛󰇜
for any , which is the case only if the cause 󰇛󰇜 is white noise. This exception, along
with the fact that the future 󰇛󰇜 does not functionally appear in equation (6), seems
to have been the culprit for misleading our intuitions.
Clearly, in a potentially causal system the time order is explicitly reflected in the definition. In a potentially HOE causal system the time order needs to be clarified by defining the principal direction. This could be done in several ways, the most natural being the following:
1. The time lag $h_c$ maximizing the (absolute value of the) cross-covariance $c_{xy}(H)$ (equation (11)).
2. The mean (time average) $\mu_h$ of the function $g(h)$, defined as:

$$\mu_h := \frac{\int_{-\infty}^{\infty} h\, g(h)\, \mathrm{d}h}{\int_{-\infty}^{\infty} g(h)\, \mathrm{d}h} \tag{15}$$

3. The median $h_{1/2}$ of the function $g(h)$, implicitly defined by:

$$\int_{-\infty}^{h_{1/2}} g(h)\, \mathrm{d}h = \frac{1}{2} \int_{-\infty}^{\infty} g(h)\, \mathrm{d}h \tag{16}$$

The index $h_c$ is independent of the function $g(h)$, while the other two depend on it; $\mu_h$ and $h_{1/2}$ represent the mean and the median, respectively, of the lag weighted by the (nonnegative) $g(h)$. It is reasonable to expect that, unless a system is noncausal (with $g(h) \equiv 0$), all three variants, $h_c$, $\mu_h$ and $h_{1/2}$, will have the same sign, which determines a principal direction in the HOE causality. Thus, if the sign is nonnegative, the principal causality direction is $\underline{x}(t) \to \underline{y}(t)$. The principal direction is crucial for the understanding of the system studied. For example, in a system characterized as $\underline{x}(t) \to \underline{y}(t)$, if we speak about a positive feedback, we would mean that the effect of $\underline{x}(t)$ on $\underline{y}(t)$ is magnified, rather than vice versa.
By taking expectations in equation (7), it is readily seen that

$$\mu_y = G\, \mu_x + \mu_v \tag{17}$$

where $\mu_x$, $\mu_y$ and $\mu_v := \mathrm{E}[\underline{v}(t)]$ are the means of the respective processes and

$$G := \int_{-\infty}^{\infty} g(h)\, \mathrm{d}h \tag{18}$$

Additional bulk characteristics of the IRF (more specifically, its temporal means and higher moments in relation to those of the processes $\underline{x}(t)$ and $\underline{y}(t)$) are given in the Supplementary Information (Section SI1.4).
3.3 Properties and desiderata for IRF
In contrast to Granger’s analysis of causality (section 2.2), which treats the processes in
discrete time by definition, here we treat them in continuous (i.e. natural) time, and we
only convert them to discrete time for estimation purposes. If we think of the processes
in natural time, we understand that a causality relationship is not an instantaneous one.
In other words, if $\underline{x}(t')$ affects $\underline{y}(t)$, where $t' < t$, it is reasonable to assume that, for small $\delta > 0$, $\underline{x}(t' + \delta)$ will also affect $\underline{y}(t)$. Therefore, the IRF, $g(h)$, is not a Dirac delta function, but one with some domain of nonzero (and potentially infinite) measure on which $g(h) \ne 0$. It is also reasonable to assume that $g(h)$ is a continuous function and has the same sign over that domain. The latter can be justified as follows. If $\underline{x}(t')$ is positively correlated with $\underline{y}(t)$, then it is reasonable that $\underline{x}(t' \pm \delta)$ are also positively correlated with $\underline{y}(t)$. Without loss of generality, in what follows we will assume that $g(h) \ge 0$ (if it were $g(h) \le 0$, we would reflect $\underline{x}(t)$, i.e. replace it with $-\underline{x}(t)$, and hence $g(h)$ would also be reflected, becoming nonnegative).
Here we clarify that the problem of identifying causality is different from that of
recovering the full system dynamics. The former, and not the latter, is the scope of our
study. We note that, while there exist oscillatory nonlinear systems, in which the sign of
󰇛󰇜 could alternate, we avoid subsuming them under the causality notion, particularly
when causality is inferred from data in an inductive manner. This choice is consistent with
Cox's (1992) conditions for causality, according to which the effect "shows a monotone relation with 'dose'" of the cause. Here we note that in our framework the "dose" is not
regarded as an instantaneous event, but one with some time span (see details in
Supplementary Information, section SI1.2).
The continuity desideratum can be quantified by defining a measure of roughness and demanding that it be restricted below a threshold. We may define such a roughness index by means of the squares of the second derivative (cf. Koutsoyiannis, 2000). Specifically, we define a roughness index as

$$E := \int_{-\infty}^{\infty} \left(g''(h)\right)^2 \mathrm{d}h \tag{19}$$
In summary, the desiderata for the IRF are:
• an adequate time span;
• nonnegativity, $g(h) \ge 0$ for all $h$;
• smoothness, $\varepsilon \le \varepsilon_0$, where $\varepsilon_0$ is a real number;
• minimum variance of $\underline{v}(t)$.
3.4 Estimation of IRF
The literature offers several methods for estimating an IRF in terms of auto- and cross-
correlations (Young, 2011, 2015) or their Fourier transforms, i.e., power spectra and
cross-spectra (e.g. Papoulis, 1991). Here we seek a more direct method that can work with
time series of observations per se, being easily understandable and reproducible by any
reader using simple computational means, and can also host our desiderata for the IRF.
In dealing with observations, we first note that they are necessarily made in discrete time $\tau$, representing the time period $(\tau - 1)D$ to $\tau D$ (where $\tau$ is an integer and $D$ is the discretization time step). It is assumed that each measurement represents the time average of the process in this period (other cases are also discussed in the Supplementary Information, Section SI1.1). Thus,

$$\underline{x}_\tau := \frac{1}{D} \int_{(\tau - 1)D}^{\tau D} \underline{x}(t)\, \mathrm{d}t \tag{20}$$

and likewise for $\underline{y}_\tau$ and $\underline{v}_\tau$. The discrete-time version of equation (7) becomes

$$\underline{y}_\tau = \sum_{j=-\infty}^{\infty} g_j\, \underline{x}_{\tau - j} + \underline{v}_\tau \tag{21}$$
Specifically, using the definition of the discrete-time processes in equation (20), we show in the Supplementary Information (Section SI1.1) that equation (21) follows from (7) and that the discrete-time version of the IRF is related to the continuous-time one by

$$g_j = \frac{1}{D} \left( \tilde{g}\big((j-1)D\big) - 2\tilde{g}(jD) + \tilde{g}\big((j+1)D\big) \right) \tag{22}$$

where

$$\tilde{g}(h) := \int_{-\infty}^{h} \int_{-\infty}^{s} g(u)\, \mathrm{d}u\, \mathrm{d}s \tag{23}$$
Second, the observation period $L$ is finite and hence the series of IRF ordinates sought should necessarily be assumed of finite length too. We thus formulate the estimation equation as

$$\hat{\underline{y}}_\tau = \sum_{j=-J}^{J} g_j\, \underline{x}_{\tau - j} + \mu_v \tag{24}$$

where $\hat{\underline{y}}_\tau$ is the estimate of $\underline{y}_\tau$ given the series of $\underline{x}_\tau$, $J$ is an integer chosen as appropriate, and $\mu_v$ is determined from equation (17). This estimation results in an error:

$$\underline{e}_\tau := \underline{y}_\tau - \hat{\underline{y}}_\tau \tag{25}$$

Assuming that we have simultaneous observations of the $\underline{x}_\tau$ and $\underline{y}_\tau$ series, for a length $L$, the estimator of the variance of $\underline{e}_\tau$ is

$$s_e^2 := \frac{1}{L - 2J} \sum_{\tau = J+1}^{L-J} \left( \underline{y}_\tau - \sum_{j=-J}^{J} g_j\, \underline{x}_{\tau - j} - \mu_v \right)^2 \tag{26}$$

and the explained variance ratio is

$$e = 1 - \frac{s_e^2}{s_y^2} \tag{27}$$

where $s_y^2$ denotes the sample estimate of the variance of $\underline{y}_\tau$.
As already mentioned, the above least-squares-based determination of the $g_j$ ordinates is not the only technique for the identification of the IRF; additional techniques can be found in Young (2011, 2015, and references therein). A well-known weakness of determining numerous ordinates is that it is an over-parameterized problem, which is typically addressed by assuming a parametric model (such as a Box-Jenkins model or an autoregressive moving average exogenous (ARMAX) model; Young, 2011, 2015). Here we prefer to use a nonparametric approach and we tackle the over-parameterization problem by imposing the roughness threshold, as discussed above. An additional parametric method, formulated in terms of parameterizing the IRF per se in continuous time, is also discussed and compared to the proposed non-parametric method in the Supplementary Information of the companion paper (Koutsoyiannis et al., 2022a; sections SI2.3 and SI2.4).
Now the roughness index of the IRF can be formulated by replacing the continuous second derivative with the discrete one, as

$$E = \sum_{j=-J+1}^{J-1} \left( g_{j+1} - 2 g_j + g_{j-1} \right)^2 \tag{28}$$

and can also be expressed as a standardized index by

$$\varepsilon := \frac{E}{8 \sum_{j=-J}^{J} g_j^2} \tag{29}$$

The constant 8 in the denominator has been introduced in order for the standardized roughness index $\varepsilon$ of a nonnegative IRF to have a maximum value of 1 (which becomes 2 without the nonnegativity constraint), while its least value is obviously zero. The zero value corresponds to the case where all $g_j$ are identical. The value 1 corresponds to the case where $J$ is large (theoretically, tends to infinity) while the IRF is saw-like, with $g_j = a > 0$ for even values of the index $j$ and $g_j = 0$ for odd values of the index.
It is computationally convenient to write the above equations in vector form, also replacing estimators with estimates and stochastic processes with time series of observations (cf. Koutsoyiannis, 2000). Thus, combining equations (24) and (17) we write $\hat{y}_\tau = \bar{y} + \sum_{j=-J}^{J} g_j (x_{\tau - j} - \bar{x})$, where $\bar{x}$ and $\bar{y}$ are the sample means, and in vector form we will have

$$\hat{\boldsymbol{y}} = \bar{y}\, \boldsymbol{1} + \boldsymbol{X} \boldsymbol{g} \tag{30}$$

where
$\hat{\boldsymbol{y}}$ is the vector of estimates of $y_\tau$,
$\boldsymbol{g}$ is a vector with the $2J+1$ elements of the IRF,
$\boldsymbol{1}$ is a vector of ones, and
$\boldsymbol{X}$ is a matrix with $L - 2J$ rows, numbered $J+1$ to $L-J$, and $2J+1$ columns, whose row numbered $\tau$ is $(x_{\tau + J} - \bar{x}, \dots, x_{\tau - J} - \bar{x})$.
Now, combining equations (25) and (17), the estimate of the variance of $\underline{e}_\tau$ is

$$s_e^2 = \frac{1}{L - 2J}\, (\boldsymbol{y} - \hat{\boldsymbol{y}})^{\mathrm{T}} (\boldsymbol{y} - \hat{\boldsymbol{y}}) \tag{31}$$

while the roughness index is

$$E = \boldsymbol{g}^{\mathrm{T}} \boldsymbol{P}^{\mathrm{T}} \boldsymbol{P}\, \boldsymbol{g} \tag{32}$$

where $\boldsymbol{P}$ is a matrix with $2J - 1$ rows and $2J + 1$ columns whose $ij$th entry is

$$p_{ij} = \begin{cases} 1, & j = i \ \text{or}\ j = i + 2 \\ -2, & j = i + 1 \\ 0, & \text{otherwise} \end{cases} \tag{33}$$
In this way, the estimation of the IRF is formulated as the following optimization problem:

$$\min_{\boldsymbol{g}}\ s_e^2 = \frac{1}{L - 2J}\, (\boldsymbol{y} - \hat{\boldsymbol{y}})^{\mathrm{T}} (\boldsymbol{y} - \hat{\boldsymbol{y}}), \quad \text{subject to}\ \varepsilon \le \varepsilon_0\ \text{and}\ g_j \ge 0 \tag{34}$$

If we ignore the last constraint (nonnegativity) and combine the second one (small roughness) with the objective function using a weight (multiplier) $\lambda \ge 0$, we obtain an unconstrained optimization problem, i.e.

$$\min_{\boldsymbol{g}}\ f(\boldsymbol{g}) = (\boldsymbol{y} - \hat{\boldsymbol{y}})^{\mathrm{T}} (\boldsymbol{y} - \hat{\boldsymbol{y}}) + \lambda\, \boldsymbol{g}^{\mathrm{T}} \boldsymbol{P}^{\mathrm{T}} \boldsymbol{P}\, \boldsymbol{g} \tag{35}$$

which has an analytical solution. Indeed, $f(\boldsymbol{g})$ has derivative

$$\frac{\partial f}{\partial \boldsymbol{g}} = -2\, \boldsymbol{X}^{\mathrm{T}} (\boldsymbol{y} - \bar{y}\, \boldsymbol{1} - \boldsymbol{X} \boldsymbol{g}) + 2 \lambda\, \boldsymbol{P}^{\mathrm{T}} \boldsymbol{P}\, \boldsymbol{g} \tag{36}$$

Equating the derivative to 0 and solving for $\boldsymbol{g}$ we find

$$\boldsymbol{g} = \left( \boldsymbol{X}^{\mathrm{T}} \boldsymbol{X} + \lambda\, \boldsymbol{P}^{\mathrm{T}} \boldsymbol{P} \right)^{-1} \boldsymbol{X}^{\mathrm{T}} (\boldsymbol{y} - \bar{y}\, \boldsymbol{1}) \tag{37}$$
One may notice in equation (37) that the term $\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}$ represents autocovariance estimates of the process $\underline{x}_\tau$, while $\boldsymbol{X}^{\mathrm{T}} (\boldsymbol{y} - \bar{y}\, \boldsymbol{1})$ represents cross-covariance estimates of the processes $\underline{x}_\tau$ and $\underline{y}_\tau$. By increasing the weight $\lambda$ of the roughness term we can decrease the roughness of $\boldsymbol{g}$ and thus with a proper choice of $\lambda$ we can satisfy the roughness constraint.
If the solution in equation (37) also satisfies the nonnegativity constraint, then we
have determined the IRF sought analytically. Otherwise, we have to abandon the
analytical solution and solve the problem numerically, as no analytical solution is
available for the nonnegativity constraint (Chen and Plemmons, 2009). However, all
software platforms, including common spreadsheet software, provide solvers that can
easily tackle the full optimization problem in equation (34) within seconds or minutes,
depending on the time series length.
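A minimal self-contained Python sketch of the analytical solution follows, based on equations (30)–(37) as reconstructed above. It is ours, for illustration (not the authors' published code); the synthetic test at the end is arbitrary, and the nonnegativity constraint is left to a numerical solver, as the text notes.

```python
import numpy as np

def estimate_irf(x, y, J, lam):
    """Analytical solution of equation (37), ignoring the nonnegativity
    constraint (which, if violated, requires a numerical solver)."""
    n = len(x)
    rows = n - 2 * J
    xbar, ybar = x.mean(), y.mean()
    # design matrix of equation (30): row tau is (x_{tau+J}-xbar, ..., x_{tau-J}-xbar)
    X = np.column_stack([x[2 * J - k : 2 * J - k + rows] - xbar for k in range(2 * J + 1)])
    m = 2 * J + 1
    P = np.zeros((m - 2, m))                 # second-difference matrix of equation (33)
    for i in range(m - 2):
        P[i, i : i + 3] = (1.0, -2.0, 1.0)
    target = y[J : n - J] - ybar
    g = np.linalg.solve(X.T @ X + lam * (P.T @ P), X.T @ target)
    e = 1 - np.var(target - X @ g) / np.var(target)
    return np.arange(-J, J + 1), g, e        # g ordered from lag -J to lag +J

# synthetic check: y is built from x with a known, purely causal IRF
rng = np.random.default_rng(7)
x = rng.normal(size=5000)
true_g = np.array([0.5, 0.3, 0.2, 0.1])      # nonzero at lags 0..3 only
y = np.convolve(x, true_g)[: len(x)] + 0.1 * rng.normal(size=len(x))
lags, g_hat, e = estimate_irf(x, y, J=8, lam=5.0)
print(np.round(g_hat, 2), round(e, 3))       # weight concentrated at lags >= 0
```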
The discussion of the analytical solution helps us to understand that the
autocovariance, and hence the autocorrelation, even though it does not explicitly appear
in the numerical version of the optimization procedure, strongly influences the parameter
estimation. It is relevant to mention that high autocorrelation results in increased
estimation uncertainty and may even result in spurious causality claims. This is illustrated
in the Supplementary Information of the companion paper (Koutsoyiannis et al., 2022a, Section SI2.2) along with techniques to handle such situations and avoid false conclusions.
4 Discussion and conclusions
We have briefly examined various approaches to the notion of causality and the criteria
for identifying an event as causally responsible for another. When dealing with physical
quantities whose values are only known with some degree of certainty, a probabilistic
approach to identifying causal links is required. However, attempts to define probabilistic
necessary and sufficient conditions have all been found to have limitations. In particular,
it is clear that no sufficient condition for concluding that a causal link exists has
ever been identified.
This therefore suggests that the focus should be exclusively upon identifying
necessary conditions for causation. Moreover, in view of the additional problem of
validating probabilistic statements about a unique occurrence of a putative causal link, it
is practically necessary to resort to considering stochastic processes rather than uniquely
temporally located events.
Drawing upon Papoulis's proposal for a causal system consisting of two stochastic
processes x(t) and y(t), which turns upon the existence of an impulse response function
(IRF) connecting the two, we identified necessary conditions for the existence of a (linear)
causal link either from y to x or from x to y, or of a hen-or-egg (HOE) situation in which causal
influences appear to go in both directions. The distinction between these depends upon
the nonexistence of nonzero weights of the linear relationship for negative (respectively
positive) lags for the y → x causation (respectively the x → y causation), with HOE
otherwise. Additionally, the IRF must enable enough of the variance of the caused process
to be explained for a causal link to be a possibility. Additional constraints of smoothness
and nonnegativity of the IRF are included in the method.
An additional benefit of the proposed method, albeit not discussed above, is its
direct applicability to the simulation of bivariate processes that exhibit time directionality
and causality. Conventional stochastic models generate time symmetric processes. The
problem of simulating a scalar process with time directionality has been tackled recently
(Koutsoyiannis, 2019, 2020). The present framework provides direct methods to simulate
time-directional vector processes with two variates, as well as hints for multivariate
processes, a problem to be studied in future research.
The methodological framework proposed herein features substantial differences
from existing methods, such as those discussed in section 2.2. A first difference is in its
epistemological background which leads to a less ambitious objective, that of seeking
necessary conditions of causality rather than sufficient ones. The usefulness of this
objective lies in its ability to falsify an assumed causality and to add statistical evidence,
in an inductive context, for potential causality and its direction.
A second difference is that our focus is upon maximizing not the predictability per
se, but the lucidity in identifying the (potentially causal) relationship between two
processes x and y. This can be seen by comparing Granger's expression in equation (1)
with our expression in equation (24). To estimate x_τ, the former includes terms in x for
times earlier than τ, while the second does not. Such terms may increase predictability but
say nothing about a potentially causal relationship between the two processes x and y;
rather, they may obscure that relationship, as autocorrelation is by definition symmetric
in time.
Furthermore, by its construction, our framework can detect not only mono-
directional (potentially) causal relationships, but also causality of HOE type. Notably, our
method is formulated from the outset for the latter case, while the former case will be
obtained as a result if in one of the two directions the IRF weights are zero. Further, like
other methods (e.g. Granger's), our method allows testing in two directions, y → x and
x → y. The results in each direction are not anti-symmetrical in terms of the estimated
IRFs, and thus the method provides two different views of the (potentially) causal
relationship, making it more insightful.
A fourth difference of our method from many other methods lies in the recognition
that natural time is continuous rather than discrete (nb., some methods, e.g. Liang, 2016,
also use continuous time). The discrete-time relationships, which are necessary in
estimation based on observations, are deduced from the continuous-time formulation,
rather than taken as such from the outset. To understand the importance of this difference
in foundation, consider a classic causal system in continuous time. If we considered the
system in discrete time from the outset, then the weight for lag zero would be zero to
exclude synchrony (cf. Granger's expression in equation (1)). But considering continuous
time, it becomes clear from equation (22) that the discrete-time weight g_0 can be
nonzero, and this is not a violation of the axiom of time precedence.
The properties of an adequate time span of causality, instead of an instantaneous
action, along with the nonnegativity and roughness (or smoothness) constraints are
additional specific features of the method. Their importance derives from their enabling
the true dynamics of the system, as expressed by its IRF, to be revealed, in addition to
just identifying the time lags, at least for systems consistent with the constraints, i.e. not those with
oscillatory or excessively rough actual dynamics. This importance will become evident in
the second part of this study, devoted to applications (Koutsoyiannis et al., 2022). On the
negative side, the constraints certainly make the formal statistical testing more
challenging. Such testing would be feasible with a Monte Carlo approach, but it is not
within the scope of this paper.
Acknowledgments: The constructive comments of two anonymous reviewers helped us to substantially
improve our work. We also thank the Board Member Graham Hughes for the processing of the paper and
the favourable decision, as well as Keith Beven for his advice on a preliminary version of the manuscript.
Funding: This research received no external funding but was motivated by the scientific curiosity of the
authors.
Conflicts of Interest: We declare no conflict of interest.
References
Birkhoff, G. D. & Lewis Jr., D. C. 1935 Stability in causal systems, Philosophy of Science 2(3), 304-333,
https://www.jstor.org/stable/184526 (accessed 21 August 2021).
Bode, H. W. & Shannon, C. E. 1950 A simplified derivation of linear least square smoothing and prediction
theory. Proc. of the IRE 38(4), 417-425.
Bunge, M. 1979 Causality and Modern Science. Third edition, Dover Publications.
Chen, D. & Plemmons, R. J. 2009. Nonnegativity constraints in numerical analysis. In The Birth of Numerical
Analysis (ed. A. Bultheel & R. Cools), pp. 109-139, World Scientific, https://doi.org/10.1142/7075
Clifton, D. L. 1968. The use of the transfer function and impulse response in acoustic scattering problems.
Defense Research Laboratory, The University of Texas at Austin, Austin, Texas, USA.
https://apps.dtic.mil/sti/pdfs/ADA033260.pdf (accessed 21 August 2021).
Cohen, M. X. 2014 Analyzing Neural Time Series Data: Theory and Practice. Cambridge, MA, USA: MIT Press.
Cox, D. R. 1992 Causality: Some statistical aspects. J. Roy. Stat. Soc. A 155(2), 291-301.
Dooge, J. C. 1959 A general theory of the unit hydrograph. J. Geoph. Res. 64(2), 241-256.
Drenick, R. F. 1963 Random processes in control and communications, IEEE Trans. on Military Electronics,
MIL-7(4), 275-280, doi: 10.1109/TME.1963.4323091.
Gardner, S. 1999 Kant and the Critique of Pure Reason. London: Routledge.
Granger, C. W. 1969 Investigating causal relations by econometric models and cross-spectral methods.
Econometrica, 37(3), 424-438.
Granger, C. W. 1980. Testing for causality: a personal viewpoint. J. Econ. Dynamics and Control 2, 329-352.
Granger, C. & Newbold, P. 1986 Forecasting Economic Time Series. San Diego, CA, USA: Academic Press.
Gujarati, D. N. & Porter, D. C. 2009 Basic Econometrics. 5th ed., Boston, MA, USA: McGraw Hill.
Hannart, A., Pearl, J., Otto, F. E. L., Naveau, P. & Ghil, M. 2016 Causal counterfactual theory for the attribution
of weather and climate-related events. Bull. Amer. Met. Soc. 97(1), 99-110.
Hendry, D. F. & Morgan, M. S. 1994 The ET interview. Professor H.O.A. Wold: 1907-1992. Econometric Theory
10, 419-433, http://korora.econ.yale.edu/et/interview/wold.pdf (accessed 21 August 2021).
Hopf, E. 1934 On causality, statistics and probability, J. of Mathematics and Physics, 13, doi:
10.1002/sapm193413151.
Hume, D. 1748 An enquiry concerning human understanding.
Jaynes, E.T. 1957 Information theory and statistical mechanics. Physical Review, 106 (4), 620-630
Kant, I. 1787/1998 Critique of Pure Reason. (transl. P. Guyer & A. W. Wood), Cambridge: Cambridge
University Press.
Keats, R. G. 1967 A note on the black box problem. Appl. Prob. 4(1), 113-122,
https://www.jstor.org/stable/3212303 (accessed 21 August 2021).
Kline, A. D. 1980 Are there cases of simultaneous causation? PSA Proc. Bienn. Meet. Philos. Sci. Assoc., 292-
301, Philosophy of Science Association.
Kolmogorov, A. N. 1941 Interpolation und extrapolation, Izv. Akad. Nauk SSSR Ser. Mat. 5, 3-14 (Republished
in Kolmogorov, A. N. & Shiryayev, A. N. 1992 Selected works of A. N. Kolmogorov: Vol. 2, Probability theory
and mathematical statistics. Kluwer Academic, pp. 272-280).
Koutsoyiannis, D. 2000 Broken line smoothing: A simple method for interpolating and smoothing data
series. Environ. Modelling and Software 15(2), 139-149.
Koutsoyiannis, D. 2006 On the quest for chaotic attractors in hydrological processes. Hydrol. Sci. J. 51(6),
1065-1091, doi: 10.1623/hysj.51.6.1065.
Koutsoyiannis, D. 2014a Random musings on stochastics (Lorenz Lecture). AGU 2014 Fall Meeting,
American Geophysical Union, San Francisco, USA, doi: 10.13140/RG.2.1.2852.8804.
Koutsoyiannis, D. 2014b Entropy: from thermodynamics to hydrology, Entropy, 16(3), 1287-1314,
doi:10.3390/e16031287.
Koutsoyiannis, D. 2019 Time’s arrow in stochastic characterization and simulation of atmospheric and
hydrological processes. Hydrol. Sci. J. 64(9), 1013-1037, doi:10.1080/02626667.2019.1600700.
Koutsoyiannis, D. 2020 Simple stochastic simulation of time irreversible and reversible processes. Hydrol.
Sci. J. 65(4), 536-551, doi:10.1080/02626667.2019.1705302.
Koutsoyiannis, D. 2021 Stochastics of Hydroclimatic Extremes - A Cool Look at Risk. Athens: Kallipos, ISBN:
978-618-85370-0-2, 333 pp; http://www.itia.ntua.gr/2000/.
Koutsoyiannis, D. & Dimitriadis, P. 2021 Towards generic simulation for demanding stochastic processes,
Sci, 3, 34, doi: 10.3390/sci3030034.
Koutsoyiannis, D. & Kundzewicz, Z. W. 2020 Atmospheric temperature and CO₂: Hen-or-egg causality? Sci
2(4), 83, doi:10.3390/sci2040083.
Koutsoyiannis, D., Onof, C. Christofides, A. & Kundzewicz, Z. W. 2022 Revisiting causality using stochastics:
2. Applications, Proceedings of the Royal Society A, this issue.
Koutsoyiannis, D., Yao, H. & Georgakakos, A. 2008 Medium-range flow prediction for the Nile: a comparison
of stochastic and deterministic methods, Hydrol. Sci. J., 53(1), 142-164, doi:10.1623/hysj.53.1.142.
Liang, X. S. 2016 Information flow and causality as rigorous notions ab initio. Phys. Rev. E 94, 052201.
Liang, X. S. 2018 Causation and information flow with respect to relative entropy. Chaos: An
Interdisciplinary Journal of Nonlinear Science 28(7), 075311.
Masani, P. 1966 Wiener’s contributions to generalized harmonic analysis, prediction theory and filter
theory. Bull. Amer. Math. Soc. 72(1), 73-125.
Mehlberg, H. M. 1983 Time, Causality, and the Quantum Theory. Dordrecht, Holland: Reidel.
Nash, J.E. 1959 Systematic determination of unit hydrograph parameters. J. Geoph. Res. 64(1), 111-115.
Papoulis, A. 1991 Probability, Random Variables and Stochastic Processes, 3rd ed.; New York, NY, USA:
McGraw-Hill (1st edition 1965).
Parzen, E. 1968 Multiple Time Series Modeling. Technical Report 22, Department of Statistics, Stanford
University, Stanford, CA, USA, https://apps.dtic.mil/sti/pdfs/AD0627023.pdf (accessed 21 August
2021).
Pearl, J. 2009 Causal inference in statistics: An overview. Statistics Surveys 3, 96-146, doi: 10.1214/09-
SS057.
Pearl, J., Glymour, M. & Jewell, N. P. 2016 Causal Inference in Statistics: A Primer. Chichester, UK: Wiley.
Post, E. J. 1963 Note on phase-amplitude relations. Proc. of the IEEE 51(4), 627-627.
Robbins, H. 1959 An extension of Wiener filter theory to partly sampled systems. IRE Transactions on Circuit
Theory 6(4), 362-370, doi: 10.1109/TCT.1959.1086575.
Salmon, W. 1998 Causality and Explanation, New York: Oxford University Press.
Shannon, C. E. 1948 The mathematical theory of communication. Bell System Technical Journal 27(3), 379-
423.
Sharnoff, M. 1964 Validity conditions for the Kramers-Kronig relations. Am. J. Phys. 32, 40-44, doi:
10.1119/1.1970070.
Sion, A. 2010 The logic of causation: Definition, induction and deduction of deterministic causality. Avi Sion
(self-publication), http://thelogician.net/LOGIC-OF-CAUSATION/Cover-page.htm (accessed 3 July
2021).
Skyrms, B. 1980 Causal Necessity: A Pragmatic Investigation of The Necessity of Laws. New Haven and
London: Yale University Press.
Stips, A., Macias, D., Coughlan, C., Garcia-Gorriz, E., Liang, X. S. 2016 On the causal structure between CO2
and global temperature. Sci. Rep. 6, 21691, doi:10.1038/srep21691.
Strotz, R. H. & Wold, H. O. A. 1960 Recursive vs. nonrecursive systems: an attempt at synthesis (Part I of a
triptych on causal chain systems). Econometrica 28(2), 417-427,
https://www.jstor.org/stable/1907731 (accessed 21 August 2021).
Suppes, P. 1970 A Probabilistic Theory of Causality. Amsterdam, The Netherlands: North-Holland Publishing.
Verbitsky, M. Y., Mann, M. E., Steinman, B. A. & Volobuev, D. M. 2019 Detecting causality signal in
instrumental measurements and climate model simulations: global warming case study. Geosci. Model
Dev. 12, 40534060, doi: 10.5194/gmd-12-4053-2019.
Wiener, N. 1948 Cybernetics or Control and Communication in the Animal and the Machine. Cambridge, MA,
USA: MIT Press, 212 pp.
Wiener, N. 1949 Extrapolation, Interpolation and Smoothing of Stationary Time Series, with Engineering
Applications. New York, NY, USA: Wiley.
Wold, H.O. 1938 A Study in the Analysis of Stationary Time-Series. Ph.D. Thesis, Almquist and Wicksell,
Uppsala, Sweden.
Wold, H. O. A. 1948 On prediction in stationary time series. Ann. Math. Statist. 19(4), 558-567, doi:
10.1214/aoms/1177730151.
Wold, H. O. A. 1954 Causality and econometrics. Econometrica, 22 (2), 162-177,
https://www.jstor.org/stable/1907540 (accessed 21 August 2021).
Wold, H. O. A. 1960 A generalization of causal chain models (Part III of a triptych on causal chain systems).
Econometrica 28(2), 443-463, https://www.jstor.org/stable/1907733 (accessed 21 August 2021).
Young, P. C. 2011 Recursive Estimation and Time Series Analysis. Berlin, Heidelberg: Springer-Verlag.
Young, P. C. 2015 Refined instrumental variable estimation: Maximum likelihood optimization of a unified
Box-Jenkins model. Automatica 52, 35-46.