Compression complexity with ordinal patterns for robust causal inference in irregularly sampled time series

Distinguishing cause from effect is a scientific challenge resisting solutions from mathematics, statistics, information theory and computer science. Compression-Complexity Causality (CCC) is a recently proposed interventional measure of causality, inspired by Wiener–Granger’s idea. It estimates causality based on change in dynamical compression-complexity (or compressibility) of the effect variable, given the cause variable. CCC works with minimal assumptions on given data and is robust to irregular-sampling, missing-data and finite-length effects. However, it only works for one-dimensional time series. We propose an ordinal pattern symbolization scheme to encode multidimensional patterns into one-dimensional symbolic sequences, and thus introduce the Permutation CCC (PCCC). We demonstrate that PCCC retains all advantages of the original CCC and can be applied to data from multidimensional systems with potentially unobserved variables which can be reconstructed using the embedding theorem. PCCC is tested on numerical simulations and applied to paleoclimate data characterized by irregular and uncertain sampling and limited numbers of samples.
PCCC surrogate analysis results for: (a) kilo-year scale CO2 → T, (b) kilo-year scale T → CO2, (c) yearly ENSO → SASM, (d) SASM → ENSO. The dashed line indicates the PCCC value obtained for the original series. Its position is shown with respect to a Gaussian curve fitted to the normalized histogram of surrogate PCCC values. PCCC for cases (b)–(d) is found to be significant.
Scientific Reports | (2022) 12:14170 | https://doi.org/10.1038/s41598-022-18288-4
www.nature.com/scientificreports
Aditi Kathpalia, Pouya Manshour & Milan Paluš*
Unraveling systems’ dynamics from the analysis of observed data is one of the fundamental goals of many areas of natural and social sciences. In this respect, detecting the direction of interactions or inferring causal relationships among observables is of particular importance, as it can improve our ability to understand the underlying dynamics and to predict or even control such complex systems1,2.
Around sixty years after the pioneering work of Wiener and Granger3,4 on quantifying linear ‘causality’ from observations, the approach has been widely applied not only in economics5–7, for which it was first introduced, but also in various fields of natural sciences, from neurosciences8 to Earth sciences9–11. A number of attempts have been made to generalize Granger Causality (GC) to nonlinear cases, using, e.g., an estimator based on correlation integral6, a non-parametric regression approach12, local linear predictors13, mutual nearest neighbors14,15, kernel estimators16, to state a few. Several other causality methods based on the GC principle such as Partial Directed Coherence17, Direct Transfer Function18 and Modified Direct Transfer Function19 have also been proposed.
Information theory has proved itself a powerful approach to causal inference. In this respect, Schreiber proposed a method for measuring information transfer among observables20, known as Transfer Entropy (TE), which is based on the Kullback–Leibler distance between transition probabilities. Paluš et al.21 introduced a causality measure based on mutual information, called Conditional Mutual Information (CMI). CMI has been shown to be equivalent to TE22. These tools have been applied in various research studies and have shown their power in extracting causal relationships between different systems23–27.
Department of Complex Systems, Institute of Computer Science of the Czech Academy of Sciences, 182 07 Prague, Czech Republic. *email: mp@cs.cas.cz
We usually work with time series x(t) and y(t) as realizations of m- and n-dimensional dynamical systems, X(t) and Y(t) respectively, evolving in measurable spaces. It means that x(t) and y(t) can be considered as components of these m- and n-dimensional vectors. In many cases, only one dimension of the phase space is observable, and recordings or knowledge of variables which may have indirect effects or act as mediators in the causal interactions between observables may not be available. In this respect, phase-space reconstruction is a common and useful approach introduced by Takens28, which reconstructs the dynamics of the entire system (including other unknown/unmeasurable variables) using time-delay embedding vectors, as follows: the manifold of an m-dimensional state vector X can be reconstructed as $X(t) = \{x(t), x(t-\eta), \ldots, x(t-(m-1)\eta)\}$. Here, $\eta$ is the embedding delay, and can be obtained using the embedding construction procedure based on the first minimum of the mutual information29. Some causality estimators have applied this phase-space reconstruction procedure to improve their causal inference power, such as high dimension CMI26 and TE30. Other causality measures, such as Convergent Cross Mapping (CCM)31, Topological Causality32 and Predictability Improvement33, are based directly on the reconstruction of dynamical systems.
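As an illustration of the time-delay embedding described above, the following minimal Python sketch builds the vectors {x(t), x(t−η), ..., x(t−(m−1)η)} from a scalar series. The function name `delay_embed` and the toy series are hypothetical and not taken from the study; the choice of η from the first minimum of the mutual information is not shown and is assumed to have been made beforehand.

```python
def delay_embed(x, m, eta):
    """Return delay vectors [x(t), x(t - eta), ..., x(t - (m - 1) * eta)].

    x   : sequence of scalar observations
    m   : embedding dimension (assumed known)
    eta : embedding delay in samples (assumed chosen beforehand)
    """
    if m < 1 or eta < 1:
        raise ValueError("m and eta must be positive integers")
    start = (m - 1) * eta  # first time index that has a full history
    return [[x[t - k * eta] for k in range(m)] for t in range(start, len(x))]


# Toy usage: a short series embedded with m = 3 and eta = 2.
series = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
for vector in delay_embed(series, m=3, eta=2):
    print(vector)
```

Each output vector lists the current value first and the most delayed value last, matching the ordering in the expression above.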
Vast amounts of data available in recent years have pushed forward some of the above-discussed GC extensions and information and phase-space reconstruction based approaches, as they rely on joint probability density estimations, stationarity, markovianity, topological or linear modeling. However, many temporal observations made in various domains such as climatology34,35, finance36,37 and sociology38 are often short in length, have missing samples or are irregularly sampled. A significant challenge arises when we attempt to apply causality measures in such situations11. For instance, CMI or TE fail when applied to time series which are undersampled or have missing samples39–41 and also in case of time series with short lengths41. CCM and kernel based nonlinear GC also show poor performance even in the case of few missing samples in bivariate simulated data42.
Kathpalia and Nagaraj recently introduced a causality measure, called Compression-Complexity Causality (CCC), which employs ‘complexity’ estimated using lossless data-compression algorithms for the purpose of causality estimation. It has been shown to work well in case of missing samples in data for bivariate systems of coupled autoregressive and tent map processes. This has been shown to be the case for samples which are missing in the two coupled time series either in a synchronous or asynchronous manner41. Also, it gives good performance for time series with short lengths41,42. These strengths of CCC arise from its formulation as an interventional causality measure based on the evolution of dynamical patterns in time series, its independence from joint probability density functions, its minimal assumptions on the data and its use of lossless compression based complexity approaches, which in turn show robust performance on short and noisy time series41,43. However, as discussed in Ref.42, a direct multidimensional extension of CCC is not straightforward, and so a measure of effective CCC has been formulated and used on multidimensional systems of coupled autoregressive processes with a limited number of variables.
On the other hand, a method for symbolization of phase-space reconstructed (embedded) processes has been used to improve the ability of info-theoretic causality measures for noisy data, such as symbolic transfer entropy44,45, partial symbolic transfer entropy46,47, permutation conditional mutual information (PCMI)48 and multidimensional PCMI49. The symbolization technique used in these works is based on the Bandt and Pompe scheme for estimation of Permutation Entropy50, and is often referred to as permutation or ordinal patterns coding. The scheme labels the embedded values of time series at each time point in ascending order of their magnitude. Symbols are then assigned at each time point depending on the ordering of values (or the labelling sequence) at that point. Ordinal patterns have been used extensively in the analysis and prediction of chaotic dynamical systems and have also been shown to be robust in applications to real world time series. By construction, this technique ignores the amplitude information and thus decreases the effect of high fluctuations in data on the obtained causal inference51. Other benefits of permutation patterns are: they naturally emerge from the time series and so the method is almost parameter-free; they are invariant to monotonic transformations of the values; they keep account of the causal order of temporal values; and the procedure is computationally inexpensive52–55. Ordinal partition has been shown to have the generating property under specific conditions, implying topological conjugacy between phase space of dynamical systems and their ordinal symbolic dynamics56. Further, permutation entropy for certain sets of systems has been shown to have a theoretical relationship to the systems’ Lyapunov exponents and Kolmogorov–Sinai entropy57,58. Because of all these beneficial properties of permutation patterns, it is no wonder that the development of symbolic TE and PCMI made these information-theoretic measures more robust, giving better performance in the case of noisy measurements, simplifying the process of parameter selection and making fewer demands on the data.
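The ordinal pattern coding described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: the function name `ordinal_symbols`, the integer numbering of the patterns and the tie-breaking rule (equal values ordered by position) are all hypothetical choices, since the text does not specify them.

```python
from itertools import permutations


def ordinal_symbols(x, m, eta=1):
    """Encode each delay vector of x as the integer index of its ordinal pattern."""
    # Assumed numbering: permutations of (0, ..., m-1) in lexicographic order.
    codebook = {p: i for i, p in enumerate(permutations(range(m)))}
    start = (m - 1) * eta
    symbols = []
    for t in range(start, len(x)):
        window = [x[t - k * eta] for k in range(m)]
        # Positions listed in ascending order of value; ties are broken by
        # position (an assumption, the source does not discuss ties).
        order = tuple(sorted(range(m), key=lambda k: (window[k], k)))
        symbols.append(codebook[order])
    return symbols


# Toy usage: the symbols depend only on the ordering of values, so a
# monotonically increasing transformation leaves the sequence unchanged.
x = [4, 7, 9, 10, 6, 11, 3]
print(ordinal_symbols(x, m=3))
print(ordinal_symbols([v ** 3 for v in x], m=3))
```

With m = 3 there are 3! = 6 possible symbols, so a multidimensional delay vector is reduced to a single symbol from a small alphabet, which is what allows a one-dimensional measure to be applied to the encoded sequence.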
In this work, we propose the use of the CCC approach with reconstructed dynamical systems which are symbolized using ordinal patterns. The combination of the strengths of CCC and ordinal patterns not only makes CCC applicable to dynamical systems with multidimensional variables, but we also observe that the proposed Permutation CCC (PCCC) measure performs well on datasets with very short lengths and high levels of missing samples. The performance of PCCC is compared with that of PCMI (which is identical to symbolic TE), bivariate CCC and CMI on simulated dynamical systems data. PCCC outperforms the existing approaches and its estimates are found to be robust for short length time series and high levels of missing data points.
This development for the first time opens up avenues for the use of a causality estimation tool on real world datasets from climate and paleoclimate science, finance and other fields where there is prevalence of data with irregular and/or uncertain sampling times. Determining the major drivers of climate is the need of the hour as climate change poses a big challenge to humankind and our planet Earth59. Different studies have employed either correlation/coherence, causality methods or modelling approaches to study the interaction between climatic processes. The results produced by different studies are different and sometimes contradictory, presenting an ambiguous situation. We apply PCCC to analyse the causal relationship between the following sets of climatic processes: greenhouse gas concentrations and atmospheric temperature, El Niño Southern Oscillation and South Asian monsoon, and North Atlantic Oscillation and European temperatures, at different time-scales, and compare its performance with bivariate CCC, bivariate and multidimensional CMI, and PCMI. The time series available for most of these processes are short in length and sometimes have missing samples and (or) are sampled at irregular intervals of time. We expect our estimates to be reliable and to help resolve the ambiguity presented by existing studies.
Results
Simulation experiments. Time series data from a pair of unidirectionally coupled Rössler systems were generated as per Eqs. (1) and (2), where Eq. (1) describes the autonomous or master system and Eq. (2) the response or slave system. Parameters were set as $a_1 = a_2 = 0.15$, $b_1 = b_2 = 0.2$, $c_1 = c_2 = 10.0$, and frequencies set as $\omega_1 = 1.015$ and $\omega_2 = 0.985$. The coupling parameter, $\epsilon$, was fixed to 0.09. The data were generated by numerical integration based on the adaptive Bulirsch–Stoer method60 using a sampling interval of 0.314 for both the master and slave systems. This procedure gives 17–21 samples per one period. 100 realizations of these systems were simulated and the initial 5000 transients were removed before using the data for testing experiments. As can be seen from the equations, there is a coupling between $x_1$ and $x_2$, with $x_1$ influencing $x_2$. The analysis of the causal influence between the two systems was done using the causality estimation measures: bivariate or scalar CCC, CMI, PCCC and PCMI for the cases outlined in the following paragraphs. The estimation procedure for each of the methods is described in the “Methods” section. The values of parameters used for each of the methods are also given in the “Methods” section (Table 2).
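For readers who wish to reproduce data of this kind, the following Python sketch integrates the coupled Rössler systems with the parameter values quoted above. It is a sketch under stated assumptions: a fixed-step fourth-order Runge–Kutta scheme is swapped in for the adaptive Bulirsch–Stoer method used in the study, and the function names, the initial state and the number of substeps per sampling interval are hypothetical.

```python
def coupled_rossler(state, eps=0.09, w1=1.015, w2=0.985, a=0.15, b=0.2, c=10.0):
    """Right-hand side of the master (x1, y1, z1) and slave (x2, y2, z2) systems."""
    x1, y1, z1, x2, y2, z2 = state
    return [
        -w1 * y1 - z1,
        w1 * x1 + a * y1,
        b + z1 * (x1 - c),
        -w2 * y2 - z2 + eps * (x1 - x2),  # x1 drives x2 through the coupling term
        w2 * x2 + a * y2,
        b + z2 * (x2 - c),
    ]


def rk4_step(f, s, h):
    """One classical Runge-Kutta step (a stand-in for Bulirsch-Stoer)."""
    k1 = f(s)
    k2 = f([v + 0.5 * h * k for v, k in zip(s, k1)])
    k3 = f([v + 0.5 * h * k for v, k in zip(s, k2)])
    k4 = f([v + h * k for v, k in zip(s, k3)])
    return [v + h / 6.0 * (p + 2.0 * q + 2.0 * r + w)
            for v, p, q, r, w in zip(s, k1, k2, k3, k4)]


def simulate(n_samples, n_transient=5000, dt=0.314, substeps=32,
             state=(1.0, 1.0, 0.0, 1.5, 0.5, 0.0)):
    """Return (x1, x2) sampled every dt after discarding n_transient samples."""
    state = list(state)   # hypothetical initial condition
    h = dt / substeps     # internal step, finer than the sampling interval
    x1, x2 = [], []
    for i in range(n_transient + n_samples):
        for _ in range(substeps):
            state = rk4_step(coupled_rossler, state, h)
        if i >= n_transient:
            x1.append(state[0])
            x2.append(state[3])
    return x1, x2


x1, x2 = simulate(n_samples=256, n_transient=500)
print(len(x1), len(x2))
```

Different realizations would be obtained by varying the initial state; how the 100 realizations were initialized in the study is not stated here, so that choice is left open.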
Finite length data. The length of time series, N, of $x_1$ and $x_2$ taken from coupled Rössler systems was varied as shown in Fig. 1. The estimation for CMI and PCMI is done up to a higher value of length as CMI did not give optimal performance until the length became 32,768 samples. Figure 1c shows scalar (simple bivariate) CMI or one-dimensional CMI (CMI1) between $x_1$ and $x_2$ (see Paluš and Vejmelka22). This method has high sensitivity but suffers from low specificity. This problem is solved by using conditional CMI or three-dimensional CMI (CMI3), where the information from other variables ($y_1, z_1, y_2, z_2$) is incorporated in the estimation. Its performance is depicted in Fig. 1e. However, it requires a larger length of time series for optimal performance. Figure 1a shows the performance of scalar (or simple bivariate) CCC, which is equivalent to the CMI1 case, considering dimensionality. Figure 1b,d show the performance of PCCC and PCMI respectively. For each length level, all 100 realizations of coupled systems were considered and 100 surrogates generated for each realization in order to perform significance analysis of causality estimated (in both directions) from each realization of coupled processes. These surrogates were generated for both the processes using the Amplitude Adjusted Fourier Transform method61 and significance testing was done using a standard one-sided z-test with the significance level set to 0.05 (this was justified as the distributions of surrogates for CCC and CMI methods implemented were found to be Gaussian). Based on this significance analysis, true positive rate (TPR) and false positive rate (FPR) were computed at each length level. A true positive is counted for a particular realization of coupled systems when causality estimated from $x_1$ to $x_2$ is found to be significant and a false positive is counted when causality estimated from $x_2$ to $x_1$ is found to be significant.
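The significance analysis described above can be summarized by the following Python sketch. It assumes that the causality value for the original pair and the values for its surrogates have already been computed by some estimator (not shown), and that the one-sided test looks for values larger than those of the surrogates; the direction of the test and the function names are assumptions made for illustration.

```python
from statistics import NormalDist, mean, stdev


def is_significant(observed, surrogate_values, alpha=0.05):
    """One-sided z-test of an observed causality value against surrogate values."""
    mu = mean(surrogate_values)
    sigma = stdev(surrogate_values)
    if sigma == 0.0:
        return observed > mu  # degenerate surrogate distribution
    z = (observed - mu) / sigma
    p_value = 1.0 - NormalDist().cdf(z)  # upper tail (assumed direction)
    return p_value < alpha


def positive_rates(decisions):
    """decisions: one (x1_to_x2_significant, x2_to_x1_significant) pair per realization."""
    n = len(decisions)
    tpr = sum(1 for forward, _ in decisions if forward) / n
    fpr = sum(1 for _, backward in decisions if backward) / n
    return tpr, fpr


# Toy usage with made-up numbers (not results from the study).
surrogates = [0.0, 1.0, 2.0, 3.0, 4.0]
print(is_significant(10.0, surrogates), is_significant(2.0, surrogates))
print(positive_rates([(True, False), (True, True), (False, False), (True, False)]))
```

Because the coupling is unidirectional from $x_1$ to $x_2$, a significant result in the forward direction counts towards TPR and a significant result in the reverse direction counts towards FPR.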
As can be seen from the plots, direct application of scalar CCC completely fails on multidimensional dynamical systems data, yielding low true positives and high false positives. Hence the method displays poor sensitivity as well as specificity. CMI1 also shows poor performance, yielding high false positives. CMI3, which is appropriate for multi-dimensional data, only begins to give good performance when the length of time series is taken to be greater than 32,768 samples. On the other hand, PCCC begins to give high true positives and low false positives as the length of time series is increased to 1024 time points, with TPR and FPR reaching almost 1 and 0 respectively as length is increased to 2048 time points. The use of permutation patterns also improves the performance of CMI3 for short length data as it can be seen that PCMI begins to show a TPR of 1 and FPR of 0 for length of time series equal to 2048 time points.
We did further experiments with simulated Rössler data by varying the amount of noise and missing samples in the data. For these cases, only the performance of PCCC and PCMI was evaluated, because the ‘varying length’ experiments show that scalar CCC and CMI1 do not work for multidimensional dynamical systems data and CMI3 does not perform well for short length data.
Noisy data. White Gaussian noise was added to the simulated Rössler data. The amount of noise added to the data was relative to the standard deviation of the data. The noise standard deviation ($\sigma_n$) is expressed as a percentage of the standard deviation of the original data ($\sigma_s$). For example, 20% of noise means $\sigma_n = 0.2\,\sigma_s$, and 100% of noise means $\sigma_n = \sigma_s$. The length of time series taken for this experiment was fixed to 2048. For each realization of noisy data as well, 100 surrogate time series were generated and significance testing performed as before using the Amplitude Adjusted Fourier Transform method and z-test respectively. Figure 2a,b show the results for varying noise in the data for the measures PCCC and PCMI respectively.
It can be seen that PCCC performs well for low levels of noise, up to 10%, but at higher levels of noise, its performance begins to deteriorate. PCMI, on the other hand, shows high TPR and low FPR even as the noise level is increased to 50%.
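The noise convention used in this experiment, with the noise standard deviation set to a percentage of the signal standard deviation, can be sketched as follows. The function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator of $\sigma_s$ was used.

```python
import random
from statistics import pstdev


def add_relative_noise(x, percent, rng=None):
    """Add white Gaussian noise with sigma_n = (percent / 100) * sigma_s."""
    rng = rng or random.Random()
    sigma_s = pstdev(x)  # standard deviation of the clean series (assumed estimator)
    sigma_n = (percent / 100.0) * sigma_s
    return [v + rng.gauss(0.0, sigma_n) for v in x]


# Toy usage: 20% noise on a short made-up series.
clean = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0, 1.0, 0.5]
noisy = add_relative_noise(clean, 20, random.Random(0))
print(len(noisy))
```

Passing a seeded `random.Random` instance makes a given noisy realization reproducible.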
$$\dot{x}_1 = -\omega_1 y_1 - z_1, \qquad \dot{y}_1 = \omega_1 x_1 + a_1 y_1, \qquad \dot{z}_1 = b_1 + z_1 (x_1 - c_1), \tag{1}$$

$$\dot{x}_2 = -\omega_2 y_2 - z_2 + \epsilon (x_1 - x_2), \qquad \dot{y}_2 = \omega_2 x_2 + a_2 y_2, \qquad \dot{z}_2 = b_2 + z_2 (x_2 - c_2), \tag{2}$$

Sparse data. We refer to time series with missing samples as sparse data. Sparsity or non-uniformly missing samples were introduced in the data in two ways: (i) synchronous sparsity and (ii) asynchronous sparsity. In case (i), samples were missing from both $x_1$ and $x_2$ at randomly chosen time indices and this set of time indices was the same for both $x_1$ and $x_2$. In case (ii), samples were missing from both $x_1$ and $x_2$ based on two different sets of randomly chosen time indices, that is, the time indices of missing samples were different for $x_1$ and $x_2$. The amount of synchronous/asynchronous sparsity is expressed in terms of percentage of missing samples relative to the original length of time series taken. $\alpha_{sync}$ and $\alpha_{async}$ refer to the level of missing samples for the cases of synchronous and asynchronous sparsity respectively, and are given by m/N, where m is the number of missing samples and N is the original length of time series. N was fixed to 2048. The length of time series became shorter as the percentage of missing samples was increased. Causality estimation measures were applied to the data without any knowledge of whether any samples were missing or the time stamps at which the samples were missing. Surrogate data generation for each realization in this case was not done after the introduction of missing samples but prior to that, using the original length time series. Sparsity was then introduced in the surrogate time series in a manner similar to that for the original time series.
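The two ways of introducing missing samples can be sketched in Python as below. The function names are hypothetical, both series are assumed to have the same original length N, and the number of removed samples is taken as m = round(αN); the samples are simply deleted, so the shortened series carry no record of where the gaps were.

```python
import random


def drop_samples(x, missing_indices):
    """Remove the samples at the given time indices and close the gaps."""
    missing = set(missing_indices)
    return [v for i, v in enumerate(x) if i not in missing]


def sparsify(x1, x2, alpha, synchronous, rng=None):
    """Remove a fraction alpha = m / N of samples from two equally long series."""
    rng = rng or random.Random()
    n = len(x1)
    m = round(alpha * n)  # number of missing samples
    idx1 = rng.sample(range(n), m)
    # Synchronous: the same indices for both series.
    # Asynchronous: an independently drawn index set for the second series.
    idx2 = idx1 if synchronous else rng.sample(range(n), m)
    return drop_samples(x1, idx1), drop_samples(x2, idx2)


# Toy usage with N = 2048 and 25% missing samples.
a = list(range(2048))
b = list(range(2048))
s1, s2 = sparsify(a, b, 0.25, synchronous=True, rng=random.Random(1))
print(len(s1), len(s2))
```

In the asynchronous case the two shortened series are no longer aligned in time, which is the harder situation for a causality estimator.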
Figure 2c,d show the results obtained using PCCC and PCMI respectively for synchronous sparsity. Figure 2e,f show the same for asynchronous sparsity. It can be seen that PCCC is robust to the introduction of missing samples, showing high TPR and low FPR. FPR begins to be greater than 0.2 only when the level of synchronous sparsity is increased to 25% and asynchronous sparsity is increased to 20%. PCMI is robust to low levels of synchronous sparsity but deteriorates beyond 5% of missing samples, giving low true positives. It performs very poorly even with low levels of asynchronous sparsity.
Figure 1. Specificity and sensitivity of methods with varying length. True positive rate (or rate of significant causality estimated from $x_1 \rightarrow x_2$) and false positive rate (or rate of significant causality estimated from $x_2 \rightarrow x_1$), using measures (a) scalar CCC (CCC), (b) permutation CCC (PCCC), (c) scalar CMI (CMI1), (d) permutation CMI (PCMI) and (e) three-dimensional CMI (CMI3), as the length of time series, N, is varied.
Real data analysis. As discussed in the Introduction, a number of climate datasets are either sampled at irregular intervals, have missing samples, are sampled after long intervals of time or have a combination of two or more of these issues. In addition, their available temporal recordings are short in length. We apply the proposed method, PCCC, to some such datasets described below. We also compare the results obtained with existing measures: scalar CCC, scalar CMI and PCMI.
Millennial scale CO2-temperature recordings. Mills et al. have compiled independent estimates of global average surface temperature and atmospheric CO2 concentration for the Phanerozoic eon. These paleoclimate proxy records span the last 424 million years62 and have been used and made available in the study by Wong et al.63. One data point for each of the CO2 and temperature recordings was available for each million year period and was used in our analysis to check for causal interaction between the two.
Figure 2. Specificity and sensitivity of methods with varying noise and sparsity. True positive rate (or rate of significant causality estimated from $x_1 \rightarrow x_2$) and false positive rate (or rate of significant causality estimated from $x_2 \rightarrow x_1$), using measures permutation CCC (PCCC) (left column) and permutation CMI (PCMI) (right column) as the level of noise: (a, b); level of synchronous sparsity: (c, d); and asynchronous sparsity: (e, f), are varied.
CO2, CH4 and temperature recordings over the last 800,000 years. Past Interglacials Working Group of PAGES64 has made available proxy records of atmospheric CO2, CH4 and deepwater temperatures over the last 800 ka (1 ka = 1000 years). Each of these time series was reconstructed by separate studies and so the recordings available are non-synchronous and also irregularly sampled for each variable. Further, some data points are missing in the temperature time series. Roughly, a single data point is available for each ka for each of the three variables. CO2 proxy data are based on Antarctic ice core composites. This was first reported by Lüthi et al.65 and the revised values made available in a study by Bereiter et al.66. Reconstructed atmospheric CH4 concentrations, also based on ice cores, were as reported by Loulergue et al.67 (on the AICC2012 age scale68). Deepwater temperature recordings obtained using shallow-infaunal benthic foraminifera (Mg/Ca ratios) that became available from Ocean Drilling Program (ODP) site 1123 on the Chatham Rise, east of New Zealand were reported by Elderfield et al.69.
Causal influence was checked between CO2-temperature and separately between CH4-temperature. CO2 and CH4 data are taken beginning from the 6.5th ka on the AICC2012 scale and temperature data are taken beginning from the 7th ka. Since the number of data points available for temperature is 792, CO2-temperature analysis was done based on these 792 samples and as the number of samples of CH4 is limited to 756 beginning from the 6.5th ka, CH4-temperature analysis was done using these 756 data points.
Monthly CO2-temperature dataset. Monthly mean CO2 data constructed from mean daily CO2 values as well as Northern Hemisphere’s combined land and ocean temperature anomalies for the monthly timescale are available open source on the National Oceanic and Atmospheric Administration (NOAA) website. The CO2 measurements were made at the Mauna Loa Observatory, Hawaii. A part of the CO2 dataset (March 1958–April 1974) was originally obtained by C. David Keeling of the Scripps Institution of Oceanography and is available on the Scripps website. NOAA started its own CO2 measurements in May 1974. The temperature anomaly dataset is constructed from the Global Historical Climatology Network-Monthly data set70 and International Comprehensive Ocean-Atmosphere Data Set, also available on the NOAA website. These data from March 1958 to June 2021 (with 760 data points) were used to check for the causal influence between CO2 and temperature on the recent timescale. Both time series were differenced using consecutive values as they were highly non-stationary.
Yearly ENSO-SASM dataset. The 1100 Year El Niño/Southern Oscillation (ENSO) Index Reconstruction dataset, made available open source on the NOAA website and originally published in Ref.71, was used in this study. The South Asian Summer Monsoon (SASM) Index 1100 Year Reconstruction dataset, also available open source on the NOAA website and originally published in Ref.72, was the second variable used here. The aim of our study was to check the causal dependence between these two sets of recordings taken from the year 900 AD to 2000 AD (with one data point being available for each year).
Monthly NINO-Indian monsoon dataset. Monthly NINO 3.4 SST Index recordings from the year 1870 to 2021 are available open source on the NOAA website. Its details are published in Ref.73. The All India monthly rainfall dataset from 1871 to 2016, available on the official website of the World Meteorological Organization and originally acquired from the ‘Indian Institute of Tropical Meteorology’, was used for analysis. These recordings are in the units of mm/month. Causal influence was checked between these two recordings using 1752 data points, ranging from January 1871 to December 2016.
Monthly NAO-temperature recordings. Reconstructed monthly North Atlantic Oscillation (NAO) index recordings from December 1658 to July 2001 are available open source on the NOAA website. The reconstructions from December 1658 to November 1900 are taken from Refs.74,75 and those from December 1900 to July 2001 are derived from Ref.76. The Central European 500 year temperature reconstruction dataset, beginning from 1500 AD, is made available open source by NOAA National Centers for Environmental Information, under the World Data Service for Paleoclimatology. These were derived in the study77. We took winter only data points (months December, January and February) starting from December 1658 to February 2001 as it is known that the NAO influence is strongest in winter. This yielded a total of 1029 data points. However, reconstruction based on embedding was done for each year’s winter separately (with a time delay of 1) and not in a continuous manner as for the other datasets, reducing the length of the ordinal patterns encoded sequence to 343. Causal influence was checked between NAO and temperature for the encoded sequences using PCMI and PCCC and directly using one-dimensional CMI and CCC for the 1029 length sequences.
Daily NAO-temperature recordings. Daily NAO records are available on the NOAA website and have been published in Refs.78–80. Daily mean surface air temperature data from the Frankfurt station in Germany were taken from the records made available online by the ECA&D project81. These data were taken from 1st January 1950 to 31st April 2021. Once again, daily values from the winter months alone (December, January and February), comprising 6390 data points, were extracted for the analysis. While embedding the two time series, care was taken not to embed the recordings of winter from one year along with those of winter from the next year. Causal influence was checked between daily winter NAO and temperature time series.
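In both winter-only analyses above, delay vectors are formed within each winter and never across two winters. A minimal Python sketch of such block-wise embedding is given below; the function name and the grouping of the data into one list per winter are hypothetical. In the usage example an embedding dimension of 3 is assumed for the monthly case, which is consistent with 1029 monthly values (343 winters of three months) yielding 343 encoded points, although the dimension is not stated in this section.

```python
def embed_within_blocks(blocks, m, eta=1):
    """Delay-embed each block (e.g. one winter) separately.

    blocks : list of sequences, one per winter, in chronological order
    m      : embedding dimension
    eta    : embedding delay in samples
    """
    vectors = []
    start = (m - 1) * eta
    for block in blocks:
        # A vector is formed only when its whole history lies inside the block.
        for t in range(start, len(block)):
            vectors.append([block[t - k * eta] for k in range(m)])
    return vectors


# Toy usage: 343 winters of three monthly values each (made-up numbers).
winters = [[float(3 * w + j) for j in range(3)] for w in range(343)]
vectors = embed_within_blocks(winters, m=3, eta=1)
print(len(vectors))
```

Each resulting vector can then be mapped to an ordinal pattern symbol, so that no symbol mixes values from different years.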
For the analysis of causal interaction in each of these datasets, scalar CCC and CMI as well as PCCC and
PCMI were computed as discussed in the “Methods” section. Parameters used for each of the methods are also
given in the “Methods” section (Table2). In order to assess the signicance of causality value estimated using
each measure, 100 surrogate realizations were generated using the stationary bootstrap method82 for both the time
series under consideration. Resampling of blocks of observations of random length from the original time series
is done for obtaining surrogate time series using this method. e length of each block has a geometric distribu-
tion. e probability parameter that determines the geometric probability distribution for length of each block
was set to 0.1 (as suggested inRef.82). Signicance testing of the causal interaction between original time-series
was then done using a standard one-sided z-test, with p-value set to 0.05. Table1 shows whether causal inu-
ence between the considered variables was found to be signicant using each of the causality measures. Figure3
depicts the value of the PCCC between original pair of time series with respect to the distribution of PCCC
Scientific Reports | (2022) 12:14170 | https://doi.org/10.1038/s41598-022-18288-4
www.nature.com/scientificreports/
obtained using surrogate time series for two datasets: kilo-year scale CO2-temperature (Fig. 3a,b) and yearly scale ENSO-SASM (Fig. 3c,d) recordings. In the tables, Fig. 3 and in the following text, we use the notation ‘T’ to refer to temperature generically. Which of the temperature recordings is being referred to will be clear from context.
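The surrogate procedure described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names (`stationary_bootstrap`, `one_sided_z_test`, `surrogate_test`) and the `measure` callable are hypothetical, the circular wrap-around at the end of the series is one common convention for the stationary bootstrap, and the test is assumed to be right-tailed (the observed value is compared against the upper tail of the surrogate distribution).

```python
import math
import random
import statistics


def stationary_bootstrap(series, p=0.1, rng=None):
    """Return one surrogate built from blocks of `series` whose lengths are
    geometrically distributed with mean 1/p (indices wrap around)."""
    rng = rng or random.Random()
    n = len(series)
    surrogate = []
    idx = rng.randrange(n)            # start of the first block
    while len(surrogate) < n:
        surrogate.append(series[idx])
        if rng.random() < p:          # with probability p, start a new block
            idx = rng.randrange(n)
        else:                         # otherwise continue the current block
            idx = (idx + 1) % n
    return surrogate


def one_sided_z_test(observed, surrogate_values, alpha=0.05):
    """Right-tailed z-test of `observed` against the surrogate distribution."""
    mu = statistics.mean(surrogate_values)
    sigma = statistics.stdev(surrogate_values)
    z = (observed - mu) / sigma
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(Z >= z), Z ~ N(0, 1)
    return z, p_value, p_value < alpha


def surrogate_test(x, y, measure, n_surrogates=100, p=0.1, seed=0):
    """Compare measure(x, y) with its values on bootstrap surrogates.
    `measure` is any user-supplied causality estimator (hypothetical here)."""
    rng = random.Random(seed)
    observed = measure(x, y)
    null = [measure(stationary_bootstrap(x, p, rng),
                    stationary_bootstrap(y, p, rng))
            for _ in range(n_surrogates)]
    return one_sided_z_test(observed, null)
```

With `p = 0.1` the mean block length is 10 samples, so short-range structure within each series is retained while the temporal alignment between the two series is randomized.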
Discussion and conclusions
CCC has been proposed as an ‘interventional’ causality measure for time series. It does not require cause-effect separability in time series samples and is based on the dynamical evolution of processes, making it suitable for subsampled time series, for time series in which cause and effect are acquired at slightly different spatio-temporal scales than the scales at which they naturally occur, and even for cases with slight discrepancies between the spatio-temporal scales of the cause and effect time series. This results in its robust performance in the case of missing samples and of non-uniformly sampled, decimated and short-length data [41]. In this work, we have proposed the use of CCC in combination with ordinal pattern encoding. The latter preserves the dynamics of the systems of observed variables, allowing CCC to decipher causal relationships between variables of multi-dimensional systems while conditioning for the presence of other variables in these systems which might be unknown or unobserved.
Simulations of coupled Rössler systems illustrate how scalar CCC fails for observables of coupled multi-dimensional dynamical systems, while PCCC correctly determines the direction of coupling. Comparison of PCCC with PCMI for these simulations shows that the former performs better on shorter time series. Further, while PCMI consistently gave superior performance for increasing noise in coupled Rössler systems, experiments with sparse data showed that PCCC outperforms PCMI. This was the case when samples were missing from the driver and response time series either in a synchronous or asynchronous manner.
As PCCC showed promising results for simulations with high levels of missing samples and short length, we have applied it to make causal inferences in datasets from climatology and paleoclimatology which suffer from the issues of irregular sampling, missing samples and/or a limited number of available data points. Many of these datasets have been analyzed in previous studies. However, different studies report different results, probably due to the challenging nature of the available recordings or the limitations of the inference methods applied to the data.
For example, the relationship between CO2 concentrations and temperature of the atmosphere has been studied from the mid 1800s [83,84], beginning when a strong link between the two was recognized. Relatively recently, with causal inference tools available, a number of studies have begun to look at the directionality of the relationship between the two on different temporal scales. To mention a few findings, Kodra et al. [85] found that CO2 Granger causes temperature. Their analysis was based on data taken from 1860 to 2008. Atanassio [86] found clear evidence of GC from CO2 to temperature using the lag-augmented Wald test, for a similar time range. On the other hand, Stern and Kaufmann [87] found bidirectional GC between the two, again for a similar time range. Kang and Larsson [88] also find bidirectional causation between the two using GC, however, by using data from ice cores for the last 800,000 years. Many of these latter studies criticize the former. Also, the drawbacks of one or more of these studies are explicitly mentioned in Refs. 87,89,90, which highlight the issues with the data and/or the methodology employed. Other than GC and its extensions, a couple of other measures have also been used to study the CO2-T relationship. Stips et al. [91] have applied a measure called Liang’s information flow on CO2-T recordings, both on recent (1850–2005) and paleoclimate (800 ka ice core reconstructions) time-scales. The study finds unidirectional causation from CO2 → T on the recent time-scale and from T → CO2 on the paleoclimatic scale.
Table 1. Causal inference obtained for real datasets using different causality measures. ✓ indicates significant causality and × indicates non-significant causality.
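System | Direction | CCC | PCCC | CMI | PCMI
Millennial scale CO2-T | CO2 → T |  | × |  |
Millennial scale CO2-T | T → CO2 |  | ✓ |  |
Kilo-year scale CO2-T | CO2 → T | × | × | × | ×
Kilo-year scale CO2-T | T → CO2 | × | ✓ | × | ×
Kilo-year scale CH4-T | CH4 → T | × | ✓ | × | ×
Kilo-year scale CH4-T | T → CH4 | × | × | × | ×
Monthly scale CO2-T | CO2 → T | × | ✓ | × | ×
Monthly scale CO2-T | T → CO2 | × | × | × | ×
Yearly ENSO-SASM | ENSO → SASM | × | ✓ | × | ×
Yearly ENSO-SASM | SASM → ENSO |  | ✓ |  |
Monthly NINO-Indian monsoon | NINO → Monsoon |  | ✓ |  |
Monthly NINO-Indian monsoon | Monsoon → NINO | ✓ | × | ✓ | ✓
Monthly NAO-European T | NAO → T |  | ✓ |  |
Monthly NAO-European T | T → NAO | × | × | × | ×
Daily NAO-Frankfurt T | NAO → T | ✓ | × | ✓ | ×
Daily NAO-Frankfurt T | T → NAO | × | × | × | ×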
They have also analysed the CH4-T relationship and found T to drive CH4 on the paleoclimate scale. This study has been criticized by Goulet et al. [92]. They show that an assumption of ‘linearity’ made by Liang’s information flow is nearly always rejected by the data. Convergent cross mapping, which is applied to the 800 ka recordings in another study, finds a bidirectional causal influence between both CO2-T and CH4-T [93]. Another recent study, which infers causation using lagged cross-correlations between monthly CO2 and temperature taken from the period 1980–2019, has found a bidirectional relationship on the recent monthly scale, with the dominant influence being from T → CO2 [94]. In the light of the limitations of CCM [95,96], especially for irregularly sampled or missing data [42], and of the widely known pitfalls of the correlation coefficient [97], it is difficult to rely on the inferences of the latter two studies.
PCCC indicates unidirectional causality from T → CO2 on the paleoclimatic scale, using both millennial and kilo-year scale recordings. On the recent monthly scale, the situation is reversed with CO2 driving T. These results are in line with some of the existing CO2-T causal analysis studies, and PCCC does not suffer from the limitations of the existing approaches discussed above. On the kilo-year scale, PCCC suggests that CH4 drives T. While none of the above discussed causality studies have found this result, other works have suggested that methane concentrations modulate millennial-scale climate variability because of the sensitivity of methane to insolation [98,99]. Other
Figure3. PCCC surrogate analysis results. PCCC surrogate analysis results for: (a) Kilo-year scale CO
2
T,
(b) Kilo-year scale T
CO
2
, (c) Yearly ENSO
SASM, (d) SASM
ENSO. Dashed line indicates PCCC value
obtained for original series. Its position is indicated with respect to Gaussian curve tted normalized histogram
of surrogate PCCC values. PCCC for cases (b)–(d) is found to be signicant.
Table 2. Parameters corresponding to each method, used for different datasets.
Dataset | Embedding | CCC | PCCC | CMI/PCMI
Rössler | η_x1 = 5, η_x2 = 5, m = 3 | L = 300, w = 30, δ = 30, B = 8 | L = 25, w = 15, δ = 20 | τ = 20
Millennial CO2-T | η_CO2 = 11, η_T = 16, m = 3 | L = 60, w = 15, δ = 20, B = 4 | L = 60, w = 30, δ = 20 | τ = 1–30
Kilo-year CO2-T | η_CO2 = 24, η_T = 8, m = 3 | L = 60, w = 15, δ = 20, B = 4 | L = 30, w = 15, δ = 20 | τ = 1–30
Kilo-year CH4-T | η_CH4 = 10, η_T = 8, m = 3 | L = 60, w = 15, δ = 20, B = 4 | L = 30, w = 15, δ = 20 | τ = 1–30
Monthly CO2-T | η_CO2 = 3, η_T = 2, m = 3 | L = 60, w = 15, δ = 20, B = 4 | L = 30, w = 15, δ = 20 | τ = 1–30
Yearly ENSO-SASM | η_ENSO = 1, η_SASM = 4, m = 3 | L = 60, w = 15, δ = 20, B = 4 | L = 60, w = 30, δ = 30 | τ = 1–30
Monthly NINO-India Monsoon | η_NINO = 10, η_mon = 3, m = 3 | L = 60, w = 15, δ = 20, B = 4 | L = 30, w = 15, δ = 20 | τ = 1–30
Monthly NAO-T | η_NAO = 1, η_T = 1, m = 3 | L = 60, w = 15, δ = 20, B = 4 | L = 30, w = 15, δ = 10 | τ = 1–30
Daily NAO-T | η_NAO = 15, η_T = 15, m = 3 | L = 40, w = 15, δ = 20, B = 4 | L = 30, w = 15, δ = 20 | τ = 1–30
approaches implemented in this study (CCC, CMI and PCMI) also do not duplicate the results obtained by PCCC because of their specific limitations, such as the inability to work on multi-dimensional, short-length or irregularly sampled data.
ENSO events and the Indian monsoon are other major climatic processes of global importance [59]. The relationship between the two has been studied extensively, especially using correlation and coherence approaches [100–105]. While ENSO is normally expected to play a driving role, there is no clear consensus on the directionality of the relationship between the two processes. More recently, causal inference approaches have been used to study the nature of their coupling. In Refs. 106,107, both linear and non-linear GC versions were implemented on monthly mean ENSO-Indian monsoon time series, ranging over the period 1871–2006, and bidirectional coupling was inferred between the two processes. Other studies have examined the causal relationship indirectly by analyzing the ENSO-Indian Ocean Dipole link. For example, in Ref. 108, this connection was studied by applying GC on yearly reanalysis as well as model data ranging from 1950–2014. The study found a robust causal influence of the Indian Ocean Dipole on ENSO, while the influence in the opposite direction had lower confidence. Using PCCC, we find a bidirectional causal influence between yearly recordings of ENSO-SASM. However, on the shorter monthly scales, NINO is found to drive the Indian monsoon and there is an insignificant effect in the opposite direction.
Although the NAO is known to be a leading mode of winter climate variability over Europe [109–111], the directionality or feedback in NAO-related climate effects has been studied by only a few causality analysis studies [9,112,113]. We investigate the NAO-European temperature relationship on both monthly and daily time scales using winter-only data. While PCCC indicates that NAO drives central European temperatures with no significant feedback on the longer monthly scale, on the daily scale it shows no significant causation in either direction. On the other hand, CCC and CMI, based on one-dimensional time series, indicate a strong influence from NAO to Frankfurt daily mean temperatures. This result indicates that the NAO influence on European winter temperature on the daily scale can be explained as a simple time-delayed transfer of information between scalar time series in which no role is played by higher-dimensional patterns, potentially reflected in ordinal coding. Such an information transfer in the atmosphere is tied to the transfer of mass and energy, as indicated in the study of climate networks by Hlinka et al. [114]. CMI and PCMI estimates can be considered to be reliable for this analysis as the time series analyzed are long, close to 6000 time points.
CCC is free of the assumption of linearity and of the requirement of long-term stationarity, and is extremely robust to missing samples, irregular sampling and short-length data; its combination with permutation patterns allows it to make reliable inferences for coupled systems with multiple variables. Thus, we can expect our analysis and inferences presented here on some highly researched and long-debated climatic interactions to be highly robust and reliable. We also expect that the use of PCCC on other challenging datasets from climatology and other fields will help to shed light on the causal linkages in the systems considered.
Methods
Compression-complexity causality (CCC) is defined as the change in the dynamical compression-complexity of time series $y$ when $\Delta y$ is seen to be generated jointly by the dynamical evolution of both $y_{past}$ and $x_{past}$ as opposed to by the reality of the dynamical evolution of $y_{past}$ alone. Here, $y_{past}$ and $x_{past}$ are windows of a particular length $L$ taken from contemporary time points of time series $y$ and $x$ respectively, and $\Delta y$ is a window of length $w$ following $y_{past}$ [41]. Dynamical compression-complexity (CC) is estimated using the effort-to-compress (ETC) measure [115] and is given by Eqs. (3) and (4). Equation (3) computes the dynamical compression-complexity of $\Delta y$ as a dynamical evolution of $y_{past}$ alone. Equation (4) computes the dynamical compression-complexity of $\Delta y$ as a dynamical evolution of both $y_{past}$ and $x_{past}$. $\mathrm{CCC}_{x_{past} \rightarrow \Delta y}$ is then estimated as in Eq. (5). The averaged CCC from $x$ to $y$ over the entire length of the time series, with the window $\Delta y$ being slid by a step-size of $\delta$, is estimated as in Eq. (6).
If $\overline{\mathrm{CC}}(\Delta y|y_{past}) \approx \overline{\mathrm{CC}}(\Delta y|x_{past}, y_{past})$, where the overbar denotes averaging over windows, there is no causality from $x$ to $y$. Surrogate time series are generated for both $x$ and $y$ and the $\mathrm{CCC}_{x \rightarrow y}$ values of the original and surrogate time series are compared. If the CCC computed for the original time series is statistically different from that of the surrogate time series, we can infer the presence of a causal relation from $x \rightarrow y$ [42]. $\mathrm{CCC}_{x \rightarrow y}$ can be either $< 0$ or $> 0$ depending upon the nature or quality of the causal relationship [41]. The magnitude indicates the strength of causation.
Selection of parameters: $L$, $w$, $\delta$ and the number of bins, $B$, for symbolizing the time series using equidistant binning (ETC is applied to symbolic sequences) is done using the parameter selection criteria given in the supplementary text of Ref. 41.
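$$\mathrm{CC}(\Delta y|y_{past}) = \mathrm{ETC}(y_{past} + \Delta y) - \mathrm{ETC}(y_{past}), \tag{3}$$
$$\mathrm{CC}(\Delta y|y_{past}, x_{past}) = \mathrm{ETC}(y_{past} + \Delta y, x_{past} + \Delta y) - \mathrm{ETC}(x_{past}, y_{past}), \tag{4}$$
$$\mathrm{CCC}_{x_{past} \rightarrow \Delta y} = \mathrm{CC}(\Delta y|y_{past}) - \mathrm{CC}(\Delta y|y_{past}, x_{past}). \tag{5}$$
$$\mathrm{CCC}_{x \rightarrow y} = \overline{\mathrm{CCC}}_{x_{past} \rightarrow \Delta y} = \overline{\mathrm{CC}}(\Delta y|y_{past}) - \overline{\mathrm{CC}}(\Delta y|x_{past}, y_{past}), \tag{6}$$
Permutation compression-complexity causality (PCCC) is the causal inference technique proposed and implemented in this work. Given a pair of time series $x_1$ and $x_2$ from dynamical systems in which causation is to be checked from $x_1$ to $x_2$, we first embed the time series of the potential driver ($x_1$ here) in the following manner: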
$x_1(t), x_1(t+\eta), x_1(t+2\eta), \ldots, x_1(t+(m-1)\eta)$, where $\eta$ is the time delay and $m$ is the embedding dimension of $x_1$. $\eta$ is computed as the first minimum of the auto mutual information function. The embedded time series at each time point is then symbolized using permutation or ordinal pattern binning. For example, if $m=3$, the embedding at time point $t$ is given as $\hat{x}_1(t) = (x_1(t), x_1(t+\eta), x_1(t+2\eta))$. Symbols 0, 1, 2 are then used for labelling the pattern for $\hat{x}_1(t)$ at each time point by sorting the embedded values in ascending order, with 2 being used for the highest value and 0 for the lowest. If two or more values in $\hat{x}_1(t)$ are exactly the same, they are labelled differently depending on the order of their occurrence, where the same value takes a smaller symbol at its first (or earlier) occurrence. However, this may lead to two or more different embedded vectors having the same ordinal representation. For example, the embeddings (3,5,5), (3,3,5) and (3,3,3) all have an ordinal representation of (0,1,2). This limits the total number of possible patterns at time $t$ to $m! = 3!$. Thus, $\hat{x}_1(t)$ is symbolized to a one-dimensional sequence consisting of $m!$ possible symbols or bins. CCC is then estimated from $\hat{x}_1(t)$ to $x_2(t)$ using Eq. (6), after symbolizing $x_2(t)$ using standard equidistant binning with $m!$ bins. Thus, $\mathrm{PCCC}_{x_1 \rightarrow x_2}$ is obtained as $\mathrm{CCC}_{\hat{x}_1 \rightarrow x_2}$, Eq. (7).
Permutation binning is not done for the potential response series ($x_2$ here) as it was found from simulation experiments (Rössler data) that embedding the ‘cause’ alone works better for the CCC measure. Full dimensionality of the cause is necessary to predict the effect. Hence, embedding only the cause helps to recover the causal relationship. PCCC helps to take into account the multidimensional nature of the coupled systems. Parameter selection for PCCC is done in the same manner as for the case of CCC, using the symbolic sequences $\hat{x}_1(t)$ and $x_2(t)$ for selection of the parameters. When PCCC is to be estimated from $x_2 \rightarrow x_1$, $x_2$ is embedded and $x_1$ remains as it is. Just like CCC, the PCCC measure can also take negative values.
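The ordinal symbolization of the embedded driver can be sketched as below. The function names and the particular integer label assigned to each pattern are hypothetical choices for illustration; only the ranking rule (ascending order, ties resolved by order of occurrence) is taken from the description above.

```python
from itertools import permutations


def ordinal_pattern(vector):
    """Rank the entries of `vector`: 0 for the lowest value, m-1 for the
    highest; equal values get the smaller rank at their earlier position."""
    order = sorted(range(len(vector)), key=lambda i: (vector[i], i))
    ranks = [0] * len(vector)
    for rank, position in enumerate(order):
        ranks[position] = rank
    return tuple(ranks)


def permutation_symbols(series, m=3, eta=1):
    """Map each delay vector (x(t), x(t+eta), ..., x(t+(m-1)*eta)) to an
    integer in 0 .. m!-1 that identifies its ordinal pattern."""
    # Arbitrary but fixed labelling: lexicographic order of the m! patterns.
    index = {p: k for k, p in enumerate(permutations(range(m)))}
    span = (m - 1) * eta
    return [index[ordinal_pattern([series[t + j * eta] for j in range(m)])]
            for t in range(len(series) - span)]
```

For instance, `ordinal_pattern((3, 5, 5))`, `ordinal_pattern((3, 3, 5))` and `ordinal_pattern((3, 3, 3))` all give `(0, 1, 2)`, matching the tie-handling example in the text, while `ordinal_pattern((5, 3, 4))` gives `(2, 0, 1)`. The resulting symbol sequence is shorter than the input by $(m-1)\eta$ samples.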
Conditional mutual information (CMI) of the variables $X$ and $Y$ given the variable $Z$ is a common information-theoretic functional used for causality detection, and can be obtained as in Eq. (8), where $H(X_1, X_2, \ldots|Z) = H(X_1, X_2, \ldots, Z) - H(Z)$ is the conditional entropy, and the joint Shannon entropy $H(X_1, X_2, \ldots)$ is defined as in Eq. (9), where $p(x_1, x_2, \ldots) = \Pr[X_1 = x_1, X_2 = x_2, \ldots]$ is the joint probability distribution function of the amplitudes of the variables $\{X_1, X_2, \ldots\}$. In order to detect the coupling direction between two dynamical variables $X$ and $Y$, Paluš et al. [21] used the conditional mutual information $I(X(t); Y(t+\tau)|Y(t))$, which captures the net information about the $\tau$-future of the process $Y$ contained in the process $X$. As mentioned in the Introduction, to estimate other unknown variables, an $m$-dimensional state vector $X$ can be reconstructed as $X(t) = \{x(t), x(t-\eta), \ldots, x(t-(m-1)\eta)\}$. Accordingly, the CMI defined above can be represented by its reconstructed version for all variables $X(t)$, $Y(t+\tau)$ and $Y(t)$. However, extensive numerical studies [22] demonstrated that CMI in the form of Eq. (10) is sufficient to infer the direction of coupling between the dynamical variables $X(t)$ and $Y(t)$. In this respect, we use this measure to detect causality relationships in this article.
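For discrete (already symbolized) sequences, the CMI of Eq. (8) can be estimated by plugging relative frequencies into the entropy of Eq. (9). The sketch below is a minimal illustration under that assumption; the function names are hypothetical, logarithms are taken to base 2 (the source does not state the base), and the estimator used in the study for continuous amplitudes may differ.

```python
import math
from collections import Counter


def entropy(*columns):
    """Plug-in joint Shannon entropy (in bits) of one or more equally long
    discrete sequences."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * math.log2(c / n) for c in Counter(joint).values())


def cmi(x, y, z):
    """I(X;Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z), with H(A|Z) = H(A,Z) - H(Z)."""
    h_z = entropy(z)
    return ((entropy(x, z) - h_z) + (entropy(y, z) - h_z)
            - (entropy(x, y, z) - h_z))


def lagged_cmi(x, y, tau):
    """I(X(t); Y(t+tau) | Y(t)) for discrete sequences x and y."""
    n = len(y) - tau
    return cmi(x[:n], y[tau:tau + n], y[:n])
```

Two limiting cases show the behaviour: when `y` is a copy of `x` and `z` is constant, `cmi` equals the entropy of `x`; when `y` is identical to `z`, conditioning removes all shared information and `cmi` is zero.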
Permutation conditional mutual information (PCMI) can be obtained based on the permutation analysis described earlier in the PCCC definition. In this approach, all marginal, joint or conditional probability distribution functions of the amplitudes of the variables are replaced by their symbolized versions; thus Eq. (9) should be replaced by Eq. (11), where $p(\hat{x}_1, \hat{x}_2, \ldots) = \Pr[\hat{X}_1 = \hat{x}_1, \hat{X}_2 = \hat{x}_2, \ldots]$ is the joint probability distribution function of the symbolized variables $\hat{X}_i(t) = \{X_i(t), X_i(t+\eta), \ldots, X_i(t+(m-1)\eta)\}$. By using Eqs. (8) and (11), permutation CMI can be obtained as $I(\hat{X}(t); \hat{Y}(t+\tau)|\hat{Y}(t))$. Finally, one should replace $\tau$ with $\tau + (m-1)\eta$ in order to avoid any overlap between the past and future of the symbolized variable $\hat{Y}$.
Parameters of the methods used were set as shown in Table 2 for different datasets.
$$\mathrm{PCCC}_{x_1 \rightarrow x_2} = \mathrm{CCC}_{\hat{x}_1 \rightarrow x_2}. \tag{7}$$
$$I(X;Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z) \tag{8}$$
$$H(X_1, X_2, \ldots) = -\sum_{x_1, x_2, \ldots} p(x_1, x_2, \ldots) \log p(x_1, x_2, \ldots) \tag{9}$$
$$I(X(t); Y(t+\tau)|Y(t), Y(t-\eta), \ldots, Y(t-(m-1)\eta)) \tag{10}$$
$$H(\hat{X}_1, \hat{X}_2, \ldots) = -\sum_{\hat{x}_1, \hat{x}_2, \ldots} p(\hat{x}_1, \hat{x}_2, \ldots) \log p(\hat{x}_1, \hat{x}_2, \ldots) \tag{11}$$
Data availability
The millennial scale CO2 and temperature datasets are freely available at https://zenodo.org/record/4562996#.YiD-bTN_ML3A. Kilo-year scale CO2, CH4 and temperature datasets are available as supplementary files for Ref. 64 at https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2015RG000482. Monthly CO2 recordings are taken from the NOAA repository and are available at https://gml.noaa.gov/ccgg/trends/. Monthly Northern hemisphere temperature anomaly recordings are taken from the NOAA repository and are available at https://www.ncdc.noaa.gov/cag/global/time-series. The yearly El Niño/Southern Oscillation Index Reconstruction dataset is taken from the NOAA repository, https://www.ncei.noaa.gov/access/paleo-search/study/11194. The yearly South Asian Summer Monsoon Index Reconstruction dataset is taken from the NOAA repository, https://www.ncei.noaa.gov/access/paleo-search/study/17369. Monthly Niño 3.4 SST Index dataset is taken from the NOAA repository, available at https://psl.noaa.gov/gcos_wgsp/Timeseries/Nino34/. Monthly all India rainfall dataset is made available by the World Meteorological Organization at http://climexp.knmi.nl/data/pALLIN.dat. Reconstructed
monthly North Atlantic Oscillation Index is available at the NOAA repository, https://psl.noaa.gov/gcos_wgsp/Timeseries/RNAO/. Monthly Central European 500 Year Temperature Reconstructions are available at the NOAA repository, https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=noaa-recon-9970. Daily North Atlantic Oscillation Index is available at the NOAA repository, https://www.cpc.ncep.noaa.gov/products/precip/CWlink/pna/nao.shtml. Daily Frankfurt air temperatures are made available by the ECA&D project at https://www.ecad.eu/dailydata/predefinedseries.php.
Code availability
The computer codes used in this study are freely available at https://github.com/AditiKathpalia/PermutationCCC under the Apache 2.0 Open-source license.
Received: 14 May 2022; Accepted: 9 August 2022
References
1. Pearl, J. & Mackenzie, D. The Book of Why: The New Science of Cause and Effect (Basic Books, 2018).
2. Kathpalia, A. & Nagaraj, N. Measuring causality. Resonance 26, 191 (2021).
3. Wiener, N. The theory of prediction. Mod. Math. Eng. 1, 125–139 (1956).
4. Granger, C. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37, 424–438 (1969).
5. Geweke, J. Inference and causality in economic time series models. Handb. Econom. 2, 1101–1144 (1984).
6. Hiemstra, C. & Jones, J. D. Testing for linear and nonlinear granger causality in the stock price-volume relation. J. Financ. 49,
1639–1664 (1994).
7. Chiou-Wei, S. Z., Chen, C.-F. & Zhu, Z. Economic growth and energy consumption revisited: Evidence from linear and nonlinear
granger causality. Energy Econ. 30, 3063–3076 (2008).
8. Seth, A. K., Barrett, A. B. & Barnett, L. Granger causality analysis in neuroscience and neuroimaging. J. Neurosci. 35, 3293–3297
(2015).
9. Mosedale, T. J., Stephenson, D. B., Collins, M. & Mills, T. C. Granger causality of coupled climate processes: Ocean feedback on
the north Atlantic oscillation. J. Clim. 19, 1182–1194 (2006).
10. Tirabassi, G., Masoller, C. & Barreiro, M. A study of the air–sea interaction in the south Atlantic convergence zone through
granger causality. Int. J. Climatol. 35, 3440–3453 (2015).
11. Runge, J. et al. Inferring causation from time series in earth system sciences. Nat. Commun. 10, 1–13 (2019).
12. Bell, D., Kay, J. & Malley, J. A non-parametric approach to non-linear causality testing. Econ. Lett. 51, 7–18 (1996).
13. Chen, Y., Rangarajan, G., Feng, J. & Ding, M. Analyzing multiple nonlinear time series with extended granger causality. Phys.
Lett. A 324, 26–35 (2004).
14. Schiff, S. J., So, P., Chang, T., Burke, R. E. & Sauer, T. Detecting dynamical interdependence and generalized synchrony through mutual prediction in a neural ensemble. Phys. Rev. E 54, 6708 (1996).
15. Le Van Quyen, M., Martinerie, J., Adam, C. & Varela, F. J. Nonlinear analyses of interictal EEG map the brain interdependences in human focal epilepsy. Physica D Nonlinear Phenomena 127, 250–266 (1999).
16. Marinazzo, D., Pellicoro, M. & Stramaglia, S. Kernel method for nonlinear granger causality. Phys. Rev. Lett. 100, 144103 (2008).
17. Baccalá, L. A. & Sameshima, K. Partial directed coherence: A new concept in neural structure determination. Biol. Cybern. 84,
463–474 (2001).
18. Kamiński, M., Ding, M., Truccolo, W. A. & Bressler, S. L. Evaluating causal relations in neural systems: Granger causality, directed transfer function and statistical assessment of significance. Biol. Cybern. 85, 145–157 (2001).
19. Korzeniewska, A., Mańczak, M., Kamiński, M., Blinowska, K. J. & Kasicki, S. Determination of information flow direction among brain structures by a modified directed transfer function (dDTF) method. J. Neurosci. Methods 125, 195–207 (2003).
20. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 85, 461–464 (2000).
21. Paluš, M., Komárek, V., Hrnčíř, Z. & Štěrbová, K. Synchronization as adjustment of information rates: Detection from bivariate
time series. Phys. Rev. E 63, 046211 (2001).
22. Paluš, M. & Vejmelka, M. Directionality of coupling from bivariate time series: How to avoid false causalities and missed con-
nections. Phys. Rev. E 75, 056211 (2007).
23. Vicente, R., Wibral, M., Lindner, M. & Pipa, G. Transfer entropy: A model-free measure of effective connectivity for the neurosciences. J. Comput. Neurosci. 30, 45–67 (2011).
24. Bauer, M., Cox, J. W., Caveness, M. H., Downs, J. J. & Thornhill, N. F. Finding the direction of disturbance propagation in a chemical process using transfer entropy. IEEE Trans. Control Syst. Technol. 15, 12–21 (2007).
25. Dimpfl, T. & Peter, F. J. Using transfer entropy to measure information flows between financial markets. Stud. Nonlinear Dyn. Econom. 17, 85–102 (2013).
26. Paluš, M. Multiscale atmospheric dynamics: Cross-frequency phase-amplitude coupling in the air temperature. Phys. Rev. Lett.
112, 078702 (2014).
27. Jajcay, N., Kravtsov, S., Sugihara, G., Tsonis, A. A. & Paluš, M. Synchronization and causality across time scales in el niño southern
oscillation. NPJ Climate Atmos. Sci. 1, 1–8 (2018).
28. Takens, F. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980, 366–381 (Springer,
1981).
29. Fraser, A. M. & Swinney, H. L. Independent coordinates for strange attractors from mutual information. Phys. Rev. A 33, 1134
(1986).
30. Wibral, M. et al. Measuring information-transfer delays. PloS one 8, e55809 (2013).
31. Sugihara, G., May, R., Ye, H., Hsieh, C. & Deyle, E. Detecting causality in complex ecosystems. Science 338, 496–500 (2012).
32. Harnack, D., Laminski, E., Schünemann, M. & Pawelzik, K. R. Topological causality in dynamical systems. Phys. Rev. Lett. 119,
098301 (2017).
33. Krakovská, A. & Hanzely, F. Testing for causality in reconstructed state spaces by an optimized mixed prediction method. Phys. Rev. E 94, 052203 (2016).
34. Barrios, A., Trincado, G. & Garreaud, R. Alternative approaches for estimating missing climate data: Application to monthly
precipitation records in south-central Chile. For. Ecosyst. 5, 1–10 (2018).
35. Anderson, C. I. & Gough, W. A. Accounting for missing data in monthly temperature series: Testing rule-of-thumb omission
of months with missing values. Int. J. Climatol. 38, 4990–5002 (2018).
36. DiCesare, G. Imputation, estimation and missing data in finance. Ph.D. Thesis, University of Waterloo (2006).
37. John, C., Ekpenyong, E. J. & Nworu, C. C. Imputation of missing values in economic and financial time series data using five principal component analysis approaches. CBN J. Appl. Stat. (JAS) 10, 3 (2019).
38. Gyimah, S. Missing data in quantitative social research. PSC Discuss. Papers Ser. 15, 1 (2001).
39. Kulp, C. & Tracy, E. The application of the transfer entropy to Gappy time series. Phys. Lett. A 373, 1261–1267 (2009).
40. Smirnov, D. & Bezruchko, B. Spurious causalities due to low temporal resolution: Towards detection of bidirectional coupling
from time series. Europhys. Lett. 100, 10005 (2012).
41. Kathpalia, A. & Nagaraj, N. Data based intervention approach for complexity-causality measure. PeerJ Comput. Sci. e196, 5
(2019).
42. Kathpalia, A. Theoretical and Experimental Investigations into Causality, its Measures and Applications. Ph.D. Thesis, NIAS (2021).
43. Nagaraj, N. & Balasubramanian, K. Dynamical complexity of short and noisy time series. Eur. Phys. J. Special Top. 226, 1–14
(2017).
44. Staniek, M. & Lehnertz, K. Symbolic transfer entropy. Phys. Rev. Lett. 100, 158101 (2008).
45. Staniek, M. & Lehnertz, K. Symbolic transfer entropy: Inferring directionality in biosignals. Biomed. Tech. 54, 323–328 (2009).
46. Kugiumtzis, D. Partial transfer entropy on rank vectors. Eur. Phys. J. Special Top. 222, 401–420 (2013).
47. Papana, A., Kyrtsou, C., Kugiumtzis, D. & Diks, C. Simulation study of direct causality measures in multivariate time series.
Entropy 15, 2635–2661 (2013).
48. Li, X. & Ouyang, G. Estimating coupling direction between neuronal populations with permutation conditional mutual infor-
mation. Neuroimage 52, 497–507 (2010).
49. Wen, D. et al. Estimating coupling strength between multivariate neural series with multivariate permutation conditional mutual
information. Neural Netw. 110, 159–169 (2019).
50. Bandt, C. & Pompe, B. Permutation entropy: A natural complexity measure for time series. Phys. Rev. Lett. 88, 174102 (2002).
51. Fadlallah, B., Chen, B., Keil, A. & Principe, J. Weighted-permutation entropy: A complexity measure for time series incorporat-
ing amplitude information. Phys. Rev. E 87, 022911 (2013).
52. Amigó, J. Permutation Complexity in Dynamical Systems: Ordinal Patterns, Permutation Entropy and All That (Springer Science & Business Media, 2010).
53. Zanin, M., Zunino, L., Rosso, O. A. & Papo, D. Permutation entropy and its main biomedical and econophysics applications: A review. Entropy 14, 1553–1577 (2012).
54. Keller, K., Unakafov, A. M. & Unakafova, V. A. Ordinal patterns, entropy, and EEG. Entropy 16, 6212–6239 (2014).
55. Zanin, M. & Olivares, F. Ordinal patterns-based methodologies for distinguishing chaos from noise in discrete time series.
Commun. Phys. 4, 1–14 (2021).
56. McCullough, M., Small, M., Stemler, T. & Iu, H.H.-C. Time lagged ordinal partition networks for capturing dynamics of con-
tinuous dynamical systems. Chaos Interdiscip. J. Nonlinear Sci. 25, 053101 (2015).
57. Bandt, C., Keller, G. & Pompe, B. Entropy of interval maps via permutations. Nonlinearity 15, 1595 (2002).
58. Amigó, J. M., Kennel, M. B. & Kocarev, L. The permutation entropy rate equals the metric entropy rate for ergodic information sources and ergodic dynamical systems. Physica D 210, 77–95 (2005).
59. Solomon, S. et al. Climate Change 2007-The Physical Science Basis: Working Group I Contribution to the Fourth Assessment Report of the IPCC Vol. 4 (Cambridge University Press, 2007).
60. Press, W. H., Flannery, B. P., Teukolsky, S. A., Vetterling, W. T. & Kramer, P. B. Numerical recipes: The art of scientific computing. Phys. Today 40, 120 (1987).
61. Theiler, J., Eubank, S., Longtin, A., Galdrikian, B. & Farmer, J. D. Testing for nonlinearity in time series: The method of surrogate data. Physica D 58, 77–94 (1992).
62. Mills, B. J. et al. Modelling the long-term carbon cycle, atmospheric co2, and earth surface temperature from late neoproterozoic
to present day. Gondwana Res. 67, 172–186 (2019).
63. Wong, T. E., Cui, Y., Royer, D. L. & Keller, K. A tighter constraint on earth-system sensitivity from long-term temperature and
carbon-cycle observations. Nat. Commun. 12, 1–8 (2021).
64. Past Interglacials Working Group of PAGES. Interglacials of the last 800,000 years. Rev. Geophys. 54, 162–219 (2016).
65. Lüthi, D. et al. High-resolution carbon dioxide concentration record 650,000–800,000 years before present. Nature 453, 379–382
(2008).
66. Bereiter, B. et al. Revision of the Epica dome c co2 record from 800 to 600 kyr before present. Geophys. Res. Lett. 42, 542–549
(2015).
67. Loulergue, L. et al. Orbital and millennial-scale features of atmospheric ch 4 over the past 800,000 years. Nature 453, 383–386
(2008).
68. Bazin, L. et al. An optimized multi-proxy, multi-site Antarctic ice and gas orbital chronology (aicc2012): 120–800 ka. Climate
Past 9, 1715–1731 (2013).
69. Elderfield, H. et al. Evolution of ocean temperature and ice volume through the mid-pleistocene climate transition. Science 337, 704–709 (2012).
70. Lawrimore, J. H. et al. An overview of the global historical climatology network monthly mean temperature data set, version 3. J. Geophys. Res. Atmos. https://doi.org/10.1029/2011JD016187 (2011).
71. Li, J. et al. Interdecadal modulation of el niño amplitude during the past millennium. Nat. Clim. Chang. 1, 114–118 (2011).
72. Shi, F., Li, J. & Wilson, R. J. A tree-ring reconstruction of the south Asian summer monsoon index over the past millennium.
Sci. Rep. 4, 1–8 (2014).
73. Rayner, N. et al. Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth
century. J. Geophys. Res. Atmos. https://doi.org/10.1029/2002JD002670 (2003).
74. Luterbacher, J., Schmutz, C., Gyalistras, D., Xoplaki, E. & Wanner, H. Reconstruction of monthly NAO and EU indices back to AD
1675. Geophys. Res. Lett. 26, 2745–2748 (1999).
75. Luterbacher, J. et al. Extending north Atlantic oscillation reconstructions back to 1500. Atmos. Sci. Lett. 2, 114–124 (2001).
76. Trenberth, K. E. & Paolino, D. A. Jr. The northern hemisphere sea-level pressure data set: Trends, errors and discontinuities.
Mon. Weather Rev. 108, 855–872 (1980).
77. Dobrovolný, P. et al. Monthly, seasonal and annual temperature reconstructions for central Europe derived from documentary
evidence and instrumental records since AD 1500. Clim. Change 101, 69–107 (2010).
78. Barnston, A. G. & Livezey, R. E. Classification, seasonality and persistence of low-frequency atmospheric circulation patterns.
Mon. Weather Rev. 115, 1083–1126 (1987).
79. Chen, W. Y. & Van den Dool, H. Sensitivity of teleconnection patterns to the sign of their primary action center. Mon. Weather
Rev. 131, 2885–2899 (2003).
80. Van den Dool, H., Saha, S. & Johansson, A. Empirical orthogonal teleconnections. J. Clim. 13, 1421–1435 (2000).
81. Klein Tank, A. et al. Daily dataset of 20th-century surface air temperature and precipitation series for the European climate
assessment. Int. J. Climatol. J. R. Meteorol. Soc. 22, 1441–1453 (2002).
82. Politis, D. N. & Romano, J. P. The stationary bootstrap. J. Am. Stat. Assoc. 89, 1303–1313 (1994).
83. Foote, E. Art. XXXI.–Circumstances affecting the heat of the sun's rays. American Journal of Science and Arts (1820–1879) 22, 382
(1856).
84. Arrhenius, S. XXXI. On the influence of carbonic acid in the air upon the temperature of the ground. The London, Edinburgh,
and Dublin Philosophical Magazine and Journal of Science 41, 237–276 (1896).
85. Kodra, E., Chatterjee, S. & Ganguly, A. R. Exploring Granger causality between global average observed time series of carbon
dioxide and temperature. Theoret. Appl. Climatol. 104, 325–335 (2011).
86. Attanasio, A. Testing for linear Granger causality from natural/anthropogenic forcings to global temperature anomalies. Theoret.
Appl. Climatol. 110, 281–289 (2012).
87. Stern, D. I. & Kaufmann, R. K. Anthropogenic and natural causes of climate change. Clim. Change 122, 257–269 (2014).
88. Kang, J. & Larsson, R. What is the link between temperature and carbon dioxide levels? A Granger causality analysis based on
ice core data. Theoret. Appl. Climatol. 116, 537–548 (2014).
89. Triacca, U. On the use of Granger causality to investigate the human influence on climate. Theoret. Appl. Climatol. 69, 137–138
(2001).
90. Triacca, U. Is Granger causality analysis appropriate to investigate the relationship between atmospheric concentration of carbon
dioxide and global surface air temperature? Theoret. Appl. Climatol. 81, 133–135 (2005).
91. Stips, A., Macias, D., Coughlan, C., Garcia-Gorriz, E. & San Liang, X. On the causal structure between CO2 and global temperature.
Sci. Rep. 6, 1–9 (2016).
92. Goulet Coulombe, P. & Göbel, M. On spurious causality, CO2, and global temperature. Econometrics 9, 33 (2021).
93. Van Nes, E. H. et al. Causal feedbacks in climate change. Nat. Clim. Chang. 5, 445–448 (2015).
94. Koutsoyiannis, D. & Kundzewicz, Z. W. Atmospheric temperature and CO2: Hen-or-egg causality? Sci 2, 83 (2020).
95. Mønster, D., Fusaroli, R., Tylén, K., Roepstorff, A. & Sherson, J. F. Causal inference from noisy time-series data—Testing the
convergent cross-mapping algorithm in the presence of noise and external influence. Futur. Gener. Comput. Syst. 73, 52–62
(2017).
96. Schiecke, K., Pester, B., Feucht, M., Leistritz, L. & Witte, H. Convergent cross mapping: Basic concept, influence of estimation
parameters and practical application. In 2015 37th Annual International Conference of the IEEE Engineering in Medicine and
Biology Society (EMBC), 7418–7421 (IEEE, 2015).
97. Janse, R. J. et al. Conducting correlation analysis: Important limitations and pitfalls. Clin. Kidney J. 14, 2337 (2021).
98. Brook, E. J., Sowers, T. & Orchardo, J. Rapid variations in atmospheric methane concentration during the past 110,000 years.
Science 273, 1087–1091 (1996).
99. Thirumalai, K., Clemens, S. C. & Partin, J. W. Methane, monsoons, and modulation of millennial-scale climate. Geophys. Res.
Lett. 47, e2020GL087613 (2020).
100. Kripalani, R. H. & Kulkarni, A. Rainfall variability over south-east Asia-connections with Indian monsoon and ENSO extremes:
New perspectives. Int. J. Climatol. J. R. Meteorol. Soc. 17, 1155–1168 (1997).
101. Kumar, K. K., Rajagopalan, B. & Cane, M. A. On the weakening relationship between the Indian monsoon and ENSO. Science
284, 2156–2159 (1999).
102. Krishnamurthy, V. & Goswami, B. N. Indian monsoon-ENSO relationship on interdecadal timescale. J. Clim. 13, 579–595 (2000).
103. Sarkar, S., Singh, R. P. & Kafatos, M. Further evidences for the weakening relationship of Indian rainfall and ENSO over India.
Geophys. Res. Lett. https://doi.org/10.1029/2004GL020259 (2004).
104. Maraun, D. & Kurths, J. Epochs of phase coherence between El Niño/Southern Oscillation and Indian monsoon. Geophys. Res.
Lett. https://doi.org/10.1029/2005GL023225 (2005).
105. Zubair, L. & Ropelewski, C. F. The strengthening relationship between ENSO and northeast monsoon rainfall over Sri Lanka
and southern India. J. Clim. 19, 1567–1575 (2006).
106. Mokhov, I. I. et al. Alternating mutual influence of El-Niño/Southern Oscillation and Indian monsoon. Geophys. Res. Lett.
https://doi.org/10.1029/2010GL045932 (2011).
107. Mokhov, I., Smirnov, D., Nakonechny, P., Kozlenko, S. & Kurths, J. Relationship between El-Niño/Southern Oscillation and the
Indian monsoon. Izv. Atmos. Ocean. Phys. 48, 47–56 (2012).
108. Le, T., Ha, K.-J., Bae, D.-H. & Kim, S.-H. Causal effects of Indian Ocean Dipole on El Niño-Southern Oscillation during 1950–2014
based on high-resolution models and reanalysis data. Environ. Res. Lett. 15, 1040b6 (2020).
109. Wanner, H. et al. North Atlantic oscillation-concepts and studies. Surv. Geophys. 22, 321–381 (2001).
110. Hurrell, J. W. & Deser, C. North Atlantic climate variability: The role of the north Atlantic oscillation. J. Mar. Syst. 79, 231–244
(2010).
111. Deser, C., Hurrell, J. W. & Phillips, A. S. The role of the north Atlantic oscillation in European climate projections. Clim. Dyn.
49, 3141–3157 (2017).
112. Wang, W., Anderson, B. T., Kaufmann, R. K. & Myneni, R. B. The relation between the north Atlantic oscillation and SSTs in
the north Atlantic basin. J. Clim. 17, 4752–4759 (2004).
113. Wang, G., Zhang, N., Fan, K. & Palus, M. Central European air temperature: Driving force analysis and causal influence of NAO.
Theoret. Appl. Climatol. 137, 1421–1427 (2019).
114. Hlinka, J., Jajcay, N., Hartman, D. & Paluš, M. Smooth information flow in temperature climate network reflects mass transport.
Chaos Interdiscip. J. Nonlinear Sci. 27, 035811 (2017).
115. Nagaraj, N., Balasubramanian, K. & Dey, S. A new complexity measure for time series analysis and classification. Eur. Phys. J.
Special Top. 222, 847–860 (2013).
Acknowledgements
This study is supported by the Czech Science Foundation, Project No. GA19-16066S and by the Czech Academy
of Sciences, Praemium Academiae awarded to M. Paluš.
Author contributions
A.K. performed the research, implementation and computations of CCC and PCCC, wrote the manuscript draft;
P.M. implemented and computed CMI and PCMI; M.P. proposed and led the project. All authors contributed
to the final version of the manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Correspondence and requests for materials should be addressed to M.P.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional aliations.
Open Access is article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. e images or other third party material in this
article are included in the articles Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.
© e Author(s) 2022
Stips et al. (2016) use information flows (Liang (2008, 2014)) to establish causality from various forcings to global temperature. We show that the formulas being used hinge on a simplifying assumption that is nearly always rejected by the data. We propose the well-known forecast error variance decomposition based on a Vector Autoregression as an adequate measure of information flow, and find that most results in Stips et al. (2016) cannot be corroborated. Then, we discuss which modeling choices (e.g., the choice of CO2 series and assumptions about simultaneous relationships) may help in extracting credible estimates of causal flows and the transient climate response simply by looking at the joint dynamics of two climatic time series.
One of the most important aspects of time series is their degree of stochasticity vs. chaoticity. Since the discovery of chaotic maps, many algorithms have been proposed to discriminate between these two alternatives and assess their prevalence in real-world time series. Approaches based on the combination of “permutation patterns” with different metrics provide a more complete picture of a time series’ nature, and are especially useful to tackle pathological chaotic maps. Here, we provide a review of such approaches, their theoretical foundations, and their application to discrete time series and real-world problems. We compare their performance using a set of representative noisy chaotic maps, evaluate their applicability through their respective computational cost, and discuss their limitations. Here, Zanin and Olivares review the permutation patterns-based metrics used to distinguish chaos from stochasticity in discrete time series. They analyse their performance and computational cost, and compare their applicability to real-world time series.
We present a dataset of daily resolution climatic time series that has been compiled for the European Climate Assessment (ECA). As of December 2001, this ECA dataset comprises 199 series of minimum, maximum and/or daily mean temperature and 195 series of daily precipitation amount observed at meteorological stations in Europe and the Middle East. Almost all series cover the standard normal period 1961-90, and about 50% extends back to at least 1925. Part of the dataset (90%) is made available for climate research on CDROM and through the Internet (at http://www.knmi.nl/samenw/eca). A comparison of the ECA dataset with existing gridded datasets, having monthly resolution, shows that correlation coefficients between ECA stations and nearest land grid boxes between 1946 and 1999 are higher than 0.8 for 93% of the temperature series and for 51% of the precipitation series. The overall trends in the ECA dataset are of comparable magnitude to those in the gridded datasets. The potential of the ECA dataset for climate studies is demonstrated in two examples. In the first example, it is shown that the winter (October-March) warming in Europe in the 1976-99 period is accompanied by a positive trend in the number of warm-spell days at most stations, but not by a negative trend in the number of cold-spell days. Instead, the number of cold-spell days increases over Europe. In the second example, it is shown for winter precipitation between 1946 and 1999 that positive trends in the mean amount per wet day prevail in areas that are getting drier and wetter. Because of its daily resolution, the ECA dataset enables a variety of empirical climate studies, including detailed analyses of changes in the occurrence of extremes in relation to changes in mean temperature and total precipitation.
The correlation coefficient is a statistical measure often used in studies to show an association between variables or to look at the agreement between two methods. In this paper, we will discuss not only the basics of the correlation coefficient, such as its assumptions and how it is interpreted, but also important limitations when using the correlation coefficient, such as its assumption of a linear association and its sensitivity to the range of observations. We will also discuss why the coefficient is invalid when used to assess agreement of two methods aiming to measure a certain value, and discuss better alternatives, such as the intraclass coefficient and Bland–Altman’s limits of agreement. The concepts discussed in this paper are supported with examples from literature in the field of nephrology.
The long-term temperature response to a given change in CO2 forcing, or Earth-system sensitivity (ESS), is a key parameter quantifying our understanding about the relationship between changes in Earth’s radiative forcing and the resulting long-term Earth-system response. Current ESS estimates are subject to sizable uncertainties. Long-term carbon cycle models can provide a useful avenue to constrain ESS, but previous efforts either use rather informal statistical approaches or focus on discrete paleoevents. Here, we improve on previous ESS estimates by using a Bayesian approach to fuse deep-time CO2 and temperature data over the last 420 Myrs with a long-term carbon cycle model. Our median ESS estimate of 3.4 °C (2.6-4.7 °C; 5-95% range) shows a narrower range than previous assessments. We show that weaker chemical weathering relative to the a priori model configuration via reduced weatherable land area yields better agreement with temperature records during the Cretaceous. Research into improving the understanding about these weathering mechanisms hence provides potentially powerful avenues to further constrain this fundamental Earth-system property. Earth-system sensitivity (ESS) describes the long-term temperature response for a given change in atmospheric CO2 and, as such, is a crucial parameter to assess future climate change. Here, the authors use a Bayesian model with data from the last 420 Myrs to reduce uncertainties and estimate ESS to be around 3.4 °C.
Causality testing methods are being widely used in various disciplines of science. Model-free methods for causality estimation are very useful, as the underlying model generating the data is often unknown. However, existing model-free/data-driven measures assume separability of cause and effect at the level of individual samples of measurements and unlike model-based methods do not perform any intervention to learn causal relationships. These measures can thus only capture causality which is by the associational occurrence of 'cause' and 'effect' between well separated samples. In real-world processes, often 'cause' and 'effect' are inherently inseparable or become inseparable in the acquired measurements. We propose a novel measure that uses an adaptive interventional scheme to capture causality which is not merely associational. The scheme is based on characterizing complexities associated with the dynamical evolution of processes on short windows of measurements. The formulated measure, Compression-Complexity Causality is rigorously tested on simulated and real datasets and its performance is compared with that of existing measures such as Granger Causality and Transfer Entropy. The proposed measure is robust to the presence of noise, long-term memory, filtering and decimation, low temporal resolution (including aliasing), non-uniform sampling, finite length signals and presence of common driving variables. Our measure outperforms existing state-of-the-art measures, establishing itself as an effective tool for causality testing in real world applications.
Determining and measuring cause-effect relationships is fundamental to most scientific studies of natural phenomena. The notion of causation is distinctly different from correlation which only looks at the association of trends or patterns in measurements. In this article, we review different notions of causality and focus especially on measuring causality from time-series data. Causality testing finds numerous applications in diverse disciplines such as neuroscience, econometrics, climatology, physics, and artificial intelligence.
It is common knowledge that increasing CO2 concentration plays a major role in enhancement of the greenhouse effect and contributes to global warming. The purpose of this study is to complement the conventional and established theory, that increased CO2 concentration due to human emissions causes an increase in temperature, by considering the reverse causality. Since increased temperature causes an increase in CO2 concentration, the relationship of atmospheric CO2 and temperature may qualify as belonging to the category of "hen-or-egg" problems, where it is not always clear which of two interrelated events is the cause and which the effect. We examine the relationship of global temperature and atmospheric carbon dioxide concentration in monthly time steps, covering the time interval 1980-2019 during which reliable instrumental measurements are available. While both causality directions exist, the results of our study support the hypothesis that the dominant direction is T → CO2. Changes in CO2 follow changes in T by about six months on a monthly scale, or about one year on an annual scale. We attempt to interpret this mechanism by involving biochemical reactions as at higher temperatures, soil respiration and, hence, CO2 emissions, are increasing.
Uncertainty exists regarding the interaction between the El Niño–Southern Oscillation (ENSO) and Indian Ocean Dipole (IOD) where ENSO is normally expected to be the leading mode. Moreover, the effect of global warming on the relationship between these two modes remains unexplored. Therefore, we investigated the ENSO–IOD linkage for the years 1950–2014 using reanalysis data and high-resolution climate model simulations. The 1950-2014 period is of particular interest as rapid Indian Ocean warming since the 1950s has had huge impacts worldwide. Our results showed that the IOD had robust causal effects on ENSO, whereas the impacts of ENSO on IOD exhibited lower confidence. All models demonstrated that the IOD was unlikely to have no causal effects on ENSO, whereas eight out of fifteen studied models and the reanalysis data showed significant causal effects at the 10% significance level. The analyses provide new evidence that ENSO interannual variability might be forced by changes in Indo-Pacific Walker circulation induced by the IOD. Weak control of ENSO on the IOD is likely due to nonsignificant effects of ENSO on western tropical Indian Ocean, implying that the rapid warming environment in the Indian Ocean may fundamentally modulate the relationship between the IOD and ENSO. We find high agreement between models and reanalysis data in simulating the ENSO-IOD connection. These results indicate that the effects of the IOD on ENSO might be more significant than previously thought.
Earth's orbital geometry exerts a profound influence on climate by regulating changes in incoming solar radiation. Superimposed on orbitally paced climate change, Pleistocene records reveal substantial millennial‐scale variability characterized by abrupt changes and rapid swings. However, the extent to which orbital forcing modulates the amplitude and timing of these millennial variations is unclear. Here we isolate the magnitude of millennial‐scale variability (MMV) in two well‐dated records, both linked to precession cycles (19,000‐ and 23,000‐year periodicity): composite Chinese speleothem δ¹⁸O, commonly interpreted as a proxy for Asian monsoon intensity, and atmospheric methane. At the millennial timescale (1,000–10,000 years), we find a fundamental decoupling wherein precession directly modulates the MMV of methane but not that of speleothem δ¹⁸O, which is shown to be strikingly similar to the MMV of Antarctic ice core δ²H. One explanation is that the MMV of methane responds to changes in midlatitude to high‐latitude insolation, whereas speleothem δ¹⁸O is modulated by internal climate feedbacks.