Content uploaded by Demetris Koutsoyiannis

Author content

All content in this area was uploaded by Demetris Koutsoyiannis on Jun 04, 2019

Content may be subject to copyright.

Time’s arrow has important philosophical, scientific and technical connotations and is closely related to randomness as well as to causality. Stochastics offers a frame to explore, characterize and simulate irreversibility in natural processes. Indicators of irreversibility differ depending on whether we study a single process alone or several processes simultaneously. In the former case, description of irreversibility requires at least third-order properties, while in the latter lagged second-order properties may suffice to reveal causal relations. Several examined data sets indicate that in atmospheric processes irreversibility is negligible at hydrologically relevant time scales, but may exist at the finest scales. However, the irreversibility of streamflow is marked for scales of several days and this highlights the need to reproduce it in flood simulations. For this reason, two methods of generating time series with irreversibility are developed, of which one, based on an asymmetric moving average scheme, proves to be satisfactory.
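The abstract states that irreversibility in a single process shows up only in properties of third order or higher. A minimal sketch of one such indicator follows, assuming (this choice is not spelled out in the text above) that the skewness of the differenced series is used: it changes sign when the series is read backwards and is close to zero for a time-symmetric process. The function names and the synthetic "flood-like" series are hypothetical and serve only as illustration.

```python
import random


def skewness(values):
    """Moment-based sample coefficient of skewness."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def differenced_skewness(series, lag=1):
    """Skewness of x[i + lag] - x[i]; it flips sign under time reversal."""
    diffs = [series[i + lag] - series[i] for i in range(len(series) - lag)]
    return skewness(diffs)


# Hypothetical series: occasional sudden rises followed by slow recession.
rng = random.Random(1)
flow, level = [], 0.0
for _ in range(5000):
    level *= 0.9                      # slow exponential recession
    if rng.random() < 0.05:           # occasional abrupt input
        level += rng.expovariate(1.0)
    flow.append(level)

forward = differenced_skewness(flow)          # series read forwards in time
backward = differenced_skewness(flow[::-1])   # same series read backwards
print(forward, backward)
```

Second-order statistics such as the autocovariance are identical for `flow` and `flow[::-1]`, which is why they cannot reveal the direction of time in a single series.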

... For example, the study by Koutsoyiannis and Kundzewicz (2020) concluded, making use of the hen-or-egg causality concept and based on the analysis of modern measurements of T and CO₂, that the principal causality direction is T → [CO₂], despite the common conviction that the opposite is true. In addition, using palaeoclimatic proxy data from Vostok ice cores, Koutsoyiannis (2019) and Koutsoyiannis and Kundzewicz (2020) found a time lag of [CO₂] from T of a thousand years. Here we re-examine both modern and paleo data sets with our proposed causality detection methodology. ...

... To this aim we use the datasets from the Vostok ice cores (Jouzel et al., 1987; Petit et al., 1999; Caillon et al., 2003), which were originally given for an irregular time step and were regularized in the study of Koutsoyiannis (2019). We study again the processes ΔT and Δln[CO₂], where the differences are taken for 1 time step (1000 years). Figure 16 gives the obtained IRFs in the directions Δ → ... As the proxy data sets are free of monotonic trends and produce reasonable empirical autocorrelation functions (see Koutsoyiannis, 2019), here we could also apply our framework for the non-differenced processes. We initially note that, if is a differenced process (where the differences are taken for 1 time step) and the non- ...

In a companion paper, we develop the theoretical background of a stochastic approach to causality with the objective of formulating necessary conditions that are operationally useful in identifying or falsifying causality claims. Starting from the idea of stochastic causal systems, the approach extends it to the more general concept of hen-or-egg causality, which includes as special cases the classic causal, and the potentially causal and anti-causal systems. The framework developed is applicable to large-scale open systems, which are neither controllable nor repeatable. In this paper, we illustrate and showcase the proposed framework in a number of case studies. Some of them are controlled synthetic examples and are conducted as a proof of applicability of the theoretical concept, to test the methodology with a priori known system properties. Others are real-world studies on interesting scientific problems in geophysics, and in particular hydrology and climatology.

... This criterion is also known as the post hoc ergo propter hoc adage meaning "after this, thus because of this". The principle of priority establishes the importance of the temporal dimension and time asymmetry regarding causality (e.g., Koutsoyiannis, 2019). Secondly, the cause and the effect coexist on a space-time continuum offering the opportunity for interactions (d23). ...

... However, Dooge was rather a Newtonian mechanicist and determinist hydrologist. The added value of information theory, its analogy with thermodynamics, and its pertinence for dealing with thermodynamically irreversible processes in hydrology are now considered in the emerging debates about causality in hydrology (Goodwell et al., 2020; Koutsoyiannis, 2019). ...

Hydrological systems seem simple, "everything flows", but prove to be even more complex when one tries to differentiate and characterize flows in detail. Hydrology has thus developed a plurality of models that testify to the complexity of hydrological systems and the variety of their causal representations, from the simplest to the most sophisticated. Beyond a subjective complexity linked to our difficulty in understanding or our attention to detail, hydrological systems also present an intrinsic complexity. These include the nonlinearity of processes and interactions between variables, the number of variables in the system, or dimension, and how they are organized to further simplify or complicate the system's dynamics. The thesis addresses these aspects of hydrological complexity. An epistemological and historical analysis of the concept of causality explores the human understanding of hydrological systems. Based on empirical approaches applied to the limestone karstic system of the Lhomme at Rochefort in Belgium, the thesis then studies methods to analyze the nonlinearity of the Lhomme river recession and associate it with the geomorphological complexity of the watershed. The thesis also handles the discrimination of dominant dynamic behaviors in the hydrological continuum of the Rochefort caves subsurface based on an electrical resistivity model of the subsurface and clustering methods grouping time-series according to their similarity.
Ref: Delforge, Damien. Causal analysis of hydrological systems: case study of the Lhomme karst system, Belgium. Prom.: Vanclooster, Marnik; Van Camp, Michel
Permalink: http://hdl.handle.net/2078.1/240635

... For example, it is well known that a Gaussian process is necessarily symmetric in time and, thus, cannot capture time directionality, otherwise known as irreversibility or time's arrow [25]. On the other hand, it is known that, in several natural processes, time's arrow is present [26,27], and to reproduce it, we need processes with asymmetric distributions, which can also exhibit asymmetry in time. ...

... 1. From the continuous-time stochastic model, expressed through its climacogram ( ), we calculate its autocovariance function in discrete time (assuming time step ) by Equation (26). (This step is obviously omitted if the model is already expressed in discrete time through its autocovariance function). ...
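The step quoted above converts a climacogram into a discrete-time autocovariance. The sketch below uses the standard second-difference relation between the variance of the cumulative process, Γ(k) = k²γ(k), and the autocovariance of the time-averaged process; whether this coincides term by term with the Equation (26) cited in the snippet is an assumption. The Hurst-Kolmogorov climacogram and all function names are illustrative choices, not taken from the source.

```python
def hk_climacogram(scale, hurst=0.8, variance=1.0):
    """Illustrative climacogram gamma(k) = variance * k**(2H - 2) (assumed model)."""
    return variance * scale ** (2.0 * hurst - 2.0)


def autocovariance_from_climacogram(gamma, lag, step=1.0):
    """Autocovariance at an integer lag of the process averaged over time step `step`.

    Uses c(lag) = [G((lag+1)D) + G(|lag-1|D)] / 2 - G(lag*D), divided by D**2,
    where G(k) = k**2 * gamma(k) is the variance of the cumulative process.
    """
    def cumulative_variance(k):
        return 0.0 if k == 0 else k * k * gamma(k)

    eta = abs(lag)
    upper = cumulative_variance((eta + 1) * step)
    lower = cumulative_variance(abs(eta - 1) * step)
    centre = cumulative_variance(eta * step)
    return ((upper + lower) / 2.0 - centre) / step ** 2


# Lag-0 to lag-3 autocovariances for the illustrative model with H = 0.8.
covariances = [autocovariance_from_climacogram(hk_climacogram, lag) for lag in range(4)]
print(covariances)
```

At lag 0 the relation reduces to γ(D), the variance at the chosen time step, which is a quick sanity check for any climacogram passed in.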

We outline and test a new methodology for genuine simulation of stochastic processes with any dependence structure and any marginal distribution. We reproduce time dependence with a generalized, time symmetric or asymmetric, moving-average scheme. This implements linear filtering of non-Gaussian white noise, with the weights of the filter determined by analytical equations, in terms of the autocovariance of the process. We approximate the marginal distribution of the process, irrespective of its type, using a number of its cumulants, which in turn determine the cumulants of white noise, in a manner that can readily support the generation of random numbers from that approximation, so that it is applicable for stochastic simulation. The simulation method is genuine as it uses the process of interest directly, without any transformation (e.g., normalization). We illustrate the method in a number of synthetic and real-world applications, with either persistence or antipersistence, and with non-Gaussian marginal distributions that are bounded, thus making the problem more demanding. These include distributions bounded from both sides, such as uniform, and bounded from below, such as exponential and Pareto, possibly having a discontinuity at the origin (intermittence). All examples studied show the satisfactory performance of the method.
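The idea of filtering non-Gaussian white noise with symmetric or asymmetric weights can be illustrated with a toy example. The sketch below is not the method of the abstract: the weights are an arbitrary geometric sequence rather than values derived analytically from a target autocovariance, and the cumulant matching of the marginal distribution is omitted. It only shows that mirroring the weights leaves the second-order structure unchanged while reversing the time asymmetry. All names and parameter values are hypothetical.

```python
import random


def moving_average_series(weights, noise):
    """Linear filter: x[i] = sum_j weights[j] * noise[i + j]."""
    span = len(weights)
    return [sum(w * noise[i + j] for j, w in enumerate(weights))
            for i in range(len(noise) - span + 1)]


def filter_autocovariance(weights, lag, noise_variance=1.0):
    """Theoretical autocovariance of the filtered noise at a non-negative lag."""
    return noise_variance * sum(weights[j] * weights[j + lag]
                                for j in range(len(weights) - lag))


def differenced_skewness(series):
    """Skewness of first differences, used here as a time-asymmetry indicator."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    n = len(diffs)
    mean = sum(diffs) / n
    m2 = sum((d - mean) ** 2 for d in diffs) / n
    m3 = sum((d - mean) ** 3 for d in diffs) / n
    return m3 / m2 ** 1.5


rng = random.Random(42)
noise = [rng.expovariate(1.0) for _ in range(20000)]  # skewed white noise

decaying = [0.7 ** j for j in range(10)]  # arbitrary illustrative weights
rising = decaying[::-1]                   # mirrored weights

skew_rising = differenced_skewness(moving_average_series(rising, noise))
skew_decaying = differenced_skewness(moving_average_series(decaying, noise))
print(skew_rising, skew_decaying)
```

With this indexing convention the `rising` weights produce abrupt rises followed by gradual recessions, and the `decaying` weights produce the mirror image, although `filter_autocovariance` returns the same values for both.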

... The self-organization and emergence in streamflow are accomplished by causal influence and coordination among the constituent elements of catchment and climate. The detection of causal directions among the constituent elements is essential to understand the behaviour and response of a catchment system in an integrated manner (Koutsoyiannis 2019). Conventionally, there are many time-series-based causal detection methods used in hydrological applications for inferring causality between two variables. ...

In this study, catchments are considered as complex systems, and information-theoretic measures are used to capture temporal streamflow characteristics. Emergence and self-organization are used to quantify information production and order in streamflow time series, respectively. The complexity measure is used to quantify the balance between emergence and self-organization in streamflow variability. It is found to be effective in distinguishing streamflow variability for high and low snow-dominated catchments. The state of persistence, reflecting the memory of streamflow time series, is shown to be related to the complexity of streamflow. Moreover, it is observed that conventional causal detection methods are constrained by the state of persistence, and more robust methods are needed in hydrological applications considering persistence.

... Since the seminal work of Mandelbrot and Van Ness [27], the characterization of data in terms of fractal properties has found near ubiquitous and enduring use in diverse research areas, including research within the fields of engineering [48], hydrology [21,50], geology [4,34], physics [40], space science [7,41], medicine [17,28], economics [13], financial markets [44] and many more. Fractal properties in nature and human dynamics arguably have served to yield increased understanding and improvement on human society. ...

Higuchi’s method of determining fractal dimension is an important, well-used, research tool that, compared to many other methods, gives rapid, efficient, and robust estimations for the range of possible fractal dimensions. One major shortcoming in applying the method is the correct choice of tuning parameter (kmax); a poor choice can generate spurious results, and there is no agreed upon methodology to solve this issue. We analyze multiple instances of synthetic fractal signals to minimize an error metric. This allows us to offer a new and general method that allows determination, a priori, of the best value for the tuning parameter, for a particular length data set. We demonstrate its use on physical data, by calculating fractal dimensions for a shell model of the nonlinear dynamics of MHD turbulence, and severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1 from the family Coronaviridae.
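For orientation, a minimal sketch of Higuchi's estimator in its usual textbook form is given below. The tuning parameter `kmax` is simply passed in by the caller; the selection rule proposed in the abstract is not reproduced here. Function names are hypothetical.

```python
import math
import random


def higuchi_curve_lengths(series, kmax):
    """Mean normalized curve length L(k) for k = 1..kmax."""
    n = len(series)
    lengths = []
    for k in range(1, kmax + 1):
        per_offset = []
        for m in range(k):                      # starting offsets 0..k-1
            steps = (n - 1 - m) // k            # number of increments at this offset
            if steps < 1:
                continue
            total = sum(abs(series[m + i * k] - series[m + (i - 1) * k])
                        for i in range(1, steps + 1))
            # Higuchi's normalization, then division by the interval k.
            per_offset.append(total * (n - 1) / (steps * k) / k)
        lengths.append(sum(per_offset) / len(per_offset))
    return lengths


def higuchi_dimension(series, kmax):
    """Least-squares slope of log L(k) against log(1/k)."""
    lengths = higuchi_curve_lengths(series, kmax)
    xs = [math.log(1.0 / k) for k in range(1, kmax + 1)]
    ys = [math.log(length) for length in lengths]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


rng = random.Random(7)
line_dimension = higuchi_dimension([float(i) for i in range(200)], kmax=8)
noise_dimension = higuchi_dimension([rng.gauss(0.0, 1.0) for _ in range(2000)], kmax=8)
print(line_dimension, noise_dimension)
```

A straight line and white noise are convenient checks because their dimensions should come out close to the two ends of the admissible range, 1 and 2 respectively.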

... Conventional stochastic models generate time symmetric processes. The problem of simulating a scalar process with time directionality has been tackled recently (Koutsoyiannis, 2019, 2020). The present framework provides direct methods to simulate time-directional vector processes with two variates, as well as hints for multivariate processes, a problem to be studied in future research. ...

Causality is a central concept in science, in philosophy and in life. However, reviewing various approaches to it over the entire knowledge tree, from philosophy to science and to scientific and technological applications, we locate several problems, which prevent these approaches from defining sufficient conditions for the existence of causal links. We thus choose to determine necessary conditions that are operationally useful in identifying or falsifying causality claims. Our proposed approach is based on stochastics, in which events are replaced by processes. Starting from the idea of stochastic causal systems, we extend it to the more general concept of hen-or-egg causality, which includes as special cases the classic causal, and the potentially causal and anti-causal systems. Theoretical considerations allow the development of an effective algorithm, applicable to large-scale open systems, which are neither controllable nor repeatable. The derivation and details of the algorithm are described in this paper, while in a companion paper we illustrate and showcase the proposed framework with a number of case studies, some of which are controlled synthetic examples and others real-world ones arising from interesting scientific problems.

... This is a continuous-time metric. If we wish to involve also the time scale k of the averaged process, we can define the cross-climacogram (Koutsoyiannis, 2019b): ...

This is a working draft of a book in preparation.
Current version 0.4 – uploaded on ResearchGate on 25 January 2022.
(Earlier versions:
0.3 – uploaded on ResearchGate on 17 January 2022.
0.2 – uploaded on ResearchGate on 3 January 2022.
0.1 (initial) – uploaded on ResearchGate on 1 January 2022.)
Some stuff is copied from Koutsoyiannis (2021, https://www.researchgate.net/publication/351081149).
Comments and suggestions will be greatly appreciated and acknowledged.

... For a review of such studies, see [7,13] and references therein, which also include a massive global-scale analysis in the scale domain for several hydrological-cycle processes (i.e., near-surface air temperature, dew point, humidity, atmospheric pressure, near-surface wind speed, streamflow and precipitation) and microscale turbulent processes (such as grid turbulence and turbulent jets). Alternative scientific fields, where the analysis is performed in the scale domain and by using the climacogram, include studies of rock formations [32], landscapes [37,38], water-energy nexus [60,61], time-irreversible processes [62,63], multilayer perceptron [64] and many others [65] as shown in the applications of the entry. It is again emphasized that this entry focuses on the multi-dimensional spatio-temporal stochastic metrics in the scale domain, as presented in the next sections. ...

The stochastic analysis in the scale domain (instead of the traditional lag or frequency domains) is introduced as a robust means to identify, model and simulate the Hurst–Kolmogorov (HK) dynamics, ranging from small (fractal) to large scales exhibiting the clustering behavior (else known as the Hurst phenomenon or long-range dependence). The HK clustering is an attribute of a multidimensional (1D, 2D, etc.) spatio-temporal stationary stochastic process with an arbitrary marginal distribution function, and a fractal behavior on small spatio-temporal scales of the dependence structure and a power-type on large scales, yielding a high probability of low- or high-magnitude events to group together in space and time. This behavior is preferably analyzed through the second-order statistics, and in the scale domain, by the stochastic metric of the climacogram, i.e., the variance of the averaged spatio-temporal process vs. spatio-temporal scale.
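Since the climacogram is defined above as the variance of the averaged process versus scale, its empirical counterpart for a one-dimensional series is easy to sketch. The version below averages over non-overlapping windows and applies the plain sample variance; it makes no correction for the estimation bias that is known to arise under strong persistence, so it should be read as an illustration of the definition only. The function name is hypothetical.

```python
import random


def empirical_climacogram(series, scales):
    """Sample variance of non-overlapping window averages, keyed by scale."""
    result = {}
    for scale in scales:
        blocks = len(series) // scale
        if blocks < 2:
            raise ValueError("need at least two windows at scale %d" % scale)
        means = [sum(series[b * scale:(b + 1) * scale]) / scale
                 for b in range(blocks)]
        grand_mean = sum(means) / blocks
        result[scale] = sum((m - grand_mean) ** 2 for m in means) / (blocks - 1)
    return result


# For white noise with unit variance the climacogram should decay roughly as 1/scale.
rng = random.Random(3)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
climacogram = empirical_climacogram(white_noise, [1, 2, 5, 10, 20])
for scale, variance in climacogram.items():
    print(scale, variance)
```

On a log-log plot, a slope of -1 corresponds to white noise; a milder slope at large scales is the signature of the clustering behavior described in the abstract.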

... It should be noted that although non-linear mapping has several advantages, it preserves some of the limitations of the parent independent input variables. For example, it can preserve the expected value of the autocorrelation function but not the higher-order joint moments and time-asymmetry [23]. The feedforward neural network trained with the backpropagation algorithm is a well-known machine learning method. ...

Shortwave Radiation density Flux (SRDF) modeling can be key in estimating actual evapotranspiration in plants. SRDF is the result of the specific and scattered reflection of shortwave radiation by the underlying surface. SRDF can have profound effects on some plant biophysical processes such as photosynthesis and land surface energy budgets. Since it is the main energy source for most atmospheric phenomena, SRDF is also widely used in numerical weather forecasting. In the current study, an improved version of the extreme learning machine was developed for SRDF forecasting using the historical value of this variable. To do that, the SRDF for 1981-2019 was extracted by developing JavaScript-based coding in the Google Earth Engine. The most important lags were found using the auto-correlation function, and fifteen input combinations were defined to model SRDF using the improved extreme learning machine (IELM). The performance of the developed model is evaluated based on correlation coefficient (R), root mean square error (RMSE), mean absolute percentage error (MAPE), and Nash–Sutcliffe efficiency (NSE). The shortwave radiation was developed for two time ahead forecasting (R = 0.986, RMSE = 21.11, MAPE = 8.68%, NSE = 0.97). Besides, the estimation uncertainty of the developed improved extreme learning machine is quantified and compared with classical ELM and found to be the least, with a value of ±3.64 compared to ±6.9 for the classical extreme learning machine. IELM not only overcomes the limitation of the classical extreme learning machine in random adjusting of bias of hidden neurons and input weights, but also provides a simple matrix-based method for practical tasks so that there is no need to have any knowledge of the improved extreme learning machine to use it.

Rainfall–runoff modelling is crucial for enhancing the effectiveness and sustainability of water resources. Conceptual models can have difficulties, such as coping with nonlinearity and needing more data, whereas data-driven models can be deprived of reflecting the physical process of the basin. In this regard, two hybrid model approaches, namely Génie Rural à 4 paramètres Journalier (GR4J)–wavelet-based data-driven models (i.e., wavelet-based genetic algorithm–artificial neural network (WGANN); GR4J–WGANN1 and GR4J–WGANN2), were implemented to improve daily rainfall–runoff modelling. The novel GR4J–WGANN1 hybrid model includes the outflow (QR) and direct flow (QD) obtained from the GR4J model, and the GR4J–WGANN2 hybrid model includes the soil moisture index (SMI) obtained from the GR4J model as input data. In hybrid models, wavelet analysis and the Boruta algorithm were implemented to decompose input data and select wavelet components. Four gauging stations in the Eastern Black Sea and Kızılırmak basins in Turkey were used to observe modelling performance. The GR4J model exhibited poor performance for extreme flow forecasting. The novel GR4J–WGANN1 approach performed better than the GR4J–WGANN2 model, and the hybrid models improved modelling performance up to 40% compared to the GR4J model. In this regard, integrated conceptual–wavelet-based data-driven models can be useful for improving the conceptual model performance, especially regarding extreme flow forecasting.

Classical moments, raw or central, express important theoretical properties of probability distributions but can hardly be estimated from typical hydrological samples for orders beyond two. L-moments are better estimated, but they all are of first order in terms of the process of interest; while they are effective in inferring the marginal distribution of stochastic processes, they cannot characterize even second-order dependence of processes (autocovariance, climacogram, power spectrum) and thus they cannot help in stochastic modelling. Picking from both categories, we introduce knowable (K-) moments, which combine advantages of both classical and L-moments, and enable reliable estimation from samples and effective description of high-order statistics, useful for marginal and joint distributions of stochastic processes. Further, we extend recent stochastic tools by introducing the K-climacogram and the K-climacospectrum, which enable characterization, in terms of univariate functions, of high-order properties of stochastic processes, as well as preservation thereof in simulations.

Hydrometeorological processes are typically characterized by temporal dependence, short- or long-range (e.g., Hurst behavior), as well as by non-Gaussian distributions (especially at fine time scales). The generation of long synthetic time series that resemble the marginal and joint properties of the observed ones is a prerequisite in many uncertainty-related hydrological studies, since they can be used as inputs and hence allow the propagation of natural variability and uncertainty to the typically deterministic water-system models. For this reason, it has been for years one of the main research topics in the field of stochastic hydrology. This work presents a novel model for synthetic time series generation, termed Symmetric Moving Average (neaRly) To Anything, that holds out the promise of simulating stationary univariate and multivariate processes with any-range dependence and arbitrary marginal distributions, provided that the former is feasible and the latter have finite variance. This is accomplished by utilizing a mapping procedure in combination with the relationship that exists between the correlation coefficients of an auxiliary Gaussian process and a non-Gaussian one, formalized through Nataf's joint distribution model. The generality of Symmetric Moving Average (neaRly) To Anything is stressed through two hypothetical simulation studies (univariate and multivariate), characterized by different dependencies and distributions. Furthermore, we demonstrate the practical aspects of the proposed model through two real-world cases, one that concerns the generation of annual non-Gaussian streamflow time series at four stations and another that involves the synthesis of intermittent, non-Gaussian, daily rainfall series at a single location.

An extension of the symmetric-moving-average (SMA) scheme is presented for stochastic synthesis of a stationary process for approximating any dependence structure and marginal distribution. The extended SMA model can exactly preserve an arbitrary second-order structure as well as the high order moments of a process, thus enabling a better approximation of any type of dependence (through the second-order statistics) and marginal distribution function (through statistical moments), respectively. Interestingly, by explicitly preserving the coefficient of kurtosis, it can also simulate certain aspects of intermittency, often characterizing the geophysical processes. Several applications with alternative hypothetical marginal distributions, as well as with real world processes, such as precipitation, wind speed and grid-turbulence, highlight the scheme’s wide range of applicability in stochastic generation and Monte-Carlo analysis. Particular emphasis is given on turbulence, in an attempt to simulate in a simple way several of its characteristics regarded as puzzles.

While the modern definition of entropy is genuinely probabilistic, in entropy production the classical thermodynamic definition, as in heat transfer, is typically used. Here we explore the concept of entropy production within stochastics and, particularly, two forms of entropy production in logarithmic time, unconditionally (EPLT) or conditionally on the past and present having been observed (CEPLT). We study the theoretical properties of both forms, in general and in application to a broad set of stochastic processes. A main question investigated, related to model identification and fitting from data, is how to estimate the entropy production from a time series. It turns out that there is a link of the EPLT with the climacogram, and of the CEPLT with two additional tools introduced here, namely the differenced climacogram and the climacospectrum. In particular, EPLT and CEPLT are related to slopes of log-log plots of these tools, with the asymptotic slopes at the tails being most important as they justify the emergence of scaling laws of second-order characteristics of stochastic processes. As a real-world application, we use an extraordinarily long time series of turbulent velocity and show how a parsimonious stochastic model can be identified and fitted using the tools developed.

Fractal-based techniques have opened new avenues in the analysis of geophysical data. On the other hand, there is often a lack of appreciation of both the statistical uncertainty in the results, and the theoretical properties of the stochastic concepts associated with these techniques. Several examples are presented which illustrate suspect results of fractal techniques. It is proposed that concepts used in fractal analyses are stochastic concepts and the fractal techniques can readily be incorporated into the theory of stochastic processes. This would be beneficial in studying biases and uncertainties of results in a theoretically consistent framework, and in avoiding unfounded conclusions. In this respect, a general methodology for theoretically justified stochastic processes, which evolve in continuous time and stem from maximum entropy production considerations, is proposed. Some important modelling issues are discussed with focus on model identification and fitting, often made using inappropriate methods. The theoretical framework is applied to several processes, including turbulent velocities measured every several microseconds, and wind and temperature measurements. The applications show that several peculiar behaviours observed in these processes are easily explained and reproduced by stochastic techniques.

L‐moments are expectations of certain linear combinations of order statistics. They can be defined for any random variable whose mean exists and form the basis of a general theory which covers the summarization and description of theoretical probability distributions, the summarization and description of observed data samples, estimation of parameters and quantiles of probability distributions, and hypothesis tests for probability distributions. The theory involves such established procedures as the use of order statistics and Gini's mean difference statistic, and gives rise to some promising innovations such as the measures of skewness and kurtosis described in Section 2, and new methods of parameter estimation for several distributions. The theory of L‐moments parallels the theory of (conventional) moments, as this list of applications might suggest. The main advantage of L‐moments over conventional moments is that L‐moments, being linear functions of the data, suffer less from the effects of sampling variability: L‐moments are more robust than conventional moments to outliers in the data and enable more secure inferences to be made from small samples about an underlying probability distribution. L‐moments sometimes yield more efficient parameter estimates than the maximum likelihood estimates.
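Because L-moments are linear combinations of order statistics, the first few sample values can be computed directly from the sorted data. The sketch below uses the commonly quoted unbiased estimators based on probability-weighted moments b0 to b3; the function name and return format are illustrative choices.

```python
def sample_l_moments(data):
    """First four sample L-moments (l1, l2, l3, l4) from unbiased PWM estimators."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("at least four observations are needed")
    b = []
    for r in range(4):
        total = 0.0
        for rank, value in enumerate(x):            # rank is 0-based
            weight = 1.0
            for j in range(r):
                weight *= (rank - j) / (n - 1 - j)  # zero whenever rank < r
            total += weight * value
        b.append(total / n)
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3, l4


l1, l2, l3, l4 = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0])
print(l1, l2, l3, l4)
```

The ratios `l3 / l2` and `l4 / l2` give the L-skewness and L-kurtosis measures mentioned in the abstract; `l2` equals half of Gini's mean difference.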

A property of natural processes is temporal irreversibility. However, this property cannot be reflected by most statistics used to describe precipitation time series and, consequently, is not considered in most precipitation models. In this paper, a new statistic, the asymmetry measure, is introduced and applied to precipitation, making it possible to detect and quantify irreversibility. It is used to analyze two different data sets of Singapore and Germany. The data of both locations show a significant asymmetry for high temporal resolutions. The asymmetry is more pronounced for Singapore where the climate is dominated by convective precipitation events. The impact of irreversibility on applications is analyzed on two different hydrological sewer system models. The results show that the effect of the irreversibility can lead to biases in combined sewer overflow statistics. This bias is of the same order as the effect that can be achieved by real time control of sewer systems. Consequently, wrong conclusions can be drawn if synthetic time series are used for sewer systems when asymmetry is present but not considered in precipitation modeling.

Interpretation of the past CO₂ variations recorded in polar ice during the large climatic transitions requires an accurate determination of the air-ice age difference. For the Vostok core, the age differences resulting from different assumptions on the firn densification process are compared and a new procedure is proposed to date the air trapped in this core. The penultimate deglaciation is studied on the basis of this new air dating and new CO₂ measurements. These measurements and results obtained on other ice cores indicate that at the beginning of the deglaciations, the CO₂ increase is either in phase or lags by less than about 1000 years with respect to the Antarctic temperature, while it clearly lags the temperature at the onset of the last glaciation. DOI: 10.1034/j.1600-0889.1991.t01-1-00002.x

Time-reversibility is defined for a process X(t) as the property that {X(t1), …, X(tn)} and {X(–t1), …, X(–tn)} have the same joint probability distribution. It is shown that, for discrete mixed autoregressive moving-average processes, this is a unique property of Gaussian processes.