Figure 2 - uploaded by Korbinian Breinl

## Source publication

Stochastic weather generators simulate synthetic weather data while maintaining statistical properties of the observations. A new semi-parametric algorithm for multi-site precipitation has been published recently by Breinl et al. (), who used a univariate Markov process to simulate precipitation occurrence at multiple sites for two small rain gauge...
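One way to read "a univariate Markov process for multi-site occurrence" is to treat each day's wet/dry vector across all sites as a single state, estimate first-order transition probabilities between those states, and simulate a new state sequence. The sketch below follows that reading. It is an illustration under that assumption, not the published algorithm; the function names, the fallback for states that are never left, and the random test record are all hypothetical.

```python
import numpy as np


def encode_states(occurrence):
    """Encode each day's wet(1)/dry(0) vector (days x sites) as one integer state."""
    occurrence = np.asarray(occurrence, dtype=int)
    weights = 2 ** np.arange(occurrence.shape[1])  # binary place value per site
    return occurrence @ weights


def decode_states(states, n_sites):
    """Turn integer states back into a days x sites wet/dry matrix."""
    return (np.asarray(states)[:, None] >> np.arange(n_sites)) & 1


def fit_transition_matrix(states, n_states):
    """First-order transition probabilities estimated from consecutive observed states."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of states that are never left stay all-zero; simulate() handles them.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def simulate(transition, start_state, n_days, rng):
    states = [start_state]
    for _ in range(n_days - 1):
        probs = transition[states[-1]]
        if probs.sum() == 0:
            states.append(states[-1])  # hypothetical fallback: stay in the current state
        else:
            states.append(rng.choice(len(probs), p=probs))
    return np.array(states)


# Hypothetical 3-site record of random wet/dry days.
rng = np.random.default_rng(42)
obs = rng.integers(0, 2, size=(365, 3))
states = encode_states(obs)
P = fit_transition_matrix(states, 2 ** obs.shape[1])
sim = decode_states(simulate(P, states[0], 1000, rng), obs.shape[1])
```

Because new states are drawn only from observed transitions, the simulation can only reproduce occurrence vectors that appear in the record.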

## Contexts in source publication

**Context 1**

... was decided that the simulation must not generate more than 1% duplicated days. Figure 2 shows the results for January. The figure shows the mean of 50 runs and a fitted second-degree polynomial curve used to derive a reasonable value of k. ...
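The selection rule in Context 1 (average over 50 runs, fit a second-degree polynomial, keep the share of duplicated days at or below 1%) can be sketched as follows. The duplicate percentages here are invented for illustration, and the snippet does not assume whether duplicates rise or fall with k in the paper; it simply returns every k whose fitted value meets the threshold.

```python
import numpy as np


def candidate_k(k_values, dup_runs, threshold=1.0, degree=2):
    """Average the duplicated-day percentage over runs, fit a polynomial of the
    given degree, and return the k values whose fitted percentage is within the threshold."""
    k_values = np.asarray(k_values, dtype=float)
    mean_dup = np.asarray(dup_runs, dtype=float).mean(axis=0)  # mean over runs
    coeffs = np.polyfit(k_values, mean_dup, deg=degree)
    fitted = np.polyval(coeffs, k_values)
    return [int(k) for k, f in zip(k_values, fitted) if f <= threshold], fitted


# Hypothetical curve: 50 identical runs of a made-up duplicate percentage per k.
k = np.arange(1, 11)
runs = np.tile(0.02 * k**2 + 0.05 * k, (50, 1))
ok, fitted = candidate_k(k, runs)
print(ok)  # [1, 2, 3, 4, 5]
```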

**Context 2**

... mean numbers of dry and wet days are well simulated (Figure 12). In the FSS version, the mean length of dry spells is underestimated slightly (2.9%), and the mean length of wet spells is overestimated (1.8%) (Figure 13). ...
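Spell statistics like those in Context 2 are computed from a daily 0/1 occurrence series by splitting it into runs of identical values. The helper below is a minimal sketch of that bookkeeping plus the signed relative error, where a negative value means underestimation. Function names are illustrative; a series with no wet (or no dry) days would produce an empty mean.

```python
import numpy as np


def mean_spell_lengths(wet):
    """Return (mean dry-spell length, mean wet-spell length) for a 0/1 daily series."""
    wet = np.asarray(wet, dtype=int)
    change = np.flatnonzero(np.diff(wet)) + 1  # first index of each new spell
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(wet)]))
    lengths = ends - starts
    states = wet[starts]  # 0 = dry spell, 1 = wet spell
    return lengths[states == 0].mean(), lengths[states == 1].mean()


def relative_error(sim, obs):
    """Signed relative error in percent; negative means the simulation underestimates."""
    return 100.0 * (sim - obs) / obs
```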

**Context 3**

... demonstrate the seasonality better, the validation was done at monthly time scales. The mean monthly temperature of all temperature stations is plotted in Figure 20 and is well reproduced. ...

**Context 4**

... the standard deviation (Figure 21) all observations are within the grey areas. The average deviation at all stations is below 1%. ...

**Context 5**

... model also performs well with respect to extremes. For minimum temperatures (Figure 22), almost all observations are in the simulated grey areas, with some exceptions (especially in January). The dotted lines in Figure 22 denote the 5th and 95th percentiles of simulations without the suggested modulus transformations. ...
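Assuming the grey areas are a 5th to 95th percentile band across simulation runs (the dotted lines are described that way), the check "observations lie inside the grey areas" can be sketched as below. The array shapes and the function name are assumptions for illustration.

```python
import numpy as np


def envelope_coverage(simulated, observed, lower=5, upper=95):
    """simulated: (n_runs, n_months) statistic per run; observed: (n_months,).
    Returns the lower/upper percentile band and the fraction of observations inside it."""
    lo, hi = np.percentile(simulated, [lower, upper], axis=0)
    inside = (observed >= lo) & (observed <= hi)
    return lo, hi, inside.mean()
```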

**Context 6**

... minimum temperatures (Figure 22), almost all observations are in the simulated grey areas, with some exceptions (especially in January). The dotted lines in Figure 22 denote the 5th and 95th percentiles of simulations without the suggested modulus transformations. Minor improvements can be detected for autumn and winter at station 15 (Figure 22(a)), and spring and summer at station 19 (Figure 22(c)). ...
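The contexts do not spell out the modulus transformation. The sketch below uses the common John and Draper (1980) form, which extends Box-Cox to data with negative values (such as temperatures in °C); treating this as the form used in the paper is an assumption, and the parameter name `lam` is illustrative. The inverse is needed to map simulated values back to the original scale.

```python
import numpy as np


def modulus_transform(x, lam):
    """John-Draper modulus transformation: sign-preserving Box-Cox on |x| + 1."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.sign(x) * np.log1p(np.abs(x))
    return np.sign(x) * (np.power(np.abs(x) + 1.0, lam) - 1.0) / lam


def inverse_modulus(y, lam):
    """Back-transform values from the modulus scale to the original scale."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.sign(y) * np.expm1(np.abs(y))
    return np.sign(y) * (np.power(np.abs(y) * lam + 1.0, 1.0 / lam) - 1.0)
```

With `lam = 1` the transformation is the identity, so values of `lam` away from 1 control how strongly the tails are reshaped.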

**Context 7**

... dotted lines in Figure 22 denote the 5th and 95th percentiles of simulations without the suggested modulus transformations. Minor improvements can be detected for autumn and winter at station 15 (Figure 22(a)), and spring and summer at station 19 (Figure 22(c)). ...

**Context 9**

... situation looks different for the simulated maximum temperatures (Figure 23). Without the modulus transformation, ...

**Context 10**

... situation looks different for the simulated maximum temperatures (Figure 23). Without the modulus transformation, ... (Caption of Figure 20: Monthly characteristics of mean temperature for all temperature stations.) ...

**Context 11**

... results can be achieved only when transforming the data. The ARMA models are able to reproduce the autocorrelation (Figure 24) and the inter-site correlations very well (Figure 25). The correlated random numbers are able to reproduce the spatial temperature fields. ...
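Validation against Figures 24 and 25 compares two statistics between observed and simulated series: each station's autocorrelation and the station-to-station correlation matrix. The helpers below are a minimal sketch of those two diagnostics, assuming days x stations arrays; the function names are illustrative.

```python
import numpy as np


def lag_autocorrelation(x, lag=1):
    """Sample autocorrelation of a 1-D series at the given lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)


def compare_correlations(obs, sim, lag=1):
    """obs, sim: (n_days, n_sites). Returns per-site lag autocorrelations and the
    inter-site correlation matrices for the observed and simulated data."""
    acf_obs = np.array([lag_autocorrelation(obs[:, j], lag) for j in range(obs.shape[1])])
    acf_sim = np.array([lag_autocorrelation(sim[:, j], lag) for j in range(sim.shape[1])])
    return acf_obs, acf_sim, np.corrcoef(obs, rowvar=False), np.corrcoef(sim, rowvar=False)
```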

**Context 13**

... be conducted with correlated uniform random numbers, for example through a Cholesky decomposition (see Watkins (2010) for details), to make sure that there are similar data pairs that can be reshuffled. The correlated random numbers can match the inter-site correlations of the observations and should be computed separately for each season. Fig. 2 gives an overview of the proposed precipitation generator and all components described in this ...
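A common way to obtain correlated uniform random numbers through a Cholesky decomposition is to correlate standard normal draws with the Cholesky factor of the target correlation matrix and then map each margin through the standard normal CDF. The sketch below shows that route; it is a generic construction, not necessarily the exact procedure in the source, and the function names are illustrative. Note that the Pearson correlation of the resulting uniforms is slightly lower than the Gaussian correlation ρ, namely (6/π)·arcsin(ρ/2).

```python
import math

import numpy as np

# Standard normal CDF built from math.erf, applied element-wise.
_phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def correlated_uniforms(corr, n, rng):
    """Draw n rows of uniforms whose dependence follows the target correlation matrix."""
    L = np.linalg.cholesky(corr)  # corr = L @ L.T; corr must be positive definite
    z = rng.standard_normal((n, corr.shape[0])) @ L.T  # correlated standard normals
    return _phi(z)  # map each margin to Uniform(0, 1)
```

Following the text, `corr` would be estimated from the observations separately for each season and the draws generated season by season.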

## Similar publications

Stochastic weather generators are statistical models widely used to produce climate time series with similar statistical properties to observed data. They are also used as downscaling tools to generate climate change scenarios for impact studies. Precipitation is one of the main variables simulated by weather generators and is also a key variable f...

Comprehensive analysis and modeling of rainfall distribution is essential for capturing the characteristics of high-intensity rainfall. The western region of Peninsular Malaysia, which is more urbanized and densely populated, is prone to flash flood occurrences due to the high-intensity rainfall brought by convective rainfall during the inter-monsoon se...

Unlike single-site precipitation generators, multi-site precipitation generators make it possible to reproduce the space-time variation of precipitation at several sites. The extension of single-site approaches to multiple sites is a challenging task, and has led to a large variety of different model philosophies for multi-site models. This paper p...

Modeling of rainfall data, particularly rainfall amounts, is used to provide input for models of crop growth, design of urban drainage systems, land management systems and other environmental projects. More importantly, in hydrological studies, mathematically simulated data help in flood prevention and mitigation efforts. In the eastern coast o...

A weather generator is a numerical tool that uses existing meteorological records to generate series of synthetic weather data. The AWE-GEN (Advanced Weather Generator) model has been successful in producing weather variables across a broad range of temporal scales, ranging from high-frequency hourly values to low-frequency inter-annual variability....

## Citations

... Other methods alter the temporal dependence structure of hydroclimatic time series, for instance by modifying the seasonality or the persistence of wet and dry conditions. Various techniques are used in this case, including Markov chain models (Breinl et al., 2015; Ullrich et al., 2021), spectral analysis and wavelet transforms (Steinschneider and Brown, 2013; Quinn et al., 2018; Fletcher et al., 2023), and copula methods (Borgomeo et al., 2015b; Nazemi et al., 2020). Lastly, Borgomeo et al. (2015a) propose a versatile tool that lets the user choose the objective function of the streamflow generator to optimize the streamflow properties of interest. ...

Water systems worldwide are experiencing climate change-induced shifts in drought properties like frequency, intensity, and duration, affecting water security and reliability. To develop and test effective drought preparedness plans, researchers often use synthetic weather generators to create hydrological scenarios that explore drought variability beyond historical records. Existing weather generators typically allow users to adjust streamflow statistics like percentiles or temporal correlation but do not directly control the drought properties of frequency, intensity, and duration. To fill this gap, we propose the FIND (Frequency, INtensity, and Duration) synthetic weather generator. FIND incorporates a standardized drought index to directly and independently control drought frequency, intensity, and duration in generated streamflow time series while preserving observed hydrological variability. FIND's ideal use cases include i) water systems analysis applications that seek to train and test drought strategies under historical and plausible future drought conditions, and ii) bottom-up vulnerability studies relating system vulnerability outcomes to specific changes in drought properties of frequency, intensity, and duration. We demonstrate FIND's versatility through three experiments: replicating historically observed drought properties, generating streamflow scenarios for multiple sites while preserving the correlation between their drought conditions, and generating a set of scenarios with direct and independent changes in drought properties. The FIND source code is openly available for applications beyond the scope of this paper.
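Drought frequency, intensity, and duration are usually read off a standardized index with run theory: a drought is a run of consecutive steps below a threshold. The sketch below illustrates that idea only. It does not reproduce FIND; the plain z-score standardization, the threshold of -1, and defining intensity as the mean index within a run are all assumptions chosen for clarity.

```python
import numpy as np


def standardized_index(flow):
    """Hypothetical simple standardization: z-score of the flow series."""
    flow = np.asarray(flow, dtype=float)
    return (flow - flow.mean()) / flow.std()


def drought_events(index, threshold=-1.0):
    """Run theory: each run of consecutive values below the threshold is one drought.
    Returns (duration, intensity) pairs, intensity = mean index value within the run."""
    events, run = [], []
    for value in index:
        if value < threshold:
            run.append(value)
        elif run:
            events.append((len(run), float(np.mean(run))))
            run = []
    if run:  # a drought still ongoing at the end of the series
        events.append((len(run), float(np.mean(run))))
    return events
```

Frequency then follows as the number of events per unit of time, for example events per year.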

... As a result, several deep learning-based techniques have been developed for photovoltaic power generation prediction. Artificial Neural Networks (ANN) are one commonly used approach [5][6][7][8][9][10]; they often employ statistical analysis or sensitivity analysis to select input variables or are used to combine deep learning with other methods [11,12] such as pattern extraction, bootstrapping, etc. These studies have significantly improved the accuracy of photovoltaic power generation prediction by targeting input variables, model optimization, and model combination. ...

Photovoltaic (PV) power generation is the most widely adopted renewable energy source. However, its inherent unpredictability poses considerable challenges to the management of power grids. To address the arduous and time-consuming training process of PV prediction models, which has been a major focus of prior research, an improved approach for PV prediction based on neighboring days is proposed in this study. This approach is specifically designed to handle the preprocessing of training datasets by leveraging the results of a similarity analysis of PV power generation. Experimental results demonstrate that this method can significantly reduce the training time of models without sacrificing prediction accuracy, and can be effectively applied in both ensemble and deep learning approaches.

... In this respect, parametric approaches falling into the class of time series models have proven a viable option for large-domain precipitation simulation. However, also in this case, only a few examples have shown the capability to simulate precipitation at a large number of locations (… Serinaldi & Kilsby, 2014b; Ullrich et al., 2021), while more often they are applied to a few tens of sites (Benoit et al., 2022; Breinl et al., 2013, 2015; Verdin et al., 2019) or even fewer. In fact, large-domain modeling faces limitations such as the dramatic increase of CPU time with an increasing number of locations, or the feasibility of Cholesky factorization of large covariance matrices (Benoit et al., 2018). ...

Stochastic simulations of spatiotemporal patterns of hydroclimatic processes, such as precipitation, are needed to build alternative but equally plausible inputs for water‐related design and management, and to estimate uncertainty and assess risks. However, while existing stochastic simulation methods are mature enough to deal with relatively small domains and coarse spatiotemporal scales, additional work is required to develop simulation tools for large‐domain analyses, which are more and more common in an increasingly interconnected world. This study proposes a methodological advancement in the CoSMoS framework, which is a flexible simulation framework preserving arbitrary marginal distributions and correlations, to dramatically decrease the computational burden and make the algorithm fast enough to perform large‐domain simulations in short time. The proposed approach focuses on correlated processes with mixed (zero‐inflated) Uniform marginal distributions. These correlated processes act as intermediates between the target process to simulate (precipitation) and parent Gaussian processes that are the core of the simulation algorithm. Working in the mixed‐Uniform space enables a substantial simplification of the so‐called correlation transformation functions, which represent a computational bottleneck in the original CoSMoS formulation. As a proof of concept, we simulate 40 years of daily precipitation records from 1,000 gauging stations in the Mississippi River basin. Moreover, we extend CoSMoS incorporating parent non‐Gaussian processes with different degrees of tail dependence and suggest potential improvements including the separate simulation of occurrence and intensity processes, and the use of advection, anisotropy, and nonstationary spatiotemporal correlation functions.
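The chain "parent Gaussian, then zero-inflated uniform, then precipitation" can be illustrated for a single value: the Gaussian is mapped to a uniform, values at or below the dry probability become zero, and the remainder is rescaled and passed through the inverse CDF of a wet-amount distribution. This is a generic sketch of the marginal mapping only, not CoSMoS code; it omits the correlation transformation functions entirely, and the exponential wet-amount distribution and parameter names are illustrative assumptions.

```python
import math

import numpy as np

# Standard normal CDF built from math.erf, applied element-wise.
_phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def gaussian_to_precip(z, p_dry, mean_wet):
    """Map parent Gaussian values to zero-inflated precipitation amounts.
    u = Phi(z) is uniform; u <= p_dry is a dry step, otherwise the rescaled value
    (u - p_dry) / (1 - p_dry) goes through an exponential inverse CDF."""
    u = _phi(z)
    wet = u > p_dry
    v = np.where(wet, (u - p_dry) / (1.0 - p_dry), 0.0)
    amount = -mean_wet * np.log1p(-v)  # exponential inverse CDF with mean mean_wet
    return np.where(wet, amount, 0.0)
```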

... As an alternative, a Fourier transform-based model is proposed [30]. More advanced (multivariate) approaches have one thing in common [31][32][33]: They contain a term that explicitly models the seasonal component. Most are autoregressive approaches, which is reasonable as the temperature will not change too much from one hour to the next. ...

Photovoltaic power is playing an ever-increasing role in the energy mix of countries worldwide. It is a stochastic energy source, and simulation models are needed to establish reliable risk management. This paper presents a novel approach for simulating hourly solar irradiation and—as a consequence—photovoltaic power based on easily accessible data such as wind, temperature, and cloudiness. Solar simulations are generated via a multiplication factor that scales the maximum possible solar irradiation. Photovoltaic simulations are then derived using formulas that approximate the physical interdependencies. The resulting simulations are unbiased on an annual level and reasonably reflect historic irradiation movements. Interpreting our approach as a descriptive model, we find that error values vary over the year and with granularity. Errors are highest when considering hourly values in wintertime, especially in the morning or late afternoon.

... Finally, it (3) recalculates and replaces the previous centroids. Steps 2 and 3 are repeated until the objects no longer move between clusters (Breinl et al., 2015). In this method, the number of clusters n must be determined in advance. ...
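The steps quoted above are the standard k-means loop. A compact sketch follows, assuming Euclidean distance and random initial centroids drawn from the data (the initialization used in the cited work is not stated); the function name and the empty-cluster rule are illustrative.

```python
import numpy as np


def kmeans(points, n_clusters, rng, max_iter=100):
    """Plain k-means: (1) pick initial centroids, (2) assign each object to the
    nearest centroid, (3) recompute centroids; repeat 2-3 until no object moves."""
    points = np.asarray(points, dtype=float)
    centroids = points[rng.choice(len(points), n_clusters, replace=False)]
    labels = None
    for _ in range(max_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no object changed cluster
        labels = new_labels
        for j in range(n_clusters):
            members = points[labels == j]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```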

Climate change can significantly affect water systems with negative impacts on many facets of society and ecosystems. Therefore, significant attention must be devoted to the development of efficient adaptation strategies. More specifically, the reoperation of water resources systems to keep the overall performance within acceptable limits should be prioritized to avoid, or at least delay as much as possible, costly infrastructural investments. This manuscript presents a hydrologically-driven approach to support the reoperation of multipurpose multireservoir systems. The approach is organized around 1) the use of a large ensemble of GCM hydro-climate projections to drive a climate stress test; 2) the bottom-up clustering of those hydrologic projections based on hydrologic attributes that are both relevant to the region of interest and interpretable by the operators; and finally, 3) the identification of adaptation measures for each cluster after developing a one-way coupling of an optimization model with a simulation model. The climate impact assessment is illustrated with the multipurpose multireservoir system of the Lievre River basin in Quebec (Canada). Results show that cluster-specific, adapted, operating rules can improve the performance of the system and reveal its operational flexibility with respect to the different operating objectives.

... The SWR uses the time (t) dependent temperature model published by Breinl et al. [23]. Let $T_t$ be the temperature, $N_t$ the normal temperature (the average temperature for that day of the year), and $\sigma_t$ the standard deviation for the normal temperature. ...

... This mean temperature is smoothed using a fifth-order Fourier series as a low-pass filter to produce $N_t$. Such Fourier series smoothing of mean temperature to yield normal temperature is a common technique [23][24][25]. ...
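Fifth-order Fourier smoothing amounts to a least-squares fit of a constant plus five sine/cosine harmonics of the annual cycle; higher-frequency wiggles in the day-of-year mean are discarded. The sketch below assumes a 365-day year (leap days would need a separate convention), and the function name is illustrative.

```python
import numpy as np


def fourier_smooth(daily_mean, order=5, period=365.0):
    """Least-squares fit of a truncated Fourier series (constant + `order` harmonics)
    to a day-of-year mean temperature, keeping only the low-frequency seasonal cycle."""
    y = np.asarray(daily_mean, dtype=float)
    t = np.arange(len(y))
    columns = [np.ones_like(t, dtype=float)]
    for k in range(1, order + 1):
        w = 2.0 * np.pi * k * t / period
        columns += [np.cos(w), np.sin(w)]
    X = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coeffs  # smoothed normal temperature N_t
```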

Natural gas customers rely upon utilities to provide gas for heating in the coldest parts of winter. Heating capacity is expensive, so utilities and end users (represented by commissions) must agree on the coldest day on which a utility is expected to meet demand. The return period of such a day is long relative to the amount of weather data that are typically available. This paper develops a weather resampling method called the Surrogate Weather Resampler, which creates a large dataset to support analysis of extremely infrequent events. While most current methods for generating weather data are based on simulation, this method resamples the deviations from typical weather. The paper also shows how extreme temperatures are strongly correlated to the demand for natural gas. The Surrogate Weather Resampler was compared in-sample and out-of-sample to the WeaGETS weather generator using both the Kolmogorov–Smirnov test and an exceedance-based test for cold weather generation. A naïve benchmark was also examined. These methods studied weather data from the National Oceanic and Atmospheric Administration and AccuWeather. Weather data were collected for 33 weather stations across North America, with 69 years of data from each weather station. We show that the Surrogate Weather Resampler can reproduce the cold tail of distribution better than the naïve benchmark and WeaGETS.

... Nevertheless, the limitations of the univariate Markov process should also be noted. The univariate Markov process can only reproduce the observed occurrence vectors, and the number of possible states grows exponentially with the number of simulated stations (2^n for n stations), which may reduce sample sizes and make the parameterization less reliable (Breinl et al., 2013, 2015). For example, 16, 32 and 64 states are possible for four, five and six stations, respectively. ...
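The state counts quoted above follow from enumerating every wet/dry combination across stations. A first-order transition matrix over those states has the square of that many entries, which shows how quickly the data available per transition shrinks. The helper name is illustrative.

```python
from itertools import product


def occurrence_states(n_stations):
    """All wet (1) / dry (0) combinations across n stations: 2 ** n states."""
    return list(product((0, 1), repeat=n_stations))


for n in (4, 5, 6):
    n_states = len(occurrence_states(n))
    print(n, n_states, n_states**2)  # stations, states, transition probabilities
# 4 16 256
# 5 32 1024
# 6 64 4096
```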

Multi-site rainfall models are useful tools for providing synthetic realizations of spatially correlated rainfall at multiple stations, which are of great importance for flood and drought risk assessment and climate change impact analysis. Good preservation of various observed rainfall characteristics, including rainfall time-series statistics and rainfall event characteristics at individual stations and the inter-site correlations of these characteristics, is therefore crucial. To achieve this, this study develops a multi-site stochastic daily rainfall model by coupling a univariate Markov chain with a multi-site rainfall event model (MSDRM-MCREM), based on our previously developed single-site SDRM-MCREM. The univariate Markov chain model in MSDRM-MCREM is used to generate spatially correlated multi-site rainfall occurrence time series and to extract simulated rainfall events for individual stations based on continuous wet days. The multi-site rainfall event model is then constructed using Vine copulas to simulate spatially correlated characteristics (rainfall durations, rainfall depths and temporal patterns) of those simulated rainfall events that occur simultaneously at multiple stations. The model was applied to the Changshangang River basin in Zhejiang Province, East China, and its performance in reproducing rainfall characteristics and spatial correlations was evaluated for three cases, i.e. simulations for two, three and four stations. Results show that, except for an overestimation of light rainfall, MSDRM-MCREM simultaneously preserves well the rainfall time-series statistics (i.e. different rainfall percentiles, mean monthly rainfall, standard deviations, and probabilities and mean values of wet days), extreme rainfall (i.e. exceedance probabilities of annual maximum 1-day, 3-day and 5-day rainfall) and rainfall event characteristics (i.e. cumulative probabilities of wet spell, dry spell and rainfall depth, temporal patterns and occurrence probabilities of rainfall types for different depth-based event classes) at individual stations. In addition, the spatial correlations of rainfall characteristics, including rainfall occurrence time series and rainfall event characteristics in different groups, are also well maintained, although the inter-site correlations of rainfall time series are slightly underestimated.

... This was done by following the same structure of the CMIP6 dataset just presented, by simply extending the initial time series length (i.e., 1460) to multiple years. In particular, the synthetic data were generated using a first order autoregressive model (AR(1)) with a configurable time step which allows for preserving the performance behaviour of a real dataset (see the strong scalability on single node test in Subsection V-E1); moreover the AR(1) model enables the production of sufficiently consistent temperature values [103]. Table 1 summarizes the data used in the various tests. ...
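A first-order autoregressive model of the kind mentioned above can be sketched in a few lines. The mean, standard deviation and coefficient used in the example are hypothetical temperature-like values, not parameters from the cited work, and starting from the stationary distribution is a design choice that avoids a burn-in period.

```python
import numpy as np


def ar1(n, phi, mean=0.0, sigma=1.0, rng=None):
    """AR(1): x_t = mean + phi * (x_{t-1} - mean) + eps_t, eps_t ~ N(0, sigma^2), |phi| < 1."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n)
    # Draw the first value from the stationary distribution (no burn-in transient).
    x[0] = mean + rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))
    for t in range(1, n):
        x[t] = mean + phi * (x[t - 1] - mean) + rng.normal(0.0, sigma)
    return x


# Hypothetical example: 1460 steps of a temperature-like series around 15 degrees.
series = ar1(1460, 0.8, mean=15.0, sigma=2.0, rng=np.random.default_rng(5))
```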

Over the last two decades, scientific discovery has increasingly been driven by the wide availability of data from a multitude of sources, including high-resolution simulations, observations and instruments, as well as an enormous network of sensors and edge components. In such a dynamic and growing landscape where data continue to expand, advances in Science have become intertwined with the capacity of analysis tools to effectively handle and extract valuable information from this ocean of data. In view of the exascale era of supercomputers that is rapidly approaching, it is of the utmost importance to design novel solutions that can take full advantage of the upcoming computing infrastructures. The convergence of High Performance Computing (HPC) and data-intensive analytics is key to delivering scalable High Performance Data Analytics (HPDA) solutions for scientific and engineering applications. The aim of this paper is threefold: reviewing some of the most relevant challenges towards HPDA at scale, presenting a HPDA-enabled version of the Ophidia framework and validating the scalability of the proposed framework through an experimental performance evaluation carried out in the context of the Centre of Excellence in Simulation of Weather and Climate in Europe (ESiWACE). The experimental results show that the proposed solution is capable of scaling over several thousand cores and hundreds of cluster nodes. The proposed work is a contribution in support of scientific large-scale applications along the wider convergence path of HPC and Big Data followed by the scientific research community.

... Other studies have also looked into using WGs as a downscaling tool in climate change impact studies because their parameters can be modified relatively easily to represent climate variability (Semenov and Barrow 1997, Kilsby et al. 2007, Zhuang et al. 2016, Keller et al. 2017). Despite this advantage, only a few studies have looked at the potential of using stochastic WGs for long-term streamflow forecasting (Li et al. 2013, Breinl et al. 2015, Shield and Dai 2015, Breinl 2016), and an even smaller number have implemented a WG into streamflow forecasts. Of note, Šípek and Daňhelka (2015) used a WG for ESP based on a limited number of observed years selected on the basis of large-scale climate indices, and Hwang et al. (2011) generated several precipitation scenarios to study the impact of uncertainty on streamflow simulations. ...

Resampling historical time series remains one of the main approaches used to generate long-term probabilistic streamflow forecasts, while there is a need to develop more flexible approaches taking into account non-stationarities. One possible approach is to use a modelling chain consisting of a stochastic weather generator and a hydrological model. However, the ability of this modelling chain to generate adequate probabilistic streamflows must first be evaluated. The aim of this paper is to compare the performance of a stochastic weather generator against resampling historical meteorological time series in order to produce ensemble streamflow forecasts. The comparison framework is based on 30 years of forecasts for a single Canadian watershed. Forecasts resulting from the two methods are evaluated using the continuous ranked probability score (CRPS) and rank histograms. Results indicate that while there are differences between the methods, they nevertheless perform similarly, thus showing that weather generators can be used as substitutes for resampling the historical past.
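The two verification tools named above can be sketched for ensemble forecasts. The CRPS below uses the standard ensemble estimator E|X − y| − ½E|X − X′|, which equals the CRPS of the ensemble's empirical distribution; the rank histogram counts how many members fall below each observation and ignores ties. Function names are illustrative, not taken from the cited study.

```python
import numpy as np


def crps_ensemble(members, obs):
    """Ensemble CRPS estimate: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return term1 - term2


def rank_histogram(ensembles, observations):
    """Rank of each observation within its ensemble (0..n_members), counted per rank.
    A roughly flat histogram suggests a well-calibrated ensemble; ties are ignored."""
    ensembles = np.asarray(ensembles, dtype=float)
    ranks = (ensembles < np.asarray(observations, dtype=float)[:, None]).sum(axis=1)
    return np.bincount(ranks, minlength=ensembles.shape[1] + 1)
```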

... Risk assessment studies often apply stochastic generators to produce long series of different meteorological variables, such as precipitation, temperature, solar radiation or wind (e.g., Richardson, 1981; Wilks and Wilby, 1999; Apipattanavis et al., 2007; Leander and Buishand, 2009; Steinschneider and Brown, 2013; Chen et al., 2014; Li, 2014; Breinl et al., 2015). These long scenarios are then used as inputs to environmental models. ...