
Nonparametric Temporal Downscaling of GHI Clear-sky Indices using Gaussian Copula


Abstract

Small-scale variabilities of solar irradiance are important for many applications. Downscaling approaches need to be developed for cases where only the averaged state of solar irradiance is known. In this study, we investigate the use of copulas for temporally downscaling GHI clear-sky indices. With the correlation structure and distribution information derived from measurements at 10 stations across the United States, the copula approach is capable of downscaling clear-sky indices from hourly averages to any arbitrary fine scale whilst preserving their original power spectra.
Jing Huang (1), Marc Perez (1), Richard Perez (2), Dazhi Yang (3), Patrick Keelin (1) and Tom Hoff (1)
(1) Clean Power Research, Napa, CA, USA
(2) Atmospheric Sciences Research Center, SUNY, Albany, NY, USA
(3) School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin, Heilongjiang, China
Keywords: solar irradiance, clear-sky index, variability,
photovoltaics, downscaling, copula, power spectra
Modeling accuracy generally increases with the granularity of
the available data. For solar applications, the requirement for
granular solar datasets is often met neither by sparse ground
measurement stations nor by geostationary satellites. For
example, the solar industry has found that minute-level solar
irradiance data are preferred for informing critical investment
decisions on utility-scale solar photovoltaic (PV) farms,
particularly when the DC capacity of the PV modules is much
greater than the AC capacity of the inverters [1]. The
distributed nature of solar power generators and the increasing
need for customized and accurate control of hybrid systems, as
envisioned by [2], also call for high-resolution solar data. As
such, downscaling methods are potentially useful for producing
realistic small-scale variabilities.
Many methods have been proposed for synthetic solar data
generation, ranging from Fourier time series and Markov chain
models to computation-intensive machine learning (see [3] for a
review). In this study, we adopt a statistical tool called a
copula, which has recently been applied to model the clear-sky
index (CSI) of solar irradiance.
Widén and Munkhammar [4] downscaled hourly CSI by
modeling its distribution as a two-state Gaussian mixture model
and assuming an exponential decay of correlation. In contrast,
we employ a nonparametric downscaling approach with the key
information of distribution and correlation being derived from
measurements at 10 reference ground stations.
A. Ground stations for reference and validation
Figure 1 provides information on the reference ground
stations used in this study. The measured global horizontal
irradiance (GHI) data span the entire year of 2020 with a
temporal resolution of 1 minute. In addition, we obtain the
corresponding clear-sky GHI at all the reference stations from
SolarAnywhere® V3.5 [5]. The CSI, or kt, is then calculated as
the ratio of GHI to the clear-sky GHI. We cap kt values at 1.3.
Figure 1. 10 reference ground stations in the eastern United States are
used in this study.
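The CSI calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `clear_sky_index` and the `min_clear` threshold for discarding night-time or very low sun samples are assumptions introduced here; only the ratio and the cap at 1.3 come from the text.

```python
import numpy as np

def clear_sky_index(ghi, ghi_clear, cap=1.3, min_clear=1.0):
    """Return kt = GHI / clear-sky GHI, capped at `cap`.

    `min_clear` (W/m^2) is a hypothetical threshold, not from the paper:
    samples with a lower clear-sky GHI (night, very low sun) become NaN
    so that we never divide by a near-zero value.
    """
    ghi = np.asarray(ghi, dtype=float)
    ghi_clear = np.asarray(ghi_clear, dtype=float)
    kt = np.full(ghi.shape, np.nan)
    valid = ghi_clear >= min_clear
    kt[valid] = ghi[valid] / ghi_clear[valid]
    return np.minimum(kt, cap)  # NaN entries stay NaN

# Made-up values in W/m^2, for illustration only
ghi = np.array([0.0, 250.0, 600.0, 900.0])
ghi_clear = np.array([0.0, 500.0, 600.0, 650.0])
kt = clear_sky_index(ghi, ghi_clear)
```

The last sample exceeds its clear-sky value by more than 30%, so it is capped at 1.3, while the night-time sample is left undefined.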
B. Gaussian Copula
The concept of the copula was originally proposed by Sklar [6].
A copula is a joint multivariate cumulative distribution function
(CDF) whose marginal probability distributions are all uniform
within [0, 1]. A Gaussian copula is one in which the dependence
between the variables is assumed to follow a multivariate
Gaussian distribution.
Consider $M$ related random variables of CSI ($kt$),
$KT_t, KT_{t+\Delta t}, \ldots, KT_{t+(M-1)\Delta t}$, where each
random variable represents a relative temporal point in a time
period of $M\Delta t$ for an arbitrary site. For example, if we
split 1-min $kt$ values at Boulder into hourly blocks, $KT_t$
represents a random variable denoting the $kt$ values at the first
minute of an hourly block, $KT_{t+(M-1)\Delta t}$ represents that
at the last minute, and $M = 60$. Denoting the marginal CDF of
$KT$ as

$$F_{KT}(kt), \tag{1}$$

the $M$-dimensional joint CDF $F$ can be linked to their marginal
distributions via a Gaussian copula $C$,

$$F\left(kt_t, \ldots, kt_{t+(M-1)\Delta t}\right) = C\left(F_{KT_t}(kt_t), \ldots, F_{KT_{t+(M-1)\Delta t}}(kt_{t+(M-1)\Delta t})\right). \tag{2}$$

Since a copula requires all marginal distributions to be uniform
within [0, 1], it can be further expressed as

$$C(u_1, \ldots, u_M) = \Phi_R\left(\Phi^{-1}(u_1), \Phi^{-1}(u_2), \ldots, \Phi^{-1}(u_M)\right), \tag{3}$$

where $u_1, \ldots, u_M$ are uniformly distributed random variables
transformed from $KT_t, \ldots, KT_{t+(M-1)\Delta t}$, $\Phi^{-1}$
is the inverse CDF of a standard univariate normal distribution,
and $\Phi_R$ is the joint CDF of a multivariate normal distribution
constrained by the $M \times M$ correlation matrix $R$. Since
random and correlated samples can be drawn from a multivariate
Gaussian generator (e.g. using random.multivariate_normal in the
Python numpy package), they can then be converted to scenarios of
$kt$ via the inverse marginal CDF $F_{KT}^{-1}$.
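The sampling procedure just described can be sketched as follows. This is a hedged illustration rather than the authors' implementation: correlated normal samples are mapped to uniform variates through the standard normal CDF and then pushed through an empirical inverse marginal CDF. The helper names, the use of one shared marginal for all time steps, the exponentially decaying correlation matrix and the uniform stand-in for measured kt are all assumptions made for the example.

```python
import math
import numpy as np

def standard_normal_cdf(z):
    """Standard normal CDF for an array z, via the error function."""
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf(np.asarray(z) / math.sqrt(2.0)))

def sample_kt_scenarios(corr, kt_reference, n_samples, rng):
    """Draw kt scenarios of length M from a Gaussian copula.

    corr         : (M, M) correlation matrix
    kt_reference : 1-D array of kt values defining the empirical marginal
                   CDF (a single shared marginal is an assumption here)
    """
    m = corr.shape[0]
    # Correlated standard normal samples, shape (n_samples, M)
    z = rng.multivariate_normal(np.zeros(m), corr, size=n_samples)
    # Map to uniform variates in [0, 1]
    u = standard_normal_cdf(z)
    # Inverse empirical CDF: the u-quantile of the reference kt values
    return np.quantile(kt_reference, u.ravel()).reshape(u.shape)

rng = np.random.default_rng(0)
m = 60
lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
corr = np.exp(-lags / 15.0)                      # hypothetical correlation decay
kt_reference = rng.uniform(0.2, 1.1, size=5000)  # stand-in for measured kt
scenarios = sample_kt_scenarios(corr, kt_reference, n_samples=500, rng=rng)
```

Each row of `scenarios` is one candidate hour of 1-min kt values whose marginal distribution follows `kt_reference` and whose temporal dependence follows `corr`.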
Figure 2. Boulder site: (left) Color plot of correlation matrix of kt;
(right) The decay of correlation coefficient with Δt on the left y-axis
and the CDF of kt on the right y-axis.
As shown in Figure 2, the correlation matrix $R$ and the
CDF of $kt$ can be obtained from the measured GHI time series
and the SolarAnywhere clear-sky GHI. The functional decay
of correlation, $\rho(\Delta t)$, is further determined
empirically by aggregating and taking medians of the correlation
coefficients with the same $\Delta t$ from $R$.
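As a rough sketch of this step, the correlation matrix can be estimated across hourly blocks and the decay function obtained as the median along each diagonal, since all entries on one diagonal share the same time lag. The function names and the synthetic AR(1) test data are assumptions for illustration; a correlation matrix rebuilt from medians in this way is not guaranteed to be positive semi-definite and may need checking before it is used for sampling.

```python
import numpy as np

def correlation_decay(kt_blocks):
    """Return (R, rho) from kt arranged as (n_blocks, M) hourly blocks.

    R   : (M, M) correlation matrix between relative positions in a block
    rho : length-M array, rho[d] = median of the R entries that are d
          time steps apart (the d-th diagonal of R)
    """
    corr = np.corrcoef(kt_blocks, rowvar=False)
    n_steps = corr.shape[0]
    rho = np.array([np.median(np.diagonal(corr, offset=d))
                    for d in range(n_steps)])
    return corr, rho

def toeplitz_correlation(rho):
    """Rebuild a lag-only correlation matrix with entries rho[|i - j|]."""
    idx = np.arange(rho.size)
    return rho[np.abs(np.subtract.outer(idx, idx))]

# Synthetic stationary AR(1) blocks with lag-1 correlation 0.8 (made-up data)
rng = np.random.default_rng(1)
phi, n_blocks, n_steps = 0.8, 2000, 12
blocks = np.empty((n_blocks, n_steps))
blocks[:, 0] = rng.standard_normal(n_blocks)
for k in range(1, n_steps):
    noise = rng.standard_normal(n_blocks)
    blocks[:, k] = phi * blocks[:, k - 1] + np.sqrt(1 - phi**2) * noise

corr, rho = correlation_decay(blocks)
corr_smooth = toeplitz_correlation(rho)
```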
C. Nonparametric Downscaling
Figure 3. Boulder site: (left) $\rho(\Delta t)$ and (right) CDF plots
over the range of the hourly-averaged CSI $\overline{kt}$.
As mentioned earlier, the main difference of this study from
[4] in terms of downscaling is that we take a nonparametric
approach. Specifically, for one station: (1) we calculate the
hourly-averaged CSI values, denoted as $\overline{kt}$; (2) we
divide $\overline{kt}$ into 10 quantile groups of equal size
(i.e. with quantile ranges of [0, 0.1], …, [0.9, 1]); (3) for
each group, we calculate the corresponding $\rho(\Delta t)$ and
CDF individually; (4) we linearly interpolate $\rho(\Delta t)$
and the CDF for 26 $\overline{kt}$ values, i.e. 0.05, 0.1, …,
1.3. Note that we also set boundary values for $\rho(\Delta t)$
and the CDF at the two ends of the $\overline{kt}$ range.
We repeat the above procedures for all 10 reference stations.
Figure 4 shows the convergence of the 10 sites for
$\rho(\Delta t)$ and the CDF at $\overline{kt}$ = 0.5. The
convergence is generally good except when $\overline{kt}$ is
marginal (i.e. close to either end of its range) and data density
is therefore low. It is possible to further model
$\rho(\Delta t)$ and the CDF based on location (e.g.,
coordinates), but we do not pursue that direction, for the sake of
model simplicity. Instead, we obtain the location-agnostic
autocorrelation and CDF information by taking the median of all
sites, as shown in Figure 5.
Figure 4. Autocorrelation functions $\rho(\Delta t)$ and CDF at
$\overline{kt}$ = 0.5 for 10 individual sites.
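The per-station steps (1) to (4) and the median over sites could be sketched as below. This is an illustrative sketch under several assumptions that are not in the paper: the function name `station_lookup`, representing each group's CDF by its quantile function at fixed probabilities, using the group mean of the hourly-averaged CSI as the interpolation node, and relying on the flat extrapolation of `np.interp` outside the outermost groups instead of explicit boundary values. The three "stations" are synthetic stand-ins.

```python
import numpy as np

def station_lookup(kt_blocks, grid, n_groups=10, n_probs=101):
    """Steps (1)-(4) for one station.

    kt_blocks : (n_hours, M) array of fine-scale kt, one row per hour
    grid      : increasing hourly-mean kt values to interpolate onto
    Returns rho with shape (len(grid), M) and quantile functions with
    shape (len(grid), n_probs).
    """
    n_steps = kt_blocks.shape[1]
    probs = np.linspace(0.0, 1.0, n_probs)
    kt_bar = kt_blocks.mean(axis=1)                                   # step (1)
    edges = np.quantile(kt_bar, np.linspace(0.0, 1.0, n_groups + 1))  # step (2)
    group = np.clip(np.searchsorted(edges, kt_bar, side="right") - 1,
                    0, n_groups - 1)
    centres, rho, quant = [], [], []
    for g in range(n_groups):                                         # step (3)
        blocks = kt_blocks[group == g]
        corr = np.corrcoef(blocks, rowvar=False)
        rho.append([np.median(np.diagonal(corr, offset=d))
                    for d in range(n_steps)])
        quant.append(np.quantile(blocks.ravel(), probs))
        centres.append(kt_bar[group == g].mean())
    centres, rho, quant = map(np.array, (centres, rho, quant))

    def to_grid(values):                                              # step (4)
        return np.column_stack([np.interp(grid, centres, values[:, j])
                                for j in range(values.shape[1])])

    return to_grid(rho), to_grid(quant)

grid = 0.05 * np.arange(1, 27)          # 0.05, 0.10, ..., 1.30
rng = np.random.default_rng(0)
stations = []
for _ in range(3):                      # three synthetic stand-in stations
    hourly_level = rng.uniform(0.1, 1.2, size=(800, 1))
    kt_blocks = hourly_level + 0.05 * rng.standard_normal((800, 6))
    stations.append(station_lookup(kt_blocks, grid))

# Location-agnostic lookup tables: element-wise median over stations
rho_median = np.median([s[0] for s in stations], axis=0)
quantile_median = np.median([s[1] for s in stations], axis=0)
```

Downscaling a given hourly average then amounts to reading the row of `rho_median` and `quantile_median` nearest to (or interpolated at) that average.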
Figure 5. Median of all sites: (top) $\rho(\Delta t)$ and (bottom) CDF
color plots over the range of the hourly-averaged CSI $\overline{kt}$.
Then, downscaling the hourly average $\overline{kt}$ simply
involves interpolating at $\overline{kt}$ in Figure 5 to get the
corresponding $\rho(\Delta t)$ and CDF and then generating the
downscaled time series using Equations (1) and (3). In practice,
we generate 100 samples for each $\overline{kt}$ and then select
one sample based on two criteria: (1) the transition from the
previous hourly segment to the generated hourly segment should be
smooth; (2) the average of the generated hourly segment should be
as close to $\overline{kt}$ as possible.
Figure 6. One example day at Boulder: (top) 1-hour GHI measurement; (middle top) 1-min GHI measurement; (middle bottom) 1-min downscaled
GHI from hourly averages; and (bottom) 10-s downscaled GHI from hourly averages.
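The text does not state how the two selection criteria are combined, so the following is only one plausible sketch: it scores each candidate by the size of the jump from the previous segment plus a weighted deviation of its mean from the target, and keeps the lowest score. The function name `select_sample`, the additive cost and the `weight` parameter are assumptions.

```python
import numpy as np

def select_sample(candidates, kt_bar, prev_last, weight=1.0):
    """Pick one candidate hourly segment.

    candidates : (n_candidates, M) array of generated kt segments
    kt_bar     : target hourly-averaged kt
    prev_last  : last kt value of the previously accepted segment
    weight     : hypothetical trade-off between the two criteria
    """
    # Criterion (1): smooth transition from the previous hourly segment
    jump = np.abs(candidates[:, 0] - prev_last)
    # Criterion (2): segment mean as close as possible to the hourly average
    bias = np.abs(candidates.mean(axis=1) - kt_bar)
    best = int(np.argmin(jump + weight * bias))
    return candidates[best], best

# Three made-up two-step candidates, for illustration only
candidates = np.array([[0.2, 0.2],
                       [0.5, 0.7],
                       [0.9, 0.9]])
chosen, index = select_sample(candidates, kt_bar=0.6, prev_last=0.5)
```

In this toy case the second candidate starts exactly at the previous value and has the target mean, so it is the one selected.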
A downscaled GHI time series is shown in Figure 6. By
interpolating $\rho(\Delta t)$ and the CDF to the given
$\overline{kt}$, this nonparametric procedure is capable of
downscaling $\overline{kt}$ to an arbitrarily fine time scale.
The simulated time series at 1-min and 10-s resolution are
similar to the 1-min measurement in the sense that (1) they share
the same large-scale trends (i.e. hourly means) and (2) the
extent of variability changes with $\overline{kt}$, with a peak
when $\overline{kt}$ is intermediate. In addition, we quantify
the cross-scale variabilities by calculating the power spectra of
$kt$ and show that the power spectra of the 1-min measurement and
the 1-min simulation match each other closely across the
frequency range up to the Nyquist frequency (i.e.
0.5 min$^{-1}$), as seen in Figure 7.
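A power spectrum of this kind can be estimated with a simple periodogram. The sketch below is an assumption about how such a comparison might be set up, not the authors' procedure. The normalization is one common convention, and the test signal is a made-up 60-min oscillation. It also shows why an hourly series cannot resolve intra-hour variability: its Nyquist frequency is 1/120 min$^{-1}$ rather than 0.5 min$^{-1}$.

```python
import numpy as np

def power_spectrum(kt, dt_minutes=1.0):
    """Periodogram of a kt series sampled every `dt_minutes`.

    Returns frequencies in cycles per minute (0 up to the Nyquist
    frequency 0.5 / dt_minutes) and the corresponding power estimates.
    """
    kt = np.asarray(kt, dtype=float)
    x = kt - kt.mean()                               # remove the mean level
    freqs = np.fft.rfftfreq(x.size, d=dt_minutes)
    power = np.abs(np.fft.rfft(x)) ** 2 * dt_minutes / x.size
    return freqs, power

# One synthetic day of 1-min kt with a hypothetical 60-min oscillation
minutes = np.arange(1440)
kt = 0.6 + 0.2 * np.sin(2 * np.pi * minutes / 60.0)
freqs, power = power_spectrum(kt, dt_minutes=1.0)

# The same day averaged into hourly blocks
kt_hourly = kt.reshape(24, 60).mean(axis=1)
freqs_hourly, power_hourly = power_spectrum(kt_hourly, dt_minutes=60.0)
```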
Figure 7. Power spectra of kt for the 1-min and 1-hour measurements
and the 1-min simulation.
We also model the power generation resulting from observed and
copula-downscaled GHI to validate the performance of the
approach. We use PVLIB to model a hypothetical horizontal
single-axis tracking PV system, and then calculate the power loss
due to inverter clipping under various DC:AC ratios. We follow
the same technical procedures as described in our companion paper
[7]. Figure 8 demonstrates the superiority of the downscaled
1-min time series over the hourly averaged observation data for
scenarios with high DC:AC ratios. Whereas the clipping loss error
of the hourly observation data increases monotonically with the
DC:AC ratio, that of the copula-downscaled data plateaus at
around 1% underestimation for DC:AC > 1.6, which verifies the
capability of our copula approach to faithfully reproduce
intra-hour variabilities.
Figure 8. Box plots of relative error of curtailed AC power estimation
using interpolated hourly-averaged observation and 1-min copula-
downscaled time series for DC:AC ranging from 1.1 to 2.0.
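The reason hourly averages underestimate clipping loss can be illustrated with a toy calculation. This sketch deliberately does not use PVLIB or a tracking model. It assumes DC power scales linearly with GHI and reaches the DC nameplate capacity at 1000 W/m^2, and the alternating cloudy and clear GHI values are invented for the example.

```python
import numpy as np

def clipping_loss(ghi, dc_ac_ratio, ac_capacity=1.0):
    """Fraction of DC energy lost to inverter clipping.

    Toy model (assumption): DC power is proportional to GHI and equals
    the DC capacity (dc_ac_ratio * ac_capacity) at 1000 W/m^2.
    """
    dc_power = dc_ac_ratio * ac_capacity * np.asarray(ghi, dtype=float) / 1000.0
    ac_power = np.minimum(dc_power, ac_capacity)   # inverter clips at AC capacity
    return 1.0 - ac_power.sum() / dc_power.sum()

# Hypothetical hour alternating between cloudy (300) and clear (900) W/m^2
ghi_1min = np.tile([300.0, 900.0], 30)
ghi_hourly = np.full(60, ghi_1min.mean())          # the hourly average, 600 W/m^2

loss_1min = clipping_loss(ghi_1min, dc_ac_ratio=1.5)
loss_hourly = clipping_loss(ghi_hourly, dc_ac_ratio=1.5)
```

With a DC:AC ratio of 1.5, the clear minutes exceed the inverter limit and are clipped, whereas the hourly mean stays below it. The averaged series therefore reports no clipping loss at all, while the 1-min series loses about 19% of its DC energy.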
In this study, we have applied Gaussian copula techniques,
which have already proven very useful for wind energy
applications, to the temporal downscaling of solar irradiance.
Building on previous studies, we have proposed a purely
nonparametric approach with the autocorrelation and
distribution functions heuristically derived from 10
measurement stations across the continental United States. We
have demonstrated that the simulated 1-min time series is able
to accurately reproduce the power spectra of the original 1-min
observation data. In addition, the copula-downscaled data
generally perform well in estimating AC power when the
DC:AC ratio is high. This is in clear contrast to using hourly-
averaged observations, which leads to a monotonic increase in
error with increasing DC:AC ratio.
It is possible to further improve this method. For example,
the copula parameters as shown in Figure 5 can be further
regressed from location information or be tuned for individual
applications such as power estimation. In general, these results
demonstrate the effectiveness of the proposed methodology to
meet customer needs for high-frequency and high-quality
irradiance data.
[1] K. Bradford, R. Walker, D. Moon and M. Ibanez, “A regression model
to correct for intra-hourly irradiance variability bias in solar energy
models”, 2020 47th IEEE Photovoltaic Specialists Conference (PVSC),
2020, pp. 2679-2682.
[2] M. Ahlstrom, J. Mays, E. Gimon, A. Gelston, C. Murphy, P. Denholm,
and G. Nemet, "Hybrid Resources: Challenges, Implications,
Opportunities, and Innovation," in IEEE Power and Energy Magazine,
vol. 19, no. 6, pp. 37-44, Nov.-Dec. 2021
[3] Munkhammar, J. and Widén, J. (2021), “Established mathematical
approaches for synthetic solar irradiance data generation,” in Bright, J.
M. (ed.), Synthetic Solar Irradiance: Modeling Solar Data, Melville,
New York: AIP Publishing, pp. 3-1–3-36.
[4] Widén, J. and Munkhammar, J. (2019), “Spatio-temporal downscaling of
hourly solar irradiance data using Gaussian copulas,” Proceedings of the
46th IEEE Photovoltaic Specialists Conference (PVSC), Chicago, USA,
16–21 June 2019.
[5] P. Keelin, A. Kubiniec, A. Bhat, M. Perez, J. Dise, R. Perez and J.
Schlemmer, (2021) "Quantifying the solar impacts of wildfire smoke in
western North America," 2021 IEEE 48th Photovoltaic Specialists
Conference (PVSC), 2021, pp. 1401-1404, doi:
[6] Sklar, A. (1959) ‘Fonctions de répartition à n dimensions et leurs
marges’, Publ. Inst. Statist. Univ. Paris, 8, pp. 229–231.
[7] J. Huang, R. Perez, J. Schlemmer, A. Kubiniec, M. Perez, A. Bhat and
P. Keelin, (2022) "Enhancing temporal variability of 5-minute satellite-
derived solar irradiance data", 2022 IEEE 49th Photovoltaic Specialists
Conference (PVSC), Philadelphia, PA, USA.