All content in this area was uploaded by Jing Huang on Jun 14, 2022
Nonparametric Temporal Downscaling of GHI Clear-sky Indices using Gaussian Copula
Jing Huang1, Marc Perez1, Richard Perez2, Dazhi Yang3, Patrick Keelin1 and Tom Hoff1
1Clean Power Research, Napa, CA, USA
2Atmospheric Sciences Research Center, SUNY, Albany, NY, USA
3School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin, Heilongjiang, China
Abstract—Small-scale variabilities of solar irradiance are
important for many applications. Downscaling approaches need
to be developed where only the averaged state of solar irradiance
is known. In this study, we investigate the use of copula for
temporally downscaling GHI clear-sky indices. With the
correlation structure and distribution information derived from
measurements at 10 stations across the United States, the copula
approach is capable of downscaling clear-sky indices from hourly
averages to any arbitrary fine scale whilst preserving its original
Keywords—solar irradiance, clear-sky index, variability,
photovoltaics, downscaling, copula, power spectra
It generally holds that modeling accuracy increases with the
granularity of the available data. For solar applications, the
requirement for granular solar datasets is often not met by either
sparse ground measurement stations or geostationary satellites.
For example, the solar industry has found that minute-level solar
irradiance data are preferred for informing critical investment
decisions on utility-scale solar photovoltaic (PV) farms,
particularly when the DC capacity of the PV modules is
much greater than the AC capacity of the inverters . The
distributed nature of solar power generators and the increasing
need for customized and accurate control of hybrid systems as
envisioned by  also call for high-resolution solar data. As
such, downscaling methods are potentially useful to produce
realistic small-scale variabilities.
Many methods have been proposed for
synthetic solar data generation, ranging from Fourier time series
and Markov chain models to computation-intensive machine
learning (see  for a review). In this study, we adopt a
statistical tool called a copula, which has recently been applied to
model the clear-sky index (CSI) of solar irradiance.
Widén and Munkhammar  downscaled hourly CSI by
modeling its distribution as a two-state Gaussian mixture model
and assuming an exponential decay of correlation. In contrast,
we employ a nonparametric downscaling approach with the key
information of distribution and correlation being derived from
measurements at 10 reference ground stations.
A. Ground stations for reference and validation
Figure 1 provides information on the reference ground
stations used in this study. The measured global horizontal
irradiance (GHI) data span the entire year of 2020 with a
temporal resolution of 1 minute. In addition, we obtain the
corresponding clear-sky GHI at all the reference stations from
SolarAnywhere® V3.5 . The CSI, or kt, is then calculated as
the ratio of GHI to the clear-sky GHI. We cap kt values at 1.3.
Figure 1. 10 reference ground stations in the eastern United States are
used in this study.
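The CSI calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name and the `min_clear` threshold (used to mask night-time and very low-sun samples) are assumptions, while the cap of 1.3 comes from the text.

```python
import numpy as np

def clear_sky_index(ghi, ghi_clear, cap=1.3, min_clear=1.0):
    """Return kt = GHI / clear-sky GHI, capped at `cap`.

    `min_clear` (W/m^2) is a hypothetical threshold, not taken from the
    paper: samples with very low clear-sky GHI are set to NaN to avoid
    dividing by values near zero.
    """
    ghi = np.asarray(ghi, dtype=float)
    ghi_clear = np.asarray(ghi_clear, dtype=float)
    kt = np.full(ghi.shape, np.nan)
    valid = ghi_clear >= min_clear
    kt[valid] = ghi[valid] / ghi_clear[valid]
    return np.minimum(kt, cap)  # NaN entries remain NaN

# Example with made-up irradiance values (W/m^2)
kt = clear_sky_index([500, 900, 0, 700], [1000, 600, 0, 700])
```

The second sample illustrates the cap: a ratio of 1.5 (cloud enhancement) is limited to 1.3.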
B. Gaussian Copula
The concept of copula was originally proposed by Sklar .
A copula is a joint multivariate cumulative distribution function
(CDF) whose marginal probability distributions are all uniform
within [0, 1]. A Gaussian copula is one whose dependence structure
is that of a multivariate normal distribution.
Consider M related random variables of the CSI (kt), $KT_t, KT_{t+\Delta t}, \ldots, KT_{t+(M-1)\Delta t}$, where each random variable represents a relative
temporal point in a time period of $M\Delta t$ for an arbitrary site. For
example, if we split 1-min kt values at Boulder into hourly
blocks, $KT_t$ represents a random variable denoting the kt values
at the first minute of an hourly block, $KT_{t+(M-1)\Delta t}$ represents
that at the last minute, and $M = 60$. Denoting the marginal CDF
of $KT_{t+i\Delta t}$ as $F_i$ ($i = 0, \ldots, M-1$),
the M-dimensional joint CDF $F$ can be linked to the marginal
distributions via a Gaussian copula $C$,

$$F(kt_t, \ldots, kt_{t+(M-1)\Delta t}) = C\big(F_0(kt_t), \ldots, F_{M-1}(kt_{t+(M-1)\Delta t})\big). \tag{1}$$

Since a copula requires all marginal distributions to be uniform
within [0, 1], it can be further expressed as

$$C(u_0, \ldots, u_{M-1}) = \Phi_R\big(\Phi^{-1}(u_0), \ldots, \Phi^{-1}(u_{M-1})\big), \tag{2}$$

where $u_i = F_i(kt_{t+i\Delta t})$ are uniformly distributed random variables
transformed from $KT_{t+i\Delta t}$, $\Phi^{-1}$ is the inverse CDF of a standard
univariate normal distribution, and $\Phi_R$ is the joint CDF of a
multivariate normal distribution constrained by the M×M
correlation matrix $R$. Since random and correlated samples $z_0, \ldots, z_{M-1}$ can
be drawn from a multivariate Gaussian generator (e.g. using
random.multivariate_normal in the Python numpy
package), they can then be converted to scenarios of kt via the
inverse marginal CDF,

$$kt_{t+i\Delta t} = F_i^{-1}\big(\Phi(z_i)\big). \tag{3}$$
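The sampling procedure described above (draw correlated normal samples, map them to uniforms, then apply the inverse marginal CDF) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name is hypothetical, a single empirical marginal CDF is assumed for all M time points, and the inverse CDF is approximated by the empirical quantile function of a reference kt sample.

```python
import numpy as np
from math import erf, sqrt

def sample_kt_gaussian_copula(corr, kt_reference, n_samples=1, seed=None):
    """Draw kt scenarios of length M from a Gaussian copula.

    corr         : (M, M) correlation matrix
    kt_reference : 1-D sample of measured kt defining the marginal CDF
                   (assumed identical for all M time points)
    """
    rng = np.random.default_rng(seed)
    m = corr.shape[0]
    # Correlated standard normal samples, shape (n_samples, M)
    z = rng.multivariate_normal(np.zeros(m), corr, size=n_samples)
    # Standard normal CDF maps each sample to a uniform value in [0, 1]
    u = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
    # Empirical inverse marginal CDF (quantile function)
    return np.quantile(np.asarray(kt_reference, dtype=float), u)

# Usage with a made-up exponentially decaying correlation and synthetic kt
lags = np.abs(np.subtract.outer(np.arange(60), np.arange(60)))
corr_demo = 0.95 ** lags
kt_demo = np.random.default_rng(0).uniform(0.1, 1.3, 5000)
scenarios = sample_kt_gaussian_copula(corr_demo, kt_demo, n_samples=5, seed=1)
```

Because the marginal is imposed through the empirical quantile function, generated values always stay within the range of the reference sample.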
Figure 2. Boulder site: (left) Color plot of correlation matrix of kt;
(right) The decay of correlation coefficient with Δt on the left y-axis
and the CDF of kt on the right y-axis.
As shown in Figure 2, the correlation matrix $R$ and the
CDF of kt can be obtained from the measured GHI time series
and the SolarAnywhere clear-sky GHI. The functional decay
of correlation with Δt (i.e. the autocorrelation function) is
further determined empirically by aggregating the correlation
coefficients in $R$ that share the same Δt and taking their median.
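The empirical estimation described above can be sketched as follows, assuming a gap-free 1-min kt series (real data would first need night-time and missing values removed). The function names are hypothetical, and rebuilding a matrix from the median decay is an illustrative assumption about how the decay is used afterwards.

```python
import numpy as np

def hourly_blocks(kt_1min, m=60):
    """Reshape a 1-min kt series into rows of m consecutive samples."""
    kt_1min = np.asarray(kt_1min, dtype=float)
    n = len(kt_1min) // m
    return kt_1min[: n * m].reshape(n, m)

def correlation_matrix(blocks):
    """M x M correlation between the minute positions within a block."""
    return np.corrcoef(blocks, rowvar=False)

def median_correlation_by_lag(corr):
    """Median of all correlation coefficients that share the same lag."""
    m = corr.shape[0]
    return np.array([np.median(np.diagonal(corr, k)) for k in range(m)])

def toeplitz_from_decay(decay):
    """Rebuild a correlation matrix that depends on the lag only."""
    idx = np.arange(len(decay))
    return decay[np.abs(idx[:, None] - idx[None, :])]
```

A matrix rebuilt from medians is not guaranteed to be positive semidefinite, so it may need a check or a small correction before being passed to a multivariate normal generator.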
C. Nonparametric Downscaling
Figure 3. Boulder site: (left) autocorrelation function and (right) CDF plots for
As mentioned earlier, the main difference between this study and
 in terms of downscaling is that we take a nonparametric
approach. Specifically, for one station: (1) we calculate the
hourly-averaged CSI values, denoted as $\overline{kt}$; (2) we divide $\overline{kt}$
into 10 quantile groups of equal size (i.e. with quantile ranges
of [0, 0.1], …, [0.9, 1]); (3) for each group, we calculate the
corresponding autocorrelation function and CDF individually; (4) we linearly
interpolate the autocorrelation function and CDF for 26 $\overline{kt}$
values, i.e. 0.05, 0.1, …,
1.3. Note that we set the boundary values
We repeat the above procedures for all 10 reference stations.
Figure 4 shows the convergence of 10 sites for the autocorrelation function and CDF
at an hourly average $\overline{kt}$ = 0.5. The convergence is generally good except when
is marginal and thus data density is low. It is possible to further
model the autocorrelation function and CDF based on location (e.g., coordinates) but
we are not pursuing that direction for model simplicity. Instead,
we obtain the location-agnostic autocorrelation and CDF
information by taking the median of all sites as shown in Figure 5.
Figure 4. Autocorrelation functions and CDF at hourly-averaged $\overline{kt}$ = 0.5 for 10
Figure 5. Median of all sites: (top) autocorrelation function and (bottom) CDF color plots
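The per-station procedure and the median across sites can be sketched as follows. This is a rough illustration under several assumptions that the text does not specify: each CDF is stored as quantiles at fixed probabilities, each quantile group is represented by its mean hourly average, and values outside the range covered by the groups are held constant (the default behaviour of `np.interp`). All names are hypothetical.

```python
import numpy as np

KT_GRID = np.linspace(0.05, 1.3, 26)   # 26 hourly-average values
PROBS = np.linspace(0.0, 1.0, 101)     # probabilities at which the CDF is stored

def station_lookup(blocks, n_groups=10):
    """blocks: (n_hours, M) fine-scale kt, one row per hour, no NaNs.

    Returns (acf_grid, q_grid) with shapes (26, M) and (26, 101).
    """
    kt_bar = blocks.mean(axis=1)                        # step (1)
    edges = np.quantile(kt_bar, np.linspace(0, 1, n_groups + 1))
    group = np.clip(np.searchsorted(edges, kt_bar, side="right") - 1,
                    0, n_groups - 1)                    # step (2)
    m = blocks.shape[1]
    centers, acfs, quants = [], [], []
    for g in range(n_groups):                           # step (3)
        sel = blocks[group == g]
        corr = np.corrcoef(sel, rowvar=False)
        acfs.append([np.median(np.diagonal(corr, k)) for k in range(m)])
        quants.append(np.quantile(sel.ravel(), PROBS))
        centers.append(kt_bar[group == g].mean())
    centers, acfs, quants = map(np.asarray, (centers, acfs, quants))
    # step (4): linear interpolation along the hourly average
    acf_grid = np.stack([np.interp(KT_GRID, centers, acfs[:, k])
                         for k in range(m)], axis=1)
    q_grid = np.stack([np.interp(KT_GRID, centers, quants[:, j])
                       for j in range(len(PROBS))], axis=1)
    return acf_grid, q_grid

def median_over_sites(per_site_arrays):
    """Location-agnostic lookup: element-wise median across stations."""
    return np.median(np.stack(per_site_arrays), axis=0)
```

Calling `station_lookup` once per station and passing the resulting arrays to `median_over_sites` gives one autocorrelation table and one quantile table indexed by the hourly average.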
Then, downscaling the hourly average $\overline{kt}$ consists of interpolating
from Figure 5 to get the corresponding autocorrelation function
and CDF and then generating the downscaled time series using
Equations (1) and (3). In practice, we generate 100 samples for
each hourly segment and then select one sample based on two criteria: (1)
the transition from the previous hourly segment to the generated
hourly segment should be smooth; (2) the average of the
generated hourly segment should be as close to $\overline{kt}$ as possible.
Figure 6. One example day at Boulder: (top) 1-hour GHI measurement; (middle top) 1-min GHI measurement; (middle bottom) 1-min downscaled
GHI from hourly averages; and (bottom) 10-s downscaled GHI from hourly averages.
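The selection among the generated candidates can be sketched as follows. The text names the two criteria but not how they are combined, so the weighted sum below, the weights, and the function name are assumptions made for illustration only.

```python
import numpy as np

def select_candidate(candidates, kt_bar, prev_last=None,
                     w_jump=1.0, w_mean=1.0):
    """Pick one candidate segment out of several.

    candidates : (n_candidates, M) generated kt segments for one hour
    kt_bar     : target hourly-averaged kt
    prev_last  : last kt value of the previous hourly segment, or None
    The weighted-sum score is a hypothetical way to combine the criteria.
    """
    candidates = np.asarray(candidates, dtype=float)
    # Criterion (2): segment mean close to the hourly average
    mean_err = np.abs(candidates.mean(axis=1) - kt_bar)
    # Criterion (1): smooth transition from the previous segment
    if prev_last is None:
        jump = np.zeros(len(candidates))
    else:
        jump = np.abs(candidates[:, 0] - prev_last)
    score = w_jump * jump + w_mean * mean_err
    best = int(np.argmin(score))
    return candidates[best], best

# Example: three made-up two-point candidates for an hour with mean 0.5
segment, index = select_candidate(
    [[0.2, 0.2], [0.5, 0.5], [0.9, 0.1]], kt_bar=0.5, prev_last=0.5)
```

In the example, the second candidate is chosen because it both matches the hourly mean and starts at the previous segment's last value.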
A downscaled GHI time series is shown in Figure 6. By
interpolating the autocorrelation function and CDF to the given
hourly average $\overline{kt}$, this nonparametric
procedure is capable of downscaling $\overline{kt}$ to an arbitrarily fine
time scale. The simulated time series in 1-min and 10-s
resolution are similar to the 1-min measurement in the sense
that (1) they share the same large-scale trends (i.e. hourly
means) and (2) the extent of variability changes with the hourly average
$\overline{kt}$ and is greatest when $\overline{kt}$ is intermediate. In addition, we quantify the
cross-scale variabilities by calculating the power spectra of kt
and show that the power spectra of the 1-min measurement and
1-min simulation match each other closely across the frequency
range up to the Nyquist frequency (i.e. 0.5 min$^{-1}$).
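A power spectrum comparison of this kind can be sketched with a plain periodogram. The estimator actually used in the study is not stated, so the unwindowed FFT periodogram and the function name below are assumptions.

```python
import numpy as np

def power_spectrum(x, dt_minutes=1.0):
    """One-sided periodogram of a series sampled every dt_minutes.

    Returns frequencies in min^-1 (zero frequency excluded) and the
    corresponding spectral estimates. For an even number of samples the
    largest frequency is the Nyquist frequency, 0.5 / dt_minutes.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the mean before the FFT
    n = len(x)
    spec = np.abs(np.fft.rfft(x)) ** 2 * dt_minutes / n
    freqs = np.fft.rfftfreq(n, d=dt_minutes)
    return freqs[1:], spec[1:]

# Example: a synthetic 10-minute oscillation sampled at 1 min
t = np.arange(600)
freqs, spec = power_spectrum(np.sin(2 * np.pi * 0.1 * t))
```

Applying the same function to measured and simulated kt series and plotting both on log-log axes gives a comparison across time scales.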
We also model the power generation resulting
from observed and copula-downscaled GHI to validate the
approach. We use PVLIB to model a hypothetical
horizontal single-axis tracking PV system, and then calculate
the power loss due to inverter clipping under various DC:AC
ratios. We follow the same technical procedures as described in
our companion paper . Figure 8 demonstrates the superiority
of the downscaled 1-min time series over the hourly averaged
observation data for those scenarios with high DC:AC ratios.
Whereas the clipping loss error of hourly observation data
monotonically increases with the DC:AC ratio, it plateaus at around
1% underestimation for DC:AC > 1.6 with copula-downscaled
data, which supports the capability of our copula approach to
faithfully reproduce intra-hour variabilities.
Figure 7. Power spectra of kt for 1-min and 1-hour measurement and 1-
Figure 8. Box plots of relative error of curtailed AC power estimation
using interpolated hourly-averaged observation and 1-min copula-
downscaled time series for DC:AC ranging from 1.1 to 2.0.
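The reason hourly averages underestimate clipping loss can be illustrated with a deliberately simplified sketch. It is not the PVLIB model used in the study: DC power is given directly in per-unit of DC capacity, the inverter is an ideal limiter, and all names and numbers are made up.

```python
import numpy as np

def clipping_loss_fraction(dc_power, dc_ac_ratio, dc_capacity=1.0):
    """Fraction of DC energy lost when AC output is limited.

    The AC limit is dc_capacity / dc_ac_ratio. Inverter efficiency and
    temperature effects are ignored (simplifying assumption).
    """
    dc_power = np.asarray(dc_power, dtype=float)
    ac_limit = dc_capacity / dc_ac_ratio
    ac_power = np.minimum(dc_power, ac_limit)
    return 1.0 - ac_power.sum() / dc_power.sum()

# One synthetic hour alternating between 0.9 and 0.3 per-unit DC power
fine = np.tile([0.9, 0.3], 30)            # 60 one-minute values
hourly = np.array([fine.mean()])          # single hourly average (0.6)

loss_fine = clipping_loss_fraction(fine, dc_ac_ratio=1.5)
loss_hourly = clipping_loss_fraction(hourly, dc_ac_ratio=1.5)
```

With a DC:AC ratio of 1.5 the AC limit is about 0.667 per unit, so the hourly average of 0.6 shows no clipping at all while the 1-min series is clipped during every high-irradiance minute.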
In this study, we have applied Gaussian copula techniques,
which have already proven very useful for wind energy applications,
to the temporal downscaling of solar irradiance.
Building on previous studies, we have proposed a purely
nonparametric approach with the autocorrelation and
distribution functions empirically derived from 10
measurement stations across the continental United States. We
have demonstrated that the simulated 1-min time series is able
to accurately reproduce the power spectra of the original 1-min
observation data. In addition, the copula-downscaled data
generally perform well in estimating AC power when the
DC:AC ratio is high. This is in clear contrast to using hourly-
averaged observations, which leads to a monotonic increase in
error with increasing DC:AC ratio.
It is possible to further improve this method. For example,
the copula parameters as shown in Figure 5 can be further
regressed from location information or be tuned for individual
applications such as power estimation. In general, these results
demonstrate the effectiveness of the proposed methodology to
meet customer needs for high-frequency and high-quality
 K. Bradford, R. Walker, D. Moon and M. Ibanez, “A regression model
to correct for intra-hourly irradiance variability bias in solar energy
models”, 2020 47th IEEE Photovoltaic Specialists Conference (PVSC),
2020, pp. 2679-2682.
 M. Ahlstrom, J. Mays, E. Gimon, A. Gelston, C. Murphy, P. Denholm,
and G. Nemet, "Hybrid Resources: Challenges, Implications,
Opportunities, and Innovation," in IEEE Power and Energy Magazine,
vol. 19, no. 6, pp. 37-44, Nov.-Dec. 2021
 Munkhammar, J. and Widén, J. (2021), “Established mathematical
approaches for synthetic solar irradiance data generation,” in Bright, J.
M. (ed.), Synthetic Solar Irradiance: Modeling Solar Data, Melville,
New York: AIP Publishing, pp. 3-1–3-36
 Widén, J. and Munkhammar, J. (2019), “Spatio-temporal downscaling of
hourly solar irradiance data using Gaussian copulas,” Proceedings of the
46th IEEE Photovoltaic Specialists Conference (PVSC), Chicago, USA,
16–21 June 2019.
 P. Keelin, A. Kubiniec, A. Bhat, M. Perez, J. Dise, R. Perez and J.
Schlemmer, (2021) "Quantifying the solar impacts of wildfire smoke in
western North America," 2021 IEEE 48th Photovoltaic Specialists
Conference (PVSC), 2021, pp. 1401-1404, doi:
 Sklar, A. (1959) ‘Fonctions de répartition à n dimensions et leurs
marges’, Publ. Inst. Statist. Univ. Paris, 8, pp. 229–231.
 J. Huang, R. Perez, J. Schlemmer, A. Kubiniec, M. Perez, A. Bhat and
P. Keelin, (2022) "Enhancing temporal variability of 5-minute satellite-
derived solar irradiance data", 2022 IEEE 49th Photovoltaic Specialists
Conference (PVSC), Philadelphia, PA, USA.