All content in this area was uploaded by Jing Huang on Jun 14, 2022

Nonparametric Temporal Downscaling of GHI Clear-sky Indices using Gaussian Copula

Jing Huang1, Marc Perez1, Richard Perez2, Dazhi Yang3, Patrick Keelin1 and Tom Hoff1

1Clean Power Research, Napa, CA, USA

2Atmospheric Sciences Research Center, SUNY, Albany, NY, USA

3School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin, Heilongjiang, China

Abstract—Small-scale variabilities of solar irradiance are

important for many applications. Downscaling approaches need

to be developed where only the averaged state of solar irradiance

is known. In this study, we investigate the use of copula for

temporally downscaling GHI clear-sky indices. With the

correlation structure and distribution information derived from

measurements at 10 stations across the United States, the copula

approach is capable of downscaling clear-sky indices from hourly

averages to an arbitrarily fine scale while preserving their original

power spectra.

Keywords—solar irradiance, clear-sky index, variability,

photovoltaics, downscaling, copula, power spectra

I. METHODOLOGY

It generally holds that modeling accuracy increases with the

granularity of the available data feed. For solar applications, the

requirement for granular solar datasets is often not met by either

sparse ground measurement stations or geostationary satellites.

For example, the solar industry has found that minute-level solar

irradiance data are preferred to inform critical investment

decision-making for utility-scale solar photovoltaic (PV) farms,

particularly when the DC capacity of the PV modules is much

greater than the AC capacity of the inverters [1]. The

distributed nature of solar power generators and the increasing

need for customized and accurate control of hybrid systems as

envisioned by [2] also call for high-resolution solar data. As

such, downscaling methods are potentially useful to produce

realistic small-scale variabilities.

Many methods have been proposed for synthetic solar data

generation, ranging from Fourier time series

and Markov chain models to computation-intensive machine

learning (see [3] for a review). In this study, we adopt a

statistical tool called a copula, which has recently been applied to

model the clear-sky index (CSI) of solar irradiance.

Widén and Munkhammar [4] downscaled hourly CSI by

modeling its distribution as a two-state Gaussian mixture model

and assuming an exponential decay of correlation. In contrast,

we employ a nonparametric downscaling approach with the key

information of distribution and correlation being derived from

measurements at 10 reference ground stations.

A. Ground stations for reference and validation

Figure 1 provides information on the reference ground

stations used in this study. The measured global horizontal

irradiance (GHI) data span the entire year of 2020 with a

temporal resolution of 1 minute. In addition, we obtain the

corresponding clear-sky GHI at all the reference stations from

SolarAnywhere® V3.5 [5]. The CSI, or kt, is then calculated as

the ratio of GHI to the clear-sky GHI. We cap kt values at 1.3.
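
The CSI calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the sample values and the night-time threshold `min_clear` are assumptions introduced here; only the ratio and the cap of 1.3 come from the text.

```python
import numpy as np

def clear_sky_index(ghi, ghi_clear, cap=1.3, min_clear=1.0):
    """Return kt = GHI / clear-sky GHI, capped at `cap`.

    `min_clear` (W/m^2) is a hypothetical night-time threshold that is not
    taken from the paper: samples with weaker clear-sky GHI become NaN.
    """
    ghi = np.asarray(ghi, dtype=float)
    ghi_clear = np.asarray(ghi_clear, dtype=float)
    kt = np.full(ghi.shape, np.nan)
    daytime = ghi_clear >= min_clear
    kt[daytime] = ghi[daytime] / ghi_clear[daytime]
    # Cap the index; NaN entries stay NaN
    return np.minimum(kt, cap)

# Illustrative values only (W/m^2)
ghi = np.array([0.0, 250.0, 900.0, 1100.0])
ghi_clear = np.array([0.0, 500.0, 800.0, 800.0])
kt = clear_sky_index(ghi, ghi_clear)
```

Values of kt above 1 can occur during cloud-enhancement events, which is why the cap is applied above unity rather than at it.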

Figure 1. 10 reference ground stations in the eastern United States are

used in this study.

B. Gaussian Copula

The concept of copula was originally proposed by Sklar [6].

A copula is a joint multivariate cumulative distribution function

(CDF) whose marginal probability distributions are all uniform

within [0, 1]. A Gaussian copula simply implies that the dependence

structure is that of a multivariate Gaussian distribution.

Consider M related random variables of the CSI (kt), $KT_t, KT_{t+\Delta t}, \ldots, KT_{t+(M-1)\Delta t}$, where each random variable represents a relative temporal point in a time period of $M\Delta t$ for an arbitrary site. For example, if we split 1-min kt values at Boulder into hourly blocks, $KT_t$ represents a random variable denoting the kt values at the first minute of an hourly block, $KT_{t+(M-1)\Delta t}$ represents that at the last minute, and $M = 60$. Denoting the marginal CDF of KT as

$$F_{KT}(kt) = P(KT \le kt), \qquad (1)$$

the M-dimensional joint CDF $F$ can be linked to the marginal distributions via a Gaussian copula $C$,

$$F(kt_1, \ldots, kt_M) = C\left(F_{KT}(kt_1), \ldots, F_{KT}(kt_M)\right). \qquad (2)$$

Since a copula requires all marginal distributions to be uniform within [0, 1], it can be further expressed as

$$C(u_1, \ldots, u_M) = \Phi_R\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_M)\right), \qquad (3)$$

where $u_i = F_{KT}(kt_i)$ are uniformly distributed random variables transformed from $kt_i$, $\Phi^{-1}$ is the inverse CDF of a standard univariate normal distribution, and $\Phi_R$ is the joint CDF of a multivariate normal distribution constrained by the M×M correlation matrix $R$. Since random and correlated samples can be drawn from a multivariate Gaussian generator (e.g. using random.multivariate_normal in the Python numpy package), they can then be converted to scenarios of kt via the inverse marginal CDF $F_{KT}^{-1}$.
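
The sampling route just described (correlated Gaussian draws, mapped to uniforms, then mapped through the inverse marginal CDF) might look like the sketch below. It is an illustration under stated assumptions rather than the authors' implementation: the function name, the exponentially decaying correlation matrix and the quadratic marginal CDF are placeholders chosen only to make the example runnable, whereas the paper derives both inputs from measurements.

```python
import math
import numpy as np

def sample_kt_scenarios(corr, kt_grid, cdf_grid, n_samples, rng):
    """Draw kt scenarios from a Gaussian copula.

    corr     : (M, M) correlation matrix of the M time steps
    kt_grid  : increasing kt values at which the marginal CDF is tabulated
    cdf_grid : CDF values on kt_grid (increasing, within [0, 1])
    """
    m = corr.shape[0]
    # 1) Correlated standard-normal samples, shape (n_samples, M)
    z = rng.multivariate_normal(np.zeros(m), corr, size=n_samples)
    # 2) The standard normal CDF maps each value to a uniform in [0, 1]
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    # 3) Inverse marginal CDF by linear interpolation of the tabulated CDF
    return np.interp(u, cdf_grid, kt_grid)

# Placeholder inputs (assumptions, not values from the paper)
M = 60
lags = np.abs(np.subtract.outer(np.arange(M), np.arange(M)))
corr = np.exp(-lags / 20.0)            # hypothetical correlation decay
kt_grid = np.linspace(0.0, 1.3, 27)
cdf_grid = (kt_grid / 1.3) ** 2        # hypothetical marginal CDF

rng = np.random.default_rng(0)
scenarios = sample_kt_scenarios(corr, kt_grid, cdf_grid, 100, rng)
```

Each row of `scenarios` is one candidate 60-step kt series. Because step 3 is monotone, the rank dependence imposed by the correlation matrix carries over to the kt values.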

Figure 2. Boulder site: (left) Color plot of correlation matrix of kt;

(right) The decay of correlation coefficient with Δt on the left y-axis

and the CDF of kt on the right y-axis.

As shown in Figure 2, the correlation matrix and the

CDF of kt can be obtained from the measured GHI time series

and the SolarAnywhere clear-sky GHI. The functional decay

of correlation with Δt is further determined empirically by

aggregating and taking medians of the correlation coefficients

with the same Δt from the correlation matrix.
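
One way to carry out this aggregation is sketched below, assuming the kt series has already been reshaped into one row per hourly block. The function name and the synthetic first-order autoregressive test data are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def correlation_decay(kt_blocks):
    """kt_blocks: (n_hours, M) array, one row per hourly block of kt.

    Returns the M x M correlation matrix between within-hour positions and
    the median correlation coefficient for each lag 0..M-1.
    """
    corr = np.corrcoef(kt_blocks, rowvar=False)
    m = corr.shape[0]
    # The k-th diagonal holds every coefficient whose positions are k steps apart
    decay = np.array([np.median(np.diagonal(corr, offset=k)) for k in range(m)])
    return corr, decay

# Synthetic stand-in for measured blocks: an AR(1) process with unit variance
rng = np.random.default_rng(0)
n_hours, m, phi = 500, 12, 0.9
blocks = np.empty((n_hours, m))
blocks[:, 0] = rng.standard_normal(n_hours)
for j in range(1, m):
    noise = rng.standard_normal(n_hours)
    blocks[:, j] = phi * blocks[:, j - 1] + np.sqrt(1 - phi**2) * noise

corr, decay = correlation_decay(blocks)
```

Here `decay[k]` plays the role of the correlation coefficient at a lag of k time steps, and it starts at 1 for lag zero.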

C. Nonparametric Downscaling

Figure 3. Boulder site: (left) autocorrelation function and (right) CDF plots for

different hourly-averaged kt values.

As mentioned earlier, the main difference of this study from

[4] in terms of downscaling is that we take a nonparametric

approach. Specifically, for one station: (1) we calculate the

hourly-averaged CSI values; (2) we divide these hourly averages

into 10 quantile groups of equal size (i.e. with quantile ranges

of [0, 0.1], …, [0.9, 1]); (3) for each group, we calculate the

corresponding autocorrelation function and CDF individually; (4) we linearly

interpolate the autocorrelation function and CDF for 26

hourly-averaged CSI values, i.e. 0.05, 0.1, …,

1.3. Note that we set the boundary values

1, and

and

.
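
Steps (1) to (4) can be sketched for the CDF as below; the autocorrelation function could be handled in the same way. This is an illustrative sketch, not the authors' code. The function names, the synthetic data and the choice of the group mean as each group's representative hourly average are assumptions. Outside the range of the group means, `np.interp` simply holds the end values, which is not necessarily the boundary treatment used in the paper.

```python
import numpy as np

def empirical_cdf(values, kt_grid):
    """Empirical CDF of `values`, evaluated on `kt_grid`."""
    values = np.sort(np.ravel(values))
    return np.searchsorted(values, kt_grid, side="right") / values.size

def cdf_by_hourly_mean(kt_blocks, kt_grid, targets, n_groups=10):
    # Step (1): hourly-averaged kt of each block
    hourly_mean = kt_blocks.mean(axis=1)
    # Step (2): equal-size quantile groups, labelled 0 .. n_groups - 1
    edges = np.quantile(hourly_mean, np.linspace(0, 1, n_groups + 1))
    labels = np.searchsorted(edges[1:-1], hourly_mean, side="right")
    # Representative hourly average per group (assumption: the group mean)
    centers = np.array([hourly_mean[labels == g].mean() for g in range(n_groups)])
    # Step (3): one CDF per group, pooled over all fine-scale samples in the group
    cdfs = np.array([empirical_cdf(kt_blocks[labels == g], kt_grid)
                     for g in range(n_groups)])
    # Step (4): interpolate each CDF ordinate linearly in the hourly average
    cdf_interp = np.column_stack([np.interp(targets, centers, cdfs[:, j])
                                  for j in range(kt_grid.size)])
    return labels, centers, cdf_interp

# Synthetic stand-in for one station's 1-min kt blocks
rng = np.random.default_rng(0)
level = rng.uniform(0.1, 1.2, 400)
kt_blocks = np.clip(level[:, None] + 0.1 * rng.standard_normal((400, 60)), 0.0, 1.3)

kt_grid = np.linspace(0.0, 1.3, 53)
targets = 0.05 * np.arange(1, 27)      # 0.05, 0.10, ..., 1.30
labels, centers, cdf_interp = cdf_by_hourly_mean(kt_blocks, kt_grid, targets)
```

Row i of `cdf_interp` is the interpolated CDF for the i-th target hourly average. Since each row is a weighted average of two group CDFs, it remains non-decreasing in kt.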

We repeat the above procedures for all 10 reference stations.

Figure 4 shows the convergence of the 10 sites for the autocorrelation

function and CDF at an hourly-averaged kt of 0.5. The convergence is

generally good except at the margins, where data density is low.

It is possible to further model the autocorrelation function and CDF

based on location (e.g., coordinates), but

we do not pursue that direction, for the sake of model simplicity. Instead,

we obtain the location-agnostic autocorrelation and CDF

information by taking the median of all sites, as shown in Figure

5.

Figure 4. Autocorrelation functions and CDF at an hourly-averaged kt of 0.5 for 10

individual sites.

Figure 5. Median of all sites: (top) autocorrelation function and (bottom) CDF color plots

for different hourly-averaged kt values.

Then, downscaling an hourly-averaged kt value simply involves

interpolating Figure 5 at that value to get the corresponding

autocorrelation function and CDF, and then generating the downscaled

time series using Equations (1) and (3). In practice, we generate 100

samples for each hourly average and then select one sample based on

two criteria: (1) the transition from the previous hourly segment to the

generated hourly segment should be smooth; (2) the average of the

generated hourly segment should be as close to the given hourly

average as possible.

Figure 6. One example day at Boulder: (top) 1-hour GHI measurement; (middle top) 1-min GHI measurement; (middle bottom) 1-min downscaled

GHI from hourly averages; and (bottom) 10-s downscaled GHI from hourly averages.
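
The text does not say how the two selection criteria are combined, so the sketch below makes an explicit assumption: each candidate is scored by the size of the jump from the previous segment plus a weighted deviation of its mean from the target, and the lowest score wins. The function name, the additive score and the `weight` parameter are all hypothetical.

```python
import numpy as np

def select_sample(candidates, prev_last_kt, target_mean, weight=1.0):
    """Pick one row of `candidates`, shape (n_samples, M).

    The additive score and `weight` are illustrative assumptions, not the
    selection rule used in the paper.
    """
    # Criterion (1): small jump from the end of the previous hourly segment
    jump = np.abs(candidates[:, 0] - prev_last_kt)
    # Criterion (2): segment mean close to the given hourly average
    bias = np.abs(candidates.mean(axis=1) - target_mean)
    best = np.argmin(jump + weight * bias)
    return candidates[best]

# Three toy candidates with M = 3 steps each
candidates = np.array([[0.2, 0.4, 0.6],
                       [0.5, 0.5, 0.5],
                       [0.9, 0.5, 0.1]])
chosen = select_sample(candidates, prev_last_kt=0.5, target_mean=0.5)
```

In this toy case the second candidate is chosen, since it starts exactly at the previous value and its mean equals the target.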

II. VALIDATION

A downscaled GHI time series is shown in Figure 6. By

interpolating the autocorrelation function and CDF to the given

hourly-averaged kt, this nonparametric

procedure is capable of downscaling hourly averages to an arbitrarily fine

time scale. The simulated time series at 1-min and 10-s

resolution are similar to the 1-min measurement in the sense

that (1) they share the same large-scale trends (i.e. hourly

means) and (2) the extent of variability changes with the

hourly-averaged kt, with a

peak when it is intermediate. In addition, we quantify the

cross-scale variabilities by calculating the power spectra of kt

and show that the power spectra of the 1-min measurement and

1-min simulation match each other closely across the frequency

range up to the Nyquist frequency (i.e. 0.5 min⁻¹), as shown in Figure 7.
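
A power spectrum of this kind can be estimated with a simple periodogram, as sketched below. The paper does not state which spectral estimator was used, so the estimator, the function name and the sinusoidal test signal are assumptions for illustration only.

```python
import numpy as np

def power_spectrum(x, dt_minutes=1.0):
    """Periodogram of a series sampled every `dt_minutes` minutes."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                 # remove the mean (zero frequency)
    freqs = np.fft.rfftfreq(x.size, d=dt_minutes)    # cycles per minute
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    return freqs, power

# One day of 1-min samples containing a single 60-min oscillation
t = np.arange(1440)
signal = 0.7 + 0.2 * np.sin(2 * np.pi * t / 60.0)
freqs, power = power_spectrum(signal)
peak_freq = freqs[np.argmax(power)]
```

For 1-min sampling the last entry of `freqs` is the Nyquist frequency of 0.5 cycles per minute, and the peak falls at 1/60 cycles per minute, matching the 60-min oscillation in the test signal.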

In addition, we model the power generation resulting

from observed and copula-downscaled GHI to validate the

performance of the approach. We use PVLIB to model a hypothetical

horizontal single-axis tracking PV system, and then calculate

the power loss due to inverter clipping under various DC:AC

ratios. We follow the same technical procedures as described in

our companion paper [7]. Figure 8 demonstrates the superiority

of the downscaled 1-min time series over the hourly averaged

observation data for those scenarios with high DC:AC ratios.

Whereas the clipping loss error of the hourly observation data

increases monotonically with the DC:AC ratio, that of the

copula-downscaled data plateaus at around 1% underestimation

for DC:AC > 1.6, which verifies the capability of our copula approach to

faithfully reproduce intra-hour variabilities.

Figure 7. Power spectra of kt for 1-min and 1-hour measurement and 1-min simulation.
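
The reason hourly averages underestimate clipping loss can be seen in a toy calculation. The sketch below does not use PVLIB and does not model a real PV system: DC output is a made-up per-unit series, and the inverter is reduced to a hard limit. All names and numbers are illustrative assumptions.

```python
import numpy as np

def clipping_loss_fraction(dc_power, ac_limit):
    """Fraction of DC energy lost when output is clipped at `ac_limit`."""
    dc_power = np.asarray(dc_power, dtype=float)
    clipped = np.minimum(dc_power, ac_limit)
    return 1.0 - clipped.sum() / dc_power.sum()

# One hypothetical hour: DC output alternates between 0.4 and 1.0
# (per unit of DC capacity) from minute to minute
dc_1min = np.tile([0.4, 1.0], 30)
# Same energy with the intra-hour variability averaged out
dc_hourly = np.full(60, dc_1min.mean())

ac_limit = 1.0 / 1.6                   # inverter limit for a DC:AC ratio of 1.6
loss_1min = clipping_loss_fraction(dc_1min, ac_limit)
loss_hourly = clipping_loss_fraction(dc_hourly, ac_limit)
```

Because clipping only removes the peaks, averaging before clipping hides part of the loss: here `loss_hourly` is smaller than `loss_1min` even though both series carry the same DC energy.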

Figure 8. Box plots of relative error of curtailed AC power estimation

using interpolated hourly-averaged observation and 1-min copula-downscaled

time series for DC:AC ranging from 1.1 to 2.0.

III. CONCLUSION

In this study, we have applied Gaussian copula techniques

for temporal downscaling of solar irradiance, which have

already proven very useful for wind energy applications.

Building on previous studies, we have proposed a purely

nonparametric approach with the autocorrelation and

distribution functions heuristically derived from 10

measurement stations across the continental United States. We

have demonstrated that the simulated 1-min time series is able

to accurately reproduce the power spectra of the original 1-min

observation data. In addition, the copula-downscaled data

generally perform well in estimating AC power when the

DC:AC ratio is high. This is in clear contrast to using

hourly-averaged observations, which leads to a monotonic increase in

error with increasing DC:AC ratio.

It is possible to further improve this method. For example,

the copula parameters as shown in Figure 5 can be further

regressed from location information or be tuned for individual

applications such as power estimation. In general, these results

demonstrate the effectiveness of the proposed methodology to

meet customer needs for high-frequency and high-quality

irradiance data.

REFERENCES

[1] K. Bradford, R. Walker, D. Moon, and M. Ibanez, "A regression model to correct for intra-hourly irradiance variability bias in solar energy models," 2020 47th IEEE Photovoltaic Specialists Conference (PVSC), 2020, pp. 2679-2682.

[2] M. Ahlstrom, J. Mays, E. Gimon, A. Gelston, C. Murphy, P. Denholm, and G. Nemet, "Hybrid Resources: Challenges, Implications, Opportunities, and Innovation," IEEE Power and Energy Magazine, vol. 19, no. 6, pp. 37-44, Nov.-Dec. 2021.

[3] J. Munkhammar and J. Widén, "Established mathematical approaches for synthetic solar irradiance data generation," in Synthetic Solar Irradiance: Modeling Solar Data, J. M. Bright, Ed. Melville, New York: AIP Publishing, 2021, pp. 3-1–3-36.

[4] J. Widén and J. Munkhammar, "Spatio-temporal downscaling of hourly solar irradiance data using Gaussian copulas," Proceedings of the 46th IEEE Photovoltaic Specialists Conference (PVSC), Chicago, USA, 16–21 June 2019.

[5] P. Keelin, A. Kubiniec, A. Bhat, M. Perez, J. Dise, R. Perez, and J. Schlemmer, "Quantifying the solar impacts of wildfire smoke in western North America," 2021 IEEE 48th Photovoltaic Specialists Conference (PVSC), 2021, pp. 1401-1404, doi: 10.1109/PVSC43889.2021.9518440.

[6] A. Sklar, "Fonctions de répartition à n dimensions et leurs marges," Publ. Inst. Statist. Univ. Paris, vol. 8, pp. 229–231, 1959.

[7] J. Huang, R. Perez, J. Schlemmer, A. Kubiniec, M. Perez, A. Bhat, and P. Keelin, "Enhancing temporal variability of 5-minute satellite-derived solar irradiance data," 2022 IEEE 49th Photovoltaic Specialists Conference (PVSC), Philadelphia, PA, USA, 2022.