All content in this area was uploaded by Navid Haghdadi on Nov 20, 2015
Assessing the Representativeness of ‘Live’
Distributed PV Data for Upscaled PV Generation
Estimates
Navid Haghdadi
School of Photovoltaic and
Renewable Energy Engineering and
Centre for Energy and
Environmental Markets
UNSW Australia
Anna Bruce
School of Photovoltaic and
Renewable Energy Engineering and
Centre for Energy and
Environmental Markets
UNSW Australia
Iain MacGill
School of Electrical Engineering
and Telecommunications and
Centre for Energy and
Environmental Markets
UNSW Australia
Abstract—The incorporation of distributed PV generation data
into power system planning and operation is becoming
increasingly important as penetrations of PV systems on
Australian distribution networks continue to grow. However,
the availability of such data is currently very limited. The APVI
Live PV Map (Live Map) provides near real-time distributed PV
generation estimates in 57 different regions across Australia
based on some 6000 PV systems reporting their generation on-
line. This data has a wide range of potential applications
including, for example, network planning or PV performance
assessment. In this paper we investigate the characteristics of the
PV systems contributing to the Live Map database, in order to
assess its accuracy and suitability for providing total distributed
PV generation estimates for power system planning and
operational purposes. The study compares the sample of PV
systems contributing data to the Live Map database with the
total set of PV systems in Australia, according to the Clean
Energy Regulator’s (CER’s) database. Representativeness is
assessed in terms of PV system location, size, age, and inverter
manufacturer. The accuracy of the APVI Live Map PV
generation estimates for individual regions is assessed by
comparison with a separate database of historical interval
metered household PV generation from the Ausgrid network.
Finally, an example of the application of distributed PV data to
electricity network planning is provided to highlight the
potential value of these PV generation estimates.
Index Terms—Distributed PV data, upscaling, network planning
and operation, peak reduction
I. INTRODUCTION
There is a significant and growing penetration of PV
systems in Australian electricity distribution networks. The
total installed PV system capacity in early 2015 is some 100
times what it was in early 2009 [1], with many postcodes
already seeing PV system penetrations of greater than 40% of
households, and some postcodes greater than 50% [2]. As the
majority of these PV systems are not separately utility
metered, network businesses and other stakeholders have
limited visibility of these PV systems and their impact on the
power system.
The Australian Photovoltaic Institute (APVI) Live PV
Map [2] provides near real-time estimates of the average
performance (generation as a percentage of rated capacity) of
PV systems providing ‘live’ performance data across 57
different regions of Australia. The total generation from
distributed PV systems in each region can be estimated by
upscaling these generation measurements using total installed
PV capacity figures from the Clean Energy Regulator’s (CER)
database of installed PV systems.
To investigate the accuracy of the Live Map PV generation
estimates, we first assess how well the sample PV systems
represent the much larger set of PV systems in Australia by
comparing the characteristics of the sample with those of the
PV systems in the CER database, which comprises the great
majority of PV systems installed in Australia (since the CER
provides Renewable Energy Certificates to suitably registered
systems). We then compare PV generation estimates from the
Live Map against a separate historical database of PV
generation provided by Sydney’s main Distribution Network
Service Provider Ausgrid for several hundred household PV
systems across several regions of Sydney. Having established
the suitability of our upscaling approach, a case study
highlighting the potential use of such PV generation estimates
for network planning purposes is presented.
The rest of this paper is structured as follows: The APVI
Live Solar Map is introduced in Section II. The
representativeness of the sample of PV systems contributing to
the Live Map is assessed in Section III by comparison to the
CER database. In Section IV, the accuracy of the APVI PV
performance estimates is examined and in Section V an
example of the application of APVI PV performance traces is
provided, followed by conclusions in Section VI.
II. APVI LIVE SOLAR MAP
The APVI Live Solar Map offers near real-time state level
and 2-digit post code level PV performance estimates derived
from a sample of more than 6000 PV system inverters
installed in different locations across Australia that provide
on-line generation data. PV performance is defined here as the
total reported PV generation expressed as a percentage of the
rated capacity of all contributing PV systems in the region.
Where the total installed capacity of PV systems is known for
a region, as made available in the CER’s database, the real
time total PV generation by region can be estimated by
upscaling these average PV performance estimates. A key
question of course is how accurate, and hence suitable, this
upscaling approach is.
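The upscaling step described above can be illustrated with a minimal sketch. The function names and all numerical values are illustrative assumptions and are not taken from the Live Map implementation.

```python
def average_performance(generation_kw, rated_capacity_kw):
    """Total reported generation as a fraction of total rated capacity."""
    total_capacity = sum(rated_capacity_kw)
    if total_capacity <= 0:
        raise ValueError("no reporting capacity in this region")
    return sum(generation_kw) / total_capacity


def upscale(performance, installed_capacity_kw):
    """Estimate regional generation from sample performance and installed capacity."""
    return performance * installed_capacity_kw


# Hypothetical sample of three reporting systems in one region (kW).
sample_generation_kw = [1.0, 2.0, 1.0]
sample_capacity_kw = [2.0, 4.0, 2.0]

performance = average_performance(sample_generation_kw, sample_capacity_kw)
# Hypothetical installed capacity for the region, as would be read from the CER data.
regional_estimate_kw = upscale(performance, 10000.0)
print(performance, regional_estimate_kw)
```

Dividing summed generation by summed capacity weights each system by its size, which matches the definition of PV performance given in this section.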
A. Statistics:
Table I shows the number, capacity and distribution by
Australian State of PV systems contributing to the Live Map
database. Almost 2% of all installed PV capacity in Australia
is currently contributing to this database.
Table I Contribution of PV systems in each Australian State
           APVI Map              CER Database            APVI Map as % of CER
Region     Installations  kW     Installations  kW       Installations  kW
NSW+ACT    2610           30926  318159         915084   0.82%          3.4%
NT         43             2769   4304           19698    1.00%          14.1%
QLD        1651           15654  437118         1339109  0.38%          1.2%
SA         421            5263   183917         596651   0.23%          0.9%
TAS        130            1925   24747          84039    0.53%          2.3%
VIC        913            11653  255637         761615   0.36%          1.5%
WA         661            9723   181045         470602   0.37%          2.1%
Total      6429           77912  1404935        4186811  0.46%          1.9%
B. Data collection and preparation:
More than 6400 PV system inverters are contributing data
to the Live Map - around 2900 systems from PVOutput.org
[3], and another 3500 from the SMA Sunny Portal [4]
databases. Individual PV system output data has been
collected from PVOutput.org (the ‘sample’ data) since Sep
2009, and incorporated into the Live Map since May 2013.
The basic specifications of the installations are reported by the
owners, and can therefore be assessed for representativeness,
as discussed in the following section of this paper. The SMA
data was added to the Live Map in April of this year, and is
currently aggregated at the ‘2-digit postcode’ level (all
postcodes with the same first two digits are grouped e.g.
21XX), before being incorporated into the performance
estimates. Given this aggregation, these 3500 systems cannot
be assessed for representativeness at this time. From this point
on, therefore, ‘sample’ will be used to denote PVOutput.org
systems.
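The 2-digit postcode grouping described above can be sketched as follows. This is an illustration only: the function names and the example systems are assumptions, and the zero-padding step assumes postcodes may arrive as integers (for example, postcodes with a leading zero).

```python
from collections import defaultdict


def postcode_group(postcode):
    """Map a 4-digit postcode to its 2-digit group label, e.g. 2150 -> '21XX'."""
    return str(postcode).zfill(4)[:2] + "XX"


def performance_by_group(systems):
    """systems: iterable of (postcode, generation_kw, rated_capacity_kw) tuples."""
    generation = defaultdict(float)
    capacity = defaultdict(float)
    for postcode, gen_kw, cap_kw in systems:
        group = postcode_group(postcode)
        generation[group] += gen_kw
        capacity[group] += cap_kw
    # Performance per group: summed generation over summed rated capacity.
    return {g: generation[g] / capacity[g] for g in capacity if capacity[g] > 0}


# Hypothetical systems: (postcode, generation in kW, rated capacity in kW).
example_systems = [(2150, 1.0, 2.0), (2190, 2.0, 4.0), (2010, 1.0, 4.0)]
group_performance = performance_by_group(example_systems)
print(group_performance)
```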
Invalid data is of course inevitable due to monitoring and
communication errors. We control the quality of the data
through a range of techniques that seek to remove invalid data
such as physical outliers and constant values, and correct data
suffering identifiable problems such as time-shifted data
caused by incorrect daylight saving changes or local
monitoring time.
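Two of the checks mentioned above, physical outliers and constant values, can be sketched as below. The thresholds (`max_ratio`, `max_repeat`) and the function name are hypothetical choices for illustration and are not the values used for the Live Map. Correction of time-shifted data is not covered here.

```python
def flag_invalid(readings_kw, rated_kw, max_ratio=1.2, max_repeat=6):
    """Return a list of booleans, True where a reading looks invalid.

    A reading is flagged if it is negative, if it exceeds max_ratio times
    the rated capacity, or if it belongs to a run of at least max_repeat
    identical non-zero values (a likely stuck sensor). Runs of zeros are
    kept, since they occur naturally at night.
    """
    invalid = [r < 0 or r > max_ratio * rated_kw for r in readings_kw]
    run_start = 0
    for i in range(1, len(readings_kw) + 1):
        end_of_run = i == len(readings_kw) or readings_kw[i] != readings_kw[run_start]
        if end_of_run:
            run_length = i - run_start
            if run_length >= max_repeat and readings_kw[run_start] > 0:
                for j in range(run_start, i):
                    invalid[j] = True
            run_start = i
    return invalid


# Hypothetical readings for a 3 kW system.
readings = [0, 0, 0, 0, 0, 0, 0, 1.0, 1.5, 1.5, 1.5, 1.5, 9.0, -0.1]
flags = flag_invalid(readings, rated_kw=3.0, max_repeat=4)
print(flags)
```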
III. REPRESENTATIVENESS CHECKING
In this section we compare the characteristics of the
sample Live Map PV systems with the set of all Australian PV
systems in terms of installation parameters, size, age,
geographical location, and climate classification.
A. Tilt and Orientation
The optimum tilt and orientation angle for maximising
annual energy generation are easily calculated for a given
latitude. However, many small PV systems are installed at
non-optimal angles due to factors including house orientation
and roof slope. More than 58% of sample PV systems have
been reported to face toward north with another 27% facing
northeast or northwest. Most of the sample systems have a tilt
angle between 20 and 30 degrees, while about 40% of the
systems do not report tilt angle or report a likely wrong value.
Fig. 1 and Fig. 2 show the distribution of PV system
orientation and tilt angles in the sample dataset.
Figure 1. Distribution of orientation angle of PV systems – EW denotes
systems with two arrays, which partly face east and partly face west
Figure 2. Distribution of tilt angle of PV systems – NaN values are those that
have reported a likely wrong value, or have not reported
Unfortunately there is no data on the tilt and orientation of
PV systems in Australia to check the level of
representativeness of the sample with regard to these factors.
However, the distributions seem reasonable with regard to
Australian housing stock.
B. System age
PV system performance gradually declines over time,
mainly due to physical degradation of the PV modules. The
degradation rate is typically calculated as the percentage
decline in output per year. The median degradation rate for
standard PV modules is reported to be about 0.5% per year
[5]. The average age of systems that report the installation date
in our sample dataset is 2.3 years, while the average age of
Australian PV systems is about 3.3 years according to the CER
database. This one year age difference would imply the
sample systems would have on average about 0.5% better
performance compared to the average Australian PV system,
which can be considered a negligible difference. The
distribution of system age is shown in Fig. 3.
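The arithmetic behind the 0.5% figure can be reproduced in a few lines. The sketch assumes that degradation accumulates linearly with age, which is a simplification. The variable names are illustrative.

```python
MEDIAN_DEGRADATION_PER_YEAR = 0.005  # 0.5% per year, as reported in [5]
SAMPLE_MEAN_AGE_YEARS = 2.3          # systems in the sample dataset
CER_MEAN_AGE_YEARS = 3.3             # all systems in the CER database

# Assumption: output declines linearly with age at the median rate.
age_gap_years = CER_MEAN_AGE_YEARS - SAMPLE_MEAN_AGE_YEARS
expected_advantage = age_gap_years * MEDIAN_DEGRADATION_PER_YEAR

print(f"Age gap: {age_gap_years:.1f} years")
print(f"Expected performance advantage of the sample: {expected_advantage:.2%}")
```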
Figure 3. Age of systems in sample and all installed PV systems in Australia
C. System size
The performance of small scale and large scale PV
systems might differ due to a variety of factors including the
likelihood of better design, installation, monitoring, diagnosis,
and maintenance of large systems. Fig. 4 shows the
distribution of PV system size in the sample compared to the
CER database. Despite the generally larger size of the sample
PV systems, almost all installed PV systems in Australia
are small-scale household rooftop PV systems and it might be
expected that there will be minimal differences between the
performance of sample and CER PV systems based on system
size.
Figure 4. Distribution of PV system size in CER and PVoutput database
D. Geographical location
Fig. 5 shows the distribution of PV capacity being
monitored, compared to the total capacity of PV systems
installed in each 2-digit postcode region. Partitioning of the
Australian PV map has been done on the basis of 2-digit
postcode regions due to the availability of postcode-level PV
data from CER. This method of partitioning tends to result in
smaller 2-digit postcode regions with relatively dense
population in temperate or subtropical climate zones, and
large, sparsely populated regions in desert and tropical
climates. Small regions with a good PV system sample allow
for more accurate upscaling, as all of the PV systems in the
area tend to experience similar weather at the same time, even
in relatively changeable temperate and subtropical climates.
While some 2-digit postcode regions are very large, in many
cases, these regions have relatively homogeneous climates and
stable weather across a large area, facilitating accurate
upscaling from dispersed PV systems. In addition, the small
number of PV systems installed and monitored in these
sparsely populated regions will make further subdivision of
the regions particularly challenging. Information about the
distribution of PV systems in different Australian weather
classes can also be helpful for further analysis of different
technologies’ performance in different climates.
E. Distribution in different climate zones
PV system performance is mainly dependent on solar
irradiation, with temperature being the second most important
variable. Wind speed has a more minor impact on
performance that is largely linked to module temperature,
while humidity and other climate factors may also contribute
to the degradation rate [5]. While monitoring of PV systems
within a region is required to estimate historical performance
in that region, PV performance analysis and improved
modelling of climate effects also require consideration of the
distribution of PV systems across each climate class.
Figure 5. Total installed capacity of PV systems in each 2-digit postcode
region and percentage of capacity being monitored (circles)
Figure 6. Climate classification of Australia [6]
Figure 7. Proportion of PV systems in the CER and PVoutput sample by
climate class.
Fig. 6 shows the climate classification map of Australia
based on the Köppen climate classification [6]. Fig. 7 shows the
distribution of PV systems in each major climate class. As
expected, most of the CER and sample PV systems are located
in temperate and subtropical regions. A small number of
systems are located in tropical and grassland regions, and a very
small number in equatorial and desert areas.
F. PV Inverter brand distribution
Analysis has been conducted on the distribution of PV
inverter and module brands in the sample dataset compared
with all of the PV systems installed in Australia, using data
provided by the CER. The APVI sample has quite a
representative distribution of inverter brands except for the
prevalence of a particular inverter brand, which we will call
Brand X (apparently provided by a particular installer in
Queensland who also facilitated system reporting on
PVOutput.org). The distribution of inverter brand for the
sample and CER database (top 10 inverter manufacturers
excluding Brand X) is shown in Fig. 8. In order to ensure that
the prevalence of the Brand X systems in the sample was not
introducing any bias, we compared their average performance
for March 2014 – March 2015 with the performance of all the
other PV systems in the sample.
Figure 8. Distribution of different inverters in CER and sample database
Figure 9. Average performance of Brand X inverters compared to other
sample inverters in (a) postcode group 40XX, and (b) postcode group
41XX
Figure 10. Mean absolute error (%) of PV system output compared against
all PV systems and total installed capacity of Brand X inverters in each
region
Fig. 9 shows the correlation for two 2-digit postcode
regions. High correlation with negligible bias is observed in
all regions, which highlights the minimal impact of this Brand
X discrepancy in the sample database. Fig. 10 shows the Mean
Absolute Error (MAE) of the Brand X set performance
compared to the rest of the sample for each region, sorted by
the overall installed capacity of Brand X systems. In
calculating the MAE, only time intervals in which at least half
of the Brand X capacity is contributing to the database are
considered. It highlights that regions with a considerable
number of Brand X inverters see fairly modest differences in
the performance of these compared with other systems.
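The MAE calculation with the coverage filter described above can be sketched as follows. The function signature and the example values are assumptions for illustration. Performance values are expressed as a percentage of rated capacity.

```python
def mae_with_coverage(brand_perf, other_perf, brand_reporting_kw,
                      brand_total_kw, min_share=0.5):
    """Mean absolute error between two performance series.

    Only intervals in which at least min_share of the Brand X capacity
    is reporting are included. Returns None if no interval qualifies.
    """
    errors = [
        abs(b - o)
        for b, o, kw in zip(brand_perf, other_perf, brand_reporting_kw)
        if kw >= min_share * brand_total_kw
    ]
    if not errors:
        return None
    return sum(errors) / len(errors)


# Hypothetical values for three intervals in one region.
brand_x = [50.0, 60.0, 70.0]   # Brand X performance (%)
others = [48.0, 65.0, 10.0]    # rest of sample performance (%)
reporting_kw = [80.0, 60.0, 20.0]  # Brand X capacity reporting per interval

mae = mae_with_coverage(brand_x, others, reporting_kw, brand_total_kw=100.0)
print(mae)
```

In this example the third interval is excluded because only 20% of the Brand X capacity is reporting, so the large difference in that interval does not affect the result.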
IV. ESTIMATION ACCURACY
In this section performance calculated on the basis of data
from utility metered PV systems is compared to the PV
performance estimated using the sample data. Utility interval-
metered data is available for 300 PV systems in the Ausgrid
network area over the period of July 2010 until June 2013 [7].
These PV systems are located in 100 different postcodes in
New South Wales belonging to four 2-digit postcode groups
(20XX, 21XX, 22XX, 23XX). The average performance of
the Ausgrid PV systems in each 2-digit postcode region is
calculated by averaging the output of all of the systems and
dividing by total capacity of all systems reporting in each time
period. Fig. 11 shows the correlation of estimated
versus measured PV performance for postcode group of 23XX
for three years. Good fits are obtained for the second and third
years of the analysis but not the first year.
Figure 11. APVI performance (vertical axis) vs. average Ausgrid PV system
performance (horizontal axis) for three years for 2-digit postcode 21XX
Figure 12. Trend of normalized error and number of systems contributing to
the database in 3 years for 4 locations in NSW
Fig. 12 shows the Normalized Root Mean Square Error
(NRMSE) of the estimation for each year in each postcode
group. NRMSE is calculated by dividing the root mean square
error of performance prediction by the range of real
performance. The average number of PV systems contributing
in each year is also shown. Since the Ausgrid data is not
available for 2013-15, a dashed line shows the predicted error
up to the present, based on the same rate of improvement in
NRMSE. As the number of systems contributing to the sample
increases, the accuracy of the estimate is improved, since
missing data, invalid data, and outliers caused by monitoring
have a smaller effect on the estimates. This seems likely to
explain the relatively poor fit seen for 2010-11 in Fig. 11, and
its improvement in later years. Further analysis of this type
could help to develop a method for determining the required
sample size to achieve a certain level of accuracy in PV
performance nowcasting over a given area.
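The NRMSE definition used above (root mean square error divided by the range of the measured performance) can be written as a short function. The function name and the example series are illustrative assumptions.

```python
import math


def nrmse(estimated, measured):
    """Root mean square error normalised by the range of the measured values."""
    if len(estimated) != len(measured) or not measured:
        raise ValueError("series must be non-empty and of equal length")
    squared_errors = [(e - m) ** 2 for e, m in zip(estimated, measured)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    value_range = max(measured) - min(measured)
    if value_range == 0:
        raise ValueError("measured series has zero range")
    return rmse / value_range


# Hypothetical performance values (% of rated capacity).
estimated_perf = [10.0, 20.0, 30.0]
measured_perf = [12.0, 18.0, 30.0]
print(nrmse(estimated_perf, measured_perf))
```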
V. SAMPLE CASE STUDY
With penetrations of PV systems on Australian electricity
networks continuing to climb, distributed PV generation traces
will be increasingly important for effective network planning
and operation. In this section we provide an example
application of distributed PV performance data to study PV
impacts at specific locations in distribution networks for
planning purposes.
The focus of this study is on the impact of PV on overall
peak load within the distribution network. Ausgrid’s zone
substation load data is studied for the year 2013-2014. Fig. 13
shows the estimated change in peak load due to existing PV
for the 20 highest peaks of the year at 148 zone substations in
Ausgrid’s NSW distribution network. The load data is sourced
from [8]. PV generation is estimated using PV performance
estimates based on the sample data for the relevant 2-digit
postcode region, and installed PV capacity for each zone
substation from Ausgrid’s Distribution Annual Planning
Report [9]. Of more than 190 zone substations operated by
Ausgrid, 148 with non-zero PV capacity installed and
available load data were selected. The percentage change in
peak load due to PV for the top 20 annual peak half hours for
each zone substation is shown. The data is sorted by PV
penetration level, calculated as the ratio of installed PV
capacity to average load over 2013-14 at the substation. The
figure shows that at almost all zone substations, peak load is
reduced during some of the top 20 peaks. For those zone
substations with a high PV penetration (probably mainly
serving residential customers), while peak load is significantly
reduced for some of the top 20 half hourly peaks, in most
cases, there is at least one of the top 20 with no change.
Around half of the zone substations with PV penetration less
than 10% (likely dominated by commercial and industrial
customers) show a reduction in peak load for all of the top 20
peaks. Separate assessment of the impact of PV in network
areas with different load characteristics will enable better
incorporation of distributed PV into network planning.
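A calculation of this kind can be sketched as below. The exact formula for the change in peak load is not stated here, so the sketch makes an explicit assumption: the metered substation load is net of PV, and the change is the estimated PV generation expressed as a percentage of the underlying load (metered load plus PV). Function names and all values are hypothetical.

```python
def pv_penetration(installed_pv_kw, load_kw):
    """Ratio of installed PV capacity to average load over the period."""
    return installed_pv_kw / (sum(load_kw) / len(load_kw))


def peak_changes(load_kw, pv_performance, installed_pv_kw, top_n=20):
    """Percentage change in load due to PV for the top_n highest-load intervals.

    Assumption: load_kw is metered net of PV, so the load without PV is
    load_kw[i] + PV generation in the same interval.
    """
    ranked = sorted(range(len(load_kw)), key=lambda i: load_kw[i], reverse=True)
    changes = []
    for i in ranked[:top_n]:
        pv_kw = pv_performance[i] * installed_pv_kw  # upscaled PV estimate
        load_without_pv = load_kw[i] + pv_kw
        changes.append(-100.0 * pv_kw / load_without_pv)
    return changes


# Hypothetical half-hourly data for one zone substation.
load = [900.0, 1000.0, 800.0, 950.0]   # metered load (kW)
performance = [0.5, 0.0, 0.2, 0.25]    # PV output as a fraction of rated capacity
installed_kw = 200.0

changes = peak_changes(load, performance, installed_kw, top_n=2)
penetration = pv_penetration(installed_kw, load)
print(changes, penetration)
```

In this example the highest peak coincides with zero PV output and is unchanged, while the second-highest peak is reduced by 5%, mirroring the pattern of some peaks being reduced and others not.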
Figure 13. Change in peak value for 20 highest peaks of the year 2013-2014 in 148 NSW distribution substations. Boxplots show the range of change in peaks
while the solid line represents the estimated penetration level of PV for each substation.
VI. CONCLUSION
In this paper, the APVI Live Map PV performance data is
investigated to firstly assess the extent to which it is
representative of, and hence potentially suitable for predicting,
the output of PV systems in Australia. Secondly, a set of
historical utility interval metered PV data is compared with
performance predictions based on the APVI Live Map data, to
investigate the accuracy of the estimates in different regions.
Both techniques suggest that the APVI Live Map is providing
reasonably accurate estimates of PV generation. Finally, an
example of how upscaled distributed PV data might be applied
for network planning is provided. This study highlights that
distributed PV might already be contributing useful peak
demand reductions in some particular substations, while
providing little if any peak reduction in others.
ACKNOWLEDGMENT
The APVI Map and this research have been supported by
the Australian Renewable Energy Agency. Data has been
kindly provided for this study by PVoutput.org, SMA
Australia, via the SMA Sunny Portal, and the Australian Clean
Energy Regulator. We also thank Ausgrid for making NSW
zone substation load and 300 solar home data available for
public download on their website.
REFERENCES
[1] Postcode data for small-scale installations. [Online] Available:
http://www.cleanenergyregulator.gov.au/RET/Forms-and-resources/Postcode-data-for-small-scale-installations
[2] Australian PV Institute (APVI) Solar Map. [Online] Available:
http://pv-map.apvi.org.au/
[3] Live Photovoltaic data. [Online] Available: www.pvoutput.org
[4] SMA Photovoltaic live data. [Online] Available: https://www.sunnyportal.com
[5] D. C. Jordan and S. R. Kurtz, "Photovoltaic degradation rates—an
analytical review," Progress in Photovoltaics: Research and
Applications, vol. 21, pp. 12-29, 2013.
[6] M. Kottek, J. Grieser, C. Beck, B. Rudolf, and F. Rubel, "World map
of the Köppen-Geiger climate classification updated,"
Meteorologische Zeitschrift, vol. 15, pp. 259-263, 2006.
[7] Solar Home Electricity Data, Ausgrid, 2014. [Online] Available:
http://www.ausgrid.com.au/Common/About-us/Corporate-information/Data-to-share/Data-to-share/Solar-household-data.aspx#.VacSUvmqpOB
[8] Ausgrid Distribution Zone Substation Information. [Online] Available:
http://www.ausgrid.com.au/Common/About-us/Corporate-information/Data-to-share/Data-to-share/DistZone-subs.aspx#.VYDqT8-eBzV
[9] Ausgrid Distribution and Transmission Annual Planning Report 2014.
[Online] Available:
http://www.ausgrid.com.au/regulatory_investment_test