All content in this area was uploaded by Julian Ascencio-Vasquez on Dec 06, 2017
An Overall Data Analysis Methodology for PV Energy
Systems
Julián Ascencio Vásquez and Marko Topič
Faculty of Electrical Engineering, University of Ljubljana
Tržaška cesta 25, SI-1000 Ljubljana, Slovenia
julian.ascencio@fe.uni-lj.si
Abstract: This paper proposes an overall data analysis methodology for PV systems. It covers the detection
of the main data errors (e.g. gaps, timeshifts), their correction and the preparation of the databases
(e.g. meteorological and electrical data corrections), and the calculation of key indicators (e.g. performance
ratio, availability) from operational data, in order to understand and optimize PV power generation. The
methodology is demonstrated through the analysis of the operation of a 17-kW PV power plant installed on the
rooftop of the Faculty of Electrical Engineering at the University of Ljubljana (Slovenia), which has been
providing data since 2010, after its grid connection.
Index Terms: Photovoltaics, Data Analysis, PV Performance Models
1 INTRODUCTION
Over the past few years, the use of photovoltaic
(PV) energy has expanded internationally. This
expansion can be quantified by the installed
capacity, which is increasing exponentially year by
year, favoured by government support (e.g. feed-in
tariff and feed-in premium support schemes) and by
the competitiveness of the technology, which has
reached “grid parity” in several countries [1]. As
more and more PV systems are connected to the grid,
an increasingly large amount of data is produced.
These data should be analysed to provide an
understanding of the historical operation of PV
plants, and to plan the strategy for the future
maintenance and asset management of the whole
system or portfolio during its lifetime. Handling these
new "Big Data" sets is a challenge, since
conventional data analysis tools are not suited to
managing such amounts of data efficiently [2].
In this paper, an overall data analysis methodology
for PV systems is proposed, in order to extract as
much useful information as possible from the available
raw data. The steps include the detection of the main
data error issues (e.g. gaps, timeshifts and sensor
drifting), followed by their correction and setting of
the databases (e.g. meteorological and electrical data
corrections) and ending with key indicator
calculations (e.g. performance ratio, availability, etc.)
from operational raw data, in order to understand and
to optimize PV power generation. This methodology
is demonstrated through the analysis of the operation
of a 17-kW PV power plant installed on the rooftop
of the Faculty of Electrical Engineering at the
University of Ljubljana (Slovenia), which has been
providing data since 2010, after its grid connection.
2 DATA ANALYSIS FOR PV SYSTEMS
As shown in Figure 1, all the data from operation and
production of PV systems can be analysed to get a
full performance evaluation and indicators that will
help to optimize the energy yield and the decision-
making process. This methodology is divided into 4
main steps, providing an exhaustive understanding of
the available data, possible data issues, the correction
to be applied and finally the key indicators in order to
infer the historical, current and future operation of
the PV plant.
The minimal requirements of any data analysis for
PV systems are the nominal values from datasheets
and design of the plant, the final energy yield (YF)
and main weather variables, Global Horizontal
Irradiance (GH) and Ambient Temperature (TAMB).
Several modelling alternatives are shown in this
paper, where some of them will require more
measured variables, thus providing a more accurate
and robust analysis.
Generally, the weather and electrical variables are
obtained as time series, with one data column per
variable and a periodic time index (e.g. minutely or hourly).
2.1 Data Quality Check
As described in [3], full monitoring data is required
for the evaluation of energy yield and financial
performance during a given time period, but
monitoring systems occasionally may lose data due to
communication issues or system disruptions.
Therefore, a data quality check is necessary to know
the accuracy and real availability of the data.
Figure 1. Data Analysis Methodology for PV Energy Systems.
The following issues are commonly found in the
available data. They should at least be identified
before any data processing, in order to assess the
feasibility of the calculations.
Missing data
Commonly, gaps are produced by communication
issues and disconnection of the equipment for
maintenance or similar. If the missing data rate is
high, the whole dataset, and hence, the data analysis
could be unreliable.
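As an illustration of how gaps and the missing data rate could be quantified, the following minimal sketch assumes a sorted list of timestamps with a known nominal logging step. The function name `find_gaps` and the sample data are hypothetical and not part of the original study.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, step):
    """Return (missing_rate, gaps) for a sorted list of datetimes.

    gaps holds (last_seen, next_seen) pairs whose spacing exceeds the
    nominal logging step; missing_rate is the share of expected records
    that are absent between the first and last timestamp.
    """
    if len(timestamps) < 2:
        return 0.0, []
    expected = int((timestamps[-1] - timestamps[0]) / step) + 1
    gaps = [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a > step]
    missing_rate = 1.0 - len(timestamps) / expected
    return missing_rate, gaps


# Hypothetical 5-minute series in which two consecutive records are missing.
start = datetime(2017, 6, 1, 12, 0)
step = timedelta(minutes=5)
stamps = [start + i * step for i in range(12) if i not in (4, 5)]

rate, gaps = find_gaps(stamps, step)
print(f"missing rate: {rate:.1%}")
for last_seen, next_seen in gaps:
    print(f"gap between {last_seen} and {next_seen}")
```

A threshold on the missing rate above which a dataset is rejected is a project-specific choice and is not defined here.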
Timeshifts
Electronic devices mostly work with predetermined
time settings (for instance Central European Time or
Greenwich Mean Time), but when they are
synchronized with a local communication network,
the time is set to local time, which may include
Daylight Saving Time.
Differing time settings are inconvenient when
comparing or analysing data from different
equipment.
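One possible way to detect a timeshift between two devices is to search for the lag that maximizes the correlation between two physically related signals, for example irradiance and power. The sketch below is only an illustration under that assumption; the function names and the synthetic hourly data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def estimate_shift(reference, shifted, max_lag):
    """Return the lag (in samples) for which shifted[i + lag] best matches reference[i]."""
    best_lag, best_r = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = reference[:len(reference) - lag], shifted[lag:]
        else:
            a, b = reference[-lag:], shifted[:len(shifted) + lag]
        r = pearson(a, b)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


# Synthetic hourly clear-sky-like irradiance and a power signal logged one hour late,
# as could happen when one logger follows Daylight Saving Time and the other does not.
irradiance = [max(0.0, math.sin(math.pi * (h - 6) / 12)) for h in range(24)]
power = [0.0] + [0.015 * g for g in irradiance[:-1]]

print("estimated shift [samples]:", estimate_shift(irradiance, power, max_lag=3))
```

Once the lag is known, one of the series can be re-indexed so that both share the same time reference before any further calculation.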
Calibration issues
Measurement equipment needs to be regularly
calibrated and updated. Data analysis has revealed
common human errors in these operations, such as
setting wrong multiplicative conversion factors in
the calibration of devices, which leads to incorrect
data.
Sensor drifting
Sensor drifting is the systematic degradation of
measurement devices over their running time (for
example, the drifting of irradiance sensors) [4].
Even when this issue is known, it is usually not
taken into account in PV module degradation rate
calculations (explained in 2.4.2).
2.2 System Modelling
The accurate estimation of electrical and weather
variables is fundamental because it will provide the
reference and expected values to reach the optimal
performance of the PV system. Several modelling
methodologies for PV Module Temperature and PV
Performance have been studied in literature.
2.2.1 PV Module Temperature models
The operating temperature of PV modules affects the
performance of the PV systems, typically decreasing
efficiency as temperature rises. If this variable is not
measured, it can be estimated from the ambient
temperature using a simple linear thermal model such
as the Ross Model [5] (see Equation 1).
$$T_{MOD} = T_{AMB} + k \cdot G_{POA} \qquad (1)$$
Where TMOD is the module temperature [°C], TAMB
is the ambient temperature [ °C], GPOA is the plane-of-
array irradiance [W/m2] and k is an empirical fitting
coefficient.
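The Ross model and the fitting of its coefficient can be sketched as follows. The closed-form least-squares estimate of k and the sample measurements are illustrative assumptions; the function names are hypothetical.

```python
def ross_module_temperature(t_amb, g_poa, k):
    """Ross model: module temperature [°C] from ambient temperature and G_POA."""
    return t_amb + k * g_poa


def fit_ross_k(t_mod, t_amb, g_poa):
    """Least-squares estimate of k from measured T_MOD, T_AMB and G_POA.

    Minimizing sum((T_MOD - T_AMB - k * G_POA)^2) gives
    k = sum(G * (T_MOD - T_AMB)) / sum(G^2).
    """
    num = sum(g * (tm - ta) for tm, ta, g in zip(t_mod, t_amb, g_poa))
    den = sum(g * g for g in g_poa)
    return num / den


# Hypothetical measurements (°C, °C, W/m2).
t_amb = [20.0, 22.0, 25.0]
g_poa = [200.0, 600.0, 1000.0]
t_mod = [26.0, 40.0, 55.0]

k = fit_ross_k(t_mod, t_amb, g_poa)
print(f"k = {k:.4f} °C per W/m2")
print(f"T_MOD at 25 °C and 1000 W/m2: {ross_module_temperature(25.0, 1000.0, k):.1f} °C")
```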
Once TMOD is known, a power-temperature correction
is recommended to remove the negative effect of high
temperature on the AC output power (PAC), as
shown in Equation 2. A constant temperature
coefficient of the electric power at the maximum power
point (γ) is assumed.
$$P_{T_{STC}} = P_{AC} \cdot [1 - \gamma \cdot (T_{MOD} - 25\,^{\circ}\mathrm{C})] \qquad (2)$$
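A minimal sketch of the power-temperature correction is given below. It assumes a first-order correction of the form P_AC · [1 − γ · (T_MOD − 25 °C)] with a negative temperature coefficient γ; the value of −0.4 %/°C and the function name are illustrative assumptions, not values from this study.

```python
def correct_power_to_stc_temperature(p_ac, t_mod, gamma, t_stc=25.0):
    """Refer a measured AC power to 25 °C using a constant power temperature coefficient.

    gamma is the relative temperature coefficient in 1/°C (negative for
    crystalline silicon), so the correction raises the power when T_MOD > 25 °C.
    """
    return p_ac * (1.0 - gamma * (t_mod - t_stc))


# Hypothetical operating point: 10 kW measured at a module temperature of 55 °C.
gamma = -0.004  # assumed coefficient, 1/°C
p_corrected = correct_power_to_stc_temperature(10.0, 55.0, gamma)
print(f"temperature-corrected power: {p_corrected:.2f} kW")
```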
2.2.2 Transposition of Irradiances
As explained in [6], when the irradiance is not
measured in the plane-of-array (POA) of the PV
modules, the GH has to be converted into the POA by
using transposition models. The conversion of the GH
into the GPOA encompasses two major steps. The GH
is first split into horizontal diffuse irradiance and
horizontal direct irradiance by the use of a
decomposition model. Then, the diffuse, direct, and
ground-reflected irradiance components are
transformed to the POA and recombined in
order to obtain the GPOA.
The Tian and Perez transposition models are
presented in [7] and [8], respectively.
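The recombination step can be illustrated with the simple isotropic-sky transposition, which is used here only as a stand-in and is not the Tian or Perez model cited above. The sketch assumes that the decomposition has already produced the diffuse horizontal and direct normal components; the function name, the albedo of 0.2 and the sample values are assumptions.

```python
import math


def poa_isotropic(ghi, dhi, dni, aoi_deg, tilt_deg, albedo=0.2):
    """Plane-of-array irradiance [W/m2] with an isotropic sky-diffuse assumption.

    ghi: global horizontal, dhi: diffuse horizontal, dni: direct normal [W/m2];
    aoi_deg: angle of incidence on the module; tilt_deg: module slope.
    """
    tilt = math.radians(tilt_deg)
    beam = dni * max(0.0, math.cos(math.radians(aoi_deg)))   # direct component
    sky = dhi * (1.0 + math.cos(tilt)) / 2.0                 # sky-diffuse component
    ground = ghi * albedo * (1.0 - math.cos(tilt)) / 2.0     # ground-reflected component
    return beam + sky + ground


# Hypothetical instant for a 30° tilted array.
print(f"G_POA = {poa_isotropic(700.0, 150.0, 650.0, 25.0, 30.0):.1f} W/m2")

# Sanity check: on a horizontal plane (tilt 0, AOI equal to the zenith angle)
# the transposed value reduces to the global horizontal irradiance.
ghi_check = 100.0 + 800.0 * math.cos(math.radians(60.0))
print(f"horizontal check: {poa_isotropic(ghi_check, 100.0, 800.0, 60.0, 0.0):.1f} W/m2")
```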
2.3 PV Performance Modelling
Nowadays it is possible to model a full PV system.
Nevertheless, the uncertainty of the selected
methodologies is the relevant parameter to evaluate
the reliability of the estimated values. This section is
divided into IV Curve models, PV Energy Yield
models and PV grid-connected Inverter models.
2.3.1 IV Curve Models
Any PV device under illumination can be
characterized by the correlation between the
generated current and the voltage applied to the
circuit. This curve illustrates the behaviour of the PV
system, showing how the irradiance, temperature and
loads affect the production for a given set of
operation conditions.
In [9], the De Soto model, also known as the
Five-Parameter Model, is defined. This method uses
only data provided by manufacturers to predict the
behaviour of a PV module.
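For reference, the single-diode equation on which the five-parameter model is based can be written as below. The symbol names follow common usage and are assumptions with respect to this paper: I_L is the light-generated current, I_0 the diode reverse saturation current, R_s the series resistance, R_sh the shunt resistance and a the modified ideality factor.

```latex
I = I_L - I_0 \left[ \exp\!\left( \frac{V + I R_s}{a} \right) - 1 \right] - \frac{V + I R_s}{R_{sh}}
```

Because the current appears on both sides, the equation is implicit and is usually solved numerically for each voltage point.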
2.3.2 PV Energy Yield Models
PV energy yield models combine different inputs and
empirical coefficients, and are built depending on the
available on-site measurements and project
characteristics. Four performance models have been
used in this article.
The simple linear model needs only the GPOA as
input and a single empirical fitting coefficient
calculated from a training dataset of operational data.
$$P_{model}^{LINEAR}(G_{POA}) = k_1 \cdot G_{POA} \qquad (3)$$
Where k1 is an empirical fitting coefficient.
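As a sketch of how k1 could be obtained, the least-squares fit of a line through the origin has a closed form. The training data below are hypothetical and the function names are illustrative.

```python
def fit_linear_k1(g_poa, p_ac):
    """Least-squares k1 for P = k1 * G_POA (regression through the origin)."""
    num = sum(g * p for g, p in zip(g_poa, p_ac))
    den = sum(g * g for g in g_poa)
    return num / den


def linear_model_power(g_poa, k1):
    """Expected power from the simple linear model."""
    return k1 * g_poa


# Hypothetical training data: G_POA in W/m2 and measured AC power in W.
g_train = [100.0, 400.0, 800.0]
p_train = [1450.0, 5550.0, 11225.0]

k1 = fit_linear_k1(g_train, p_train)
print(f"k1 = {k1:.3f} W per W/m2")
print(f"expected power at 600 W/m2: {linear_model_power(600.0, k1):.0f} W")
```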
More sophisticated models, such as the Simplified
King Model [10] (listed as the Huld Model in Tables 2
and 3) and SRCL2014 [11], take both TMOD and GPOA
as inputs. They give better results because they include
the PV module temperature correction and other
external factors in the calculations.
Direct-Diffuse Performance Rating Model
The DDPR model proposed in [12], which separates
direct and diffuse irradiances (GPOAdir and GPOAdiff,
respectively), has shown better results for low
irradiance values and for climates with a high diffuse
irradiance component. This model requires as input
GPOAdir, GPOAdiff and also the angle of incidence, to
calculate the angular losses due to reflection.
2.3.3 PV grid-connected Inverter Models
Even though the efficiency of current inverters is high
(usually over 98%), modelling them can be relevant to
understand and identify losses through the electrical
circuit to the grid. The inverter performance models
aim to represent a complex behaviour
mathematically, and they are based on
measurements made by testing labs that measure
efficiency at specific DC power and voltage levels.
The Sandia Inverter Model and the Driesse Inverter
Model provide a means to predict the AC output power
(PAC) from the DC input power (PDC); their
methodologies are published in [13] and [14], respectively.
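The structure of the Sandia inverter model can be sketched as follows, based on the empirical form published in [13]. All parameter values are hypothetical placeholders for a 17-kW class inverter, and the sketch omits the night-time tare loss, so it should be read as an illustration rather than a validated implementation.

```python
def sandia_inverter_ac_power(p_dc, v_dc, paco, pdco, vdco, pso, c0, c1, c2, c3):
    """AC power [W] from DC power [W] and DC voltage [V] (Sandia empirical form).

    paco: rated AC power, pdco: DC power giving paco at vdco,
    vdco: reference DC voltage, pso: DC power needed to start inverting,
    c0..c3: empirical curvature and voltage-dependence coefficients.
    """
    a = pdco * (1.0 + c1 * (v_dc - vdco))
    b = pso * (1.0 + c2 * (v_dc - vdco))
    c = c0 * (1.0 + c3 * (v_dc - vdco))
    p_ac = (paco / (a - b) - c * (a - b)) * (p_dc - b) + c * (p_dc - b) ** 2
    return min(p_ac, paco)  # clip at the rated AC power


# Hypothetical parameter set (not taken from any datasheet).
params = dict(paco=17000.0, pdco=17500.0, vdco=600.0, pso=50.0,
              c0=-2e-6, c1=0.0, c2=0.0, c3=0.0)

p_ac_partial = sandia_inverter_ac_power(8000.0, 600.0, **params)
print(f"AC power at 8 kW DC: {p_ac_partial:.0f} W "
      f"(efficiency {p_ac_partial / 8000.0:.1%})")
```

By construction, the model returns the rated AC power when the DC input equals pdco at the reference voltage, and zero when the DC input equals the start-up power pso.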
2.4 PV Performance Indicators
To have a shared understanding of the PV
performance between all the stakeholders, some
indicators are well-known in the industry to evaluate
the past and current operation, and to plan future
interventions and strategies for increasing the profit
and energy yield. Performance Ratio, Degradation
Rate and System Availability are explained below.
2.4.1 Performance Ratio
As defined in [15] and [16], the Performance Ratio
(PR) is the ratio of the final yield (YF) to the
reference yield (YR). It provides important
information about the overall effect of losses and is
commonly used to evaluate long-term changes in
performance. It shows how closely the PV system
operation approaches the “ideal” rated PV generator
operation.
$$PR = \frac{Y_F}{Y_R} = \frac{E_{REAL} \cdot G_{STC}}{P_{STCnom} \cdot H_{REAL}} \qquad (4)$$
Where EREAL is the energy yield [kWh], PSTCnom is
the rated power under STC, GSTC is irradiance
considered at STC (equal to 1000 [W/m2]) and HREAL
is the insolation captured on the plane-of-array
[Wh/m2].
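The calculation can be sketched as below. The energy yield and rated power are the values reported for the plant in this paper, whereas the plane-of-array insolation of 190 kWh/m2 is an assumed figure used only to make the example run; the function name is hypothetical.

```python
def performance_ratio(e_real_kwh, p_stc_kw, h_real_wh_m2, g_stc=1000.0):
    """Performance Ratio as final yield divided by reference yield."""
    y_f = e_real_kwh / p_stc_kw      # final yield [kWh/kW], i.e. equivalent hours
    y_r = h_real_wh_m2 / g_stc       # reference yield [h]
    return y_f / y_r


# 2.766 MWh and 17 kW are reported in this paper; 190 000 Wh/m2 is an assumption.
pr = performance_ratio(e_real_kwh=2766.0, p_stc_kw=17.0, h_real_wh_m2=190000.0)
print(f"PR = {pr:.1%}")
```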
2.4.2 Degradation Rate
This indicator is defined in [17] as the rate of
maximum performance reduction over time, denoted
as a positive quantity and commonly expressed in %
per year representing the reduction of maximum
output power expected from a PV cell, module, array
or system in the field.
Typically, a linear regression of the long-term energy
yield is calculated, giving a simple estimate of the
degradation rate, but more robust and complex
methodologies are being studied to increase the
accuracy of this indicator [17] [18].
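The simple linear-regression approach can be sketched as follows. The yearly performance ratios are synthetic, and expressing the slope relative to the fitted initial value is one common convention assumed here; seasonal effects and sensor drifting are ignored.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y over x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def degradation_rate_percent_per_year(years, performance):
    """Degradation rate as a positive number in % per year.

    The slope of the fitted line is divided by the fitted value at year 0,
    so the result is relative to the initial performance.
    """
    slope, intercept = linear_fit(years, performance)
    return -100.0 * slope / intercept


# Synthetic yearly PR values losing 0.5 % of the initial value per year.
years = [0, 1, 2, 3, 4]
pr_series = [0.85 - 0.00425 * y for y in years]

rate = degradation_rate_percent_per_year(years, pr_series)
print(f"degradation rate: {rate:.2f} %/year")
```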
2.4.3 System Availability
The term “Availability” in PV systems is used to
describe their operational status and is defined as “the
fraction of time in which a unit is capable of
providing service and accounts for outage frequency
and duration” [19].
Calculations are mainly time-based: the number of
hours producing energy is divided by the number
of hours when the plant is exposed to irradiance
levels above a Minimum Irradiance Threshold
(GMINavailability), whose value depends on the location
and system design. The advantage of the time-based
availability is its simple calculation method,
widespread within the industry, which makes it
possible to double-check the guaranteed availabilities
with contractors and manufacturers. However, this
method cannot detect poor planning of preventive or
scheduled maintenance, does not take into account
whether irradiance was available during corrective
maintenance, cannot detect potential performance
issues during running periods, and cannot evaluate
the impact of partial curtailments imposed by
electrical grid operators.
$$\mathrm{Availability}_{time\text{-}based} = \frac{ROH}{TOH} \qquad (5)$$
Where ROH is the real operating hours, i.e. the
hours in operation with irradiance above the
threshold, and TOH is the theoretical operating hours,
i.e. the hours with GPOA > GMINavailability.
To improve this indicator, Availability is generally
weighted by the energy that the plant is capable of
producing at different hours of the day.
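The time-based calculation of Equation 5 can be sketched as follows, assuming equally spaced records and treating any positive AC power as "in operation". The threshold of 50 W/m2 and the hourly sample data are hypothetical.

```python
def time_based_availability(g_poa, p_ac, g_min, step_hours):
    """Return (availability, ROH, TOH) for equally spaced records.

    TOH counts the time with G_POA above the threshold; ROH counts the part
    of that time in which the plant actually produced power.
    """
    toh = sum(step_hours for g in g_poa if g > g_min)
    roh = sum(step_hours for g, p in zip(g_poa, p_ac) if g > g_min and p > 0)
    availability = roh / toh if toh > 0 else float("nan")
    return availability, roh, toh


# Hypothetical hourly records: irradiance [W/m2] and AC power [kW].
g_poa = [0.0, 30.0, 120.0, 450.0, 800.0, 600.0, 90.0, 10.0]
p_ac = [0.0, 0.0, 1.5, 6.0, 0.0, 8.2, 1.0, 0.0]   # outage during the fifth hour

availability, roh, toh = time_based_availability(g_poa, p_ac, g_min=50.0, step_hours=1.0)
print(f"ROH = {roh} h, TOH = {toh} h, availability = {availability:.0%}")
```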
2.4.4 Calculation Uncertainties
Statistical indicators are commonly used to compare
different models and methodologies of analysis. In
this article, the comparison between PV Performance
Models is carried out taking into account the Root
Mean Square Error (RMSE, see Equation 6) and the
Percent Error (see Equation 7).
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (X_{forecast} - X_{realised})^2} \qquad (6)$$
$$Error_{Percent} = \frac{X_{Real} - X_{Expected}}{X_{Expected}} \qquad (7)$$
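Both indicators can be sketched in a few lines. The sketch assumes that the Percent Error is expressed relative to the expected value and leaves the conversion to percent to the caller; the sample numbers are hypothetical.

```python
import math


def rmse(forecast, realised):
    """Root Mean Square Error between forecast and realised values."""
    n = len(forecast)
    return math.sqrt(sum((f - r) ** 2 for f, r in zip(forecast, realised)) / n)


def percent_error(real, expected):
    """Relative deviation of the real value from the expected value (fraction)."""
    return (real - expected) / expected


# Hypothetical power samples [kW] and monthly energies [kWh].
forecast = [1.0, 2.0, 3.0, 4.0]
realised = [1.0, 2.0, 3.0, 6.0]

print(f"RMSE = {rmse(forecast, realised):.2f} kW")
print(f"monthly energy error = {100 * percent_error(97.0, 100.0):.1f} %")
```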
3 DATA ANALYSIS AT SE LPVO
Operating since 2010 on the rooftop of the Faculty
of Electrical Engineering in Ljubljana, the 17-kW PV
power plant consists of one central grid-
connected inverter and 75 PV modules (poly-Si),
oriented 25° east of south with a tilt of
30°.
The following data analysis considers the data
recorded during June 2017. During this period, the
time-series data were completely clean, so the Data
Quality Check is not reported in this paper. IV curve
and Inverter models are not considered in this study,
and the Degradation Rate is not calculated either,
owing to the short data period.
The main performance indicators were calculated and
are shown in Table 1.
| Parameter | Value |
|---|---|
| Energy Yield | 2.766 MWh |
| Performance Ratio | 85.374 % |
| System Availability | 100.0 % |
| Real Operating Hours | 400.0 hours |

Table 1: Main Performance Indicators of SE LPVO
during June 2017.
In Figure 2, the carpet plot shows the PAC at
5-minute resolution for the selected period. This
plot is easy to read and provides a clear overview
of the studied variable along the
timeline.
Figure 3 shows the correlation between GPOA and
PAC, evidencing the decrease of efficiency for high
module temperatures.
Figure 2. Carpet plot showing the PAC in 5 min time
basis for SE LPVO during June 2017.
Figure 3. Correlation between GPOA and PAC for SE
LPVO during June 2017.
| Model | k1 | k2 | k3 | k4 | k5 | k6 |
|---|---|---|---|---|---|---|
| Linear Model | 14.29 | - | - | - | - | - |
| SRCL2014 | 0.31 | 2.701 | 0.999 | 3.602 | - | - |
| Huld Model | 0.035 | -0.007 | -0.008 | -0.005 | -0.002 | 5.4e-5 |
| DDPR | 17.87 | -0.002 | 16.27 | - | - | - |

Table 2. Empirical fitting coefficients of the different
models, extracted by the Least Square Method from June
2017 data.
Table 2 shows the empirical fitting coefficients of
the PV Performance Models described above. The
fitting has been carried out using the Least Square
Method on the entire available dataset (one month, June
2017). These coefficients are valid only for the
PV power plant studied, and a new fitting process
must be carried out for other locations, technologies and
circuit configurations.
After obtaining the empirical fitting coefficients, the
performance models can be evaluated by
comparing the correlations of GPOA with the real PAC and
the estimated PAC (see Figure 4), the Monthly Energy
Yield Error (using the Percent Error formula) and the
RMSE. Results are shown in Table 3. The DDPR model
gives the best match between real and expected
data over short time periods (for example, on a 5-minute
time basis), but it has
the highest monthly energy yield error, due to its
lower performance at very high irradiance values.
A comparison of the PV Performance Models is
illustrated in Figure 5. The histogram of energy yield
shows the cumulative sum of energy during the
observed time period, binned by GPOA. The Percent
Error of each model is calculated in each histogram bin,
which allows a visual identification of the improvement
obtained by including more variables in the modelling.
The DDPR model also shows good performance at low
irradiance values.
Figure 4. Comparison between real outdoor data
and predicted data using DDPR model.
| Model | RMSE [%] | Monthly Energy Yield Error [%] |
|---|---|---|
| Linear Model | 3.591 | -1.555 |
| SRCL2014 | 2.082 | -3.217 |
| Huld Model | 2.080 | -3.057 |
| DDPR | 1.779 | -3.709 |

Table 3: Performance indicators for PV
Performance Models.
Figure 5. Histograms of Energy Yield and Percent
Errors of PV Performance Models by GPOA (x-axis).
4 CONCLUSIONS
Data analysis for PV energy systems is essential
to gain a full understanding of their behaviour,
operation and maintenance during the project’s
lifetime, but it is also important to have clear
knowledge of the numerical data to be processed.
For that reason, a data pre-processing step, called the
Data Quality Check, is recommended.
The different parts of the electrical system
(PV modules, inverters, etc.) can also be
modelled from operational data to obtain an accurate
estimation of the energy yield. These estimations
could support production and alarm management
to identify possible mismatches and malfunctions of
the PV power plant.
The PV Performance Model to be used in a given
PV power plant depends on the available
measurement data but also on the location and climate
conditions, which affect the accuracy of the
modelling at very low and/or high irradiance
values.
Finally, good monitoring systems and clean data
are the key to good and accurate models of
PV systems. Such models give a full
understanding of the PV power plant and
help to take better decisions to optimize the
energy yield during the project’s lifetime.
Acknowledgments
This project has received funding from the
European Union's H2020 programme SOLAR-
TRAIN under grant agreement No 721452.
References
[1] Photovoltaic European Technology &
Innovation Platform “Assessing the need for better
forecasting and observability of PV”, 2017.
[2] S. Vergura, “Big data and efficiency of PV
plants”, 20th IMEKO TC4 International Symposium,
Benevento, Italy, September 15-17, 2014.
[3] E.Koubli, D.Palmer, T.T. Betts, R. Gottschalg,
“Assessment of PV system performance with
incomplete monitoring data”, 31st EU-PVSEC,
Hamburg, pp.1594-1597, 2015.
[4] X Y Li, “Degradation analysis of photovoltaic
modules based on operational data: effect of seasonal
pattern and sensor drifting”, International Conference
on New Energy and Future Energy System NEFES,
2016.
[5] P.M.Segado, J.Carretero, M.Sidrach-de-
Cardona, “Models to predict the operating
temperature of different photovoltaic modules in
outdoor conditions”, Prog. Photovolt: Res. Appl.
2015; 23:1267-1282, 2014.
[6] IEA PVPS, “Technical Assumptions Used in
PV Financial Models - Review of Current Practices
and Recommendations”, IEA PVPS Task 13, Subtask
1, Report IEA-PVPS T13-08:2017, 2017.
[7] A.Luque, S.Hegedus, “Handbook of
Photovoltaic Science and Engineering”, Second ed.
John Wiley & Sons, Ltd, 2011.
[8] R.Perez, P.Ineichen, R.Seals, “Modeling
daylight availability and irradiance components from
direct and global irradiance”, Solar energy 44 (5),
271-289, 1990.
[9] W.De Soto, S.A. Klein, W.A. Beckman,
“Improvement and validation of a model for
photovoltaic array performance”, Solar Energy, vol.
80, no. 1, pp. 78–88, 2006.
[10] T. Huld, R. Gottschalg, H. G. Beyer, M.
Topic, “Mapping the performance of PV modules,
effects of module type and data averaging”, Solar
Energy 84 (2), 324-338, 2009.
[11] S. Ransome, “How Simulation Program
kWh/kWp Predictions Depend on PV Model
Discrepancies”, 29th European Photovoltaic Solar
Energy Conference and Exhibition; 2980-2984, 2014.
[12] B. Kirn, K. Brecl, M. Topic, “A new PV module
performance model based on separation of diffuse
and direct light”, Sol. Energy 113, 212–220, 2015.
[13] D. L. King, G. Gonzalez, G. M. Galbraith, W.
E. Boyson, “Performance Model for Grid-Connected
Photovoltaic Inverters”, SAND2007-5036, Sandia
National Laboratories, 2007.
[14] A. Driesse, P. Jain, et al, “Beyond the Curves:
Modeling the Electrical Efficiency of Photovoltaic
Inverters”, Photovoltaic Specialists Conference,
PVSC'08. 33rd IEEE, 1-6, 2008.
[15] V. Sharma, S.S. Chandel, “Performance and
degradation analysis for long term reliability of solar
photovoltaic systems: a review”, Renewable and
Sustainable Energy Reviews 27, 753–767, 2013.
[16] K. Brecl, M. Topic, “Apparent performance
ratio of photovoltaic systems – A methodology for
evaluation of photovoltaic systems across a region”,
Journal of Renewable and Sustainable Energy 8,
2016.
[17] A. Phinikarides, N. Kindyni, G. Makrides, G.
E. Georghiou, “Review of photovoltaic degradation
rate methodologies”, Renewable and Sustainable
Energy Reviews 40, 143-152, 2014.
[18] D. C. Jordan, S. R. Kurtz, “Photovoltaic
Degradation Rates — An Analytical Review”,
Progress in photovoltaics: Research and Applications
21 (1), 12-29, 2013.
[19] G. T. Klise, R. Hill, A. Walker, A. Dobos, J.
Freeman, “PV System Availability as a Reliability
Metric – Improving Standards, Contract Language
and Performance Models”, IEEE 43rd Photovoltaic
Specialists Conference (PVSC), 2016.