Available via license: CC BY 4.0


Article

Comparison of Training Approaches for Photovoltaic Forecasts by Means of Machine Learning

Alberto Dolara †, Francesco Grimaccia †, Sonia Leva †, Marco Mussetta † and Emanuele Ogliari †,*

Dipartimento di Energia, Politecnico di Milano, via La Masa 34, 20156 Milano, Italy;

alberto.dolara@polimi.it (A.D.); francesco.grimaccia@polimi.it (F.G.); sonia.leva@polimi.it (S.L.);

marco.mussetta@polimi.it (M.M.)

*Correspondence: emanuelegiovanni.ogliari@polimi.it; Tel.: +39-2399-8524

† These authors contributed equally to this work.

Received: 31 December 2017; Accepted: 28 January 2018; Published: 2 February 2018

Abstract: The relevance of forecasting in renewable energy sources (RES) applications is increasing, due to their intrinsic variability. In recent years, several machine learning and hybrid techniques have been employed to perform day-ahead photovoltaic (PV) output power forecasts. In this paper, the authors compare the main characteristics of the artificial neural network used in a hybrid method, focusing in particular on the training approach. The influence of different data-set compositions on the forecast outcome has been inspected by increasing the training data-set size and by varying the training and validation shares, in order to assess the most effective training method for this machine learning approach, based on both commonly used and newly defined performance indexes for the prediction error. The results are validated over a one-year time range of experimentally measured data. Novel error metrics are proposed and compared with traditional ones, showing the best approach for the different cases of either a newly deployed PV plant or an already-existing PV facility.

Keywords: photovoltaics; power forecasting; artificial neural networks

1. Introduction

In recent years, several forecasting methods have been developed for the output power of renewable energy sources (RES) [1], addressing in particular the intrinsic variability of parameters related to changing weather conditions, which directly affect the photovoltaic (PV) systems' power output [2]. This growing attention is mainly due to the increasing share of RES in power systems, which involves novel technical challenges for the efficiency of the electrical grid [3]. In particular, predictive tools based on historical data can generally provide advantages in PV plant operation [4,5], reduce excess production, and take advantage of incentives for RES production [6].

Among the commonly used forecasting models, most aim to predict the expected power production based on numerical weather prediction (NWP) system forecasts [7]. This is a complex problem with a high degree of non-linearity; for this reason, it is commonly approached by means of advanced models and techniques, i.e., evolutionary computation [8], machine learning (ML) [9], and artificial neural networks (ANNs) [10]. These are pseudo-stochastic iterative approaches defined in the class of computational intelligence techniques, and are usually employed to address pattern recognition, function approximation, control, and forecasting problems [11]. Moreover, they are generally able to handle incomplete or missing data and solve problems with a high degree of complexity.

Recently, several ANN layouts have been developed to solve different tasks [12], such as time series prediction, complex dynamical system emulation [13], speech generation, handwritten digit recognition, and image compression, due to their ability to learn from extended time series of historical measurements with acceptable error levels compared to other statistical and physical forecasting models [14]. Currently, ANN employment in forecasting is quite straightforward due to the widespread development of specific software applications [15–17].

Appl. Sci. 2018, 8, 228; doi:10.3390/app8020228 www.mdpi.com/journal/applsci

In particular, the first attempts at solar power forecasting by means of ANN started more than a decade ago [18]. Generally, in the case of PV power output, common training data are the historical measurements of power production from a PV facility and meteorological parameters specific to the facility location, including temperature, global horizontal irradiance (i.e., the intensity of all the solar radiation components on a horizontal surface) [19], and cloud cover above the facility. Additional forecasted variables from the numerical weather predictions can also be considered, such as wind speed, humidity, pressure, etc. [20].

Novel forecasting models were recently implemented by adding an estimate of the clear sky

radiation to the series of historical local weather data, as reported in [21].

Additionally, the effectiveness of ensemble methods was demonstrated in [22], thus giving additional advantages in terms of result reliability and the implementation of efficient parallel computing techniques.

In their previous work [23], the authors conducted a detailed analysis to find a procedure for the best ANN layout and settings in terms of the number of layers, neurons, and trials for the PV day-ahead forecast. Furthermore, evidence showed that the forecasting performance of ML techniques is affected by the composition of the training data-set, as well as by input selection [24,25].

In this paper, a specific study is conducted on training data-sets in order to provide a more detailed analysis of the effect of different approaches to the training data-set composition on the day-ahead forecast of PV power production. In particular, the authors present some procedures to set up the training and validation data-sets for the ANN used in the physical hybrid method to perform the day-ahead PV power forecast in view of the electricity market. Moreover, a novel error metric is proposed and compared with traditional ones, in order to validate the best training approach in different cases: indeed, the procedures outlined herein can be adopted to set up data-sets based on either historical data retrieved from an existing PV plant or on incremental data measurements in a newly deployed PV plant. The test data-set is made up of the 24 hourly PV power values forecasted one day ahead.

The paper is structured as follows: Section 2 provides an overview of the considered approaches for the composition of the training database, considering both cases of historical data retrieved from an existing PV plant and incremental data measurements in a newly deployed PV plant; Section 3 presents the methodology implemented to compare the different training approaches presented here, proposing some new metrics aimed at evaluating the suitability of the proposed configurations in terms of error performance and statistical behavior; Section 4 presents the considered case study, which is used to test the proposed training approaches; specific simulations and numerical results are provided in Section 5, and final remarks are reported in Section 6.

2. Training Database Composition Approaches

In order to perform the day-ahead forecast, the ANN needs to be trained, and the amount of historical data employed in the supervised learning determines the ANN forecast capability. These data consist of samples exploited in the process of identifying the links among neurons in the network which minimize the error in the forecast. To this end, the whole set of available samples is divided into two groups:

• the “training set” (or equally “training database”), which is used to adjust the weights among neurons by performing the forecast on the same samples,

• the “validation set”, which is used as a stopping criterion to avoid over-fitting and under-fitting. It proves the goodness of the trained network on additional samples which have not been previously included in the training set. The purpose of this step is to test the generalization capability of the neural network on a new data-set.


Learning occurs by updating elements within the network; thus, its response iteratively improves

to match the desired output. An ANN is trained when it has learned its task and converges to a solution.

To achieve this, some learning algorithms are commonly used:

• error back-propagation (EBP)
• gradient descent
• conjugate gradient
• evolutionary algorithms (genetic algorithms, particle swarm optimization, etc.)

Depending on the problem, the fastest algorithm may converge rapidly on a local minimum; speed of convergence therefore does not guarantee maximum accuracy. In addition, a large training set provides a better sample of the trends and improves generalization, but it generally slows down the learning process. If an ANN is not properly trained or sized, undesired results such as “overfitting” and “underfitting” usually occur [26]. Using ANN ensembles by averaging their outputs has been demonstrated to be beneficial, as it helps to avoid chance correlations and the overtraining problem [27,28].

However, choosing both the most suitable learning algorithm and the proper training set size which minimizes the error is a challenge which should be faced in each case study [29–31].

In this paper, we inspect how the day-ahead forecast is influenced by the possible characteristics summarized in Figure 1. The first characteristic of the data-set is either “incremental”, when the elements belonging to the training data-set become progressively available over time and the training set size gradually increases, or “complete”, if an already existing database of samples is available. The second characteristic refers to the way the data-set is used for training the ANN. As forecast-making is mainly a stochastic process, one choice is to use exactly the same training data-set for each forecast of the ensemble (we refer to a single forecast with the term “trial”; in this case, all the trials in the ensemble share the same training data-set). The alternative is to shuffle its elements, grouping them in smaller subsets, each adopted to separately train a different ANN (in this case, each trial is independent, as all the training data-sets are different). Finally, the mean of the resulting outputs is usually calculated in the so-called “ensemble” forecast. The third characteristic is related to the order of the hourly samples that constitute the training data-set: they can appear either consecutively, as in the chronological time series they belong to, or randomly grouped and mixed up.

Figure 1. Main features of the ANN training data-sets.

The combination of these characteristics results in different ANN training methods, which could

affect the forecast.

All of the assumptions presented here are valid, in general terms, for all ANN-based methods. In this paper, the authors employ the Physical Hybrid Artificial Neural Network (PHANN) method for the day-ahead forecast, as described in detail in [14,21]. This procedure combines the physical Clear Sky Radiation Model (CSRM) and the stochastic ANN method, as reported in Figure 2.


Figure 2. Physical Hybrid Artiﬁcial Neural Network (PHANN) method schematic diagram.

2.1. Incremental Training Data-Set

An incremental data-set occurs when the available samples are limited. This is usually the case for real-time or time-dependent processes, where data can be acquired only progressively. Consider for example our case study, where the monitoring system starts recording data from the first day of operation of the PV plant: initially a small amount of data is recorded, and if we acquire hourly samples, 24 samples are added to the historical data-set every day.

In this database composition (e.g., see Figure 3), the days which can be employed for ANN training are those available from the PV plant commissioning (day 1) until the $k_d$ day before the forecast (day $X_d$). As a consequence, the size of the training database will increase over time. In order to supply the data-set to the network for the training step, samples can be arranged in different ways. Those adopted in this paper are listed in Figure 4, and lead to different results in the forecast. A short description is given in the following:

• Method A employs chronologically consecutive samples, assigning the 90% of the samples which are closest to the forecast day to the training set and the remaining 10% of the samples to the validation set.

• Method A* employs chronologically consecutive samples, assigning 90% of the samples to the training set and the 10% of the samples which are closest to the forecast day to the validation set.

• Method B employs the samples by randomly grouping them, 90% for the training set and 10% for the validation set.
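The three arrangements above can be sketched in a few lines of Python. This is only an illustrative reading of Methods A, A*, and B, not the authors' implementation: the function name `split_samples`, the `seed` argument, and the convention that index 0 is the oldest hourly sample are all assumptions introduced here.

```python
import numpy as np

def split_samples(n_samples, method, train_share=0.9, seed=None):
    """Return (train_idx, val_idx) for samples ordered from oldest (0) to newest.

    Hypothetical helper: the newest samples are those closest to the forecast day.
    """
    idx = np.arange(n_samples)
    n_train = int(round(train_share * n_samples))
    n_val = n_samples - n_train
    if method == "A":
        # Training set = 90% of samples closest to the forecast day,
        # validation set = the remaining (oldest) 10%.
        return idx[n_val:], idx[:n_val]
    if method == "A*":
        # Training set = oldest 90%, validation set = 10% closest to the forecast day.
        return idx[:n_train], idx[n_train:]
    if method == "B":
        # Random grouping: chronological order is ignored.
        shuffled = np.random.default_rng(seed).permutation(idx)
        return shuffled[:n_train], shuffled[n_train:]
    raise ValueError(f"unknown method: {method!r}")

# Example: 10 days of hourly data (240 samples) give 216 training and 24 validation samples.
train_idx, val_idx = split_samples(240, "A")
```

With hourly data, one day corresponds to 24 samples, so after 10 days of operation the validation set in this sketch covers exactly one day.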

Figure 3. Hourly samples are progressively available in an incremental training database. PV: photovoltaic.


Figure 4. Training database composition for methods A, A*, and B.

In the first two methods (A and A*), the effect of the proximity of the training set to the forecast day is examined (which implies seasonal variations in the parameters), inspecting how the forecast is affected when the samples closest to the forecast day are employed in the training rather than in the validation step. For example, forecasting spring days cannot be accurate if the training samples belong to the past autumn or winter, and the same consideration applies to the validation. Reasonably, we expect that the further the validation samples are from the forecast day, the less accurate the forecast. This problem is not addressed in Method B, as samples are randomly chosen.

2.2. Complete Training Data-Set

In the complete data-set, an extended amount of samples is available, but it might belong to

a period of time which is time-wise distant from the days of the forecast, as it is shown in Figure 5.

In this case, samples which have to be employed for the ANN training can either be mixed (as shown

in Figure 6) each time that a trial is performed (this happens when trials are independent with Method

C1), or each trial depends on the same training data-set with Method C2.
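The difference between independent and dependent trials can be illustrated as follows. This is a sketch under stated assumptions rather than the procedure used in the paper: the names `trial_splits` and `ensemble_forecast` are hypothetical, each independent trial is assumed to draw a fresh random 90%/10% split, and the per-trial forecasts are assumed to come from separately trained networks (training itself is not shown).

```python
import numpy as np

def trial_splits(n_samples, n_trials, independent, train_share=0.9, seed=0):
    """Build one (train_idx, val_idx) pair per trial of the ensemble."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_share * n_samples))
    perm = rng.permutation(n_samples)          # shared shuffle for dependent trials
    splits = []
    for _ in range(n_trials):
        if independent:
            perm = rng.permutation(n_samples)  # reshuffle the samples for every trial
        splits.append((perm[:n_train], perm[n_train:]))
    return splits

def ensemble_forecast(trial_forecasts):
    """Average the hourly forecasts of all trials (rows = trials, columns = hours)."""
    return np.mean(np.asarray(trial_forecasts, dtype=float), axis=0)

# 40 trials, as in the ensemble size reported for the case study.
independent_splits = trial_splits(240, 40, independent=True)   # C1-like
dependent_splits = trial_splits(240, 40, independent=False)    # C2-like
```

In the dependent case the trials still differ from each other if the network weights are initialized randomly, which is why averaging them can remain useful even with a shared training set.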

Figure 5. Hourly samples belonging to an extended period of time are available in a complete training database.


The complete list of the training methods which have been adopted in this paper is in Table 1.

The different shares of the training and the validation set, 90% and 10%, respectively, have been set up

in previous works.

Figure 6. Hourly samples belonging to an extended period of time in a complete training database are randomly mixed.

Table 1. Different methods for the composition of the ANN training data-sets which have been analysed. † (90% ts, 10% vs); ts = training set; vs = validation set.

Method   Data-Set      Trials        Samples
A        Incremental   Dependent     Consecutive (10% vs, 90% ts)
A*       Incremental   Dependent     Consecutive (90% ts, 10% vs)
B1       Incremental   Independent   Random †
B2       Incremental   Dependent     Random †
C1       Complete      Independent   Random †
C2       Complete      Dependent     Random †

3. Evaluation Indexes

The effect of the different training methods is investigated by means of some evaluation indexes, which assess the accuracy of the forecasts and the related error and therefore need to be defined. There is a wide variety of existing definitions of forecasting performance in the technical literature; hence, we report some of the most commonly used ones [32–34].

The hourly error $e_h$ is the starting definition, given as the difference between the hourly mean value of the power measured in the $h$-th hour, $P_{m,h}$, and the forecast $P_{p,h}$ provided by the adopted model [32,35]:

$$e_h = P_{m,h} - P_{p,h} \quad (\mathrm{W}). \tag{1}$$

From the hourly error expression and its absolute value $|e_h|$, other definitions can be inferred, e.g., the well-known mean absolute percentage error (MAPE):

$$\mathrm{MAPE} = \frac{1}{N} \sum_{h=1}^{N} \left| \frac{e_h}{P_{m,h}} \right| \cdot 100, \tag{2}$$

where $N$ represents the number of samples (hours) considered: usually it is calculated for a single day, month, or year.
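As a minimal numerical sketch of the hourly error and of the MAPE, the functions below follow the standard definition with the absolute value of the relative error. The function names are hypothetical, and skipping the hours with zero measured power (e.g., at night) is an assumption made here to avoid dividing by zero; the source does not state how such hours are handled.

```python
import numpy as np

def hourly_error(p_measured, p_forecast):
    """Hourly error e_h = P_m,h - P_p,h, in the same unit as the inputs (W)."""
    return np.asarray(p_measured, dtype=float) - np.asarray(p_forecast, dtype=float)

def mape(p_measured, p_forecast):
    """Mean absolute percentage error over the hours with non-zero measured power."""
    p_m = np.asarray(p_measured, dtype=float)
    e = hourly_error(p_m, p_forecast)
    mask = p_m > 0  # assumption: hours with zero measured power are skipped
    return 100.0 * np.mean(np.abs(e[mask] / p_m[mask]))

# A 10 W error at 10 W of measured power weighs as 100%,
# while the same 10 W error at 100 W weighs as only 10%.
value = mape([0.0, 10.0, 100.0, 200.0], [0.0, 20.0, 90.0, 200.0])
```

The example shows why low-power hours can dominate this index even when the absolute error is small.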


Since the hourly measured power $P_{m,h}$ changes significantly during the same day (i.e., sunrise, noon, and sunset), for the sake of a fair comparison, in this paper the authors preferred to consider the normalized mean absolute error $\mathrm{NMAE}_\%$:

$$\mathrm{NMAE}_\% = \frac{1}{N} \sum_{h=1}^{N} \frac{|e_h|}{C} \cdot 100, \tag{3}$$

where the percentage of the absolute error is referred to the rated power $C$ of the plant, in place of the hourly measured power $P_{m,h}$.

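In this paper we also adopted the mean value of all the $\mathrm{NMAE}_{\%,d}$, which refer to the $d$-th day, calculated over the whole period. Therefore, we introduce $\overline{\mathrm{NMAE}}_\%$, which is the mean of all the daily $\mathrm{NMAE}_{\%,d}$ obtained with a given data-set:

$$\overline{\mathrm{NMAE}}_\% = \frac{1}{D} \sum_{d=1}^{D} \mathrm{NMAE}_{\%,d}, \tag{4}$$

where $D$ is the number of days in the considered period.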

The weighted mean absolute error $\mathrm{WMAE}_\%$ is based on total energy production:

$$\mathrm{WMAE}_\% = \frac{\sum_{h=1}^{N} |e_h|}{\sum_{h=1}^{N} P_{m,h}} \cdot 100. \tag{5}$$

The normalized root mean square error $\mathrm{nRMSE}_\%$ is based on the maximum hourly power output $P_{m,h}$:

$$\mathrm{nRMSE}_\% = \frac{\sqrt{\dfrac{\sum_{h=1}^{N} |e_h|^2}{N}}}{\max(P_{m,h})} \cdot 100. \tag{6}$$

This error definition is the well-known root mean square error (RMSE), which has been normalized over the maximum hourly power output $P_{m,h}$ measured in the considered time range, for the sake of a fair comparison.

$\mathrm{NMAE}_\%$ is largely used to evaluate the accuracy of predictions and trend estimations. In fact, relative errors are often large because they are divided by small power values (for instance, the low values associated with sunset and sunrise): in such cases, $\mathrm{WMAE}_\%$ could be very large and biased, while $\mathrm{NMAE}_\%$, by weighting these values with the capacity of the plant $C$, is more useful.
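The three indexes of Equations (3), (5), and (6) can be computed from the same pair of hourly series. The sketch below is a direct transcription of those formulas; the function names and the example values (a hypothetical 200 W plant over four hours) are illustrative assumptions and do not come from the case study.

```python
import numpy as np

def nmae(p_measured, p_forecast, capacity):
    """Normalized mean absolute error (%), referred to the rated power C."""
    e = np.asarray(p_measured, dtype=float) - np.asarray(p_forecast, dtype=float)
    return 100.0 * np.mean(np.abs(e)) / capacity

def wmae(p_measured, p_forecast):
    """Weighted mean absolute error (%), referred to the total measured energy."""
    p_m = np.asarray(p_measured, dtype=float)
    e = p_m - np.asarray(p_forecast, dtype=float)
    return 100.0 * np.sum(np.abs(e)) / np.sum(p_m)

def nrmse(p_measured, p_forecast):
    """Root mean square error (%), normalized over the maximum measured power."""
    p_m = np.asarray(p_measured, dtype=float)
    e = p_m - np.asarray(p_forecast, dtype=float)
    return 100.0 * np.sqrt(np.mean(e ** 2)) / np.max(p_m)

p_m = [0.0, 50.0, 100.0, 50.0]   # measured hourly power (W), made-up values
p_p = [0.0, 40.0, 120.0, 50.0]   # forecast hourly power (W), made-up values
print(nmae(p_m, p_p, capacity=200.0), wmae(p_m, p_p), nrmse(p_m, p_p))
```

On a day with low production the denominator of the WMAE shrinks while the capacity $C$ does not, so the WMAE grows much faster than the NMAE for the same absolute errors.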

The $\mathrm{nRMSE}_\%$ measures the mean magnitude of the absolute hourly errors $|e_h|$. It gives a relatively higher weight to larger errors, thus allowing particularly undesirable results to be emphasized. If we consider the daily trends of the aforementioned indexes (shown in Figure 7), it can be seen how they are correlated, while in Figure 8 the scatterplot of their values, normalized to the respective maxima, clearly shows the correlations between the three error indexes. Furthermore, the Pearson–Bravais correlation index $\rho_{xy}$ [36] has been calculated to underline the direct relationship among the error indexes:

$$\rho_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_{i=1}^{N} (x_i - \mu_x)^2}\,\sqrt{\sum_{i=1}^{N} (y_i - \mu_y)^2}}. \tag{7}$$
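Equation (7) translates directly into code. The following sketch assumes that `x` and `y` are two series of daily error indexes of equal length (for example daily NMAE and daily nRMSE values); the function name `pearson` is introduced here for illustration only.

```python
import numpy as np

def pearson(x, y):
    """Pearson-Bravais correlation index between two equally long series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations from the mean of x
    dy = y - y.mean()   # deviations from the mean of y
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2)))

# Two perfectly proportional series have a correlation index of 1.
rho = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The same value can be obtained with `np.corrcoef(x, y)[0, 1]`, which is a convenient cross-check.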

However, as shown in Figure 7, the daily evaluation indexes expressed in Equations (3), (5), and (6) can vary a great deal, and are unable to give complete information “at a glance” on the accuracy of the prediction. For example, consider Figures 9 and 10, where the forecasts and the relevant evaluation indexes for 1 April and 4 November 2014, respectively, are depicted. In both cases, the daily $\mathrm{NMAE}_\%$ values are quite low (around 2–3%), and a forecast assessment based solely on this index could be misleading.


Actually, 1 April was quite a sunny day, and the bell-shaped hourly power curve which was forecast (the red starred line) accurately followed the measured one (the blue circled line). The cloudy day of 4 November 2014 was a different story: the forecast red curve is biased towards the noon hours, while the measured blue curve is biased towards the morning. However, on the second day the daily $\mathrm{NMAE}_\%$ value is lower. This is owing to the normalisation of the mean absolute error with the net capacity of the plant. Regarding the other evaluation indexes, even if they are correlated, they can exceed the 100% cap, as happens for example to $\mathrm{WMAE}_\%$ in Figure 7 on day 72.

Figure 7. Example of the daily errors trend. NMAE: normalized mean absolute error; nRMSE: normalized root mean square error; WMAE: weighted mean absolute error.

Figure 8. Normalized daily errors correlated in a scatterplot.


Figure 9. Example of a sunny day forecast (1 April 2014) with the relevant evaluation indexes. $\mathrm{EMAE}_\%$: envelope-weighted mean absolute error.

Figure 10. Example of a cloudy day forecast (4 November 2014) with the relevant evaluation indexes.

Starting from these considerations, and in view of a more useful summary evaluation, an additional performance index is proposed, aiming to express the forecast accuracy with a value between 0% and 100%. Therefore, the envelope-weighted mean absolute error $\mathrm{EMAE}_\%$ is defined as:

$$\mathrm{EMAE}_\% = \frac{\sum_{h=1}^{N} |e_h|}{\sum_{h=1}^{N} \max(P_{m,h}, P_{p,h})} \cdot 100, \tag{8}$$

where the numerator is the sum of the absolute hourly errors, as in $\mathrm{WMAE}_\%$, while the denominator is the sum of the maximum between the forecast and the measured hourly power. In particular, this definition is consistent with a graphical representation of the error, where the numerator corresponds to the yellow area shown in Figures 9 and 10 and the denominator is the sum of the gray and yellow areas highlighted in the same figures. With reference to the above-mentioned days, while the two $\mathrm{NMAE}_\%$ values are nearly the same, the $\mathrm{EMAE}_\%$ is 11% in the first case and 40% in the second case, and it never exceeds 100%.
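A short sketch of Equation (8) makes the 100% bound easy to check: for non-negative powers, $|P_{m,h} - P_{p,h}| \le \max(P_{m,h}, P_{p,h})$ in every hour, so the ratio of the two sums cannot exceed 1. The function name `emae` and the convention of returning 0 when both series are identically zero are assumptions introduced here.

```python
import numpy as np

def emae(p_measured, p_forecast):
    """Envelope-weighted mean absolute error (%), following Equation (8)."""
    p_m = np.asarray(p_measured, dtype=float)
    p_p = np.asarray(p_forecast, dtype=float)
    numerator = np.sum(np.abs(p_m - p_p))       # area between the two curves
    denominator = np.sum(np.maximum(p_m, p_p))  # area under the upper envelope
    if denominator == 0.0:
        return 0.0  # assumption: no production and no forecast count as zero error
    return 100.0 * numerator / denominator

# A forecast of zero power on a productive day is the worst case: EMAE = 100%.
worst = emae([10.0, 20.0], [0.0, 0.0])
typical = emae([0.0, 50.0, 100.0, 50.0], [0.0, 40.0, 120.0, 50.0])
```

The bound relies on both power series being non-negative, which holds for measured and forecast PV output.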


As with the daily $\mathrm{NMAE}_{\%,d}$, in this study we also introduced the mean value of all the $\mathrm{EMAE}_{\%,d}$, which refer to the $d$-th day, calculated over the whole period. Therefore, $\overline{\mathrm{EMAE}}_\%$ is the mean of all the daily $\mathrm{EMAE}_{\%,d}$ for a given data-set:

$$\overline{\mathrm{EMAE}}_\% = \frac{1}{D} \sum_{d=1}^{D} \mathrm{EMAE}_{\%,d}. \tag{9}$$
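The averaging over days can be sketched by reshaping the hourly series into one row per day. The function name `mean_daily_emae`, the `samples_per_day` argument, and the choice of assigning 0 to days with no production and no forecast are assumptions made for this illustration.

```python
import numpy as np

def mean_daily_emae(p_measured, p_forecast, samples_per_day=24):
    """Return (mean of the daily EMAE values, array of daily EMAE values), in %."""
    p_m = np.asarray(p_measured, dtype=float).reshape(-1, samples_per_day)
    p_p = np.asarray(p_forecast, dtype=float).reshape(-1, samples_per_day)
    numerator = np.abs(p_m - p_p).sum(axis=1)        # one value per day
    denominator = np.maximum(p_m, p_p).sum(axis=1)   # one value per day
    ratio = np.divide(numerator, denominator,
                      out=np.zeros_like(numerator), where=denominator > 0)
    daily = 100.0 * ratio
    return daily.mean(), daily

# Toy example with 2 samples per day: a fully missed day (100%) and a perfect day (0%).
mean_value, daily_values = mean_daily_emae([10.0, 20.0, 10.0, 10.0],
                                           [0.0, 0.0, 10.0, 10.0],
                                           samples_per_day=2)
```

Note that the mean of the daily values generally differs from the index computed once over the whole period, because every day carries the same weight regardless of its energy production.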

4. Case Study

Experimental data for this study were taken from the SolarTechLab laboratory [37], located in Milano, Italy (coordinates: 45°30′10.588″ N; 9°9′23.677″ E). In 2014, the DC output power of a single PV module with the following characteristics was recorded:

• PV technology: silicon mono-crystalline,
• Rated power (net capacity of the PV module): 245 Wp,
• Azimuth: −6°30′ (assuming 0° as south direction and counting clockwise),
• Solar panel tilt angle (β): 30°.

The monitoring activity of the PV system parameters lasted from 8 February to 14 December

2014, but the employable data, without interruptions and discontinuities, amount to 216 days.

These 24-hourly samples were used as the database for the forecasting methods comparison.

The PV module was connected to the electric grid by an ABB MICRO-0.25-I-OUTD micro-inverter [38], guaranteeing the optimization of the production. Its operating parameters, DC power included, were transmitted in real time to a workstation for storage using a ZigBee wireless connection.

An important issue that arises is how to avoid missing values and outliers. A suitable pre-processing

procedure, which has already been developed and described in detail in [39], is applied here.

The weather forecasts employed were delivered by a weather service each day at 11 a.m. of the

day before the forecasted one, for the exact location of the PV plant. The historical hourly database of

these parameters was used to train the network and includes the following parameters:

• $T_{amb}$: ambient temperature (°C),
• $GHI$: global horizontal irradiance (W/m²),
• $G_{POA}$: global irradiance on the plane of the array (W/m²),
• $W_s$: wind speed (m/s),
• $W_d$: wind direction (°),
• $P$: pressure (hPa),
• $R$: precipitation (mm),
• $C_c$: cloud cover (%),
• $C_t$: cloud type (Low/Medium/High).

In addition to these parameters, in order to train the PHANN method, the local time $LT$ (hh:mm) of the day and the Clear Sky Radiation Model $CSRM$ (W/m²) were also provided. These are the eleven inputs of the ANN. The specific settings of the ANN, except for the training database composition (as presented in Section 2), were selected on the basis of a sensitivity analysis, as outlined in a previous study [23]. The ANN settings adopted in this study were:

• neurons in the input layer: 11,
• neurons in the first hidden layer: 11,
• neurons in the second hidden layer: 5,
• neurons in the output layer: 1,
• training algorithm: Levenberg–Marquardt,
• activation function: sigmoid,
• number of trials in the ensemble forecast: 40.
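To make the layout listed above concrete, the following NumPy sketch builds an 11-11-5-1 feed-forward network and runs a forward pass on one day of hourly inputs. It is not the authors' implementation: the weights are random placeholders, Levenberg–Marquardt training is not shown, and the use of a linear output neuron (with the sigmoid applied only to the hidden layers) is an assumption, since the source does not specify the output activation.

```python
import numpy as np

LAYER_SIZES = [11, 11, 5, 1]  # input, first hidden, second hidden, output

def init_params(layer_sizes, seed=0):
    """Create placeholder (weights, biases) pairs for each pair of adjacent layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    """Forward pass: sigmoid on hidden layers, linear output (assumption)."""
    a = np.asarray(x, dtype=float)
    for i, (w, b) in enumerate(params):
        z = a @ w + b
        a = z if i == len(params) - 1 else sigmoid(z)
    return a

def n_parameters(params):
    """Total number of trainable weights and biases."""
    return sum(w.size + b.size for w, b in params)

params = init_params(LAYER_SIZES)
one_day = np.zeros((24, 11))          # 24 hourly samples, 11 inputs each
hourly_output = forward(params, one_day)
print(n_parameters(params))           # (11*11+11) + (11*5+5) + (5*1+1) = 198
```

In an ensemble of 40 trials, 40 such networks would be trained separately and their hourly outputs averaged.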


The shares of the data included in the training and in the validation steps were adjusted by means of another sensitivity analysis. Independently of how many days were employed in the training, the database was divided into two groups containing different amounts of data: the first was used to train the network and the remaining data were used for the validation. Finally, the ensemble forecast was performed. This procedure was repeated several times, progressively increasing the number of days employed in the training process. The above-mentioned performance indexes were calculated over the whole year, and the results are plotted in Figures 11 and 12 according to the different shares adopted between training and validation. The results depicted here refer to the training method C1; the reason for this choice is explained in Section 5. As can be seen, the best results are always obtained by adopting 90% of the data for the training and the remaining 10% for the validation (the blue rhomboidal curve). However, the zoom in the top-right corner of Figure 11 shows that, for the largest amount of data (210 days), 80% of the data for the training and 20% for the validation (the purple dotted curve) also provided similar results. The same trends observed for $\mathrm{NMAE}_\%$ were obtained in Figure 12, where $\mathrm{EMAE}_\%$ is shown as a function of the data-set size and of the shares of the training and validation sets.

Figure 11. $\mathrm{NMAE}_\%$ as a function of the dataset size.

Figure 12. $\mathrm{EMAE}_\%$ as a function of the dataset size.


The same analysis was performed for the training Method A* by comparing the results of $\mathrm{NMAE}_\%$ in Figure 13 and the new error definition of Equation (9) shown in Figure 14.

Figure 13. $\mathrm{NMAE}_\%$ as a function of the dataset size.

Figure 14. $\mathrm{EMAE}_\%$ as a function of the dataset size.

5. Results

The study carried out so far aimed to compare different methods for the composition of the data-set employed in the training of the ANN, highlighting the most effective ones. The day-ahead forecasts were analysed by means of the indexes defined in Section 3, leading to the following results. The graph in Figure 15 shows the trend of the $\mathrm{NMAE}_\%$ calculated for the different training-set composition methods, according to increasing data-set sizes. The best training method, which globally performed better with all the data-sets considered, was C1. Instead, in the short-range training, with only 10 days available in the data-set, method C2 scored the worst result, with $\mathrm{NMAE}_\%$ equal to 6.079. As the data-set size increased, C2 aligned with C1 above 90–130 days. The trends of the other evaluation indexes are shown in Figures 16–18 and confirm the same results. Also from this perspective, method C2 scored the worst result, with $\mathrm{EMAE}_\%$ equal to 36.51. According to the $\mathrm{NMAE}_\%$ shown in Figure 15, methods B1 and B2 generally performed much the same.


Figure 15. $\mathrm{NMAE}_\%$ as a function of the dataset size.

Figure 16. $\mathrm{nRMSE}_\%$ as a function of the dataset size.

Figure 17. $\mathrm{WMAE}_\%$ as a function of the dataset size.


Figure 18. $\mathrm{EMAE}_\%$ as a function of the dataset size.

As a general comment on the reported results, it can be stated that method A is best suited

when the availability of historical data is limited (e.g., newly deployed PV plant), while method

C1 appears to be most effective in the case of a greater availability of data (e.g., at least one year of

power measurements from the considered PV facility). Generally speaking, ensembles composed of

independent trials are most effective. The performance of methods B1 and B2 was halfway between

A and C, and their effectiveness in the case of newly deployed PV plants became signiﬁcant after

a minimum period of measurement data accumulation (above 60 days).

6. Conclusions

This paper has presented a specific study aimed at analyzing the effect of different approaches to the composition of a training data-set for the day-ahead forecasting of PV power production. In particular, the authors proposed different procedures to set up the training and validation data-sets for the ANN used in the physical hybrid method to perform the power forecast in view of the electricity market. The approaches outlined here can be adopted to set up data-sets based on either historical data retrieved from an existing PV plant or on incremental data measurements in a newly deployed PV facility. The influence of different data-set compositions on the forecast outcome has been inspected by increasing the training data-set size and by varying the training and validation shares, in order to assess the most effective training method for this machine learning approach, based on commonly used and newly defined performance indexes for the prediction error. The reported results have been validated over a 1-year time range of experimentally measured data from a real PV power plant, considering a comparison of various error measures and showing the best approach for the different cases of either newly deployed or already existing PV facilities.

Author Contributions:

In this research activity, all of the authors were involved in the data analysis and

preprocessing phase, the simulation, the results analysis and discussion, and the manuscript’s preparation.

All of the authors have approved the submitted manuscript. All the authors equally contributed to the writing of

the paper.

Conﬂicts of Interest: The authors declare no conﬂict of interest.

References

1.

Pelland, S.; Remund, J.; Kleissl, J.; Oozeki, T.; De Brabandere, K. Photovoltaic and solar forecasting: State of

the art. IEA PVPS Task 2013,14, 1–36.

2.

Paulescu, M.; Paulescu, E.; Gravila, P.; Badescu, V. Weather Modeling and Forecasting of PV Systems Operation;

Springer Science & Business Media: Berlin, Germany, 2012.

Appl. Sci. 2018,8, 228 15 of 16

3.

Raza, M.Q.; Nadarajah, M.; Ekanayake, C. On recent advances in PV output power forecast. Sol. Energy

2016,136, 125–144.

4.

Faranda, R.S.; Hafezi, H.; Leva, S.; Mussetta, M.; Ogliari, E. The Optimum PV Plant for a Given Solar

DC/AC Converter. Energies 2015,8, 4853–4870.

5.

Dolara, A.; Lazaroiu, G.C.; Leva, S.; Manzolini, G.; Votta, L. Snail Trails and Cell Microcrack Impact on PV

Module Maximum Power and Energy Production. IEEE J. Photovolt. 2016,6, 1269–1277.

6.

Omar, M.; Dolara, A.; Magistrati, G.; Mussetta, M.; Ogliari, E.; Viola, F. Day-ahead forecasting for photovoltaic

power using artificial neural networks ensembles. In Proceedings of the 2016 IEEE International Conference

on Renewable Energy Research and Applications (ICRERA), Birmingham, UK, 20–23 November 2016;

pp. 1152–1157.

7.

Cali, Ü. Grid and Market Integration of Large-Scale Wind Farms Using Advanced Wind Power Forecasting: Technical

and Energy Economic Aspects; Erneuerbare Energien und Energieefﬁzienz—Renewable Energies and Energy

Efﬁciency; Kassel University Press: Kassel, Germany, 2011.

8.

Ni, Q.; Zhuang, S.; Sheng, H.; Kang, G.; Xiao, J. An ensemble prediction intervals approach for short-term

PV power forecasting. Sol. Energy 2017,155, 1072–1083.

9.

Simonov, M.; Mussetta, M.; Grimaccia, F.; Leva, S.; Zich, R. Artiﬁcial intelligence forecast of PV plant

production for integration in smart energy systems. Int. Rev. Electr. Eng. 2012,7, 3454–3460.

10. Duan, Q.; Shi, L.; Hu, B.; Duan, P.; Zhang, B. Power forecasting approach of PV plant based on similar time periods and Elman neural network. In Proceedings of the 2015 Chinese Automation Congress (CAC), Wuhan, China, 27–29 November 2015; pp. 1258–1262.

11. Gardner, M.; Dorling, S. Artificial neural networks (the multilayer perceptron)—A review of applications in the atmospheric sciences. Atmos. Environ. 1998, 32, 2627–2636.

12. Nelson, M.; Illingworth, W. A Practical Guide to Neural Nets; Physical Sciences; Addison-Wesley: Boston, MA, USA, 1991; 316p.

13. Bose, B.K. Neural Network Applications in Power Electronics and Motor Drives—An Introduction and Perspective. IEEE Trans. Ind. Electron. 2007, 54, 14–33.

14. Ogliari, E.; Dolara, A.; Manzolini, G.; Leva, S. Physical and hybrid methods comparison for the day ahead PV output power forecast. Renew. Energy 2017, 113, 11–21.

15. Elder, J.F.; Abbott, D.W. A comparison of leading data mining tools. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 27–31 August 1998; Volume 28.

16. Bergstra, J.; Breuleux, O.; Bastien, F.; Lamblin, P.; Pascanu, R.; Desjardins, G.; Turian, J.; Warde-Farley, D.; Bengio, Y. Theano: A CPU and GPU math compiler in Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010; pp. 1–7.

17. Collobert, R.; Kavukcuoglu, K.; Farabet, C. Torch7: A matlab-like environment for machine learning. In Proceedings of the BigLearn, NIPS Workshop, Sierra Nevada, Spain, 16–17 December 2011; Number EPFL-CONF-192376.

18. Kalogirou, S. Artificial Intelligence in Energy and Renewable Energy Systems; Nova Publishers: Hauppauge, NY, USA, 2007.

19. Duffie, J.A.; Beckman, W.A. Solar Engineering of Thermal Processes; John Wiley & Sons: Hoboken, NJ, USA, 2013.

20. Gandelli, A.; Grimaccia, F.; Leva, S.; Mussetta, M.; Ogliari, E. Hybrid model analysis and validation for PV energy production forecasting. In Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN), Beijing, China, 6–11 July 2014; pp. 1957–1962.

21. Dolara, A.; Grimaccia, F.; Leva, S.; Mussetta, M.; Ogliari, E. A Physical Hybrid Artificial Neural Network for Short Term Forecasting of PV Plant Power Output. Energies 2015, 8, 1138–1153.

22. Rana, M.; Koprinska, I.; Agelidis, V.G. Forecasting solar power generated by grid connected PV systems using ensembles of neural networks. In Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 12–16 July 2015; pp. 1–8.

23. Grimaccia, F.; Leva, S.; Mussetta, M.; Ogliari, E. ANN Sizing Procedure for the Day-Ahead Output Power Forecast of a PV Plant. Appl. Sci. 2017, 7, 622.

24. Netsanet, S.; Zhang, J.; Zheng, D.; Hui, M. Input parameters selection and accuracy enhancement techniques in PV forecasting using Artificial Neural Network. In Proceedings of the 2016 IEEE International Conference on Power and Renewable Energy (ICPRE), Shanghai, China, 21–23 October 2016; pp. 565–569.

25. Panapakidis, I.P.; Christoforidis, G.C. A hybrid ANN/GA/ANFIS model for very short-term PV power forecasting. In Proceedings of the 2017 11th IEEE International Conference on Compatibility, Power Electronics and Power Engineering (CPE-POWERENG), Cadiz, Spain, 4–6 April 2017; pp. 412–417.

26. Tetko, I.V.; Livingstone, D.J.; Luik, A.I. Neural network studies. 1. Comparison of overfitting and overtraining. J. Chem. Inf. Comput. Sci. 1995, 35, 826–833.

27. Hansen, L.K.; Salamon, P. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 1990, 12, 993–1001.

28. Perrone, M.P. General averaging results for convex optimization. In Proceedings of the 1993 Connectionist Models Summer School; Psychology Press: London, UK, 1994; pp. 364–371.

29. Odom, M.D.; Sharda, R. A neural network model for bankruptcy prediction. In Proceedings of the 1990 IJCNN International Joint Conference on Neural Networks, San Diego, CA, USA, 17–21 June 1990; pp. 163–168.

30. Hagan, M.T.; Demuth, H.B.; Beale, M.H. Neural Network Design; Campus Publishing Service, University of Colorado Bookstore: Boulder, CO, USA, 2014; ISBN 9780971732100.

31. Chen, S.H.; Jakeman, A.J.; Norton, J.P. Artificial intelligence techniques: An introduction to their use for modelling environmental systems. Math. Comput. Simul. 2008, 78, 379–400.

32. Monteiro, C.; Fernandez-Jimenez, L.A.; Ramirez-Rosado, I.J.; Munoz-Jimenez, A.; Lara-Santillan, P.M. Short-Term Forecasting Models for Photovoltaic Plants: Analytical versus Soft-Computing Techniques. Math. Probl. Eng. 2013, 2013, 767284.

33. Ulbricht, R.; Fischer, U.; Lehner, W.; Donker, H. First Steps Towards a Systematical Optimized Strategy for Solar Energy Supply Forecasting. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2013), Riva del Garda, Italy, 23–27 September 2013.

34. Kleissl, J. Solar Energy Forecasting and Resource Assessment; Academic Press: Cambridge, MA, USA, 2013.

35. Ogliari, E.; Grimaccia, F.; Leva, S.; Mussetta, M. Hybrid Predictive Models for Accurate Forecasting in PV Systems. Energies 2013, 6, 1918–1929.

36. Wolfram, M.; Bokhari, H.; Westermann, D. Factor influence and correlation of short term demand for control reserve. In Proceedings of the 2015 IEEE Eindhoven PowerTech, Eindhoven, The Netherlands, 29 June–2 July 2015; pp. 1–5.

37. SolarTechLab Department of Energy. Available online: http://www.solartech.polimi.it/ (accessed on 30 September 2017).

38. ABB MICRO-0.25-I-OUTD. Available online: https://library.e.abb.com/public/0ac164c3b03678c085257cbd0061a446/MICRO-CDD_BCD.00373_EN.pdf (accessed on 21 January 2018).

39. Leva, S.; Dolara, A.; Grimaccia, F.; Mussetta, M.; Ogliari, E. Analysis and validation of 24 hours ahead neural network forecasting of photovoltaic output power. Math. Comput. Simul. 2017, 131, 88–100.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).