Hybrid ensemble neural network approach for photovoltaic production forecast

1st Dea Pujić
Institute Mihajlo Pupin
University of Belgrade
Belgrade, Serbia
dea.pujic@pupin.rs
2nd Nikola Tomašević
Institute Mihajlo Pupin
University of Belgrade
Belgrade, Serbia
nikola.tomasevic@pupin.rs
Abstract—With the main goal of saving the environment and
reducing the amount of burnt fossil fuels, the penetration of
renewable energy sources as a share of the electrical energy
production is constantly increasing. Nevertheless, this growth
significantly jeopardizes electrical grid stability, since renewable
sources highly depend on the meteorological conditions, which
are stochastic by their nature. Therefore, careful planning of
energy use is necessary, which is why a photovoltaic production
forecaster model is presented within this paper. The main
focus is a hybrid ensemble neural network approach
which combines an ensembling method with complex LSTM + CNN
networks with the aim of improving forecasting performance.
The approach has been tested using real-world year-long data
from the town of Adeje in Tenerife, and the results on the test
data show a root mean square forecast error improvement of
150 W in comparison with the conventional ensemble model and
60 W in comparison with the hybrid model.
Index Terms—PV production forecaster, hybrid approach,
ensemble model, LSTM
I. INTRODUCTION
In order to reduce environmental pollution caused by burning
fossil fuels, the share of renewable energy sources (RES) in
energy production has been steadily increasing.
Nevertheless, since RES are highly correlated with the
meteorological conditions, they are stochastic by nature, which
affects grid stability, as any mismatch between the energy
demand and production leads to instability. Therefore, in order
to provide adequate inputs for energy planning and dispatch
optimization, RES production forecasting models are required.
Hence, the focus of this paper is put on photovoltaic (PV)
production forecasting models.
The research presented in this paper is partly financed by the European
Union (Horizon 2020 REACT project GA #824395 and SINERGY project GA
#952140) and partly by the Ministry of Education, Science and Technological
Development and the Science Fund of the Republic of Serbia (AI-ARTEMIS
project, #6527051).
In literature, various approaches are present depending on
the type of utilized data. In the particular case presented
in this paper, PV production modeling
dependent on meteorological data, such as solar radiation,
cloud coverage, etc., is explored, rather than, for example,
satellite images [1]. Apart from the classification by
input data, approaches also differ depending on the
estimation horizon. Namely, depending on the time interval
for which the forecast is given, influenced by the particular
application of the models, methodologies differ significantly.
When short-term forecasts are considered, covering models
for estimating production up to a couple of minutes ahead,
the most commonly utilized approaches are the probabilistic ones,
such as ARMA, ARIMA, ARMAX, etc. [2]. Nevertheless, this paper
is focused on short to mid-term forecasts, since it
presents day-ahead estimations. Utilization of neural
networks in this regard is very common and has shown
excellent performance. A wide range of architectures has been
exploited for this purpose. In [3], the authors have utilized
artificial neural networks (ANN) for PV production estimation
in Morocco. Similarly to the approach that will be presented
in this study, year-long historical meteorological data such as
temperature, solar radiation and humidity have been used for
training purposes. Furthermore, [4] utilizes ANNs through the
ensembling approach. In other words, the forecast is given as the
average of the estimates of several simple ANNs.
Since ensembling approaches are well known in machine
learning (ML) literature for improving estimation precision
and reducing its variance, it was decided to follow this
approach in the work presented in this paper. However, the
main difference between the approach suggested in [4] and the one
presented herein is the type and architecture of
the neural network, with more details regarding this topic
given in Section III. Over time, the available
NN architectures improved and more complex structures
proved to be more performant. In [5], the performance of
a recurrent NN has been presented for production forecast
depending on the meteorological parameters, whilst long short-term
memory (LSTM) networks were utilized in [6]. Nevertheless, the
most performant were hybrid architectures, which utilize the benefits
of different approaches. One such approach was presented
in [7], where a combination of a convolutional (CNN) and an
LSTM network was exploited, and it has served as the motivation
behind the work presented in this paper.
Taking all of the above into consideration, it can be concluded
that various approaches for PV production forecast have
already been explored. Since they have proven to be highly
performant, NNs were chosen as an appropriate
approach in this paper. In order to improve the current state of the art,
a hybrid ensemble neural network approach has been chosen.
Namely, the estimation of the proposed solution is given as
the average value of hybrid CNN + LSTM estimations, thus
combining the ensembling and hybrid model approaches in order
to achieve better performance.
II. DATA EXPLORATION
With the focus of developing and testing the proposed PV
production forecaster solution, historical data was required, and
the corresponding analysis is given in this section. Taking
into consideration the high correlation between PV production
and meteorological conditions, meteorological data has been
obtained alongside the production data.
Year-long hourly production data has been gathered from
the test site located in the town of Adeje on the island of Tenerife
for the PV plant with a capacity of 40 kWp. It should be pointed
out that the historical production data was not highly precise,
since it had only a 1 kW resolution. Since this plant did not
have an accompanying meteorological station from which data
could be obtained, historical meteorological data has been
gathered from the Solcast tool¹ for the same period of time
and with the same temporal resolution as the production. This
dataset contained the following meteorological parameters: air
temperature, azimuth angle, cloud opacity, dew point, diffuse
horizontal irradiation (DHI), direct normal irradiation (DNI),
global horizontal irradiation (GHI), precipitable water, relative
humidity, snow depth, surface pressure, wind direction and
speed, and zenith angle. As an example of the utilized data,
Figure 1 shows the dependency between production and GHI.
As can be observed, the correlation between these two
variables is high; however, the dependency is not linear and
is also influenced by other factors, such as seasonality.
All examples in which solar radiation was zero were removed
from the presented data sets, since the corresponding production
was zero and no model is needed for
that estimation. Furthermore, data preprocessing and cleaning
have been carried out. Data cleaning covered the exclusion of
invalid data, such as examples with negative production,
irradiation, etc., and ones in which any of the production
or meteorological parameters were missing. Outliers were
detected based on the joint data distribution. Namely, since the
resolution of the production was 1 kW, for each integer production
value, the distribution of the corresponding global horizontal
irradiation was analyzed and the outliers were removed. This
procedure has been carried out separately for each month,
since the distribution differs depending on the period of the
year.
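The outlier removal described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the paper does not state which outlier criterion was applied to the GHI distribution, so an interquartile-range fence with factor 1.5 is assumed here, and the record keys `month`, `production_kw` and `ghi` are hypothetical names.

```python
from collections import defaultdict
from statistics import quantiles


def remove_ghi_outliers(records, k=1.5, min_group=4):
    """Drop records whose GHI is an outlier within its (month, production) group.

    records: list of dicts with the hypothetical keys 'month',
    'production_kw' and 'ghi'. The IQR fence with factor k is an
    assumed criterion; the paper does not specify the one it used.
    """
    groups = defaultdict(list)
    for rec in records:
        # Production has a 1 kW resolution, so the rounded value names the bin.
        groups[(rec["month"], round(rec["production_kw"]))].append(rec)

    kept = []
    for group in groups.values():
        if len(group) < min_group:
            # Too few samples to estimate a distribution; keep them unchanged.
            kept.extend(group)
            continue
        q1, _, q3 = quantiles([r["ghi"] for r in group], n=4)
        low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
        kept.extend(r for r in group if low <= r["ghi"] <= high)
    return kept
```

Grouping by month before computing the fences mirrors the remark that the distribution differs depending on the period of the year.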
¹ https://solcast.com/
Fig. 1. Correlation between production and GHI
After the data has been cleaned, it was separated into three sets:
the training, validation and testing sets. Taking into consideration
how seasonality could influence model predictions,
it was of the utmost importance to have all seasons represented
in the training set, so that the trained model is
precise during the whole year. Similarly, in
order to be able to estimate the performance of the presented
model, it was necessary to have all seasons represented in the
testing set. Therefore, one month of each season was set aside
for testing and validation purposes, namely February,
May, August and November, where odd days were selected
for the testing set and even ones for the validation set. Finally,
the data has been normalized, so that training could be carried out.
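The calendar-based split can be illustrated with a short sketch. The sample layout (a tuple starting with a `datetime`) and the function names are assumptions made for illustration. The paper does not state which normalization was used, so min-max scaling fitted on the training data is assumed.

```python
from datetime import datetime

HOLDOUT_MONTHS = {2, 5, 8, 11}  # February, May, August, November


def split_by_calendar(samples):
    """Split (timestamp, features, target) samples into train/validation/test."""
    train, val, test = [], [], []
    for sample in samples:
        ts = sample[0]
        if ts.month not in HOLDOUT_MONTHS:
            train.append(sample)
        elif ts.day % 2 == 1:
            test.append(sample)  # odd days of the held-out months
        else:
            val.append(sample)   # even days of the held-out months
    return train, val, test


def fit_min_max(values):
    """Return a scaler mapping the fitted range onto [0, 1] (assumed scheme)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return lambda x: (x - lo) / span
```

Fitting the scaler on training values only and reusing it for the validation and testing sets would keep information from the held-out months out of training.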
III. METHODOLOGY
The approach presented in this paper was envisioned
to improve currently available forecasting methodologies.
It was inspired by two existing methodologies.
The first one was the ensembling approach, which exploits the
estimations of various neural networks. Namely, ensembling
approaches are commonly used in machine learning, and
their main idea is forming a group of weak learners whose
combination improves the final estimation performance. Each
weak learner is supposed to be trained using a part of the
training data, so that the ensemble consists of different
models. In the particular case presented in [4], a weak learner
was a simple ANN with eight dense layers with 128 neurons
each, a ReLU activation function and a dense output layer
with one neuron and linear activation. That
approach therefore left room for improvement through a more
complex and powerful neural network architecture. The other approach
which inspired the work in this paper was the hybrid neural
network proposed in [7], whose slightly improved architecture
is presented in Table I. As can be observed, the architecture
is much more complex, since it contains LSTM, convolutional,
max pooling, dropout and dense layers. In this way, it is
expected that more complex relations could be learnt.
TABLE I
HYBRID NEURAL NETWORK ARCHITECTURE
layer         num. of neurons / filters   filter size   activation function   factor
LSTM          64                          -             tansig                -
LSTM          128                         -             tansig                -
CNN           64                          3             linear                -
Max Pooling   -                           2             -                     -
CNN           128                         3             relu                  -
Max Pooling   -                           2             -                     -
Dropout       -                           -             -                     0.1
Dense         2048                        -             relu                  -
Dense         1024                        -             relu                  -
Dense         1                           -             linear                -
The approach that was utilized within this paper was a
combination of ensembling and a hybrid approach. In other
words, the ensembling model has been designed to use the hybrid
NN from Table I as a weak learner.
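The ensembling mechanism itself is independent of the weak learner, which the following minimal sketch tries to show. To stay self-contained, it uses a one-dimensional least-squares line as a stand-in weak learner; this is explicitly not the hybrid NN of Table I, and the function names and the default 80% subset fraction are illustrative assumptions.

```python
import random


def fit_line(xs, ys):
    """Stand-in weak learner: 1-D least-squares line (NOT the hybrid NN of Table I)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def train_ensemble(xs, ys, n_learners, fraction=0.8, fit=fit_line, seed=0):
    """Train each weak learner on its own random subset of the training data."""
    rng = random.Random(seed)
    size = max(2, int(fraction * len(xs)))
    learners = []
    for _ in range(n_learners):
        # Sampling without replacement per learner; subsets overlap across learners.
        idx = rng.sample(range(len(xs)), size)
        learners.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return learners


def ensemble_predict(learners, x):
    """Ensemble forecast: the plain average of the weak learners' estimates."""
    return sum(model(x) for model in learners) / len(learners)
```

In the proposed approach, the `fit` argument would be replaced by a routine that trains one hybrid network, while the subset sampling and the averaging step would stay the same.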
IV. RESULTS AND DISCUSSION
A. Reference approach definition
In order to be able to adequately present and analyze the
quality of the proposed method, three approaches have been
defined. The first two were the motivation for the
methodology presented in this paper and are used for
benchmarking performance between the existing and newly
proposed methodology. Hence, the approaches are as follows:
1) Ensembling method with the simple ANN;
2) Hybrid NN with the architecture from Table I;
3) Ensembling hybrid model (combination of the first two)
with the aim of improving estimation performance.
B. Results
Following the approaches defined previously, the first
trained model was based on the ensemble approach with an ANN
as a weak learner. Therefore, 200 weak learners have been
trained, each using a random (overlapping) portion of 80% of the
training set. Performance (root mean square
error, RMSE) of the ensemble model on the validation data
for different numbers of weak learners is given in Figure 2.
The performance metric did not change
significantly with the number of weak learners beyond approximately
80 of them, so the selected maximum number of weak learners
can be considered well chosen. The optimal number of
weak learners was estimated to be 150, where the minimal RMSE
was achieved on the validation data. The RMSE of that model
was 3.11 kW, 3.82 kW and 3.58 kW for the training, validation
and testing sets, respectively. Apart from the numerical performance
evaluation, the estimation on the testing data is depicted
in Figures 3 and 4. It could be observed that the lowest
performance is achieved in periods when production is low
or at its peak.
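The selection of the number of weak learners from a validation curve such as the one in Figure 2 could be sketched as below. The paper does not state in which order learners are added to the ensemble, so this sketch assumes that the first k trained learners are averaged. All names are illustrative.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error between two equally long sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmse_by_ensemble_size(y_val, learner_preds):
    """Validation RMSE of the averaged forecast using the first k learners.

    learner_preds[j][i] is learner j's prediction for validation sample i.
    Returns one RMSE value per ensemble size k = 1..len(learner_preds).
    """
    running = [0.0] * len(y_val)
    curve = []
    for k, preds in enumerate(learner_preds, start=1):
        running = [r + p for r, p in zip(running, preds)]
        curve.append(rmse(y_val, [r / k for r in running]))
    return curve


def best_ensemble_size(curve):
    """Number of learners with the minimal validation RMSE (first one on ties)."""
    return min(range(len(curve)), key=curve.__getitem__) + 1
```

Keeping a running sum of predictions avoids recomputing the average from scratch for every ensemble size, which matters when each curve covers up to 200 learners.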
The hybrid LSTM + CNN network was trained using the
same training data, resulting in an RMSE of
3.58 kW, 3.76 kW and 3.49 kW for the training, validation and
testing data sets, respectively. Additionally, an example of
this model's performance is given in Figure 5. From both
the RMSE and the figure provided, it could be concluded that
the hybrid network improves the problematic estimations,
especially when production is low.
Finally, the proposed ensemble hybrid model was trained
with the hybrid NN as the weak learner. As illustrated in Figure
6, 100 weak learners have been trained, whilst the minimal RMSE
on the validation set was obtained for 20 of them. Since
these models are more complex, it is expected that fewer of
them would be needed for the optimal behavior. The RMSE of
this model was 3.40 kW, 3.70 kW and 3.43 kW for the training,
validation and testing data, respectively, and a performance
example is given in Figures 7 and 8. From them it could be seen
that the estimations are mostly highly precise, even though errors in
estimation happen from time to time. Hence, the deficiencies
that could be observed for the previously depicted methods
have been reduced, as the estimations are solid in all production
value ranges.
Fig. 2. RMSE on validation data depending on the number of weak learners for ensemble model
Fig. 3. Ensemble model performance example on testing data
Fig. 4. Ensemble model performance example on testing data enlarged
Fig. 5. Hybrid model performance example on testing data
TABLE II
SUMMARY OF RMSE PERFORMANCES
use case   train     validation   test      num. of nets
1.         3.11 kW   3.82 kW      3.58 kW   150
2.         3.58 kW   3.76 kW      3.49 kW   -
3.         3.40 kW   3.70 kW      3.43 kW   20
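The improvements quoted in the abstract follow directly from the test column of Table II, as the short sketch below reproduces. The dictionary keys and function names are illustrative labels for use cases 1 to 3.

```python
# Test-set RMSE values in kW, copied from Table II.
TEST_RMSE_KW = {
    "ensemble ANN": 3.58,     # use case 1
    "hybrid NN": 3.49,        # use case 2
    "hybrid ensemble": 3.43,  # use case 3 (proposed)
}


def improvement_w(baseline, proposed="hybrid ensemble", table=TEST_RMSE_KW):
    """RMSE reduction of `proposed` over `baseline`, in watts (rounded)."""
    return round((table[baseline] - table[proposed]) * 1000)


def relative_improvement_pct(baseline, proposed="hybrid ensemble", table=TEST_RMSE_KW):
    """The same reduction expressed as a percentage of the baseline RMSE."""
    return 100 * (table[baseline] - table[proposed]) / table[baseline]
```

With these values, `improvement_w("ensemble ANN")` gives 150 and `improvement_w("hybrid NN")` gives 60, which corresponds to relative reductions of roughly 4.2% and 1.7% of the respective baseline RMSE.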
V. CONCLUSION
Within this paper, a hybrid ensemble neural network approach for PV
forecasting has been presented. The main goal of
the proposed methodology was to improve the forecasting
precision of the existing methods. Therefore, state-of-the-art
models were benchmarked against the proposed approach using
real-world production and meteorological data. As
summed up in Table II, the lowest RMSE on the unseen testing
data could be observed for the ensemble hybrid model. Even
though the performance on the training data was best
for the ensemble ANN model, its generalization was poorer, since
the difference between its performance on the training and testing
data is significant. Finally, the methodology presented within this paper
improved forecasting performance by 150 W and 60 W, respectively,
compared with the existing ones, indicating that it is more appropriate
for the considered problem.
Fig. 6. RMSE on validation data depending on the number of weak learners for hybrid ensemble model
Fig. 7. Hybrid ensemble model performance example on testing data
Fig. 8. Hybrid ensemble model performance example on validation data
REFERENCES
[1] G. Cervone, L. Clemente-Harding, S. Alessandrini, and L. Delle
Monache, “Short-term photovoltaic power forecasting using Artificial
Neural Networks and an Analog Ensemble,” Renewable Energy, vol.
108, pp. 274–286, 2017, doi: 10.1016/j.renene.2017.02.052.
[2] J. Antonanzas, N. Osorio, R. Escobar, R. Urraca, F. J. Martinez-de-Pison,
and F. Antonanzas-Torres, “Review of photovoltaic power
forecasting,” Solar Energy, vol. 136, pp. 78–111, Oct. 2016, doi:
10.1016/j.solener.2016.06.069.
[3] A. Elamim, B. Hartiti, A. Haibaoui, A. Lfakir, and P. Thevenin,
“Generation of Photovoltaic Output Power Forecast Using Artificial
Neural Networks,” in Advanced Intelligent Systems for Sustainable
Development (AI2SD'2019), Cham, 2020, pp. 127–134, doi:
10.1007/978-3-030-36475-5_12.
[4] S. Theocharides, G. Makrides, V. Venizelou, P. Kaimakis, A. Kyprianou,
and G. Georghiou, “PV production forecasting model
based on artificial neural networks (ANN),” 2017.
[5] F. Rodríguez, A. Fleetwood, A. Galarza, and L. Fontán, “Predicting
solar energy generation through artificial neural networks
using weather forecasts for microgrid control,”
Renewable Energy, vol. 126, pp. 855–864, Oct. 2018, doi:
10.1016/j.renene.2018.03.070.
[6] M. Abdel-Nasser and K. Mahmoud, “Accurate photovoltaic power
forecasting models using deep LSTM-RNN,” Neural Comput & Applic,
vol. 31, no. 7, pp. 2727–2740, Jul. 2019, doi: 10.1007/s00521-017-3225-z.
[7] K. Wang, X. Qi, and H. Liu, “Photovoltaic power forecasting
based LSTM-Convolutional Network,” Energy, vol. 189,
no. C, 2019, Accessed: Oct. 06, 2021. [Online]. Available:
https://ideas.repec.org/a/eee/energy/v189y2019ics0360544219319206.html