Available via license: CC BY-NC-ND 4.0


Forecasting Solar Power Generation on the basis of

Predictive and Corrective Maintenance Activities

Soham Vyas

Department of Computer Science &

Engineering, PDEU, Gandhinagar,

India, Soham.vmtds21@sot.pdpu.ac.in

Sanskar Bhuwania

Department of Computer Science &

Engineering, PDEU, Gandhinagar,

sanskar.bce18@sot.pdpu.ac.in

Brijesh Tripathi

Department of Solar Energy, PDEU,

Gandhinagar, India,

brijesh.tripathi@sot.pdpu.ac.in

Yuvraj Goyal

Department of Computer Science &

Engineering, PDEU, Gandhinagar,

India, Yuvraj.gmtds21@sot.pdpu.ac.in

Hardik Patel

Department of Information and

Communication Technology, PDEU,

Gandhinagar, India,

Hardik.patel@sot.pdpu.ac.in

Neel Bhatt

Department of Computer Science &

Engineering, PDEU, Gandhinagar,

India, Neel.bmtds20@sot.pdpu.ac.in

Shakti Mishra

Department of Computer Science &

Engineering, PDEU, Gandhinagar,

India, Shakti.mishra@sot.pdpu.ac.in

Abstract—Solar energy forecasting has seen tremendous

growth in the last decade using historical time series collected

from a weather station, such as wind speed

and direction, solar irradiance, and temperature. It helps in the

overall management of solar power plants. However, the solar

power plant regularly requires preventive and corrective

maintenance activities that further impact energy production.

This paper presents a novel work for forecasting solar power

energy production based on maintenance activities, problems

observed at a power plant, and weather data. The results are

obtained on a dataset from the 1 MW solar power plant of PDEU

(our university), which contains 13 columns of daily entries

from 2012 to 2020. There

are 12 structured columns and one unstructured column with

manual text entries about different maintenance activities,

problems observed, and weather conditions daily. The

unstructured column is used to create a new feature column

vector using Hash Map, flag words, and stop words. The final

dataset comprises five important feature vector columns based

on correlation and causality analysis.

Further, the random forest regression is used to compute the

impact of maintenance activities on the total energy output. The

causality and correlation analysis has shown that the five feature

vectors are interdependent time series variables. Next, Vector

Autoregression (VAR) is chosen for simultaneous forecasting of

total power generation 3, 5, 7, 10, 12, and 30 days ahead.

The results show that the root mean

square percentage error (RMSPE) in total power generation

forecasting is less than 10% for all of these horizons. This research has

shown that the spikes in total power generation forecasting can

be traced and tracked better using daily maintenance activities,

observed problems, and weather conditions.

Keywords—forecasting, vector autoregression, maintenance

activities, solar power generation, weather conditions

I. INTRODUCTION

Solar power generation has the potential to mitigate climate

change by reducing the carbon footprint. It has had better

market penetration in recent years because of awareness

about clean and green energy and its affordable cost. Solar

power plants require various planned and unplanned

maintenance activities for better energy output. These

maintenance activities include PV module cleaning and

maintenance, PV module positioning in the field, inverter

maintenance, etc. Solar energy forecasting is usually done

using past time series data acquired from weather stations

such as wind pressure, humidity, temperature, satellite

imagery, etc. In this research work, total solar power

generation forecasting is proposed by using different

maintenance activities, problems observed, and weather data.

Next, we have carried out a literature survey to understand

the contemporary work done in this area.

Fuzzy logic, AI models, and genetic algorithms are used to

predict and model solar radiation, sizing, performances, and

controls of the solar photovoltaic (PV) systems in [1].

Ensemble of deep ConvNets is proposed for multistep solar

forecasting without additional time series models like RNN

or LSTM and exogenous variables in [2] with 22.5% RMSE.

Mycielski-Markov is utilized to forecast solar power

generation for a short period in [3] with 32.65% RMSE.

Feedforward neural network-based solar irradiance

prediction is followed by LSTM-based solar power

generation prediction for a short period [4] with 98.70

average RMSE. The ensemble approach is proposed based on

long short-term memory (LSTM), gated recurrent unit

(GRU), Autoencoder LSTM (Auto-LSTM), and Auto-GRU

for solar power generation forecasting in [5] without

considering any maintenance activities. Generic fault/status

prediction and specific fault prediction are performed with unsupervised

clustering and a neural network using data from a 10 MW solar

power plant and one hundred inverters of three different

technology brands [6]. This model can predict generic faults

up to 7 days in advance with 95% sensitivity and specific

faults from a few hours up to 7 days in advance [6]. Intra-hour, short-term,

medium-term, long-term, ramp, and load

forecasting are proposed for renewable energy sources such as

wind and solar energy [7]. Solar power generation is

reduced by 17.4% per month because of dust on solar

collectors [8]. Day-ahead forecasting of 1MW solar power

plant output is proposed in the American Southwest with

10.3% to 14% RMSE [9]. Solar power generation is forecast

using neural network models (LSTM, MLP, LRNN, and

feedforward) and statistical time series models (ARMA, ARIMA, and SARIMA) on 3640

hours of data for a 20MW power plant [10]. Six-hour-ahead

solar power forecasting is proposed using an autoregressive

forecasting model at residential and medium voltage

substation levels [11]. The autoregressive model of [11]

claims 8% to 10% improvements in results. Two-stage

probabilistic solar power forecasting is proposed in [12]; the

first stage is used to predict solar irradiance, and the second

stage is used to predict solar power. The model of [12] results

in minimum loss and the highest daily profit in the energy

market. A robust auto encoder-gated recurrent unit (AE-

GRU) model is used to forecast solar power generation for 24

h, 48 h, and 15 days [13]. Sparsity-promoting LASSO-VAR

structures are proposed and fitted with the alternating direction

method of multipliers (ADMM) at 1-hour and 15-minute

resolution for solar power forecasting in [14]. The LASSO-VAR

model of [14] achieves an 11% improvement in forecasting. The

probabilistic solar power forecasting is proposed and

compared with the autoregressive method in [15], which

results in RMSE of 8% to 12%. A nonlinear autoregressive

neural network with an exogenous input model is proposed

with Levenberg-Marquardt, Bayesian regularization, scaled

conjugate gradient, and Broyden-Fletcher-Goldfarb-Shanno

(BFGS) algorithms for solar power forecasting over

Nigeria [16]. The models of [16] result in RMSE values

ranging from 0.162 to 0.544 W/m2. Five-minute-ahead

forecasts are produced and evaluated using point and

probabilistic forecast skill scores and calibration using sparse

vector autoregression for 22 wind farms in Australia [17].

The LASSO vector autoregression model is proposed for very

short-term wind power forecasting [18]. A vector

autoregression weather model is proposed for electricity

supply and demand modeling with six-hours-ahead

forecasting and lower RMSE [19]. Two novel graph neural

network models, the graph-convolutional long short-term

memory (GCLSTM) and the graph-convolutional transformer

(GCTrafo), are proposed for multi-site photovoltaic

power forecasting with 12.6% and 13.6% NRMSE,

respectively [20]. The following sections are data set

preprocessing, methodology, results and analysis,

conclusions, and future work.

II. DATA SET PREPROCESSING

Pandit Deendayal Energy University (PDEU),

Gandhinagar, and Gujarat Energy Research and Management

Institute (GERMI) set up a 1 MW Solar Power plant in 2012.

The dataset obtained from this solar power plant, covering 2012

to 2020, has been used for this work. It has daily entries

in 13 columns. The solar plant consists of

five sets of PV modules. Three of these five sets are

poly-crystalline based, and each has a capacity of

approximately 250 kW. The remaining two sets are

“thin-film amorphous silicon” and “Concentrated

Photovoltaic” based, with capacities of approximately 250

kW and 15 kW, respectively. Thus, four sets have a

capacity of approximately 250 kW each, and the fifth

has a capacity of approximately 15 kW.

The dataset has five columns for power generation

from the five sets of PV modules; the other columns are

“date”, “Total power generation (KWH)”, “aggregate meter

reading (KWH)”, “difference”, “Seeds data (KWH)”,

“insolation”, “PR (%)”, and “any issues/problems observed”.

As discussed above, there are 13 columns in this data set, and

it is semi-structured because the last column, "any

issues/problems observed," has text data that includes day-

wise manually entered weather information, maintenance

issues, grid failure, module cleaning information, etc. from

2012 to 2020. The first and most important research challenge

is to create the different features from the last column, “any

issues/problems observed”. This research challenge was

addressed by creating a nested hash map with different rules.

The key contains the possible feature label as text, and the

value is a 2-dimensional array. The first array holds flag words

that represent a maintenance issue, an observed problem, or a

weather condition. The second array holds stop words that

prevent overlapping and duplication among the maintenance

issues, observed problems, and weather conditions. Each key

becomes a new feature (maintenance, observed problem, or

weather condition) column, whose value is set to one when

that feature is present on a particular day. New feature

vectors are thus created with labels from the column “any

issues/problems observed”. Next, each value of one in a new

feature column is replaced by that feature's percentage of

occurrence. New feature vectors created are “Grid Failure”,

“Inverter Failure”, “Module Cleaning”, “Rainy Day”, “No

Module Cleaning”, “Transformer Replacement and

Maintenance”, “Cable and Fuse Maintenance”, “Plant

Shutdown”, “Internet”, “Battery”, “Cloudy day”, and “Module

Cleaning by Rain”. Based on the correlation and causality

analysis, only five columns, “Total generation (KWH)”, “Grid

Failure”, “Inverter Failure”, “Module Cleaning”, and

“Cloudy”, are retained in the final dataset.

A vector autoregression (VAR) model is

selected for simultaneous forecasting of total power

generation and the new features because they are interdependent

time series data.
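The nested hash map described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the flag words, stop words, and sample notes below are hypothetical and are not the rules used by the authors, and the percentage replacement follows one plausible reading of "percentage of occurrence" (share of days on which the feature is present).

```python
import re

# Hypothetical rule map: feature label -> [flag words, stop words].
# The word lists are illustrative assumptions, not the authors' actual rules.
FEATURE_RULES = {
    "Grid Failure": [["grid failure", "grid fail"], []],
    "Inverter Failure": [["inverter"], ["inverter ok"]],
    "Module Cleaning": [["module cleaning"], ["no module cleaning", "by rain"]],
    "Cloudy": [["cloudy", "cloud"], []],
}


def extract_flags(note, rules=FEATURE_RULES):
    """Return {label: 0 or 1} for one day's free-text note."""
    text = re.sub(r"\s+", " ", (note or "").lower()).strip()
    flags = {}
    for label, (flag_words, stop_words) in rules.items():
        hit = any(word in text for word in flag_words)
        blocked = any(word in text for word in stop_words)
        # Stop words veto a match so that similar features do not overlap.
        flags[label] = int(hit and not blocked)
    return flags


def replace_ones_with_percentage(rows):
    """Replace each 1 by the feature's percentage of occurrence over all days."""
    if not rows:
        return []
    labels = list(rows[0].keys())
    pct = {lab: 100.0 * sum(r[lab] for r in rows) / len(rows) for lab in labels}
    return [{lab: (pct[lab] if r[lab] else 0.0) for lab in labels} for r in rows]


# Hypothetical daily notes from the "any issues/problems observed" column.
notes = [
    "Grid failure from 11:00 to 13:00, cloudy day",
    "Module cleaning done",
    "No module cleaning, inverter tripped",
    "",
]
daily_flags = [extract_flags(note) for note in notes]
features = replace_ones_with_percentage(daily_flags)
```

Keeping the stop words next to the flag words in the same map entry makes each rule self-contained: the third note matches "module cleaning" but is vetoed by "no module cleaning", so it does not count as a cleaning day.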

III. METHODOLOGY

In this paper, solar power generation is forecast using

maintenance activities. It is novel work, and there is not much

research done on this topic. Power generation prediction

is first formulated as a regression problem to assess the usefulness

of maintenance issues as predictors. The labels of the processed dataset

are fed to the regression model as inputs, and the

maintenance variables of later days are held out as test data. Random

Forest Regression is applied to this data set, and it has been

observed that the maintenance issues can be used as variables

to forecast the power generation.

Fig. 1 Regression Model using Random Forest Regressor
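The regression setup described above can be illustrated with scikit-learn's RandomForestRegressor. The sketch below is not the authors' experiment: the data are synthetic, the effect sizes are invented for illustration, and the column names are assumptions chosen to mirror the maintenance features of this paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_days = 400
names = ["grid_failure", "inverter_failure", "module_cleaning", "cloudy"]

# Synthetic daily 0/1 flags, one column per maintenance or weather feature.
X = rng.integers(0, 2, size=(n_days, len(names))).astype(float)

# Synthetic daily generation in kWh; the coefficients are made up.
y = (4000.0 - 2500.0 * X[:, 0] - 1200.0 * X[:, 1]
     + 300.0 * X[:, 2] - 800.0 * X[:, 3]
     + rng.normal(0.0, 100.0, n_days))

# Chronological split: earlier days for training, later days for testing.
split = 300
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:split], y[:split])

pred = model.predict(X[split:])
rmse = float(np.sqrt(np.mean((pred - y[split:]) ** 2)))

# Relative contribution of each flag to the fitted forest.
importances = dict(zip(names, model.feature_importances_))
```

The feature importances give a rough indication of which maintenance flags drive the predicted output. In this synthetic example the grid failure flag dominates because it was given the largest effect.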

Vector Autoregression

VAR models are used for multivariate time series. The VAR

models consider each variable as a linear function of past lags

of itself and past lags of the other variables. Five variables

<Total Generation (KWH), Grid failure, Inverter Failure,

Module Cleaning, and Cloudy > have been considered and

modeled as a system of equations with one equation per

variable in time series. Suppose we have two

variables (time series), Y1 and Y2, and we need to forecast

the values of these variables at time t. To calculate Y1(t),

VAR uses the past values of both Y1 and Y2. Likewise, to

compute Y2(t), the past values of both Y1 and Y2 are used. For

example, the system of equations for a VAR model with two

time series (variables Y1 and Y2) is as follows:

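$$
\begin{aligned}
Y_{1,t} &= \alpha_{1} + \beta_{11,1}\,Y_{1,t-1} + \beta_{12,1}\,Y_{2,t-1} + \epsilon_{1,t} \\
Y_{2,t} &= \alpha_{2} + \beta_{21,1}\,Y_{1,t-1} + \beta_{22,1}\,Y_{2,t-1} + \epsilon_{2,t}
\end{aligned}
\tag{1}
$$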
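As a small worked example of a two-variable, lag-one system of this kind, the sketch below evaluates one forecasting step and then iterates it over several days. The intercepts `alpha`, the coefficient matrix `beta`, and the starting values are hypothetical numbers chosen for illustration; the error terms are set to zero, as is usual for point forecasts.

```python
def var1_step(y_prev, alpha, beta):
    """One-step VAR(1) point forecast: y_t[i] = alpha[i] + sum_j beta[i][j] * y_prev[j]."""
    k = len(y_prev)
    return [alpha[i] + sum(beta[i][j] * y_prev[j] for j in range(k)) for i in range(k)]


def var1_forecast(y_last, alpha, beta, steps):
    """Iterate the one-step forecast, feeding each forecast back in as the next lag."""
    path = []
    current = list(y_last)
    for _ in range(steps):
        current = var1_step(current, alpha, beta)
        path.append(current)
    return path


# Hypothetical parameters for two series Y1 and Y2.
alpha = [1.0, 0.5]
beta = [[0.5, 0.1],   # Y1 depends on lagged Y1 and lagged Y2
        [0.2, 0.3]]   # Y2 depends on lagged Y1 and lagged Y2
y_prev = [10.0, 4.0]

y_next = var1_step(y_prev, alpha, beta)
# Y1: 1.0 + 0.5*10 + 0.1*4 is approximately 6.4
# Y2: 0.5 + 0.2*10 + 0.3*4 is approximately 3.7

three_days = var1_forecast(y_prev, alpha, beta, steps=3)
```

Multi-day horizons are obtained by recursion: each forecast becomes the lagged input of the next step, which is why forecast errors tend to accumulate as the horizon grows.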

Equation (1) is a vector autoregressive model of order one, denoted

VAR(1). Similarly, in a VAR(2) model, the lag-two values

of all variables are added to the right-hand sides of the equations.

In the case of five Y-variables (or time series), there would

be ten predictors on the right side of each equation: five lag-one

terms and five lag-two terms. For a VAR(p) model, the

first p lags of each variable in the system are used as

regression predictors for each variable. Before a model of the

form of equation (1) is fitted, the data are checked for stationarity

and causality. The causality test checks that the time series

are interdependent. The Akaike information criterion (AIC) of a

model $M_k$ with dimension $k$ is defined as

$$\mathrm{AIC}(M_k) = -2 \log L(M_k) + 2k$$

where $L(M_k)$ is the likelihood corresponding to the model $M_k$.

The first term in AIC is twice the negative log-likelihood, which,

for a linear regression model with a Gaussian likelihood, is

determined by the residual sum of squares of the model $M_k$

[21]. AIC has been computed using data before forecasting,

and the lag (in days) with the optimal (lowest) AIC was selected to fit

the VAR. After checking the model with the selected lag,

the coefficient matrix is computed for each equation. Here,

all five variables are treated as endogenous.

Forecasting over multiple horizons gives

an understanding of how power generation varies and of the

likelihood of spikes. The coefficients are

estimated from several years of data, and the forecast is produced from

the most recent lag days. The forecast is the model output, and the results are

reported separately for each horizon.
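The lag selection step can be sketched with plain NumPy least squares. This is a minimal illustration, not the authors' implementation: the data are simulated from a hypothetical two-variable VAR(2), and the AIC is written in a common multivariate Gaussian form (log determinant of the residual covariance plus a penalty of two per estimated parameter, divided by the sample size), which is an assumption about the convention used.

```python
import numpy as np


def design(data, p, start):
    """Build regressors (intercept plus p lags) for targets data[start:]."""
    T = data.shape[0]
    Y = data[start:]
    lags = [data[start - lag:T - lag] for lag in range(1, p + 1)]
    X = np.hstack([np.ones((T - start, 1))] + lags)
    return X, Y


def var_aic(data, p, max_lag):
    """AIC of an OLS-fitted VAR(p), using the same sample for every p."""
    X, Y = design(data, p, start=max_lag)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    n, k = Y.shape
    sigma = resid.T @ resid / n            # residual covariance matrix
    _, logdet = np.linalg.slogdet(sigma)
    n_params = k * (k * p + 1)             # k equations, each with k*p lags + intercept
    return logdet + 2.0 * n_params / n


def select_lag(data, max_lag):
    """Return the lag order with the lowest AIC and all AIC values."""
    aics = {p: var_aic(data, p, max_lag) for p in range(1, max_lag + 1)}
    return min(aics, key=aics.get), aics


# Simulate a stationary VAR(2) with hypothetical coefficient matrices.
rng = np.random.default_rng(1)
A1 = np.array([[0.3, 0.1], [0.1, 0.3]])
A2 = np.array([[0.4, 0.0], [0.0, 0.4]])
T = 500
series = np.zeros((T, 2))
for t in range(2, T):
    series[t] = A1 @ series[t - 1] + A2 @ series[t - 2] + rng.normal(0.0, 1.0, 2)

best_lag, aics = select_lag(series, max_lag=5)
```

All candidate orders are fitted on the same target rows so that their AIC values are comparable. The fitted `coef` array inside `var_aic` plays the role of the coefficient matrix mentioned in the text, with one column per equation.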

IV. RESULTS AND ANALYSIS

The VAR model is evaluated to understand the effect of the

forecasting horizon (number of days ahead) on the total power

generation forecasting results.

Fig. 2 Three-days-ahead total power generation forecasting using the VAR model

As shown in Fig. 2, the RMSPE, RMSE, and MAE are 3.38%,

130.414, and 118.249, respectively, for three-days-ahead

total power generation forecasting.

Fig. 3 Five-days-ahead total power generation forecasting using the VAR model

As shown in Fig. 3, the RMSPE, RMSE, and MAE are 5.52%,

206.331, and 183.795, respectively, for five-days-ahead

total power generation forecasting.

Fig. 4 Seven-days-ahead total power generation forecasting using the VAR model

As shown in Fig. 4, the RMSPE, RMSE, and MAE are 6.27%,

253.774, and 207.842, respectively, for seven-days-ahead

total power generation forecasting.

Fig. 5 Ten-days-ahead total power generation forecasting using the VAR model

As shown in Fig. 5, the RMSPE, RMSE, and MAE are 5.91%,

235.562, and 187.288, respectively, for ten-days-ahead

total power generation forecasting.

Fig. 6 Twelve-days-ahead total power generation forecasting using the VAR model

As shown in Fig. 6, the RMSPE, RMSE, and MAE are 5.49%,

218.91, and 169.183, respectively, for 12-days-ahead total

power generation forecasting.

Fig. 7 Thirty-days-ahead total power generation forecasting using the VAR model

As shown in Fig. 7, the RMSPE, RMSE, and MAE are 9.59%,

394.128, and 309.921, respectively, for 30-days-ahead total

power generation forecasting.

TABLE I. RMSPE, RMSE, AND MAE IN TOTAL POWER GENERATION FORECASTING FOR DIFFERENT DAYS.

Days    RMSPE (%)    RMSE       MAE
3       3.38         130.414    118.249
5       5.52         206.331    183.795
7       6.27         253.744    207.842
10      5.91         235.562    187.288
12      5.49         218.91     169.183
30      9.59         394.128    309.921
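For reference, the three error measures reported in Table I can be computed as in the sketch below. The paper does not spell out its formulas, so the definitions here are the commonly used ones and should be read as an assumption. The actual and predicted values are made-up numbers, not data from the plant.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmspe(actual, predicted):
    """Root mean square percentage error, in percent (actual values must be nonzero)."""
    ratios = [((a - p) / a) ** 2 for a, p in zip(actual, predicted)]
    return 100.0 * math.sqrt(sum(ratios) / len(ratios))


# Hypothetical three-day example (daily generation in kWh).
actual = [4000.0, 3800.0, 4200.0]
predicted = [3900.0, 3900.0, 4100.0]

err_rmse = rmse(actual, predicted)    # every error is 100 kWh in magnitude
err_mae = mae(actual, predicted)
err_rmspe = rmspe(actual, predicted)  # roughly 2.5 percent
```

RMSPE divides each error by the actual value, so it is undefined on days with zero generation (for example, a full plant shutdown). Such days would need to be excluded or handled separately.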

As shown by figures 2 to 7, the VAR model can predict

almost all power generation spikes, which is the most crucial

point of this research. The total power generation spikes are

due to different maintenance activities, problems, and

weather conditions. The power generation forecasting

error is lowest for the three-days-ahead forecast. Table I shows

that the RMSPE in total power generation forecasting is less than

10% for all forecasting horizons. The VAR model can forecast all the new

features “Grid Failure”, “Inverter Failure”, “Module

Cleaning”, “Rainy Day”, “No Module Cleaning”,

“Transformer Replacement and Maintenance”, “Cable and

Fuse Maintenance”, “Plant Shutdown”, “Internet”, “Battery”,

“Cloudy day”, “Module Cleaning by Rain” and “total power

generation” because all these are interdependent time series.

V. CONCLUSIONS

The research work in the paper presents the forecasting of

total power generation based on various maintenance

activities carried out in solar power plants. Scheduled

maintenance activities in the power plant impact energy

production. This work involves transforming the unstructured

dataset into structured form with twelve new feature vectors

using HashMap, flag words, and stop words. Further,

Random Forest Regressor is used to analyze the impact of

maintenance activities on forecasting total power generation.

The outcome shows that maintenance activities are informative

predictors of total power generation.

Future maintenance activities are not known at forecasting time,

so they must be predicted before the total

power generation forecasting. Vector Auto Regression-based

model is used for forecasting multivariate time-series

considering five variables “Total Power Generation (KWH)”,

“Grid Failure”, “Inverter Failure”, “Module Cleaning”, and

“Cloudy”. VAR can forecast total power generation along

with forecasting four maintenance activities. The three days

ahead total power generation forecasting has the lowest error

compared to other results. In the reviewed literature, total power generation forecasting

is implemented in two stages. The first

stage predicts solar irradiance or maintenance activities,

problems, and weather conditions, and the second stage is

total power generation forecasting. In this research work,

forecasting of solar power generation and maintenance

activities, problems, and weather conditions are all done

simultaneously.

VI. FUTURE WORK

In the future, this work shall be extended by comparing the

total power generation forecasting using different models and

the inclusion of two essential feature vectors, “solar

irradiance” and “insolation”, in the current VAR model for

solar power generation forecasting. It is also planned to

forecast the solar power generation for an individual set of

PV modules to determine the impact of different PV modules

on the forecasting. The evaluation shall be based on the

overall effect of varying PV modules, “solar irradiance”,

“insolation”, and daily maintenance activities, problems

observed, and weather conditions on the total power

generation forecasting.

REFERENCES

[1] Belu, R., 2014. Artificial intelligence techniques for solar

energy and photovoltaic applications. In Robotics: Concepts,

methodologies, tools, and applications (pp. 1662-1720). IGI Global.

[2] Wen, H., Du, Y., Chen, X., Lim, E., Wen, H., Jiang, L. and Xiang, W.,

2020. Deep learning based multistep solar forecasting for PV ramp-rate

control using sky images. IEEE Transactions on Industrial Informatics,

17(2), pp.1397-1406.

[3] Serttas, F., Hocaoglu, F.O. and Akarslan, E., 2018, July. Short term

solar power generation forecasting: A novel approach. In 2018

International Conference on Photovoltaic Science and Technologies

(PVCon) (pp. 1-4). IEEE.

[4] Chung, P.L., Wang, J.C., Chou, C.Y., Lin, M.J., Liang, W.C., Wu, L.C.

and Jiang, J.A., 2020, March. An intelligent control strategy for energy

storage systems in solar power generation based on long-short-term

power prediction. In 2020 8th International Electrical Engineering

Congress (iEECON) (pp. 1-4). IEEE.

[5] AlKandari, M. and Ahmad, I., 2020. Solar power generation

forecasting using ensemble approach based on deep learning and

statistical methods. Applied Computing and Informatics.

[6] Betti, A., Trovato, M.L.L., Leonardi, F.S., Leotta, G., Ruffini, F. and

Lanzetta, C., 2019. Predictive maintenance in photovoltaic plants with

a big data approach. arXiv preprint arXiv:1901.10855.

[7] Chernyakhovskiy I. Forecasting Wind and Solar Generation:

Improving System Operations, Greening the Grid. National Renewable

Energy Lab. (NREL), Golden, CO (United States); 2016 Jan 1.

[8] Elminir, H.K., Ghitas, A.E., Hamid, R.H., El-Hussainy, F., Beheary,

M.M. and Abdel-Moneim, K.M., 2006. Effect of dust on the

transparent cover of solar collectors. Energy conversion and

management, 47(18-19), pp.3192-3203.

[9] Larson, D.P., Nonnenmacher, L. and Coimbra, C.F., 2016. Day-ahead

forecasting of solar power output from photovoltaic plants in the

American Southwest. Renewable Energy, 91, pp.11-20.

[10] Sharadga, H., Hajimirza, S. and Balog, R.S., 2020. Time series

forecasting of solar power generation for large-scale photovoltaic

plants. Renewable Energy, 150, pp.797-807.

[11] Bessa, R.J., Trindade, A. and Miranda, V., 2014. Spatial-temporal solar

power forecasting for smart grids. IEEE Transactions on Industrial

Informatics, 11(1), pp.232-241.

[12] Kim, H. and Lee, D., 2021. Probabilistic Solar Power Forecasting

Based on Bivariate Conditional Solar Irradiation Distributions. IEEE

Transactions on Sustainable Energy, 12(4), pp.2031-2041.

[13] Rai, A., Shrivastava, A. and Jana, K.C., 2021. A Robust Auto Encoder-

Gated Recurrent Unit (AE-GRU) Based Deep Learning Approach for

Short Term Solar Power Forecasting. Optik, p.168515.

[14] Cavalcante, L. and Bessa, R.J., 2017, June. Solar power forecasting

with sparse vector autoregression structures. In 2017 IEEE Manchester

PowerTech (pp. 1-6). IEEE.

[15] Bessa, R.J., Trindade, A., Silva, C.S. and Miranda, V., 2015.

Probabilistic solar power forecasting in smart grids using distributed

information. International Journal of Electrical Power & Energy

Systems, 72, pp.16-23.

[16] Oluwafemi, O., Olusola, O.S., Israel, E. and Babatunde, A., 2022.

Autoregressive neural network models for solar power forecasting over

Nigeria. Journal of Solar Energy Research, 7(1), pp.983-996.

[17] Dowell, J. and Pinson, P., 2015. Very-short-term probabilistic wind

power forecasts by sparse vector autoregression. IEEE Transactions on

Smart Grid, 7(2), pp.763-770.

[18] Cavalcante, L., Bessa, R.J., Reis, M. and Browell, J., 2017. LASSO

vector autoregression structures for very short‐term wind power

forecasting. Wind Energy, 20(4), pp.657-675.

[19] Liu, Y., Roberts, M.C. and Sioshansi, R., 2018. A vector autoregression

weather model for electricity supply and demand modeling. Journal of

Modern Power Systems and Clean Energy, 6(4), pp.763-776.

[20] Simeunovic, J., Schubnel, B., Alet, P.J. and Carrillo, R.E., 2021.

Spatio-temporal graph neural networks for multi-site PV power

forecasting. IEEE Transactions on Sustainable Energy.

[21] Korner-Nievergelt, F. et al., “Bayesian Data Analysis in Ecology Using Linear

Models with R, BUGS, and Stan,” Academic Press, 2015, pp. 175-196.