Conference Paper

Photovoltaic Energy Production Forecasting using LightGBM

Precise predictions of solar photovoltaic (PV) energy production play an important role in day-ahead planning for power grid and power plant operators, since they help to improve the stability and power quality of the electricity distributed. In this work, the machine learning (ML) algorithm LightGBM is applied to the challenge of forecasting the energy yield of PV power plants. We compare the performance of LightGBM with other ML models and with empirical models. The advantages of LightGBM are highlighted, and the model is introduced as a new approach for forecasting PV energy. We trained and tested the models using 2 years of operational data re-sampled to a 30 min resolution. On the test set, the LightGBM model achieved an nRMSE of 1.56 % with a processing time of 0.181 s: much better accuracy than the empirical models, accuracy comparable to the other ML models, and a significantly shorter processing time than all other models studied.
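The headline metric above, nRMSE, can be illustrated with a short sketch. The abstract does not state which normalisation constant was used, so the `norm` argument below (for example, the installed capacity or the maximum measured value) is an assumption, as are the function name and the toy values.

```python
import math


def nrmse_percent(actual, predicted, norm):
    """Root-mean-square error divided by `norm`, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if norm <= 0:
        raise ValueError("norm must be positive")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return 100.0 * math.sqrt(mse) / norm


# Hypothetical 30 min energy values; the normalisation constant of 10.0
# is an illustrative assumption, not a value taken from the paper.
measured = [0.0, 2.0, 5.0, 8.0, 6.0, 1.0]
forecast = [0.0, 2.5, 4.5, 8.0, 6.5, 1.0]
error = nrmse_percent(measured, forecast, norm=10.0)
print(f"nRMSE = {error:.2f} %")
```

Because the result scales inversely with `norm`, nRMSE figures from different studies are only comparable when the same normalisation is used.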

... This algorithm has been tested successfully in the finance industry [18][19][20], the chemistry industry [21,22], and the healthcare sector [23,24]. In the PV industry, the first results were published in [25], highlighting the accuracy and fast speed to estimate the energy output of a PV system. Hereby, we extend and validate the use of several energy yield models for different levels of irradiance data accuracy. ...
In photovoltaic (PV) systems, energy yield is one of the essential pieces of information for stakeholders (grid operators, maintenance operators, financial units, etc.). The amount of energy produced by a PV system in a specific time period depends on the weather conditions, including snow and dust, on the actual efficiency of the PV modules and inverters, and on balance-of-system losses. The energy yield can be estimated by using empirical models with accurate input data. However, most PV systems do not include on-site high-class measurement devices for irradiance and other weather conditions. For this reason, the use of reanalysis-based or satellite-based data is currently of significant interest in the PV community; by combining these data with decomposition and transposition irradiance models, the actual plane-of-array operating conditions can be determined. In this paper, we propose an efficient and accurate approach to modelling PV output energy that combines a new data filtering procedure with a fast machine learning algorithm, the Light Gradient Boosting Machine (LightGBM). The applicability of the procedure is demonstrated on three levels of irradiance data accuracy (low, medium, and high), depending on the source or modelling used. A new filtering algorithm is proposed to exclude erroneous data caused by system failures or unrealistic weather conditions (e.g., shading, partial snow coverage, reflections, soiling deposition). The cleaned data are then used to train three empirical models and three machine learning approaches, where we emphasize the advantages of LightGBM. The experiments are carried out on a 17 kW roof-top PV system installed in Ljubljana, Slovenia, in a temperate climate zone.
Accurate day-ahead photovoltaic (PV) power output forecasting techniques are important for both grid and plant operators. In this work, a machine learning model based on the gradient boosting machine (GBM) was implemented for accurate PV production forecasting. The accuracy of the developed model was experimentally verified on a test system installed in Cyprus. The basic methodology was to train and optimize several GBM day-ahead PV production forecasting models with acquired data sets and to construct relationships between the input and output features. The final optimal GBM model included 7 inputs and 1000 trees, with a minimum of 10 observations on each node and a shrinkage level of 0.001. When the test set was applied to the model, the prediction results showed an nRMSE of 0.80 %, with some days exhibiting accuracies close to 0.50 %. Finally, when the test set and numerical weather prediction (NWP) data were applied to the optimal model, the forecasting performance assessment showed an nRMSE of 7.9 %, with 55 % of the test set days exhibiting an nRMSE below 5 %. The error relative to the capacity of the system for all points during clear sky conditions was in most cases less than 0.1 W/Wp.
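To make the three hyperparameters named above concrete (number of trees, minimum observations per node, shrinkage), the following is a minimal sketch of least-squares gradient boosting with single-split trees on one feature. It is not the cited implementation: the function names, the restriction to stumps, and the toy step-shaped data are all assumptions chosen for brevity.

```python
def fit_stump(xs, residuals, min_obs):
    """Best single-split regression tree on one feature (least squares)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    n = len(xs)
    total = sum(residuals)
    best = None  # (score, threshold, left_mean, right_mean)
    left_sum = 0.0
    for k in range(1, n):
        left_sum += residuals[order[k - 1]]
        lo, hi = xs[order[k - 1]], xs[order[k]]
        # Skip splits between equal values or with too few points on a side.
        if lo == hi or k < min_obs or n - k < min_obs:
            continue
        right_sum = total - left_sum
        # Maximising this score is equivalent to minimising squared error.
        score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best is None or score > best[0]:
            best = (score, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    if best is None:  # no admissible split: fall back to a constant
        mean = total / n
        return (float("inf"), mean, mean)
    return best[1:]


def fit_gbm(xs, ys, n_trees, shrinkage, min_obs):
    """Fit each stump to the current residuals and add a shrunken copy."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, left, right = fit_stump(xs, residuals, min_obs)
        stumps.append((thr, left, right))
        preds = [p + shrinkage * (left if x <= thr else right)
                 for p, x in zip(preds, xs)]
    return base, stumps


def predict(model, x, shrinkage):
    base, stumps = model
    return base + shrinkage * sum(l if x <= t else r for t, l, r in stumps)


# Toy data (hypothetical): the target steps from 0 to 10 at x = 2.
xs = [i / 10 for i in range(40)]
ys = [0.0 if x < 2.0 else 10.0 for x in xs]

slow = fit_gbm(xs, ys, n_trees=1000, shrinkage=0.001, min_obs=10)
fast = fit_gbm(xs, ys, n_trees=100, shrinkage=0.1, min_obs=10)
print(predict(slow, 3.0, 0.001), predict(fast, 3.0, 0.1))
```

On this toy step function every stump fits the residual exactly, so the residual shrinks by a factor of (1 - shrinkage) per tree. That shows how shrinkage and the number of trees trade off against each other. The numbers are specific to the toy data and say nothing about the fit of the cited model, which uses 7 inputs and trees whose depth is not stated here.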
To mitigate the impact of climate change and global warming, the use of renewable energies is increasing significantly. A considerable amount of electricity has been generated from renewable energy sources over the last decade. Among the potential renewable energies, photovoltaic (PV) has experienced enormous growth in electricity generation. A large number of PV systems have been installed in on-grid and off-grid systems in the last few years. The number of PV systems will increase rapidly in the future due to the policies of governments and international organizations, and the advantages of PV technology. However, despite its economic benefits, the variability of PV power generation has negative impacts on the electric grid system, affecting its stability, reliability, and operational planning. Therefore, accurate forecasting of PV power generation is important to stabilize and secure grid operation and to promote large-scale PV power integration. A good deal of research has been conducted on forecasting PV power generation from different perspectives. This paper presents a comprehensive and systematic review of the direct forecasting of PV power generation. The importance of the correlation of the input-output data and the preprocessing of model input data are discussed. This review covers the performance analysis of several PV power forecasting models based on different classifications. A critical analysis of recent works, including statistical and machine-learning models based on historical data, is also presented. Moreover, the strengths and weaknesses of the different forecasting models, including hybrid models, and the performance metrics used to evaluate forecasting models are considered. In addition, the potential benefits of model optimization are discussed.
The variability of the solar resource poses difficulties in grid management as solar penetration rates rise continuously. Thus, the task of solar power forecasting becomes crucial to ensure grid stability and to enable optimal unit commitment and economic dispatch. Several forecast horizons can be identified, spanning from a few seconds to days or weeks ahead, as well as spatial horizons, from single-site to regional forecasts. New techniques and approaches arise worldwide each year to improve the accuracy of models, with the ultimate goal of reducing uncertainty in the predictions. This paper aims to compile a large part of the knowledge about solar power forecasting, focusing on the latest advancements and future trends. Firstly, the motivation for achieving an accurate forecast is presented, with an analysis of the economic implications it may have. This is followed by a summary of the main techniques used to issue the predictions. Then, the benefits of point/regional forecasts and deterministic/probabilistic forecasts are discussed. It has been observed that most recent papers highlight the importance of probabilistic predictions and incorporate an economic assessment of the impact of forecast accuracy on the grid. Later on, a classification of authors according to forecast horizons and origin of inputs is presented, which represents the most up-to-date compilation of solar power forecasting studies. Finally, all the different metrics used by researchers are collected, and some remarks to enable a fair comparison among studies are given.
We consider the task of forecasting the electricity power generated by a solar PhotoVoltaic (PV) system for forecasting horizons from 5 to 60 min ahead, from previous PV power and meteorological data. We present a new method based on advanced machine learning algorithms for variable selection and prediction. The correlation based variable selection identifies a small set of informative variables that are used as inputs for an ensemble of neural networks and support vector regression algorithms to generate the predictions. We develop two types of models: univariate, that use only previous PV power data, and multivariate, that also use previous weather data, and evaluate their performance on Australian PV data for two years. The results show that the univariate models performed similarly to the multivariate models, achieving mean relative error of 4.15–9.34%. Hence, the PV power output for very short-term forecasting horizons of 5–60 min can be predicted accurately by using only previous PV power data, without weather information. The most accurate model was univariate ensemble of neural networks, predicting the PV power output separately for each step of the forecasting horizon.
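The univariate setting described above (predicting from previous PV power only, with a correlation-based choice of inputs) can be sketched as follows. The abstract does not specify the selection procedure, so ranking lagged values by absolute Pearson correlation with the target is an assumption, as are the function names and the toy series.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant series carries no linear information
    return cov / math.sqrt(va * vb)


def rank_lags(series, max_lag, horizon=1):
    """Rank lags 0..max_lag-1 by |correlation| with the value `horizon` steps ahead."""
    start, stop = max_lag - 1, len(series) - horizon
    target = [series[t + horizon] for t in range(start, stop)]
    scores = {}
    for lag in range(max_lag):
        feature = [series[t - lag] for t in range(start, stop)]
        scores[lag] = abs(pearson(feature, target))
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical power readings with a repeating 3-step pattern, so the value
# three steps before the target (lag 2 at horizon 1) is the most informative.
series = [0.0, 1.0, 5.0] * 8
ranking = rank_lags(series, max_lag=3, horizon=1)
print(ranking)
```

A selection step of this kind would keep only the top-ranked lags as model inputs; the prediction models themselves (the ensemble of neural networks and support vector regression) are not reproduced here.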
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has small memory requirements and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
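The "adaptive estimates of lower-order moments" can be illustrated with a minimal scalar sketch of the commonly stated Adam update rule: exponential moving averages of the gradient and its square, with bias correction. The default values shown are the widely used ones; the function name, the quadratic test objective, and the learning rate of 0.1 in the example are assumptions for illustration.

```python
import math


def adam_minimize(grad, x0, steps, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise a scalar function with Adam, given its gradient `grad`."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 (x - 3).
def quadratic_grad(x):
    return 2.0 * (x - 3.0)


x_min = adam_minimize(quadratic_grad, x0=0.0, steps=1000, lr=0.1)
first_step = adam_minimize(quadratic_grad, x0=0.0, steps=1, lr=0.1)
print(x_min, first_step)
```

The first step moves by almost exactly `lr` regardless of the gradient's magnitude, because the gradient is divided by the square root of its own squared estimate. This is the scalar form of the invariance to gradient rescaling mentioned in the abstract.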
Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all have the same weights but have progressively more negative biases. The learning and inference rules for these “Stepped Sigmoid Units” are unchanged. They can be approximated efficiently by noisy, rectified linear units. Compared with binary units, these units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset. Unlike binary units, rectified linear units preserve information about relative intensities as information travels through multiple layers of feature detectors.
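The approximation mentioned above can be checked numerically. The sketch below sums many sigmoid copies with biases offset by 0.5, 1.5, 2.5, and so on, and compares the total with the softplus function log(1 + e^x); it also samples a noisy rectified linear unit. The abstract does not give these formulas, so the 0.5 offset, the noise variance sigmoid(x), the finite number of copies, and all function names are assumptions that follow the usual presentation of this result.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def stepped_sigmoid(x, copies=50):
    """Sum of sigmoid copies sharing an input but with increasingly negative biases."""
    return sum(sigmoid(x - i + 0.5) for i in range(1, copies + 1))


def softplus(x):
    """Smooth closed-form approximation to the stepped sum."""
    return math.log1p(math.exp(x))


def noisy_relu(x, rng=random):
    """Rectified linear unit with Gaussian noise of variance sigmoid(x)."""
    noise = rng.gauss(0.0, math.sqrt(sigmoid(x)))
    return max(0.0, x + noise)


# Largest gap between the stepped sum and softplus on a grid over [-5, 5].
worst_gap = max(
    abs(stepped_sigmoid(x / 10) - softplus(x / 10)) for x in range(-50, 51)
)
print(f"largest gap on [-5, 5]: {worst_gap:.4f}")
print([round(noisy_relu(x), 3) for x in (-2.0, 0.0, 2.0)])
```

Unlike a binary unit, the output of `noisy_relu` grows with its input, which is how relative intensities can be preserved across layers.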
S. Theocharides, G. Makrides, V. Venizelou, P. Kaimakis, and G. E. Georghiou, "PV Production Forecasting Model Based on Artificial Neural Networks (ANN)," 33rd Eur. Photovolt. Sol. Energy Conf., no. October, pp. 1830-1894, 2017.
G. Ke, Q. Meng, T. Wang, W. Chen, W. Ma, and T.-Y. Liu, "A Highly Efficient Gradient Boosting Decision Tree," Adv. Neural Inf. Process. Syst. 30, no. Nips, pp. 3148-3156, 2017.