Attention-Based Models for Multivariate Time Series Forecasting: Multi-step Solar Irradiation Prediction
Sadman Sakib,1 Mahin K. Mahadi,1 Samiur R. Abir,1 Al-Muzadded Moon,1 Ahmad Shafiullah,1
Sanjida Ali,1 Fahim Faisal,1 and Mirza M. Nishat1
1 Department of Electrical and Electronic Engineering, Islamic University of Technology,
Gazipur 1704, Bangladesh
Correspondence should be addressed to Ahmad Shafiullah; ahmadshafiullah@iut-dhaka.edu
Abstract:
Bangladesh’s subtropical climate with an abundance of sunlight throughout the greater portion of the year results in
increased effectiveness of solar panels. Solar irradiance forecasting is an essential aspect of grid-connected
photovoltaic systems to efficiently manage solar power’s variation and uncertainty and to assist in balancing power
supply and demand. This is why it is essential to forecast solar irradiation accurately. Many meteorological factors
influence solar irradiation, which has a high degree of fluctuation and uncertainty. Predicting solar irradiance multiple
steps ahead makes it difficult for forecasting models to capture long-term sequential relationships. Attention-based
models are widely used in the field of Natural Language Processing for their ability to learn long-term dependencies
within sequential data. In this paper, our aim is to present an attention-based model framework for multivariate time
series forecasting. Using data from two different locations in Bangladesh with a resolution of 30 minutes, the
Attention-based encoder-decoder, Transformer, and Temporal Fusion Transformer (TFT) models are trained
and tested to predict over 24 steps ahead and compared with other forecasting models. According to our findings,
adding the attention mechanism significantly increased prediction accuracy, and TFT proved more accurate and robust
than the other algorithms. The obtained mean square error (MSE), mean absolute error (MAE), and coefficient of
determination (R²) values for TFT are 0.151, 0.212, and 0.815, respectively. In comparison to the benchmark and
sequential models (including the Naive, MLP, and Encoder-Decoder models), TFT reduces the MSE and MAE by
8.4–47.9% and 6.1–22.3%, respectively, while R² is raised by 2.13–26.16%. The ability to incorporate long-distance
dependencies increases the predictive power of attention models.
Keywords: Solar Irradiance, Multivariate time series forecasting, Sequence Models, Attention-Based Models,
Transformer, Temporal Fusion Transformer
1. Introduction:
The combustion of fossil fuels for conventional electrical systems releases greenhouse gases that significantly
contribute to global warming. Extensive efforts have been made to understand and promote renewable energy to
reduce reliance on nonrenewable sources [1], [2]. The photovoltaic system has emerged as a viable alternative to
conventional electricity, offering green energy and a reduced carbon footprint [3]. As awareness grows regarding the
financial and ecological benefits of transitioning to renewable energy sources, there has been a notable increase in the
adoption of photovoltaic systems in households and small businesses [4]. Integrated photovoltaic systems mainly
consist of distributed systems, such as small domestic setups, and their primary function is to convert solar energy
into electrical power. Renewable sources, including solar radiation, are less harmful to the environment and are
recognized as one of the most promising future energy sources [5], [6]. However, the intermittent power supply of
solar systems can pose challenges to their integration. Various factors, particularly solar radiation, contribute to the
variability in energy output [7]. Environmental conditions, such as cloudiness and visibility, directly impact solar
irradiance. For example, in regions prone to frequent sandstorms and high particle levels, developing an irradiation
prediction model that incorporates dust phenomena is essential, as dust accumulation on PV panels affects the
efficiency of solar modules [8], [9]. Accurate estimation of these climatic characteristics is essential for developing
precise models of solar irradiation. Additionally, connecting large-scale renewable power to the grid presents
challenges [10]. The imbalance between supply and demand can cause instability and blackouts. Load balancing,
which involves controlling the proportion of energy generated and consumed, is a complex task typically achieved by
adjusting output energy and increasing energy production [11], [12]. Maximizing solar production is therefore
essential to mitigating this challenge. The variability of solar photovoltaic output power across
geographic regions and climatic variables introduces volatility and unpredictability, underscoring the need for accurate
solar PV prediction to ensure the reliability of the entire power grid [13]. Precise predictions can assist utility
administrations and corporate workers in promptly adjusting and optimizing power generation plans, thereby
enhancing the use and economic productivity of new energy sources [14], [15]. PV forecast algorithms primarily focus
on predicting photovoltaic generation or solar irradiation [16]. Solar forecasting involves creating prediction models
that utilize historical data and adhere to data science methodologies [17]. Accurate forecasting of solar resources and
photovoltaic power production is of interest to electricity network operators and energy generators due to its impact
on power grid maintenance, market structure, and cost reduction. As the popularity of photovoltaics continues to grow,
companies are investing heavily in power management systems to improve data collection and enable autonomous
resource management [18].
Solar irradiance forecasting has progressed with advancements in forecasting theories and machine learning. With an
emphasis primarily on short-term or day-ahead forecasts, several methodologies, including statistical and machine
learning approaches, predict solar irradiance at different time horizons [19]. Statistical models, however, can only
capture linear relationships and require stationary input data. Commonly used statistical methods include persistence forecasting,
Autoregressive (AR), Autoregressive Integrated Moving Average (ARIMA), and Exponential Smoothing Models
[20], [21]; however, these techniques do not make use of multivariate data, such as relevant meteorological variables.
Machine learning-based methods, like Artificial Neural Networks (ANNs) [22], Support Vector Machine (SVM) [23],
and K-Nearest Neighbor (KNN) are widely used and show superior accuracy in short-term predictions. Without the
complexity of mathematical and physical relationships, ANNs can learn any nonlinear information and produce
accurate short-term predictions [24]. In time series forecasting, they do have certain drawbacks. Time series data
contain sequential information and have a time order. When dealing with sequential data, the ANN model does not
preserve sequential information effectively. Deep Learning techniques like Recurrent Neural Network (RNN), Long
Short-Term Memory (LSTM), and Convolutional Neural Network (CNN) [25] are popular for solar forecasting due
to their capacity to characterize high-dimensional nonlinear complex relationships between inputs and outputs [26],
[27]. Sequential models such as RNN, LSTM, and GRU have a recurrent connection that can capture the sequential
relationship of the data during forecasting [28]. RNN-based methods provide better results in comparison to other
machine learning models; however, they struggle with multi-step forward prediction. This issue is better addressed by
the encoder-decoder architecture, which is used in the fields of machine translation and natural language processing [29].
This architecture is also employed in several time series forecasting tasks. In order to accurately forecast the weather
and stock prices, Qin et al. employ a dual-stage attention-based encoder-decoder method [30]. Using seq2seq models,
Bottieau et al. made probabilistic predictions about the cost of various imbalances in the European power markets [31].
Because of the wide range of meteorological variables included in the input data, solar irradiance poses a unique
forecasting problem. This multivariate time series data encapsulates a spectrum of input attributes, making it difficult
for the existing forecasting models to extract the complex feature correlations and long temporal dependencies of these
input features from nonlinear and non-stationary data. Additionally, for multi-step forecasting, the output sequence's
temporal dependency coupled with external factors like seasonality makes prediction more challenging. The encoder
in the encoder-decoder architecture struggles to capture long temporal relationships for particularly lengthy input
sequences since the encoder converts the input sequence into a fixed-length context vector, which could lead to
information loss. To address this problem, we present a modeling approach for time series data using the attention
mechanism and transformer model in our study. The Attention mechanism was first introduced in the machine
translation problem to solve the long-range dependency problem of the encoder-decoder [32]. The Transformer model
has recently revolutionized the field of natural language processing by pushing the state-of-the-art and being used for
a wide range of tasks, including conversational chatbots, vision-language tasks, and machine translation [33]. It is
possible to model time series data with complex temporal relations using transformer-based models. Temporal Fusion
Transformer (TFT) is an attention-based transformer model for time series forecasting with a high degree of flexibility
and the capacity for multi-step prediction [34]. TFT's attention mechanisms empower it to learn the complex temporal
dynamics of time sequences, and its capacity to deal with seasonality makes TFT a strategic choice for our study's
goals. TFT can take into account a variety of input variables and provide insights into relevant time phases.
In this work, we present the application of attention-based models in multivariate time series forecasting for 24-step
forward prediction with a resolution of 30 minutes with improved accuracy and interpretability. By leveraging
attention mechanisms, our approach aims to address critical problems faced by conventional forecasting methods by
dynamically emphasizing essential spatiotemporal elements in solar irradiance time series data. Furthermore, the
research intends to contribute to the field by offering insights into the interpretability of the attention-based model,
resulting in more reliable predictions and therefore increasing the model's adaptability in real-world applications. The
key contribution of this paper lies in the application of the Temporal Fusion Transformer (TFT) and attention-based
models to the task of solar irradiance forecasting within the particular context of our area, Dhaka and Cox's Bazar,
two places in Bangladesh. Our study includes thorough data preprocessing, model construction, and parameter tuning
to improve the performance of TFT and other models, as well as the practicality of TFT by customizing it to our
region's distinct geographical and climatic characteristics. We demonstrate the efficiency and applicability of
attention-based models in addressing the complex nature of solar forecasting in our region-specific solar data through
comprehensive experimentation and comparisons of prediction accuracy between the proposed model and other
benchmark forecasting models. The following is how the paper is organized. Section 2 discusses relevant work on
deep learning models. Section 3 discusses methodologies, data preparation, and key terminology. Section 4 provides
training setups, detailed experimental findings, and further discussions. Section 5 concludes the paper.
2. Related Work
Recent advances in the fields of artificial intelligence and deep learning have led to the development of a variety of
deep learning models for time series forecasting problems. For such time-series analyses, conventional statistical
analysis approaches were previously employed. Due to the availability of relatively large amounts of energy and
meteorological data, the use of deep learning algorithms in solar irradiance forecasting over different time horizons,
including short, medium, and long-term, is growing increasingly appealing. P. Bendiek et al. [35] introduce DCF, a
solar irradiation forecasting algorithm with improved accuracy in three cities (Seattle, Denver, and Boston). The
algorithm uses two components: precise ML algorithms (SVM and FBP) and contextual information. SVM performs
better for short-term 1-hour projections, while FBP is used for longer-term forecasts beyond 3 hours due to its stability.
M. Abdel-Nasser et al. [36] suggested HIFA, a solar irradiation forecasting technique that uses LSTM and GRU
networks. It was tested in three Finnish locales and showed better performance compared to three other ensemble
techniques with low site RMSE values. N. Yogambal et al. [37] introduce a CSO-GWO optimizer algorithm for multi-
timescale solar irradiance predictions using an LSTM-based deep recurrent neural network that outperforms
other models in single and multi-timescale forecasting with low MSE and MAPE values.
M. Abdel-Nasser et al. [38] proposed a solar irradiance forecasting approach based on LSTM models aggregated by the
Choquet integral, which provides accurate forecasts and eliminates the need for costly meteorological equipment. X.
Huang et al. [39] presented a two-branch input LSTM-MLP structure for solar irradiance forecasting, which includes
main output, main input, auxiliary input, and auxiliary output, as well as LSTM layers that use irradiance history and
meteorological parameters. Model II-BD outperforms other models by using historical irradiance and meteorological
features as main inputs and next-instant meteorological data as auxiliary inputs. G. Guariso et al. [40] validated the
accuracy of FF and LSTM networks for predicting environmental variable time series, emphasizing the effect of null
values and midnight samples on performance metrics. J. Wojtkiewicz et al. [41] employ univariate and multivariate
GRU and LSTM models to predict Phoenix, Arizona's solar irradiance based on historical data, weather variables, and
cloud cover data.
GRU attention, a hybrid deep learning model built on Keras, was introduced by K. Yan et al. [42] for solar irradiance
prediction and has shown good prediction accuracy, quick modeling, and high portability. The authors emphasized
the advantages of utilizing deep learning to estimate power generation stability, dependability, and precision. Y. Yu
et al. [17] developed a short-term LSTM model to forecast solar irradiance and tested it in Atlanta, New York, and
Hawaii one hour and one day ahead. With low MAPE values in all three cities, LSTM outperforms other models,
particularly on cloudy and mixed days. M. Husein et al. [43] proposed a deep LSTM RNN for solar irradiance
forecasting using external features such as dry bulb temperature, dew point temperature, and relative humidity. The
model showed an average root mean square error of 80.07 W/m2 across six datasets, outperforming traditional
feedforward neural networks (FFNN). S. Dev et al. [44] proposed a solar irradiance forecasting approach based on
clearness index data and triple exponential smoothing to accurately reflect seasonality.
Tong et al. [45] propose an encoder-decoder deep hybrid model combining TCN, LSTM, and MLP, enhanced by
dynamic error compensation, achieving balanced multi-step forecasting through unique loss functions. Li et al. [46]
suggest a two-channel method employing LSTM, WGAN, and CEEMDAN, splitting solar output into frequency-
based subsequences for prediction, and integrating their values for final output. Hou et al. [47] introduce CNN-A-
LSTM, employing comparable day analysis and attention processes, surpassing various models on the NSRDB dataset
for accurate solar irradiance prediction, particularly excelling in unclouded and partly cloudy conditions. Munsif et al.
[48] explore the CT-NET model, a transformer variation combining CNN and multi-head attention for both local and
global information utilization, outperforming CNN-RNN, CNN-GRU, and CNN-LSTM across seasons using the
Alice Springs dataset. Yang et al. [49] developed a model with RACB, DIFM, and TSAM components, demonstrating
improved accuracy and resilience in multi-step forecasting compared to TCN, LSTM, LSTM-Attention, CNN-LSTM,
and Transformer models across various locations. Kong et al. [50] utilize EMD, GRU-A with attention, and Kalman
filtering for accurate solar radiation forecasting, proving its effectiveness against RNN, GRU, EMD-GRU, and GRU-
A models.
Previous research has primarily focused on traditional approaches such as statistical models, Artificial Neural
Networks (ANN), and sequence models such as Long Short-Term Memory (LSTM) networks. While these techniques
provided useful insights and advances, their difficulties in dealing with multivariate time series data and capturing
complex temporal correlations in solar irradiance data still need to be addressed. Moreover, the existing literature
reveals challenges in achieving optimal forecasting accuracy, particularly when dealing with volatility and
unpredictability, as well as the inability to demonstrate good generalization across different geographical locations,
which pose barriers to achieving robust and accurate predictions. Transformer models have recently been integrated
into time series forecasting problems, although there is ongoing debate about whether transformers are effective
for time series data [51]. Very few works exploit the advantages of attention-based models and transformers,
though some prior studies used transformer models to estimate PV power directly from historical power
generation data [52]. Considering these limitations, our study addresses them by introducing the Temporal
Fusion Transformer (TFT) to the area of solar irradiance forecasting and applying this model directly to a real-world
scenario, especially forecasting solar irradiance at two specific sites in Bangladesh: Dhaka and Cox’s Bazar. These
two locations have different geographical features, such as climate, distance from the sea, and seasonality, that affect
the availability and variability of solar resources. This study uses solar irradiance data, together with other
meteorological variables, as the model input and solar irradiance as the output, to increase applicability to different
regions and enhance our understanding of the dynamic patterns and complexity driving energy output. In addition, we examine and compare
the effectiveness of the TFT, transformer, and attention-based models in comparison to other well-established models,
offering enhanced accuracy and adaptability in solar irradiance predictions, particularly in our specific geographical
and climatic setting.
3. Methodology
3.1. Seq2seq Encoder-Decoder
The Sequence-to-Sequence encoder-decoder architecture was developed [29], [53] to encode and produce a sequence
of any length for machine translation tasks with sequential input and output. The architecture has two RNN networks
called encoder and decoder. After recursively processing the input sequence $(x_1, \dots, x_T)$ of length $T$, the encoder
RNN computes a fixed-length representation of the final hidden state vector $c$, which recapitulates the entire input
sequence. The decoder is another RNN network that produces a target sequence $(y_1, \dots, y_{T'})$ of length $T'$ and
employs the encoder's hidden state as its initial state. The decoder generates the target iteratively, and at each step, it
utilizes the previous step's output as well as the previous hidden state as input. It should be noted here that the lengths
of the input and output sequences may differ. Either a basic RNN, an LSTM [54], or a GRU [55] may be used as the
RNN in the encoder and decoder. Each hidden state of the encoder in a basic RNN is calculated using equation 1:

$$h_t = f(W_{hx} x_t + W_{hh} h_{t-1}) \quad (1)$$

Weight matrices $W_{hx}$ and $W_{hh}$ link the input and the encoder's hidden states, respectively, where $f$ is the activation
function and $h_t$ stands for the encoder's hidden state at time $t$.
Given an input sequence $(x_1, \dots, x_T)$ whose fixed-length hidden state representation is $c$, the conditional probability
of the output sequence $(y_1, \dots, y_{T'})$ is formulated in equation 2:

$$p(y_1, \dots, y_{T'} \mid x_1, \dots, x_T) = \prod_{t=1}^{T'} p(y_t \mid c, y_1, \dots, y_{t-1}) \quad (2)$$
The encoder-decoder model’s architecture was designed for language modeling, and the input and output sequences
are both represented as word embeddings, which are learned numerical vector representations for text. The decoder
initializes with a start token or a dummy input to begin the prediction. In our time series task, however, the value
preceding the target sequence is known. Additionally, the input and output sequences do not share the same feature
representation size. The dataset used here has multiple features in each input sequence, hence the term multivariate
time series forecasting, whereas the output sequence has only one feature. Therefore, we adapt the model to our
problem accordingly. Here, the prior true output value shown in Figure 1 is not known by the decoder; instead, it only
has access to the initial target value $y_0$ during the prediction phase. So, the decoder generates the sequence
$(y_1, y_2, \dots, y_{T'})$ using the probability distribution obtained from the prior state. There are several
methods for updating decoder predictions during training. Recursive prediction is one way. That is, the previously
predicted decoder outputs feed into the decoder recurrently until we obtain an output of the desired target length. One
disadvantage of this strategy is that if the predictions are too poor in the early stages of training, the errors will accrue
over the sequence length, making it harder for the model to learn and converge rapidly. Another method is using
teacher forcing [56], [57]. In teacher forcing, the model's decoder makes predictions based on the true previous target
value. It forces the sequence model to stay near the true sequence. This approach has one drawback: there is no true
target value during inference. We need to forecast recursively during inference, resulting in a discrepancy between
training and inference. We therefore adopted a hybrid of the two approaches: governed by a ratio, the decoder is fed
its own predicted value at some steps and the true value at others. This ratio is designated the teacher forcing ratio (TFR).
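To illustrate, a minimal sketch of this hybrid decoding loop is given below; the PyTorch decoder module and tensor shapes are hypothetical placeholders rather than the exact implementation used here:

```python
import random
import torch

def decode_sequence(decoder, dec_input, hidden, target, tfr=0.6):
    """Generate the target sequence step by step, mixing teacher forcing
    and recursive prediction according to the teacher forcing ratio `tfr`."""
    outputs = []
    for t in range(target.size(1)):
        # One decoder step: previous output value plus previous hidden state.
        out, hidden = decoder(dec_input, hidden)
        outputs.append(out)
        # With probability `tfr`, feed the true value (teacher forcing);
        # otherwise feed back the model's own prediction (recursive).
        if random.random() < tfr:
            dec_input = target[:, t : t + 1]
        else:
            dec_input = out.detach()
    return torch.cat(outputs, dim=1), hidden
```

With the TFR values of 0.5–0.6 selected in Table 4, roughly half of the decoding steps see the true value during training.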
3.2. Encoder-Decoder with Attention Mechanism
In the encoder-decoder attention model, the encoder reads the time series input sequence and transforms it into hidden
states ($h_{en}$), from which a context vector ($c_i$) representation of the data is created. The context vector is then utilized
by the decoder to generate an output sequence based on the previously generated output ($y_{t-1}$) and the previous hidden
state ($h_{de,i-1}$). The attention mechanism is used at each decoding step to continuously select information from the hidden
states, adjusting the context vector based on the decoder's current state. The attention mechanism starts by generating
an alignment score between the decoder's hidden state and each of the encoder's hidden states, which is then transformed
into attention weights. The context vector is then generated as the weighted sum of the encoder hidden states $h_{en,j}$
using the attention weights $\alpha_{ij}$, as displayed in equation 3:

$$c_i = \sum_{j=1}^{T} \alpha_{ij} h_{en,j} \quad (3)$$

Using equations 4 and 5, each alignment score $e_{ij}$ and attention weight $\alpha_{ij}$ is determined:

$$e_{ij} = a(h_{de,i-1}, h_{en,j}) \quad (4)$$

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T} \exp(e_{ik})} \quad (5)$$
The GRU and LSTM layers used in the encoder of the attention-based model are bidirectional. Mixed recursive and
teacher-forcing methods were used for the training phase as mentioned in the preceding section.
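A compact sketch of how equations 3-5 map onto code is given below, using an additive (Bahdanau-style) scoring function in PyTorch; the class, layer names, and dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: scores each encoder hidden state against
    the current decoder state, then builds a context vector (eqs. 3-5)."""
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_hidden, enc_states):
        # enc_states: (batch, T, enc_dim); dec_hidden: (batch, dec_dim)
        scores = self.v(torch.tanh(
            self.W_enc(enc_states) + self.W_dec(dec_hidden).unsqueeze(1)
        ))                                   # alignment scores (eq. 4)
        weights = F.softmax(scores, dim=1)   # attention weights (eq. 5)
        context = (weights * enc_states).sum(dim=1)  # context vector (eq. 3)
        return context, weights.squeeze(-1)
```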
3.3. Transformer for Time Series:
In 2017, researchers from Google Brain unveiled the first transformer [33]. To adapt the transformer model for
time series forecasting, Wu et al. [58] created a variant that maintains the original structure of encoder-decoder layers.
In the original transformer model, which was developed to solve the machine translation problem, the embedding size is
utilized as the $d_{model}$-dimensional vector size throughout the encoder and the decoder. This ensures that the feature
size of the input and output text data is the same. In our scenario, input and output time series data may have different
feature dimensions. Figure 2 depicts the input layer of the encoder, which is a fully connected neural network used to map
the input data's attributes onto a $d_{model}$-dimensional vector. The decoder has a similar layer to translate the output
data to the $d_{model}$-dimensional vector.
In multi-headed attention, the time series data is linearly transformed to obtain query vectors (Q), key vectors (K), and
value vectors (V), and each of these transformed vectors is split into multiple heads. Using the scaled dot-product
attention mechanism, each attention head separately computes attention scores. To generate the attention output, the
outputs of all attention heads are concatenated and linearly transformed, as presented in equation 6:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h) W^O, \quad \mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V) \quad (6)$$
Positional encoding is used to capture the sequential information of the input data since our model does not include a
sequential unit like an RNN. In addition, masking is used in the decoder's output sequence to ensure that only preceding
data points in the time series are included in the prediction. A normalization layer follows each sublayer.
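The sketch below illustrates such an input layer combined with standard sinusoidal positional encoding, assuming an even $d_{model}$; the class and parameter names are our own, not those of a specific library:

```python
import math
import torch
import torch.nn as nn

class TimeSeriesInput(nn.Module):
    """Maps multivariate inputs to d_model and adds sinusoidal positional
    encodings, since the transformer has no recurrent unit."""
    def __init__(self, n_features, d_model, max_len=512):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)  # fully connected input layer
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float()
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x):            # x: (batch, T, n_features)
        x = self.proj(x)
        return x + self.pe[: x.size(1)]
```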
3.4. Temporal Fusion Transformer (TFT)
Temporal Fusion Transformer (TFT) [34] provides a neural network design that combines the features of other
networks, such as LSTM layers and Transformers’ attention heads. TFT is able to accommodate three distinct kinds
of features. They are temporal data with known inputs into the future, temporal data known only up to the present,
and external categorical or static variables, which are also referred to as time-invariant features. The model has a high
degree of adaptability with the capability of multi-step prediction. Certain time sequences might be rather complicated
or noisy, but others can be easily modeled using seasonal naive predictors and require very little effort. In an ideal
world, the model would be able to distinguish between these distinct kinds of situations. There is also the possibility
of success with one-step-ahead prediction models that recursively feed their forecasts back as inputs.
In order to adapt to a broad variety of datasets and use cases, the architecture is equipped with gating mechanisms
that allow data to bypass unused parts of the network. These gates are built from Gated Residual Networks (GRNs),
as shown in equations 7-11:

$$\mathrm{GRN}_{\omega}(a, c) = \mathrm{LN}\left(a + \mathrm{GLU}_{\omega}(\eta_1)\right) \quad (7)$$

$$\mathrm{GLU}_{\omega}(\gamma) = \sigma(W_{4,\omega}\gamma + b_{4,\omega}) \odot (W_{5,\omega}\gamma + b_{5,\omega}) \quad (8)$$

$$\eta_1 = W_{1,\omega}\eta_2 + b_{1,\omega} \quad (9)$$

$$\eta_2 = \mathrm{ELU}(W_{2,\omega}a + W_{3,\omega}c + b_{2,\omega}) \quad (10)$$

$$\sigma(x) = \frac{1}{1 + e^{-x}} \quad (11)$$

In these equations, ELU is the Exponential Linear Unit activation function, $\eta_1$ and $\eta_2$ are intermediate layers,
LN is standard layer normalization, $\sigma$ is the sigmoid activation, $\odot$ denotes element-wise multiplication, $a$ is
the primary input, $c$ is an optional context vector, and $\omega$ is an index used to denote weight sharing.
At each time step, variable selection networks choose the right set of input variables. In order to include static
characteristics in the network, context vectors are encoded and used to condition the temporal dynamics using static
covariate encoders. For the purpose of local processing, a sequence-to-sequence layer is used, and for the purpose of
capturing long-term dependencies, an interpretable multi-head attention block is provided. Quantile forecasting
intervals are used to determine the likely range of target values at each time step in the forecasting process.
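As a rough illustration, the gated residual block of equations 7-10 can be sketched in PyTorch as follows; this is a simplified rendering, and the actual implementations in [34] and PyTorch Forecasting are more elaborate:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualNetwork(nn.Module):
    """Simplified Gated Residual Network (eqs. 7-10): the GLU gate can
    suppress the block's contribution, letting inputs skip it entirely."""
    def __init__(self, d, d_context=None):
        super().__init__()
        self.w2 = nn.Linear(d, d)                                   # W2, b2
        self.w3 = nn.Linear(d_context, d, bias=False) if d_context else None
        self.w1 = nn.Linear(d, d)                                   # W1, b1
        self.glu = nn.Linear(d, 2 * d)            # W4/W5 fused: gate + value
        self.norm = nn.LayerNorm(d)

    def forward(self, a, c=None):
        eta2 = self.w2(a) + (self.w3(c) if self.w3 is not None else 0)
        eta2 = F.elu(eta2)                        # eq. 10
        eta1 = self.w1(eta2)                      # eq. 9
        gate, value = self.glu(eta1).chunk(2, dim=-1)
        return self.norm(a + torch.sigmoid(gate) * value)  # eqs. 7-8
```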
3.5. Data Description:
The historical irradiance data utilized for the system modeling and validation for this study came from the National
Solar Radiation Database (NSRDB) [59], covering two consecutive years, 2019 and 2020. To assess the robustness of
the models, it is necessary to investigate data from several locations. Dhaka
(23.8° N, 90.41° E) and Cox's Bazar (21.46° N, 92.01° E) are the two locations in Bangladesh that were utilized in
this study. Table 1 below shows the statistical characteristics of the data for these two locations.

Table 1. Statistical features of the solar irradiance data (GHI, W/m²)

Location       Max     Mean     Std.
All samples    1017    207.23   287.50
Dhaka           994    200.24   278.47
Cox's Bazar    1017    214.23   296.09
Figure 3. Solar irradiation data in Dhaka during 2019
The dataset contains a total of 70,176 data points from two locations with a temporal resolution of 30 minutes and has
no missing values. Global Horizontal Irradiation (GHI), one of the three solar irradiation components included in this
database, is chosen as the target variable for our experiment. Figure 3 displays the Global Horizontal Irradiation
distribution for Dhaka for different months in 2019. The figure shows that solar irradiance varies between the hours
of each day and that each month has a different peak.
Figure 4. Global Horizontal Irradiation during (a) a clear-sky day and (b) a cloudy day
Due to various weather conditions, the distribution of solar irradiance in different locations varies substantially. In
cloudy or rainy conditions, the solar irradiation value is highly uncertain and variable.
Figure 4(a,b) shows the solar irradiance for two different weather scenarios: a clear sky and cloud cover, during the
course of the day. The data exhibit a regular pattern on days with a clear sky. However, when there is cloud cover, GHI readings
become extremely irregular and exhibit a sharp drop in the curve.
To enhance the forecasting ability of our model, we incorporate meteorological data, which is also provided by the
National Solar Radiation Database, along with the solar irradiance data. The properties of the meteorological data are
shown in Table 2.
Table 2. Meteorological parameters

Variable Name                   Unit
Global Horizontal Irradiance    W/m²
Ozone                           cm
Solar Zenith Angle              Degree
Precipitable Water              cm
Temperature                     °C
Dew Point                       °C
Relative Humidity               %
Pressure                        mbar
Wind Direction                  Degree
Wind Speed                      m/s
3.6. Feature Selection:
Numerous meteorological factors can be thought of as possible factors that can have an impact on the solar radiation
that a surface receives from above. In order to choose an optimum feature subset as the model input, it is necessary to
differentiate the particular features linked to weather conditions into those that are useful to the model and those that
are irrelevant. Pearson’s correlation coefficient is the measure of the statistical relationship between two continuous
variables. To decide which factors should be used as inputs, the correlation between GHI and other meteorological
variables was examined. Table 3 displays the dataset's solar irradiance and weather variables' Pearson correlation
coefficients.
Table 3. Pearson's correlation coefficients between meteorological parameters and GHI

Weather Variables      Dhaka     Cox's Bazar
Ozone                   0.064     0.047
Solar Zenith Angle     -0.815    -0.817
Precipitable Water     -0.002    -0.048
Temperature             0.510     0.271
Dew Point               0.018    -0.021
Relative Humidity      -0.547    -0.470
Pressure               -0.007     0.057
Wind Direction          0.054     0.093
Wind Speed              0.227    -0.033
The correlation between GHI and the various weather variables differs by location, indicating that the climate
condition has an impact on these parameters. A minimum value of 0.2 for the absolute value of Pearson’s correlation
coefficients in either location was chosen to determine the inclusion of the features. From the table, it can be seen that
Temperature, Humidity, Solar Zenith Angle, and Wind Speed were deemed to be critical for the model and that the
remaining parameters were excluded since they showed no significant correlation with the GHI.
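A small sketch of this filtering step using pandas is shown below; the function and column names are hypothetical, and the study applies the 0.2 threshold across both locations' datasets:

```python
import pandas as pd

def select_features(df: pd.DataFrame, target: str = "GHI", threshold: float = 0.2):
    """Return the variables whose absolute Pearson correlation with the
    target meets the chosen threshold (0.2 in this study)."""
    corr = df.corr(method="pearson")[target].drop(target)
    return corr[corr.abs() >= threshold].index.tolist()

# Hypothetical usage: keep a feature if it passes in either location.
# keep = set(select_features(df_dhaka)) | set(select_features(df_coxs_bazar))
```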
3.7. Feature Transform and Encoding:
Cloud type is a categorical feature that represents different cloud conditions and weather types. It is an important
feature since cloud condition is responsible for the abrupt change in radiation received at the surface. One-hot encoding
is used since this feature doesn't have any ordinal relationships. The DateTime variable is also an important feature as there
is a strong correlation between GHI and time which can be seen in Figure 3. One-hot encoding is not suitable for this
feature as there are too many categories. Moreover, the variables have a cyclical relationship that one-hot encoding
can’t address. For instance, although appearing to be separated by 11 months in categorical value, December and
January are only 1 month apart. To resolve this problem, we encoded the cyclic feature using sine and cosine
transformations, as shown in equations 12 and 13.
 󰇛 
󰇛󰇜󰇜
(12)
 󰇛 
󰇛󰇜󰇜
(13)
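A brief sketch of this cyclic encoding follows; the helper and column names are our own, and the period argument plays the role of max(x) in equations 12 and 13:

```python
import numpy as np
import pandas as pd

def encode_cyclic(df: pd.DataFrame, col: str, period: float) -> pd.DataFrame:
    """Map a cyclic feature onto the unit circle so that the ends of the
    cycle (e.g. December and January) are encoded as neighbors."""
    df[f"{col}_sin"] = np.sin(2 * np.pi * df[col] / period)
    df[f"{col}_cos"] = np.cos(2 * np.pi * df[col] / period)
    return df

# Hypothetical usage on a DateTime-indexed frame with 30-min resolution:
# df["hour"] = df.index.hour + df.index.minute / 60
# df = encode_cyclic(df, "hour", period=24)
# df = encode_cyclic(df, "month", period=12)
```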
3.8. Data Scaling and Splitting:
Differences in the scales of continuous input variables may result in slow learning or cause training to become trapped
in local optima. Gradient descent-based algorithms, such as neural networks, perform better when the scale and
distribution of the time series data are consistent. This necessitates normalizing the data so that each feature has
the same scale and significance. Standardization (z-score), a technique that rescales the distribution of values to
zero mean and a standard deviation of 1, is used in this study to rescale the data. The z-score normalization formula
is given in equation 14:

$$z = \frac{x - \mu}{\sigma} \quad (14)$$

where $x$ is the input data, $\mu$ denotes the mean of the feature vector, and $\sigma$ denotes the feature vector's standard
deviation.
For training purposes, the complete dataset is split into three sets: train, validation, and test sets. 75% of the data,
covering the first year (2019) and the first six months of 2020, are in the training set, which is used to fit the models.
The remaining six months are split between the test (12.5%) and validation (12.5%) sets. The validation set is used to
provide an unbiased assessment of a fitted model while fine-tuning its hyperparameters whereas the test set is used to
evaluate the final model. Since it is necessary to preserve the temporal order of time series data, data points are not
shuffled while splitting.
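A sketch of the chronological split and z-score scaling is given below; fitting the scaler on the training partition only is our assumption (a standard precaution against information leakage), and the ordering of the validation and test halves is illustrative:

```python
import numpy as np

def split_and_scale(data: np.ndarray, train_frac=0.75, val_frac=0.125):
    """Split chronologically (no shuffling, preserving temporal order),
    then z-score every feature with training-set statistics (eq. 14)."""
    n = len(data)
    n_train, n_val = int(n * train_frac), int(n * val_frac)
    train = data[:n_train]
    val = data[n_train : n_train + n_val]
    test = data[n_train + n_val :]
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    zscore = lambda x: (x - mu) / sigma
    return zscore(train), zscore(val), zscore(test)
```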
3.9. Performance Criterion:
Four performance metrics, namely the mean square error (MSE), the mean absolute error (MAE), the mean absolute
scaled error (MASE), and the coefficient of determination (R²), are used in the forecasting experiments to assess the
forecasting accuracy of our models.
MSE stands for Mean Squared Error, shown in equation 15. It measures the average of the squared differences
between the actual and estimated values:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \quad (15)$$
MAE stands for Mean Absolute Error, presented in equation 16. It calculates the average of the absolute
differences between the actual and predicted values:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \quad (16)$$
MASE stands for Mean Absolute Scaled Error, exhibited in equation 17. It evaluates the accuracy of forecasts
by comparing the mean absolute error of the forecast values with the mean absolute error of a naive model. A naive
model is a simple baseline model that forecasts the future value to be the same as the previous one:

$$\mathrm{MASE} = \frac{\mathrm{MAE}}{\frac{1}{n-1}\sum_{i=2}^{n}\left|y_i - y_{i-1}\right|} \quad (17)$$
R² is the coefficient of determination, shown in equation 18. It indicates how well the model fits the data by
comparing the variance explained by the model with the total variance in the data:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \quad (18)$$

Here, $y_i$ and $\hat{y}_i$ represent the actual and predicted values, respectively, while $\bar{y}$ indicates the mean of the actual values.
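For reference, equations 15-18 can be computed with a few lines of NumPy, as in the sketch below, where y_true and y_pred are flattened arrays of actual and predicted values:

```python
import numpy as np

def forecast_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MSE, MAE, MASE and R^2 as defined in equations 15-18. MASE scales
    the forecast MAE by the MAE of a one-step naive forecast."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    naive_mae = np.mean(np.abs(y_true[1:] - y_true[:-1]))  # naive baseline
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MSE": mse, "MAE": mae, "MASE": mae / naive_mae, "R2": r2}
```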
4. Results and Analysis:
From the datasets of two different locations, multi-step solar irradiance is forecasted using different sequence-to-
sequence attention-based models. As a multi-step-ahead time series forecasting task, each model predicts the Global
Horizontal Irradiance (GHI) value 12 hours ahead using the last 24 hours of data as the input sequence. According to the
methods described in the preceding section, Transformer, GRU and LSTM Encoder-Decoder (GRU-ED, LSTM-ED),
GRU and LSTM Encoder-Decoder with attention (GRU-attn, LSTM-attn) models were developed and trained in
PyTorch. The TFT model was trained using the PyTorch Forecasting implementation [60]. As the various
hyperparameters, like learning rate and hidden units, significantly impact the model's performance, we tuned the
hyperparameters of the models using Optuna [61]. The optimization method used in this experiment is the Adam
optimizer. The selected hyperparameters for our forecasting models are presented in Tables 4 and 5.
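The tuning loop follows the usual Optuna pattern sketched below; the search space shown and the train_and_validate stand-in are illustrative, not the exact configuration used in these experiments:

```python
import optuna

def train_and_validate(lr: float, hidden_size: int) -> float:
    # Placeholder: train the chosen model with these hyperparameters and
    # return its validation loss; replaced here by a dummy expression.
    return (lr - 5e-4) ** 2 + abs(hidden_size - 32) * 1e-6

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True)
    hidden = trial.suggest_categorical("hidden_size", [32, 48, 64])
    return train_and_validate(lr, hidden)

study = optuna.create_study(direction="minimize")   # minimize validation loss
study.optimize(objective, n_trials=50)
print(study.best_params)
```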
Table 4. Selected parameters for the Encoder-Decoder and Attention-Based GRU and LSTM models

Parameter               GRU-ED   LSTM-ED   GRU-attn   LSTM-attn
Layers                  1        1         1          1
Encoder hidden size     64       48        32         32
Decoder hidden size     64       48        32         32
Learning rate           0.0005   0.0005    0.0005     0.0005
Input sequence length   48       48        48         48
TFR                     0.6      0.5       0.6        0.5
Dropout                 0        0         0          0
Batch size              256      256       256        256
Table 5. Selected parameters for the Transformer and Temporal Fusion Transformer (TFT) models

Transformer                           Temporal Fusion Transformer
Parameter               Value         Parameter                 Value
Layers                  3             Layers                    1
d_model                 24            Hidden size               32
d_ff                    16            Hidden continuous size    16
Attention heads         8             Attention heads           4
Learning rate           0.0005        Learning rate             0.0001
Input sequence length   48            Input sequence length     48
Dropout                 0.2           Dropout                   0.2
Batch size              256           Batch size                256
The performance of the sequence-to-sequence models is also compared with the simple MLP and Naive models. The
Naive model uses the previous value or period to forecast the next value/period. Because we are forecasting sequences,
the naive model will anticipate the following day's irradiance based on the value from the previous day. To compare
our sequence models, we also construct a simple MLP model that predicts the sequence recursively. The MLP model
occasionally performs well in time series forecasting [62], [63]. The MLP model used in this experiment
has 2 hidden layers, each with 64 hidden units.
The evaluation metrics of these forecasting models for the two different locations are shown in Table 6.
Table 6. Forecasting metrics for the different models in two locations

                  Dhaka                             Cox's Bazar
Model         MSE     MAE    MASE   R²        MSE     MAE    MASE   R²
Naive         0.302   0.283  –      0.622     0.277   0.263  –      0.668
MLP           0.180   0.243  0.858  0.775     0.171   0.241  0.916  0.796
GRU-ED        0.179   0.232  0.819  0.776     0.152   0.219  0.833  0.818
LSTM-ED       0.183   0.236  0.834  0.770     0.156   0.227  0.863  0.814
GRU-attn      0.153   0.231  0.816  0.809     0.160   0.242  0.920  0.809
LSTM-attn     0.160   0.219  0.773  0.799     0.164   0.236  0.897  0.804
Transformer   0.1945  0.271  0.957  0.757     0.1865  0.296  1.125  0.777
TFT           0.154   0.215  0.759  0.806     0.147   0.210  0.798  0.824
As seen in the table, almost all forecasting models can forecast with reasonable accuracy when compared to the naive
model. The table also shows that TFT outperforms the other models on most of the metrics in both locations. After
the Naive model, the MLP and Transformer models perform worst overall.
In time series forecasting, sequential models generally outperform MLP because they contain recurrent structures that
can store sequential data. Here, at Cox's Bazar location, GRU-ED and LSTM-ED outperform MLP across all
parameters, with GRU-ED doing the best. MLP outperforms LSTM-ED in Dhaka in terms of MSE and R² values;
however, LSTM-ED is more effective in terms of MAE and MASE. In this case, GRU-ED also gives superior results to
MLP and LSTM-ED. GRU-ED model has shown better results in Cox’s Bazar location than attention models, with
MSE and MAE values of 0.152 and 0.219, respectively. In Dhaka, GRU and LSTM attention models beat MLP and
encoder-decoder models, while the GRU-attn model performs the best and even outperforms TFT in terms of MSE
and R² score. The effectiveness of the attention mechanism is evident as it facilitates attention-based models in
retaining all prior information in long sequences. The attention mechanism assesses all hidden states from the encoder
sequence and also assigns relative importance to the time steps and features that affect output when formulating
predictions, thus improving the prediction accuracy.
The Transformer model performs the worst in both locations, slightly outperforming the Naive model. Although the
Transformer model does well throughout the training phase, it performs poorly on the test data, suggesting overfitting. Finally, the TFT model
beats all other models in the Cox's Bazar location with the lowest MSE, MAE, and MASE and the highest R² value. Only
GRU-attn has better MSE and R² values than TFT, at 0.153 and 0.809, in Dhaka. TFT has the best MAE
and MASE scores in this location. The TFT model can handle a variety of input data, including static covariates, future
known inputs, and temporal variables known just up to the present. The model can also be trained on multiple time
series. This algorithm combines a temporal self-attention decoder with a novel Multi-head attention mechanism that,
when evaluated, gives additional insight into feature importance in order to capture long-term dependencies.
Figure 5. Predicted solar irradiance for different models in Dhaka during (a) clear-sky and (b) cloudy days

Figure 6. Predicted solar irradiance for different models in Cox's Bazar during (a) clear-sky and (b) cloudy days
The actual data and predicted outcomes for the various models in both locations and for the two weather conditions
are shown in Figures 5(a,b) and 6(a,b). Our forecasting algorithms predict 24 steps ahead. On days with
cloud cover, as shown in Figures 5(b) and 6(b), algorithms can capture the uncertainty and volatility in solar data. Due
to the high level of weather unpredictability on cloudy days, models work better when the sky is clear than when it is
cloudy.
Better forecasting performance is achieved in Cox's Bazar, where almost every model performs better. This might
be because the seasonality pattern is more consistent in this location and there is less residual randomness owing to
cloud cover and weather variability. Moreover, the same information
can be observed through the Naive model, where the error values are smaller in Cox's Bazar than in Dhaka. We may
infer that Cox's Bazar data follow seasonality with less unpredictability since the Naive model predicts the upcoming
period using the prior period. The TFT model shows more consistency in both locations with MSE values of 0.154
and 0.147 and MAE values of 0.215 and 0.210 respectively. Attention models also perform well in both locations
although they have better values in the Dhaka location. All of the other models projected inconsistently for the two
separate locations. TFT’s ability to maintain consistent performance levels across varying contexts implies that it is a
robust choice for diverse patterns.
Table 7. Overall forecasting metrics for the different models in both locations

Model         MSE     MAE    MASE   R²
Naive         0.290   0.273  –      0.646
MLP           0.176   0.242  0.886  0.785
GRU-ED        0.165   0.226  0.828  0.798
LSTM-ED       0.169   0.231  0.846  0.794
GRU-attn      0.157   0.236  0.864  0.808
LSTM-attn     0.162   0.227  0.831  0.802
Transformer   0.190   0.270  0.989  0.767
TFT           0.151   0.212  0.776  0.815
To provide a thorough assessment of our solar prediction models, the test datasets from two locations are combined
to compute the error metrics of the total test datasets, as shown in Table 7. The combination of results allows for a
comparative analysis, which provides insights into the models' overall performance under two distinct environmental
settings. Table 7 demonstrates TFT's superior performance in comparison to the other forecasting models: it achieves
the best value in every error metric, with an MSE of 0.151 and an MAE of 0.212, while its MASE of 0.776 and R² of
0.815 further corroborate this. Overall, the experimental results show that the TFT's performance
is on par with the attention models and outperforms Encoder-Decoder models and a simple estimator (Naive model).
In contrast to the encoder-decoder architecture, which fails to capture information because of its fixed-length context
vector representation, attention-based models are able to collect information in long input sequences. Particularly, we
illustrate the benefits of the attention mechanisms which provide a clear view into the decision-making process,
allowing models to gain insights into specific meteorological components and temporal patterns influencing
solar irradiance forecasts. We also observed that the GRU and LSTM architecture in the Encoder-Decoder and
Attention models function similarly despite having different architectural designs, with GRU marginally
outperforming LSTM. Our results demonstrate that the TFT consistently surpasses traditional sequential models and
other attention-based architectures in both locations, showcasing its robustness and effectiveness in capturing the
intricate patterns inherent in our region's solar data. However, since TFT is more computationally expensive due to
containing significantly more parameters, a careful trade-off between model complexity and training efficiency is
required.
5. Conclusion:
In this paper, we presented an attention-based deep learning framework to address the multivariate multi-step time
series forecasting problem. Attention-based encoder-decoder, Transformer, and Temporal Fusion Transformer (TFT)
models are evaluated to forecast 24 steps forward solar irradiance at two different locations in Bangladesh. The dataset
with an interval of 30 minutes includes information on cloud cover, meteorological variables, and historical solar
irradiance values. The unpredictable nature of the weather makes it challenging to forecast solar irradiance, which
leads to imbalances in the interconnected grid. Our primary motivation was to assess the attention mechanism's
capabilities to address the complicated and dynamic nature of solar irradiance patterns, therefore contributing to the
grid and optimizing renewable energy utilization. According to the results, the TFT model achieved superior outcomes
to the other models, such as the MLP and sequential encoder-decoder models, across all performance measures.
The attention-based GRU encoder-decoder, which has the best MSE and R² score in the Dhaka location, was the second-
best method after TFT. The time series Transformer model performed the worst of all the models used. In
comparison to the other models' inconsistent predictions, the empirical results exhibit a significant decrease in
forecasting errors, as well as the consistency and robustness of TFT in two separate locations in our specific region,
proving its usefulness in real-world applications. As the need for clean and renewable energy sources increases, our
research contributes to assisting energy management in making informed decisions for sustainable energy integration
into the grid and more reliable and efficient utilization of solar energy. It is important to recognize several limitations
of our study. Firstly, our work primarily focuses on a specific time horizon for solar radiation predictions; future
studies could investigate multiple time horizons to further assess the robustness of forecasting methodologies.
Furthermore, the training time for TFT and other attention models is relatively long, which could lead to potential
practical issues in situations when quick model response is necessary. Despite these limitations, our research
demonstrates the importance of the application of the TFT model and incorporating the attention mechanism to
overcome the issues associated with solar irradiation variability.
6. Data Availability
The dataset from the NSRDB (National Solar Radiation Database) was used to support this study and is available at
“https://nsrdb.nrel.gov/”. The dataset is cited at relevant places within the text as Ref [59].
7. Conflicts of interest
The authors certify that they do not have any competing interests that might influence the results of this research in
any way, and they give their approval for the current version of the work to be published.
8. References
[1] L. G. Thompson, “Climate change: The evidence and our options,” Behav. Anal., vol. 33, no. 2, pp. 153–170, Oct. 2010, doi: 10.1007/BF03392211.
[2] P. Newell and A. Simms, “How Did We Do That? Histories and Political Economies of Rapid and Just Transitions,” New Polit. Econ., vol. 26, no. 6, pp. 907–922, Nov. 2021, doi: 10.1080/13563467.2020.1810216.
[3] F. Wang, Z. Zhen, Z. Mi, H. Sun, S. Su, and G. Yang, “Solar irradiance feature extraction and support vector machines based weather status pattern recognition model for short-term photovoltaic power forecasting,” Energy Build., vol. 86, pp. 427–438, Jan. 2015, doi: 10.1016/j.enbuild.2014.10.002.
[4] A. Reinders, P. Verlinden, W. van Sark, and A. Freundlich, Photovoltaic Solar Energy: From Fundamentals to Applications. John Wiley & Sons, 2017.
[5] S. Jiang, C. Wan, C. Chen, E. Cao, and Y. Song, “Distributed photovoltaic generation in the electricity market: status, mode and strategy,” CSEE J. Power Energy Syst., vol. 4, no. 3, pp. 263–272, Sep. 2018, doi: 10.17775/CSEEJPES.2018.00600.
[6] P. Hanser, R. Lueken, W. Gorman, and J. Mashal, “The practicality of distributed PV-battery systems to reduce household grid reliance,” Util. Policy, vol. 46, pp. 22–32, Jun. 2017, doi: 10.1016/j.jup.2017.03.004.
[7] M. Q. Raza, M. Nadarajah, and C. Ekanayake, “On recent advances in PV output power forecast,” Sol. Energy, vol. 136, pp. 125–144, Oct. 2016, doi: 10.1016/j.solener.2016.06.073.
[8] T. Sarver, A. Al-Qaraghuli, and L. L. Kazmerski, “A comprehensive review of the impact of dust on the use of solar energy: History, investigations, results, literature, and mitigation approaches,” Renew. Sustain. Energy Rev., vol. 22, pp. 698–733, Jun. 2013, doi: 10.1016/j.rser.2012.12.065.
[9] S. A. Sulaiman, A. K. Singh, M. M. M. Mokhtar, and M. A. Bou-Rabee, “Influence of Dirt Accumulation on Performance of PV Panels,” Energy Procedia, vol. 50, pp. 50–56, 2014, doi: 10.1016/j.egypro.2014.06.006.
[10] Y. Jia, X. Lyu, C. S. Lai, Z. Xu, and M. Chen, “A retroactive approach to microgrid real-time scheduling in quest of perfect dispatch solution,” J. Mod. Power Syst. Clean Energy, vol. 7, no. 6, pp. 1608–1618, Nov. 2019, doi: 10.1007/s40565-019-00574-2.
[11] K. S. Perera, Z. Aung, and W. L. Woon, “Machine Learning Techniques for Supporting Renewable Energy Generation and Integration: A Survey,” 2014, pp. 81–96.
[12] A. Fouilloy et al., “Solar irradiation prediction with machine learning: Forecasting models selection method depending on weather variability,” Energy, vol. 165, pp. 620–629, Dec. 2018, doi: 10.1016/j.energy.2018.09.116.
[13] F. Wang, Y. Yu, Z. Zhang, J. Li, Z. Zhen, and K. Li, “Wavelet Decomposition and Convolutional LSTM Networks Based Improved Deep Learning Model for Solar Irradiance Forecasting,” Appl. Sci., vol. 8, no. 8, p. 1286, Aug. 2018, doi: 10.3390/app8081286.
[14] H. Zhou, Y. Zhang, L. Yang, Q. Liu, K. Yan, and Y. Du, “Short-Term Photovoltaic Power Forecasting Based on Long Short Term Memory Neural Network and Attention Mechanism,” IEEE Access, vol. 7, pp. 78063–78074, 2019, doi: 10.1109/ACCESS.2019.2923006.
[15] J. Antonanzas, N. Osorio, R. Escobar, R. Urraca, F. J. Martinez-de-Pison, and F. Antonanzas-Torres, “Review of photovoltaic power forecasting,” Sol. Energy, vol. 136, pp. 78–111, Oct. 2016, doi: 10.1016/j.solener.2016.06.069.
[16] J. Kleissl, Solar Energy Forecasting and Resource Assessment. Academic Press, 2013.
[17] Y. Yu, J. Cao, and J. Zhu, “An LSTM Short-Term Solar Irradiance Forecasting Under Complicated Weather Conditions,” IEEE Access, vol. 7, pp. 145651–145666, 2019, doi: 10.1109/ACCESS.2019.2946057.
[18] R. B. Melton et al., “Leveraging Standards to Create an Open Platform for the Development of Advanced Distribution Applications,” IEEE Access, vol. 6, pp. 37361–37370, 2018, doi: 10.1109/ACCESS.2018.2851186.
[19] R. Baños, F. Manzano-Agugliaro, F. G. Montoya, C. Gil, A. Alcayde, and J. Gómez, “Optimization methods applied to renewable and sustainable energy: A review,” Renew. Sustain. Energy Rev., vol. 15, no. 4, pp. 1753–1766, May 2011, doi: 10.1016/j.rser.2010.12.008.
[20] G. Reikard, “Predicting solar radiation at high resolutions: A comparison of time series forecasts,” Sol. Energy, vol. 83, no. 3, pp. 342–349, Mar. 2009, doi: 10.1016/j.solener.2008.08.007.
[21] Z. Dong, D. Yang, T. Reindl, and W. M. Walsh, “Short-term solar irradiance forecasting using exponential smoothing state space model,” Energy, vol. 55, pp. 1104–1113, Jun. 2013, doi: 10.1016/j.energy.2013.04.027.
[22] S. P. Durrani, S. Balluff, L. Wurzer, and S. Krauter, “Photovoltaic yield prediction using an irradiance forecast model based on multiple neural networks,” J. Mod. Power Syst. Clean Energy, vol. 6, no. 2, pp. 255–267, Mar. 2018, doi: 10.1007/s40565-018-0393-5.
[23] M. Pan et al., “Photovoltaic power forecasting based on a support vector machine with improved ant colony optimization,” J. Clean. Prod., vol. 277, p. 123948, Dec. 2020, doi: 10.1016/j.jclepro.2020.123948.
[24] M. Marzouq, H. El Fadili, K. Zenkouar, Z. Lakhliai, and M. Amouzg, “Short term solar irradiance forecasting via a novel evolutionary multi-model framework and performance assessment for sites with no solar irradiance data,” Renew. Energy, vol. 157, pp. 214–231, Sep. 2020, doi: 10.1016/j.renene.2020.04.133.
[25] S. M. J. Jalali, S. Ahmadian, A. Kavousi-Fard, A. Khosravi, and S. Nahavandi, “Automated Deep CNN-LSTM Architecture Design for Solar Irradiance Forecasting,” IEEE Trans. Syst. Man, Cybern. Syst., vol. 52, no. 1, pp. 54–65, Jan. 2022, doi: 10.1109/TSMC.2021.3093519.
[26] P. Kumari and D. Toshniwal, “Deep learning models for solar irradiance forecasting: A comprehensive review,” J. Clean. Prod., vol. 318, p. 128566, Oct. 2021, doi: 10.1016/j.jclepro.2021.128566.
[27] Z. Pang, F. Niu, and Z. O’Neill, “Solar radiation prediction using recurrent neural network and artificial neural network: A case study with comparisons,” Renew. Energy, vol. 156, pp. 279–289, Aug. 2020, doi: 10.1016/j.renene.2020.04.042.
Journal Pre-proof
[28] P. Kumari and D. Toshniwal, “Long short term memory–convolutional neural network based deep hybrid
approach for solar irradiance forecasting,” Appl. Energy, vol. 295, p. 117061, Aug. 2021, doi:
10.1016/j.apenergy.2021.117061.
[29] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” Adv. Neural Inf. Process. Syst., vol. 27, pp. 3104–3112, 2014.
[30] Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, and G. W. Cottrell, “A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction,” 2017, doi: 10.48550/arXiv.1704.02971.
[31] J. Bottieau, L. Hubert, Z. De Greve, F. Vallee, and J.-F. Toubeau, “Very-Short-Term Probabilistic Forecasting for a Risk-Aware Participation in the Single Price Imbalance Settlement,” IEEE Trans. Power Syst., vol. 35, no. 2, pp. 1218–1230, Mar. 2020, doi: 10.1109/TPWRS.2019.2940756.
[32] D. Bahdanau, K. H. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc., pp. 1–15, 2015.
[33] A. Vaswani et al., “Attention is all you need,” Adv. Neural Inf. Process. Syst., vol. 30, pp. 5999–6009, 2017.
[34] B. Lim, S. Arık, N. Loeff, and T. Pfister, “Temporal Fusion Transformers for interpretable multi-horizon time series forecasting,” Int. J. Forecast., vol. 37, no. 4, pp. 1748–1764, 2021, doi: 10.1016/j.ijforecast.2021.03.012.
[35] P. Bendiek, A. Taha, Q. H. Abbasi, and B. Barakat, “Solar Irradiance Forecasting Using a Data-Driven
Algorithm and Contextual Optimisation,” Appl. Sci., vol. 12, no. 1, p. 134, Dec. 2021, doi:
10.3390/app12010134.
[36] M. Abdel-Nasser, K. Mahmoud, and M. Lehtonen, “HIFA: Promising Heterogeneous Solar Irradiance Forecasting Approach Based on Kernel Mapping,” IEEE Access, vol. 9, pp. 144906–144915, 2021, doi: 10.1109/ACCESS.2021.3122826.
[37] N. Y. Jayalakshmi et al., “Novel Multi-Time Scale Deep Learning Algorithm for Solar Irradiance
Forecasting,” Energies, vol. 14, no. 9, p. 2404, Apr. 2021, doi: 10.3390/en14092404.
[38] M. Abdel-Nasser, K. Mahmoud, and M. Lehtonen, “Reliable Solar Irradiance Forecasting Approach Based on Choquet Integral and Deep LSTMs,” IEEE Trans. Ind. Inform., vol. 17, no. 3, pp. 1873–1881, Mar. 2021, doi: 10.1109/TII.2020.2996235.
[39] X. Huang, C. Zhang, Q. Li, Y. Tai, B. Gao, and J. Shi, “A Comparison of Hour-Ahead Solar Irradiance Forecasting Models Based on LSTM Network,” Math. Probl. Eng., vol. 2020, pp. 1–15, Aug. 2020, doi: 10.1155/2020/4251517.
[40] G. Guariso, G. Nunnari, and M. Sangiorgio, “Multi-Step Solar Irradiance Forecasting and Domain Adaptation
of Deep Neural Networks,” Energies, vol. 13, no. 15, p. 3987, Aug. 2020, doi: 10.3390/en13153987.
[41] J. Wojtkiewicz, M. Hosseini, R. Gottumukkala, and T. L. Chambers, “Hour-Ahead Solar Irradiance
Forecasting Using Multivariate Gated Recurrent Units,” Energies, vol. 12, no. 21, p. 4055, Oct. 2019, doi:
10.3390/en12214055.
[42] K. Yan, H. Shen, L. Wang, H. Zhou, M. Xu, and Y. Mo, “Short-Term Solar Irradiance Forecasting Based on
a Hybrid Deep Learning Methodology,” Information, vol. 11, no. 1, p. 32, Jan. 2020, doi:
10.3390/info11010032.
[43] M. Husein and I.-Y. Chung, “Day-Ahead Solar Irradiance Forecasting for Microgrids Using a Long Short-
Term Memory Recurrent Neural Network: A Deep Learning Approach,” Energies, vol. 12, no. 10, p. 1856,
May 2019, doi: 10.3390/en12101856.
[44] S. Dev, T. AlSkaif, M. Hossari, R. Godina, A. Louwen, and W. van Sark, “Solar Irradiance Forecasting Using Triple Exponential Smoothing,” in 2018 International Conference on Smart Energy Systems and Technologies (SEST), Sep. 2018, pp. 1–6, doi: 10.1109/SEST.2018.8495816.
[45] J. Tong, L. Xie, S. Fang, W. Yang, and K. Zhang, “Hourly solar irradiance forecasting based on encoder–
decoder model using series decomposition and dynamic error compensation,” Energy Convers. Manag., vol.
270, p. 116049, Oct. 2022, doi: 10.1016/j.enconman.2022.116049.
[46] Q. Li, D. Zhang, and K. Yan, “A Solar Irradiance Forecasting Framework Based on the CEE-WGAN-LSTM
Model,” Sensors, vol. 23, no. 5, p. 2799, Mar. 2023, doi: 10.3390/s23052799.
[47] X. Hou, C. Ju, and B. Wang, “Prediction of solar irradiance using convolutional neural network and attention
mechanism-based long short-term memory network based on similar day analysis and an attention
mechanism,” Heliyon, vol. 9, no. 11, p. e21484, Nov. 2023, doi: 10.1016/j.heliyon.2023.e21484.
[48] M. Munsif, F. U. M. Ullah, S. U. Khan, N. Khan, and S. W. Baik, “CT-NET: A Novel Convolutional Transformer-Based Network for Short-Term Solar Energy Forecasting Using Climatic Information,” Comput. Syst. Sci. Eng., vol. 47, no. 2, pp. 1751–1773, 2023, doi: 10.32604/csse.2023.038514.
[49] Y. Yang, Z. Tang, Z. Li, J. He, X. Shi, and Y. Zhu, “Dual-Path Information Fusion and Twin Attention-Driven
Global Modeling for Solar Irradiance Prediction,” Sensors, vol. 23, no. 17, p. 7469, Aug. 2023, doi:
10.3390/s23177469.
[50] X. Kong, X. Du, G. Xue, and Z. Xu, “Multi-step short-term solar radiation prediction based on empirical mode
decomposition and gated recurrent unit optimized via an attention mechanism,” Energy, vol. 282, p. 128825,
Nov. 2023, doi: 10.1016/j.energy.2023.128825.
[51] A. Zeng, M. Chen, L. Zhang, and Q. Xu, “Are Transformers Effective for Time Series Forecasting?,” 2022,
[Online]. Available: http://arxiv.org/abs/2205.13504.
[52] M. López Santos, X. García-Santiago, F. Echevarría Camarero, G. Blázquez Gil, and P. Carrasco Ortega,
“Application of Temporal Fusion Transformer for Day-Ahead PV Power Forecasting,” Energies, vol. 15, no.
14, p. 5232, Jul. 2022, doi: 10.3390/en15145232.
[53] N. Kalchbrenner and P. Blunsom, “Recurrent continuous translation models,” EMNLP 2013 - 2013 Conf. Empir. Methods Nat. Lang. Process. Proc. Conf., Oct. 2013, pp. 1700–1709.
[54] Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE Trans. Neural Networks, vol. 5, no. 2, pp. 157–166, Mar. 1994, doi: 10.1109/72.279181.
[55] K. Cho et al., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” EMNLP 2014 - 2014 Conf. Empir. Methods Nat. Lang. Process. Proc. Conf., pp. 1724–1734, 2014, doi: 10.3115/v1/d14-1179.
[56] R. J. Williams and D. Zipser, “A Learning Algorithm for Continually Running Fully Recurrent Neural Networks,” Neural Comput., vol. 1, no. 2, pp. 270–280, Jun. 1989, doi: 10.1162/neco.1989.1.2.270.
[57] A. Goyal, A. Lamb, Y. Zhang, S. Zhang, A. Courville, and Y. Bengio, “Professor forcing: A new algorithm for training recurrent networks,” in Advances in Neural Information Processing Systems, 2016, pp. 4608–4616.
[58] N. Wu, B. Green, X. Ben, and S. O’Banion, “Deep Transformer Models for Time Series Forecasting: The
Influenza Prevalence Case,” 2020, [Online]. Available: http://arxiv.org/abs/2001.08317.
[59] “NSRDB: National Solar Radiation Database.” [Online]. Available: https://nsrdb.nrel.gov/.
[60] “PyTorch Forecasting Documentation.” [Online]. Available: https://pytorch-forecasting.readthedocs.io/en/stable/index.html.
[61] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A Next-generation Hyperparameter Optimization Framework,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Jul. 2019, pp. 2623–2631, doi: 10.1145/3292500.3330701.
[62] T. Zhang et al., “Less Is More: Fast Multivariate Time Series Forecasting with Light Sampling-oriented MLP Structures,” 2022, [Online]. Available: http://arxiv.org/abs/2207.01186.
[63] P. H. Borghi, O. Zakordonets, and J. P. Teixeira, “A COVID-19 time series forecasting model based on MLP ANN,” Procedia Comput. Sci., vol. 181, pp. 940–947, 2021, doi: 10.1016/j.procs.2021.01.250.
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.