IJCA, Vol. 27, No. 3, Sep. 2020
Studying Error Propagation for Energy Forecasting Using Univariate and
Multivariate Machine Learning Algorithms
Maher Selim, Ryan Zhou, Wenying Feng, and Omar Alam*
Trent University, Peterborough, Ontario, CANADA, K9L 0G2
Statistical machine learning models are widely used in time
series forecasting. These models often use historical data
recursively to predict future timesteps, feeding earlier
predictions back in as inputs. This leads to compounding of
errors, which may negatively impact the prediction accuracy
for long-term prediction tasks. In this paper, we address this
problem by using features that can have an “anchoring” effect
on recurrent forecasts, thus limiting the impact of compounding
errors. We apply our approach to a benchmark energy dataset
using four machine learning models: Linear Regression, Support
Vector Regression, Long Short-Term Memory (LSTM) neural
networks, and XGBoost regression. In particular, we compare the
prediction accuracy of the models with and without historical
data (i.e. past energy consumption) for different forecasting
lengths. We observe that the addition of generated features
improves performance for both short and long time horizons
compared to univariate models, and that for long-term forecasts,
nonrecursive multivariate models outperform all recursive
models.
Key Words: Linear regression; LSTM; energy forecasting;
machine learning; support vector regression; time series
forecasting; XGBoost regression.
1 Introduction
Machine learning models are widely used in the energy
industry for forecasting future energy prices and demands [1,
22]. Advances in sensor and smart meter technologies have
made large quantities of energy data available [12]. This,
combined with increasingly accurate predictions produced by
machine learning models, has made it possible for technologies
such as the smart grid to flourish.
In the domain of energy forecasting, most machine learning
models, such as Long Short-Term Memory (LSTM) [15], use
historical values of the electricity load as an input feature. This
works well for single timestep predictions, e.g. forecasting
the power consumption for the next hour. However, when
forecasting multiple timesteps into the future, these models
recursively feed back in past predictions. In addition, if
the model uses external features, such as the hourly weather
reading, forecasts of these features must be generated as well.
All these predictions introduce error, which is compounded
when fed back into the model as inputs. Without external inputs,
models generally become inaccurate or even unstable after
several timesteps. This makes multiple timestep forecasting
challenging even for models with high single timestep accuracy.
*Email: {maherselim, ryanzhou, wfeng, omaralam}
As a further extension of our previous work [19], this paper
continues the study of reducing error propagation for energy
forecasting using generated features, i.e. input features that
can be calculated from known variables with perfect accuracy
even far into the future. These features limit the impact
of the accumulated error, as the model is trained on these
features along with recursive inputs. We demonstrate the
efficacy of this approach using a benchmark energy dataset.
Four machine learning models are trained to perform single
timestep predictions: Linear Regression (LR) [18], Support
Vector Regression (SVR) [7], LSTM neural network [15], and
a gradient boosted tree model (XGBoost) [6]. Predictions
are then made over a period of one month by recursively
feeding in the model outputs from earlier timesteps as inputs
for later timesteps. We show that without any generated
features, error accumulates rapidly over time, whereas including
generated features leads to smaller accumulated errors. We also
demonstrate the accuracy of predictions made entirely using
generated features, i.e. without recursive inputs. This version
of the model allows forecasting for arbitrary timesteps in the
future, without the need to predict all values in between.
The remainder of this paper is organized as follows. Section 2
introduces the four machine learning algorithms used in our
study. Section 3 describes time series forecasting using
univariate and multivariate approaches. Section 4 describes the
development of computational models, the experimental set-up
and the results. Lastly, Section 5 concludes the paper.
ISCA Copyright© 2020
2 Prediction Using Machine Learning Algorithms
Prediction using machine learning has been shown to be
efficient in many applications. There are numerous learning
algorithms mostly based on statistical and mathematical
approaches. For our study, four popular algorithms representing
different categories are selected including Linear Regression
(LR), Support Vector Regression (SVR), Long Short-Term
Memory (LSTM) neural networks, and XGBoost regression.
As a basic method in statistics, LR predicts a future value
using a linear function obtained by minimizing the
discrepancies between predicted and actual output values.
Widely applied in industry, linear regression can be easily
performed on many platforms such as Excel, R, MATLAB, Python
and others [21].
SVR is a typical kernel-based learning method, since it relies
on kernel functions. Unlike linear regression, it
provides some flexibility to define how much error is acceptable
in the model. The problem is equivalent to finding the equation
of a separating hyperplane in a high dimensional space. For
example, if we have $N$ observations, where $y_n$ is the observed
response for the input data $x_n$, the training data set can be
represented as $D = \{(x_i, y_i) \mid i = 1, 2, 3, \ldots, N\}$. The objective of
a linear SVR is to find the linear function $f(x) = x'\beta + b$ that solves

$$\min J(\beta) = \frac{1}{2}\beta'\beta + C \sum_{n=1}^{N} (\xi_n + \xi_n^*) \tag{1}$$

subject to

$$y_n - (x_n'\beta + b) \le \varepsilon + \xi_n, \quad n = 1, 2, \cdots, N, \tag{2}$$
$$(x_n'\beta + b) - y_n \le \varepsilon + \xi_n^*, \quad n = 1, 2, \cdots, N, \tag{3}$$
$$\xi_n \ge 0, \ \xi_n^* \ge 0, \quad n = 1, 2, \cdots, N, \tag{4}$$

where the constant $C$ and the slack variables $\xi_n$ and $\xi_n^*$ are for the
Lagrangian formulation. $\varepsilon > 0$ controls the loss function that
ignores the errors within $\varepsilon$ distance. $\beta'\beta$ is the squared $\ell_2$ norm of
the coefficient vector. This is a convex quadratic programming
problem, since the objective function is itself convex, and those
points which satisfy the constraints also form a convex set. For
more details on SVR, we refer to [2] and the references within.
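To make the roles of $\varepsilon$, $C$ and the slack variables concrete, the following is a minimal Python sketch that evaluates the standard $\varepsilon$-insensitive primal objective for a given linear model. The function name `svr_primal_objective` and the toy numbers are illustrative assumptions, not taken from the paper. Each slack is set to the smallest value that satisfies its constraint, which is the value it takes at the optimum.

```python
from typing import Sequence


def svr_primal_objective(
    beta: Sequence[float],
    b: float,
    xs: Sequence[Sequence[float]],
    ys: Sequence[float],
    epsilon: float,
    C: float,
) -> float:
    """Evaluate 0.5 * beta'beta + C * sum(xi_n + xi_n_star) for a linear model."""
    penalty = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(w * v for w, v in zip(beta, x)) + b
        residual = y - prediction
        xi = max(0.0, residual - epsilon)        # slack for under-prediction
        xi_star = max(0.0, -residual - epsilon)  # slack for over-prediction
        penalty += xi + xi_star
    return 0.5 * sum(w * w for w in beta) + C * penalty


# Toy data (assumed for illustration): the first point lies inside the
# epsilon tube, the second lies 0.9 outside it.
objective = svr_primal_objective(
    beta=[1.0], b=0.0, xs=[[1.0], [2.0]], ys=[1.05, 3.0], epsilon=0.1, C=10.0
)
```

In this toy case only the second observation is penalized, so errors smaller than $\varepsilon$ do not affect the objective at all.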
LSTM is a type of recurrent neural network architecture
designed to extract long-term dependencies out of sequential
data and avoid the vanishing gradient problem present in
ordinary recurrent networks [11, 15]. These properties make
it the method of choice for longer time series and sequence
prediction problems [10, 23]. Several variations of the LSTM
unit have been successfully applied to energy forecasting and
other areas [3, 14]. The standard LSTM architecture [11]
described below is applied in our study. Each LSTM cell
contains a cell state ($h_{t-1}$), the long-term memory, and a
recurrent input ($y_{t-1}$), the short-term memory. It also contains
three “gates”: neurons which output values between 0 and
1 and are multiplied with the information flowing into and
out of the cell. The forget gate $\sigma_f$ controls the amount of
information discarded from the previous cell state. The input
gate $\sigma_u$ operates on the previous state $h_{t-1}$, after it has been
modified by the forget gate, and decides how much of a new
candidate state $\tilde{h}_t$ to add to the cell state $h_t$. The output $y_t$ is
produced by squashing the cell state with a nonlinear function
$g_2(\cdot)$, usually tanh. Then, the output gate $\sigma_o$ selects the overall
fraction of the state to be returned as output.
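The gate mechanics described above can be illustrated with a single-unit, scalar LSTM step. This is a hedged sketch of the standard LSTM update written in the paper's notation (cell state `h`, output `y`). The function names, the parameter layout and the all-zero weights are assumptions for illustration, not the authors' PyTorch model.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(
    x_t: float,
    y_prev: float,
    h_prev: float,
    params: Dict[str, Tuple[float, float, float]],
) -> Tuple[float, float]:
    """One step of a scalar LSTM cell.

    params maps each of "forget", "input", "output", "candidate" to a
    (weight on x_t, weight on y_prev, bias) triple.
    """

    def pre_activation(name: str) -> float:
        w_x, w_y, bias = params[name]
        return w_x * x_t + w_y * y_prev + bias

    forget_gate = sigmoid(pre_activation("forget"))    # how much old state to keep
    input_gate = sigmoid(pre_activation("input"))      # how much candidate to add
    output_gate = sigmoid(pre_activation("output"))    # how much state to expose
    candidate = math.tanh(pre_activation("candidate"))

    h_t = forget_gate * h_prev + input_gate * candidate  # new cell state
    y_t = output_gate * math.tanh(h_t)                   # new output
    return y_t, h_t


# With all-zero weights every gate outputs 0.5 and the candidate is 0,
# so the cell state is simply halved.
zero_params = {
    name: (0.0, 0.0, 0.0) for name in ("forget", "input", "output", "candidate")
}
y_t, h_t = lstm_step(x_t=1.0, y_prev=0.0, h_prev=2.0, params=zero_params)
```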
Gradient boosting is an ensemble technique which creates
a prediction model by aggregating the predictions of weak
prediction models, typically decision trees. With boosting
methods, weak predictors are added to the collection
sequentially with each one attempting to improve upon the
entire ensemble’s performance.
In the XGBoost implementation [6], given a dataset with $n$
training examples consisting of an input $x_i$ and expected output
$y_i$, a tree ensemble model $\phi(x_i)$ is defined as the sum of $K$
regression trees $f_k(x_i)$:

$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i). \tag{5}$$

To evaluate the performance of a given model, we choose a
loss function $l(\hat{y}_i, y_i)$ to measure the error between the predicted
value and the target value, and optionally add a regularization
term $\Omega(f_k)$ to penalize overly complex trees:

$$L(\phi) = \sum_{i=1}^{n} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k). \tag{6}$$

The algorithm minimizes $L(\phi)$ by iteratively introducing each
$f_k$. Assume that the ensemble currently contains $K$ trees. We
add a new tree $f_{K+1}$ that minimizes

$$\sum_{i=1}^{n} l(\hat{y}_i + f_{K+1}(x_i), y_i) + \Omega(f_{K+1}). \tag{7}$$

In other words, the tree that most improves the current model
as determined by $L$ is greedily added. We train the new tree
using the objective function (6); this is done in practice by
approximating the objective function using the first and second
order gradients of the loss function $l(\hat{y}_i, y_i)$ [9].
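As a hedged illustration of how the first and second order gradients are used, the sketch below computes the weight of a single leaf under the second-order approximation for squared-error loss. The closed form $-G/(H+\lambda)$, where $G$ and $H$ are the sums of gradients and Hessians in the leaf and $\lambda$ is an L2 penalty on leaf weights, follows the XGBoost paper [6]. The function name and the toy data are assumptions for illustration.

```python
from typing import Sequence


def optimal_leaf_weight(
    current_predictions: Sequence[float],
    targets: Sequence[float],
    reg_lambda: float = 1.0,
) -> float:
    """Leaf weight minimizing the second-order approximation of the loss."""
    # Squared-error loss l = 0.5 * (prediction - target)^2:
    # the first derivative w.r.t. the prediction is (prediction - target),
    # the second derivative is 1.
    gradients = [p - t for p, t in zip(current_predictions, targets)]
    hessians = [1.0] * len(gradients)
    return -sum(gradients) / (sum(hessians) + reg_lambda)


# Toy leaf (assumed): the current ensemble predicts 0 for three examples.
weight = optimal_leaf_weight([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], reg_lambda=1.0)
```

With `reg_lambda=0.0` the weight reduces to the mean residual, and a positive `reg_lambda` shrinks it toward zero.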
3 Univariate and Multivariate Input Features
Time series prediction is a problem which aims to predict
future values using past values. These are generally past values
of the target variable, but this is not necessarily the case.
Forecasting models can be broadly classified into univariate
and multivariate models based on the number of features used.
When forecasting multiple timesteps into the future, models can
also be classified into direct, recursive and MIMO approaches.

Figure 1: Forecasts for January 1999 using (a) linear regression (b) support vector regression (c) XGBoost regression and (d) LSTM.
Full model (blue) uses recursively calculated load and all external features. No load model (orange) uses only external
features and no recursion. Only load (green) uses no external features and only recursively calculated load.

A recursive approach trains a single model to predict a single
step in the future, known as a one-step ahead forecast:

$$\hat{x}_t = F(x_{t-1}, x_{t-2}, \ldots)$$

where $x_i$ represents the value of the variable at timestep $i$.
This forecasted value is then fed back in as an input and the
next timestep is forecasted using the same model:

$$\hat{x}_{t+1} = F(x_t, x_{t-1}, \ldots)$$
This process is repeated until the desired time horizon has been
reached. This approach is sensitive to accumulated errors, as
any error present in the initial prediction will subsequently be
carried forward to later predictions when the predicted value
is used as input. However, as only one model is used for all
predictions, this allows more resources to be invested in the
single model. In addition, this approach is flexible in that it
allows forecasting for any time horizon, whether or not the
model has been trained on that time horizon.
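The recursive strategy can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `recursive_forecast` and the toy `mean_model` are hypothetical names, and any one-step model that maps a window of values to the next value could be substituted.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    one_step_model: Callable[[Sequence[float]], float],
    history: Sequence[float],
    horizon: int,
    window: int,
) -> List[float]:
    """Forecast `horizon` steps by feeding each prediction back in as input."""
    buffer = list(history[-window:])  # most recent observed values
    forecasts = []
    for _ in range(horizon):
        next_value = one_step_model(buffer)
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the prediction.
        buffer = buffer[1:] + [next_value]
    return forecasts


def mean_model(window_values: Sequence[float]) -> float:
    """Toy one-step model (assumed for illustration): mean of the window."""
    return sum(window_values) / len(window_values)


forecasts = recursive_forecast(mean_model, [1.0, 2.0, 3.0, 4.0], horizon=3, window=2)
```

After the first step the input window already contains a predicted value, and after `window` steps it contains only predictions, which is why any error in early forecasts carries forward.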
A direct approach aims to avoid error accumulation by
creating a separate model for each potential time horizon. Thus,
a collection of models is trained:
$$\hat{x}_t = F(x_{t-1}, x_{t-2}, \ldots)$$
$$\hat{x}_{t+1} = G(x_{t-1}, x_{t-2}, \ldots)$$
$$\hat{x}_{t+2} = H(x_{t-1}, x_{t-2}, \ldots)$$
$$\ldots$$

This avoids propagated errors as no predicted values are used
as input. However, as each model is trained independently, the
models may not learn complex dependencies between the values
$\hat{x}_t, \hat{x}_{t+1}, \hat{x}_{t+2}, \ldots$. This approach is also computationally much
more expensive as multiple models must be trained and stored.
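In the direct strategy, each horizon gets its own training set: the inputs are the same windows of observed values, but the target moves further ahead. The sketch below builds these input-target pairs. The helper name `make_direct_pairs` and the toy series are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def make_direct_pairs(
    series: Sequence[float], window: int, steps_ahead: int
) -> List[Tuple[List[float], float]]:
    """Build (input window, target) pairs for the model of one horizon.

    steps_ahead=1 targets the value right after the window,
    steps_ahead=2 the value after that, and so on.
    """
    pairs = []
    last_start = len(series) - window - steps_ahead
    for start in range(last_start + 1):
        inputs = list(series[start:start + window])
        target = series[start + window + steps_ahead - 1]
        pairs.append((inputs, target))
    return pairs


series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
pairs_one_step = make_direct_pairs(series, window=2, steps_ahead=1)
pairs_three_step = make_direct_pairs(series, window=2, steps_ahead=3)
```

One model would then be fitted per value of `steps_ahead`, which is where the extra training and storage cost comes from.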
The multi-input multi-output (MIMO) strategy attempts to
combine the advantages of these approaches by training a single
model with multiple outputs to predict all timesteps up to the
time horizon simultaneously:
$$[\hat{x}_{t+H}, \hat{x}_{t+H-1}, \ldots, \hat{x}_t] = F(x_{t-1}, x_{t-2}, \ldots)$$

This avoids accumulated error by performing all predictions in
one step, as well as modeling any interdependencies between
future timesteps. However, this comes at the cost of less
flexibility, as all horizons are forecasted using the same model
and possible time horizons are limited to those built into the
model.
Based on the input features, time series prediction models can
be categorized as univariate or multivariate. Univariate models
use a single feature, generally the target variable, to predict a
future value:
$$\hat{x}_t = F(x_{t-1}, x_{t-2}, \ldots).$$

This has the advantage of allowing smaller and computationally
lighter models. Univariate models do not require extra external
data and require no feature engineering. However, as they are
tied to a single variable, they exhibit more sensitivity to noise
and reduced stability when used recursively.
Multivariate time series models use observations of multiple
variables or features, often taken simultaneously, and attempt to
also describe the interrelationships among the features [4]:
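$$\hat{x}_t = F(x_{t-1}, x_{t-2}, \ldots, a^{(1)}_{t-1}, a^{(1)}_{t-2}, \ldots, a^{(2)}_{t-1}, a^{(2)}_{t-2}, \ldots)$$

where each $a^{(i)}$ represents the time series of an external feature.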
This has the obvious advantage of modeling relationships
between the target and external variables, but at the cost of a
bulkier model and higher computational costs. Building such
a model generally also requires obtaining measurements of
external features; the difficulty of this is highly dependent on
data availability.
It is also possible for a multivariate model to employ no past
information about the target variable:
$$\hat{x}_t = F(a^{(1)}_{t-1}, a^{(1)}_{t-2}, \ldots, a^{(2)}_{t-1}, a^{(2)}_{t-2}, \ldots).$$
In this case, predictions must be made solely based on the
relationships of external features to the target variable. Such a
model is rarely used in practice as training the model in the first
place requires knowledge of past values of the target variable,
but may see use if obtaining a full time series of the target value
is difficult due to missing or unusable values. In addition, as the
output of the model is never used as an input, error accumulation
is limited. If future values for the external features can be
obtained, this approach allows prediction based on those values
without first predicting earlier time horizons.
4 Empirical Study on Energy Forecasting
We study energy forecasting using the four machine learning
algorithms described in Section 2. Effects of external features
on error propagation are compared for the recursive univariate,
multivariate and the modified multivariate techniques.
The linear regression and support vector regression models
are implemented using scikit-learn [17]. We use the radial basis
function (RBF) kernel for SVR. The gradient boosting model
was built using the XGBoost Python library [6] with a maximum
tree depth of 12. All other parameters are set to scikit-learn
defaults.
The LSTM model is implemented using PyTorch [16]
running on Python 3.8. The model consists of four layers: the
input layer, two hidden LSTM layers with 16 nodes each, and
a linear fully connected aggregation layer as the output. To
improve stability, we use a residual connection on the LSTM
layers. The model is trained on MAE loss using the Adam
optimizer for 30 epochs.
In order to ensure reproducibility of the experiment, the
2001 EUNITE competition dataset [8] is used in our study.
This benchmark dataset is well-studied in energy forecasting
research [5, 13].
The EUNITE dataset spans over two years from January 1997
until January 1999. It contains the following fields: the half-
hourly electricity load, the daily average temperature, and a
flag signifying whether the day is a holiday. In the statistical
analysis of the dataset [5, 13], it was found that the electricity
load generally decreases during holidays and weekends. This
phenomenon depends on the type of the holiday, e.g., Christmas
or New Year.
In order to ensure no outside forecasts are required,
we disregard all temperature measurements, as these require
separate weather forecasts. This ensures model performance
is based only on features which can be calculated with perfect
accuracy. In addition, we generate the following features based
on the prediction timestamp: weekday, ranging from 0 to 6; day
of year, ranging from 1 to 365; and hour of day, ranging from 0
to 23. These three features allow the model to pinpoint the day
and time within the year and capture daily, weekly and yearly
patterns.
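Because these features depend only on the timestamp, they can be computed exactly for any future time. A minimal sketch using Python's standard library follows. The function name `generated_features` is hypothetical, and the choice of Monday as weekday 0 is Python's convention, assumed here because the paper does not state which day maps to 0.

```python
from datetime import datetime
from typing import Dict


def generated_features(timestamp: datetime) -> Dict[str, int]:
    """Calendar features that are known exactly for any future timestamp."""
    return {
        "weekday": timestamp.weekday(),                # 0 (Monday) to 6 (Sunday)
        "day_of_year": timestamp.timetuple().tm_yday,  # 1 to 365 (366 in leap years)
        "hour": timestamp.hour,                        # 0 to 23
    }


# A half-hourly reading inside the January 1999 test month.
features = generated_features(datetime(1999, 1, 15, 13, 30))
```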
Figure 2: Absolute error of January 1999 forecast, smoothed with a moving average of 50 timesteps for (a) linear regression (b)
support vector regression (c) XGBoost regression and (d) LSTM. Full model (blue) uses recursively calculated load and all
external features. No load model (orange) uses only external features and no recursion. Only load (green) uses no external
features and only recursively calculated load.

Both datasets were converted into input-output pairs for
supervised learning using a sliding window method, whereby
timesteps within the window were used as input to predict
the next timestep after the window. A window size of 48
timesteps was chosen, corresponding to the previous 24 hours
of activity. As the generated time features were uniformly rather
than normally distributed, features were normalized to lie in
the range [-1, 1]. The last month of data was used to evaluate the
models. This was done in order to limit potential data leakage by
ensuring all evaluation data was drawn from points temporally
after the training data. Ten percent of the remaining data was
used to validate the models during training, while the remainder
was used for training itself.

Table 1: Correlation between forecast error and input feature value for linear regression, LSTM, XGBoost and SVR with RBF kernel.
Shown are recursive full models, nonrecursive (no load) models and recursive univariate (only load) models

Model Name           Load     Weekday  Holiday  Hour     Day Of Year
Linear (Full)        0.5388   -0.0296  0.1878   0.2124   0.2919
Linear (No Load)     *        -0.0284  0.1659   0.2107   0.2414
Linear (Only Load)   0.6260   *        *        *        *
LSTM (Full)          -0.4827  0.2664   -0.0042  -0.0479  0.2982
LSTM (No Load)       *        0.2385   0.5833   0.1944   -0.1310
LSTM (Only Load)     -0.9232  *        *        *        *
XGBoost (Full)       0.6223   -0.3040  -0.0926  0.1341   0.6788
XGBoost (No Load)    *        -0.0505  -0.0083  -0.0560  0.1149
XGBoost (Only Load)  0.8324   *        *        *        *
SVR (Full)           0.1618   -0.1451  -0.0727  -0.0265  0.0239
SVR (No Load)        *        -0.1662  0.0312   0.2091   -0.1016
SVR (Only Load)      -0.7639  *        *        *        *
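The sliding-window conversion and the scaling to [-1, 1] described in this section can be sketched as follows. Both helper names are hypothetical, and min-max scaling is an assumption: the paper states the target range but not the exact normalization formula.

```python
from typing import List, Sequence, Tuple


def sliding_window_pairs(
    series: Sequence[float], window: int
) -> List[Tuple[List[float], float]]:
    """Pair each window of `window` values with the value that follows it."""
    return [
        (list(series[i:i + window]), series[i + window])
        for i in range(len(series) - window)
    ]


def scale_to_range(
    values: Sequence[float], low: float = -1.0, high: float = 1.0
) -> List[float]:
    """Min-max scale values linearly so they span [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_min == v_max:
        raise ValueError("cannot scale a constant feature")
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]


pairs = sliding_window_pairs([1.0, 2.0, 3.0, 4.0, 5.0], window=2)
scaled_weekdays = scale_to_range([0, 3, 6])
```

For the half-hourly EUNITE load, `window=48` would reproduce the 24-hour input window used in the paper.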
Each model was trained to forecast only one step ahead.
We compared three methods: first, a univariate model using
only past values of the load to forecast future values. Each
prediction was recursively added to the input for the next
timestep. The second was a multivariate model which made
use of generated external features in addition to past values of
load. The load was updated recursively as in the first model,
while external features were calculated directly based on the
timestamp of the prediction. The third model removes all
recurrent dependency by ignoring previous loads altogether and
using only the calculated external features.
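The difference between the recursive multivariate variant and the nonrecursive variant can be sketched as below. All names are hypothetical, the models are passed in as plain callables, and the holiday flag is omitted for brevity. The univariate variant corresponds to `forecast_full` with a model that ignores the time features.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Sequence

HALF_HOUR = timedelta(minutes=30)  # the EUNITE load is recorded half-hourly


def time_features(timestamp: datetime) -> List[int]:
    """Weekday, day of year and hour, computed directly from the timestamp."""
    return [timestamp.weekday(), timestamp.timetuple().tm_yday, timestamp.hour]


def forecast_full(
    model: Callable[[Sequence[float], Sequence[int]], float],
    load_history: Sequence[float],
    start: datetime,
    steps: int,
    window: int,
) -> List[float]:
    """Recursive multivariate forecast: predicted load is fed back in,
    while the time features are recomputed exactly at every step."""
    buffer = list(load_history[-window:])
    forecasts = []
    for i in range(steps):
        prediction = model(buffer, time_features(start + i * HALF_HOUR))
        forecasts.append(prediction)
        buffer = buffer[1:] + [prediction]
    return forecasts


def forecast_no_load(
    model: Callable[[Sequence[int]], float], start: datetime, steps: int
) -> List[float]:
    """Nonrecursive forecast: each step depends only on its own timestamp."""
    return [model(time_features(start + i * HALF_HOUR)) for i in range(steps)]


# Toy stand-in models, assumed purely for illustration.
def toy_full_model(load_window: Sequence[float], features: Sequence[int]) -> float:
    return load_window[-1] + features[2]


def toy_no_load_model(features: Sequence[int]) -> float:
    return float(features[2])


full = forecast_full(toy_full_model, [10.0, 20.0], datetime(1999, 1, 1, 23, 0), steps=3, window=2)
no_load = forecast_no_load(toy_no_load_model, datetime(1999, 1, 1, 0, 0), steps=3)
```

In `forecast_no_load` no prediction is ever reused, so a forecast for a distant timestamp can be computed without producing the intermediate ones.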
For each variant, we compare the performance using four
learning models: linear regression, XGBoost, LSTM, and
support vector regression. For evaluation, we calculate the
absolute error of each model for each timestep, after outputs
are scaled back to the original range.
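The per-timestep absolute error, together with the moving-average smoothing mentioned in the caption of Figure 2, can be computed as in this minimal sketch. The function names are hypothetical, and a simple unweighted window is assumed because the paper does not specify the type of moving average.

```python
from typing import List, Sequence


def absolute_errors(predicted: Sequence[float], actual: Sequence[float]) -> List[float]:
    """Absolute error at each timestep."""
    return [abs(p - a) for p, a in zip(predicted, actual)]


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Unweighted moving average; returns len(values) - window + 1 points."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


errors = absolute_errors([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
smoothed = moving_average([1.0, 2.0, 3.0, 4.0], window=2)
```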
Figure (1) shows the forecasts obtained from the four models.
From top to bottom, these are: linear regression, support vector
regression, XGBoost, and LSTM.
Green lines represent recursive predictions using the
original univariate models, while blue lines represent recursive
predictions from the same models with generated features
introduced. Orange lines show the non-recursive version which
decouples predictions from past values of the output variable
and forecasts based only on generated features.
We note that all models are capable of learning short term
trends in the data. This is reflected in the high forecast accuracy
for short time horizons. We also observe that daily patterns
are successfully captured using all methods. The full models
generally prove to be the most accurate over short time horizons
(less than 1 day), but recursive error begins to appear as early
as the second day in the case of the LSTM model.
Figure (2) shows the magnitude of the forecasting error for
the testing set of January 1999 for all models. To showcase
the trend, these are averaged using a moving window of 50
timesteps.
We note that the univariate recursive models generally
accumulate significant error by 250 timesteps. This is mitigated
in the multivariate recursive models, but due to the recursive
nature of the predictions error still rises over time. Nonrecursive
models exhibit higher initial error for linear regression and
LSTM models while being comparable for SVR and LSTM, but
this error remains relatively constant over time. For the linear
regression model, nonrecursive error is significantly higher. We
believe there are two main reasons for this: first, there is a
nonlinear relationship between the features and the load, making
prediction difficult for a linear model. Second, the winter of
1999 (our testing set) was unusually cold and resulted in a
higher power consumption than previous years. This led to
consistent underestimates which were also observed in the SVR
and LSTM models. However, use of actual load values in the
recursive models anchored these models to higher initial values.
Table (1) shows the Pearson correlation coefficients between
error magnitude and feature values. We see that the error
correlation decreases from the univariate to the full multivariate
models.
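For reference, the Pearson correlation coefficient between two equally long sequences, such as error magnitudes and the values of one input feature, can be computed as in this minimal sketch. The function name and the toy inputs are assumptions for illustration.

```python
import math
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


perfectly_correlated = pearson_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
perfectly_anticorrelated = pearson_correlation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A coefficient close to 1 or -1 indicates that the error grows or shrinks almost linearly with the feature, while a value near 0 indicates little linear dependence.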
5 Conclusion
Forecasting time series with machine learning models has
wide applications to our daily life. Reducing errors in the
predictions is a paramount concern in the design of these
models.
In this paper, we have demonstrated an approach using
generated features to convert a univariate model into a
multivariate model to mitigate long-term error accumulation.
This method can be applied to a variety of machine learning
time series models, a selection of which we have studied in this
paper. Our experiments show that the addition of generated
features improves performance of all univariate models tested
over most time horizons, and that it is possible to rely on these
added features alone to avoid recursive error accumulation by
creating a nonrecursive model. Our results also show that
for the majority of models tested, the nonrecursive model can
achieve comparable performance on short time horizons while
outperforming recursive models over long time horizons.
This principle of using generated features to create a
multivariate model can be used for a wide variety of applications
and algorithms. Our method preserves the flexibility of
recursive forecasting and allows use of the same model for any
forecast length, and can be extended to models which forecast
multiple timesteps at once. For future work, performance will
be evaluated on other applications such as stock market price
forecasting. We will also consider other types of non time-based
or composite features which can be generated.
Support from the Natural Sciences and Engineering Research
Council of Canada (NSERC) is gratefully acknowledged.
[1] Kadir Amasyali and Nora M El-Gohary. “A Review of Data-
Driven Building Energy Consumption Prediction Studies”.
volume 81, pages 1192–1205. Elsevier, 2018.
[2] Ingo Steinwart and Andreas Christmann. “Support Vector
Machines”. Springer-Verlag New York, 2008.
[3] Filippo Maria Bianchi, Enrico Maiorino, Michael C
Kampffmeyer, Antonello Rizzi, and Robert Jenssen. “An
Overview and Comparative Analysis of Recurrent Neural
Networks for Short Term Load Forecasting”. 2017.
[4] C. Chatfield. “Time-Series Forecasting”. CRC Press,
[5] En Chen, Ming-Wei Chang, and Chih-Jen Lin. “Load
Forecasting Using Support Vector Machines: A Study on
EUNITE Competition 2001”. volume 19, pages 1821–
1830. IEEE, 2004.
[6] Tianqi Chen and Carlos Guestrin. “Xgboost: A Scalable
Tree Boosting System”. pages 785–794. ACM, 2016.
[7] Harris Drucker, Christopher JC Burges, Linda Kaufman,
Alex J Smola, and Vladimir Vapnik. “Support Vector
Regression Machines”. pages 155–161, 1997.
[8] EUNITE. “EUNITE Electricity Load Forecast 2001
Competition”. EUNITE, Dec. 2001.
[9] Jerome Friedman, Trevor Hastie, Robert Tibshirani, et al.
“Additive Logistic Regression: A Statistical View of
Boosting (With Discussion and a Rejoinder by the
Authors)”. volume 28, pages 337–407. Institute of
Mathematical Statistics, 2000.
[10] John Cristian Borges Gamboa. “Deep Learning for Time-
Series Analysis”. 2017.
[11] Alex Graves and Jürgen Schmidhuber. “Framewise
Phoneme Classification With Bidirectional LSTM and
Other Neural Network Architectures”. volume 18, pages
602–610. Elsevier, 2005.
[12] Katarina Grolinger, Alexandra L’Heureux, Miriam AM
Capretz, and Luke Seewald. “Energy Forecasting for
Event Venues: Big Data and Prediction Accuracy”.
volume 112, pages 222–233. Elsevier, 2016.
[13] Jawad Nagi, Keem Siah Yap, Farrukh Nagi, Sieh Kiong
Tiong, and Syed Khaleel Ahmed. “A Computational
Intelligence Scheme for the Prediction of the Daily Peak
Load”. volume 11, pages 4773–4788. Elsevier, 2011.
[14] Apurva Narayan and Keith W Hipel. “Long Short
Term Memory Networks for Short-Term Electric Load
Forecasting”. pages 1050–1059, Banff Center, Banff,
Canada, October 5-8 2017.
[15] Christopher Olah. “Understanding LSTM networks”.
volume 27, page 2015, 2015.
[16] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer,
James Bradbury, Gregory Chanan, Trevor Killeen, Zeming
Lin, Natalia Gimelshein, Luca Antiga, et al. “Pytorch:
An Imperative Style, High-Performance Deep Learning
Library”. In “Advances in Neural Information Processing
Systems”, pages 8026–8037, 2019.
[17] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel,
B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer,
R. Weiss, V. Dubourg, J. Vanderplas, A. Passos,
D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay.
“Scikit-learn: Machine Learning in Python”. volume 12,
pages 2825–2830, 2011.
[18] George AF Seber and Alan J Lee. “Linear Regression
Analysis”. volume 329. John Wiley & Sons, 2012.
[19] Maher Selim, Ryan Zhou, Wenying Feng, and Omar Alam.
“Reducing Error Propagation for Long Term Energy
Forecasting Using Multivariate Prediction”. Number 1,
pages 1–9. EPiC Series in Computing, 2020.
[20] Souhaib Ben Taieb, Gianluca Bontempi, Amir Atiya, and
Antti Sorjamaa. “A Review and Comparison of Strategies
for Multi-Step Ahead Time Series Forecasting Based on
the nn5 Forecasting Competition”, 2011.
[21] Sanford Weisberg. “Applied Linear Regression”. Wiley
Series in Probability and Statistics, 2013.
[22] Kaile Zhou, Chao Fu, and Shanlin Yang. “Big Data
Driven Smart Energy Management: From Big Data to Big
Insights”. volume 56, pages 215–225. Elsevier, 2016.
[23] Lingxue Zhu and Nikolay Laptev. “Deep and Confident
Prediction for Time Series at Uber”. 2017.
Maher Selim is a postdoctoral fellow
for AI and Machine Learning at Trent
University. He obtained his PhD
in Physics from the University of
Western Ontario, Canada. He also
has a Physics from Helwan
University, Egypt. Maher obtained
his Physics from Ain Shams
University, Egypt. He is interested in Quantum AI and Quantum
Machine Learning applications to real-world problems.
Ryan Zhou is a Master’s student
at Trent University. He obtained
his B. Eng from Cornell University
in Ithaca, New York. He is
interested in convolutional and graph
neural networks, AI interpretability
and machine learning algorithms for
regression and time series prediction.
Wenying Feng is a Full Professor at
the Department of Computer Science
and the Department of Mathematics,
Trent University, Canada. She is
also an adjunct professor at the
School of Computing, Queen’s
University. Dr. Feng specializes
in nonlinear differential equations,
nonlinear analysis, machine learning
algorithms, mathematical and computational modelling. She
has published more than 100 research papers at refereed
journals and conference proceedings. She has presented as
a keynote speaker, served as program chairs and organized
special sessions for international conferences.
Omar Alam is an Assistant Professor
at the Department of Computer
Science at Trent University. His
broad area of interest is in software
engineering. In particular, he
is interested in Model-Driven
Engineering, Aspect-Oriented
Modelling, Empirical Software
Engineering, and Software Reuse. Dr. Alam has published in
premier venues in software engineering, such as MODELS,
JSS, SPE, SLE, ICSR, SAM, ICSM. He served as a reviewer
and program committee member for various journals and
conferences in the field of Model-Driven Engineering and
Software Engineering.