
Studying Error Propagation for Energy Forecasting Using Univariate and

Multivariate Machine Learning Algorithms

Maher Selim*, Ryan Zhou*, Wenying Feng*, and Omar Alam*

Trent University, Peterborough, Ontario, CANADA, K9L 0G2

*Email: {maherselim, ryanzhou, wfeng, omaralam}@trentu.ca

Abstract

Statistical machine learning models are widely used in time

series forecasting. These models often use historical data

recursively to make predictions for future timesteps. This

leads to compounding of errors, which may negatively impact

the prediction accuracy for long-term prediction tasks. In this

paper, we address this problem by using features that can

have an “anchoring” effect on recursive forecasts, thus limiting

the impact of compounding errors. We apply our approach

on a benchmark energy dataset using four machine learning

models, i.e., Linear Regression, Support Vector Regression,

Long Short-Term Memory (LSTM) neural networks, and

XGBoost regression. In particular, we compare the prediction

accuracy for the models with and without using historical data

(i.e. past energy consumption) for different forecasting lengths.

We observe that the addition of generated features improves

performance for both short and long time horizons compared

to univariate models, and for long-term forecasts, nonrecursive

multivariate models outperform all recursive models.

Key Words: Linear regression; LSTM; energy forecasting;

machine learning; support vector regression; time series

forecasting; XGBoost regression.

1 Introduction

Machine learning models are widely used in the energy

industry for forecasting future energy prices and demands [1,

22]. Advances in sensor and smart meter technologies have

made large quantities of energy data available [12]. This,

combined with increasingly accurate predictions produced by

machine learning models, has made it possible for technologies such as the smart grid to flourish.

In the domain of energy forecasting, most machine learning

models, such as Long Short-Term Memory (LSTM) [15], use

historical values of the electricity load as an input feature. This

works well for single timestep predictions, e.g. forecasting the power consumption for the next hour. However, when

forecasting multiple timesteps into the future, these models

recursively feed past predictions back in as inputs. In addition, if

the model uses external features, such as the hourly weather

reading, forecasts of these features must be generated as well.

All these predictions introduce error, which is compounded

when fed back into the model as inputs. Without external inputs,

models generally become inaccurate or even unstable after

several timesteps. This makes multiple timestep forecasting

challenging even for models with high single timestep accuracy.

As a further extension of our previous work [19], in this paper,

we continue the study on reducing error propagation for energy

forecasting using generated features, i.e. input features that

can be calculated from known variables with perfect accuracy

even far into the future. These features limit the impact

of the accumulated error, as the model is trained on these

features along with recursive inputs. We demonstrate the

efficacy of this approach using a benchmark energy dataset.

Four machine learning models are trained to perform single

timestep predictions: Linear Regression (LR) [18], Support

Vector Regression (SVR) [7], LSTM neural network [15], and

a gradient boosted tree model (XGBoost) [6]. Predictions

are then made over a period of one month by recursively

feeding in the model outputs from earlier timesteps as inputs

for later timesteps. We show that without any generated

features, error accumulates rapidly over time, while including

generated features leads to smaller accumulated errors. We also

demonstrate the accuracy of predictions made entirely using

generated features, i.e. without recursive inputs. This version

of the model allows forecasting for arbitrary timesteps in the

future, without the need to predict all values in between.

The remainder of this paper is organized as follows. Section 2

introduces the four machine learning algorithms used in our

study. Section 3 describes time series forecasting using

univariate and multivariate approaches. Section 4 describes the

development of computational models, the experimental set-up

and the results. Lastly, Section 5 concludes the paper.


2 Prediction Using Machine Learning Algorithms

Prediction using machine learning has been shown to be

effective in many applications. There are numerous learning

algorithms mostly based on statistical and mathematical

approaches. For our study, four popular algorithms representing

different categories are selected: Linear Regression

(LR), Support Vector Regression (SVR), Long Short-Term

Memory (LSTM) neural networks, and XGBoost regression.

As a basic method in statistics, LR predicts a future value

using a linear function obtained by minimizing the

discrepancies between predicted and actual output values.

Widely applied in industry, linear regression can be easily

performed on many platforms, such as Excel, R, MATLAB, and Python [21].
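As an illustration, the following is a minimal scikit-learn sketch of one-step-ahead forecasting with linear regression; the synthetic series and the 48-step window are illustrative stand-ins for the setting used later in Section 4.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    series = np.sin(np.linspace(0, 20, 500))     # stand-in for a load series
    window = 48                                  # past timesteps used as input
    X = np.array([series[i:i + window]
                  for i in range(len(series) - window)])
    y = series[window:]                          # next-step targets

    model = LinearRegression().fit(X, y)
    next_value = model.predict(series[-window:].reshape(1, -1))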

SVR is a typical kernel-based learning method, since it relies on kernel functions. Unlike linear regression, it provides some flexibility to define how much error is acceptable in the model. The problem is equivalent to finding the equation of a separating hyperplane in a high-dimensional space. For

example, if we have $N$ observations, where $y_n$ is the observed response for the input $x_n$, the training data set can be represented as $D = \{(x_n, y_n) \mid n = 1, 2, 3, \ldots, N\}$. The objective of a linear SVR is to find the linear function $f(x) = x'\beta + b$ such that

$$\min \; J(\beta) = \frac{1}{2}\beta'\beta + C \sum_{n=1}^{N} (\xi_n + \xi_n^*) \tag{1}$$

subject to

$$y_n - (x_n'\beta + b) \le \varepsilon + \xi_n, \quad n = 1, 2, \ldots, N, \tag{2}$$

$$(x_n'\beta + b) - y_n \le \varepsilon + \xi_n^*, \quad n = 1, 2, \ldots, N, \tag{3}$$

$$\xi_n \ge 0, \quad \xi_n^* \ge 0, \quad n = 1, 2, \ldots, N, \tag{4}$$

where the constant $C$ and the slack variables $\xi_n$ and $\xi_n^*$ come from the Lagrangian formulation, $\varepsilon > 0$ controls the loss function that ignores errors within a distance $\varepsilon$, and $\beta'\beta$ is the squared $l_2$-norm of the coefficient vector. This is a convex quadratic programming problem, since the objective function is itself convex and the points which satisfy the constraints form a convex set. For more details on SVR, we refer to [2] and the references therein.
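For concreteness, a minimal sketch of ε-SVR with an RBF kernel in scikit-learn is shown below; the synthetic data are illustrative, epsilon plays the role of ε in (2)-(4), and C is the penalty constant from (1).

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 48))        # illustrative inputs
    y = X.mean(axis=1) + 0.05 * rng.normal(size=200)

    # epsilon sets the width of the error-insensitive tube; C penalizes
    # points that fall outside it. The values here are illustrative.
    svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
    y_hat = svr.predict(X[:5])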

LSTM is a type of recurrent neural network architecture

designed to extract long-term dependencies out of sequential

data and avoid the vanishing gradient problem present in

ordinary recurrent networks [11, 15]. These properties make

it the method of choice for longer time series and sequence

prediction problems [10, 23]. Several variations of the LSTM

unit have been successfully applied to energy forecasting and

other areas [3, 14]. The standard LSTM architecture [11]

described below is applied in our study. Each LSTM cell

contains a cell state $h_{t-1}$, the long-term memory, and a recurrent input $y_{t-1}$, the short-term memory. It also contains three “gates”: neurons which output values between 0 and 1 and are multiplied with the information flowing into and out of the cell. The forget gate $\sigma_f$ controls the amount of information discarded from the previous cell state. The input gate $\sigma_u$ operates on the previous state $h_{t-1}$, after it has been modified by the forget gate, and decides how much of a new candidate state $\tilde{h}_t$ to add to the cell state $h_t$. The output $y_t$ is produced by squashing the cell state with a nonlinear function $g_2(\cdot)$, usually tanh. Finally, the output gate $\sigma_o$ selects the overall fraction of the state to be returned as output.
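For reference, the gate updates described above can be written compactly as follows; this is a standard formulation consistent with [11], where $x_t$ is the input at time $t$, $W$, $R$, and $b$ denote input weights, recurrent weights, and biases, and $\odot$ is elementwise multiplication:

$$f_t = \sigma_f(W_f x_t + R_f y_{t-1} + b_f), \quad u_t = \sigma_u(W_u x_t + R_u y_{t-1} + b_u), \quad o_t = \sigma_o(W_o x_t + R_o y_{t-1} + b_o),$$

$$\tilde{h}_t = g_1(W_h x_t + R_h y_{t-1} + b_h), \quad h_t = f_t \odot h_{t-1} + u_t \odot \tilde{h}_t, \quad y_t = o_t \odot g_2(h_t),$$

where the gate activations are logistic sigmoids and $g_1$, like $g_2$, is usually tanh.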

Gradient boosting is an ensemble technique which creates

a prediction model by aggregating the predictions of weak

prediction models, typically decision trees. With boosting

methods, weak predictors are added to the collection

sequentially, with each one attempting to improve upon the

entire ensemble’s performance.

In the XGBoost implementation [6], given a dataset with $n$ training examples consisting of an input $x_i$ and expected output $y_i$, a tree ensemble model $\phi(x_i)$ is defined as the sum of $K$ regression trees $f_k(x_i)$:

$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i). \tag{5}$$

To evaluate the performance of a given model, we choose a loss function $l(\hat{y}_i, y_i)$ to measure the error between the predicted value and the target value, and optionally add a regularization term $\Omega(f_k)$ to penalize overly complex trees:

$$L(\phi) = \sum_{i=1}^{n} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k). \tag{6}$$

The algorithm minimizes $L(\phi)$ by iteratively introducing each $f_k$. Assume that the ensemble currently contains $K$ trees. We add a new tree $f_{K+1}$ that minimizes

$$\sum_{i=1}^{n} l\left(y_i, \hat{y}_i + f_{K+1}(x_i)\right) + \Omega(f_{K+1}). \tag{7}$$

In other words, the tree that most improves the current model, as determined by $L$, is greedily added. The new tree is trained on objective (7); in practice, this is done by approximating the objective with the first- and second-order gradients of the loss function $l(\hat{y}_i, y_i)$ [9].
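As an illustration, a minimal sketch of fitting such an ensemble with the XGBoost Python library; apart from the maximum tree depth of 12 used in Section 4, the hyperparameters and data are illustrative.

    import numpy as np
    from xgboost import XGBRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 48))       # illustrative inputs
    y = X[:, -1] + 0.05 * rng.normal(size=200)

    # Each boosting round adds one regression tree f_k as in (5)-(7).
    booster = XGBRegressor(max_depth=12, n_estimators=100, learning_rate=0.1)
    booster.fit(X, y)
    y_hat = booster.predict(X[:5])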

3 Univariate and Multivariate Input Features

Time series prediction is a problem which aims to predict

future values using past values. These are generally past values

of the target variable, but this is not necessarily the case.

Forecasting models can be broadly classified into univariate and multivariate models based on the number of features used. When forecasting multiple timesteps into the future, models can also be classified into direct, recursive, and MIMO approaches [20].

A recursive approach trains a single model to predict a single

step in the future, known as a one-step ahead forecast:

$$\hat{x}_t = F(x_{t-1}, x_{t-2}, \ldots)$$


Figure 1: Forecasts for January 1999 using (a) linear regression, (b) support vector regression, (c) XGBoost regression, and (d) LSTM. The full model (blue) uses recursively calculated load and all external features; the no-load model (orange) uses only external features and no recursion; the only-load model (green) uses no external features and only recursively calculated load.

where $x_i$ represents the value of the variable at timestep $i$.

This forecasted value is then fed back in as an input and the

next timestep is forecasted using the same model:

$$\hat{x}_{t+1} = F(\hat{x}_t, x_{t-1}, \ldots)$$

This process is repeated until the desired time horizon has been

reached. This approach is sensitive to accumulated errors, as

any error present in the initial prediction will subsequently be

carried forward to later predictions when the predicted value

is used as input. However, as only one model is used for all

predictions, this allows more resources to be invested in the

single model. In addition, this approach is flexible in that it allows forecasting for any time horizon, whether or not the

model has been trained on that time horizon.
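A minimal sketch of this recursive strategy, assuming a scikit-learn style one-step model and an illustrative synthetic series:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def recursive_forecast(model, history, horizon, window=48):
        """Forecast `horizon` steps by feeding each prediction back in."""
        buf = list(history[-window:])
        preds = []
        for _ in range(horizon):
            x = np.array(buf[-window:]).reshape(1, -1)
            y_hat = float(model.predict(x)[0])
            preds.append(y_hat)
            buf.append(y_hat)        # the prediction becomes a future input
        return np.array(preds)

    series = np.sin(np.linspace(0, 20, 500))     # stand-in for a load series
    X = np.array([series[i:i + 48] for i in range(len(series) - 48)])
    model = LinearRegression().fit(X, series[48:])
    forecast = recursive_forecast(model, series, horizon=100)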

A direct approach aims to avoid error accumulation by

creating a separate model for each potential time horizon. Thus,

a collection of models is trained:

$$\hat{x}_t = F(x_{t-1}, x_{t-2}, \ldots)$$

$$\hat{x}_{t+1} = G(x_{t-1}, x_{t-2}, \ldots)$$

$$\hat{x}_{t+2} = H(x_{t-1}, x_{t-2}, \ldots)$$

$$\cdots$$

This avoids propagated errors as no predicted values are used

as input. However, as each model is trained independently, the

models may not learn complex dependencies between the values

$\hat{x}_t, \hat{x}_{t+1}, \hat{x}_{t+2}, \ldots$. This approach is also computationally much

more expensive as multiple models must be trained and stored.
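A minimal sketch of the direct strategy under the same assumptions, training one independent linear model per horizon:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def fit_direct_models(series, horizon, window=48):
        """Train one independent model per forecast horizon h."""
        models = []
        for h in range(horizon):
            n = len(series) - window - h
            X = [series[i:i + window] for i in range(n)]
            y = [series[i + window + h] for i in range(n)]
            models.append(LinearRegression().fit(X, y))
        return models   # models[h] predicts h+1 steps past the input window

    series = np.sin(np.linspace(0, 20, 500))     # stand-in for a load series
    models = fit_direct_models(series, horizon=3)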

The multi-input multi-output (MIMO) strategy attempts to

combine the advantages of these approaches by training a single

model with multiple outputs to predict all timesteps up to the

time horizon simultaneously:

$$[\hat{x}_{t+H}, \hat{x}_{t+H-1}, \ldots, \hat{x}_t] = F(x_{t-1}, x_{t-2}, \ldots)$$

This avoids accumulated error by performing all predictions in

one step, as well as modeling any interdependencies between

future timesteps. However, this comes at the cost of less

flexibility, as all horizons are forecasted using the same model

and possible time horizons are limited to those built into the

model.
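A minimal sketch of the MIMO strategy, using the multi-output support in scikit-learn's LinearRegression; the window size and series are again illustrative:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def fit_mimo_model(series, H, window=48):
        """Train a single model that emits all H future steps at once."""
        X, Y = [], []
        for i in range(len(series) - window - H + 1):
            X.append(series[i:i + window])
            Y.append(series[i + window:i + window + H])  # all H targets
        return LinearRegression().fit(np.array(X), np.array(Y))

    series = np.sin(np.linspace(0, 20, 500))     # stand-in for a load series
    mimo = fit_mimo_model(series, H=48)
    next_day = mimo.predict(series[-48:].reshape(1, -1))  # no feedback loop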

Based on the input features, time series prediction models can

be categorized as univariate or multivariate. Univariate models

use a single feature, generally the target variable, to predict a

future value:

$$\hat{x}_t = F(x_{t-1}, x_{t-2}, \ldots).$$

This has the advantage of allowing smaller and computationally

lighter models. Univariate models do not require extra external

data and require no feature engineering. However, as they are

tied to a single variable, they exhibit more sensitivity to noise

and reduced stability for recursive models.

Multivariate time series models use observations of multiple

variables or features, often taken simultaneously, and attempt to

also describe the interrelationships among the features [4]:

$$\hat{x}_t = F(x_{t-1}, x_{t-2}, \ldots, a^{(1)}_{t-1}, a^{(1)}_{t-2}, \ldots, a^{(2)}_{t-1}, a^{(2)}_{t-2}, \ldots)$$

where each $a^{(i)}$ represents the time series of an external feature.

This has the obvious advantage of modeling relationships

between the target and external variables, but at the cost of a

bulkier model and higher computational costs. Building such

a model generally also requires obtaining measurements of

external features; the difficulty of this is highly dependent on

data availability.

It is also possible for a multivariate model to employ no past

information about the target variable:

$$\hat{x}_t = F(a^{(1)}_{t-1}, a^{(1)}_{t-2}, a^{(1)}_{t-3}, \ldots, a^{(2)}_{t-1}, a^{(2)}_{t-2}, a^{(2)}_{t-3}, \ldots).$$

In this case, predictions must be made solely based on the

relationships of external features to the target variable. Such a

model is rarely used in practice, as training the model in the first place requires knowledge of past values of the target variable, but it may see use if obtaining a full time series of the target value is difficult due to missing or unusable values. In addition, as the

output of the model is never used as an input, error accumulation

is limited. If future values for the external features can be

obtained, this approach allows prediction based on those values

without first predicting earlier time horizons.

4 Empirical Study on Energy Forecasting

We study energy forecasting using the four machine learning

algorithms described in Section 2. Effects of external features

on error propagation are compared for the recursive univariate, multivariate, and the modified multivariate techniques.

The linear regression and support vector regression models

are implemented using scikit-learn [17]. We use the radial basis

function (RBF) kernel for SVR. The gradient boosting model

was built using the XGBoost Python library [6] with a maximum

tree depth of 12. All other parameters are set to scikit-learn

defaults.

The LSTM model is implemented using PyTorch [16]

running on Python 3.8. The model consists of four layers: the

input layer, two hidden LSTM layers with 16 nodes each, and

a linear fully connected aggregation layer as the output. To

improve stability, we use a residual connection on the LSTM

layers. The model is trained on MAE loss using the Adam

optimizer for 30 epochs.

In order to ensure reproducibility of the experiment, the

2001 EUNITE competition dataset [8] is used in our study.

This benchmark dataset is well-studied in energy forecasting

research [5, 13].

The EUNITE dataset spans over two years from January 1997

until January 1999. It contains the following fields: the half-hourly electricity load, the daily average temperature, and a flag signifying whether the day is a holiday. In the statistical

analysis of the dataset [5, 13], it was found that the electricity

load generally decreases during holidays and weekends. This

phenomenon depends on the type of the holiday, e.g., Christmas

or New Year.

In order to ensure no outside forecasts are required,

we disregard all temperature measurements as these require

separate weather forecasts. This ensures model performance

is based only on features which can be calculated with perfect

accuracy. In addition, we generate the following features based

on the prediction timestamp: weekday, ranging from 0 to 6, day

of year, ranging from 1 to 365, and hour of day, ranging from 0

to 23. These three features allow the model to pinpoint the day

and time within the year and capture daily, weekly and yearly

dependencies.
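A minimal sketch of generating these three features from the prediction timestamp with pandas; the date range matches the EUNITE period and the column names are illustrative:

    import pandas as pd

    # Half-hourly timestamps spanning the dataset period.
    idx = pd.date_range("1997-01-01", "1999-01-31 23:30", freq="30min")
    feats = pd.DataFrame({
        "weekday": idx.weekday,          # 0-6, Monday = 0
        "day_of_year": idx.dayofyear,    # 1-365 (366 in a leap year)
        "hour": idx.hour,                # 0-23
    }, index=idx)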

Both datasets were converted into input-output pairs for

supervised learning using a sliding window method, whereby


Figure 2: Absolute error of the January 1999 forecast, smoothed with a moving average of 50 timesteps, for (a) linear regression, (b) support vector regression, (c) XGBoost regression, and (d) LSTM. The full model (blue) uses recursively calculated load and all external features; the no-load model (orange) uses only external features and no recursion; the only-load model (green) uses no external features and only recursively calculated load.

timesteps within the window were used as input to predict

the next timestep after the window. A window size of 48

timesteps was chosen, corresponding to the previous 24 hours

of activity. As the generated time features were uniformly rather

than normally distributed, the features were normalized to lie in the range [−1, 1]. The last month of data was used to evaluate the

models. This was done in order to limit potential data leakage by

ensuring all evaluation data was drawn from points temporally

after the training data. Ten percent of the remaining data was

used to validate the models during training, while the remainder


Table 1: Correlation between forecast error and input feature value for linear regression, LSTM, XGBoost, and SVR with RBF kernel. Shown are recursive full models, nonrecursive (no load) models, and recursive univariate (only load) models.

Model Name            Load     Weekday  Holiday  Hour     Day of Year
Linear (Full)         0.5388   -0.0296  0.1878   0.2124   0.2919
Linear (No Load)      *        -0.0284  0.1659   0.2107   0.2414
Linear (Only Load)    0.6260   *        *        *        *
LSTM (Full)           -0.4827  0.2664   -0.0042  -0.0479  0.2982
LSTM (No Load)        *        0.2385   0.5833   0.1944   -0.1310
LSTM (Only Load)      -0.9232  *        *        *        *
XGBoost (Full)        0.6223   -0.3040  -0.0926  0.1341   0.6788
XGBoost (No Load)     *        -0.0505  -0.0083  -0.0560  0.1149
XGBoost (Only Load)   0.8324   *        *        *        *
SVR (Full)            0.1618   -0.1451  -0.0727  -0.0265  0.0239
SVR (No Load)         *        -0.1662  0.0312   0.2091   -0.1016
SVR (Only Load)       -0.7639  *        *        *        *

was used for training itself.
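A minimal sketch of this preparation pipeline, assuming the full half-hourly load series is available as a NumPy array (load_series is a hypothetical name):

    import numpy as np

    def make_windows(values, window=48):
        """Sliding-window conversion to supervised input-output pairs."""
        X = np.array([values[i:i + window]
                      for i in range(len(values) - window)])
        return X, values[window:]

    def scale_to_unit(x, lo, hi):
        """Map values from [lo, hi] to [-1, 1]."""
        return 2.0 * (x - lo) / (hi - lo) - 1.0

    load = np.asarray(load_series)   # full half-hourly series (assumed)
    X, y = make_windows(scale_to_unit(load, load.min(), load.max()))
    n_test = 31 * 48                 # last month at half-hourly resolution
    X_trainval, X_test = X[:-n_test], X[-n_test:]
    y_trainval, y_test = y[:-n_test], y[-n_test:]
    n_val = len(X_trainval) // 10    # 10% of the remainder for validation
    X_train, X_val = X_trainval[:-n_val], X_trainval[-n_val:]
    y_train, y_val = y_trainval[:-n_val], y_trainval[-n_val:]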

Each model was trained to forecast only one step ahead.

We compared three methods: first, a univariate model using only past values of the load to forecast future values, where each prediction was recursively added to the input for the next timestep. The second was a multivariate model which made use of generated external features in addition to past values of the load; the load was updated recursively as in the first model, while external features were calculated directly from the timestamp of the prediction. The third model removes all recurrent dependency by ignoring previous loads altogether and using only the calculated external features.
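A minimal sketch of the three variants, where time_feats(t) is a hypothetical helper returning the generated features for timestamp t and model is assumed to be trained on the matching input layout:

    import numpy as np

    def forecast_variant(model, history, timestamps, time_feats, variant,
                         window=48):
        """Multi-step forecast for one of the three input variants."""
        buf = list(history[-window:])
        preds = []
        for t in timestamps:
            if variant == "only_load":            # recursive, univariate
                x = buf[-window:]
            elif variant == "full":               # recursive + generated
                x = buf[-window:] + list(time_feats(t))
            else:                                 # "no_load": nonrecursive
                x = list(time_feats(t))
            y_hat = float(model.predict(np.array(x).reshape(1, -1))[0])
            preds.append(y_hat)
            buf.append(y_hat)    # only consumed by the recursive variants
        return np.array(preds)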

For each variant, we compare the performance using four

learning models: linear regression, XGBoost, LSTM, and

support vector regression. For evaluation, we calculate the

absolute error of each model for each timestep, after outputs

are scaled back to the original range.

Figure 1 shows the forecasts obtained from the four models.

From top to bottom, these are: linear regression, support vector

regression, XGBoost, and LSTM.

Green lines represent recursive predictions using the

original univariate models, while blue lines represent recursive

predictions from the same models with generated features

introduced. Orange lines show the non-recursive version which

decouples predictions from past values of the output variable

and forecasts based only on generated features.

We note that all models are capable of learning short-term trends in the data. This is reflected in the high forecast accuracy for short time horizons. We also observe that daily patterns are successfully captured by all methods. The full models generally prove to be the most accurate over short time horizons (less than one day), but recursive error begins to appear as early as the second day in the case of the LSTM model.

Figure 2 shows the magnitude of the forecasting error for

the testing set of January 1999 for all models. To showcase

the trend, these are averaged using a moving window of 50

timesteps.

We note that the univariate recursive models generally accumulate significant error by 250 timesteps. This is mitigated in the multivariate recursive models, but due to the recursive nature of the predictions the error still rises over time. Nonrecursive models exhibit higher initial error for the linear regression and LSTM models while being comparable for SVR and XGBoost, but this error remains relatively constant over time. For the linear regression model, the nonrecursive error is significantly higher. We believe there are two main reasons for this: first, there is a nonlinear relationship between the features and the load, making prediction difficult for a linear model. Second, the winter of

1999 (our testing set) was unusually cold and resulted in a

higher power consumption than previous years. This led to

consistent underestimates which were also observed in the SVR

and LSTM models. However, use of actual load values in the

recursive models anchored these models to higher initial values.

Table 1 shows the Pearson correlation coefficients between error magnitude and feature values. We see that the magnitude of the correlation between error and load decreases from the univariate to the full multivariate model.
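A minimal sketch of computing such correlations with SciPy; y_test, preds, and test_features are illustrative placeholders for the test targets, model forecasts, and aligned feature arrays:

    import numpy as np
    from scipy.stats import pearsonr

    abs_err = np.abs(y_test - preds)         # per-timestep absolute error
    for name, values in test_features.items():
        r, _ = pearsonr(abs_err, values)     # Pearson r as in Table 1
        print(f"{name}: {r:.4f}")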

5 Conclusion

Forecasting time series with machine learning models has

wide applications to our daily life. Reducing errors in the

predictions is a paramount concern in the design of these

algorithms.

In this paper, we have demonstrated an approach using

generated features to convert a univariate model into a

multivariate model to mitigate long-term error accumulation.

This method can be applied to a variety of machine learning

time series models, a selection of which we have studied in this

paper. Our experiments show that the addition of generated

features improves the performance of all univariate models tested

over most time horizons, and that it is possible to rely on these

added features alone to avoid recursive error accumulation by

creating a nonrecursive model. Our results also show that

for the majority of models tested, the nonrecursive model can

achieve comparable performance on short time horizons while

outperforming recursive models over long time horizons.

This principle of using generated features to create a

multivariate model can be used for a wide variety of applications

and algorithms. Our method preserves the flexibility of

recursive forecasting and allows use of the same model for any

forecast length, and can be extended to models which forecast

multiple timesteps at once. For future work, performance will

be evaluated on other applications such as stock market price

forecasting. We will also consider other types of non-time-based or composite features which can be generated.

Acknowledgement

Support from the Natural Sciences and Engineering Research

Council of Canada (NSERC) is gratefully acknowledged.

References

[1] Kadir Amasyali and Nora M. El-Gohary. "A Review of Data-Driven Building Energy Consumption Prediction Studies". Volume 81, pages 1192–1205. Elsevier, 2018.

[2] Ingo Steinwart and Andreas Christmann. "Support Vector Machines". Springer-Verlag, New York, 2008.

[3] Filippo Maria Bianchi, Enrico Maiorino, Michael C. Kampffmeyer, Antonello Rizzi, and Robert Jenssen. "An Overview and Comparative Analysis of Recurrent Neural Networks for Short Term Load Forecasting". 2017.

[4] C. Chatfield. "Time-Series Forecasting". CRC Press, 2000.

[5] Bo-Juen Chen, Ming-Wei Chang, and Chih-Jen Lin. "Load Forecasting Using Support Vector Machines: A Study on EUNITE Competition 2001". IEEE Transactions on Power Systems, volume 19, pages 1821–1830. IEEE, 2004.

[6] Tianqi Chen and Carlos Guestrin. "XGBoost: A Scalable Tree Boosting System". Pages 785–794. ACM, 2016.

[7] Harris Drucker, Christopher J. C. Burges, Linda Kaufman, Alex J. Smola, and Vladimir Vapnik. "Support Vector Regression Machines". Pages 155–161, 1997.

[8] EUNITE. "EUNITE Electricity Load Forecast 2001 Competition". EUNITE, Dec. 2001.

[9] Jerome Friedman, Trevor Hastie, Robert Tibshirani, et al. "Additive Logistic Regression: A Statistical View of Boosting (With Discussion and a Rejoinder by the Authors)". The Annals of Statistics, volume 28, pages 337–407. Institute of Mathematical Statistics, 2000.

[10] John Cristian Borges Gamboa. "Deep Learning for Time-Series Analysis". 2017.

[11] Alex Graves and Jürgen Schmidhuber. "Framewise Phoneme Classification With Bidirectional LSTM and Other Neural Network Architectures". Neural Networks, volume 18, pages 602–610. Elsevier, 2005.

[12] Katarina Grolinger, Alexandra L'Heureux, Miriam A. M. Capretz, and Luke Seewald. "Energy Forecasting for Event Venues: Big Data and Prediction Accuracy". Volume 112, pages 222–233. Elsevier, 2016.

[13] Jawad Nagi, Keem Siah Yap, Farrukh Nagi, Sieh Kiong Tiong, and Syed Khaleel Ahmed. "A Computational Intelligence Scheme for the Prediction of the Daily Peak Load". Volume 11, pages 4773–4788. Elsevier, 2011.

[14] Apurva Narayan and Keith W. Hipel. "Long Short Term Memory Networks for Short-Term Electric Load Forecasting". Pages 1050–1059, Banff Center, Banff, Canada, October 5-8, 2017.

[15] Christopher Olah. "Understanding LSTM Networks". 2015.

[16] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. "PyTorch: An Imperative Style, High-Performance Deep Learning Library". In Advances in Neural Information Processing Systems, pages 8026–8037, 2019.

[17] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. "Scikit-learn: Machine Learning in Python". Journal of Machine Learning Research, volume 12, pages 2825–2830, 2011.

[18] George A. F. Seber and Alan J. Lee. "Linear Regression Analysis". Volume 329. John Wiley & Sons, 2012.

[19] Maher Selim, Ryan Zhou, Wenying Feng, and Omar Alam. "Reducing Error Propagation for Long Term Energy Forecasting Using Multivariate Prediction". Number 1, pages 1–9. EPiC Series in Computing, 2020.

[20] Souhaib Ben Taieb, Gianluca Bontempi, Amir Atiya, and Antti Sorjamaa. "A Review and Comparison of Strategies for Multi-Step Ahead Time Series Forecasting Based on the NN5 Forecasting Competition". 2011.

[21] Sanford Weisberg. "Applied Linear Regression". Wiley Series in Probability and Statistics, 2013.

[22] Kaile Zhou, Chao Fu, and Shanlin Yang. "Big Data Driven Smart Energy Management: From Big Data to Big Insights". Volume 56, pages 215–225. Elsevier, 2016.

[23] Lingxue Zhu and Nikolay Laptev. "Deep and Confident Prediction for Time Series at Uber". 2017.

Maher Selim is a postdoctoral fellow in AI and Machine Learning at Trent University. He obtained his Ph.D. in Physics from the University of Western Ontario, Canada. He also has an M.Sc. in Physics from Helwan University, Egypt, and obtained his B.Sc. in Physics from Ain Shams University, Egypt. He is interested in quantum AI and quantum machine learning applications to real-world problems.

Ryan Zhou is a Master’s student

at Trent University. He obtained

his B.Eng. from Cornell University

in Ithaca, New York. He is

interested in convolutional and graph

neural networks, AI interpretability

and machine learning algorithms for

regression and time series prediction.

Wenying Feng is a Full Professor at

the Department of Computer Science

and the Department of Mathematics,

Trent University, Canada. She is

also an adjunct professor at the

School of Computing, Queen’s

University. Dr. Feng specializes

in nonlinear differential equations,

nonlinear analysis, machine learning

algorithms, mathematical and computational modelling. She

has published more than 100 research papers at refereed

journals and conference proceedings. She has presented as a keynote speaker, served as a program chair, and organized special sessions for international conferences.

Omar Alam is an Assistant Professor

at the Department of Computer

Science at Trent University. His

broad area of interest is in software

engineering. In particular, he

is interested in Model-Driven

Engineering, Aspect-Oriented

Modelling, Empirical Software

Engineering, and Software Reuse. Dr. Alam has published in

premier venues in software engineering, such as MODELS,

JSS, SPE, SLE, ICSR, SAM, ICSM. He served as a reviewer

and program committee member for various journals and

conferences in the field of Model-Driven Engineering and

Software Engineering.