Research Article
Machine Learning Algorithms for Predicting Energy Consumption in Educational Buildings

Khaoula Elhabyb,¹ Amine Baina,¹ Mostafa Bellafkih,¹ and Ahmed Farouk Deifalla²

¹National Institute of Posts and Telecommunications (INPT), Rabat, Morocco
²Future University in Egypt, Cairo, Egypt
Correspondence should be addressed to Ahmed Farouk Deifalla; ahmed.deifalla@fue.edu.eg
Received 14 December 2023; Revised 19 March 2024; Accepted 26 March 2024; Published 13 May 2024
Academic Editor: Saleh N. Al-Saadi
Copyright © 2024 Khaoula Elhabyb et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
In the past few years, there has been a notable interest in the application of machine learning methods to enhance energy efficiency in the smart building industry. The paper discusses the use of machine learning in smart buildings to improve energy efficiency by analyzing data on energy usage, occupancy patterns, and environmental conditions. The study focuses on implementing and evaluating energy consumption prediction models using algorithms like long short-term memory (LSTM), random forest, and gradient boosting regressor. Real-life case studies on educational buildings are conducted to assess the practical applicability of these models. The data is rigorously analyzed and preprocessed, and performance metrics such as root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are used to compare the effectiveness of the algorithms. The results highlight the importance of tailoring predictive models to the specific characteristics of each building's energy consumption.
1. Introduction
Artificial intelligence is rapidly being integrated into various industries, such as healthcare, finance, and smart grids. Among these human-centric applications, the use of AI in smart buildings has attracted significant attention from a large community [1]. Smart buildings, which have been a subject of research since the 1980s, utilize advanced technology, data analytics, and automation systems to optimize operations, enhance occupant comfort and productivity, and reduce costs and energy consumption [2]. These buildings incorporate sensors, devices, and control systems to monitor lighting, HVAC systems, security, and access controls. Real-time data on occupancy, temperature, air quality, and energy use can be analyzed to identify optimization opportunities. The primary aim is to create an efficient, comfortable, and sustainable environment for residents while reducing costs and ecological impact.
The smart building industry is experiencing significant growth as society becomes more connected and digital. According to statistics from MarketsandMarkets [3], the industry is projected to expand at a compound annual growth rate (CAGR) of 10.5% between 2020 and 2025, reaching a value of $108.9 billion. This growth is driven by factors such as increased energy usage and expenses, advancements in machine learning and the Internet of Things (IoT), the push for net zero energy buildings, and regulatory changes that encourage the adoption of smart building systems and services. Figure 1 presents the forecasted global market size from 2020 to 2030. Expanding on the findings of the Zion Marketing research study [4], the market was valued at $40,760 million in 2016 and is projected to grow to $61,900 million by 2024, at a CAGR exceeding 34%. This indicates rapid expansion within the market, reflecting robust trends and significant economic development during the study period.
The AI sector being discussed is experiencing significant growth due to the integration of the Internet of Things (IoT) and machine learning (ML). IoT sensors collect data about buildings and occupants, such as temperature, humidity, occupancy, and electricity consumption.
This data is centralized for optimizing building operations,
improving resident comfort, and reducing energy usage.
ML, on the other hand, is a powerful tool for processing
large amounts of data from various sources. It analyzes
this data to identify patterns and predict future events,
such as equipment failures, enabling preventative mainte-
nance [5].
The American Council for an Energy-Efficient Economy (ACEEE) [6] suggests that commercial buildings can significantly reduce their energy bills, by up to 30%, by implementing energy-efficient technologies such as smart thermostats and controlled lighting. The US Department of Energy [7] reports that commercial buildings account for a significant portion of total energy consumption and greenhouse gas emissions in the US. This highlights the importance of buildings that can predict energy consumption and plan efficiently to reduce energy usage. Intel research [8] also indicates that energy consumption prediction has the potential to achieve operational cost savings, staff productivity gains, and energy usage reductions. Given these findings, the primary emphasis will be on forecasting the energy usage of smart buildings, with a specific focus on educational facilities, which will be analyzed for the first time. Understanding and predicting energy consumption in educational environments is paramount for optimizing resource allocation, implementing effective efficiency measures, and establishing sustainable and cost-effective operational procedures [9]. By focusing on this sector, valuable insights can be gained to inform strategies for enhancing energy efficiency and sustainability in educational buildings, ultimately contributing to improved resource management and environmental conservation efforts.
The research concentrates on energy management within smart buildings, aiming to forecast power consumption through three distinct approaches: a traditional statistical approach employing the random forest algorithm, a deep learning approach utilizing long short-term memory (LSTM), and a hybrid approach leveraging the gradient boosting regressor algorithm. These three techniques were chosen to investigate a research gap affecting the majority of data-driven methodologies: while significant progress has been made in this area, limited attention has been given to utilizing streaming and temporal data for forecasting buildings' energy demand. This gap will be addressed through the utilization of real historical electricity data. The data is analyzed to evaluate model performance and accuracy, aiming to identify the most effective approach for smart building energy management. The research is aimed at optimizing forecasting techniques through rigorous comparative analysis, leveraging the strengths of the LSTM, RF, and GBR models. The study highlights the importance of advanced machine learning in shaping smart building strategies and is aimed at enhancing sustainability and efficiency in energy usage. Insights from this research will inform future advancements in energy management practices for sustainable development. The article is structured into several delineated sections, each serving a specific purpose:
(i) Introduction: This section introduces the application of AI within the smart building sector, setting the context for the study
(ii) Literature review: Here, a comparative examination of various ML algorithms used for energy prediction in smart building systems is provided, drawing insights from existing research
(iii) Methodology: This section outlines the systematic approach adopted in the study, encompassing data analytics, model development, and model evaluation processes
(iv) Results and discussion: Findings obtained from the methodology are presented, followed by a comparative analysis that juxtaposes these results with prior research initiatives
(v) Conclusion: This section synthesizes the results and provides conclusions, offering perspectives on the implications of the study's findings for the field of smart building energy management
2. Literature Review
A recent study conducted by the International Energy
Agency [10] has revealed concerning levels of energy
Figure 1: The global smart building market size, 2019-2030 (revenue growing from $67.4 billion to $121.3 billion at a CAGR of 10.6%) [3].
consumption in buildings. The study found that buildings are responsible for a significant portion of electricity consumption and overall energy consumption in urban areas. Buildings account for 72% of total electricity consumption and 38% of average energy consumption in urban areas. Additionally, buildings contribute to almost 40% of total carbon dioxide pollution in urban areas. A smart building is a modern infrastructure that incorporates automated control systems and uses data to improve the building's performance and occupants' comfort. Figure 2 presents the smart building functionalities and its most important axes of work.
The top technology companies are currently prioritizing IoT (Internet of Things) and AI (artificial intelligence). The future of building innovation is expected to focus on achieving maximum energy efficiency, and this challenge can be addressed by integrating AI-powered systems like machine learning (ML) and deep learning. ML systems continuously improve themselves, leading to advancements in various AI research areas [12]. ML involves algorithms that allow systems to respond to inputs from their environment and identify nonlinear connections in complicated or uncertain systems. ML is divided into four major categories based on the type of learning task: supervised learning, unsupervised learning, semisupervised learning, and reinforcement learning; a brief code sketch after the following list contrasts the first two.
(i) Supervised learning is a method of developing a machine learning model using a labeled data set. In this process, each data point in the set is associated with a known intended output, and the model is trained to predict that output for new inputs
(ii) Unsupervised learning: in contrast to supervised learning, the model is developed on an unlabeled data set, where the target outputs are unknown. In this scenario, the model is not explicitly instructed on what to search for but instead learns the underlying structure of the data on its own
(iii) Semisupervised learning is a learning approach that combines supervised and unsupervised learning. In this approach, the model is trained using a data set that is partly labeled, meaning that some of the data points have known labels
(iv) Reinforcement learning is an approach where a model is trained to make a series of decisions in a changing environment. The model learns through trial and error, receiving feedback in the form of rewards or costs
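The following minimal sketch, which is not taken from the paper, contrasts the first two paradigms with scikit-learn on synthetic data; all variable names and values are illustrative.

```python
# Minimal sketch (not from the paper): contrasting supervised and
# unsupervised learning with scikit-learn on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # three illustrative features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Supervised: labels y are available, the model learns the mapping X -> y.
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(reg.predict(X[:3]))

# Unsupervised: no labels, the model looks for structure in X alone.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(clusters[:10])
```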
Energy consumption prediction is a valuable technique
that involves forecasting the amount of energy a system or
device will use within a specific time frame. This technique
serves various purposes, such as optimizing energy usage,
predicting future energy demands, and identifying potential
inefficiencies in energy consumption. To predict energy consumption, different methods can be employed, including
statistical models, machine learning algorithms, and physics-
based models. The choice of technique depends on factors
such as data availability, system complexity, and the desired
level of accuracy. In this particular case, the focus is on
utilizing machine learning algorithms to predict energy
consumption by leveraging historical data and other relevant
factors.
The quality and relevance of the data used in machine learning algorithms greatly influence their performance. In a study conducted by Ahajjam et al. [13] on the Moroccan Buildings' Electricity Consumption Data Set, electricity consumption was categorized into three types: whole premises (WP), individual loads (IL), and circuit-level (CL) data.
(1) Labeled WP: Labeled whole premises (WP) con-
sumption data refers to electricity usage data col-
lected from 13 households in the MORED data set.
Figure 2: Smart building functionalities [11]: energy (smart meters, demand response); water (smart meters, use and flow sensing); HVAC (fans, variable air volume, air quality); elevators (maintenance, performance); access and security (badge in, cameras, perimeter integration, doors); lighting (occupancy sensing); fire (functionality checks, detector service); 24/7 monitoring (condition monitoring, parking lot utilization); and PEHV charging (charging of hybrid and electric vehicles).
This data is valuable as it includes not only the raw electricity consumption measurements but also additional information that can assist in analyzing, modeling, and comprehending the patterns of electricity usage in different households
(2) Labeled IL: Ground-truth electricity consumption refers to the electricity consumption data of individual loads (IL) that have been labeled or annotated with accurate information. This involves recording and labeling the operational states of specific loads, such as refrigerators or air conditioners, when they are turned on or off at specific times. Having this ground-truth information is valuable for researchers and analysts as it allows for accurate load disaggregation, energy management, and appliance recognition
(3) CL: Measurements in the context of energy refer to
the circuit-level energy measurements obtained from
the electrical mains of a premises. These measure-
ments provide information about the overall energy
consumption of a circuit and can be used to under-
stand the energy consumption of a group of loads
The current work focuses on three educational buildings
located at Down Town University. Further information
about these buildings will be provided next. The subsequent
section presents a literature review on energy consumption
forecasting in various buildings using multiple machine
learning algorithms.
2.1. Traditional Machine Learning Approach. ML algorithms
have been utilized to tackle the primary challenges of
physics-driven methods in load prediction. For instance,
Somu et al. [14] developed eDemand, a new building energy
use forecasting model, using long short-term memory net-
works and an improved sine cosine optimization algorithm,
and as a result, the model outperformed previous state-of-
the-art models in real-time energy load prediction. Next,
Suranata et al. [15] focused on predicting energy consumption in kitchens. They used a feature engineering technique and a long short-term memory (LSTM) model: principal component analysis (PCA) was applied to extract important features, and the LSTM model was applied to two tables. In addition, Shapi et al. [16] developed a prediction model for energy demand making use of the Microsoft Azure cloud-based machine learning framework. The methodology of the prediction model is provided using three distinct techniques, including support vector machine, artificial neural network, and k-nearest neighbors. The study focuses on
real-world applications in Malaysia, with two tenants from an industrial structure chosen as case studies. The experimental findings show that each tenant's energy consumption has a particular distribution pattern, and the suggested model can accurately estimate energy consumption for each tenant. To forecast daily energy consumption based on weather data, Faiq et al. [17] developed a new energy usage prediction technique for institutional buildings using long short-term memory (LSTM). The model, trained using Malaysian Meteorological Department weather forecasting data, outperformed support vector regression (SVR) and Gaussian process regression (GPR) with the best RMSE scores. The dropout method reduces overfitting, and Shapley's additive explanation is used for feature analysis. Accurate energy consumption estimates can help detect and diagnose system faults in buildings, aiding in energy policy implementation. Further, Kawahara et al. [18] explore the application of various machine learning models to predict voltage in lithium-ion batteries. The study includes algorithms such as support vector regression, Gaussian process regression, and multilayer perceptron. The hyperparameters of each model were optimized using 5-fold cross-validation on training data. The data set used consists of both simulation data, generated by combining driving patterns and applying an electrochemical model, and experimental data. The performance of the ML models was evaluated using both simulation and experimental data, with different data sets created to simulate variations in state of charge distribution.
2.2. Deep Learning and Hybrid Approaches. Additionally,
various networks integrate multiple techniques to devise
data-driven approaches. These integrated mechanisms are
commonly referred to as hybrid networks. For example,
Mohammed et al. [19] focus on the application of an intelligent control algorithm in HVAC systems to enhance energy efficiency and thermal comfort. The authors propose integrating SCADA systems with an intelligent building management system to optimize heat transmission coefficients and air temperature values. Genetic algorithms are employed to maintain user comfort while minimizing energy consumption. Similar to [19], Aurna et al. [20] compare the performance of ARIMA and Holt-Winters models in predicting energy consumption data in Ohio and Kentucky. The study finds that the Holt-Winters model is more accurate and effective for long-term forecasting. The authors recommend further research to consider other parameters and environmental factors, and to explore hybrid
models for better short-term load forecasting. Next, Fer-
doush et al. [21] developed a hybrid forecasting model for
time series electrical load data. The model combines random
forest and bidirectional long short-term memory methods
and was tested on a 36-month Bangladeshi electricity con-
sumption data set. The results showed that the hybrid model
outperformed standard models in terms of accuracy. The
study emphasizes the effectiveness of the hybrid machine
learning approach in improving short-term load forecasting
accuracy in the dynamic electric industry. In their study,
He and Tsang [22] developed a hybrid network combining
long short-term memory (LSTM) and improved complete
ensemble empirical mode decomposition with adaptive noise
(iCEEMDAN) to optimize electricity consumption. They
divided the initial power consumption data into patterns
using iCEEMDAN and used Bayesian-optimized LSTM to
forecast each mode independently. In the same direction,
Jin et al. [23] proposed an attention-based encoder-decoder
network with Bayesian optimization for short-term electrical
load forecasting, using a gated recurrent unit recurrent neu-
ral network for time series data modeling and a temporal
attention layer for improved prediction accuracy and
precision. Further in their study, Olu-Ajayi et al. [24] used
various machine learning techniques to predict yearly build-
ing energy consumption using a large data set of residential
buildings. The model allows designers to enter key building
design features and anticipate energy usage early in the
development process. DNN was found to be the most efficient predictive model, motivating building designers to
make informed choices and optimize structures. Jang et al.
[25] created three LSTM models to compare the eects of
incorporating operation pattern data on prediction perfor-
mance. The model using operation pattern data performed
the best, with a CVRMSE of 17.6% and an MBE of 0.6%.
The article by Ndife et al. [26] presents a smart power
consumption forecast model for low-powered devices. The
model utilizes advanced methodologies, such as the
ConvLSTM encoder-decoder algorithm, to accurately pre-
dict power consumption trends. The performance evalua-
tion of the model demonstrates improved accuracy and
computational efficiency compared to traditional methods.
Also, Duong and Nam [27] developed a machine learning
system that monitors electrical appliances to improve elec-
tricity usage behavior and reduce environmental impact.
The system utilizes load and activity sensors to track energy
consumption and operating status. After three weeks of test-
ing, the system achieved a state prediction accuracy of
93.60%. In their approach, Vennila et al. [28] propose a
hybrid model that integrates machine learning and statistical
techniques to improve the accuracy of predicting solar
energy production. The model also helps in reducing placement costs by emphasizing the significance of feature selection in forecasting. In the same context, Kapp et al. [29]
developed a supervised machine learning model to address
energy use reduction in the industrial sector. They collected
data from 45 manufacturing sites through energy audits
and used various characteristics and parameters to predict
weather dependency and production reliance. The results
showed that a linear regressor over a transformed feature
space was a better predictor than a support vector machine.
In their research, Bhol et al. [30] propose a new method for
predicting reactive power based on real power demand. They
utilize a flower pollination algorithm to optimize their model
and show that it outperforms other models like GA, PSO, and
FPA. Asiri et al. [31] used an advanced deep learning model
for accurate load forecasting in smart grid systems. They use
hybrid techniques, including LSTM and CNN, feature engi-
neering, and wavelet transforms, to enhance forecasting accuracy and efficiency. The results show significant
improvements in short-term load prediction, outperforming
traditional forecasting methods.
Table 1 contains detailed information about the algo-
rithms used, performance evaluation measurements, and
the advantages and disadvantages of each approach.
3. Methodology
This research predicts power usage in three buildings of a
private research university using a data set collected from
January 2020 to January 2023. The university is known
for its excellence in education and research across various
disciplines. The buildings under study (referred to as CLAS, NHAI, and Cronkite) are all part of the same institution and serve distinct functions. Building CLAS, an abbreviation of Center of Law and Society, mainly consists of an amphitheater and offices, and building NHAI, which stands for Nursing and Health Innovation, consists of offices and laboratories. In contrast, Cronkite consists of classrooms and seminar halls.
The buildings are equipped with IoT sensors connected to power intel sockets, and the collected data is stored on an open-source web server [32]. The prediction method will use three machine learning algorithms: long short-term memory (LSTM), random forest (RF), and gradient boosting regressor (GBR). The data will be analyzed and prepared before being used to train and test the models.
The methodology for forecasting energy consumption
will be divided into three sections:
(1) Data analysis: evaluating the raw data to understand the patterns and characteristics of the electrical power consumption data
(2) Model training: training the machine learning models on past data to identify patterns and correlations between the input characteristics and daily power use
(3) Model testing: evaluating the models using validation metrics to assess their performance and accuracy
3.1. Data Analysis
3.1.1. Data Preparation. This study focuses on the process of
data preparation in machine learning, which is time-
consuming and computationally challenging due to the pres-
ence of missing values and uneven value scales between
features. The data was prepared using two techniques: impu-
tation of missing data and standardization. The imputation
procedure was carried out using the probabilistic principal
component analysis (PPCA) approach, a maximum likeli-
hood estimate-based technique that estimates missing values
using the expectation-maximization (EM) algorithm. This
method is developed from the principal component analysis
(PCA) method, which is used for data compression or
dimensionality reduction. The resulting cleaned data was
then subjected to standardization, also known as Z-score
normalization, to ensure an even distribution of the data
above and below the mean value, as shown in equation (1):

$$x_{\text{standardized}} = \frac{x - \mu}{\sigma}, \quad (1)$$

where $\mu$ represents the mean, $\sigma$ denotes the standard deviation, and $x$ is the original data point.
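As a rough illustration of this preprocessing step, and not the authors' actual pipeline, the sketch below imputes missing values and applies Z-score standardization with scikit-learn; IterativeImputer stands in for the PPCA/EM procedure described above (an assumption), and the file and column handling are hypothetical.

```python
# Minimal sketch, not the authors' pipeline: impute missing values and
# apply Z-score standardization. IterativeImputer approximates the
# PPCA/EM imputation described in the paper (an assumption on my part).
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("building_energy.csv")          # hypothetical file name
numeric = df.select_dtypes(include="number")     # keep numeric features only

imputed = IterativeImputer(random_state=0).fit_transform(numeric)
standardized = StandardScaler().fit_transform(imputed)   # (x - mean) / std

clean = pd.DataFrame(standardized, columns=numeric.columns, index=df.index)
```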
3.1.2. Data Normality Analysis. This research conducted a normality test on each building's data set to determine its distribution. This test is crucial for model construction and is especially important for larger sample sizes. Understanding the data set distribution can provide valuable insights into the prediction outcome. Kurtosis measures distribution peaks,
Table 1: Previous research in ML-driven building energy use prediction.

Somu et al. [14]. Algorithms: (i) ARIMA; (ii) genetic algorithm-LSTM; (iii) sine cosine optimization algorithm-LSTM. Data set: the KReSIT power consumption data set, sourced from the Indian Institute of Technology (IIT) in Mumbai, India. Performance evaluation: ARIMA: MAE 0.3479, MAPE 21.3333, MSE 0.1661, RMSE 0.4076; genetic algorithm-LSTM: MAE 0.1804, MAPE 5.9745, MSE 0.0432, RMSE 0.2073; ISCOA-LSTM: MAE 0.0819, MAPE 4.9688, MSE 0.0135, RMSE 0.1164. Pros: improved forecasting accuracy; real-world applicability. Cons: sensitivity to initialization; convergence speed.

Suranata et al. [15]. Algorithm: long short-term memory. Data set: NL. Performance evaluation: RMSE 62.013, MAE 26.982, MAPE 12.876. Pros: the ability to effectively predict energy consumption patterns in time series data. Cons: time-consuming training.

Ferdoush et al. [21]. Algorithms: (i) LSTM; (ii) RF-bi-LSTM hybrid model; (iii) bidirectional long short-term memory (bi-LSTM). Data set: Bangladesh Power Development Board data covering 36 months. Performance evaluation: LSTM: MSE 0.4776, RMSE 0.691, MAE 0.5578, MAPE 148.7; bi-LSTM: MSE 0.2943, RMSE 0.5425, MAE 0.4317, MAPE 194.80; RF-bi-LSTM: MSE 0.1673, RMSE 0.4090, MAE 0.3070, MAPE 193.49. Pros: stable learning characteristics; moderate generalization gap in learning loss analysis. Cons: the hybrid model may require specific data to utilize the strengths of random forest and bidirectional LSTM effectively.

Yaqing et al. [22]. Algorithms: (i) EMD-BO-LSTM; (ii) iCEEMDAN-BO-LSTM. Data set: real power consumption data of a university campus for 12 months. Performance evaluation: EMD-BO-LSTM: MAE 155.77, RMSE 203.4, MAPE 10.41%, R2 0.8478; iCEEMDAN-BO-LSTM: MAE 40.841, RMSE 59.68, MAPE 2.5563%, R2 0.986. Pros: adaptability and efficiency; enhanced prediction accuracy. Cons: NL.

Ndife et al. [26]. Algorithm: ConvLSTM encoder-decoder. Data set: two million measurements gathered over 47 months from a residential location in Sceaux, France. Performance evaluation: RMSE of 358 kWh for the proposed model, versus 465 kWh for the persistence model, 530 kWh for model A, and 450.5 kWh for model B. Pros: improved forecast accuracy; suitable for low-powered devices; efficient training and prediction time. Cons: model complexity.

Duong et al. [27]. Algorithm: multilayer perceptron. Data set: 215 data points on the power consumption and on/off status of electrical devices, in Vietnam. Performance evaluation: RMSE 10.468, MAPE 21.563. Pros: handles large amounts of input data well; makes quick predictions after training. Cons: slow training.

Faiq et al. [17]. Algorithms: (i) LSTM; (ii) LSTM-RNN; (iii) CNN-LSTM. Data set: daily data from 2018 to 2021, from the Malaysian Meteorological Department. Performance evaluation: LSTM: RMSE 165.20, MAE 572.545; LSTM-RNN: RMSE 263.14, MAE 353.38; CNN-LSTM: RMSE 692.14, MSE 1134.1. Pros: accurate prediction of building energy consumption; improved energy efficiency. Cons: requires a significant amount of historical data to create an accurate model.

Bhol et al. [29]. Algorithms: (i) ARIMA; (ii) Holt-Winters flower pollination algorithm. Data set: laboratory-operated critical loads over three months. Performance evaluation: HW-GFPA: MBE 0.42 for validation and 0.43 for test, RMSE 0.80; ARIMA: MBE 0.073 for validation and 0.016 for test, RMSE 0.183. Pros: scalability; optimal hyperparameter identification. Cons: sensitivity to kernel selection.
while skewness measures irregular probability distribution around the mean value [33]. Equations (2) and (3) provide the formulas for skewness and kurtosis, which are essential for understanding the data set distribution and its impact on the prediction outcome:

$$\text{Skewness} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^3}{(N-1)\,\sigma^3}, \quad (2)$$

$$\text{Kurtosis} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^4}{(N-1)\,\sigma^4}, \quad (3)$$

where $N$ is the number of data points in the collection, $x_i$ are the individual data points within the sample, and $\bar{x}$ is the sample mean.
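A minimal sketch of this normality check, assuming pandas and SciPy (the paper does not state which tooling was used); the file and column names are hypothetical.

```python
# Minimal sketch (not the authors' code): computing skewness and kurtosis
# for a building's power consumption series with pandas and SciPy.
import pandas as pd
from scipy.stats import skew, kurtosis

df = pd.read_csv("clas_building.csv")            # hypothetical file name
series = df["Power_consumption (KW)"].dropna()   # column name assumed

print("skewness:", skew(series))
# fisher=True (the default) subtracts 3 from the Pearson kurtosis, so
# values below 0 indicate a platykurtic distribution, as in Table 3.
print("kurtosis:", kurtosis(series, fisher=True))
```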
3.1.3. Feature Selection. Feature engineering is a crucial aspect of machine learning, involving the creation of meaningful data representations to enhance model performance. It involves careful selection, transformation, and creation of features that capture relevant information from raw data, enhancing predictive accuracy and interpretability. Techniques like principal component analysis, domain knowledge extraction, and creative data manipulation help models extract patterns and make accurate predictions, bridging the gap between raw data and actionable insights.
As previously stated, our data set comprises 27 features detailing the characteristics of the selected buildings. To ensure optimal input for our predictive model, we employed a feature engineering approach leveraging a tree-based model, specifically the random forest algorithm.
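The sketch below illustrates this kind of tree-based feature ranking with scikit-learn's RandomForestRegressor; it is not the authors' code, and the file, column names, and target are assumptions. Sorting feature_importances_ and keeping the top-ranked columns mirrors the selection summarized later in Table 4.

```python
# Minimal sketch (not the authors' code): ranking features with a
# tree-based model, here a random forest regressor.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("building_energy.csv")                 # hypothetical file
numeric = df.select_dtypes(include="number")
X = numeric.drop(columns=["Power_consumption (KW)"])    # assumed target name
y = numeric["Power_consumption (KW)"]

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranking = (pd.Series(rf.feature_importances_, index=X.columns)
             .sort_values(ascending=False))
print(ranking.head(5))   # the most influential features, as in Table 4
```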
3.2. Model Development. This study uses supervised machine learning to predict energy usage, with the prepared data split into training and testing groups. The model employs regression-based prediction using random forest, LSTM, and gradient boosting regressor. The process from data collection to model generation is depicted in Figure 3.
3.2.1. Random Forest. A random forest regressor is a machine learning method that combines multiple decision trees to create a predictive model for regression tasks. Each tree is constructed using a randomly selected subset of the training data and features, giving an ensemble $\{h(x; \theta_k),\ k = 1, \ldots, K\}$, where $x$ represents the observed input (covariate) vector of length $p$ with associated random vector $X$. During prediction, the regressor aggregates the predictions from all trees to generate the final output, typically the average of the individual tree predictions, $\hat{h}(x) = \frac{1}{K}\sum_{k=1}^{K} h(x; \theta_k)$ [34]. This method is commonly used for pattern identification and prediction due to its ability to learn complicated behavior; consequently, it is the best choice for constructing the prediction model in the present study. In Figure 4, we present a flow chart of the random forest algorithm.
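A minimal, hypothetical scikit-learn sketch of such a random forest regressor (not the configuration used in the study); the data file, column names, and 70/30 split are assumptions.

```python
# Minimal sketch (not the authors' code): a random forest regressor that
# averages the predictions of K decision trees, h(x) = (1/K) sum_k h(x; theta_k).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("building_energy.csv")                 # hypothetical file
numeric = df.select_dtypes(include="number")
X = numeric.drop(columns=["Power_consumption (KW)"])    # assumed target name
y = numeric["Power_consumption (KW)"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)
rf = RandomForestRegressor(n_estimators=200,     # K trees in the ensemble
                           max_features="sqrt",  # feature subsampling per split
                           random_state=0)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)                      # average of the tree outputs
```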
3.2.2. Long Short-Term Memory. Sepp Hochreiter and Juer-
gen Schmidhuber introduced long short-term memory
(LSTM) in 1997 as an advanced application of recurrent
neural networks. LSTM is effective in processing and pre-
dicting time series data with varying durations. It captures
long-term relationships, handles variable-length sequences,
and recalls previous data, making it useful for energy con-
sumption prediction [35]. The LSTM model structure con-
sists of three layers: input, LSTM unit, and output. The
mathematical equations used in LSTM include the forget
gate, input gate, output gate, and cell state. The following
are the equations utilized in LSTM:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \qquad f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f),$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C), \qquad C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t,$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad h_t = o_t \cdot \tanh(C_t), \quad (4)$$

where $x_t$ is the input at step $t$; $i_t$, $f_t$, and $o_t$ are the input, forget, and output gate vectors; $\tilde{C}_t$ is the candidate activation vector (denoted $g_t$ in Figure 5); and $C_t$ is the cell state at time $t$.
The LSTM algorithm is a powerful tool for collecting
and transmitting information across long sequences. It is
commonly used in applications such as audio recognition,
natural language processing, and time series analysis. Based
on previous research and the availability of a time series data
set, LSTM is chosen as the algorithm for predicting energy
with high precision. Figure 5 presents a flowchart of LSTM.
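The following sketch shows one plausible Keras implementation of a small LSTM forecaster trained on sliding windows of a consumption series; it is illustrative only, and the architecture, window length, and synthetic series are not taken from the paper.

```python
# Minimal sketch (not the authors' architecture): a small Keras LSTM for
# one-step-ahead forecasting on sliding windows of a consumption series.
import numpy as np
import tensorflow as tf

def make_windows(series, window=24):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X)[..., np.newaxis], np.array(y)

series = np.sin(np.linspace(0, 60, 1000))        # placeholder for real kW data
X, y = make_windows(series)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X.shape[1], 1)),
    tf.keras.layers.LSTM(64),                    # LSTM unit layer
    tf.keras.layers.Dropout(0.2),                # dropout against overfitting
    tf.keras.layers.Dense(1),                    # output layer
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```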
3.2.3. Gradient Boosting Regressor. The gradient boosting
approach is an iterative method that combines weak learners
to create a strong learner by focusing on errors at every step.
It is aimed at decreasing the loss function by finding an approximation of the function F(x) that maps x to y. This method improves prediction performance and
lowers prediction error by matching weak learner models
to the loss function [36]. The squared error function is often
used to estimate the approximation function, which is then
used to find the ideal settings for weak learners. The gradient boosting regressor's mathematical equation is as follows:

$$\hat{y}_i = F(x_i) + \sum_{m=1}^{M} \gamma_m h_m(x_i), \quad (5)$$

where $\hat{y}_i$ is the predicted target, $x_i$ is the input feature vector, $F(x_i)$ is the ensemble model prediction, $M$ is the number of weak models, $\gamma_m$ is the learning rate, and $h_m(x_i)$ is the prediction of the $m$-th weak model. The current research utilized gradient boosting
due to its robust predictive performance, ability to capture complex data linkages and nonlinear patterns, and flexibility and customization capabilities. Figure 6 depicts the gradient boosting regressor algorithm's flow chart.
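A brief, hypothetical scikit-learn sketch of a gradient boosting regressor; the hyperparameters and synthetic data are illustrative and do not reflect the study's configuration.

```python
# Minimal sketch (not the authors' configuration): gradient boosting fits
# weak learners sequentially to the residuals of the previous stage.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

gbr = GradientBoostingRegressor(n_estimators=300,   # number of weak learners M
                                learning_rate=0.05, # shrinkage, i.e. gamma
                                max_depth=3,        # depth of each weak tree
                                random_state=0)
gbr.fit(X_train, y_train)
print("R^2 on the test split:", gbr.score(X_test, y_test))
```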
3.3. Model Evaluation. The data set was divided into a train-
ing group (25%) and a testing group (75%). The training
group was used to train machine learning algorithms and
create predictive models for maximum consumption data.
The testing group was used to evaluate the performance of
these models. This process is illustrated in Figure 7.
The training and testing process involved a simple partitioning of data to prevent overfitting. The machine learning algorithms' predictive models were evaluated for performance
and accuracy using metrics like R2, MSE, MAE, RMSE, and MAPE. Each measurement definition is provided in Table 2.
The present research used MSE because of its sensitivity to errors, differentiability, and simplicity of interpretation. The use of RMSE is preferable to MSE because it yields a more easily understandable outcome in the original units of the dependent variable, facilitating straightforward comparison across data sets or models. The mean absolute error (MAE) is a suitable metric where the magnitude of errors matters more than their direction, offering a clear and direct evaluation of the model's performance, and MAPE is particularly valuable for relating a model's prediction accuracy to the scale of the actual values.
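As an illustration, the metrics defined in Table 2 can be computed with scikit-learn as in the sketch below; the example values are a few entries from Table 6, and this is not the authors' evaluation code.

```python
# Minimal sketch (not the authors' code): computing the evaluation metrics
# from Table 2 for a small set of predictions.
import numpy as np
from sklearn.metrics import (r2_score, mean_squared_error,
                             mean_absolute_error,
                             mean_absolute_percentage_error)

y_true = np.array([5364.07, 5902.25, 5915.77, 5496.93])   # values from Table 6
y_pred = np.array([5511.33, 5881.72, 5900.72, 5491.65])   # GBR test values

mse = mean_squared_error(y_true, y_pred)
print("R2  :", r2_score(y_true, y_pred))
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))
print("MAE :", mean_absolute_error(y_true, y_pred))
print("MAPE:", mean_absolute_percentage_error(y_true, y_pred) * 100)  # in %
```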
Figure 3: Process of generating the predictive model after data preparation (data preparation → training data (70%) → learning algorithms: random forest, long short-term memory, gradient boosting regressor → predictive model).
Figure 4: Random forest algorithm flowchart (input data → random subset sampling → feature subsampling → construct trees → prediction by trees → aggregation → final prediction).
Figure 5: Long short-term memory algorithm flowchart (input x_t → memory cell with forget gate f_t, input gate i_t, candidate cell g_t, and output gate o_t → hidden state h_t).
Figure 6: Gradient boosting regressor flow chart (input data set → initialize ensemble → for each iteration: calculate pseudo-residuals, fit weak learner, update ensemble → aggregate predictions → final prediction).
4. Results and Discussion
The experiment results were reviewed in sections, discuss-
ing the initial processing and imputation of missing data,
energy consumption prediction for each building, and per-
formance comparisons for random forest, long short-term
memory, and gradient boosting regressor models. The pre-
sentation of results follows a hierarchy, starting with the
normality test, then data preprocessing, and finally model
evaluation.
4.1. Normality Testing of Data. The evaluation is aimed at
examining the impact of data shape on predictive model
development performance, using measures of skewness and
kurtosis. Results were compiled in Table 3 to evaluate the
data's shape and potential deviations from normal distribu-
tion. To evaluate the normality of the energy demand data,
the two values were computed using the aggregated data
from each building spanning from January 2020 to January
2023. Figure 8 also depicts the format of the data set for a
graphical examination of normality.
Based on Table 3, the data sets for the CLAS, NHAI, and Cronkite buildings were approximately symmetrical and skewed with a bidirectional shape distribution. However, there were some differences in the skewness values for each building. The CLAS building showed normal asymmetry for power consumption and KWS, with a slightly negative skewness indicating a longer left tail. The CHWTON distribution was skewed, with a skewness of 0.42, indicating a longer right tail. The Nursing and Health Innovation building had a pronounced asymmetry, with power consumption having a positive skewness and KWS and CHWTON having a negative skewness, indicating balanced tails. The Cronkite building had positive skewness values, indicating a moderate right-skewed distribution. Overall, all three data sets were approximately symmetric, skewed, and bimodal in their form density.
The kurtosis values of all three buildings in Table 3
were less than 0, indicating that their distributions were
Figure 7: Testing procedure for the trained predictive model (data preparation → 30% data split → generate predictive values → comparison with the actual recorded maximum consumption using root mean square error, mean absolute error, mean squared error, and mean absolute percentage error).
Table 2: Performance metrics.

R-squared [37]: The coefficient of determination is used to determine how much of the variance in the dependent variable can be explained by the independent variables. Math form: $R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$.

Mean squared error [38]: A regression metric used to calculate the average squared difference between predicted and actual values. Math form: $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.

Root mean squared error [39]: A widely used measure for estimating the average variance between predicted and real values in regression tasks. Math form: $\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$.

Mean absolute error [40]: A regression statistic used to calculate the average absolute difference between predicted and actual values, ignoring the direction of mistakes. Math form: $\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$.

Mean absolute percentage error [39]: A commonly used method for determining forecasting error; it measures the average absolute percent inaccuracy between predicted and actual values relative to the actual values, making it simpler to understand due to its scaled units. Math form: $\text{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100$.
platykurtic. This was also evident in Figure 8, where the
probability distribution plot had a higher tail and a larger
peak center. However, the Cronkite building had a kurto-
sis value greater than 0, indicating a leptokurtic distribu-
tion with higher variance. CLAS and NHAI had roughly
normal distributions, but CLAS had a lower mean than
the median. Department CLAS also had an almost normal
distribution but with higher skewness and kurtosis. The
CHWTON data set had a higher variation compared to
the other data sets.
Table 3: Measurements of skewness and kurtosis for the buildings.
Measure Building name Power consumption KWS CHWTON
Skewness
CLAS -0.10206 -1 0.42329
NHAI 0.017914 -1 -1
Cronkite 0.805914 -1 0.494
Kurtosis
CLAS -0.333548 -2.0 -1.019492
NHAI -0.777519 -2.0 -2
Cronkite 2.620576 2.056333 -0.687182
Figure 8: Probability density for buildings CLAS, NHAI, and Cronkite (power consumption in kW and CHWTON).
4.2. Data Preprocessing. Based on Figure 9, the original data
set had various scale ranges for power consumption factors
like KWS, CHWTON, voltage, and building occupants. To
verify the prediction capacity of 29 features, multiple
approaches like correlation analysis, ensemble analysis, and
tree-based models were used. After testing the mentioned
methods, the most suitable features for projecting energy demand and consumption are as follows:
(1) Previous consumption patterns
(2) Calendar: weekday, month, and season
(3) Demography: a building's population might influence consumption patterns
(4) Geographical factors such as climate: people use more electrical appliances at high and at low temperatures
The study on missing data utilized the missingness
matrix to quantify the extent of missing data and identify
rows that contained missing values. Upon analyzing
Figure 10, it is noteworthy that none of the three data sets
exhibited any missing data.
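A small sketch of how such a missingness check can be produced with pandas and the missingno library; the paper does not name its tooling, so the library choice, file name, and column list are assumptions.

```python
# Minimal sketch (not the authors' code): visualizing missing data with the
# missingno matrix, similar to Figure 10, and counting missing values.
import pandas as pd
import missingno as msno
import matplotlib.pyplot as plt

df = pd.read_csv("clas_building.csv")   # hypothetical file name
print(df.isna().sum())                  # per-column count of missing values

msno.matrix(df[["Power_consumption (KW)", "KWS", "CHWTON", "Weekday"]])
plt.show()
```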
Figure 9: Summary of the transformed data set for the CLAS building (sample rows showing campus, building number, timestamp, day, hour, KW, KWS, CHWTON, gas usage, heating mmBTU, and derived totals such as Total#Houses, Totallightbulbs, Totalgalsgas, GHG, and DOW).
4.3. Feature Selection. Selecting the most crucial features plays a vital role in enhancing the effectiveness, stability, and scalability of our prediction model. Through the utilization of a feature importance assessment method, as summarized in Table 4, we identified the top five influential features: KW, KWS, CHWTON, total houses, and CHWTONgalsgas. The ranking of these features is illustrated in Figure 11, which shows the order of their importance. Although the initial analysis considered all 29 parameters, the figure only highlights features that significantly contribute to precision, ensuring a streamlined and informative depiction.
The study is aimed at predicting energy consumption in three educational buildings by identifying key parameters. Through feature selection, we have identified key parameters that significantly impact energy usage. These include "CHWTON", or chilled water tons, which measures the cooling capacity of chilled water systems, representing the heat energy required to melt one ton of ice in 24 hours. Additionally, "KW" denotes the power consumption of electrical equipment and lighting systems within the buildings. "Totallightbulb" denotes the aggregate number of light bulbs or lamps within the buildings, crucial for various assessments. Furthermore, aspects of HVAC systems, like "CHWTONgalsgas", offer insights into chilled water and gas usage. Moreover, "Combined mmBTU" measures the heat required to raise the temperature of water by one degree Fahrenheit. The feature selection process helps identify the most influential parameters for the predictive model, enabling more accurate energy consumption forecasts.
4.4. Performance Evaluation and Comparison. The prediction models' performance was evaluated by comparing the methods for each building after training and testing. Comparative results are shown in Table 5.
Based on the performance evaluation measurements pre-
sented in Table 5, the GBR method exhibited outstanding
performance across all buildings. Notably, the determination
coefficients were remarkably high, reaching 0.998 for Cron-
kite, 0.984 for CLAS, and 0.845 for NHAI. Furthermore,
the corresponding mean squared error (MSE) values were
8.148, 5.09, and 9.17, respectively. The root mean squared
error (RMSE) and mean absolute error (MAE) also sup-
ported these results, indicating that GBR outperformed
other methods and yielded the best values. Additionally,
when assessing the mean absolute percentage error (MAPE)
results, GBR surpassed the other methods, demonstrating
the lowest error percentage. The LSTM method exhibited
lower determination coefficients compared to the GBR
results, with values of 0.86 for CLAS, 0.7772 for NHAI,
and 0.7609 for Cronkite. However, when comparing LSTM
to the RF method, the performance varied across buildings.
Figure 10: Missingness graph of the CLAS building (1097 records; columns: time reviewed, power consumption (KW), KWS, CHWTON, and weekday).
Table 4: Feature importance.
Features Importance
CHWTON 0.303456
CHWTONgalsgas 0.23327
Total houses 0.291235
KW 0.353657
HTmmBTU 0.083478
Combined mmBTU 0.183562
HTmmBTUgalsgas 0.285535
Total light bulbs 0.229835
Specifically, in the Cronkite building, the random forest method outperformed LSTM with an R2 value of 0.89. Nevertheless, in terms of other metrics such as MSE and RMSE, LSTM yielded comparably smaller values than the RF method. Moreover, there was a significant difference in the MAPE results, with LSTM generating fewer errors compared to random forest. This observation suggests that, in terms of errors, LSTM performed better than RF. According to the forecast evaluation, the squared error method was deemed a more suitable evaluation metric for assessing the accuracy of the predictions. Following this examination, it became clear that the gradient boosting regressor (GBR) method performed the best across all buildings.
Considering the data presented in Table 6, it is evident
that the algorithm closest to the real testing values is the gra-
dient boosting regressor, demonstrating good precision. The
Figure 11: Feature importance ranking (KW, KWS, CHWTON, HTmmBTU, Combined mmBTU, Totalgalsgas, Totallightbulbs, Total#Houses, CHWTONgalsgas, HTmmBTUgalsgas).
Table 5: Predictions for performance evaluation using trained models.
Building Method R2 MSE RMSE MAE MAPE
CLAS RF 0.8506 27.245 16.530 219.73 76.947
CLAS LSTM 0.8669 11.0921 13.3036 79.0677 56.0298
CLAS GBR 0.984 8.148 9.335 71.722 40.2587
NHAI RF 0.479 56.293 20.439 47.78 8.45123
NHAI LSTM 0.8372 27.10199 19.2844 33.0788 48.11382
NHAI GBR 0.795 15.089 17.4370 32.675 52.57254
Cronkite RF 0.89318 19.821 10.8491 117.704 56.54793
Cronkite LSTM 0.76096 26.12360 7.3153 29.09945 64.8606
Cronkite GBR 0.99817 9.1734 4.04234 16.3405 36.34167
Table 6: Real and predicted average consumption for each method.
Real values GBR test LSTM test RF test
5364.07 5511.33 5646.09 5666.75
5902.25 5881.72 5811.137 5822.69
5915.77 5900.722 5871.49 5870.67
5496.93 5491.65 5499.29 5496.55
5512.42 5523.29 5535.02 5535.18
6173.30 6178.63 6366.57 6354.28
6141.73 6296.113 6345.22 6365.09
6302.20 6364.17 6371.89 6389.07
6182.52 6182.480 6234.04 6240.37
6251.62 6171.9 6166.97 6168.348
6251.62 5606.99 5530.53 5527.52
5602.05 5626.398 5637.729 5527.52
5678.72 6769.29 6437.53 5641.79
6842.53 6893.64 6884.76 6417.95
6980.2 6701.16 6606.624 6876.06
6767.83 6695.89 6520.20 6622.32
6568.83 6394.089 6358.6 6506.90
5789.52 5730.96 5596.32934 6355.81
Table 7: Cross-validation score for models.
Algorithm RF LSTM GBR
Validation score 0.83 0.92 0.95
long short-term memory (LSTM) method follows in second place, and the random forest algorithm comes last in terms of accuracy in predicting average consumption. In the context of result validation, K-fold cross-validation is a highly suitable technique for our case due to its inherent advantages. By partitioning the data set into K subsets, each containing a representative sample of the data, K-fold cross-validation ensures thorough training and validation of the model. This approach maximizes data utilization and minimizes bias, as every data point is used for both training and validation across different folds. Furthermore, the averaging of performance metrics over multiple splits provides a robust evaluation, effectively reducing the variance associated with a single train-test split. Additionally, K-fold cross-validation facilitates better generalization by assessing the model's performance across diverse subsets of the data, ensuring that it can effectively handle various scenarios. Its utility extends to hyperparameter tuning, enabling the comparison of different parameter configurations across multiple validation sets.
In our scenario, we chose 5-fold cross-validation given our moderate data set size, balancing computational efficiency and robust performance estimation. This method ensures reliable model evaluation without excessive computational overhead and aligns with common practices in the field, allowing easier comparison with existing literature and benchmarks. Table 7 provides the outcome of the 5-fold cross-validation.
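A minimal sketch of 5-fold cross-validation with scikit-learn's cross_val_score, shown here on synthetic data; it is illustrative and not the authors' validation code.

```python
# Minimal sketch (not the authors' code): 5-fold cross-validation of the
# gradient boosting regressor, reporting per-fold and mean R^2 scores.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=8, noise=5.0, random_state=0)

gbr = GradientBoostingRegressor(random_state=0)
scores = cross_val_score(gbr, X, y, cv=5, scoring="r2")
print("fold scores:", scores.round(3))
print("mean validation score:", scores.mean().round(3))
```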
A line graph comparison was used to better demonstrate the difference between the actual and anticipated average consumption levels, as depicted in Figures 12-14. In addition, Figures 15-17 show the graphical presentation of the regression line for the three buildings. In the CLAS and Cronkite buildings, the gradient boosting regressor (GBR) produces a symmetric regression line, indicating that its predicted values closely align with the actual ones. Conversely, for the NHAI building data set, characterized by nonsymmetrical data, long short-term memory (LSTM) outperforms the other models due to its ability to capture temporal dependencies.
However, in the case of NHAI, the performance difference between LSTM and GBR is minimal, highlighting the suitability of both algorithms for different data characteristics. GBR excels in all cases, while LSTM's recurrent nature makes it valuable for handling nonlinear, time-dependent data.
Figure 12: Real and predicted average consumption for the CLAS building (observed versus values predicted by GBR, LSTM, and RF).
From the analysis of all the tables and figures, we conclude that the best performances are consistently achieved by the gradient boosting regressor (GBR). GBR trains weak learners sequentially, correcting errors from previous iterations and fine-tuning the model's predictive capabilities with each step. Additionally, gradient descent optimization minimizes prediction errors, leading to more accurate predictions. Following GBR, long short-term memory (LSTM) stands out as it is specifically designed for handling sequential data, making it well suited for time series forecasting and similar tasks. Its ability to understand and process temporal patterns contributes to accurate predictions in time-dependent scenarios. Lastly, the random forest algorithm also delivers good results, particularly when it comes to capturing complex nonlinear correlations between the features and the target variable, and its ability to model complex interactions and patterns makes it effective.
The CLAS building has a significantly higher energy consumption rate, exceeding 30 kWh, in contrast to the other buildings. The main reason for this difference is its large surface area and its simultaneous use for many educational purposes. On the other hand, the Cronkite building has an energy consumption rate of 26 kWh per hour, while NHAI has a consumption rate of 12 kWh per hour. Predictive modeling approaches are necessary for efficient energy allocation and management. In this particular instance, the gradient boosting regressor model demonstrates its superiority in effectively predicting outcomes for both the CLAS and Cronkite buildings. The choice is backed by the model's remarkable performance metrics, as shown by its coefficient of determination (R-squared) values of 0.99 for Cronkite and 0.98 for CLAS. This model improves the accuracy of forecasting by offering proactive insights into the energy needs of each building. It also helps in preventing energy loss before it happens and promotes efforts to reduce energy usage.
5. Comparison with the Previous Study
The study compared three algorithms: random forest,
LSTM, and gradient boosting regressor, revealing their per-
formance in forecasting monthly average consumption.
Figure 13: Real and predicted average consumption for the Cronkite building (observed versus values predicted by GBR, LSTM, and RF).
The development of prediction models demonstrated their capabilities, urging further optimization. The findings also led to a comparative analysis with previous machine learning studies. In the first research, conducted by Khaoula et al. in 2022 [40], four machine learning algorithms were implemented to predict energy demand for a commercial building over two years. The algorithms used were multiple linear regression (MLR), long short-term memory (LSTM), simple linear regression (LR), and random forest (RF). The results indicated that LSTM performed the best, followed by RF, MLR, and LR, providing valuable insights into the regression algorithms' capabilities. In the second research, Khaoula et al. in 2023 [41] examined energy consumption prediction in a low-energy house over four months. Unlike the first research, this time the prediction considered not only the house's energy but also its appliances. Three machine learning algorithms, namely, artificial neural networks (ANN), recurrent neural networks (RNN), and random forest (RF), were employed for the tests. Recurrent neural networks, especially LSTM, once again outperformed the other algorithms, achieving an impressive accuracy of 96%. RF followed with 88% accuracy. However, ANN yielded negative predictions, indicating its unsuitability for time
series data sets. Furthermore, in their research, Khaoula
et al. [42] used three deep learning algorithmsrecurrent
neural networks (RNNs), articial neural networks (ANNs),
and autoregressive neural networks (AR-NNs)to forecast
the total load of HVAC systems. The results showed that
the autoregressive neural network model outperformed the
other two due to its ability to capture temporal dependencies
and patterns in time series data, which is crucial for HVAC
load prediction. AR-NNs use a simpler architecture, focus-
ing on past observations to predict future values, and their
autoregressive nature allows them to eectively model the
self-dependence of time series data, leading to more accurate
predictions.
Drawing insights from these three studies, signicant
ndings emerge regarding the ecacy of regression algo-
rithms for energy consumption prediction. Specically, long
short-term memory (LSTM) and random forest (RF) consis-
tently emerge as top performers, especially in handling time
series data. However, our research introduces a novel aspect
by exploring the eectiveness of gradient boosting regressor
(GBR), which yielded exceptional results.

Figure 14: Real and predicted average consumption for the NHAI building (observed values versus the RF, LSTM, and GBR predictions).
Notably, GBR achieved remarkable precision, with an accuracy of 98.9%. Moreover, compared with the other algorithms, GBR produced fewer errors, as evidenced by its lower root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) values. This underscores the potential of GBR as a formidable contender for energy consumption prediction tasks and a promising alternative to LSTM and RF in certain contexts.
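For convenience, the three error metrics compared here are conventionally defined as follows, where n is the number of observations, y_i the observed consumption, and \hat{y}_i the predicted value; lower values indicate a better fit, and MAPE is expressed as a percentage:

\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad
\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|.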
6. Perspectives and Future Work

For future contributions, we plan to optimize the GBR model by increasing the amount of data used for training and prediction, which may improve its efficiency and performance on larger data sets. We also intend to apply a novel approach to the gradient boosting optimizer in order to fine-tune the model's parameters and hyperparameters more effectively. These efforts are aimed at enhancing the GBR algorithm's performance for accurate energy consumption forecasting and other applications.
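As an illustration of the hyperparameter fine-tuning mentioned above, a grid search over a scikit-learn GradientBoostingRegressor could look as follows; the parameter ranges and scoring choice are assumptions for illustration, not the configuration used in this work (the MAPE scorer requires scikit-learn 0.24 or later).

```python
# Hypothetical hyperparameter search for a GBR forecaster; the grid is
# illustrative only, and X_train / y_train are assumed to be prepared elsewhere.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

param_grid = {
    "n_estimators": [200, 400, 800],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [2, 3, 4],
    "subsample": [0.7, 1.0],
}

search = GridSearchCV(
    GradientBoostingRegressor(),
    param_grid,
    cv=TimeSeriesSplit(n_splits=5),                  # respects temporal ordering
    scoring="neg_mean_absolute_percentage_error",
    n_jobs=-1,
)
# search.fit(X_train, y_train)
# print(search.best_params_, -search.best_score_)
```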
Another significant contribution of our future research lies in the use of transformer models for predicting diurnal energy consumption patterns. Transformers, originally designed for natural language processing tasks, have shown remarkable capabilities in capturing long-range dependencies in sequential data, which makes them well suited to time series forecasting as well. By applying transformer architectures to predict diurnal energy consumption, we aim to leverage their ability to effectively model the complex
temporal patterns and dependencies inherent in energy consumption data. Our case study focuses on commercial and institutional buildings, where accurate energy consumption prediction is crucial for optimizing building operations, reducing costs, and minimizing environmental impact.

Figure 15: Regression line between observations and predictions (electricity usage, kWh) for the CLAS building.

Figure 16: Regression line between observations and predictions (electricity usage, kWh) for the Cronkite building.

Figure 17: Regression line between observations and predictions (electricity usage, kWh) for the NHAI building.
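To make the transformer direction discussed in this section more concrete, the following is a minimal attention-based forecaster sketched in Keras; the window length, layer sizes, and single-feature input are illustrative assumptions rather than a finalized architecture, and positional encodings are omitted for brevity.

```python
# Hypothetical transformer-style encoder for one-step-ahead load forecasting.
from tensorflow.keras import layers, Model

WINDOW = 48      # assumed context: 48 hourly readings
N_FEATURES = 1   # assumed univariate consumption input

inputs = layers.Input(shape=(WINDOW, N_FEATURES))
x = layers.Dense(32)(inputs)                               # project to model dimension
attn = layers.MultiHeadAttention(num_heads=4, key_dim=32)(x, x)
x = layers.LayerNormalization()(x + attn)                  # self-attention + residual
ff = layers.Dense(64, activation="relu")(x)
ff = layers.Dense(32)(ff)
x = layers.LayerNormalization()(x + ff)                    # feed-forward block + residual
x = layers.GlobalAveragePooling1D()(x)                     # pool over the time axis
outputs = layers.Dense(1)(x)                               # next-step consumption (kWh)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mae")
model.summary()
# model.fit(X_windows, y_next, epochs=50, validation_split=0.2)  # windowed data assumed
```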
7. Conclusion

Our major focus in this research is developing an energy consumption forecasting model for three institutional buildings that have adopted the smart building ecosystem. The energy consumption data collected from January 2020 to January 2023 was subjected to statistical analysis to assess its normality; the skewness and kurtosis values showed that the data exhibited a variety of distribution characteristics.
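The skewness and kurtosis screening described above can be reproduced with SciPy as in the brief sketch below; the file and column names are placeholders rather than the actual data layout.

```python
# Hypothetical normality screening of a consumption series.
import pandas as pd
from scipy import stats

df = pd.read_csv("building_energy.csv")           # assumed file layout
series = df["consumption_kwh"].dropna()           # placeholder column name

print("skewness:", stats.skew(series))
print("kurtosis:", stats.kurtosis(series))        # excess kurtosis (normal ~ 0)
```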
The development of the predictive models involved data preprocessing, which included handling missing data and identifying feature importance; a brief illustration of these steps follows this paragraph. For this research's objective, three supervised machine learning methods, namely gradient boosting regressor (GBR), long short-term memory (LSTM), and random forest (RF), were selected as the algorithms for the predictive model. The comparison of these strategies was based on an assessment of their model structures and prediction abilities. The results of our model training and testing indicated that each strategy performed
differently for each building. Remarkably, the GBR approach consistently produced the most promising outcomes, cementing its position as the best-performing strategy across all three buildings: CLAS, NHAI, and Cronkite. GBR's mean absolute percentage error (MAPE) values were 9.337, 12.338, and 4.045 for CLAS, NHAI, and Cronkite, respectively. Additionally, GBR achieved a lower mean absolute error (MAE) for CLAS and Cronkite (71.04 and 53.77, respectively), whereas RF and LSTM yielded higher MAE values for these two buildings. Moreover, when average consumption was computed from the demand data, the gradient boosting regressor again displayed greater accuracy in anticipating demand, outperforming all other approaches in all buildings.
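A rough sketch of the preprocessing steps mentioned earlier in this section (gap filling and feature-importance ranking) is given below; the file name, column names, and candidate features are assumptions for illustration, not the study's actual inputs.

```python
# Hypothetical preprocessing sketch: impute gaps, then rank feature importance.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("building_energy.csv", parse_dates=["timestamp"])
df = df.set_index("timestamp").sort_index()

# Fill short gaps in the meter readings by time-based interpolation.
df["consumption_kwh"] = df["consumption_kwh"].interpolate(method="time")

features = ["outdoor_temp", "hour_of_day", "day_of_week", "occupancy"]
X = df[features].ffill()
y = df["consumption_kwh"]

# Quick impurity-based ranking of the candidate features.
rf = RandomForestRegressor(n_estimators=200, n_jobs=-1).fit(X, y)
ranking = pd.Series(rf.feature_importances_, index=features).sort_values(ascending=False)
print(ranking)
```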
In terms of recommendations for future study, it is suggested to use more powerful computers or platforms to run the LSTM algorithm, which could improve its performance. Additionally, exploring hybrid or ensemble methods may be beneficial, as they have shown higher accuracy than single regressors. Lastly, a comparison with another smart building could be included to benchmark and validate the obtained results. These recommendations can further enhance the understanding and applicability of the energy consumption predictive model.
Data Availability

The collected data was saved on an open-source website server [43] and can be downloaded manually from the platform's website as a CSV file at any level of aggregation [43] (https://portal.emcs.cornell.edu/d/2/dashboard-list?orgId=2).
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors thank the National Center for Scientific and Technical Research (CNRST) for supporting and funding this research.
References
[1] H. Farzaneh, L. Malehmirchegini, A. Bejan, T. Afolabi, A. Mulumba, and P. P. Daka, "Artificial intelligence evolution in smart buildings for energy efficiency," Applied Sciences, vol. 11, no. 2, p. 763, 2021.
[2] E. Khaoula, B. Amine, and B. Mostafa, "Machine learning and the Internet of Things for smart buildings: a state of the art survey," in 2022 2nd International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), pp. 1-10, Meknes, Morocco, 2022.
[3] A. Widya-Hasuti, A. Mardani, D. Streimikiene, A. Sharifara, and F. Cavallaro, "The role of process innovation between firm-specific capabilities and sustainable innovation in SMEs: empirical evidence from Indonesia," Sustainability, vol. 10, no. 7, p. 2244, 2018.
[4] MarketsandMarkets, Smart Building Market by Component (Solution and Services), Solution (Safety and Security Management, Energy Management, Building Infrastructure Management, Network Management, and IWMS), Services, Building Type, Region - Global Forecast to 2025, 2021.
[5] K. B. Anacker, Healthy Buildings: How Indoor Spaces Drive Performance and Productivity, J. G. Allen and J. D. Macomber, Eds., Harvard University Press, Cambridge, 2020.
[6] American Council for an Energy-Efficient Economy, "Building technologies," n.d., https://www.aceee.org/topics/building-technologies.
[7] US Department of Energy, "Energy efficient commercial buildings," n.d., https://www.energy.gov/eere/buildings/energy-efficientcommercial-buildings.
[8] Intel, "Intelligent buildings: saving energy, making occupants happy," 2017, https://www.intel.com/content/www/us/.
[9] Z. Chen, C. Lin, X. Zhou, L. Huang, M. Sandanayake, and P.-S. Yap, "Recent technological advancements in BIM and LCA integration for sustainable construction: a review," Sustainability, vol. 16, no. 3, p. 1340, 2024.
[10] IEA, World Energy Outlook 2022, International Energy Agency (IEA), Paris, France, 2022.
[11] "Smart building market - by application (commercial, residential, and industrial), by automation type (energy management, infrastructure management, network and communication management), by service (professional service and managed services) and by region: global industry perspective, comprehensive analysis, and forecast 2021-2028," https://www.zionmarketresearch.com/report/smart-buildingmarket.
[12] S. Seyedzadeh, F. P. Rahimian, I. Glesk, and M. Roper, "Machine learning for estimation of building energy consumption and performance: a review," Visualization in Engineering, vol. 6, pp. 1-20, 2018.
[13] M. A. Ahajjam, D. B. Licea, C. Essayeh, M. Ghogho, and A. Kobbane, "MORED: a Moroccan buildings' electricity consumption dataset," Energies, vol. 13, no. 24, p. 6737, 2020.
[14] N. Somu, M. R. Gauthama Raman, and K. Ramamritham, "A hybrid model for building energy consumption forecasting using long short term memory networks," Applied Energy, vol. 261, article 114131, 2020.
[15] I. W. A. Suranata, I. N. K. Wardana, N. Jawas, and I. K. A. A. Aryanto, "Feature engineering and long short-term memory for energy use of appliances prediction," TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 19, no. 3, pp. 920-930, 2021.
[16] M. K. M. Shapi, N. A. Ramli, and L. J. Awalin, "Energy consumption prediction by using machine learning for smart building: case study in Malaysia," Developments in the Built Environment, vol. 5, article 100037, 2021.
[17] M. Faiq, K. G. Tan, C. P. Liew et al., "Prediction of energy consumption in campus buildings using long short-term memory," Alexandria Engineering Journal, vol. 67, pp. 65-76, 2023.
[18] T. Kawahara, K. Sato, and Y. Sato, "Battery voltage prediction technology using machine learning model with high extrapolation accuracy," International Journal of Energy Research, vol. 2023, Article ID 5513446, 17 pages, 2023.
[19] S. A. Mohammed, O. A. Awad, and A. M. Radhi, "Optimization of energy consumption and thermal comfort for intelligent building management system using genetic algorithm," Indonesian Journal of Electrical Engineering and Computer Science, vol. 20, no. 3, pp. 1613-1625, 2020.
[20] N. F. Aurna, M. T. M. Rubel, T. A. Siddiqui et al., "Time series analysis of electric energy consumption using autoregressive integrated moving average model and Holt Winters model," TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 19, no. 3, pp. 991-1000, 2021.
[21] Z. Ferdoush, B. N. Mahmud, A. Chakrabarty, and J. Uddin, "A short-term hybrid forecasting model for time series electrical-load data using random forest and bidirectional long short-term memory," International Journal of Electrical and Computer Engineering, vol. 11, no. 1, pp. 763-771, 2021.
[22] Y. He and K. F. Tsang, "Universities' power energy management: a novel hybrid model based on iCEEMDAN and Bayesian optimized LSTM," Energy Reports, vol. 7, pp. 6473-6488, 2021.
[23] X.-B. Jin, W.-Z. Zheng, J.-L. Kong et al., "Deep-learning forecasting method for electric power load via attention-based encoder-decoder with Bayesian optimization," Energies, vol. 14, no. 6, p. 1596, 2021.
[24] R. Olu-Ajayi, H. Alaka, I. Sulaimon, F. Sunmola, and S. Ajayi, "Building energy consumption prediction for residential buildings using deep learning and other machine learning techniques," Journal of Building Engineering, vol. 45, article 103406, 2022.
[25] J. Jang, J. Han, and S.-B. Leigh, "Prediction of heating energy consumption with operation pattern variables for non-residential buildings using LSTM networks," Energy and Buildings, vol. 255, article 111647, 2022.
[26] A. N. Ndife, W. Rakwichian, P. Muneesawang, and Y. Mensin, "Smart power consumption forecast model with optimized weighted average ensemble," IAES International Journal of Artificial Intelligence, vol. 11, no. 3, p. 1004, 2022.
[27] V. H. Duong and N. H. Nguyen, "Machine learning algorithms for electrical appliances monitoring system using open-source systems," IAES International Journal of Artificial Intelligence, vol. 11, no. 1, p. 300, 2022.
[28] C. Vennila, T. Anita, T. Sri Sudha et al., "Forecasting solar energy production using machine learning," International Journal of Photoenergy, vol. 2022, Article ID 7797488, 7 pages, 2022.
[29] S. Kapp, J.-K. Choi, and T. Hong, "Predicting industrial building energy consumption with statistical and machine learning models informed by physical system parameters," Renewable and Sustainable Energy Reviews, vol. 172, article 113045, 2023.
[30] R. Bhol, S. C. Swain, R. Dash, K. J. Reddy, C. Dhanamjayulu, and B. Khan, "Short-term reactive power forecasting based on real power demand using Holt-Winters' model ensemble by global flower pollination algorithm for microgrid," International Journal of Energy Research, vol. 2023, Article ID 9733723, 22 pages, 2023.
[31] M. M. Asiri, G. Aldehim, F. A. Alotaibi, M. M. Alnfiai, M. Assiri, and A. Mahmud, "Short-Term Load Forecasting in Smart Grids Using Hybrid Deep Learning," IEEE Access, vol. 12, pp. 23504-23513, 2024.
[32] "Real-time electricity consumption," https://portal.emcs.cornell.edu/d/2/dashboard-list?orgId=2.
[33] P. Mishra, C. M. Pandey, U. Singh, A. Gupta, C. Sahu, and A. Keshri, "Descriptive statistics and normality tests for statistical data," Annals of Cardiac Anaesthesia, vol. 22, no. 1, pp. 67-72, 2019.
[34] M. R. Segal, Machine learning benchmarks and random forest regression, 2004.
[35] Y. Yu, X. Si, C. Hu, and J. Zhang, "A review of recurrent neural networks: LSTM cells and network architectures," Neural Computation, vol. 31, no. 7, pp. 1235-1270, 2019.
[36] A. Natekin and A. Knoll, "Gradient boosting machines, a tutorial," Frontiers in Neurorobotics, vol. 7, p. 21, 2013.
[37] A. Di Bucchianico, "Coefficient of determination (R²)," Encyclopedia of Statistics in Quality and Reliability, 2008.
[38] D. Wallach and B. Goffinet, "Mean squared error of prediction as a criterion for evaluating and comparing system models," Ecological Modelling, vol. 44, no. 3-4, pp. 299-306, 1989.
[39] P. Goodwin and R. Lawton, "On the asymmetry of the symmetric MAPE," International Journal of Forecasting, vol. 15, no. 4, pp. 405-408, 1999.
[40] E. Khaoula, B. Amine, and B. Mostafa, "Evaluation and comparison of energy consumption prediction models," in International Conference on Advanced Technologies for Humanity, Marrakech, Morocco, 2022.
[41] E. Khaoula, B. Amine, and B. Mostafa, "Evaluation and comparison of energy consumption prediction models case study: smart home," in The International Conference on Artificial Intelligence and Computer Vision, pp. 179-187, Springer Nature Switzerland, Cham, 2023.
[42] E. Khaoula, B. Amine, and B. Mostafa, "Forecasting diurnal heating energy consumption of HVAC system using ANN, RNN, ARNN," in 2023 14th International Conference on Intelligent Systems: Theories and Applications (SITA), pp. 1-6, Casablanca, Morocco, 2023.
[43] "Real time utility use data," https://portal.emcs.cornell.edu/d/2/dashboard-list?orgId=2.
The industrial sector consumes about one-third of global energy, making them a frequent target for energy use reduction. Variation in energy usage is observed with weather conditions, as space conditioning needs to change seasonally, and with production, energy-using equipment is directly tied to production rate. Previous models were based on engineering analyses of equipment and relied on site-specific details. Others consisted of single-variable regressors that did not capture all contributions to energy consumption. New modeling techniques could be applied to rectify these weaknesses. Applying data from 45 different manufacturing plants obtained from industrial energy audits, a supervised machine-learning model is developed to create a general predictor for industrial building energy consumption. The model uses features of air enthalpy, solar radiation, and wind speed to predict weather-dependency; motor, steam, and compressed air system parameters to capture support equipment contributions; and operating schedule, production rate, number of employees, and floor area to determine production-dependency. Results showed that a model that used a linear regressor over a transformed feature space could outperform a support vector machine and utilize features more representative of physical systems. Using informed parameters to build a reliable predictor will more accurately characterize a manufacturing facility's energy savings opportunities.