Valerija Mladenovikj1, Tamara Ilieva2, Mentor: Milos Jovanovik3
1 Faculty of Electrical Engineering and Information Technologies,
Ss. Cyril and Methodius University in Skopje, N. Macedonia
2 3 Faculty of Computer Science and Engineering,
Ss. Cyril and Methodius University in Skopje, N. Macedonia
1 firstname.lastname@example.org 2 email@example.com 3 firstname.lastname@example.org
MACHINE LEARNING APPROACHES FOR SMART CITY ENERGY
With the constant increase in population and the growing impact of climate change, energy efficiency at the
household and city-wide level plays a key role in the transformation of cities into smart cities.
Recently, machine learning approaches have proven beneficial in addressing several global problems,
especially in areas where large amounts of data are available. In this paper, we propose the use of machine
learning methods to analyze the energy consumption behavior of households on a daily and seasonal basis, in
order to detect the parts of the days and seasons in which they have peak energy consumption. Our machine
learning models allow us to segment households according to daily and seasonal behavior into different groups.
Both energy suppliers and individual households may benefit from the segmentation carried out in this paper.
By knowing the daily behaviour of their customers, energy suppliers can anticipate the expected energy consumption
of the different customer groups in different parts of the day. They can also precisely target
households with energy efficiency programs and provide more reliable estimates of energy savings. Individual
households can reduce costs by shifting energy consumption to cheaper off-peak tariff periods, and
would also potentially be more careful about their behaviour if they knew how efficient their energy
consumption is, compared to other households. Therefore, we believe that the analysis in this paper can provide
a solid foundation for the construction of an energy efficiency system, which is necessary for creating a smart
city.
Key words: artificial intelligence, climate change, energy efficiency, machine learning, smart city
A smart city is a city that ensures sustainability and efficiency by using smart monitoring and
control technology. Over the past century, climate change has caused the annual global temperature to rise by
0.83ºC (1.5ºF), and it is estimated that it will cause an additional rise of 0.28–4.78ºC (0.5–8.6ºF) in the next century.
Climate change is often presented as a future problem, but the science of climate change has actually been
around for many decades now and the effects are already being felt in many fields. Approximately 40% of
EU energy consumption and 36% of greenhouse gas emissions are caused by buildings, which makes buildings
Europe's single largest energy user. The energy efficiency of a building is determined by calculating the
energy used during the year, in terms of heating, hot water, lighting, etc. The final energy consumption values
are calculated in kilowatt-hours per square meter (kWh / m2 per year) and in kilograms of CO2 per square meter
of housing (kgCO2 / m2 per year). The efficiency of a house is calculated as the annual CO2 emissions and the
annual consumption of non-renewable primary energy of that accommodation. It is becoming more and
more necessary to have energy-efficient buildings, as energy emerges as a crucial economic problem due to
high energy demand and unsustainable energy supplies. This means that even households must evaluate how
efficiently their energy is being used. Energy-efficient buildings provide ways to save money and reduce
emissions of greenhouse gases. Before the emergence of green thinking, the efficiency of energy consumption was
not considered in terms of emissions and green actions. Currently, climate change evidence motivates policymakers
to control atmospheric emissions and general pollution, i.e. soil, air and water pollution. As the changing world affects
future cities, their inhabitants continue to contribute to global climate change. Climate scientists acknowledge
that greenhouse gas (GHG) emissions from burning fossil fuels are likely to cause this global temperature rise.
With no effective steps to improve energy efficiency in buildings, energy demand in buildings will increase by
50% globally by 2050, based on projected growth. Reducing GHG emissions through a combination of
technology and policy decisions is one essential aspect of creating smart buildings and cities. Building GHG
emissions must be reduced to a quarter of the current level by 2050 to reach the 2ºC (3.6ºF) goal. We need to
pursue new building paradigms in order to achieve the ambitious goal of reducing GHG emissions by 75%. As
Albert Einstein observed, we can’t solve problems by using the same kind of thinking we used when we created
them.
Ever since programmable computers first appeared, people have wondered whether such devices could
become intelligent. Today, with many practical applications and active research topics, artificial intelligence
(AI) is a thriving field. Deep learning is an AI approach that enables computers to learn from experience and
to understand data in terms of a hierarchy of concepts. This approach eliminates the need for human
operators to formally define all the information that the machine requires. Machine learning (ML) as an
application of AI can be broadly described as computational methods that use experience to improve
performance or to make accurate predictions. Experience here refers to the previous information available to
the learner, which normally takes the form of collected data and is made available for review. This data may be
in the form of digitized human-labeled training sets or other forms of information gathered through
environmental interaction. Its consistency and size are crucial to the success of the learner's predictions in all
situations. In this paper, we use machine learning to address problems in the field of household energy
efficiency. Making use of the numerous opportunities that machine learning
provides could satisfy the rising need for energy efficiency systems in households and other buildings.
2 RELATED WORK
There are multiple research papers published on the topic of using machine learning and artificial
intelligence in general for tackling the problem of energy efficiency. Below, we provide an overview of the
most notable papers, in order to highlight the existing research done in this field.
2.1 Machine Learning Based System for Managing Energy Efficiency of Public Sector as an Approach
Towards Smart Cities
The authors of this paper propose a system for managing the energy efficiency of public buildings. Their motivation
comes from the statement of the European Parliament and Council that 40% of all energy consumption in the
EU belongs to the buildings sector. Many countries have implemented a central information system that
gathers data about the energy consumption of public sector buildings. However, what they lack are intelligent
models based on machine learning and big data platforms which would be able to process those large amounts
of data in order to optimize energy consumption. The dataset used in their research was from the Croatian
government, and included data about more than 17,000 public buildings along with attributes about
construction, heating, cooling and energy data, meteorological, geospatial and occupational attributes describing
environmental factors. The input space consists of 82 attributes, plus the energy consumption data for
electricity, natural gas, heat and water, as well as CO2 emissions, in the period from 2006 to 2017. Their preprocessing
steps included variable reduction procedures, outlier removal and missing value replacement, as well as data
normalization by the distance between the minimum and maximum value of each attribute (min-max normalization). After the preprocessing phase,
they selected a sample of 575 public buildings, in order to create a machine learning model. Table 1 shows the
sampling procedures they used for building the model.
Since the building data does not fulfill the assumptions of linearity, the machine learning methods were selected
for their nonlinearity and their ability to learn from historical data, i.e. past values. In order to create predictive
models of energy consumption and efficiency, they selected three machine learning methods: deep artificial neural
networks (DNN), recursive partitioning methods such as CART decision trees (Rpart), and random forest (RF).
The most accurate model was produced by the random forest method, yielding a normalized root mean square
error (NRMSE) of 0.0989 and a symmetric mean absolute percentage error (SMAPE) of 13.5875%. However, all
three tested methods (DNN, Rpart tree and RF) produced a SMAPE below 20%, which shows the potential of all
three machine learning methods for predicting energy consumption.
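For readers unfamiliar with the two error measures, the sketch below computes NRMSE and SMAPE under one common convention: the RMSE is divided by the range of the observed values, and each absolute error is divided by the mean of the absolute actual and predicted values. Several variants of both measures exist and the exact definitions used in the cited study are not restated here, so the formulas and the function names `nrmse` and `smape` are illustrative assumptions.

```python
import numpy as np

def nrmse(y_true, y_pred):
    """RMSE divided by the range of the observed values (one common convention)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())

def smape(y_true, y_pred):
    """Symmetric MAPE in percent; assumes no pair is zero in both arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)

# Toy values, not taken from the cited study.
actual = [1.0, 3.0]
predicted = [1.0, 1.0]
print(nrmse(actual, predicted), smape(actual, predicted))
```

Both measures are scale-free, which is what allows models for different energy sources to be compared on the same footing.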
Furthermore, they suggest an intelligent system which gathers and preprocesses data and then uses the created
ML models to predict the consumption of energy from diverse sources such as electricity, natural gas, etc. After
generating the predictions, it assists users in making decisions on future actions benefiting the public buildings
sector by improving energy efficiency, reducing energy consumption and reducing energy cost.
2.2 Buildings Energy Efficiency Analysis and Classification Using Various Machine Learning
Technique Classifiers
The research introduced in this paper presents the benefits of applying machine learning
techniques to smart buildings. Specifically, it shows the verification of building energy models. This
evaluation is carried out in order to calculate the energy efficiency of buildings and, based on the model, to obtain an accurate
prediction for a building before its construction. This is a common concern today because it contributes to the sustainability of smart
cities and beyond. The paper discusses the notion of energy efficiency in buildings and offers an
overview of how buildings are classified by energy efficiency under the varying regulations of different
countries. In order to achieve sustainability goals, the authors also identified the increasing
importance of information and communication technology (ICT). Currently, ICT is the way to collect, process
and prepare vast and adequate quantities of data for their use, to improve energy efficiency, among other
The authors suggest an approach where the outcome prediction is gradual and the scalability of machine
learning techniques is taken into account. Moreover, it is shown that some parameters have a greater effect on
energy savings than others. In summary, they establish a technique that makes it possible to incorporate new
parameters in the model to determine their importance and decide whether to use them or remove them. In their
research, they use histograms and distribution intervals of parameters’ values to identify whether they are
impacting the classification problem. For six different classification algorithms, the results of the three
approaches are drawn. Specifics on the accuracy of the classification algorithm for the various categories, as
well as other quality metrics, are given in the results report.
Table 1. Sampling procedures used for modeling
Table 2. F1-scores comparison for the best classifiers.
Table 2 compares the results of the best classifiers for the different approaches and the classification into
the different groups. This comparison is based on the F1-score. On average, the decision tree gives the best
results. The authors note that the Gaussian classifier gives very poor results in the first approach, but
achieves better results than any other classifier in the second and third approaches. This classifier
scores well when the parameters in the dataset are spread across their range, especially if they are uniformly
distributed. The decision tree gives poor results for noisy datasets.
The datasets that we use in our analysis are extracted from the data contained in the London data store,
which contains energy consumption readings for a sample of 5,567 London households that took part in the UK
Power Networks-led Low Carbon London project between November 2011 and February 2014. The
versions that we use contain information about the energy consumption of 112 households. The first dataset has
information about the daily energy consumption for each household. It has 8 columns: date, energy_median,
energy_mean, energy_max, energy_count, energy_std, energy_sum, energy_min. The second dataset contains
readings about the energy consumption in kWh, taken at 30 minute intervals, along with information about the
date and time.
In this section we present the methods and steps used to build the machine learning models which help us
accomplish our goal. In order to build a system that analyzes the daily behavior of the households,
takes into consideration the possible change of behavior across seasons, and classifies
households into different groups based on their level of energy consumption, we need to build several
models. Let us explore each of these features separately.
4.1 Households Segmentation Based on Their Energy Consumption Behavior
Our first challenge is to divide the households into various segments based on their actions in terms of
daily energy usage. For this purpose, we use the dataset that contains information about the energy consumption
of households in 30-minute intervals in a day. Thus, we have 48 data points for each day and we have
information for almost 4 years for each household, which means we have information for about 1,000 days on
average, since the dates vary between households. We first preprocess our dataset and group it by date. This
way we get a list of 48 elements for each date, representing the energy consumption. Then, we build an
unsupervised clustering model which is able to group our households into different clusters. For this purpose
we create a k-means model, which is one of the most widely used clustering algorithms. It stores k centroids which are
used to define clusters. A point (household) is considered to be in a particular cluster if it is closer to that cluster’s
centroid than to any other cluster’s centroid. We use the implementation from the scikit-learn Python library
to build our model. However, in order to build a k-means model, our dataset requires a bit more preprocessing.
For each household, we calculate an average daily energy consumption. Since we have energy consumption
values at 30-minute points for all the dates, we calculate an average energy consumption for each 30-minute
point, for the entire span of the measuring period (~ 1,000 days). This way, we can get a single average energy
consumption list composed of 48 data points, where each point represents the average energy consumption per
household for that half-hour interval, giving us a clearer understanding of how energy consumption changes
during the day. The k-means model also requires a parameter n_clusters, which represents the number of clusters
to form, i.e. the number of centroids to generate. In order to find the optimal number of clusters, we use the
elbow method. This method runs the k-means model on the dataset for a range of values of k, which in our case
ranges between 1 and 10. For each value, it computes the distortion and inertia. Distortion is the average of the
squared distances of the samples from the centroids of their clusters. Usually, the Euclidean distance metric is used. Inertia, on
the other hand, is the sum of squared distances of samples to their closest centroid.
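The averaging step described above can be sketched as follows. This is a minimal illustration and not the original preprocessing code: the column names `household_id`, `timestamp` and `kwh`, the function name `average_daily_profiles` and the synthetic readings are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def average_daily_profiles(readings):
    """Return one row per household and 48 columns (half-hour slots 0..47)."""
    df = readings.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    # Slot index from the timestamp: 00:00 -> 0, 00:30 -> 1, ..., 23:30 -> 47.
    df["slot"] = df["timestamp"].dt.hour * 2 + df["timestamp"].dt.minute // 30
    # Average every slot over all measured days, separately for each household.
    profiles = df.groupby(["household_id", "slot"])["kwh"].mean().unstack("slot")
    return profiles.reindex(columns=range(48))

# Synthetic readings for two hypothetical households over three days.
timestamps = pd.date_range("2013-01-01", periods=48 * 3, freq="30min")
base = np.tile(np.arange(48) / 100.0, 3)  # same daily shape repeated each day
readings = pd.concat(
    [
        pd.DataFrame({"household_id": "H1", "timestamp": timestamps, "kwh": base}),
        pd.DataFrame({"household_id": "H2", "timestamp": timestamps, "kwh": 2 * base}),
    ],
    ignore_index=True,
)
profiles = average_daily_profiles(readings)
print(profiles.shape)  # (2, 48)
```

Each row of `profiles` is the 48-point vector that represents one household as a single point for k-means.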
The elbow method is used to discover the optimal number of clusters for the k-means model. It
plots the value of the cost function for different values of k. For each value of k, we
calculate the distortion and inertia. As the value of k increases, the average distortion decreases, because
the instances are closer to their respective centroids. However, the improvements in average distortion
diminish as k increases. The value of k at which the improvement in distortion declines the most is
called the elbow; this is the point where we should stop dividing the data into further clusters.
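A minimal sketch of the elbow procedure with scikit-learn is given below. It uses synthetic 48-point profiles in place of the real data, computes the distortion as the inertia divided by the number of samples (following the definitions in the text), and picks the elbow with a simple second-difference heuristic. The helper names `elbow_curve` and `pick_elbow`, as well as the heuristic itself, are assumptions for illustration; in practice the elbow is usually read off the plot.

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_curve(X, k_values):
    """Fit k-means for each k and return (inertias, distortions)."""
    inertias, distortions = [], []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        inertias.append(model.inertia_)              # sum of squared distances
        distortions.append(model.inertia_ / len(X))  # average squared distance
    return np.array(inertias), np.array(distortions)

def pick_elbow(k_values, distortions):
    """Heuristic: the k after which the improvement drops the most."""
    second_diff = np.diff(distortions, n=2)
    return list(k_values)[int(np.argmax(second_diff)) + 1]

# Synthetic profiles: three well-separated groups of 10 households each.
rng = np.random.default_rng(0)
centers = np.zeros((3, 48))
centers[0, 12:20] = 5.0  # morning peak
centers[1, 24:32] = 5.0  # afternoon peak
centers[2, 36:44] = 5.0  # evening peak
X = np.vstack([c + rng.normal(0.0, 0.1, size=(10, 48)) for c in centers])

k_values = range(1, 11)
inertias, distortions = elbow_curve(X, k_values)
best_k = pick_elbow(k_values, distortions)
print(best_k)
```

Plotting `distortions` against `k_values` gives the kind of curve from which the elbow is identified visually.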
As we can see from Figure 1, the “elbow” in our case is at k = 3, so our model extracts three clusters. After
building our model and visualizing the results, we obtain three clusters of households (Figure 2). We also
average the households within each cluster using the same method, which gives the averaged household behavior per cluster (Figure 3).
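The clustering and per-cluster averaging can be sketched as follows, again on synthetic profiles. The function name `cluster_and_average`, the fixed `random_state` and the example data are assumptions for illustration and do not reproduce the clusters shown in the figures.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_average(profiles, n_clusters=3):
    """Cluster 48-point household profiles and average the members of each cluster."""
    profiles = np.asarray(profiles, dtype=float)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(profiles)
    cluster_profiles = np.vstack(
        [profiles[labels == c].mean(axis=0) for c in range(n_clusters)]
    )
    return labels, cluster_profiles

# Synthetic profiles for 12 hypothetical households in three behaviour groups.
rng = np.random.default_rng(1)
shapes = np.full((3, 48), 0.2)  # the third shape stays flat
shapes[0, 14:18] = 1.0          # morning peak
shapes[1, 36:44] = 1.0          # evening peak
profiles = np.vstack([s + rng.normal(0.0, 0.02, size=(4, 48)) for s in shapes])

labels, cluster_profiles = cluster_and_average(profiles)
print(labels)
print(cluster_profiles.shape)  # (3, 48)
```

Each row of `cluster_profiles` corresponds to one averaged behavior type, i.e. one curve of the kind shown per cluster.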
4.2 Household Segmentation Based on Their Energy Consumption Behavior in Different Seasons
Since the energy consumption in most households differs between seasons, we split the data of each household
into two subsets: one for the warmer months (April - September) and another for the colder months (October -
March). We call these the summer subset and the winter subset. We then run the same model again over the two
newly formed datasets. For both datasets, 3 again turned out to be the optimal number of clusters.
Figures 4 and 6 show the clusters of households in summer and winter, respectively, while Figures 5 and 7 show
the averaged household behaviour, per cluster, for summer and winter, respectively.
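The seasonal split can be sketched with pandas as shown below. The month boundaries follow the definition in the text, while the column names `timestamp` and `kwh` and the function name `split_by_season` are assumptions for the example.

```python
import pandas as pd

WARM_MONTHS = [4, 5, 6, 7, 8, 9]  # April to September, as defined in the text

def split_by_season(readings):
    """Split readings into (summer, winter) subsets by calendar month."""
    months = pd.to_datetime(readings["timestamp"]).dt.month
    is_warm = months.isin(WARM_MONTHS)
    return readings[is_warm], readings[~is_warm]

# One synthetic reading in the middle of each month of 2013.
readings = pd.DataFrame({
    "timestamp": pd.to_datetime([f"2013-{m:02d}-15 12:00" for m in range(1, 13)]),
    "kwh": [0.5] * 12,
})
summer, winter = split_by_season(readings)
print(len(summer), len(winter))  # 6 6
```

The averaging and clustering steps are then applied to each subset separately.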
Figure 1. Optimal number of clusters using the elbow method.
Figure 2. Three clusters of household behavior.
Figure 3. Types of household behavior.
4.3 Extracting Households With a Significant Difference in Their Energy Consumption Behavior in
Different Seasons
The previous analysis shows that some households have large variations
in their energy usage between seasons, whereas others hardly show any difference. In this section, we
identify and single out the households with significant variations in their energy usage between seasons.
The reason behind this is that they will most likely switch between clusters for summer and winter, since their
consumption behavior changes across seasons. In practice, this means that energy providers need different
targeting systems for them in different seasons. On the other hand, providers can use the same targeting system for
the households which do not have significant variations in their energy consumption across seasons.
Figure 4. Summer clusters, based on household behavior.
Figure 5. Types of household behavior in the summer.
Figure 6. Winter clusters, based on household behavior.
Figure 7. Types of household behavior in the winter.
To determine whether a household x has a significant variation in its energy consumption behavior between
seasons, we take x’s averaged summer behavior and x’s averaged winter behavior and calculate the Manhattan
distance between those two 48-point vectors. This method helped us identify 12 households that have significant
differences in their energy consumption behavior between the summer and the winter.
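The Manhattan distance between two averaged profiles is the sum of the absolute differences of their 48 values, as the sketch below illustrates. The text does not state how large a distance counts as significant, so the `threshold` argument, its value and the helper names are purely hypothetical.

```python
import numpy as np

def manhattan_distance(summer_profile, winter_profile):
    """Sum of absolute differences between two 48-point profiles."""
    a = np.asarray(summer_profile, dtype=float)
    b = np.asarray(winter_profile, dtype=float)
    return float(np.abs(a - b).sum())

def seasonal_households(summer, winter, threshold):
    """Return ids whose summer/winter distance exceeds a hypothetical threshold."""
    return [h for h in summer if manhattan_distance(summer[h], winter[h]) > threshold]

# Synthetic profiles: H1 uses much more energy in winter, H2 barely changes.
summer = {"H1": np.full(48, 0.2), "H2": np.full(48, 0.30)}
winter = {"H1": np.full(48, 0.5), "H2": np.full(48, 0.31)}

print(manhattan_distance(summer["H1"], winter["H1"]))
print(seasonal_households(summer, winter, threshold=5.0))  # ['H1']
```

Because the distance adds up the differences over all half-hour slots, it grows both when the overall level of consumption changes and when the daily shape shifts.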
4.4 Segmentation of Households Based on Their Energy Consumption Intensity
Here, we separate the households into different clusters based on the intensity of their energy consumption.
Again, the optimal number of clusters for this goal is 3. For each household, we estimate the average daily
energy consumption and use those values to create the clusters. As we can see in Figure 10, the first graph
shows us the most inefficient households, while the third graph shows the most efficient ones from our dataset.
The group of the most inefficient households contains 72 households out of 112. On the other hand, the most
efficient group contains 30 households. Since most of the households are highly inefficient, there is a lot of
room for improvement in their energy consumption behavior.
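Clustering by intensity amounts to running k-means on a single value per household. The sketch below assumes one average daily consumption figure per household and relabels the clusters so that 0 denotes the lowest and 2 the highest consumption. The relabeling, the function name `intensity_clusters` and the example values are our own additions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def intensity_clusters(daily_means, n_clusters=3):
    """Cluster households by average daily consumption; 0 = lowest, 2 = highest."""
    X = np.asarray(daily_means, dtype=float).reshape(-1, 1)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    # Map the arbitrary k-means labels to ranks ordered by centroid value.
    order = np.argsort(model.cluster_centers_.ravel())
    rank = np.empty(n_clusters, dtype=int)
    rank[order] = np.arange(n_clusters)
    return rank[model.labels_]

# Synthetic average daily consumption (kWh) for nine hypothetical households.
daily_means = [3.0, 4.0, 5.0, 10.0, 11.0, 12.0, 28.0, 29.0, 30.0]
levels = intensity_clusters(daily_means)
print(levels)
```

Counting the members of each level then gives group sizes of the kind reported above.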
Figure 8. An example household with a big difference in the behavior between seasons.
Figure 9. An example household with no significant difference in the behavior
Figure 10. Clusters based on the intensity of energy consumption.
By incorporating machine learning methods, the research presented in this paper addresses energy
efficiency problems. We are able to segment households into various behavioral classes by using clustering
techniques. Segmenting households enables us to better target households with energy efficiency programs and
provide more reliable estimates of energy savings. Additionally, we found that seasonality is
worth taking into account. By introducing seasons in our analysis, we noticed that some households
behave very differently in different seasons. Furthermore, we segmented the households based on
their level of energy consumption. This approach allows us to extract a group of very inefficient energy
consumers and target them with energy efficiency programs more often. The analysis demonstrated in this paper
provides a solid foundation for designing an energy efficiency system.
Understandably, there are possibilities for further research. We could further investigate how
temperature affects energy consumption, how much energy consumption changes across
months, and how energy consumption differs between weekends and weekdays. Another possible
future step would be to build a prediction model that learns from past data and
predicts the future energy consumption of a given household. Energy efficiency is a major challenge that is
slowly but surely going to concern us all. We are making progress faster than ever in every field, but we are also
drastically depleting the planet's resources. We must take this issue very seriously and propose effective
solutions before it is too late.
We would like to express our gratitude to our mentor, Dr. Milos Jovanovik, for sharing his professional
expertise with us and giving us significant guidance and feedback during this research.
 Rudolf Giffinger, Christian Fertner, Hans Kramar, Evert Meijers, “Smart Cities: Ranking of European
Medium-Sized Cities”, Centre of Regional Science: Vienna, Austria, pp. 1-12, 2007.
 Houbing Song, Ravi Srinivasan, Tamim Sookoor, Sabina Jeschke, “Smart Cities: Foundations, Principles,
and Applications”, John Wiley & Sons, Hoboken, 2017.
 European Commission, “Energy Performance of Buildings Directive”, Available from:
 César Benavente-Peces, Nisrine Ibadah, “Buildings Energy Efficiency Analysis and Classification Using
Various Machine Learning Technique Classifiers”, Energies, 13(13): 3497, 2020.
 US Green Building Council, “Green Building 101: Why is Energy Efficiency Important?”, Available from:
 D. Ravi Kumar, K. Anuradha, P. Saraswathi, R. Gokaraju, M. Ramamoorty, “New Low Cost Passive Filter
Configuration for Mitigating Bus Voltage Distortions in Distribution Systems”. In 2015 IEEE International
Conference on Building Efficiency and Sustainable Technologies, pp. 79-84. IEEE, 2015.
 Ian Goodfellow, Yoshua Bengio, Aaron Courville, “Deep Learning”, MIT Press, Cambridge, 2016.
 Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar, “Foundations of Machine Learning”, MIT Press,
 Marijana Zekić-Sušac, Saša Mitrović, Adela Has, “Machine Learning Based System for Managing Energy
Efficiency of Public Sector as an Approach Towards Smart Cities”, International Journal of Information
 Jean-Michel D., “Smart Meter Data From London Area”, Version 11, Available from:
 Chris Piech, “K Means”, Available from: https://stanford.edu/~cpiech/cs221/handouts/kmeans.html.
 Alind Gupta, “Elbow Method for Optimal Value of k in KMeans”, Available from:
 Pratap Dangeti, “Statistics for Machine Learning”, Packt Publishing Ltd., Birmingham, UK, 2017.