Edge Computing with Embedded AI: Thermal Image Analysis
for Occupancy Estimation in Intelligent Buildings
Aly Metwaly
University of Turku
Turku, Finland
Jorge Peña Queralta
University of Turku
Turku, Finland
Victor Kathan Sarker
University of Turku
Turku, Finland
Tuan Nguyen Gia
University of Turku
Turku, Finland
Omar Nasir
Helvar Oy Ab
Espoo, Finland
Tomi Westerlund
University of Turku
Turku, Finland
ABSTRACT
With the rise of the IoT, there has been a growing demand for people counting and occupancy estimation in intelligent buildings for adapting their heating, ventilation and cooling systems. This can have a significant impact on energy consumption at a global scale, as such systems consume about 40% of electricity and create about 36% of the CO2 emissions in Europe. Previous approaches to occupancy estimation either utilize methods that do not ensure people's privacy when obtaining high accuracy estimations, such as RGB cameras, or utilize thermal or radar sensors with lower accuracy. Thermal vision for people detection has several advantages: it protects people's privacy while being less affected by changes in the environment. In addition, most previous approaches relying on image processing stream data to the cloud to be analyzed. However, with the development of the more distributed network paradigms of edge and fog computing, there has been a trend towards moving computation to the edge of the network. This process of embedding intelligence into end-devices enables more efficient energy consumption and network load distribution. In this work, we present an embedded algorithm for room occupancy estimation based on a thermal sensor with accuracy over the state-of-the-art. We study the performance of a variety of deep learning models on different embedded processors. We achieve a prediction accuracy of 98.9% for people counting estimation with a minimal 2 KB RAM utilization. Furthermore, the proposed algorithm has very low latency, achieving execution times under 14 ms.
CCS CONCEPTS
• Computer systems organization → Embedded software; • Computing methodologies → Scene understanding; Machine learning algorithms; • Hardware → Sensors and actuators; Digital signal processing; Sensor applications and deployments.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
INTESA ’19, October 13–18, 2019, NY, USA
© 2019 Association for Computing Machinery.
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00
KEYWORDS
Edge Computing; IoT; Embedded Intelligence; Embedded AI; Thermal Imaging; Intelligent Buildings
ACM Reference Format:
Aly Metwaly, Jorge Peña Queralta, Victor Kathan Sarker, Tuan Nguyen Gia, Omar Nasir, and Tomi Westerlund. 2019. Edge Computing with Embedded AI: Thermal Image Analysis for Occupancy Estimation in Intelligent Buildings. In INTESA ’19: INTelligent Embedded Systems Architectures and Applications, Co-Located with ESWEEK 2019, October 13–18, 2019, NY, USA. ACM, New York, NY, USA, 6 pages.
1 INTRODUCTION
In the Industry 4.0 era, cutting-edge technologies such as the IoT and AI are emerging rapidly [7]. These technologies have the potential to impact our daily lives through applications in smart cities, smart homes or intelligent buildings [ ]. Different industries are adopting these technologies and transforming them into market opportunities. One promising application is people counting and occupancy estimation in buildings. The acquired information can be utilized for more efficient planning and intelligent space management in smart workplaces. Furthermore, information about the occupancy in buildings and individual rooms can have a significant impact on energy consumption. Buildings are considered the largest energy consumer in Europe, using approximately 40% of the total energy and creating about 36% of the total carbon dioxide emissions [8]. Similarly, heating, ventilation and air conditioning (HVAC) systems in buildings were responsible for 38.9% of the total energy consumption in 2017 in the USA [ ]. By acquiring reliable information on the occupancy of buildings, energy consumption can be drastically reduced if HVAC systems in the building are adjusted automatically [13].

Within the IoT, there is a recent trend towards more distributed network architectures, in contrast with traditional cloud-centric computing [ ]. The edge and fog computing paradigms involve moving computational power and data analysis closer to where the data originates [ ]. Combined with artificial intelligence algorithms running at the local network level, this approach enables lower latency and reduced network load [ ]. In this work, we explore the case of embedded artificial intelligence, in which the data analysis runs directly on the sensor node itself.
Previous works on building occupancy estimation or people counting have used RGB cameras [ ], motion sensors [ ] and, more recently, thermal arrays [ ]. Motion sensors, such as passive infrared (PIR) sensors, have the drawback of being inaccurate as the number of people increases, as well as limited range [ ]. RGB cameras are able to produce high-accuracy occupancy estimations, but require computationally intensive image processing [12].
Thermal imaging is one of the most promising sensing technologies and has been frequently used in smart-city applications [6]. The advantages of thermal cameras over RGB cameras are that they are not light-dependent and can work in dark environments. However, thermal cameras with low and medium resolution usually cannot recognize characteristics of the detected person, and are therefore unable to identify features of the people in the scene.
In this paper, we propose novel solutions based on Deep Neural Networks (DNNs) that use images from generic, low-resolution thermal cameras to reliably detect occupancy and count the number of people with high accuracy. Compared to previous works, our models achieve higher occupancy prediction accuracy and enable faster image processing than other micro-controller-based implementations. Our approach provides an almost error-free prediction in the case of no occupancy, and otherwise matches the number of people in the room. For the tests included in the paper, we have trained the model with a dataset where the occupancy ranged from 0 to 5 persons. However, the same model can be retrained with a more varied dataset to provide a wider range of occupancy estimation outputs.
The main contributions of this work are: (i) the design of multiple deep learning models for estimating room occupancy based on thermal images; (ii) the implementation of these models on Arm Cortex M4 and M7 micro-controllers for real-time analysis of thermal images; (iii) the analysis of the performance and impact of the proposed models on the micro-controller computing resources; and (iv) the comparison of our work with the state-of-the-art, showing improved accuracy and reduced computation time.

The rest of the paper is organized as follows: Section 2 reviews existing works in occupancy estimation. Section 3 introduces the concept of embedded AI and describes the types of neural networks utilized in this work. Section 4 describes the data acquisition process, the hardware platforms utilized for testing and the evaluated machine learning models. In Section 5, we demonstrate the superior performance of our algorithm when compared to the state-of-the-art and provide an overview of the models which produced the best results. Finally, Section 6 concludes this work and outlines the directions for future work.
2 RELATED WORK
Oosterhout et al. introduced a head-detection system based on stereo cameras for counting people from video streams [ ]. The method is robust and provides high accuracy, ranging from 90-95% across different scenarios. In contrast, we rely on thermal cameras in order to preserve people's privacy and enable fast embedded image processing; in addition, we are able to achieve higher accuracy. Other early approaches which do preserve people's privacy use PIR sensors, with some limitations. Wahl et al. presented an approach to people counting for office environments [13] which uses distributed PIR sensors enhanced with algorithms to interpret the sensors' information. They explored the performance of two people counting algorithms on this experimental setup with different simulation scenarios. Their approach required a larger number of sensors, and the accuracy decreased with an increasing number of people.
Beltran et al. presented a system for estimating occupancy called ThermoSense, based on a thermal sensor array and a PIR sensor [2]. It can detect occupancy with an RMS error of approximately 0.35 persons. More recently, Gomez et al. developed a people counting algorithm on thermal images based on CNNs [3]. The CNNs used fit in less than 500 KB of memory and operated on a Cortex-M4 MCU. The CNN algorithm could provide an error-free detection accuracy of 53.7% while using 308 KB of the MCU memory. The resolution of the thermal sensor used is 80x60, which reveals some features of the people involved in the scene. The execution time for one image is 63 seconds. In our work, we aim for a high error-free accuracy in an office environment using a thermal sensor of 24x32 pixels which cannot detect any features of the people.
Griths et al. used a thermal imager with 60x80 pixel resolu-
tion [
]. The algorithm used is based on the individuals’ height
dierences for presence detection. Further, the algorithm detects
the movement direction of the individuals. Similarly, Tyndall et al.
proposed a low-pixel thermal imager system for occupancy estima-
tion [
] and used a classication algorithm. The system is based on
Thermosense [
] but is dierent from Thermosense in the choice of
the thermal sensor, positioning of the sensor and the classication
algorithm. In our work, we are able to achieve better real-time
occupancy estimation accuracy while embedding the AI models in
low power microprocessors.
A high-accuracy method for estimating room occupancy with thermal array sensors was proposed by Abedi et al. [20]. The authors presented a real-time monitoring system which was only able to detect the presence of people in a room, giving a binary output. They achieved an accuracy of over 99%. The authors rely on cloud computing for image processing, and their model is unable to estimate the exact number of people in the room. In our work, we achieve a similar accuracy while estimating the number of people, and we embed the algorithms so that raw data does not need to be sent to the cloud for processing.
3 EMBEDDED ARTIFICIAL INTELLIGENCE
With the increasing pervasiveness of the IoT in all aspects of our daily lives, it is expected that billions of edge devices will be connected to the internet in the near future. These devices will be producing extremely large amounts of data. In the traditional cloud-centric approach, all data acquired at the edge devices is sent to the cloud to be crunched and processed. Then, the results of the analysis and commands are sent back to the edge devices. As the most important information resides in the data analysis results, the process of sending raw data to the cloud can be avoided if part of the computation is moved towards the edge of the network. Within the edge and fog computing paradigms, embedded AI refers to embedding artificial intelligence algorithms into low-power and computationally-constrained devices. Jägare reflects on the benefits of moving data analysis from cloud-centric architectures towards embedded systems for given applications in a recent work [ ]. These benefits include (i) reduced latency, increased reliability, and safety in time-critical applications; (ii) overall energy-efficiency and reduced cost with a reduced impact on network traffic and cloud server load; and (iii) enhanced privacy and security, with a lower risk of raw data being exposed, and natural support for applications where privacy is paramount and raw data cannot be shared.
In summary, applying AI at the edge instead of the cloud achieves a more reliable low-latency response. It also has the potential of providing a better user experience with enhanced security and privacy. However, applying AI algorithms on embedded devices can present significant challenges: embedded systems are resource-constrained devices because of their low computational power, low memory, and low power consumption requirements. In the rest of this section, we give an overview of the basic concepts behind the neural networks that have been studied in this work. Each of these networks has a different impact on system requirements (RAM, Flash) and execution time.
3.0.1 Feedforward Neural Networks. FNNs, also known as deep feedforward networks, are the most basic deep learning models. They are called feedforward because information flows only in the forward direction; in other words, there is no feedback connection from the output back into the model [14]. FNNs form the basis of many other significant neural networks, such as convolutional networks, and are an essential step on the path to recurrent networks [14]. An FNN is composed of different functions that are chained together. Each function is called a layer, and the overall length of the chain is called the depth of the model. The training data specifies only the overall output of the whole network, not the output of each intermediate layer; that is why these are called the hidden layers. Each of the hidden layers is vector-valued, and their dimension determines the width of the model, measured in the number of neurons [14].
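To make depth and width concrete, the forward pass of a one-hidden-layer FNN of width 512 (the shape of the FNN_L1_N512 model evaluated later) can be sketched in a few lines of NumPy. The random weights below are placeholders for illustration only, not trained parameters:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def fnn_forward(frame, w1, b1, w2, b2):
    """Depth-1 feedforward pass: flatten -> 512-neuron hidden layer -> scalar output."""
    x = frame.reshape(-1)           # 24x32 thermal frame flattened to a 768-vector
    h = relu(x @ w1 + b1)           # hidden layer; its 512 neurons set the model width
    return (h @ w2 + b2).item()     # single regression output: estimated people count

rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.05, size=(768, 512)); b1 = np.zeros(512)
w2 = rng.normal(scale=0.05, size=(512, 1));   b2 = np.zeros(1)
estimate = fnn_forward(rng.normal(size=(24, 32)), w1, b1, w2, b2)
```

In the actual system, the trained weights for such a pass are exported from Keras and executed on the MCU by the generated library.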
3.0.2 Convolutional Neural Networks. A convolutional network, or CNN, employs the convolution mathematical operation instead of general matrix multiplication in at least one of its layers. CNNs improve on FNNs by overcoming some of their disadvantages. Sparse connectivity is used in CNNs to reduce the number of weights, while parameter sharing decreases the memory required for neural models. Parameter sharing also reduces the complexity of the model at a given accuracy, which is called statistical efficiency [14]. A sliding window called a kernel is required to perform the convolution process. When convolution is applied in machine learning, the input is usually a multidimensional array and the kernel is usually a multidimensional array of parameters (a tensor) that is adjusted by the learning algorithm. In the case of a 2D image, the input is a matrix of pixels and the kernel is a 2D convolution sliding window [14]. Each layer in a CNN has neurons arranged in 3 dimensions: width, height, and depth, where the depth is the number of channels (filters) of the layer.
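The sliding-window operation itself is easy to sketch. The illustrative NumPy function below applies a 2D kernel with "valid" padding; the 3x3 averaging kernel is just an example stand-in, not a learned filter from the paper's models:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output is a weighted sum of one patch."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

frame = np.arange(24 * 32, dtype=float).reshape(24, 32)  # stand-in thermal frame
kernel = np.ones((3, 3)) / 9.0                           # 3x3 kernel, as in CNN_K3_*
feature_map = conv2d_valid(frame, kernel)                # shape (22, 30)
```

A real CNN layer stacks many such kernels (the filters), each producing one channel of the output.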
3.0.3 Recurrent Neural Networks. An RNN is a special form of FNN with internal states and loops. The fundamental difference is that FNN neurons are not accessed twice, whereas in an RNN the neurons can be accessed more than once through the loops in back-propagation. This allows information to persist across a time series, which makes RNNs widely used in speech recognition and video processing [ ]. RNNs are one of the families of networks used
Table 1: STM32F401RE and STM32F722ZE specs.

Feature           STM32F401RE   STM32F722ZE
Clock             84 MHz        216 MHz
Flash             512 KB        512 KB
SRAM              96 KB         256 KB
Pipeline Stages   3             6 (dual-issue)
Cache             No            8 KB I & D
I2C               3             3
Table 2: Distribution of samples in the training and test sets.

Dataset                 Label:    0     1     2     3     4     5
Original    Training          3540   196   229   201    74   125
            Test                881    39    59    66    14    33
Augmented   Training          3540  1568  1832  1608   592  1000
            Test                881   312   472   528   112   264
Figure 1: Different types of data augmentation: (a) original image, (b) zoomed, (c) vertical flip, (d) added noise.
for sequential data and are specialized in processing sequential data in time series. However, RNNs can also be applied to 2-dimensional data such as images, which is the case in this work [ ]. The look-back of an RNN is the number of previous inputs that the network keeps before it performs the back-propagation process. This is the fundamental mechanism that allows an RNN to model a time series; without look-back, each input to the DNN is treated independently.
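The look-back can be pictured as a windowing step over consecutive thermal frames. The NumPy sketch below (with an arbitrary frame count and look-back of 5, both illustrative choices) shows how each RNN input bundles the current frame with its predecessors:

```python
import numpy as np

def make_lookback_sequences(frames, lookback):
    """Bundle each frame with its (lookback - 1) predecessors into one RNN input."""
    windows = [frames[i - lookback + 1 : i + 1]
               for i in range(lookback - 1, len(frames))]
    return np.stack(windows)  # shape: (n_sequences, lookback, height, width)

frames = np.random.default_rng(1).normal(size=(100, 24, 32))  # 100 synthetic frames
sequences = make_lookback_sequences(frames, lookback=5)
```

Each stacked window then becomes one time-series input to a recurrent layer such as a GRU.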
4 EXPERIMENTAL SETUP
In this work, cloud instances are used to train the models according to the aforementioned needs. CPU instances are used to train the GRU models due to their lower parallelism. The CPU instances run on 4 Intel Xeon Scalable Processors (Cascade Lake) with a turbo clock frequency of 3.6 GHz. GPU instances are used to train the FNN
Figure 2: Comparison of execution time, flash and RAM usage for the different models tested (STM32 F4 vs. STM32 F7; execution time in ms, flash and RAM in bytes).
and CNN models. GPU instances provide one NVIDIA Tesla K80 accelerator, which runs a pair of NVIDIA GK210 GPUs providing a total of 2496 parallel processing cores. The instance also has 4 vCPUs based on Intel's Broadwell microarchitecture running at 2.7 GHz.
The actual implementation of the embedded intelligence has been carried out with two 32-bit MCUs from STMicroelectronics. We have used the STM32F401 and the STM32F722 from the Arm Cortex-M4 and Cortex-M7 families, respectively, for running the DNNs, as these have sufficient resources. Two MCUs are used to give the evaluation a more extensive scale and to overcome some of the limitations that might occur. In the process, the deep learning algorithms are first applied to the MCUs as a proof of concept. The features and available resources of the two MCUs used in the experiments are listed in Table 1.
To deploy the deep learning models, an expansion package named X-CUBE-AI is used, which helps in applying deep learning algorithms and is capable of converting trained neural networks and generating an STM32-optimized library. In addition, the package supports various deep learning frameworks such as Keras, which is used for our trained models [26].
4.1 Data Acquisition and Analysis
In this work, a fully calibrated 24x32 pixel FIR thermal sensor array, the MLX90640 from Melexis, is used. This is a medium-resolution camera, and therefore its images are not sufficient to identify features which could help in revealing a person's identity. This conforms to our research requirement of ensuring individuals' privacy. Moreover, it has integrated sensors to measure the supply voltage (VDD) and ambient temperature (Ta) of the chip. The measurement outputs stored in the internal RAM are accessed through the I2C interface [21]. In addition, the thermal sensor array offers two FOVs, 55x35 and 110x75 degrees, of which the wider one is used in our experimental setup. The output is a thermal image where heat signatures are represented by the intensity of the colors.
In our experiments, the MLX90640 is installed in an indoor office environment. For such a contained environment, it can be assumed that people are consistently warmer than the ambient or room temperature [ ]. An additional RGB camera is set up to cross-check the total number of people against the results of our experimental setup. This serves as ground truth for model validation and benchmarking. In this work, we have collected data for 2 days and 9 hours, resulting in a total of 5457 data samples collected from the thermal sensor. The experiments involved zero to five people in the office room.
The experiments included one or more person(s) entering and exiting the room, sequentially and simultaneously. Moreover, people in the room were sitting, standing or walking.
The total pool of collected samples is divided into 4365 (80%) samples for the training set and 1092 (20%) for the test set. The training set was further subdivided into training and validation sets with a ratio of 4:1. The case distribution of the training set and the test set is shown in Table 2. Here, the case value refers to the ground truth of the number of people in the room.
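The split described above (80/20 train/test, then 4:1 train/validation inside the training pool) can be sketched as follows. Index shuffling with a fixed seed is our illustrative choice, not necessarily the authors' exact procedure, and integer division gives counts that differ by one sample from the paper's figures:

```python
import numpy as np

def split_dataset(n_samples, seed=42):
    """Shuffle indices, hold out 20% for test, then split the rest 4:1 train/val."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = n_samples // 5
    test, pool = idx[:n_test], idx[n_test:]
    n_val = len(pool) // 5
    val, train = pool[:n_val], pool[n_val:]
    return train, val, test

train, val, test = split_dataset(5457)  # dataset size from the paper
```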
4.2 Error Analysis
Hyper-parameter tuning is an important optimization process in ML which defines a set of optimal parameters for a learning algorithm. These parameters typically are not adjustable and cannot change during the training process. For example, in a DNN, the number of layers and neurons are hyper-parameters which have to be optimized first.
Grid search is an approach for choosing hyper-parameters in which all possible combinations of hyper-parameters from a grid of parameter values are tried. In this work, we adopted this approach for tuning the hyper-parameters. The model is trained on the
Figure 3: Thermal sensor FNN_L1_N512 original-data confusion matrix (values in %; rows: true label, columns: predicted label).

True\Predicted     0      1      2      3      4      5
0               99.8    0.2    0.0    0.0    0.0    0.0
1                7.7   89.7    2.6    0.0    0.0    0.0
2                1.7    6.8   91.5    0.0    0.0    0.0
3                0.0    0.0    1.5   98.5    0.0    0.0
4                0.0    0.0    0.0    0.0  100.0    0.0
5                0.0    0.0    0.0    0.0    0.0  100.0
training subset and its performance is evaluated on the validation subset using the mean squared error (MSE) as the loss function. Over all training epochs, the model state with the lowest validation error is selected as the best representative for a particular set of hyper-parameters. The training itself is performed with early stopping, in which the process is terminated if the change in validation error is less than 0.1% for 10 consecutive epochs. The dataset is pre-processed before being fed to the neural network by centering the mean to 0 and scaling to unit variance. Moreover, each layer is augmented with appropriate dropout values, and Adam is used as the gradient descent optimization algorithm with a tuned learning rate value [10]. After finding the best hyper-parameters for each of the neural networks, the best model is tested with the test set to measure its prediction accuracy.
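The pipeline above, zero-mean/unit-variance scaling plus an exhaustive search over hyper-parameter combinations, can be sketched as follows. The grid values and the toy scoring lambda are placeholder assumptions; in the actual work the score would come from training a Keras model with early stopping and reading off its validation MSE:

```python
import itertools
import numpy as np

def standardize(x_train, x_other):
    """Center to zero mean and scale to unit variance using training-set statistics."""
    mu, sigma = x_train.mean(axis=0), x_train.std(axis=0) + 1e-8
    return (x_train - mu) / sigma, (x_other - mu) / sigma

def grid_search(grid, evaluate):
    """Try every hyper-parameter combination; keep the lowest validation MSE."""
    best_mse, best_params = float("inf"), None
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        mse = evaluate(params)
        if mse < best_mse:
            best_mse, best_params = mse, params
    return best_mse, best_params

grid = {"layers": [1, 2, 3], "neurons": [64, 128, 256, 512], "dropout": [0.0, 0.2]}
# Toy stand-in for "train the model, return validation MSE":
mse, params = grid_search(
    grid, lambda p: 1.0 / p["neurons"] + 0.1 * (p["layers"] - 1) + p["dropout"])

x = np.arange(10.0).reshape(5, 2)
x_std, _ = standardize(x, x)
```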
5 EVALUATION
In this section, the experimental results are presented and analyzed. The purpose of these experiments is to evaluate the chosen MCUs while running the trained people counting algorithms. This is achieved by running a set of different DNN models on the MCUs in inference mode, followed by an accuracy analysis of the models that work on the MCUs. Two datasets are used for training and testing the models: the original dataset and the augmented dataset of thermal images.
The accuracy of the DNNs trained with the original dataset was remarkably high. Consequently, we decided to make the task more challenging for the neural networks by augmenting the data with more corner cases that are expected to be harder for the algorithms to process. The data augmentations used are (i) cropping, (ii) flipping upside down, (iii) flipping left to right, (iv) zooming out, (v) adding random noise, (vi) rotating the image, and (vii) blurring the image. A subset of the different data augmentation techniques utilized is shown in Figure 1. The FNN and GRU models are labeled as FNN_Lx_Ny, where x is the depth and y is the width of the model. The CNN models are denoted as CNN_Kx_Fy_Lz, where x is the kernel size, y is the number of filters and z is the number of layers of the network.
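A few of these augmentations can be reproduced with plain NumPy array operations, as sketched below. Rotation and blurring are omitted since they would need an image-processing library, and the crop margins and noise scale are arbitrary illustrative choices rather than the paper's parameters:

```python
import numpy as np

def augment(frame, rng):
    """Generate flipped, zoomed and noisy variants of a 24x32 thermal frame."""
    variants = {}
    variants["vflip"] = frame[::-1, :]          # (ii) flip upside down
    variants["hflip"] = frame[:, ::-1]          # (iii) flip left to right
    crop = frame[4:-4, 4:-4]                    # (i) crop the central region...
    zoomed = np.kron(crop, np.ones((2, 2)))     # ...then upscale by pixel repetition
    variants["zoom"] = zoomed[:24, :32]         # (iv) zoom, trimmed back to 24x32
    variants["noise"] = frame + rng.normal(scale=0.05, size=frame.shape)  # (v)
    return variants

rng = np.random.default_rng(0)
frame = rng.normal(size=(24, 32))
variants = augment(frame, rng)
```

Every variant keeps the 24x32 shape so that augmented samples can be fed to the same network input.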
The total number of samples in the thermal sensor dataset after augmentation is 12709. The same division method mentioned earlier for separating into training, validation and test sets is followed: the augmented samples are divided into 10140 (80%) for the training set and 2569 (20%) for the test set. The distribution of the augmented training and test sets is shown in Table 2.
5.1 Results
The DNNs provide robust and accurate results with the thermal sensor datasets. The data quality is high enough to yield good accuracy even with the relatively simple FNNs. The algorithm was able to learn when to classify a temperature signature as a human or as another heat source. In Table 3, a side-by-side comparison of the best performing models is presented. As shown, there are three different novel solutions for the thermal sensor, with different resource requirements. This allows the possibility of tailoring the algorithm based on the resources available in the MCU. The resources required are shown in Table 3.
The Flash and RAM requirements of the models, together with the execution time for the different models, are presented in Figure 2. The three DNN model types differ in their structure and thus cannot be compared directly; however, we compare them against the targeted application of people counting. An example of the structural difference between network types is that the number of neurons and layers is lower in the GRUs than in the FNNs. This is mainly because the GRUs depend on the look-back and do not need a large width or depth. In consequence, similar MSEs were achieved with a lower number of layers and neurons in the GRUs. The Flash requirements for the networks are reported after compression using the X-CUBE-AI package.
In terms of resource utilization, the CNN models have the highest impact on processor resources: because of the convolution layers, CNNs require more RAM and longer computation times. At the other end, FNNs are the simplest models in terms of network structure, which directly translates into lower memory usage and computation time. GRUs are situated in a middle point; nonetheless, because of the lower number of neurons and layers in the GRU models, their RAM requirements are also lower.
The best accuracy obtained with each of the models is very similar, ranging from 97.27% to 98.90%. The best FNN model achieves a prediction accuracy of 98.90%, which considerably improves on the state-of-the-art results in people counting from thermal images. The confusion matrix illustrating the performance of this model is shown in Figure 3. Only the work from Abedi et al. [20] achieves higher accuracy. However, in their case, the authors only detect whether the room is empty or not, with a binary output; moreover, in that work the machine learning analysis runs on cloud servers. Implementing the models in embedded processors enables a more robust design with lower latency. A comparison with other previous works is summarized in Table 4. Among the works utilizing thermal cameras and estimating the exact number of people in the image, our prediction accuracy is over 15% better than the previous work by Tyndall et al. [4]. We also achieve the best accuracy among embedded AI algorithms for any type of thermal or PIR sensor.
Table 3: Summary of prediction performance and system requirements for the best model of each network type.

Model          Pred. Acc.   F4 Exec. Time   F7 Exec. Time   Alloc. Flash   Alloc. RAM   Test MSE   Valid. MSE
FNN_L1_N512    98.90%       44.141 ms       13.269 ms       196.07 KB      2.01 KB      0.0137     0.004
CNN_K3_F8_L3   98.26%       77.075 ms       18.435 ms       8.46 KB        22.13 KB     0.0174     0.039
GRU_L1_N12     97.27%       60.7 ms         20.097 ms       109.88 KB      0.055 KB     0.030      0.017
Table 4: Comparison of system setup, data analysis technique and results of our method with the state-of-the-art.

Work                 Sensor          Placement   Output           Platform    Processing     Accuracy
Beltran et al. [2]   PIR+Thermal     Ceiling     Numbered         Tmote Sky   Custom         NA
Gomez et al. [3]     Thermal         Wall        Numbered         Cortex M4   CNN            53.7%
Tyndall et al. [4]   PIR+Thermal    Ceiling     Numbered         Arduino     K* algorithm   82.56%
Abedi et al. [20]    Radar+Thermal   Ceiling     Binary           Cloud       CNN            99%
Zappi et al. [22]    PIRs            Wall        Numbered (0-3)   GT60 MCU    Custom         89%
Ours                 Thermal         Ceiling     Numbered (0-5)   STM32F      FNN            98.90%
6 CONCLUSION
Knowing the number of people can help manage resources in smart buildings and other places where automation can improve management and dramatically reduce the total consumption of electricity, effectively decreasing greenhouse emissions. In this paper, we presented a novel solution for people counting with high prediction accuracy. The proposed algorithms have low computational, power and memory requirements, making them suitable for resource-constrained devices used in IoT-based applications. We observe that the thermal imaging technique is promising for counting people and more effective than other approaches such as those based on RGB cameras. Among the tested algorithms, FNN_L1_N512 resulted in the highest accuracy of 98.90%. Both MCUs are able to run the FNN_L1_N512 algorithm in inference mode. The algorithm utilized 4% of CPU processing cycles on the STM32F401 MCU while using 37% of its flash memory and less than 2.1% of the total available RAM. In our experiments, two other models (CNN_K3_F8_L3 and GRU_L1_N12) resulted in similar prediction accuracy; they offer different trade-offs in flash memory and RAM usage and hence can be tailored to the available resources of the MCU. In this work we have focused on novel solutions of embedded-AI-enhanced thermal sensing for counting people. In future work, we will extend the dataset to include more cases, as well as study the impact of the camera location and distance to subjects on the prediction accuracy.
Conference Paper
Detecting the amount of people occupying an environment is an important use case for surveillance in public spaces such as airports, stations and squares, but also for smaller environments such as classrooms (e.g. to track occupation of classrooms). Using visible imaging for this task is often suboptimal because 1) it potentially violates user privacy 2) to have a good final count, high resolution cameras are required. Long-wave infrared imaging is a viable solution to both these issues. In this paper, we developed a people counting algorithm on thermal images based on convolutional neural networks (CNNs) small enough that they can run on a limited-memory low-power platform. We created a dataset with 3k manually tagged thermal images and developed a fast and accurate CNN that is able to provide a completely error-free detection on 53.7% of the test images and an error bound within ±1 detection in 84.4% of the images, using only 308 kilobytes of system memory in a Cortex M4 platform.