Towards Sustainable Deep Learning for Wireless
Fingerprinting Localization
Anže Pirnat, Blaž Bertalanič, Gregor Cerar, Mihael Mohorčič, Marko Meža, and Carolina Fortuna
Department of Communication Systems, Jožef Stefan Institute, Slovenia.
Faculty of Electrical Engineering, University of Ljubljana, Slovenia.
ap6928@student.uni-lj.si, blaz.bertalanic@ijs.si, gregor.cerar@ijs.si, miha.mohorcic@ijs.si,
marko.meza@fe.uni-lj.si, carolina.fortuna@ijs.si
Abstract—Location based services, already popular with end
users, are now inevitably becoming part of new wireless infras-
tructures and emerging business processes. The increasingly pop-
ular Deep Learning (DL) artificial intelligence methods perform
very well in wireless fingerprinting localization based on extensive
indoor radio measurement data. However, with the increasing
complexity these methods become computationally very intensive
and energy hungry, both for their training and subsequent
operation. Considering only mobile users, estimated to exceed 7.4
billion by the end of 2025, and assuming that the networks serving
these users will need to perform only one localization per user
per hour on average, the machine learning models used for the
calculation would need to perform 65×10^12 predictions per year.
Add to this equation tens of billions of other connected devices
and applications that rely heavily on more frequent location
updates, and it becomes apparent that localization will contribute
significantly to carbon emissions unless more energy-efficient
models are developed and used. This motivated our work on a new
DL-based architecture for indoor localization that is more energy
efficient compared to related state-of-the-art approaches while
showing only marginal performance degradation. A detailed
performance evaluation shows that the proposed model produces
only 58 % of the carbon footprint while maintaining 98.7 % of the
overall performance compared to a state-of-the-art model external
to our group. Additionally, we elaborate on a methodology to
calculate the complexity of the DL model and thus the CO2
footprint during its training and operation.
Index Terms—localization, fingerprinting, wireless, deep learn-
ing (DL), neural network (NN), carbon footprint, energy effi-
ciency, green communications
I. Introduction
Location-based services (LBS) are software services that
take into account a geographic location and even context of
an entity [1] in order to adjust the content, information or
functionality delivered. Entities can be people, animals, plants,
assets and any other object. Perhaps the most widely used
LBS is the Global Positioning System (GPS), which integrates
data from satellite navigation systems and cell towers [2]
and is used daily in navigation systems. Another popular
application of LBS is locating tagged items and assets in
indoor environments.
With 5G systems, accurate localization is no longer only
important for the provision of more relevant information to the
end user, but also for optimal operation and management of the
network, e.g. for creating and steering the beams of antenna
array-based radio heads [3]. As discussed in [3], the poor
performance of fundamental geometry-based techniques in
challenging indoor environments characterized by non line-of-
sight (NLoS) and/or multipath propagation can be significantly
improved by using higher mmWave frequency bands and steer-
able multiple-input multiple-output (MIMO) antennas along
with advanced techniques such as cooperative localization,
machine learning (ML) and user tracking. Given the ubiquitous
presence of wireless networks and the associated availability
of radio-frequency (RF) measurements, ML methods promise
the highest accuracy, albeit at a higher deployment cost. In
particular, in the offline training phase, ML methods use
available RF measurements to create a fingerprint database of
the wireless environment, hence we refer to this localization
approach as wireless fingerprinting. The fingerprint database is
then used in the online localization phase to compare the real-
time RF measurement with the stored (measured or estimated)
values associated with exact or estimated locations.
Recent advances in Deep Learning (DL) [4] have enabled
particularly accurate localization, and such models trained
with large amounts of data are considered the most promising
enablers for the future LBS. However, the development and
use of DL models involves additional technical complexity,
increased energy consumption and corresponding environmen-
tal impacts. Recently, the impact of such technologies has
received increased attention from regulators and the public,
triggering related research activities [5]. One way to reduce
the environmental impact of power-hungry AI technology is
to increase the proportion of electricity from clean energy
sources such as wind, solar and hydro. However, this must
be complemented by further efforts to optimize energy con-
sumption relative to the performance of existing and emerging
technologies. Studies on estimating the energy consumption of
ML models [6] show that the increasing complexity of models,
manifested in the number of weights, the type of layers and
their respective parameters, affects both their performance and
energy efficiency. In DL architectures, one way to optimize the
use of energy is to reduce the size of the filters, also referred
to as kernels, that represent matrices used to extract features
from the image. In these filters, we can adjust the amount
of movement over the image by a stride. Another way is to
adjust pools, which represent layers that resize the output of
a filter and thus reduce the number of parameters passed to
subsequent layers, making a model lighter and faster.
arXiv:2201.09071v1 [cs.LG] 22 Jan 2022
In this paper, we propose a new DL architecture that is an
adaptation of ResNet18 for the indoor localization problem
under consideration, and show that its performance is comparable
to the state of the art while being much more energy
efficient. Our contributions are as follows:
• We design a new model with a kernel and a max pool that
  extract the most useful information from the subcarriers
  of a single antenna (e.g. those with the highest response),
  thus taking a different approach from the square kernel
  and pools of ResNet18.
• We elaborate on a methodology for computing computational
  complexity during training and operation of a DL
  architecture.
• In a carbon footprint study we show that the proposed
  DL model during training produces only 58 % of the
  carbon footprint and maintains 98.7 % of the overall
  performance of a state-of-the-art model.
The paper is organized as follows. Section II describes
related work, Section III provides the problem statement,
Section IV presents the proposed model and elaborates on
methodological aspects, while Section V provides a compre-
hensive evaluation. Finally, Section VI concludes the paper.
II. Related Work
In this section, we first present some very recent related
work on wireless fingerprinting for massive MIMO setups
using deep learning, and then summarize the state of the art
in energy-efficient design of ML models.
A. ML-based wireless fingerprinting
In [7], De Bast et al. proved that convolutional neural
networks (CNNs) can be effectively used for wireless fin-
gerprinting, and that more antennas significantly increase
localization accuracy. They designed a model for a massive
MIMO setup with 64 antennas, using stride 1×n (1D convolution).
The model includes 13 convolutional layers and 3
dense layers, enhanced by skip-connection and drop-out layers.
In [8], De Bast et al. proposed another model based on the
dense convolutional network (DenseNet) [9] and evaluated its
performance under LoS and NLoS conditions. They concluded
that in addition to the direct signal paths, the model needs
to exploit the multipath components for more robust and
accurate positioning. The proposed dense blocks consist of 4
convolutional layers, skip-connection layers and concatenation
layers. After each convolutional layer there is an average
pooling layer and a batch normalization layer.
Using the same dataset as [7], Widmaier et al. [10] showed
that objects that are in the line of sight can be localized better
than those that are not. They also proved the robustness of
their model by running it for a few days without any noticeable
decrease in accuracy.
Cerar et al. [11] came to the same conclusion as [7], but on
a different data set with a smaller MIMO array of 16 antennas.
However, since they use 4 times fewer antennas than [7], [10]
and [8], their results are less accurate.
Chin et al. [12] have proven that it is possible to use a
MIMO setup with 16 antennas and channel state information
(CSI) to fingerprint in the case of shielding, where GPS would
not work. They have also shown how effective convolutional
layers are compared to fully connected layers in DL. Also
Sobehy et al. [13] designed their model for a MIMO setup
with 16 antennas and used the k-nearest neighbors algorithm
for wireless fingerprinting based on CSI, proving that the most
reliable input feature is the magnitude.
Arnold et al. [14] used a MIMO orthogonal frequency
division multiplex (OFDM) system for localization. By pre-
training the neural network (NN) with LoS data, they signif-
icantly reduced the number of samples needed to achieve a
good training result.
Finally, in [15] Foliadis et al. showed methods that allow
reliable wireless fingerprinting when inconsistencies with the
raw phase make CSI unreliable. They proved that developing
a model that pooled over subcarriers rather than antennas was
more appropriate, so in our architecture we also pool over
subcarriers using a kernel of size 1×4.
B. Carbon footprint
In [16], Hsueh analyzed the carbon footprint of machine
learning algorithms and concluded that convolutional layers
are power hungry because they operate in three dimensions,
as opposed to fully connected layers which operate in two
dimensions. The model with the fewest parameters (weights)
showed the best trade-off between performance and carbon
footprint. In [17], Verhelst et al. analyzed the complexity of
CNNs and discussed hardware optimization techniques, mainly
targeting the Internet of Things (IoT) and embedded devices.
In [18], Jurj et al. proposed four different metrics that
account for different aspects of the trade-off between model
performance and energy consumption, while in [6] Garcia et
al. surveyed the energy consumption of various models. They
proposed a taxonomy of power estimation models at the soft-
ware and hardware levels and discussed existing approaches
for estimating energy consumption. They stated that using
the number of weights is not accurate enough and therefore
calculating the number of floating-point operations (FLOPs)
or multiply-accumulate operations (MACs) is required for
accurate calculation of energy consumption. In our work, we
also evaluate the carbon footprint of the compared models.
III. Problem Statement
Assuming a system with an antenna array of size $N \times M$
receiving transmissions from an entity E located at a position
$p(x, y, z)$, we want to develop a model capable of estimating
the spatial coordinates $\tilde{p}(\tilde{x}, \tilde{y}, \tilde{z})$ of E.
The model should be developed to predict the position $\tilde{p}$
of E as accurately as possible. We use two standard metrics
to measure the distance between the actual position $p$ and
the estimated position $\tilde{p}$, namely the mean distance error
$MDE = E[\|p - \tilde{p}\|_2]$ and the root mean square error
$RMSE = \sqrt{E[\|p - \tilde{p}\|_2^2]}$.
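As an illustration of the two metrics, the following Python sketch computes MDE and RMSE for paired lists of actual and estimated positions. The function name `distance_errors` and the sample coordinates are hypothetical and are not taken from the paper or the CTW 2019 dataset. The sketch assumes that RMSE takes the square root of the mean squared Euclidean distance, as its name implies.

```python
import math


def distance_errors(actual, estimated):
    """Return (MDE, RMSE) for paired sequences of (x, y, z) positions."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("need two non-empty position lists of equal length")
    # Euclidean (L2) distance between each actual and estimated position.
    dists = [math.dist(p, q) for p, q in zip(actual, estimated)]
    mde = sum(dists) / len(dists)                             # mean distance
    rmse = math.sqrt(sum(d * d for d in dists) / len(dists))  # root mean square
    return mde, rmse


# Hypothetical positions in metres, chosen only for illustration.
actual = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
estimated = [(0.3, 0.4, 0.0), (1.0, 1.0, 1.0)]
mde, rmse = distance_errors(actual, estimated)
print(f"MDE = {mde:.3f} m, RMSE = {rmse:.3f} m")
```

Because RMSE squares the individual distances before averaging, it penalizes large outliers more strongly than MDE does.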
Figure 1: The proposed PirnatEco architecture adapted from ResNet18 with differences marked with red circles. Layer labels in the figure: Input; 1×7 Conv2D, 32, stride 1×3; BatchNormalization; 1×4 pool; four 3×3 Conv2D layers each with 32, 64, 128 and 256 nodes (stride 2 at the first layer of the 64, 128 and 256 stages); GlobalAveragePooling2D; Flatten; FC, 1000, LeakyReLU; FC, 3; Output.
At the same time, the model should consume less energy
than the state of the art models, subject to a minor performance
degradation. To estimate the energy consumption for training
a DL model, it is necessary to consider the number of FLOPs
per type of layer used in the model architecture [6].
IV. Proposed DL Network Architecture
To achieve high localization performance and reduce en-
ergy consumption during training and operation of a DL
model, we propose a multilayer model PirnatEco inspired
by ResNet18 [19], shown in Figure 1. We chose ResNet18
because it is the least complex ResNet DL model and is more
adaptable to less complex types of images constructed from
time series, as is the case with localization. In Figure 1, each
layer is visible and explained with its kernel size, type, number
of nodes and in some cases stride and activation function.
The red circles mark the differences with ResNet18. Unlike
ResNet18, in PirnatEco the first layer is a convolutional 2D
layer (Conv2D) with a kernel size of 1×7 and a stride of
1×3, followed by a batch normalization and pooling layer
with a pool size of 1×4. These kernels and pools are designed
to move across the subcarriers of a single antenna, which is
different from the square kernels and pools in ResNet18.
Next, we use adapted ResNet blocks with reduced number of
weights, where the number of nodes doubles every four layers
from 32 to 256, unlike ResNet18 which starts with 64. The
kernel size in the blocks is 3×3, similar to ResNet18. Finally,
PirnatEco uses LeakyReLU activation with a parameter alpha
set to $10^{-3}$ at the fully connected (FC) layer with 1000 nodes,
unlike ResNet18 which uses ReLU.
A. Methodology for calculating model complexity
Starting from the existing methods for calculating model
complexity [17]¹, we use the following equations to calculate
the FLOPs for the layers and then the total FLOPs used by
PirnatEco.
1) Fully connected layer: A fully connected ($F_{fc}$) layer
performs MAC operations. Their number depends on the input
size $I_s$ and the output size $O_s$. A MAC consists of 2 FLOPs.
For layers that use rectified linear units (ReLU), the output
size has to be added to the result of the product, as shown in
Eq. 1.
$$F_{pe} = 2 I_s O_s + O_s \qquad (1)$$
¹ https://cs231n.github.io/convolutional-networks/#conv
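Eq. 1 can be sketched in a few lines of Python. The function name `fc_flops` and the layer sizes in the example are assumptions made for illustration; in particular, the 256 inputs feeding the 1000-node layer are not stated in the text.

```python
def fc_flops(input_size: int, output_size: int) -> int:
    """FLOPs of a fully connected layer with a ReLU-type activation (Eq. 1)."""
    # Each output needs input_size MACs at 2 FLOPs per MAC, plus one
    # extra FLOP per output for the activation.
    return 2 * input_size * output_size + output_size


# Hypothetical sizes: 256 inputs feeding a 1000-node layer.
fc_example = fc_flops(256, 1000)
print(fc_example)
```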
2) Convolutional layer: A convolutional layer consists of a
set of filters of size $K_r \times K_c$ used to scan an input tensor of
size $I_r \times I_c \times C$ with a stride $S$. More precisely, the number of
all FLOPs per filter $F_{pf}$ is given by Eq. 2.
$$F_{pf} = \left(\frac{I_r - K_r + 2P_r}{S_r} + 1\right)\left(\frac{I_c - K_c + 2P_c}{S_c} + 1\right)(C K_r K_c + 1) \qquad (2)$$
The first term of the equation gives the height of the output
tensor, where $I_r$ is the size of the input rows, $K_r$ is the height
of the filter, $P_r$ is the padding and $S_r$ is the size of the stride.
The second term represents the same calculation for the width
of the output tensor, where the indices in $I_c$, $K_c$, $P_c$ and
$S_c$ correspond to the input columns. The last term provides
the number of computations per filter for each of the input
channels $C$ that represent the depth of the input tensor and the
bias.
The number of FLOPs used throughout the convolutional
layer is equal to the number of filters times the FLOPs per
filter given in Eq. 2, i.e. $F_c = (F_{pf} + N_{ipf}) N_f$, where $N_{ipf}$
is the number of instances for each filter. However, in the
case where ReLU are used, one additional comparison and
multiplication are required to calculate the number of FLOPs
used in one epoch $F_{pe}$. We therefore added the number of
FLOPs used for each filter and the number of instances for
each filter and then multiplied by the number of all filters $N_f$:
$$F_c = (F_{pf} + (C K_r K_c + 1)) N_f. \qquad (3)$$
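The convolutional-layer count in Eqs. 2 and 3 can be sketched as follows. The function names, the use of floor division for output sizes that do not divide evenly, and the 16 × 924 × 2 input shape in the example are assumptions for illustration; only the 1×7 kernel, the 1×3 stride and the 32 nodes come from the description of the first PirnatEco layer.

```python
def conv_output_size(i: int, k: int, p: int, s: int) -> int:
    """Output length along one axis: (I - K + 2P) / S + 1."""
    # Floor division is an assumption for strides that do not divide evenly.
    return (i - k + 2 * p) // s + 1


def conv_flops_per_filter(i_r, i_c, c, k_r, k_c, p_r=0, p_c=0, s_r=1, s_c=1):
    """FLOPs per filter as in Eq. 2."""
    out_rows = conv_output_size(i_r, k_r, p_r, s_r)
    out_cols = conv_output_size(i_c, k_c, p_c, s_c)
    return out_rows * out_cols * (c * k_r * k_c + 1)


def conv_layer_flops(i_r, i_c, c, k_r, k_c, n_f, p_r=0, p_c=0, s_r=1, s_c=1):
    """FLOPs of a whole convolutional layer as in Eq. 3."""
    f_pf = conv_flops_per_filter(i_r, i_c, c, k_r, k_c, p_r, p_c, s_r, s_c)
    return (f_pf + (c * k_r * k_c + 1)) * n_f


# Hypothetical 16 x 924 x 2 input, 1x7 kernel, 1x3 stride, 32 filters, no padding.
per_filter = conv_flops_per_filter(16, 924, 2, 1, 7, s_c=3)
whole_layer = conv_layer_flops(16, 924, 2, 1, 7, 32, s_c=3)
print(per_filter, whole_layer)
```

A 1×7 kernel with a 1×3 stride never mixes rows, so under these assumptions the output keeps all 16 rows while the column count shrinks roughly threefold.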
3) Pooling layer: The pooling layer is responsible for
downsampling the height and width of the input tensor. No
padding is performed when pooling, and there is only one
filter in it, therefore the number of FLOPs per pooling layer
$F_p$ is given by:
$$F_p = \left(\frac{I_r - K_r}{S_r} + 1\right)\left(\frac{I_c - K_c}{S_c} + 1\right)(C K_r K_c + 1) \qquad (4)$$
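A matching sketch for Eq. 4 is given below. The function name `pool_flops`, the floor division, the stride equal to the pool size and the 16 × 306 × 32 input shape are all assumptions for illustration; only the 1×4 pool size comes from the architecture description.

```python
def pool_flops(i_r, i_c, c, k_r, k_c, s_r, s_c):
    """FLOPs of a pooling layer as in Eq. 4 (no padding, a single filter)."""
    out_rows = (i_r - k_r) // s_r + 1  # floor division is an assumption
    out_cols = (i_c - k_c) // s_c + 1
    return out_rows * out_cols * (c * k_r * k_c + 1)


# Hypothetical 16 x 306 x 32 input, 1x4 pool, stride assumed equal to pool size.
pool_example = pool_flops(16, 306, 32, 1, 4, 1, 4)
print(pool_example)
```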
4) Final model: The process of training the model involves
sequential forward and backward propagation through the dif-
ferent layers of the architecture. During forward propagation,
the network computes the loss based on the initialized weights.
During backward propagation, it updates the weights and
biases based on the gradients it computed against the loss.
Training is carried out in epochs, where an epoch involves
going forward and then backward through all available training
samples. Prediction, on the other hand, requires only one
forward pass through the network.
The number of operations in a DL architecture depends on
the number and types of layers $L$ and can be computed as:
$$M_{FLOPs} = \sum_{l=1}^{L} F_l, \qquad (5)$$
where $F_l$ refers to the $l$th layer of the architecture and
corresponds to a fully connected $F_{pe}$, convolutional $F_c$ or
pooling $F_p$ layer. The energy consumed during the forward
propagation $E_{fp}$ of the training process corresponds to the
energy required for making a forward pass multiplied by the
size of the training data $training_{samples}$ and the number of
epochs, as shown in Eq. 6:
$$E_{fp} = \frac{M_{FLOPs}}{GPU_{performance}} \times (training_{samples} \times epochs) \qquad (6)$$
where $GPU_{performance}$ is measured in FLOPS/Watt, and
FLOPS stands for FLOPs per second. Computing the energy
for backward propagation is a more challenging step, so we
approximate it as $E_{bp} = 2 \times E_{fp}$, since we know that backward
propagation is generally more computationally intensive and
on ResNet20 it takes about twice as long to compute as forward
propagation [20]. Therefore the energy required for training $E_{training}$
is:
$$E_{training} = E_{fp} + E_{bp} = 3 \times E_{fp} \qquad (7)$$
Once we use the trained model in production, the energy
required for prediction is equal to the energy required for a
forward pass, $E_{prediction} = M_{FLOPs} / GPU_{performance} \times input$,
where $input$ is the number of input samples for the prediction.
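The energy model of Eqs. 5 to 7 and the prediction energy can be sketched as below. All function names are hypothetical, and the values of `M_FLOPS`, `GPU_FLOPS_PER_WATT`, the sample count and the epoch count are round illustrative numbers rather than figures from the paper. The only value taken from the text is the scenario of one localization per hour for 7.4 billion users.

```python
def model_flops(layer_flops):
    """Eq. 5: total FLOPs of one forward pass, summed over all layers."""
    return sum(layer_flops)


def forward_energy_j(m_flops, gpu_flops_per_watt, samples, epochs=1):
    """Eq. 6: energy in joules for forward passes over all samples and epochs."""
    # FLOPs divided by FLOPS/W gives watt-seconds, i.e. joules.
    return m_flops / gpu_flops_per_watt * samples * epochs


def training_energy_j(m_flops, gpu_flops_per_watt, training_samples, epochs):
    """Eq. 7: backward pass approximated as twice the forward pass."""
    return 3 * forward_energy_j(m_flops, gpu_flops_per_watt, training_samples, epochs)


def prediction_energy_j(m_flops, gpu_flops_per_watt, n_inputs):
    """Prediction needs a single forward pass per input sample."""
    return forward_energy_j(m_flops, gpu_flops_per_watt, n_inputs)


# Hypothetical round numbers, not measurements from the paper.
M_FLOPS = model_flops([4e8, 5e8, 1e8])   # 1e9 FLOPs per forward pass
GPU_FLOPS_PER_WATT = 1e11                # assumed GPU efficiency
e_train = training_energy_j(M_FLOPS, GPU_FLOPS_PER_WATT, 10_000, 10)

# One localization per user per hour for 7.4 billion users.
predictions_per_year = 7.4e9 * 24 * 365
e_pred_year = prediction_energy_j(M_FLOPS, GPU_FLOPS_PER_WATT, predictions_per_year)
print(f"{e_train:.0f} J, {predictions_per_year:.4e} predictions, {e_pred_year:.4e} J")
```

The yearly prediction count evaluates to about 6.48×10^13, which is the basis for the rounded figure of 65×10^12 predictions per year quoted in the abstract.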
B. Model training and evaluation methodology
To develop a localization model, we used CSI and GPS
measurements from the publicly available CTW 2019 challenge² dataset.
To train and test the model, we generated
four different evaluation datasets with different splits between
training and testing data areas in a 9:1 ratio, labelling the
obtained evaluation sets as Random, Narrow, Wide and Within
as in [11]. Thus, we trained and tested the model with 15723
and 1748 samples in batches of 32 samples, respectively. In
each epoch, we went through 17471 × 32 samples. Weights
were updated using stochastic gradient descent (SGD) with a
learning rate of 0.01 and momentum of 0.9. We also ran tests
with other learning rates (i.e. 0.04, 0.02, 0.005) and momentum
values and selected the best values.
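For readers unfamiliar with the optimizer, the sketch below shows one common formulation of the SGD update with momentum, using the learning rate of 0.01 and momentum of 0.9 quoted above. The scalar toy objective and the function name `sgd_momentum_step` are assumptions for illustration; this is not the training code used for PirnatEco.

```python
def sgd_momentum_step(weight, velocity, grad, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update for a single scalar weight."""
    # The velocity accumulates a decaying sum of past gradients.
    velocity = momentum * velocity - lr * grad
    return weight + velocity, velocity


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(500):
    w, v = sgd_momentum_step(w, v, grad=2.0 * (w - 3.0))
print(w)
```

On this toy problem the weight settles at the minimizer w = 3. The momentum term lets the update keep moving in a consistent direction even when individual gradients are small.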
When calculating the computational cost of model training
$E_{training}$ (Eq. 7) and operation $E_{prediction}$, we considered the
number of FLOPS per watt of power of the Nvidia T4 graphics
cards, since they are used by Google Colab³, on which we
conducted our research. Furthermore, we calculate the carbon
footprint assuming that electricity is produced with a footprint
of 250 g of CO2 equivalent per kilowatt hour, as determined
for the west coast of the USA from electricitymap.org.
² https://data.ieeemlc.org/Ds1Detail
³ https://colab.research.google.com/
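The conversion from training energy to carbon footprint under the stated assumption of 250 g of CO2 equivalent per kilowatt hour can be sketched as follows. The function name `carbon_footprint_g` is hypothetical, and the energy values are the ones listed in Table II.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def carbon_footprint_g(energy_joules, grams_co2eq_per_kwh=250.0):
    """Convert energy in joules to grams of CO2 equivalent."""
    return energy_joules / JOULES_PER_KWH * grams_co2eq_per_kwh


# Training energies from Table II, converted from kJ to J.
for name, energy_kj in [("PirnatEco", 152), ("Chin CNN", 264), ("Cerar CNN4R", 2547)]:
    print(f"{name}: {carbon_footprint_g(energy_kj * 1e3):.2f} g CO2 eq.")

footprint_pirnateco = carbon_footprint_g(152e3)
```

With these inputs the sketch yields roughly 10.56 g, 18.33 g and 176.88 g of CO2 equivalent, in line with the rounded footprints reported in Table II.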
Figure 2: CDF of PirnatEco developed according to the four train/test data
set splits: Random, Wide, Narrow, Within.
Figure 3: Performance vs. epochs for the four train/test data set splits: Random,
Wide, Narrow, Within.
V. Performance Evaluation
To evaluate the proposed PirnatEco model, we first evaluate
the performance of the model and then quantify the energy
consumption for its training and prediction.
A. Performance of the PirnatEco model
Figure 2 shows the performance of the model using a
cumulative distribution function (CDF) of the MDE of the
estimated position $\tilde{p}$. It can be seen that a very large majority
of the locations predicted for the Random category have an
accuracy of 0-0.2 m, providing the best performance. This
is followed by the Within and Wide categories, where the
accuracy for most locations is in the range of 0.1-1 m and
0.2-1.5 m, respectively. The worst performance is obtained for
the Narrow category, where the prediction accuracy for 90 %
of locations only reaches 0.5-2 m.
To select the best model for each of the four evaluation
sets, we evaluated accuracy as a function of epochs, as shown
in Figure 3. It can be seen that the performance improvement
slows down after 85 epochs for the Random category and after
15 epochs for the Wide category. Accuracy for the Narrow
category shows the worst results, with no obvious relation
between accuracy and epochs, while for the Within category
the best performance is obtained after 20 epochs and slightly
deteriorates after 50 epochs. Considering these results, we used
85 epochs in the Random category, 15 epochs in the Narrow
and Wide categories, and 20 epochs in the Within category.
For comparison, the model of Chin et al. [12] needed 67, 30, 23 and
31 epochs, while that of Cerar et al. [11] needed 181, 32, 34 and 68
epochs, respectively.
Figure 4: Distribution of the prediction with PirnatEco. Panels: (a) Random: histogram, (b) Narrow: histogram, (c) Wide: histogram, (d) Within: histogram.
Table I: Performance evaluation on CTW 2019 dataset
| Approach | Weights [10^6] | Random RMSE | Random MDE | Narrow RMSE | Narrow MDE | Wide RMSE | Wide MDE | Within RMSE | Within MDE |
|---|---|---|---|---|---|---|---|---|---|
| Dummy (linear), FCNN | <0.1 | 0.724 | 1.122 | 1.055 | 1.809 | 0.878 | 1.428 | 0.441 | 0.721 |
| Arnold et al. [14], FCNN | 32.3 | 0.570 | 0.853 | 1.001 | 1.594 | 0.733 | 1.145 | 0.381 | 0.584 |
| Arnold et al. [14], CNN | 7.6 | 0.315 | 0.445 | 0.857 | 1.330 | 0.605 | 0.923 | 0.454 | 0.702 |
| De Bast et al. [7], CNN | 0.4 | 0.722 | 1.120 | 1.110 | 1.907 | 0.828 | 1.331 | 0.377 | 0.611 |
| Chin et al. [12], FCNN | 123.6 | 0.563 | 0.838 | 1.007 | 1.611 | 0.726 | 1.133 | 0.365 | 0.574 |
| Chin et al. [12], CNN | 13.7 | 0.100 | 0.093 | 0.854 | 1.326 | 0.530 | 0.808 | 0.381 | 0.620 |
| Cerar et al. [11], CNN4 | 5.3 | 0.122 | 0.149 | 0.819 | 1.286 | 0.514 | 0.787 | 0.365 | 0.552 |
| Cerar et al. [11], CNN4R | 10.8 | 0.113 | 0.127 | 0.776 | 1.227 | 0.539 | 0.835 | 0.351 | 0.521 |
| Cerar et al. [11], CNN4S | 16.3 | 0.108 | 0.120 | 0.821 | 1.285 | 0.528 | 0.804 | 0.351 | 0.524 |
| PirnatEco | 3.1 | 0.109 | 0.112 | 0.801 | 1.260 | 0.523 | 0.793 | 0.398 | 0.596 |
Further insight into the quality of the proposed model
is provided by the histograms in Figure 4, depicting the
distribution of predictions as a function of MDE for different
dataset splits. In the case of Random, the spread of MDE
values is very narrow around very small values and shows high
accuracy. In the case of Narrow, the dispersion is relatively
large and forms a bell around 1.2 m. In the case of Wide, the
bell is narrower and higher around 0.8 m with relatively few
outliers above 1.5 m, while in the case of Within the spread is
relatively large but still with most values below 1 m.
B. Performance comparison with the state of the art
Table I summarizes different proposed localization models
in terms of number of weights and accuracy in the four
considered evaluation categories. As explained in the previous
subsection and also evident in the table, the most accurate
localization was obtained using the Random category, where
the training and test samples are much closer to each other
and both distributed across the entire area of interest, thus
reducing the effects of unbalanced training. The difference in
the success of the neural network structures compared is quite
large. The worst performing model, De Bast et al. [7], is more
than a meter away from PirnatEco, but also has 7.75 times
fewer weights. The best performing results are less than 2 cm
away from ours and have at least four times more weights.
However, the results were not as far apart in other evaluation
categories. Our structure did not perform as well in the Within
category, which was second in the overall localization accuracy
achieved. We believe that this can also be explained by the
aforementioned logic of balanced and unbalanced training. In
this category, the differences were actually the smallest, and
all models achieved accuracy within the range of 20 cm.
The worst results were obtained for the Narrow category,
which had the largest difference between training and test
datasets, followed by the Wide category with slightly more
balanced training, but which still did not produce as accurate
results as the Within or Random categories. However, our
model was among the best performing also in the Narrow and
Wide categories.
C. Environmental costs for training and prediction
Finally, we also evaluated the best performing models from
Table I in terms of the carbon footprint for their training.
The calculated carbon footprints for the selected models are
summarized in Table II. The results represent an average
energy consumption and carbon footprint needed for training
a model for one of the four presented categories. The results
show that on average PirnatEco produces only 6 % of the
carbon footprint of CerarCNN4 and 58 % of ChinCNN, while
their performance is very comparable, i.e. our model achieves
99.4 % of the performance of CerarCNN4 and 98.7 % of the
performance of ChinCNN.
Table II: CO2 footprint used in training
| NN | Carbon footprint | FLOPs | Energy |
|---|---|---|---|
| PirnatEco | 10.6 g CO2 eq. | 345·10^6 | 152 kJ |
| Chin et al. [12] CNN | 18.3 g CO2 eq. | 535·10^6 | 264 kJ |
| Cerar et al. [11] CNN4R | 176.9 g CO2 eq. | 2479·10^6 | 2547 kJ |
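As a quick check, the relative footprints quoted above follow from the carbon footprint values listed in Table II, assuming the percentages are simple ratios of those values:

$$\frac{10.6}{18.3} \approx 0.579 \approx 58\,\%, \qquad \frac{10.6}{176.9} \approx 0.060 \approx 6\,\%.$$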
In Figure 5, we plot the calculated CO2 emissions as a
function of the number of location predictions. The final
number in the graph shows the CO2 emissions produced if we
made only one prediction for each mobile user in 2025, when
the estimated number of mobile users is expected to exceed 7.4
billion.
Figure 5: Carbon footprint vs predictions made in logarithmic scale
VI. Conclusions
In this paper, we propose a new DL architecture used in
the PirnatEco model for indoor positioning, paying special
attention to energy efficiency during training and operation
with only minor performance degradation compared to similar
models. In developing the architecture, we started from the
ResNet18 architecture and (i) reduced the size of the filters and
(ii) adapted the pools, while being aware of the specificities
of the data available for the problem. Since there is a paucity
of work evaluating the energy efficiency and computational
complexity of DL models, we also elaborated the methodology
to benchmark the three best performing models in terms of
their carbon footprint for training and prediction. We have
shown that it is possible to develop DL models for wireless
fingerprinting localization that optimize both accuracy and
environmental cost, providing a viable alternative to models
that focus only on accuracy.
Acknowledgments
This work was funded in part by the Slovenian Research
Agency under the grant P2-0016.
References
[1] I. A. Junglas and R. T. Watson, “Location-based services,” Communi-
cations of the ACM, vol. 51, no. 3, pp. 65–69, 2008.
[2] B. Hofmann-Wellenhof, H. Lichtenegger, and J. Collins, Global posi-
tioning system: theory and practice. Springer, 2012.
[3] O. Kanhere and T. S. Rappaport, “Position location for futuristic cellular
communications: 5g and beyond,” IEEE Communications Magazine,
vol. 59, no. 1, pp. 70–75, 2021.
[4] J. Yan, G. Qi, B. Kang, X. Wu, and H. Liu, “Extreme learning machine
for accurate indoor localization using rssi fingerprints in multi-floor
environments,” IEEE Internet of Things Journal, 2021.
[5] E. Strubell, A. Ganesh, and A. McCallum, “Energy and policy con-
siderations for deep learning in nlp,” in 57th Annual Meeting of the
Association for Computational Linguistics, 2019, pp. 3645–3650.
[6] E. García-Martín, C. F. Rodrigues, G. Riley, and H. Grahn, “Estimation
of energy consumption in machine learning,” Journal of Parallel and
Distributed Computing, vol. 134, pp. 75–88, 2019. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0743731518308773
[7] S. De Bast, A. P. Guevara, and S. Pollin, “Csi-based positioning in
massive mimo systems using convolutional neural networks,” in 2020
IEEE 91st Vehicular Technology Conference, 2020, pp. 1–5.
[8] S. De Bast and S. Pollin, “Mamimo csi-based positioning using cnns:
Peeking inside the black box,” in 2020 IEEE International Conference
on Communications Workshops, 2020, pp. 1–6.
[9] G. Huang, Z. Liu, L. V. D. Maaten, and K. Q. Weinberger, “Densely
connected convolutional networks,” in 2017 IEEE Conference on
Computer Vision and Pattern Recognition. Los Alamitos, CA, USA:
IEEE Computer Society, jul 2017, pp. 2261–2269. [Online]. Available:
https://doi.ieeecomputersociety.org/10.1109/CVPR.2017.243
[10] M. Widmaier, M. Arnold, S. Dorner, S. Cammerer, and S. ten Brink,
“Towards practical indoor positioning based on massive mimo systems,”
in 2019 IEEE 90th Vehicular Technology Conference, 2019, pp. 1–6.
[11] G. Cerar, A. Švigelj, M. Mohorčič, C. Fortuna, and T. Javornik, “Im-
proving csi-based massive mimo indoor positioning using convolutional
neural network,” in 2021 Joint European Conference on Networks and
Communications & 6G Summit, 2021, pp. 276–281.
[12] W. L. Chin, C. C. Hsieh, D. Shiung, and T. Jiang, “Intelligent indoor
positioning based on artificial neural networks,” IEEE Network, vol. 34,
no. 6, pp. 164–170, 2020.
[13] A. Sobehy, E. Renault, and P. Mühlethaler, “Csi-mimo: K-nearest
neighbor applied to indoor localization,” in 2020 IEEE International
Conference on Communications, 2020, pp. 1–6.
[14] M. Arnold, S. Dorner, S. Cammerer, and S. Ten Brink, “On deep
learning-based massive mimo indoor user localization,” in 2018 IEEE
19th International Workshop on Signal Processing Advances in Wireless
Communications, 2018, pp. 1–5.
[15] A. Foliadis, M. H. C. Garcia, R. A. Stirling-Gallacher, and R. S. Thomä,
“Csi-based localization with cnns exploiting phase information,” in 2021
IEEE Wireless Communications and Networking Conference, 2021, pp.
1–6.
[16] G. Hsueh, Carbon Footprint of Machine Learning Algorithms. Senior
Projects Spring 2020. 296. [Online]. Available:
https://digitalcommons.bard.edu/senproj_s2020/296
[17] M. Verhelst and B. Moons, “Embedded deep neural network processing:
Algorithmic and processor techniques bring deep learning to iot and edge
devices,” IEEE Solid-State Circuits Magazine, vol. 9, no. 4, pp. 55–65,
2017.
[18] S. L. Jurj, F. Opritoiu, and M. Vladutiu, “Environmentally-friendly
metrics for evaluating the performance of deep learning models and
systems,” in International Conference on Neural Information Processing.
Springer, 2020, pp. 232–244.
[19] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
image recognition,” in IEEE Conference on Computer Vision and Pattern
Recognition, 2016, pp. 770–778.
[20] A. Devarakonda, M. Naumov, and M. Garland, “AdaBatch: Adaptive
batch sizes for training deep neural networks,” arXiv preprint
arXiv:1712.02029, 2017.