A Visual Analytics System for Radio Frequency Fingerprinting-based Localization
Erich P. Stuntebeck†
John T. Stasko‡
Gregory D. Abowd§
School of Interactive Computing & GVU Center
Georgia Institute of Technology
Radio frequency (RF) fingerprinting-based techniques for localiza-
tion are a promising approach for ubiquitous positioning systems,
particularly indoors. By finding unique fingerprints of RF signals
received at different locations within a predefined area beforehand,
the localization system can infer a user’s current location whenever
a similar fingerprint is subsequently seen again.
However, developers of these systems face the problem of finding
reliable RF fingerprints that are unique enough and adequately sta-
ble over time. We present a visual analytics system that enables
developers of these localization systems to visually gain insight
on whether their collected datasets and chosen fingerprint features
have the necessary properties to enable a reliable RF fingerprinting-
based localization system. The system was evaluated by testing and
debugging an existing localization system.
Index Terms: H.5.2 [User Interfaces]: Graphical user interfaces
(GUI)—; D.2.5 [Testing and Debugging]: Debugging aids—
Tracking the location of people and objects inside of buildings has
been an active area of research for some years. The traditional
means of accomplishing this outdoors - GPS satellites - is unavail-
able indoors since buildings block the satellite signals. One ap-
proach researchers have taken in solving this problem is generating
their own indoor radio frequency (RF) signal(s) as a type of local
GPS signal. Small tags, which can be thought of as “indoor GPS
receivers,” track some aspect of these locally generated RF signals
and use this information to locate themselves within the building.
Outdoor GPS receivers operate by triangulating their position
based on the time of arrival of signals from multiple GPS satel-
lites. There is typically line-of-sight between the GPS satellites and
the receivers, allowing predictable RF signal propagation. Indoors,
RF signal propagation is very difficult to predict due to phenomena
such as multi-path propagation, wherein the signal can propagate
along multiple paths by reflecting off surfaces such as walls
and furniture. Small movements in physical space can produce
large differences in the signal since the multiple paths may con-
structively or destructively interfere at any given position. These
phenomena are nearly impossible to predict a priori.
To address this problem, researchers have developed the method
of radio frequency location fingerprinting. RF fingerprinting relies
on measurements of relevant features of the signals at various dis-
cretized locations. These measurements are taken when the system
is initially deployed. Later, when the system is in use, live
measurements taken by the mobile tags are matched to the fingerprinted
measurements to calculate the location of the tags. Since the year
2000, there have been over thirty fingerprinting-based localization
systems proposed by researchers around the world [10, 12, 13].
Figure 1: Generation of a location fingerprint. (a) An RF receiver
receives the RF signal at a location block. (b) The received signal
data are parsed and preprocessed. (c) The sampled signal data are
the potential features. (d) The location fingerprint is a subset of these
collected features that is unique to this location block.
An RF fingerprint consists of a set of features of the available
RF signals at a particular location. A commonly used feature is the
received signal strength of a signal at a particular frequency, illus-
trated in Figure 1. RF fingerprinting requires that the chosen fea-
tures vary in space so as to be able to differentiate the locations, but
remain constant in time, so that the off-line fingerprint measure-
ment phase does not need to be continually repeated. A location
fingerprint is normally built from multiple sets of samples in order
to tolerate some degree of noise in the features. Each set of samples
collected from all the surveyed locations constitutes a dataset,
called a site survey. Site surveys can be gathered some time apart
to observe how temporally stable the fingerprints are.
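As a minimal sketch of how a fingerprint might be built from several site surveys, the example below averages each feature over the surveys for every location block. Averaging is only one plausible way to tolerate noise and is an assumption here, as are the data layout, the block names and the function name `build_fingerprints`.

```python
from statistics import mean

# Hypothetical layout: each survey maps a block name to its feature vector
# (for example, one received signal strength value per sampled frequency).
Survey = dict[str, list[float]]


def build_fingerprints(surveys: list[Survey]) -> dict[str, list[float]]:
    """Average each feature over all surveys to get one fingerprint per block."""
    fingerprints = {}
    for block in surveys[0]:
        samples = [survey[block] for survey in surveys]
        # zip(*samples) groups the values of the same feature together.
        fingerprints[block] = [mean(values) for values in zip(*samples)]
    return fingerprints


# Two made-up site surveys over two blocks with two features each.
surveys = [
    {"a1": [10.0, 52.0], "b2": [30.0, 48.0]},
    {"a1": [12.0, 50.0], "b2": [28.0, 50.0]},
]
fingerprints = build_fingerprints(surveys)
print(fingerprints)
```

Keeping the individual surveys around, rather than only the averages, is what later makes it possible to check how stable each feature is over time.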
One of the most important challenges for RF fingerprinting,
therefore, is to select features of the RF signal for the fingerprint
that will produce reliable location estimates of the tags. Too few
features selected for the fingerprint may not give sufficient infor-
mation to differentiate the various locations of interest, while too
many features may include bad features that are unstable in time,
causing the system to produce poor results.
To aid with the process of RF fingerprinting-based localization
system development, we present a visual analytics system for viewing
the quality of the fingerprinting data collected during a site survey.
By utilizing heat maps to display different perspectives of the
features used in the location fingerprints, developers of these
systems can not only visually inspect the geospatial feedback for the
location classification results, but also select the features to use
by visually finding those that are temporally stable and spatially
differentiable in a high dimensional feature space. When necessary,
developers can also explore lower level details of any individual
feature to see its raw values and its relationships to other features
through a multivariate visualization. Using our system, developers of
localization systems can tell whether their collected datasets are
capable of producing good RF location fingerprints that enable
accurate location estimates over time.
The contribution of this work is to show how visual ana-
lytics can support the development and practical deployment of
fingerprinting-based localization systems. We feel that this tool is
a particularly good example of visual analytics because the most
effective way to find a good location fingerprint is to combine the
computational data analysis with an interactive geospatial visualization.
The RADAR system proposed by Bahl and Padmanabhan in 2000
was the earliest RF fingerprinting-based localization system.
The researchers were able to achieve a median accuracy of 2-3 meters
indoors using Wi-Fi signals. Since then, researchers have
reported over thirty systems using different RF signals or
classification algorithms. However, although these localization systems
are easy to deploy, the initial setup and calibration process for
generating the fingerprints is tedious and time consuming. They
can also be less reliable when the features used for the fingerprints
are not spatially differentiable and stable over time. Kaemarungsi
and Padmanabhan studied the properties of Wi-Fi location
fingerprints using received signal strength and learned that even the
presence of a human body can make a significant difference to the
fingerprints. Therefore, it is crucial to identify and remove unstable
features in the generated fingerprints to maintain the reliability of
the localization system over time.
Visualizing RF signals on a geospatial map using heat maps is
prevalent in 802.11 WLAN site survey tools for optimizing Wi-Fi
network coverage. Ekahau Site Survey took a step forward to not
only visualize the propagation of Wi-Fi signals but also integrate
the output to power their Real-Time Location Tracking System.
Nevertheless, this consumer-facing site survey tool cannot support
more advanced visual debugging functions on feature selection and
location fingerprint classification.
Spectrum analyzers for identifying physical locations of signal
sources also require visualizing signals on a geospatial map and
classifying them. Tektronix’s RFHawk Signal Hunter identifies
potentially malicious RF signals by singling them out from known
signals. The malicious signal will then be documented on a
geospatial map with a color-coded wave form or signal strength
icon for later reference. However, as the tool did not aim to sup-
port location fingerprinting, the wave form icons on the map have
little power to show the individual feature differences for building
spatially differentiable location fingerprints.
Andrienko and Andrienko used interactive cartographic visualization
for knowledge discovery. Their work suggested that interactive
visual facilities that allow an analyst to manipulate variables
and immediately observe the resulting changes in a map are
effective for geospatial data analysis. Our visual analytics system
goes a step further: for the K-nearest neighbor classification
algorithm, it also visualizes the intermediate steps of the algorithm
for indoor localization.
Our system was developed with data from the PowerLine
Positioning localization system (PLP). PLP injects an RF signal
into the power lines of a residential building and uses the power
lines as a giant antenna for propagating the signals. The mobile
wireless tag can then use this signal’s characteristics as the feature
set to fingerprint locations within the area where the power lines can
reach. The latest revision of this system utilizes a feature set that
samples the amplitude of the signal at 44 different frequencies for
location fingerprinting. All the illustrations shown in this paper
use either the original data of this system or a modified
version of it. The data of the system was gathered in a residential
laboratory on a university campus. This lab has a layout and
electrical infrastructure similar to those of a common residential
house. We marked out one meter by one meter blocks on the floor,
producing 66 different locations for our site survey.
In the next section, we will discuss the current problems in build-
ing a good location fingerprint with the existing, analytic text-based
machine learning approaches. In Section 4, we will briefly provide
an overview of the visualization interface. We will present a sce-
nario that demonstrates how our visual analytics system works in
Section 5. More details and example uses of the visualization will
be discussed in Section 6.
3 RADIO FREQUENCY FINGERPRINTING-BASED LOCALIZATION
3.1 System Development Procedure
The procedure to build a location fingerprinting system can be
roughly decomposed into three steps. The system presented here
is focused on supporting the last two steps.
1. The first step is to gather the datasets and feature sets that can
be potentially used to generate a location fingerprint database.
This requires a tedious site survey that maps where the RF
signals are gathered in the real world.
2. The second step is to find the right set of RF signal character-
istics for the fingerprints. This step involves feature selection
and building the fingerprints with the selected features.
3. The last step is to test the collected fingerprints with RF
signals received at random locations in the surveyed area (random
fingerprints). The signal data will be input to the localization
system to see if it can accurately find the true locations of these
random fingerprints through classification algorithms.
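The third step can be sketched as a small K-nearest neighbor test harness. This is an illustrative sketch only: the Euclidean distance, the value k=3, the block labels and the function names `knn_classify` and `accuracy` are assumptions, not details of any particular localization system.

```python
import math
from collections import Counter


def knn_classify(sample, database, k=3):
    """Return the majority block label among the k nearest stored fingerprints.

    `database` is a list of (block_label, feature_vector) pairs.
    """
    nearest = sorted(database, key=lambda entry: math.dist(sample, entry[1]))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]


def accuracy(test_points, database, k=3):
    """Fraction of (true_block, feature_vector) test points classified correctly."""
    hits = sum(knn_classify(vec, database, k) == truth for truth, vec in test_points)
    return hits / len(test_points)


# Made-up fingerprint database: three surveyed samples per block.
database = [
    ("kitchen", [10.0, 50.0]), ("kitchen", [11.0, 52.0]), ("kitchen", [9.0, 51.0]),
    ("hallway", [30.0, 20.0]), ("hallway", [31.0, 22.0]), ("hallway", [29.0, 21.0]),
]
# "Random fingerprints" with known true locations, used only for testing.
test_points = [("kitchen", [10.5, 50.5]), ("hallway", [30.5, 20.5])]
print(accuracy(test_points, database))
```

The single accuracy number printed here is exactly the kind of text-based output that hides which locations fail and why.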
3.2 Problems and Challenges for Building Location Fingerprints
The generation of the location fingerprint database on a radio map
requires a site survey in advance. This survey normally requires
a user to manually tell the system where they are so that the sys-
tem can learn the RF signal pattern at that specific location. This
process can be very tedious and time-consuming. For example, in
the PowerLine Positioning system, the time to survey each location
with the full 44 features can take around 2 minutes. It takes about 2
hours to survey 66 locations in practice. If the location fingerprints
are unstable over time, users might need to conduct the site survey
again later to calibrate the system.
One major challenge is how to find the best features for building
a set of good location fingerprints. In practice, we would like to use
as few features as possible to build the fingerprints, for two
reasons. First, fewer features mean shorter training and
classification times for the machine learning algorithm, which can be
crucial for real-time localization. Second, fewer features per
fingerprint can shorten the site survey data collection process: half
the number of features means half the time for this tedious
preprocessing procedure. However, the fewer the features used, the
less likely individual fingerprints are to be unique, resulting in
higher overall classification error. The technical challenge is
therefore to find a balancing point where a smaller set of features
can be used while the system can still accurately classify locations
within a certain area of interest.
3.3 Problems with the Current Approach
The current approach used by localization developers to verify that
the location fingerprints have these required properties is to run
machine learning algorithms on fingerprints gathered at different
times. The outputs of this approach are text-based: the classification
accuracy and the misclassified locations for the tested fingerprints.
There are several problems with this approach.
First, it is not easy to tell how each feature in a fingerprint
contributes to the overall classification results. For practical
applications, a few locations may need to be classified with high
accuracy at all times, while occasional errors at other locations are
acceptable. There are many feature selection algorithms that analyze
how each feature contributes to the overall accuracy. However,
different features may improve the classification accuracy of
different areas on the radio map while yielding the same improvement
in overall accuracy.
Additionally, if a few locations are always misclassified by the
algorithm, it is very difficult to dig down into the multidimensional
raw feature sets to identify the problem. Is it caused by a
problematic training dataset, or is the current fingerprint just not
unique enough to correctly classify this location? If this kind of
debugging cannot be performed, it is very hard to deploy a location
fingerprinting system in practice with the desired accuracy for a
specified area of interest. Moreover, RF interference sometimes occurs
during the site survey process. These interference events can
jeopardize the reliability of the produced fingerprints, which should
be accurate for the most common cases. Finally, it is not easy to find
extreme cases when dealing with multidimensional data.
The requirements can be summed up in two major questions that
need to be answered:
1. How do we effectively find a set of location fingerprints that
are good enough for certain areas of interest?
2. If there are some locations that consistently receive inaccurate
classifications, how do we find the problem?
To answer these two questions, several capabilities are required.
1. Test new unknown fingerprints with a preview of classifica-
tion results on a map.
2. Test different subsets of features that can be used to compose
the location fingerprints.
3. Examine the raw data of each individual feature for the fin-
gerprints at different locations and its temporal stability.
4. Examine the spatial variance between locations in the high
dimensional feature space of the fingerprints.
Our system was designed to answer these questions and to support
these tasks. However, in subsequent use of the system, several
unexpected and interesting insights into the datasets and features
were also discovered.
4 VISUAL ANALYTICS SYSTEM OVERVIEW
The interface of the system contains four main panels, as shown in
Figure 2.
Figure 2: The four panels of our visual analytics system interface.
1. Dataset selection This panel allows developers to select the
datasets to be viewed or used. Datasets can be selected individually
or with others according to the operation context. For example,
multiple datasets should be selected when one attempts to calculate
the standard deviation between them, whereas only a single selection
is needed when one attempts to view the raw feature values of a
specific dataset. To the right of the dataset selection combo box is a
timeline that shows when the datasets were collected. Each oval
symbol on the top or bottom of the timeline represents a dataset.
When a dataset is selected, the oval symbol representing it is
highlighted in blue.
2. Feature selection This panel allows developers to select the
features to use to compose the fingerprints. It supports single
or multiple selections according to the context of use.
3. Main map The main map panel is the display area for the
geospatial visualization. A preloaded map is displayed in the
background to provide the geospatial context for the visual-
ization. By selecting different viewing perspectives, datasets
and feature sets, this panel shows a grid-based heat map for
the selected parameters. The heat map representation is very
useful in showing the relative query results between different
locations on the map. This visualization technique is particularly
effective for examining a fingerprinting-based localization system
because we are most interested in the spatial differentiability of
the location fingerprints. At the bottom of this panel is the status
bar. It shows the currently selected feature set, the mouse
interaction mode and information about the heat map being presented.
4. Perspective control This panel is used to control the viewing
perspectives. The system provides three different viewing
perspectives. Each shows a different type of information about the
datasets and features, and they complement each other when the
developer intends to drill down to a specific problem.
• Data Variance Perspective (Figure 3) shows the raw data of
all the datasets with their corresponding feature sets.
• Spatial Variance Perspective (Figure 8) shows the spatial
variance between fingerprints in the high dimensional feature
space using the selected features.
• Test Classification Perspective (Figure 4) provides a geospa-
tial representation to show the results of the location classifi-
cation using the generated fingerprints.
We use a green-gray-red color scheme for the heat maps dis-
played in the main map panel. Green indicates better results and
red indicates worse. As for other colors used in the system, we
avoid using green or red to avoid any semantic confusion.
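A green-gray-red scale of this kind could be produced by linear interpolation between three anchor colors. The sketch below is only an illustration: the RGB anchor values, the `higher_is_better` switch and the function name `heat_color` are assumptions, not the colors or code used by the system.

```python
# Assumed RGB anchor colors for the best, neutral and worst ends of the scale.
GREEN, GRAY, RED = (0, 160, 0), (128, 128, 128), (200, 0, 0)


def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def heat_color(value, lo, hi, higher_is_better=True):
    """Map a value in [lo, hi] to an RGB tuple on a green-gray-red scale."""
    if hi == lo:
        return GRAY
    t = (value - lo) / (hi - lo)   # 0 at lo, 1 at hi
    t = min(max(t, 0.0), 1.0)      # clamp values outside the range
    if higher_is_better:
        t = 1.0 - t                # from here on, t = 0 means "best" (green)
    if t < 0.5:
        start, end, u = GREEN, GRAY, t / 0.5
    else:
        start, end, u = GRAY, RED, (t - 0.5) / 0.5
    return tuple(round(lerp(s, e, u)) for s, e in zip(start, end))


print(heat_color(0.9, 0.0, 1.0))                          # close to green
print(heat_color(0.9, 0.0, 1.0, higher_is_better=False))  # close to red
```

Passing the same `lo` and `hi` for every feature gives colors that are comparable across features, while per-feature bounds emphasize differences within a single feature.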
antenna amp 447
antenna amp 448
antenna amp 500
antenna amp 600
antenna amp 601
Table 1: Feature transformation for ranking version of PLP
5.1 PLP Ranking Dataset
To illustrate use of this visual analytics system, we present an actual
analysis scenario we conducted using the PLP data. From our previous
research, we knew that the original feature values (the raw signal
data) from the power line are useful for localization. However, since
the original data is real valued, it is sometimes closely clustered in
the high dimensional feature space. As a result, when the location
fingerprints contain a certain amount of noise in the signal, the
classification can be incorrect. Therefore, one of the researchers
proposed to transform the features of the datasets from raw amplitude
values into the relative ranks of the raw amplitude values, as
illustrated in Table 1. Using the ranking of the original feature
values creates uniform spacing between them for each block. In theory,
this approach can be more robust to noise because the real values
are dynamically ranged and rounded into a ranking form. Our
task is to see if the PLP ranking version is better than the original
PLP.
One major evaluation criterion for the PLP ranking version is how an
optimal set of fingerprints built for an area of interest compares
with its counterpart built from the original PLP. The following
scenario will show how to use the system to rapidly build a good
location fingerprint database that maximizes the classification
accuracy of an area of interest for the PLP ranking version. The same
procedure is conducted on the original PLP for comparison. For the
scenario, we assume that the kitchen area in the residential lab
(lower left area) is our area of interest, as shown in Figure 3.
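The rank transformation described in this section can be sketched as follows. The amplitude values are made up for illustration, and the ranking direction (1 = smallest), the tie handling and the function name `to_ranks` are assumptions rather than details taken from the PLP ranking version.

```python
def to_ranks(amplitudes):
    """Replace each raw amplitude with its rank within the block (1 = smallest).

    Tied values share the lowest rank of their group (an assumption).
    """
    ordered = sorted(amplitudes)
    return [ordered.index(a) + 1 for a in amplitudes]


# Made-up raw amplitudes for one block, and a slightly noisy re-measurement.
clean = [0.82, 0.11, 0.47, 0.46]
noisy = [0.80, 0.13, 0.48, 0.45]

print(to_ranks(clean))
print(to_ranks(noisy))
```

In this example both measurements map to the same rank vector, which illustrates the intended robustness to small noise. The flip side is that a small perturbation can swap the ranks of two nearly equal amplitudes, such as 0.47 and 0.46 above.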
5.2 Scenario: Building an optimized fingerprint for an area of interest
5.2.1 Temporal stability feature selection
After importing all the datasets, ranked feature sets and the
residential lab map into the system, the system begins with the Data
Variance Perspective (Figure 3). It shows the raw feature values as
a heat map on the main map view. Greener blocks represent
higher raw values (stronger signal). The first thing we would like
to determine for feature selection is whether the datasets we
gathered at different times are consistent enough to build reliable
fingerprints. Therefore, we check the “Calculate STD between datasets”
checkbox and select all the datasets to calculate the standard
deviation of each feature across all the datasets. Previous
research found that the smaller this standard deviation is, the higher
the overall system accuracy will be. Because our focus is to
compare the consistency of different features, we then check the
“Global color over all features” checkbox to dynamically range the
colors over all features for inter-feature comparison. The main map now
shows a mostly green heat map. This means that the selected feature is
roughly consistent at most of the locations on the map. By
cycling through the features, we find that 9 features exhibit
less consistent values (red blocks) in our area of interest, such as
the one shown in Figure 3. Therefore, they are eliminated from our
potential feature set for the target fingerprints.
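The temporal stability check above can be approximated numerically. The following is a minimal sketch, assuming each dataset maps a block name to its feature vector. The population standard deviation, the threshold value and the function names are illustrative assumptions, not the system's actual computation.

```python
from statistics import pstdev


def feature_std(datasets, block, feature):
    """Standard deviation of one feature at one block across all datasets."""
    return pstdev([ds[block][feature] for ds in datasets])


def unstable_features(datasets, area_blocks, threshold):
    """Indices of features whose deviation exceeds `threshold` at any block
    in the area of interest."""
    n_features = len(datasets[0][area_blocks[0]])
    return [
        f for f in range(n_features)
        if any(feature_std(datasets, b, f) > threshold for b in area_blocks)
    ]


# Three made-up datasets gathered at different times (two blocks, two features).
datasets = [
    {"a1": [3.0, 10.0], "a2": [5.0, 20.0]},
    {"a1": [3.0, 14.0], "a2": [5.0, 20.0]},
    {"a1": [3.0, 18.0], "a2": [5.0, 20.0]},
]
unstable = unstable_features(datasets, ["a1", "a2"], threshold=1.0)
print(unstable)
```

Here feature 1 drifts at block a1 between surveys, so it is flagged, while feature 0 is constant everywhere. A fixed threshold is a blunt instrument, which is one reason a heat map restricted to the area of interest is easier to judge by eye.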
Figure 3: Standard deviation view of a selected feature in the Data
Variance Perspective that shows several temporally unstable blocks.
One is in the kitchen area and two are in the rooms at the back of the
5.2.2 Preview classification result
We now switch to the Test Classification Perspective to preview
location classification results on the map (Figure 4). By default, it
uses all the features to perform a leave-one-out cross validation
on the datasets. As explained earlier in Section 3.2, we
generally prefer a smaller fingerprint with fewer features. As a result,
we first eliminate the 9 features identified in the last section from
the full 44 features. Then, we click the Auto Selection button to
use a correlation-based feature selection algorithm to automatically
filter out some irrelevant features from the remaining 35 features.
This results in the elimination of 14 more features. The classification
results are shown in Figure 4. However, by cycling through
the different test datasets used for cross validation, we notice that
although some of them have all the kitchen’s blocks correctly
classified, some test datasets, like N2-1, still have a couple of blocks
misclassified.
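Cycling through test datasets in a leave-one-out cross validation can be sketched as below. Each dataset is held out in turn and classified against the remaining ones using only the chosen feature subset. The 1-nearest-neighbor rule, the dataset and block names, the values and the function names are assumptions for illustration.

```python
import math


def nearest_block(sample, training):
    """1-nearest-neighbor lookup over (block, vector) pairs."""
    return min(training, key=lambda entry: math.dist(sample, entry[1]))[0]


def leave_one_dataset_out(datasets, features):
    """For each held-out dataset, list the blocks that were misclassified.

    `datasets` maps a dataset name to {block: feature_vector}; `features`
    lists the feature indices kept in the fingerprint.
    """
    def project(vec):
        return [vec[i] for i in features]

    misclassified = {}
    for held_out, test in datasets.items():
        training = [
            (block, project(vec))
            for name, ds in datasets.items() if name != held_out
            for block, vec in ds.items()
        ]
        misclassified[held_out] = [
            block for block, vec in test.items()
            if nearest_block(project(vec), training) != block
        ]
    return misclassified


# Made-up data: feature 2 is temporally unstable at block a1 in dataset N2.
datasets = {
    "N1": {"a1": [1.0, 9.0, 0.0], "b2": [5.0, 9.0, 9.0]},
    "N2": {"a1": [1.2, 9.0, 9.0], "b2": [5.1, 9.0, 9.0]},
    "N3": {"a1": [0.9, 9.0, 0.0], "b2": [4.8, 9.0, 9.0]},
}
all_features = leave_one_dataset_out(datasets, features=[0, 1, 2])
pruned = leave_one_dataset_out(datasets, features=[0, 1])
print(all_features)
print(pruned)
```

With all three features, block a1 is misclassified only when N2 is the test dataset. Dropping the unstable feature removes the error, which mirrors the kind of per-dataset difference that an overall accuracy figure would obscure.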
5.2.3 Debugging problematic blocks by finding spatial variance
In order to find the problem of the two misclassified blocks, we
switch into the Spatial Variance Perspective (Figure 5). In this
perspective, we click on the problematic block a1.0n located at
the lower left corner of the map in the kitchen area. From the
heat map shown in Figure 5, we can see the reddest block on the
map is b2.0n, the block where a1.0n was misclassified to. In this
case, a1.0n was probably misclassified because of the closeness
of the fingerprints in the high dimensional feature space of these
two blocks. Therefore, if we can find a few features to change this
closeness, the classification could potentially be corrected. By
clicking on the block, we can pull up the parallel coordinates
view that shows the differences of all the raw feature values from
other blocks to further inspect the data.
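The closeness between a selected block and every other block can be sketched as a distance computation in feature space. This is a hedged illustration: the Euclidean distance, the fingerprint values, the block c3.0n and the function name `distances_from` are assumptions. Only the block names a1.0n and b2.0n come from the scenario.

```python
import math


def distances_from(selected, fingerprints, features):
    """Euclidean distance from `selected` to every other block, closest first."""
    def project(vec):
        return [vec[i] for i in features]

    origin = project(fingerprints[selected])
    pairs = [
        (block, math.dist(origin, project(vec)))
        for block, vec in fingerprints.items() if block != selected
    ]
    return sorted(pairs, key=lambda pair: pair[1])


# Made-up fingerprints with three features per block.
fingerprints = {
    "a1.0n": [1.0, 4.0, 7.0],
    "b2.0n": [1.5, 4.0, 2.0],
    "c3.0n": [6.0, 9.0, 7.0],
}
with_all = distances_from("a1.0n", fingerprints, features=[0, 1, 2])
without_last = distances_from("a1.0n", fingerprints, features=[0, 1])
print(with_all[0])
print(without_last[0])
```

In both cases b2.0n is the closest block to a1.0n, but its distance changes substantially with the feature subset. That change is the effect a developer looks for when trying features that might separate two confusable blocks.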
Figure 4: Features selected using automatic correlation-based
feature selection in the Test Classification Perspective. 21 features are
selected in this view. N2-1 has two misclassified blocks in the lower
left corner of the kitchen when used as the test dataset.
In the parallel coordinates view, we can visualize all the raw
feature values for a selected block to directly identify its degree of
spatial differentiability and temporal stability. The default view shows
the difference of raw feature values between a1.0n, the block
selected, and the block on the y-axis. A higher value on the
x-axis indicates that there is more spatial variance between the blocks
in the high dimensional feature space. Moreover, if we select only
one feature to investigate and plot all the datasets’ values together,
we can also identify the degree of temporal stability
by visually observing how much the patterns overlap in the plot.
Therefore, the parallel coordinates for a good feature
should have a pattern like the one shown in Figure 6 (a). However,
due to RF interference and multi-path propagation, in many
cases we will see patterns like Figure 6 (b) or (c), which lack
either sufficient spatial differentiability or sufficient temporal
stability. As a result, we can identify features like antenna amp 500
(b) and antenna amp 10000 (c) in Figure 6 as potential removal
candidates.
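The two properties read off the parallel coordinates view can also be summarized numerically. The sketch below is one possible summary under stated assumptions: the two scores, the data and the function name `feature_profile` are hypothetical and are not the measures used by the system.

```python
from statistics import mean, pstdev


def feature_profile(datasets, selected, feature):
    """Summarize one feature as seen from the `selected` block.

    Returns (spatial, temporal): the mean absolute difference to the other
    blocks (higher = more spatially differentiable) and the mean standard
    deviation of that difference across datasets (lower = more stable).
    """
    others = [b for b in datasets[0] if b != selected]
    # diffs[block] holds one difference per dataset, i.e. one point per
    # plotted line in the parallel coordinates view.
    diffs = {
        b: [ds[b][feature] - ds[selected][feature] for ds in datasets]
        for b in others
    }
    spatial = mean([abs(mean(values)) for values in diffs.values()])
    temporal = mean([pstdev(values) for values in diffs.values()])
    return spatial, temporal


# Made-up datasets: feature 0 is ideal, feature 1 lacks spatial variance,
# and feature 2 varies between datasets.
datasets = [
    {"a1": [0.0, 5.0, 0.0], "b1": [10.0, 5.0, 10.0], "c1": [20.0, 5.0, 20.0]},
    {"a1": [0.0, 5.0, 0.0], "b1": [10.0, 5.0, 2.0], "c1": [20.0, 5.0, 28.0]},
]
for feature in range(3):
    print(feature, feature_profile(datasets, "a1", feature))
```

The three features correspond loosely to the three patterns in Figure 6: high spatial score with no temporal spread, no spatial score, and high spatial score with a large temporal spread.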
Now with these removal candidate features identified, we can go
back to the Test Classification Perspective and conduct some trial
and error with these features included or excluded to see the effect
on the overall classification results. It turns out that removing
antenna amp 9500 gives us the distinction needed for the lower
left block (Figure 7(a)). We can continue this procedure several
times to optimize all the classifications of the blocks in the area of
interest.
Within a few minutes of experimenting with the feature selection,
we managed to find 15 features for the fingerprints that can best
classify locations in the kitchen (97.44 percent accuracy) as shown
in Figure 7 (a). For comparison, we also used this method to find
the best fingerprints in the original PLP datasets. Five features
Figure 5: Selected block a1.0n is clearly closest to block b1.0n as
shown in the Spatial Variance Perspective.
Figure 6: Parallel coordinates of three different features plotted from
block a1.0n. Each of the lines represents a different dataset. (a)
An ideal feature, with differences of feature values that are consistent
across datasets and sufficient spatial variance to most of the blocks. (b)
A problematic feature with high temporal stability but low spatial
differentiability. (c) A problematic feature with high spatial
differentiability but low temporal stability.