Poster: WiFi-based Device-Free Human Activity Recognition
via Automatic Representation Learning
Han Zou
University of California, Berkeley
Berkeley, CA
hanzou@berkeley.edu
Yuxun Zhou
University of California, Berkeley
Berkeley, CA
yxzhou@berkeley.edu
Jianfei Yang
Nanyang Technological University
Singapore
yang0478@ntu.edu.sg
Weixi Gu
Tsinghua University
China
guweixigavin@gmail.com
Lihua Xie
Nanyang Technological University
Singapore
elhxie@ntu.edu.sg
Costas Spanos
University of California, Berkeley
Berkeley, CA
spanos@berkeley.edu
ABSTRACT
Existing human activity recognition approaches require either
the deployment of extra infrastructure or the cooperation
of occupants to carry dedicated devices, which are expen-
sive, intrusive and inconvenient for pervasive implementation.
In this paper, we propose SmartSense, a device-free human
activity recognition system based on a novel machine learn-
ing algorithm with existing commercial off-the-shelf (COTS)
WiFi routers. By exploiting the prevalence of existing WiFi
infrastructure in buildings, we developed a novel OpenWrt
based firmware for COTS WiFi routers to collect the CSI
measurements from regular data frames. To identify different
human activities, an automatic kernel representation learning
method, namely auto-HSRL, is established to select informative
Hilbert space patterns from time, frequency, wavelet,
and shape domains. A new information fusion tool based
on multi-view kernel learning is proposed to combine the
representations extracted from diverse perspectives and build
up a robust and comprehensive activity classifier. Extensive
experiments were conducted in an office and the experimental
results demonstrate that SmartSense outperforms existing
methods and achieves a 98% activity recognition accuracy.
CCS CONCEPTS
• Human-centered computing → Ubiquitous and mobile computing;
KEYWORDS
WiFi, human activity recognition, representation learning
1 INTRODUCTION
Human activity recognition plays an indispensable role in a
myriad of emerging applications in smart buildings. Various
Permission to make digital or hard copies of part or all of this work
for personal or classroom use is granted without fee provided that
copies are not made or distributed for profit or commercial advantage
and that copies bear this notice and the full citation on the first page.
Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).
MobiCom ’17, October 16–20, 2017, Snowbird, UT, USA
©2017 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-4916-1/17/10.
https://doi.org/10.1145/3117811.3131266
human activity recognition systems, based on cameras, Passive
Infra-Red (PIR) sensors, and sensors embedded in mobile
and wearable devices, have been proposed in recent years [1].
However, certain limitations hinder them from practical
and pervasive implementation. They either require occupants
to carry or wear the devices or need extra infrastructure
for activity monitoring. With the pervasive and wide avail-
ability of WiFi infrastructure, leveraging WiFi signals to
estimate the location of occupants and distinguish human
activities becomes feasible [4, 5]. Channel State Information
(CSI), a fine-grained channel measurement from the physical
layer that describes how WiFi signals propagate from the
transmitter (TX) to receiver (RX) through multiple paths
at the granularity of OFDM subcarriers, has become available
from commodity WiFi NIC [2]. CSI is able to reveal
human activity in a non-intrusive manner because the body
movements during different activities change the signal propa-
gation paths and lead to the high variation of CSI. Identifying
activities with high recognition accuracy can be feasible by
analyzing the CSI measurement at RX. Although some CSI
based activity recognition systems have been proposed
in recent years [1], they only explore features in time and
frequency domain, require expert feature engineering, and
these features may not be transferable due to environmental
and temporal dynamics. Therefore, it is still challenging to
realize human activity recognition in an accurate, automatic
and non-invasive manner.
In this paper, we propose SmartSense, a device-free human
activity recognition scheme that is able to accurately dis-
tinguish common activities using only COTS WiFi routers.
We developed a novel OpenWrt based firmware for WiFi
routers so that the CSI measurements from regular data
frames can be obtained directly from them. For the purpose
of human activity recognition, a novel machine learning al-
gorithm, namely automatic kernel representation learning
(auto-HSRL), is established. The algorithm starts with a
library based generation of Hilbert space patterns from the time,
frequency, wavelet, and shape domains. Then an optimal kernel
string representation is learned with an efficient greedy
method. Finally, the representations obtained from diverse
domains are combined together via a new information fusion
tool in the form of multi-view kernel learning. Extensive
Figure 1: System architecture of auto-HSRL.
experiments were conducted in a typical indoor environment
and demonstrated that SmartSense can distinguish a number
of daily human activities with 97.9% recognition accuracy
using only two COTS WiFi routers.
2 SYSTEM DESIGN
2.1 CSI enabled WiFi Router Platform
Conventional CSI-based sensing systems adopt either the
Intel 5300 NIC tool or the Atheros 9390 tool to extract the
CSI data from a modified WiFi NIC installed in a laptop
or PC. The need for laptops as receivers severely hinders
large-scale deployment of such systems. To overcome this
bottleneck, we upgrade the Atheros CSI Tool [2] and develop
a new OpenWrt based firmware for COTS WiFi routers
so that the CSI measurements from regular data frames
transmitted in the existing traffic can be obtained directly
from routers instead of using laptop or PC with external
WiFi NIC adapter. Moreover, the traditional Intel 5300 NIC
tool only provides CSI for 30 out of the 56 subcarriers. Our
platform reports CSI data on all the 114 subcarriers for 40
MHz bandwidth on each TX-RX pair, which provides much
more information than conventional CSI tools. SmartSense
only requires two routers, one serves as TX and the other is
adopted as RX, to perform device-free activity recognition.
TX continuously transmits data packets and RX monitors the
channels, captures and analyzes these packets. Suppose
$N_{TX}$ and $N_{RX}$ represent the number of transmitting and
receiving antennas. At each time instant,
$N_{TX} \times N_{RX} \times 114$ CSI streams
are available to analyze the variations of WiFi communication
links caused by human presence and movement.
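As a small worked example of the stream count above, the sketch below plugs in the antenna configuration and packet rate reported in the evaluation section (1 TX antenna, 3 RX antennas, 700 packets/s). The function name `csi_streams` is hypothetical, and the per-second figure assumes that every packet yields exactly one CSI report.

```python
# Configuration values taken from the evaluation section of this paper.
N_TX = 1             # transmitting antennas
N_RX = 3             # receiving antennas
N_SUBCARRIERS = 114  # subcarriers reported per TX-RX pair at 40 MHz
PACKET_RATE = 700    # packets per second


def csi_streams(n_tx: int, n_rx: int, n_sub: int) -> int:
    """Number of CSI streams available at each time instant."""
    return n_tx * n_rx * n_sub


streams = csi_streams(N_TX, N_RX, N_SUBCARRIERS)
# Assumption: one CSI report per received packet.
values_per_second = streams * PACKET_RATE

print(streams)            # 342
print(values_per_second)  # 239400
```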
2.2 Automatic Hilbert Space Representation Learning
In order to recognize various human activities in an accurate,
automatic and robust manner, we propose auto-HSRL, a novel
automatic representation learning and multi-view learning
scheme. Figure 1 illustrates its system architecture, which
consists of three stages. Firstly, we calculate various features
based on the CSI measurements from time (mean, variance,
skewness, kurtosis), frequency (FFT), wavelet (DWT), and
shape (DTW) domains. Previous WiFi-based activity recogni-
tion approaches often resort to manual inspection, or simply
combine all information together as the input of an ML algo-
rithm, which leads to degraded recognition performance due
to the suboptimal usage of extracted patterns. We propose
auto-HSRL, a simple yet powerful tool to learn useful representations
from data. Consider a possibly nonlinear mapping
$\varphi : \mathcal{X} \to \mathcal{H}$ from the raw data space
$\mathcal{X}$ (with $x \in \mathcal{X}$) to a feature space
$\mathcal{H}$. Mercer's theorem allows the inner product between
the features to be expressed via a positive definite (PD or
Mercer's) kernel function, i.e.,
$$k(x, x') = \langle \varphi(x), \varphi(x') \rangle_{\mathcal{H}}.$$
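To make the identity $k(x, x') = \langle \varphi(x), \varphi(x') \rangle_{\mathcal{H}}$ concrete, here is a minimal sketch using the degree-2 homogeneous polynomial kernel on $\mathbb{R}^2$, whose feature map can be written out explicitly. This kernel is chosen only for illustration; the text does not state which base kernels SmartSense uses, and the names `phi`, `inner` and `k_poly2` are hypothetical.

```python
import math


def phi(x):
    """Explicit feature map of the kernel k(x, y) = (x . y)^2 on R^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def inner(u, v):
    """Plain Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def k_poly2(x, y):
    """Kernel evaluated directly in the input space."""
    return inner(x, y) ** 2


x, y = (1.0, 2.0), (3.0, -1.0)
lhs = k_poly2(x, y)           # kernel value, no feature map needed
rhs = inner(phi(x), phi(y))   # inner product in the feature space
assert math.isclose(lhs, rhs)
```

The kernel gives the feature-space inner product without ever forming $\varphi(x)$, which is what makes richer (even infinite-dimensional) feature spaces tractable.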
The kernel function $k(\cdot, \cdot)$ is uniquely determined by the
respective reproducing kernel Hilbert space $\mathcal{H}$. Multiple Mercer's
kernels are calculated and leveraged, since combining them offers
greater modeling flexibility and improved performance. Then,
auto-HSRL learns a multiple kernel information representation
for each domain. The combination structure is called a
“string kernel”, which is built from multiplication and
addition operations on base kernels. The
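Hilbert-Schmidt Independence Criterion (HSIC) is leveraged
to learn the optimal string kernel. HSIC can be computed
with statistical estimation techniques, such as
$$\mathrm{HSIC}(K, L) \triangleq \frac{1}{m(m-1)} \left[ \operatorname{tr}(KL) + \frac{\mathbf{1}^T K \mathbf{1} \, \mathbf{1}^T L \mathbf{1}}{(m-1)(m-2)} - \frac{2}{m-2} \mathbf{1}^T K L \mathbf{1} \right],$$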
where $K$ is the kernel matrix calculated by
$K_{i,j} = k(x_i, x_j)$, and $L$ is the kernel matrix of the
target output space. In the current case dealing with activity
recognition, the output kernel is computed with
$L_{i,j} = m_{y_i}^{-1} \delta(y_i, y_j)$. We propose a
greedy algorithm to learn the optimal string kernel based on
HSIC. After that, we propose a multi-view learning based
information fusion through multiple kernel learning (MKL).
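The HSIC-guided search for a string kernel described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `hsic` follows the estimator as printed in the text; `label_kernel` assumes $m_{y_i}$ is the number of samples in the class of $y_i$; the Frobenius normalization in `unit` is an added assumption so that candidates of different scale are comparable; and `greedy_string_kernel`, the base kernels and the stopping rule are all hypothetical.

```python
import numpy as np


def unit(K):
    """Scale a kernel matrix to unit Frobenius norm (assumption, for comparability)."""
    return K / np.linalg.norm(K)


def rbf_kernel(X, gamma=1.0):
    """Gaussian base kernel, used here only as an example of a base kernel."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-gamma * d2)


def label_kernel(y):
    """L[i, j] = delta(y_i, y_j) / m_{y_i}, with m_{y_i} assumed to be the class size."""
    y = np.asarray(y)
    same = (y[:, None] == y[None, :]).astype(float)
    return same / same.sum(axis=1, keepdims=True)


def hsic(K, L):
    """HSIC estimate, following the formula as printed in the text."""
    m = K.shape[0]
    one = np.ones(m)
    t1 = np.trace(K @ L)
    t2 = (one @ K @ one) * (one @ L @ one) / ((m - 1) * (m - 2))
    t3 = 2.0 / (m - 2) * (one @ K @ L @ one)
    return (t1 + t2 - t3) / (m * (m - 1))


def greedy_string_kernel(base, L, max_steps=3):
    """Extend the kernel by '+' or '*' with a base kernel while HSIC keeps improving."""
    base = {name: unit(K) for name, K in base.items()}
    expr = max(base, key=lambda n: hsic(base[n], L))
    K, score = base[expr], hsic(base[expr], L)
    for _ in range(max_steps):
        candidates = []
        for name, B in base.items():
            candidates.append((f"({expr} + {name})", unit(K + B)))
            candidates.append((f"({expr} * {name})", unit(K * B)))  # element-wise product
        cand_expr, cand_K = max(candidates, key=lambda c: hsic(c[1], L))
        cand_score = hsic(cand_K, L)
        if cand_score <= score:
            break  # no candidate improves the dependence on the labels
        expr, K, score = cand_expr, cand_K, cand_score
    return expr, K, score


# Synthetic two-class data standing in for features of one domain.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 3)), rng.normal(2.0, 1.0, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
base = {"rbf": rbf_kernel(X, gamma=0.5), "lin": X @ X.T}
expr, K_best, score = greedy_string_kernel(base, label_kernel(y))
print(expr, score)
```

Sums and element-wise products of positive semidefinite matrices remain positive semidefinite, so every expression the search visits is still a valid kernel.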
The key idea is to perform an overall regularized empirical
risk minimization over the combined Hilbert space identified
from the previous step, in order to find the classifier that takes
all information sources into consideration. We substantiate
information fusion by solving the following MKL problem:
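$$\begin{aligned}
\min_{\{f_m\},\, b,\, \xi,\, d} \quad & \frac{1}{2} \sum_m \frac{1}{d_m} \|f_m\|_{\mathcal{H}_m}^2 + C \sum_i \xi_i \\
\text{s.t.} \quad & y_i \sum_m f_m(x_i) + y_i b \ge 1 - \xi_i \quad \forall i \\
& \sum_m d_m = 1, \quad d_m \ge 0 \;\; \forall m, \quad \xi_i \ge 0 \;\; \forall i
\end{aligned} \qquad \text{(Primal)}$$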
The above Primal problem can be rewritten into a dual form
that directly uses kernel representations learned from the last
step. The Primal problem is equivalent to
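$$\min_d J(d) \quad \text{s.t.} \quad \sum_m d_m = 1, \;\; d_m \ge 0$$
where
$$J(d) = \max_\alpha \; -\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \sum_m d_m (K_m)_{i,j} + \sum_i \alpha_i \quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C \;\; \forall i$$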
$J(d)$ is a convex function of $d$, so the problem can be solved with
gradient based methods [3]. The kernel representations
$K_T, K_F, K_W, K_S$
obtained from auto-HSRL can be combined together within
the above MKL learning framework.
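The pieces of this MKL formulation can be sketched in a few lines: the combined kernel $\sum_m d_m K_m$, the inner dual objective for a fixed $\alpha$, its gradient with respect to $d$, and a projection that keeps $d$ on the simplex. This is a hedged illustration, not the solver of [3]: the four matrices are random stand-ins for $K_T, K_F, K_W, K_S$, the function names are hypothetical, and $\alpha$ is simply a feasible point rather than the maximizer that an SVM solver would return at each iteration.

```python
import numpy as np


def combine_kernels(kernels, d):
    """K(d) = sum_m d_m K_m, with d on the probability simplex."""
    d = np.asarray(d, dtype=float)
    assert np.all(d >= 0) and np.isclose(d.sum(), 1.0)
    return sum(w * K for w, K in zip(d, kernels))


def dual_objective(alpha, y, K):
    """sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K_ij for a fixed alpha."""
    ay = alpha * y
    return alpha.sum() - 0.5 * (ay @ K @ ay)


def grad_wrt_d(alpha, y, kernels):
    """Partial derivatives -1/2 sum_ij alpha_i alpha_j y_i y_j (K_m)_ij at the given alpha."""
    ay = alpha * y
    return np.array([-0.5 * (ay @ K @ ay) for K in kernels])


def project_to_simplex(v):
    """Euclidean projection onto {d : d >= 0, sum(d) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


rng = np.random.default_rng(0)
n = 6
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
kernels = []
for _ in range(4):              # random stand-ins for K_T, K_F, K_W, K_S
    A = rng.normal(size=(n, 3))
    kernels.append(A @ A.T)     # A A^T is positive semidefinite

alpha = np.full(n, 0.1)         # feasible: sum(alpha * y) = 0 and 0 <= alpha <= C
d = np.full(4, 0.25)            # start from uniform kernel weights
g = grad_wrt_d(alpha, y, kernels)
d_next = project_to_simplex(d - 0.5 * g)  # one projected gradient step on d
print(d_next)
```

In a full solver the two steps alternate: solve the SVM dual for the current $d$ to obtain $\alpha$, then take a projected gradient step on $d$, until the weights stop changing.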
(a) Confusion matrix of human activity recognition accuracy.
Axes: Estimation and Ground Truth, each over sit, stand, walk, run.
Cell values: 96.9%, 0.0%, 0.0%, 0.0%; 1.3%, 98.1%, 0.0%, 0.0%;
0.9%, 0.0%, 99.0%, 2.0%; 0.9%, 1.9%, 1.0%, 98.0%.
(b) Occupant behavior profile in the office on one weekday.
x-axis: Time [min]; activity labels: Empty, Sitting, Standing, Walking, Running.
(c) Occupant behavior profile in the office on one weekday.
Figure 2: Performance of SmartSense.
3 PERFORMANCE EVALUATION
We implemented SmartSense on two TPLINK N750 WiFi routers:
one serves as TX and the other acts as RX. We upgraded
their firmware to our CSI OpenWrt version so that the CSI
measurements from regular data frames are reported directly
from them. TX operated on the 5 GHz frequency band with
a 40 MHz channel bandwidth. We used 1
TX antenna to send data packets to 3 RX antennas at a
transmission rate of 700 packets/s. Four common activities, namely
sitting, standing, walking and running, were performed by 10
volunteers to validate the activity recognition performance
of SmartSense in an office (50 m²). Training and testing data
were collected on different days to evaluate the performance
of SmartSense under both temporal and environmental dy-
namics. Figure 2(a) depicts the confusion matrix of activity
recognition accuracy using SmartSense. It achieves an aver-
age cross-validation activity recognition accuracy of 97.9%
across all the four activities. As shown in Figure 2(a), its
recognition accuracy for sitting, standing, walking and run-
ning is 96.9%, 98.1%, 99% and 98% respectively. SmartSense
considers the features from four domains and combines them
with an optimal representation, which can reveal the nature
of each activity more clearly than existing approaches.
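As a rough illustration of two of the four feature domains mentioned above, the sketch below computes the time-domain statistics (mean, variance, skewness, kurtosis) and an FFT magnitude spectrum for a single CSI amplitude stream. The signal is synthetic, the 700 Hz sampling rate assumes evenly spaced packets at the reported 700 packets/s, and the function names are hypothetical; the wavelet and shape features are not covered here.

```python
import numpy as np


def time_domain_features(x):
    """Mean, population variance, skewness and (non-excess) kurtosis of one stream."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    var = x.var()
    z = (x - mu) / np.sqrt(var)
    return {
        "mean": mu,
        "variance": var,
        "skewness": np.mean(z ** 3),
        "kurtosis": np.mean(z ** 4),
    }


def frequency_domain_features(x, fs):
    """One-sided FFT magnitude spectrum of the mean-removed stream."""
    x = np.asarray(x, dtype=float)
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, spectrum


fs = 700.0                      # assumed sampling rate in Hz
t = np.arange(1400) / fs        # a 2-second window
# Synthetic amplitude: constant level plus a 5 Hz oscillation.
x = 1.0 + 0.3 * np.sin(2 * np.pi * 5.0 * t)

feats = time_domain_features(x)
freqs, spectrum = frequency_domain_features(x, fs)
dominant = freqs[np.argmax(spectrum)]
print(feats, dominant)
```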
To validate its performance for practical implementation,
we leveraged SmartSense to monitor one occupant’s activities
and analyzed his activity patterns in an office during one
weekday. As presented in Figure 2(b), we can infer when the
occupant arrived at the office, the duration of his lunch break
and when he got off work with the measurements of Smart-
Sense. With this fine-grained occupant activity information,
the building management systems can adjust the light and
ventilation accordingly to reduce the energy consumption in
commercial buildings. As shown in Figure 2(c), the occupant
spent more than 53% of the time sitting in the office. Thus,
we can recommend that he exercise more in order to have a
healthier lifestyle and also improve his productivity.
4 CONCLUSION
In this paper, we proposed SmartSense, a device-free human
activity recognition scheme using only COTS WiFi routers.
We designed a CSI enabled WiFi router platform, calculated
various features from the time, frequency, wavelet, and shape
domains based on the de-noised CSI data, and
developed a multi-view kernel learning model to select the
most representative subset of features and build a robust
activity classifier. Extensive experiments were conducted and
demonstrate that SmartSense can distinguish a number of
daily human activities with 97.9% recognition accuracy by
leveraging only two WiFi routers. It has great potential to
serve as a fundamental service to facilitate a broad range of
emerging applications in smart buildings.
ACKNOWLEDGEMENT
This research is funded by the Republic of Singapore National
Research Foundation (NRF) through a grant to the Berkeley
Education Alliance for Research in Singapore (BEARS) for
the Singapore-Berkeley Building Efficiency and Sustainability
in the Tropics (SinBerBEST) Program. BEARS has been
established by the UC Berkeley as a center for intellectual
excellence in research and education in Singapore.
REFERENCES
[1] Wei Wang, Alex X Liu, Muhammad Shahzad, Kang Ling, and Sanglu Lu. 2015. Understanding and modeling of wifi signal based human activity recognition. In MobiCom 2015. ACM, 65–76.
[2] Yaxiong Xie, Zhenjiang Li, and Mo Li. 2015. Precise power delay profiling with commodity wifi. In MobiCom 2015. ACM, 53–64.
[3] Yuxun Zhou, Ninghang Hu, Costas J Spanos, et al. 2016. Veto-Consensus Multiple Kernel Learning. In AAAI. 2407–2414.
[4] Han Zou, Baoqi Huang, Xiaoxuan Lu, Hao Jiang, and Lihua Xie. 2016. A robust indoor positioning system based on the procrustes analysis and weighted extreme learning machine. IEEE Transactions on Wireless Communications 15, 2 (2016), 1252–1266.
[5] Han Zou, Yuxun Zhou, Hao Jiang, Baoqi Huang, Lihua Xie, and Costas Spanos. 2016. A transfer kernel learning based strategy for adaptive localization in dynamic indoor environments: poster. In Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking. ACM, 462–464.