Content uploaded by Sasu Tarkoma
Author content
All content in this area was uploaded by Sasu Tarkoma on Feb 17, 2020
Content may be subject to copyright.
IoT-KEEPER: Securing IoT Communications in Edge Networks
Ibbad Hafeez†, Markku Antikainen†,‡, Aaron Yi Ding∗, Sasu Tarkoma†,‡
†University of Helsinki, Helsinki, Finland
‡Helsinki Institute of Information Technology, Helsinki, Finland
∗Delft University of Technology, Delft, Netherlands
Abstract
The increased popularity of IoT devices have made them lucrative targets for attackers. Due to insecure product
development practices, these devices are often vulnerable even to very trivial attacks and can be easily compromised.
Due to the sheer number and heterogeneity of IoT devices, it is not possible to secure the IoT ecosystem using traditional
endpoint and network security solutions. To address the challenges and requirements of securing IoT devices in edge
networks, we present IoT-Keeper, which is a novel system capable of securing the network against any malicious activity,
in real time. The proposed system uses a lightweight anomaly detection technique, to secure both device-to-device
and device-to-infrastructure communications, while using limited resources available on the gateway. It uses unlabeled
network data to distinguish between benign and malicious traffic patterns observed in the network. A detailed evaluation,
done with real world testbed, shows that IoT-Keeper detects any device generating malicious traffic with high accuracy
(≈0.982) and low false positive rate (≈0.01). The results demonstrate that IoT-Keeper is lightweight, responsive and
can effectively handle complex D2D interactions without requiring explicit attack signatures or sophisticated hardware.
Keywords: IoT, Network, Security, Privacy, Activity Detection, Clustering, Anomaly Detection
1. Introduction
IoT-enabled automation systems have opened homes
and industrial environments to countless new threats [1, 2].
There are several reasons for the sad state of IoT device
security. IoT development teams often work without suf-
ficient resources and under strict time constraints. These
factors make it tempting for development team to cut cor-
ners, for example, by re-using unverified code snippets,
insecure third-party libraries and not following secure soft-
ware development practices [3, 4, 5]. These, and several
other factors, result in production of inherently vulnerable
devices for consumer markets.
The number of device specific exploits is constantly in-
creasing due to growing number of IoT installations in
small office, home office (SOHO), and enterprise networks.
The adversaries can also re-use existing exploits from PC-
platforms against IoT devices running a stripped down
version of Linux or Windows as device firmware. More-
over, a vast majority of the IoT devices are connected to
SOHO networks with no security in place except for the
network address translation (NAT), which is done on the
gateway. On several occasions, attackers have been able
Email addresses: ibbad.hafeez@helsinki.fi (Ibbad Hafeez†),
markku.antikainen@hiit.fi (Markku Antikainen†,‡),
aaron.ding@tudelft.nl (Aaron Yi Ding∗),
sasu.tarkoma@helsinki.fi (Sasu Tarkoma†,‡)
to compromise these devices, installed deep inside SOHO
networks, to launch extremely large scale attacks [6, 7].
Due to prevalence of insecure IoT devices, network own-
ers can no longer rely on the assumption that all devices
in their network are well-behaving and trustworthy [8].
While this, to some extent, applies to every network, it
is a particular concern in SOHO environments where the
network owners do not have the know-how or resources to
improve security. This, together with the fact that IoT de-
vices are rarely updated [9], makes it probable that some
devices in the network will, eventually, get compromised
by an attacker.
There are three key solutions commonly used to secure
PC and mobile devices: software updates,endpoint secu-
rity solutions, and network-based security solutions. How-
ever, there are numerous reasons that these solutions can-
not be used for securing IoT devices. Firstly, in most cases,
there is limited, if any, product life-cycle support avail-
able for IoT devices, as the manufacturers do not provide
regular firmware updates or security patches for these de-
vices [10, 11]. Secondly, it is not possible to develop ef-
fective endpoint security solutions, such as anti-malwares,
due to lack of firmware support, software APIs and limited
resources available on IoT devices.
Securing network communications is a practical solution
for securing IoT devices because these devices require con-
stant network connectivity for their operations. It is also
easier to gain network access to user devices, compared to
October 22, 2018
arXiv:1810.08415v1 [cs.CR] 19 Oct 2018
physical access. Traditional network security solutions of-
fer limited support for securing IoT because these solutions
mainly rely on traffic signatures for anomaly detection and
it is practically infeasible to obtain enough labeled data
from heterogeneous IoT devices, to generate these signa-
tures. High deployment and operational costs of network
security solutions is also a limiting factor in the use of ex-
isting network security solutions for securing SOHO and
small enterprise networks.
Various techniques have been proposed to detect anoma-
lies in network traffic using machine learning [12, 13, 14].
The applicability of these techniques depends on how ac-
curately the classification model can capture given devices’
benign network behavior and use to it identify malicious
traffic produced by that device. It is also challenging to
keep these classification models up-to-date, as the devices’
network behavior can change substantially due to firmware
updates and configuration changes.
The aim of this work is to propose a system addressing
the challenges and needs of securing IoT in SOHO and
enterprise networks. Such a system should actively moni-
tor the network to identify and block any malicious traf-
fic flows. It should maintain an up-to-date classification
model detecting anomalies in network behavior of con-
nected devices. The system needs to be lightweight and
cost-efficient to support wide-scale deployments using only
limited hardware resources. It should be scalable to sup-
port SOHO and enterprise scale deployments. Lastly, the
system should ideally have high sensitivity, for identify-
ing any malicious traffic, and low fall-out, to prevent false
alarms.
In the light of these requirements, we propose IoT-
Keeper, a novel system capable of classifying network
traffic in real-time using semi-supervised machine learning
techniques. IoT-Keeper monitors all traffic flows within
and across the network. It identifies a devices’ malicious
behavior using the previous network activity of that de-
vice. In order to secure other devices in the network,
IoT-Keeper uses adhoc overlay networks to limit net-
work access for any device generating malicious traffic.
IoT-Keeper uses fuzzy C-Mean clustering and fuzzy-
interpolation scheme to identify malicious network traffic.
This technique is lightweight enough to allow deployments
using single-board computers, for example Raspberry-PI,
making IoT-Keeper cost-efficient and easy to deploy.
Given the challenges of collecting labeled traffic data, the
classification algorithm was developed to work with unla-
beled traffic data. This classification model can be repre-
sented as a set of rules, making it easier to share across
multiple nodes, and improve scalability of the system.
This work demonstrates how a simple yet efficient classi-
fication algorithm, when combined with sophisticated fea-
ture analysis, enables us to successfully classify network
traffic, using only limited hardware resources. Conse-
quently, IoT-Keeper can be realized using legacy hard-
ware.
More specifically, our contributions are:
•Design and implementation of IoT-Keeper, a novel
end-to-end solution, capable of blocking any malicious
network activity
•A simple and lightweight mechanism for dynamically
enforcing network access control, at per-device, per-
destination granularity, using adhoc overlay networks.
•Detailed study of individual features and their relative
importance to formulate a set of most useful features
for network traffic classification.
•A thorough investigation of IoT-Keeper perfor-
mance using a real world testbed with 40+ devices.
The evaluation results demonstrate that the proposed
system is able to identify various types of network at-
tacks, with high accuracy (0.982) and few false alarms
(0.01), without any significant impact of user experi-
ence (latency increment ≈1.8%).
Organization: The rest of this paper is organized as fol-
lows. Section 2 introduces the threat model and challenges
faced in securing IoT ecosystems. Section 3 describes the
design and architecture of proposed system in detail. In
Section 4, we present the techniques developed for feature
engineering and anomaly detection. Section 5 describes
the process of collecting dataset, used for training and eval-
uation of the proposed system. In Section 6, we present
the evaluation of IoT-Keeper, in terms of performance
achieved for detecting anomalies, network throughput and
system efficiency. In Section 7, we revise the current state
of the art for securing IoT ecosystem. Section 8 discusses
possible shortcomings of the proposed solution. Finally,
we present our conclusive statement about this work in
Section 9.
2. Background
This paper refers to any network, where user devices are
connected to get Internet access, as edge network, includ-
ing SOHO and enterprise networks. A device with network
connectivity and some sensing support can be referred to
as connected or IoT device. This definition covers a wide
range of devices, which can be divided into two categories:
•Single-purpose devices: include resource constrained
devices such as sensors, appliances, with limited or
dedicated functionality. Users manage and interact
with these devices using a smartphone or tablet de-
vice.
•Multi-purpose devices: include high end devices such
as smartphones and PCs, with better hardware re-
sources. These devices support endpoint security so-
lutions and device diagnostic tools. Due to access
to sensitive user information and much larger attack
surface, numerous sophisticated attacks have been de-
veloped to compromise such devices.
2
For the sake of simplicity, the remainder of this paper
will refer to any of these two types of devices as an IoT
device or simply device. These devices require network
connectivity for majority of their operations. We divide
the network communications of these devices into two cat-
egories.
•Device-to-Device (D2D) communications: This cate-
gory includes network communications among the de-
vices connected to same network. These communica-
tions usually occur within a single broadcast domain.
•Device-to-Infrastructure (D2I) communications: This
category includes network communications between
user devices and remote destinations. A remote des-
tination can be any device or service operating in a
different network or broadcast domain.
Unless specified otherwise, the rest of paper refers to
both these types of communication as network communi-
cation.
Some single-purpose devices use low-power communica-
tion protocols, such as Bluetooth-LE (BLE) or Zigbee, to
communicate with their respective IoT hub. The hub then
communicates with respective cloud service(s) via wired or
wireless network. As a result, the network traffic to and
from the hub gives a fairly accurate representation of the
D2I communications of the IoT devices, connected to the
hub.
We now discuss the threat model for IoT ecosystem and
the challenges faced in securing the networks where these
IoT devices are connected.
2.1. Threat Model
Edge networks typically contain a mixture of single-
purpose and multi-purpose devices. These networks are
set up using a single gateway, to provide Internet access
to all connected devices. The gateway offers basic secu-
rity features such as, MAC/IP filtering. However, these
features are not generally configured by users [11]. In case
there is any intrusion detection system or firewall installed
in the network, it only filters incoming and outgoing traf-
fic, and treats all connected devices within the network as
secure and trusted.
With IoT devices, this assumption about the trustwor-
thiness of the devices does not hold, because it is fairly
easy to exploit the vulnerable IoT devices, and thus gain
access to the local network [15, 16]. Owing to common
network setups, once an adversary is within the network,
it gets unwarranted access to perform any type of attack
against other devices in the same network. These attacks
can be categorized as:
1. Network scanning: These attacks are used to rec-
ognize any TCP and UDP services that run at target
hosts and to identify what kind of traffic filtering is
done in the network. Network scanning can also be
used to identify the firmware that is running on a tar-
get. These attacks are generally used to scan target
nodes before launching dedicated attacks against the
scanned targets. Commonly used variants of network
scanning attacks include address-sweep,port-sweep,
and port-scan attacks.
2. Privilege escalation: Once the target is identified
and scanned, the adversary tries to gain privileged
access to it, in order to deploy malicious code. Many
IoT devices use stripped down Linux as firmware and,
therefore, attacker may try to invoke shell to gain root
access. Attacker can also use factory-default creden-
tials or device specific exploits to gain privileged ac-
cess. Upon success, attacker is able to upload mali-
cious code and perform desired state changes to com-
promise the target node.
3. Man-In-The-Middle (MitM): An adversary, con-
nected to user network, can snoop-in on and interrupt
all traffic in the network. It can use the communi-
cation patterns of legitimate user devices to conduct
replay attacks. For example, an adversary can re-
play traffic intercepted from communication between
smartphone and garage door sensor, to later open
garage door without users knowledge. MitM attacks
have serious security and privacy implications, as they
can be used to steal user data and disrupt potentially
critical devices [17].
4. Data theft: Health IoT, smart appliances, and simi-
lar devices collect a lot of data, which reveals a lot of
information about their users. Typically, users do not
have discrete control over how this data is collected
and transmitted [18]. An attacker can compromise
user devices to steal this data and use it for targeted
attacks.
5. Botnets: Botnets are generally comprised of infected
devices installed inside edge networks [19, 12]. These
devices maintain normal state of their operations until
acommand & control server instructs them to launch
an attack against specific target(s). Distributed De-
nial of Services (DDoS) attacks are a common exam-
ple of how seemingly benign user devices are used to
launch attacks at unprecedented scales [7, 20].
The goal of IoT-Keeper is to identify and block any
variants of the aforementioned attacks in edge networks.
More specifically, IoT-Keeper presumes that, for a given
device, any deviation from its benign behavior is moti-
vated by malicious intent. Based on the type of attack
the device was executing, network access restrictions are
setup to limit the network activity of compromised device.
This way, IoT-Keeper is able to block any compromised
device from launching attacks against local or remote tar-
gets.
2.2. Challenges for Securing IoT
Conventional network and endpoint security solutions
fall short in addressing the challenges of securing IoT
3
ecosystem for a number of reasons. We now go through
some of these.
Firstly, there is a huge diversity in IoT devices’
firmwares, software stacks and APIs. Given this hetero-
geneity among devices, it is very challenging to develop
generic endpoint security solutions for IoT devices. Al-
though there are endpoint security solutions available for
multi-purpose IoT devices, these solutions are not com-
monly used [21, 22]. There have been reports of incidents
where endpoint security solutions could not detect smart-
phone applications, which were involved in stealing user
data or performing similar attacks [23]. This heterogene-
ity also affects the network traffic patterns of the devices,
making it infeasible to collect and maintain traffic signa-
tures databases, which are used by signature-based net-
work security solutions [24, 25].
Second challenge is related to the communication pat-
terns of the IoT devices. Because IoT devices often need
to interact with other devices in the same network, the
IoT devices cannot be completely isolated from each other.
However, simultaneously, one cannot blindly trust every
device in the local network. Thus, it is not enough to sim-
ply protect the perimeter of the local network, but instead,
also the internal traffic would need to be monitored.
Thirdly, traffic analysis should be performed on net-
work gateways to address privacy and latency concerns.
Typical SOHO networks are set up using low-cost net-
work gateways with constrained hardware resources. This
means that the in order to perform traffic classification
on these gateways, classification algorithms should be
lightweight. Furthermore, these gateways should support
automated configuration because it is clear that having a
dedicated administrator for every edge network is an unre-
alistic requirement [25]. These devices should not require
much manual configuration and any required configura-
tion changes should be easy for the users to make — it is
well known that due to poorly designed interfaces and lack
of support for automated configuration, the networks are
rarely configured by users [10, 25, 26].
Based on this discussion, the basic set of requirements
for a security solution addressing the challenges posed by
IoT can be summarized as follows.
•It should be easy to deploy and operate with minimal
manual effort. Meanwhile, it should be low cost with
limited resource footprint.
•It should be able to monitor inter as well as intra-
network communications to detect various (benign
and malicious) types of network traffic generated by
connected devices.
•Traffic analysis should be performed close to edge net-
works to immediately detect and block any attacks.
Meanwhile, it should be easy to share the data needed
to detect these attacks, among network gateways.
3. System Design
IoT-Keeper consists of two primary components,
Keeper Gateway and Keeper Service, as shown in
Fig. 1. Keeper Gateway is a redesigned gateway used
to set up edge networks. It also performs traffic classi-
fication and security policy enforcement for traffic filter-
ing. Keeper Service is a cloud service assisting vari-
ous functionalities of Keeper Gateway. This two–tier
design achieves low cost and high scalability, to provide
enterprise–grade security at only a fraction of cost.
SDN controller
Monitoring Prediction Enforcement
KEEPER
SERVICE
KEEPER GATEWAY
IoT and other devices
Device activity
storage
Policy cache
EDGE NETWORK INTERNET
3
1
2
4
Figure 1: IoT-Keeper architecture, where Keeper Gateway per-
forms traffic monitoring and classification. Controller (1) is respon-
sible for traffic management at OF switch (2), traffic classification,
caching (3) and enforcement of security policies. Keeper Service
(4) is used for support operations
3.1. Keeper Gateway
Keeper Gateway is a lightweight network gateway de-
signed to be agnostic of the underlying hardware, such that
it can be deployed using a WiFi access point [10] or single
board computer, for example, Raspberry–PI (R-PI).
In principle, Keeper Gateway is a gateway used to
setup edge networks. In addition to routing and switching,
this gateway is responsible for traffic monitoring, anomaly
detection, and policy enforcement to manage network ac-
cess at per–device granularity. Meanwhile, the initial
model training, state management, and remote adminis-
tration is performed with the help of Keeper Service.
Keeper Gateway runs an SDN controller and Open
vSwtich (OVS) to perform traffic monitoring, anomaly de-
tection and traffic filtering. Other than the basic routing
functions, the three key modules that are operated by the
SDN controller are traffic monitoring,anomaly detection,
and security policies enforcement.Monitoring module in-
spects all intra and inter–network traffic flows to maintain
up-to-date information about the network behavior of all
4
devices connected to the network. Detection module uses
the data from monitoring module to identify if any given
traffic flow is malicious. The classification model, which
is used to identify different types of traffic flows, is main-
tained by Keeper Gateway .Enforcement module is re-
sponsible for setting up network access control to restrict
network activity of any device exhibiting anomalous net-
work behavior. This module uses a set of security policies
to generate flow table entries and deploy them at the OVS,
to perform traffic filtering.
For every traffic flow, enforcement module looks for a
relevant security policy from cache. If there are multiple
matches, the most specific security policy is used for set-
ting up flow table entries to handle the flow. Otherwise, if
no relevant security policy is found, the detection module
analyzes the traffic flow to identify its type. The result
of analysis is cached in form of a security policy and re-
spective flow table rules are deployed, by the enforcement
module, at the switch handling given traffic flow.
In current description, both the controller and OVS run
on the same gateway. However, IoT-Keeper architecture
supports deployments where a single instance of Keeper
Gateway manages multiple OpenFlow–enabled switches
in the network. In such cases, traffic from all the switches
will be classified and managed by the controller running
on Keeper Gateway.
Caching
In general, majority of traffic in edge networks is des-
tined to a handful of cloud services. In case of a new
traffic flow, there is high probability that subsequent flows
in the session are related to same network activity. The
benign network activity for single–purpose IoT devices is
also fairly limited.
By caching the security policies relevant to these fre-
quent traffic flows, we can greatly reduce the number of
traffic classification operations, thereby, reducing the la-
tency experienced by users as well as limiting the resource
consumption. The impact of caching on latency and re-
source consumption are discussed in detail in Sect. 6.3.
Caching can be implemented using hash table data
structure, to achieve time and space complexity of O(1)
and O(n) respectively. The storage consumption can fur-
ther be limited by associating an expiry time to each secu-
rity policy stored in cache. This expiry time is refreshed
every time security policy is used to set up filtering for
some traffic flow. Once this time period expires, secu-
rity policy is removed from cache. The optimal choice for
expiry time depends on the underlying network traffic pat-
terns and storage capacity available for cache.
Management API
Unlike traditional gateways and routers, Keeper
Gateway does not run a local web server for hosting gate-
way management portal. This design choice was made
to reduce the attack surface. Setup and configuration
changes are performed using a mobile application, which
communicates with the gateway using low-power protocols
such as BLE. This requires physical proximity of user to
make any configuration changes to the gateway. The re-
quirement of physical proximity bars any untrusted entity
from accessing the management interface over the network.
Sect. 3.3 discusses how the users can perform configuration
changes, when they are connected to some other network.
3.2. Adhoc Overlay Networks
It is a common requirement for gateways and routers
to support multiple networks for different kinds of devices
connected to the network. Although it is possible to run
multiple networks on legacy gateways using VLANs or
multiple SSIDs, there is only a limited number of VLANs
and SSIDs1supported by any router or access point avail-
able in market. It is difficult to automatically setup and
manage the VLANs on router and gateways typically used
to deploy edge networks. In case of multiple SSIDs, client
devices need to (re)associate every time SSID configu-
ration is updated, thereby, ruining the user experience.
Therefore, it is not easy to achieve per device access con-
trol using legacy gateways.
IoT-Keeper uses adhoc overlay networks (AON), for
creating multiple virtual networks, over a physical net-
work. The number of AONs is not limited by hardware as
they are set up dynamically by the enforcement module.
The network restrictions defined for a given AON can be
updated dynamically, without requiring any action from
client devices. Figure 2 demonstrates how AONs work in
practice.
D2
D3
D1
Safe
Suspicious
Controller/5
restricted5Ac cess
No5access
X
XX
All5devices
Figure 2: A single SSID is partitioned in 3 AONs, where (i) D1,
D2 and D3 are fully isolated and can not communicate with any
other device in the same network, (ii) devices in safe AON have
full network access to Internet and other devices in same AON, but
restricted access to devices in other AON. (iii) devices in suspicious
AON are partially isolated and have limited access to Internet and
other devices in same AON.
With AONs, it is possible to set up access control on
per-device granularity. For example, a smart fridge can
be restricted to communicate only with its own cloud ser-
vice and owners’ smartphone. AONs restrict the network
1Raspberry PI supports 1 SSID using built-in WLAN interface.
Commercially available routers for non-enterprise networks can sup-
port upto 8 SSIDs
5
access for suspicious device(s) in a way that given device
retains its Internet access, but cannot attack any local or
remote targets.
3.3. Keeper Service
Keeper Service is designed as a support service for
Keeper Gateway operations. It does not perform traf-
fic classification for edge networks but it hosts different
services to enable seamless operations of Keeper Gate-
way.
The design choice of using a remote service is motivated
by cost efficiency and scalability. This approach allows
us to use data from multiple sources and perform vari-
ous analysis to train initial classification model, used by
Keeper Gateway. It also provides support for operat-
ing a multitude of services to enhance Keeper Gateway
functionalities. Keeper Service can be used to per-
form sophisticated analysis and operate middleboxes for
re-routing suspicious traffic from edge networks, through
these middleboxes. This service can be deployed within
user premises or provided by a third party.
Keeper Service is primarily responsible for main-
taining up-to-date classification model for bootstrapping
traffic classification on the Keeper Gateway. It also
supports state management and remote administration of
Keeper Gateway.
Initial classification model
When Keeper Gateway is set up for the first time,
it recieves the initial classification model from Keeper
Service. This model is trained by Keeper Service, us-
ing the data collected from various sources including net-
work logs, malware databases [27], and public vulnerability
databases (CVE [28] and CWE [29]). Keeper Gateway
initially uses this model for anomaly detection, and con-
tinues to improve it using the data collected from local
network traffic.
The use of same model training technique by both
Keeper Service and Keeper Gateway improves the
interoperability, in such a way that Keeper Service and
Keeper Gateway can share the trained models with each
other. The trained models are shared in form of a set
of rules containing feature value distributions and corre-
sponding labels (described in Sect. 4.6). This representa-
tion is compact and thus can be easily shared among nodes
without incurring significant network overhead.
State management
In order to support stateful recovery, Keeper Service
periodically backs up the state of Keeper Gateway,
in fully encrypted form. The state information includes
classification model data, security policy cache and gate-
way configurations. Using this data, Keeper Gateway
is fully restored at any required state. Encryption and
trusted platform module can be used to ensure that the
state can only be restored on the same Keeper Gateway
that was backed up [30, 31]. To support rapid deployment
and recovery, the state can be restored or synchronized
across multiple gateways simultaneously.
Remote administration
Since users need to be in physical proximity of Keeper
Gateway to perform any configuration changes, any users
connected to remote networks can perform update gate-
way configuration using the web-portal offered by Keeper
Service. Any configuration changes made on web-portal
are installed at respective Keeper Gateway, deployed
in the edge networks, by Keeper Service. The configu-
ration changes made through web portal are validated to
make sure that they do not compromise Keeper Gate-
way functioning, by switching it to insecure state. The
additional verification prevents any scenarios where an
attacker gets access to user account or compromises the
Keeper Service, to configure all Keeper Gateway de-
vices and disrupt their functioning.
3.4. Communications with user
A key requirement of designing usable security solutions
is finding the right balance in user experience and security.
IoT-Keeper achieves this balance by maintaining mini-
mal network access for suspicious device, so that they can
continue their normal operations2, but cannot perform any
attacks.
IoT-Keeper automatically detects any attacks and sets
up counter measures to prevent these attacks, without user
involvement. However, it is important to notify the users
about blocked threats and the state of their network. For
this purpose, IoT-Keeper supports two types of notifi-
cations. Passive notifications simply inform users about
the network activity, while actionable notification require
users to take an action, for example, reconfigure the de-
vice. A detailed discussion on the appropriate level of
nagging and notification mechanism is outside the scope
of this paper.
4. Methodology
4.1. Feature extraction
IoT-Keeper is designed for networks with heteroge-
neous device base and no dedicated network security de-
vices. It is safe to assume that there are no device logs
available from most single–purpose IoT devices. There-
fore, IoT-Keeper heavily relies on feature extracted from
network traffic data, to differentiate between benign and
malicious network activity.
A key challenge in feature extraction for an online traffic
classification technique is to swiftly compute the statistics
over incoming data (packet) stream, where packet arrival
2Our analysis reveals that ≥95% traffic from benign IoT device
is HTTP/HTTPS traffic to their cloud service, which will not be
blocked, so IoT device can maintain its normal operation
6
rates is very high. To address this, we incrementally com-
pute a set of statistics, including number of observations
N, sum of observations Soand sum of squares of observa-
tion Ssq, for all traffic streams. Using these statistics, we
can compute mean and standard deviation for the set of
observations, as shown in Eq. 1.
µ=So/N σ =q|Ssq/N −(So/N)2|(1)
Table 1 lists the 44 attributes obtained from network
metadata and device logs. We calculate the three afore-
mentioned statistics for all 44 attributes to get the final
feature vector ~
F, where ~
F Rn∧n≤132. The statis-
tics can be summarized on per device basis using source
and destination MAC, IP and ports. These statistics allow
us to calculate the divergence of devices’ behavior, over a
specific time window, from its baseline (benign) behavior,
to detect any anomalies.
In contrast to existing techniques [32], which use time-
based aggregation to aggregate same host, same service
features, we use connection based aggregation. The key
limitation of time-based aggregation is that it falls short
in detecting attacks with a wait mechanism, where a ran-
dom delays are added in-between successive connection
attempts. In comparison, connection-based aggregation
aggregating the features over nlatest connections allows
us to accommodate any random delays between successive
connections.
Table 1 lists six attributes retrieved from device logs.
These attributes contain information about any login at-
tempts, SSH connections and service discovery requests.
Such information can be useful to identify the sub-type
of malicious traffic flows identified by anomaly detection
technique. In order to correlate the data from device
logs with network traffic data, time must be synchronized
across the network. Time synchronization among all con-
nected devices can be achieved using protocols such as
NTP. In case no such service is running in the network,
time difference between network and device logs can be
manually resolved.
4.2. Feature Analysis
We study the variance and modality of each feature to
identify its contribution to anomaly detection model. Any
features that do not contribute significantly to the cluster-
ing and classification process are pruned off. Reducing the
number of features helps in speeding up the classification
process and reducing the resource footprint of clustering
and anomaly detection scheme.
Initially, we plot cumulative distribution function
(CDF) for each feature, to study its variance. Figure 3
shows CDF plots for a few of the features corresponding
to connections made by any device. For example, Fig. 3a
shows the distribution for number of unique destination
IPs contacted by devices in the network. It can be ob-
served that this distribution is not Gaussian but heavy
tailed with majority of probability mass lying in smaller
values. This distribution reveals that more than 70% of
the devices connect to fewer than 20 unique destination
IPs. On the other hand, tail of distribution contains data
points where a single device may connect to more than 500
unique destinations. Similary, Fig. 3d shows that more
than 75% of the devices do not generate any SSH traffic.
However, there are some devices generating a large volume
of SSH traffic, indicating presence of suspicious activity in
the network.
The data points in the tail of distributions are of pri-
mary importance for anomaly detection since they capture
anomalous behaviors. It is important to capture this infor-
mation as it helps in differentiating anomalies from benign
network behavior, during clustering.
When we jointly study the distribution of different fea-
tures, it reveals interesting details about semantics of var-
ious attacks. For example, during a fuzzing attack, when
an attacker tries to login to open services on target node
with brute-force, the attack is reflected in network traffic
with a large number of connections between same source
and destination nodes. Meanwhile, device logs from target
node show a large number of failed login attempts, proving
the hypothesis that an attack is undergoing against target
node.
Further analysis revealed that during such brute-force
attacks, if the few initial login attempts fail, there is a
high probability that attack will not succeed. This ob-
servation shows that users who change the factory default
passwords choose reasonably unique passwords, which are
not easy to crack using traditional dictionary or brute-
force attacks. However, this claim cannot be made with
absolute certainty because of limited scope of dataset.
The study of feature value distributions also reveals pos-
sible correlations among different features. For example,
as the attacker scans a target node, both total number
of connections initiated by the attacker node and num-
ber of connections between source (attacker) and destina-
tion (target) node increases. These correlations help us to
identify and remove features containing redundant infor-
mation.
4.3. Feature reduction
We use correlation-based feature selection (CFS) and de-
viation method to identify and remove any features which
contain redundant information and do not contribute sig-
nificantly to the anomaly detection scheme.
CFS identifies strongly correlated features by measuring
their linear dependencies. The dependencies are calculated
using Pearson correlation coefficient Rbecause it provides
fairly accurate results with bounded feature value ranges
for datasets of fairly large size. Based on the value of R,
CFS discards one of any two features which are strongly
co-related, since such two features contain redundant in-
formation and keeping both features do not offer value for
anomaly detection. Figure 4 shows that majority of fea-
tures in our feature set ~
Fare linearly independent. How-
7
Table 1: List of attributes extracted from network and device metadata
Type Feature
Source, Destination [Total, Unique] destination IP addresses
Connection counters [Total, Unique] source ports, destination ports, connections, (same source, same
destination, same service) connections, connection durations (binned)
Packet counters ARP, LLC, IP(v6), ICMP(v6), EAPoL, TCP(v6), UDP(v6), HTTP, FTP, HTTPS,
DHCP, (M)DNS, NTP, Router Alert, (SYN, REJ) (errors), Urgent, Padding
Data (binned) Total data, source to destination (SRC2DST) data, destination to source
(DST2SRC), packet size
Authentication [Successful, Failed] login attempts to [SSH, Service, Device]
(a) (b)
0 5.0e51.0e61.5e62.0e6
No. of total connections built
0.00
0.25
0.50
0.75
1.00
(c)
0 2.5e25.0e27.5e21.0e31.2e31.5e3
No. of SSH connections
0.00
0.25
0.50
0.75
1.00
(d)
Figure 3: CDF plots for a small subset of features corresponding to network behavior of user devices. These distributions are observed in the
dataset used for training and evaluation purposes.
ever, some of the features, such as f1–f6, may contain
redundant information and can be removed from ~
F.
Figure 4: Correlation map plot depicting linear dependence among
a subset of features from ~
F
Using deviation method, 1-length frequent items sets are
mined from the feature set to obtain Fi= [V1, V2, . . . , Vj],
where Vj= [fi; 1 ≤i≤n], supp(fi)≥m,mis minimum
support and fiis frequent item. The deviation range for
each feature is calculated as DVj= [fmax, fmin], where
fmax = max(fj) and fmin = min(fj), for all benign and
malicious traffic classes. If the deviation range of a fea-
ture is similar for all classes, the feature is considered non-
contributing feature and removed from the feature set.
In order to make sure that no feature over-influences
clustering, all feature values are normalized to range [0,1].
After clustering, normalized feature score are computed,
for each feature, in all clusters. Any features, such as REJ
errors, with same scores (within a defined tolerance) in
multiple clusters are considered non-contributing feature
and removed from the final feature set.
4.4. Clustering
We use fuzzy C-mean (FCM) clustering algorithm to
partition the data points based on their mutual likeli-
ness. During clustering, initially a membership value is
assigned to all data points Xj(j= 1,2, ..., n), for all clus-
ters Ci(i= 1,2, ..., c). Each data point Xjis represented
as f(1)
j, f (2)
j, ..., f (k)
j, ..., f (h)
jwhere f(k)
jis value for kth
feature in Xjand 1 ≤k≤n, n =len(~
F).
The membership value for Xj∈Ciis given as µij , where
0≤µij ≤1,Pc
i=1 µij = 1 ∀1≤i≤c∧1≤j≤n.
The membership value µij for each data points and cluster
centers Vifor each cluster are optimized using Eq. 2 and
Eq. 3 respectively, in order to minimize objective function
given in Eq. 4.
µij = c
P
d=1 kVi−Xjk
kVd−Xjk2m−1!−1
,1≤i≤c
1≤j≤n(2)
Vi=
(
n
P
j=1
(µij )m×Xj)
n
P
j=1
(µij )m,1≤i≤c
1≤j≤n(3)
Jm=
c
X
i=1
n
X
j=1
µm
ij
Vi−Xj
2(4)
8
where mis fuzziness index [33] and
Vi−Xj
is the Eu-
clidean distance between cluster center Vi(for cluster Ci)
and data point Xj.
A label is assigned to each cluster based on normalized
feature scores observed in the given cluster. These labels
correspond to different types of benign and malicious net-
work traffic. Each cluster can be represented as a rule,
where feature scores represent antecedent variables and
cluster label is the consequent variable. The set of rules,
obtained as output of clustering process, is used by FIS to
perform anomaly detection.
4.5. Parameter Selection
The choice of number of clusters ican affect the per-
formance of anomaly detection technique. Therefore, we
use both direct and statistical testing methods to choose
the optimal value of i. Initially, we compute 30 dif-
ferent indices for a range of possible values for i, using
NbClust package [34]. Our implementation uses agglom-
eration method for cluster analysis using Wards’ linkage
method and euclidean distance metric. Figure 5a shows
the number of votes (minimum 3 votes) received by each
possible choices for optimal number of clusters. One vote
represents that one of the 30 indices suggests that the given
value of iis the optimal number of clusters. A detailed dis-
cussion on various indices computed by NbClust package
is out of scope for this paper.
Based on the voting results of NbClust, we select the top
eight candidate values of iand analyze them using elbow
method and average silhouette heuristic [35]. These two
methods provide a measure of global clustering character-
istic. For elbow method, within-cluster-sum-of-distances
(WCSD) is calculated using Eq. 5, where cis the number
of clusters, Siis the set of data points belonging to ith
cluster, and xki is the kth variable of Vi.
WCSD =
c
X
i=1 X
j∈Si
p
X
k=1
xki −xji
(5)
Silhouette heuristics are calculated using Eq. 6, where
a(x) = 1
kPk
j=1
x−pj
,pj∈Ci∧x∈Ci. Similarly,
b(x) = 1
kPk
j=1kpk−xk, where pk∈C0
iand C0
iis the clos-
est neighboring cluster for xsuch that C0
i=Ci∈Cwith
min(kx−Vik)∀Ci∈C∧x6∈ Ci. Figure 5 shows that
both elbow and silhouette method suggest i= 17 as opti-
mal value for i.
s(x) = (b(x)−a(x))
max a(x), b(x)(6)
We also studied gap statistic method [36] to get a sta-
tistical formulation of WCSD and silhouette statistics.
In general, the optimal value for ishould maximize the
gap statistic as well as silhouette values, while minimizing
WCSD. Using 1-standard-error method [36], gap statis-
tics analysis suggests i= 17 as optimal number of clusters
for given scenario.
4.6. Anomaly detection
IoT-Keeper uses Fuzzy interpolation scheme [37, 38]
(FIS) to identify the type of given traffic flow in the net-
work. FIS uses the sparse fuzzy rule base consisting of
nrules (n=c), obtained from clustering, to identify the
type of traffic flows in the network. The set of rules is
represented as.
Rule 1:if f1∈A11,f2∈A21 , ... ,fk∈Ak1, ... ,fh∈Ah1=⇒y∈O1
Rule 2:if f1∈A12,f2∈A22 , ... ,fk∈Ak2, ... ,fh∈Ah2=⇒y∈O2
.
.
.
Rule Q:if f1∈A1q,f2∈A2q, ... ,fk∈Akq , ... ,fh∈Ahq =⇒y∈Oq
Observation:f1∈A∗
1, f2∈A∗
2, ... ,fk∈A∗
k, ... ,fh∈A∗
h
Conclusion:y=O∗
where Ri(1 ≤i≤Q) is ith rule generated from cluster Ci.
Aki and Oiare triangular fuzzy sets for kth antecedent fea-
ture fk,1≤k≤hand consequent variable yrespectively.
For any new observation, A∗
kand O∗are triangular fuzzy
sets for antecedent and consequent variable obtained as a
result of interpolation of spare fuzzy rule base.
A fuzzy triangular set Ais represented using three char-
acteristic points a,b, and c, where bis center point with
maximum membership value and a,care left,right points
respectively, with minimum membership value. The char-
acteristic points aki,bk i,cki for fuzzy set Ak i of kth an-
tecedent feature fkin rule Riare calculated as:
bki =f(k)
q, where µiq = max
1≤j≤nµij ,(7)
aki =X
j=1,2,...,n and f(k)
j≤bki
µij ×f(k)
j
X
j=1,2,...,n and f(k)
j≤bki
µij
,(8)
cki =X
j=1,2,...,n and f(k)
j≥bki
µij ×f(k)
j
X
j=1,2,...,n and f(k)
j≥bki
µij
,(9)
where bki has membership value of 1 and aki and cki
have membership value of 0. f(k)
jis the kth feature’s value
in sample Xjwith 1 ≤k≤h. The defuzzified value of a
triangular set Ais calculated as
Df(A) = (a+ 2 ×b+c)
4(10)
Similarly, the characteristic variable ai, bi, cifor conse-
quent variable Bifor Riare calculated as:
bi=Oq, where µiq = max
1≤j≤nµij ,(11)
ai=X
j=1,2,...,n and Oj≤bi
µij ×Oj
X
j=1,2,...,n and Oj≤bi
µij
,(12)
9
(a) (b)
8 11 15 17 21 27 28
Number of clusters (i)
0.0
0.2
0.4
0.6
0.8
1.0
Average silhouette value
Silhouette heuristic
(c) (d)
Figure 5: (a) Candidate values for optimal number of clusters (i), based on voting results (minimum 3 votes), obtained using NbClust package.
Elbow method (b), average silhouette heuristics (c) and gap statistics (d), were used to identify the optimal value of i, out of top 8 candidate
values for ishown in (a).
ci=X
j=1,2,...,n and Oj≥bi
µij ×Oj
X
j=1,2,...,n and Oj≥bi
µij
,(13)
where Ojis expected output class for Xjand 1 ≤i≤c.
The membership value for input feature f(k)
jis
µAk,i (f(k)
j), where min
1≤k≤hµAk,i (f(k)
j)>0, 1 ≤i≤p, and p
is the number of activated fuzzy rules. The inferred output
O∗
jbased on fuzzy rules activated by f(1)
j, f (2)
j, ..., f (h)
j∈
Xjis calculated as,
O∗
j=
p
X
i=1
min
1≤k≤hµAk,i (f(k)
j)×Df(Bi)
p
X
i=1
min
1≤k≤hµAk,i (f(k)
j)
(14)
Df(Bi) is defuzzified value for consequent fuzzy set, in
Riactivated by Xjinputs and it can be calculated using
Eq. 10. We calculate the weight Wiof activated rule Ri,
such that 0 ≤Wi≤1,
c
P
i=1
Wi= 1, on the basis of input
observations x1=f(1)
j, x2=f2)
j, ..., xh=f(h)
jas:
Wi=
c
X
d=1 kr∗−rik
kr∗−rdk2
−1
,(15)
where r∗is the input feature vector f(1)
j, f (2)
j, ..., f (h)
j
and riis set of defuzzified values of Aki in Ri.
DfA1,i, DfA2,i , ..., DfAh,i, 1 ≤k≤h.
The final inferred output is calculated as
O∗
j=
c
X
i=1
Wi×Df(Bi) (16)
5. Dataset
We have deployed a real world testbed for data collec-
tion and system evaluation. The testbed consists of more
than 40 consumer IoT devices including single-purpose and
multiple-purpose devices. All devices mainly use wireless
mode for network connectivity, while some devices also
support wired connectivity as well. Some of the devices
use BLE for device pairing purposes only. Other low-
energy communication protocols such as Weave, Zigbee
are mainly used by devices to communicate with IoT hubs.
The set of common vulnerabilities found in these devices
include factory-default, commonly-used login credentials,
open, unfiltered ports, and terminal access without pass-
word.
The choice of multi-purpose devices such as phones and
PCs, in the testbed is motivated by the fact that these
devices constitute a large proportion of devices connected
to edge networks. With access to much more sensitive in-
formation, smartphones and PCs are lucrative targets for
attackers. Attackers exploit vulnerable IoT devices and
use them to compromise high-end devices containing sen-
sitive user data. Therefore, it is important to study the
communications between single and multi-purpose IoT de-
vices to detect any attacks.
Testbed Setup
The network setup used for data collection is shown in
Fig. 6. In this setup, all devices were connected to Keeper
Gateway, which is deployed using a Raspberry-PI (R-
PI), and setup as a wireless access point using hostapd.
It also runs a DHCP server and manages NAT for both wired
and wireless network. Current setup uses public DNS server
but a local DNS can also be set up. All incoming and out-
going traffic from both wired and wireless interfaces is col-
lected using tcpdump. Any traffic filtering was performed
using layer-2 addresses. During data collection, R-PI was
configured to drop all unfiltered outgoing traffic to prevent
spread of malicious traffic on public Internet.
As mentioned before, all devices in the testbed support
WiFi or wired connectivity. In case any device communi-
cates to Internet via an IoT hub, using Weave, ZigBee or
similar protocols, its D2I communications are monitored
by capturing the network traffic generated by IoT hub.
10
KEEPER GATEWAY
KEEPER SERVICE
Training data
Attack nodes
User devices
Remote Servers
INTERNET
EDGE-NETWORK
IoT devices
Personal
devices
Laptops and
desktops
Figure 6: Testbed used for data collection and system evaluation.
Keeper Gateway is set up using Raspberry PI and operates as an
access point, where user IoT devices and attacks nodes are connected.
Keeper Service is deployed using a consumer grade laptop.
Data collection
The data collection process spans three phases of device
activity:
•Setup: This phase covers the network activity of a
device when it is setup by the user for the first time.
•Background: This phase covers the network activ-
ity of a device during its normal operation, includ-
ing the phase when it connects or disconnects from
the network. The background activity may vary with
the kind of device, for example, single-purpose de-
vices may only generate heartbeat or status update
messages, whereas, multi-purpose devices may period-
ically fetch application updates, generate notifications
etc.
•Activity: This phase covers the network activity of
a device when it is actively communicating with other
entities in the network. The network traffic generated
by device corresponds to user interactions or messages
communicated with other devices. The network ac-
tivity during this phase varies with the functionali-
ties available in device. For example, Dlink power
plugs support only on/off functions, whereas a secu-
rity camera allows user to switch on/off video feed,
change video quality.
To collect device setup phase traffic, we collected all traf-
fic from the device itself, as well as the management device
used to setup the device. The device was reset and booted
from factory default state prior to every time it was setup.
For most devices, the firmware was upgraded before con-
cluding the setup. The background traffic was collected by
setting up a device and leaving it in connected state for a
given time interval. The duration of these intervals ranged
from 10 minutes to 72 hours.3To collect data for device
activity, user repeatedly performed an action on the device
over a period of time, with irregular wait intervals between
repetitions. The data collection was performed for differ-
ent types of actions supported by the device. During data
collection, the management device was either connected
to same network, as the IoT device itself, or a remote
network. After every iteration of data collection activity,
network setup was reset to recover virgin network state for
subsequent iteration.
We assume every device to be inherently benign, there-
fore, the traffic it generates during standby and normal
user interaction is considered its benign network behav-
ior. Table 2 lists different types of network attacks used
for collecting traffic traces of malicious network activity.
These attacks are commonly observed in IoT and edge
networks [39, 40, 41, 42, 43].
Table 2 gives a high level classification and description
for different types of attacks. It also lists the tools which
are used to simulate these attacks. The number of train-
ing samples indicate the traffic flows used for model gen-
eration, whereas evaluation samples show the number of
traffic flows used for evaluation. In order to emulate real
world deployments, the volume of traffic handled by IoT-
Keeper during evaluation is much higher compared to
the volume of traffic used to train the system. In addition
to data collected from the testbed, we also use publicly
available datasets for malicious IoT traffic [44].
6. Evaluation
6.1. System Implementation
The evaluation testbed uses a Raspberry PI model 3 to
deploy Keeper Gateway and a Core-i5 machine with
32GB memory to deploy Keeper Service.Keeper
Gateway runs Open vSwitch (OVS) [45] and Flood-
light [46] based SDN controller, with self-implemented cus-
tom modules that are used to perform traffic monitoring,
traffic filtering, state management, security policy enforce-
ment and cache management. The feature engineering
and anomaly detection schemes were implemented with
Python. Both Keeper Gateway and Keeper Service
use REST-APIs for communicating with each other.
Keeper Gateway was setup as a WiFi AP using
hostapd module [47]. All wired and wireless interfaces are
bridged to OVS, so that all network traffic is managed by
Keeper Controller. In larger deployments, multiple
OF switches are configured to use SDN controller running
at the Keeper Gateway, for traffic management. Dur-
ing evaluation, no data was sent to Keeper Service and
the initial classification model was trained using previously
collected traffic data from the testbed.
3(10, 20, 30) minutes, (1, 2, 6, 10, 12, 24, 36, 72) hours
11
Classification Activity Tool Description Evaluation
samples
Training
samples
Scanning
Port
Scan
ZenMap,
NMap
Scanning network for open ports on different
hosts in the network
1243647 292092
Port
Sweep
ZenMap,
NMap
Scanning all TCP/UDP ports on one or more
target hosts
953684 238421
Address
sweep
ARPing,
ARP scan,
Skipfish
Scanning all hosts on the network and service
running on them
824363 229173
Botnet Mirai Telnet Find and infect devices by deploying Mirai mal-
ware
1389672 437418
MitM ARP Poi-
soning
Arpspoof,
EtterCap
Using ARP poisoning attack to capture LAN
traffic
1706479 416619
Privilege es-
calation
Fuzzing PowerFuzzer,
WFuzz,
Python
Searching vulnerabilities in devices connected to
the network
2356842 559211
Data Theft Data hi-
jacking
Telnet Gain privileged access to other hosts and down-
load collected data.
821468 195371
Malware Malware
injection
Metasploit Upload malware to target hosts 1161347 304336
Denial of
Service
SYN
Flooding
Python
scapy,
Hyenae
Flood the target host with many SYN requests
to block it from performing any other task
2801145 699543
SSL rene-
gotiation
tls-dos Flood the target with SSL renegotiation packets
to disable its packet stream
3084492 671123
Table 2: Types of network attacks executed by compromised and malicious devices.
This testbed setup serves as a reference implementation
of IoT-Keeper. Our implementation was not optimized
for performance gains. Therefore, the system and network
performance results may vary with different hardware and
software stacks used for system implementation.
6.2. Anomaly Detection
We studied the performance of anomaly detection tech-
nique in terms of sensitivity and false positive rates (FPR).
Sensitivity, also known as recall, gives a measure of reli-
ability of our technique in correctly identifying the mali-
cious traffic flows, whereas, FPR gives an estimation of
false alarms raised by the system, when benign activity
is flagged as malicious. Ideally, the FPR rate should be
zero, producing no false alarms. The trade-off between
FPR and false negative rate (FNR) may vary with differ-
ent scenarios. In general, low FPR may be preferred as it
improves user experience by preventing false alarms. How-
ever, highly sensitive installations may require low FNR, so
that no malicious traffic goes undetected and compromise
the whole network. Using IoT-Keeper, false positives
do not significantly impact user experience because IoT-
Keeper enforces (and removes) network restrictions for
any device, while maintaining minimal network access for
the device, allowing it to continue its normal operations.
This enables us to target lower FNR as well, for better
security, without negatively affecting user experience.
We consider two types of classification problems:
•Binary-class problem: Differentiating between benign
and malicious network activity to detect anomalies.
•Multi-class problem: Identify the sub-type of mali-
cious activity exhibited by the device.
The motivation to identify the sub-type of malicious ac-
tivity is that it provides us more information that can be
used to enforce different levels of network restrictions for
any device. For example, a device executing a network
scanning attack may only be allowed to access its respec-
tive cloud service, whereas, network access for a device
stealing user data should be completely blocked. This pa-
per does not focus on identifying the sub-types of benign
activity.
Our evaluation shows that IoT-Keeper was able to
achieve an accuracy of 0.982 with FPR= 0.01 and FNR=
0.02 for binary-class problem. It shows that our anomaly
detection technique can differentiate between benign and
malicious network activity with high sensitivity. Based on
these results, it can be concluded that IoT-Keeper is able
to successfully identify block any malicious activity in the
network.
Table 3 shows the performance achieved for identifying
different types of malicious traffic. The results show that
IoT-Keeper can identify volumetric attack (generating
large volumes of traffic) with high sensitivity (0.99) and
12
low FPR (= 0.02). The network scanning attacks, in gen-
eral, are detected with an accuracy of 0.993 and f1-score
0.986. This performance is better than the performance
achieved for identifying different variants of network scan-
ning attacks.
In order to investigate this discrepancy, we study the
feature value distributions in clusters representing these
attacks. The feature value distributions represent the
network behavior for different types of network activity.
Therefore, if multiple network attacks have similar net-
work footprint, the feature value distributions, observed
in the clusters representing that traffic, will be overlap-
ping. This overlap will result in misclassification. This
phenomenon is prominent when we study different variants
of network scanning attacks and it explains the relatively
lower accuracies achieved for detecting variants of network
scanning attacks. However, it should be noted that a net-
work scanning attack, if it happens, is only misclassified
as another network scanning attack. Since, the network
restrictions for a device performing any type of network
scanning attack are similar, the resulting security impli-
cations of these misclassification are negligible for given
problem scenario.
Compared to volumetric attacks, it is difficult to detect
detect MitM and data theft attacks because the network
activity for these attacks is sporadic and difficult to distin-
guish from benign traffic. However, IoT-Keeper achieves
good performance in detecting these attacks, which other-
wise go undetected by anomaly detection systems.
Our analysis revealed that device logs can also be useful
for identifying the sub-type of malicious traffic. For ex-
ample, analyzing network traffic generated during fuzzing
attack may register it as a network scanning or DoS attack.
However, studying device logs reveals it was a fuzzing at-
tack against particular service running on target host.
Type Accuracy Recall FPR f1
Port Scan 0.96 0.98 0.07 0.97
Port sweep 0.97 0.99 0.06 0.98
Address sweep 0.97 0.99 0.08 0.98
Botnet 0.99 0.99 0.02 0.99
MitM 0.77 0.92 0.52 0.85
Fuzzing 0.99 0.99 0.01 0.99
Data theft 0.74 0.88 0.45 0.77
Malware injec-
tion
0.79 0.94 0.49 0.88
SYN flooding 0.98 0.98 0.03 0.99
SSL renegotia-
tion
0.96 0.99 0.07 0.97
Table 3: Performance achieved by IoT-Keeper for identifying net-
work attacks
6.3. Network Performance
In order to maximize usability, it is vital for network se-
curity solutions to have minimal impact on user experience
in terms of latency. Therefore, the design of IoT-Keeper
is driven by the goal to minimize the latency experienced
by end users.
To study the impact on latency while browsing Internet,
we studied page load times for top 1000 websites, ranked
by Majestic [48]. The measurements were taken for three
different scenarios including;
1. IoT-Keeper disabled.
2. IoT-Keeper enabled with 0% cache hit rate.
3. IoT-Keeper enabled with 95% cache hit rate.
We compare the latency experienced when IoT-Keeper
is disabled and no analysis is performed to the latency
experienced when IoT-Keeper is enabled with anomaly
detection and traffic filtering. 0% cache hit rate means
that the security policy cache is empty and all traffic flows
are analyzed, whereas, 95% cache hit rate means that 95%
of network traffic should have a matching policy available
in the cache. Section 3 explains how caching reduces the
number of requests made to perform anomaly detection,
thereby, reducing the latency experienced by user as well
as resource consumption.
Figure 7a shows that when IoT-Keeper is enabled,
average page load time is increased by upto 4.76% and
15.89% for 95% and 0% cache hit rate respectively. This
increase in latency due to anomaly detection is not signif-
icantly high, even if there are no cached security policies
available at Keeper Gateway. Investigating the increase
in page load time for 0% cache hit reveals that the relative
increase in page load time is higher (up to 40%) for web-
sites with very small page load times such as, google.
com and microsoft.com, but very low ≤7% for web-
sites with longer page load time such as, linkedin.com,
instagram.com. This is because the page load time for
sites such as google.com is very low (≈0.5s) and addi-
tion of constant time interval (required to perform anal-
ysis) will result in a large percentage increase for total
page load time. On the other hand, this additional time
will account for only a small percentage of time required
to load websites with large page load times ≈2s, such
as, instagram.com,qq.com.
The latency overhead does not depend on the volume
of data loaded for webpage, instead it only depends on
the time taken to identify traffic type and install flow ta-
ble entry to handle the traffic. This overhead is only seen
for the first time a web page is requested and any future
requests for same web page will be handled by the secu-
rity policy cache, with nearly no delay. Our experiments
showed that with 100% cache hit, latency is increased by
1.8% (±1.49%) only.
Identifying the type of traffic flow is the most time con-
suming task performed by Keeper Gateway. Figure 7b
shows the time taken to identify a traffic flow type can
account for 13.93% ±(9.55%) percent of the total time re-
quired to fetch a web page. In comparison, the time taken
for feature extraction, cache lookup and installation of flow
table rules is negligible.
13
0 2 4 6 8 10 12 14
Time taken to load a web page (s)
0.0
0.2
0.4
0.6
0.8
1.0
Page load times for top 1000 websites
IoTKeeper disabled
IoTKeeper 95% cache hit
IoTKeeper 0% cache hit
(a) Page load times for top 1000 websites ranked by Ma jestic.
(b) Percentage of page load time consumed for traffic classifica-
tion.
Figure 7: Impact of traffic classification over the latency experienced
during web browsing.
IoT-Keeper architecture suggests that Keeper
Gateway will serve as a regular gateway used to setup
edge networks. Therefore, we study the network perfor-
mance achieved using Keeper Gateway, in detail. For
this purpose, we measured the layer-4, layer-7 goodput,
bufferbloat latencies as well as TCP and UDP latencies.
Layer-4 goodput was calculated using iperf34and layer-7
goodput was calculated for bulk file transfer using curl, to
include protocol processing overhead as well. Bufferbloat
latencies was calculated using netperf5with RRUL test
(simulated by netperfrunner 6) and speedtest (simulated
using betterspeedtest7). These tests use multiple simulta-
neous connections to simulate heavy network load to study
latency and throughput in uplink and downlink. Lastly,
TCP and UDP latencies were calculated using qperf8.
The experiments were conducted to study the perfor-
mance for D2D (LAN↔LAN) and D2I (LAN↔WAN)
communications. For each type, we compared the per-
formance achieved in insecure setting with IoT-Keeper
disabled and secure setting with IoT-Keeper enabled for
anomaly detection and traffic filtering.
These results give a qualitative and quantitative under-
4https://iperf.fr/
5https://github.com/HewlettPackard/netperf
6https://github.com/richb-hanover/CeroWrtScripts/blob/
master/netperfrunner.sh
7https://github.com/richb-hanover/CeroWrtScripts/blob/
master/betterspeedtest.sh
8https://linux.die.net/man/1/qperf
Figure 8: Time required for analyzing traffic samples, using different
number of clusters
standing of the overheads on network performance due
to traffic classification. It is evident that IoT-Keeper
does not introduce significant deterioration in network per-
formance in comparison with baseline performance using
same hardware.
6.4. System Performance
We investigated the deployment feasibility of IoT-
Keeper, using a R-PI, for analyzing network traffic at
line speeds. The experimentation was conducted using
traffic traces collected for duration of 30 and 60 minutes
respectively, from a fully saturated link on R-PI. We calcu-
lated the time required to analyze all data points in these
samples, using different number of clusters i.
Figure 8 shows that the time required to cluster and an-
alyze the data points increases linearly as the number of
clusters increase. It can be observed that for i= 17, all
data points in 30 minutes sample can be analyzed in less
than 4 minutes using a R-PI. For comparison, same analy-
sis takes less than 30 seconds on a consumer grade Core i5
laptop with 32Gb memory. Similarly, we can analyze the
60 minute sample, with i= 50, in approximately 21 min-
utes, using R-PI. These results show that IoT-Keeper is
able to operate at line speeds using low-cost single board
computers.
In order to study the impact of caching over the system
performance, we studied how the number of security poli-
cies in cache affects the number of classification operations
performed. Figure 9a shows that 90% traffic flows in the
network can be handled using roughly 200 security poli-
cies in the cache. These results validate the hypothesis
that most of network traffic from edge networks is des-
tined to only few cloud services. Therefore, a relatively
small number of cached security policies can result in high
cache hit rate, thereby, lowering the latency, as well as
resource footprint.
We analyze our hash-table based implementation of
cache to study the performance in terms of lookup time
and cache size, relative to the number of cached secu-
rity policies. As expected, the deep-memory size of cache
increases linearly with number of security policies while
lookup time remains constant. A few spikes in lookup
14
D2D D2I
Metric Direction Insecure Secure Insecure Secure
Layer 4 goodput Up 89.97 (±0.77) 89.69 (±0.03) 90.11 (±0.80) 88.91 (±0.10)
90.46 (±0.34) 89.70 (±0.02) 91.01 (±1.53) 89.70 (±0.15)
Layer 7 goodput Up 87.67 (±1.32) 84.152 (±0.12) 89.94 (±0.60) 86.23 (±0.34)
Down 88.60 (±1.52) 88.17 (±2.42) 89.12 (±0.89) 87.78 (±1.22)
Bufferbloat latency
(ms) (Speedtest)
Up 2.11 (±0.40) 3.02 (±0.36) 3.77 (±0.24) 3.01 (±0.36)
Down 90.71 (±2.01) 92.02 (±2.31) 81.41 (±2.67) 82.83 (±2.10)
Bufferbloat latency
(ms) (RRUL test)
Up 2.11 (±0.13) 2.82 (±0.44) 2.92 (±0.89) 3.22 (±0.77)
Down 45.81 (±1.73) 50.13 (±1.44) 54.11 (±1.87) 55.93 (±2.44)
Latency (ms) TCP 0.37 (±0.004) 0.42(±0.003) 0.38 (±0.003) 0.38 (±0.004)
UDP 0.38 (±0.003) 0.40 (±0.003) 0.39 (±0.004) 0.39 (±0.003)
Table 4: Network performance achieved by IoT-Keeper, in terms of throughput and latency, with R-PI based deployment
time can be attributed to underlying hardware and oper-
ating system. Section 3.1 discusses how we can limit the
linearly increasing cache size using expiry time and setting
an upper bound on the maximum disk space available for
cache.
0 200 400 600 800 1000
Number of security policies in cache
0.0
0.5
1.0
Cache hit
Classification requests
(a) Relationship between the number of cached security
policies and the number of classification requests made
for handling new traffic flows.
(b) Behavior of cache lookup time and (deep memory)
cache size relative to the number of security policies
stored locally.
Figure 9: Effects of cached security policies over system performance
7. Related Work
A number of techniques have been proposed to iden-
tify anomalies in network traffic [49, 50, 51, 52, 53, 54].
Researchers have studied various feature analysis and ma-
chine learning techniques to identify botnets [12, 55, 56],
Denial of Service [51] and other attacks in the net-
work [57, 58].
The anomaly detection techniques can be categorized
into two types: offline and online. Offline techniques are
developed using labeled dataset. Offline algorithms have
access to whole dataset and multiple iterations of training
and evaluation are performed to produce the final classi-
fication model. This exercise consumes lots of time and
resources. These algorithms are often used by signature-
based network security solutions to identify network at-
tacks.
Online techniques do not have access to complete
dataset during model training and they mostly use unla-
beled data to perform network traffic classification. They
also need to be efficient enough to ensure high detection ac-
curacy at high packet arrival rates, using limited resources.
Majority of existing traffic anomaly detection techniques
are developed for offline analysis [49, 12] and require la-
beled data. Given the number and variety of IoT devices,
data collection is challenging. Crowd-sourced data collec-
tion has been proposed to address the problem [14] but it
has its own limitations [59].
IoT-Keeper is an online anomaly detection technique.
Therefore, we do not compare its functioning and perfor-
mance with signature-based network security solutions as
these solutions are highly specialized, operate on custom
hardware, incur high deployment and operational costs.
The performance of signature-based solutions is also lim-
ited by the availability of attack signatures.
Among online traffic classification techniques, Secure-
box was, to the authors’ knowledge, the first one to pro-
pose a two tier model, where a lightweight network gate-
way uses a cloud service to analyze traffic from edge net-
works [26]. The cloud service supports traffic analysis us-
ing software middleboxes, and various machine learning
based analysis technique. IoT-Keeper addresses the pri-
vacy and latency problems in Securebox model by per-
forming traffic analysis locally on the network gateway.
15
IoT Sentinel [10] identifies the IoT devices by analyzing
their network traffic and sets up network access control
based on the profile of given IoT device. While IoT Sen-
tinel detects device types with high accuracy, it requires
network traces captured during device setup, for this pur-
pose. If a device was already setup, before it was connected
to network, IoT Sentinel unable to identify the device and
fail to setup network restrictions. Also, the access control
policies are coupled with IoT devices and IoT Sentinel does
not provide a mechanism to update these policies with the
evolution of devices’ network behavior. In comparison,
IoT-Keeper constantly monitors network activity to de-
tect and block malicious traffic, at any time, irrespective
of what device is generating the traffic.
Recently proposed Kitsune [13] uses an ensemble of au-
toencoders for high accuracy online anomaly detection.
Kitsune uses incremental damped statistics to extract fea-
tures and track devices’ network behavior. To reduce
memory footprint, Kitsune maintains information about
device behavior for a fixed time. Hence, if a device ex-
hibits anomalous behavior long enough, the model will
consider it as normal behavior. To address this issue, IoT-
Keeper maintains device behavior information through
the timeline of its connectivity. This information is used
to compare devices’ latest behavior to its previous behav-
ior at any point in time. It enables us to detect changes
in devices’ network behavior due to firmware updates and
configuration changes.
DIoT [14] uses the periodicity in IoT device traffic to
identify device type and uses device-type-specific anomaly
detection model to detect network attacks. Although this
technique achieves high accuracy, the anomaly detection
model depends on device type identification. Given the
huge variety of devices, it is difficult to develop and main-
tain device-type-specific anomaly model. Meanwhile, any
wrong device type identification results will essentially ren-
der the device useless, thereby, negatively affecting user
experience. DIoT also does not update the anomaly de-
tection model based on changes in device configuration and
software updates. Compared to DIoT, IoT-Keeper does
not require device-type information to detect anomalies
and the anomaly detection scheme can also accommodate
any changes in network behavior due to changes in device
configuration.
Anomaly detection techniques such as, DIoT, Bot-
Miner [12] mainly detect volumetric attacks (producing
large volume of network traffic) such as, Mirai botnet. It
is difficult for these techniques to detect attacks such as
MitM, ARP Spoofing, which have sporadic network activ-
ity similar to normal device activity. IoT-Keeper, on the
other hand, is able to detect both these kinds of attacks
with high accuracy.
Online detection techniques [10, 13, 14] use the intrin-
sic device behavior as its normal behavior. Hence, they
are unable to detect any anomalies in devices’ network be-
havior if it is inherently compromised. Meanwhile, IoT-
Keeper can capture any discrepancies in a devices’ nor-
mal network behavior, during clustering, and labels such
behavior as malicious, depending on the feature value dis-
tributions observed in given cluster. Therefore, we are
able to detect malicious behavior of inherently compro-
mised devices.
Recently proposed anomaly detection techniques also
use recurrent neural networks [60, 61, 62, 63] or gated re-
current units [64, 65] for anomaly detection. Some tech-
niques model network traffic as symbols in a language and
use a frequency based model to identify anomalous se-
quence of symbols, indicating network anomalies [61, 14].
These techniques are mainly employed for offline analysis
and have high resource footprint.
Any technique which performs remote traffic analysis
raise security and privacy concerns for users whose traffic
data is analyzed in remote environment. The data storage,
processing and analysis in these environments is beyond
users’ control and the traffic data being analyzed contains
sensitive user data and personally identifiable information.
Using IoT-Keeper, we alleviate these concerns by per-
forming traffic analysis within user network, where user
has complete ownership and control over the data being
analyzed by the network.
Software middleboxes are proposed for on-demand traf-
fic analysis using cloud infrastructure [66]. Middlebox vir-
tualization reduces deployment costs and improves scal-
ability. However, the increase in latency experienced by
re-routing traffic through these middleboxes, cost of an-
alyzing huge volumes of network traffic, and privacy im-
plications of analyzing business critical data under third
party control, are some of the challenges faced by these
proposals.
Various commercial products such as, Cujo9, Dojo10,
Core11, Sense12 have been launched to protect IoT and
smart homes. These products claim to perform real-time
behavioral traffic analysis and deep packet inspection to
detect network attacks. At the time of writing, not all of
these features are available on latest generation of these de-
vices. Due to limited resources on gateway, most products
perform traffic analysis in their respective cloud services,
raising aforementioned privacy challenges. Ensuring low
latency and high network throughput performance is also
a big challenge for these products, which essentially use a
proxy to intercept and analyze user traffic flowing through
the gateway, impacting the latency and throughput.
Some of these devices claim to perform deep packet in-
spection on the router itself, resulting in severely degraded
network performance. The growing use of encrypted pro-
tocols also limits the usefulness of deep packet inspection.
Since IoT-Keeper does not perform payload analysis, its
performance is not limited by the use of encrypted proto-
cols.
9https://www.getcujo.com/smart-firewall- cujo/
10https://dojo.bullguard.com/
11https://us.norton.com/core
12https://sense.f- secure.com/
16
8. Discussion
IoT-Keeper is a network-based security solution de-
signed to detect anomalies and react to those by isolating
the devices exhibiting malicious behavior. Our evaluation
demonstrates that the proposed solution efficiently secures
edge networks against any attacks. We now discuss possi-
ble shortcomings and limitations of IoT-Keeper.
Feature engineering and model generalization
Feature engineering is an important step in developing
a generalized detection model. The resource limitations
of single-board computers required us to identify the least
number of features capturing maximum variance in the
network traffic data, to detect anomalies. The final fea-
ture set had to be compact and generalizable such that it
results in consistent anomaly detection performance across
multiple datasets. To address these requirements, we ana-
lyzed each feature individually to study its significance for
traffic classification. Our analysis revealed that a concise
feature set, extracted from network data, can successfully
identify anomalies in network traffic. Meanwhile, device
logs, if available, can also be helpful in improving the per-
formance of anomaly detection scheme.
It should be noted that IoT-Keeper does not perform
deep packet inspection or use any features extracted from
unencrypted payload analysis. It can identify malicious
network activity of any connected device but cannot detect
any malicious data included in packet payload.
Detecting non-volumetric attacks
The network activity of volumetric attacks such as
denial-of-service attacks, is substantially different from
regular device activity because of high traffic volumes and
protocols used. We can achieve high accuracy in detect-
ing volumetric attacks, using features extracted from traf-
fic metadata only. However, it is not possible to achieve
similar performance, using same feature set, if the net-
work footprint of an attack is small and infrequent such
as, MitM attacks.
Although IoT-Keeper is able to detect these at-
tacks with infrequent network activity, the performance
of anomaly detection can be improved by using human ex-
pertise to analyze the underlying model. In this regard,
Keeper Service can collect various statistics about clas-
sification models trained by Keeper Gateway and hu-
man experts can analyze this data to identify any discrep-
ancies in the anomaly detection model. Any updates, if
needed, are sent from Keeper Service to all gateways
deployed in edge networks.
Free loaders: By default, IoT-Keeper restricts the
network access for malicious devices but it does not fully
block their access to the Internet. Although this strat-
egy prevents any attacks in the network, it does not block
free loaders from consuming network bandwidth. How-
ever, Keeper Gateway allows users to monitor and limit
the bandwidth consumed by connected devices, to prevent
these free loaders from exhausting limited bandwidth.
Evolution in device behavior
The ability of IoT-Keeper to identify device firmware
upgrades and configuration changes allows us to limit num-
ber of false alarms raised by the system, as well as track
the progress of software updates for device deployed in
the wild. The knowledge of firmware versions (operated
by IoT devices) allows us to readily update security poli-
cies, to prevent any attempts to exploit known issues and
vulnerabilities in the given firmware version. Given that
IoT-Keeper can identify these updates, it does not de-
tect minor upgrades such as, software patches, which do
not have significant impact on device’ network behavior.
Physical tampering with devices:IoT-Keeper
monitors network traffic to detect any malicious activity.
Hence, it is not able to detect if a device has any back-
doors or is physical tampered with, by an adversary. We
assume that any backdoors or physical tampering are mo-
tivated by malicious intent to influence devices’ behavior
to the favor of adversary. Since majority of IoT devices
are connected to the network, any malicious behavior will
be detected and blocked by IoT-Keeper.
Cellular and bluetooth communications:IoT-
Keeper only monitors the communications passing
through Keeper Gateway. Any communications using
other channels such as, cellular data, satellite link, can not
be secured by the proposed system. The current imple-
mentation does not monitor D2D communications occur-
ring via low-power communication protocols. Our study
revealed that IoT devices do not generally use low-power
protocols for D2D communications and such communica-
tions are performed via IoT hub, which can be monitored.
Attacks against IoT-Keeper
Our system design limits the attack surface of Keeper
Gateway by requiring physical proximity or access to
cloud service to perform any configuration changes. In
case if an adversary gains access to user credentials for the
cloud service configuration portal, it can reset or disable
security features on Keeper Gateway and render it use-
less. To prevent realization of such attacks, IoT-Keeper
architecture supports the use of 2-factor authentication,
notifications about configuration changes and ability to
roll back changes to any point in time using state back-
ups.
MAC address spoofing:IoT-Keeper sets up net-
work access restrictions based on layer-2 MAC addresses.
An adversary can circumvent these restrictions by spoof-
ing device MAC address. In such scenarios, as long as
the adversary does not exhibit malicious activity, it will
have regular network access but this behavior has no in-
centive for the adversary. On the contrary, if adversary
engages in malicious activity with spoofed MAC address,
IoT-Keeper will identify and block that activity.
Denial of Service: There is a possibility that adver-
sary can exploit MAC address spoofing to perform DoS at-
tack against Keeper Gateway. In that case, caching will
17
limit the number of times similar traffic flows, coming from
different MAC addresses, are analyzed by the gateway.
Moreover, our evaluation also shows that IoT-Keeper
is able to perform anomaly detection at line speeds with-
out becoming a bottleneck. An adversary can flood the
upstream link in Keeper Gateway but this will block
all traffic flows in the network, including attacker’s own
traffic, giving no incentive to the attacker.
There is also a possibility of DDoS attacks against
Keeper Service. Due to the system architecture, these
attacks do not affect Keeper Gateway functionality be-
cause it does not depend on Keeper Service. Mean-
while, the attacks against Keeper Service can be han-
dled using several known techniques to prevent DDoS at-
tacks. To prevent Keeper Service from becoming single
point of failure when issuing updates, peer-to-peer pro-
tocols with checksums and public-key encryption, can be
used for transmitting updates to Keeper Gateway de-
ployed in edge networks.
Adversarial machine learning: An advanced adver-
sary can use adversarial machine learning techniques [67]
to understand the anomaly detection model and gener-
ate specially crafted packet flows to circumvent detection
mechanism. However, this approach is infeasible because
even small changes in packet headers substantially change
the network behavior. Meanwhile, payload modifications
do not help in circumvention because IoT-Keeper does
not perform any payload analysis. Therefore, it is difficult
make small enough changes in packets’ header space, which
preserve the malicious intent and do not change character-
istics of network flow.
Scalability
Current architecture uses Keeper Service to only pro-
vide support services for Keeper Gateway but system
architecture allows us to deploy additional analytics ser-
vices in the Keeper Service as well. For example,
Keeper Gateway can re-route traffic from specific de-
vices through middleboxes and perform sophisticated anal-
ysis. In order to speed up model training process, Keeper
Gateway can use the resources available in Keeper Ser-
vice to offload model training. It is also possible to use
the computational power of devices connected to the net-
work, such as PC, smartphones, for training anomaly de-
tection model and performing computationally intensive
traffic analysis.
9. Conclusion
This paper presents IoT-Keeper, a platform for se-
curing edge networks by detecting malicious network traf-
fic and isolating the devices generating that traffic. IoT-
Keeper platform adopts lightweight design and can be de-
ployed using low-cost programmable devices so that traf-
fic classification and security policy enforcement can be
performed at the network gateway, in real time. It re-
lies on the feature set extracted from network traffic data,
to successfully identify various types of network attacks.
The ability to dynamically generate and enforce security
policies enables automation of network configuration and
readily blocks any malicious actor in the network, using
adhoc overlay networks. IoT-Keeper evaluation, using
a real world testbed, demonstrates that IoT-Keeper can
successfully perform traffic analysis on network gateways,
with minimal impact on user experience. Moreover, it does
not require sophisticated hardware or modifications on ex-
isting IoT and other devices for its operations.
References
[1] H. Ning, H. Liu, L. T. Yang, Cyberentity security in the inter-
net of things, Computer 46 (4) (2013) 46–53. doi:10.1109/MC.
2013.74.
[2] S. Sicari, A. Rizzardi, L. Grieco, A. Coen-Porisini, Security, pri-
vacy and trust in internet of things: The road ahead, Computer
Networks 76 (2015) 146 – 164. doi:https://doi.org/10.1016/
j.comnet.2014.11.008.
URL http://www.sciencedirect.com/science/article/pii/
S1389128614003971
[3] Senrio, 400,000 publicly available iot devices vulnerable
to single flaw, http://blog.senr.io/blog/400000-publicly-
available-iot- devices-vulnerable- to-single-flaw/, [Ac-
cessed: 2017-05-05] (2017).
[4] D. Pauli, 414,949 d-link cameras, iot devices can be hijacked
over the net, https://www.theregister.co.uk/2016/07/08/
414949_dlink_cameras_iot_devices_can_be_hijacked_over_
the_net/, [Accessed: 2018-07-21] (2017).
[5] M. Patton, E. Gross, R. Chinn, S. Forbis, L. Walker, H. Chen,
Uninvited connections: A study of vulnerable devices on the
internet of things (iot), in: 2014 IEEE Joint Intelligence and
Security Informatics Conference, 2014, pp. 232–235. doi:10.
1109/JISIC.2014.43.
[6] A. D. Rayome, The stakes have changed: No end in sigh for
ddos attack size growth, https://pages.arbornetworks.com/
rs/082-KNA- 087/images/WISR_Infographic_NoEndInSight_
FINAL.pdf, [Accessed: 2018-30-06] (2018).
[7] A. Networks, Ddos attacks increased by 91% in 2017 thanks to
iot, https://www.techrepublic.com/article/ddos-attacks-
increased-91- in-2017- thanks-to-iot/, [Accessed: 2018-30-
06] (2017).
[8] M. Wall, How ’the invisible network’ poses a major se-
curity threat, http://www.bbc.com/news/business-41252203,
[Accessed: 2017-09-24] (2017).
[9] M. B. Barcena, C. Wueest, Insecurity in the internet of things,
Security Response, Symantec.
[10] M. Miettinen, S. Marchal, I. Hafeez, N. Asokan, A. R. Sadeghi,
S. Tarkoma, Iot sentinel: Automated device-type identification
for security enforcement in iot, in: 2017 IEEE 37th International
Conference on Distributed Computing Systems (ICDCS), 2017,
pp. 2177–2184. doi:10.1109/ICDCS.2017.283.
[11] T. Yu, V. Sekar, S. Seshan, Y. Agarwal, C. Xu, Handling a
trillion (unfixable) flaws on a billion devices: Rethinking net-
work security for the internet-of-things, in: Proceedings of the
14th ACM Workshop on Hot Topics in Networks, HotNets-
XIV, ACM, New York, NY, USA, 2015, pp. 5:1–5:7. doi:
10.1145/2834050.2834095.
URL http://doi.acm.org/10.1145/2834050.2834095
[12] G. Gu, R. Perdisci, J. Zhang, W. Lee, Botminer: Cluster-
ing analysis of network traffic for protocol- and structure-
independent botnet detection, in: Proceedings of the 17th Con-
ference on Security Symposium, SS’08, USENIX Association,
Berkeley, CA, USA, 2008, pp. 139–154.
URL http://dl.acm.org/citation.cfm?id=1496711.1496721
[13] Y. Mirsky, T. Doitshman, Y. Elovici, A. Shabtai, Kitsune: An
ensemble of autoencoders for online network intrusion detection,
18
CoRR abs/1802.09089. arXiv:1802.09089.
URL http://arxiv.org/abs/1802.09089
[14] T. D. Nguyen, S. Marchal, M. Miettinen, M. H. Dang,
N. Asokan, A. Sadeghi, D¨ıot: A crowdsourced self-learning
approach for detecting compromised iot devices, CoRR
abs/1804.07474. arXiv:1804.07474.
URL http://arxiv.org/abs/1804.07474
[15] D. Pauli, Connected kettles boil over, spill wi-fi passwords
over london, https://www.theregister.co.uk/2015/10/19/
bods_brew_ikettle_20_hack_plot_vulnerable_london_pots/,
[Accessed: 2017-05-07] (2015).
[16] M. Kumar, How to hack wifi password from smart doorbells,
https://thehackernews.com/2016/01/doorbell-hacking-
wifi-pasword.html, [Accessed: 2018-17-06] (2016).
[17] S. Gollakota, H. Hassanieh, B. Ransford, D. Katabi, K. Fu,
They can hear your heartbeats: Non-invasive security for im-
plantable medical devices, in: Proceedings of the ACM SIG-
COMM 2011 Conference, SIGCOMM ’11, ACM, New York,
NY, USA, 2011, pp. 2–13. doi:10.1145/2018436.2018438.
URL http://doi.acm.org/10.1145/2018436.2018438
[18] R. Mortier, J. Zhao, J. Crowcroft, L. Wang, Q. Li, H. Had-
dadi, Y. Amar, A. Crabtree, J. Colley, T. Lodge, T. Brown,
D. McAuley, C. Greenhalgh, Personal data management with
the databox: What’s inside the box?, in: Proceedings of the
2016 ACM Workshop on Cloud-Assisted Networking, CAN ’16,
ACM, New York, NY, USA, 2016, pp. 49–54. doi:10.1145/
3010079.3010082.
URL http://doi.acm.org/10.1145/3010079.3010082
[19] E. B. Beigi, H. H. Jazi, N. Stakhanova, A. A. Ghorbani, To-
wards effective feature selection in machine learning-based bot-
net detection approaches, in: 2014 IEEE Conference on Com-
munications and Network Security, 2014, pp. 247–255. doi:
10.1109/CNS.2014.6997492.
[20] M. Antonakakis, T. April, M. Bailey, M. Bernhard,
E. Bursztein, J. Cochran, Z. Durumeric, J. A. Halder-
man, L. Invernizzi, M. Kallitsis, D. Kumar, C. Lever, Z. Ma,
J. Mason, D. Menscher, C. Seaman, N. Sullivan, K. Thomas,
Y. Zhou, Understanding the mirai botnet, in: 26th USENIX
Security Symposium (USENIX Security 17), USENIX Associa-
tion, Vancouver, BC, 2017, pp. 1093–1110.
URL https://www.usenix.org/conference/
usenixsecurity17/technical-sessions/presentation/
antonakakis
[21] J. Synder, Is installing anti-virus software on mobile devices
necessary?, https://insights.samsung.com/2018/01/22/is-
installing-anti- virus-software- on-mobile-devices-
necessary/, [Accessed: 2018-06-17] (2018).
[22] M. Black, Do you need antivirus on android?, https:
//www.techadvisor.co.uk/how-to/google- android/do-you-
need-antivirus- on-android- 3668607/, [Accessed: 2018-06-
24] (2018).
[23] E. Pan, J. Ren, M. Lindorfer, C. Wilson, D. R. Choffnes, Panop-
tispy: Characterizing audio and video exfiltration from android
applications.
[24] I. Butun, S. D. Morgera, R. Sankar, A survey of intrusion de-
tection systems in wireless sensor networks, IEEE Communica-
tions Surveys Tutorials 16 (1) (2014) 266–282. doi:10.1109/
SURV.2013.050113.00191.
[25] N. Feamster, Outsourcing home network security, in: Proceed-
ings of the 2010 ACM SIGCOMM Workshop on Home Net-
works, HomeNets ’10, ACM, New York, NY, USA, 2010, pp.
37–42. doi:10.1145/1851307.1851317.
URL http://doi.acm.org/10.1145/1851307.1851317
[26] I. Hafeez, A. Y. Ding, L. Suomalainen, A. Kirichenko,
S. Tarkoma, Securebox: Toward safer and smarter iot networks,
in: Proceedings of the 2016 ACM Workshop on Cloud-Assisted
Networking, CAN ’16, ACM, New York, NY, USA, 2016, pp.
55–60. doi:10.1145/3010079.3012014.
URL http://doi.acm.org/10.1145/3010079.3012014
[27] L. Zelster, Malware sample sources for researchers, https://
zeltser.com/malware-sample- sources/, [Accessed: 2018-10-
11] (2018).
[28] L. Zeltser, Common vulnerabilities and exposures, https://
cve.mitre.org/, [Accessed: 2018-10-11] (2017).
[29] Mitre, Common weakness enumeration, https://cwe.mitre.
org/, [Accessed: 2018-06-24] (2018).
[30] A. M. Dunn, O. S. Hofmann, B. Waters, E. Witchel, Cloaking
malware with the trusted platform module, in: Proceedings of
the 20th USENIX Conference on Security, SEC’11, USENIX
Association, Berkeley, CA, USA, 2011, pp. 26–26.
URL http://dl.acm.org/citation.cfm?id=2028067.2028093
[31] J.-E. Ekberg, N. Asokan, External authenticated non-volatile
memory with lifecycle management for state protection in
trusted computing, in: Proceedings of the First International
Conference on Trusted Systems, INTRUST’09, Springer-Verlag,
Berlin, Heidelberg, 2010, pp. 16–38. doi:10.1007/978-3-642-
14597-1_2.
URL http://dx.doi.org/10.1007/978-3- 642-14597- 1_2
[32] U. K. Archive, Kdd cup 1999 data, http://kdd.ics.uci.
edu/databases/kddcup99/kddcup99.html, [Accessed: 2018-07-
18] (1999).
[33] K. Zhou, C. Fu, S. Yang, Fuzziness parameter selection in fuzzy
c-means: The perspective of cluster validation, Science China
Information Sciences 57 (11) (2014) 1–8. doi:10.1007/s11432-
014-5146- 0.
URL http://dx.doi.org/10.1007/s11432-014- 5146-0
[34] M. Charrad, N. Ghazzali, V. Boiteau, A. Niknafs, Nbclust: An
r package for determining the relevant number of clusters in a
data set, Journal of Statistical Software, Articles 61 (6) (2014)
1–36. doi:10.18637/jss.v061.i06.
URL https://www.jstatsoft.org/v061/i06
[35] P. J. Rousseeuw, Silhouettes: A graphical aid to the interpre-
tation and validation of cluster analysis, Journal of Computa-
tional and Applied Mathematics 20 (1987) 53 – 65. doi:https:
//doi.org/10.1016/0377-0427(87)90125- 7.
URL http://www.sciencedirect.com/science/article/pii/
0377042787901257
[36] R. Tibshirani, G. Walther, T. Hastie, Estimating the number
of clusters in a data set via the gap statistic, Journal of Royal
Statistical Society, Statistical Methodology 63 (2) (2001) 411–
423. doi:10.1111/1467-9868.00293.
URL https://doi.org/10.1111/1467-9868.00293
[37] Z. Huang, Q. Shen, Fuzzy interpolation and extrapolation: A
practical approach, IEEE Transactions on Fuzzy Systems 16 (1)
(2008) 13–28. doi:10.1109/TFUZZ.2007.902038.
[38] Y. C. Chang, S. M. Chen, Temperature prediction based on
fuzzy clustering and fuzzy rules interpolation techniques, in:
2009 IEEE International Conference on Systems, Man and
Cybernetics, 2009, pp. 3444–3449. doi:10.1109/ICSMC.2009.
5346229.
[39] D. M. Mendez, I. Papapanagiotou, B. Yang, Internet of things:
Survey on security and privacy, CoRR abs/1707.01879. arXiv:
1707.01879.
URL http://arxiv.org/abs/1707.01879
[40] N. Apthorpe, D. Reisman, S. Sundaresan, A. Narayanan,
N. Feamster, Spying on the smart home: Privacy attacks
and defenses on encrypted iot traffic, CoRR abs/1708.05044.
arXiv:1708.05044.
URL http://arxiv.org/abs/1708.05044
[41] M. Haus, M. Waqas, A. Y. Ding, Y. Li, S. Tarkoma, J. Ott, Se-
curity and privacy in device-to-device (d2d) communication: A
review, IEEE Communications Surveys Tutorials 19 (2) (2017)
1054–1079. doi:10.1109/COMST.2017.2649687.
[42] Symantec, Smart home security and the internet of things: The
future is here, https://us.norton.com/internetsecurity-
iot-smart- home-security- core.html, [Accessed: 2018-10-11]
(2017).
[43] K. Lab, Kaspersky iot scanner: How to keep your home net-
work and its smart devices safe, https://www.kaspersky.com/
blog/kaspersky-iot- scanner/18449/, [Accessed: 2018-10-10]
(2017).
[44] Kitnet dataset, https://goo.gl/iShM7E, [Accessed: 2018-07-
19
09].
[45] B. Pfaff, J. Pettit, T. Koponen, E. Jackson, A. Zhou, J. Raja-
halme, J. Gross, A. Wang, J. Stringer, P. Shelar, K. Amidon,
M. Casado, The Design and Implementation of Open vSwitch,
in: 12th USENIX Symposium on Networked Systems Design
and Implementation (NSDI 15), USENIX Association, Oakland,
CA, 2015, pp. 117–130.
[46] Big Switch Networks, Project floodlight - floodlight OpenFlow
controller, http://www.projectfloodlight.org/floodlight/,
[Accessed: 2016-09-17] (Oct. 2016).
[47] P. Martin, Using your new raspberry pi 3 as a wifi access point
with hostapd, https://frillip.com/using- your-raspberry-
pi-3- as-a- wifi-access-point- with-hostapd/, [Accessed:
2018-10-12] (2016).
[48] Majestic, The ma jestic million, https://majestic.com/
reports/majestic-million, [Accessed: 2016-07-28].
[49] T. T. T. Nguyen, G. Armitage, A survey of techniques for inter-
net traffic classification using machine learning, IEEE Commu-
nications Surveys Tutorials 10 (4) (2008) 56–76. doi:10.1109/
SURV.2008.080406.
[50] D. Bekerman, B. Shapira, L. Rokach, A. Bar, Unknown malware
detection using network traffic classification, in: 2015 IEEE
Conference on Communications and Network Security (CNS),
2015, pp. 134–142. doi:10.1109/CNS.2015.7346821.
[51] S. Akbar, J. A. Chandulal, K. N. Rao, G. S. Kumar, Improving
network security using machine learning techniques, in: 2012
IEEE International Conference on Computational Intelligence
and Computing Research, 2012, pp. 1–5. doi:10.1109/ICCIC.
2012.6510197.
[52] T. Shon, J. Seo, J. Moon, SVM Approach with a Genetic Al-
gorithm for Network Intrusion Detection, Springer Berlin Hei-
delberg, Berlin, Heidelberg, 2005, pp. 224–233. doi:10.1007/
11569596_25.
URL http://dx.doi.org/10.1007/11569596_25
[53] R. Shanmugavadivu, N. Nagara jan, Network intrusion detection
system using fuzzy logic, Indian Journal of Computer Science
and Engineering (IJCSE) 2 (1) (2001) 101–111. doi:10.1.1.
300.7185.
URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=
10.1.1.300.7185&rep=rep1&type=pdf
[54] A. Bohara, U. Thakore, W. H. Sanders, Intrusion detection in
enterprise systems by combining and clustering diverse monitor
data, in: Proceedings of the Symposium and Bootcamp on the
Science of Security, HotSos ’16, ACM, New York, NY, USA,
2016, pp. 7–16. doi:10.1145/2898375.2898400.
URL http://doi.acm.org/10.1145/2898375.2898400
[55] W. T. Strayer, D. Lapsely, R. Walsh, C. Livadas, Botnet De-
tection Based on Network Behavior, Springer US, Boston, MA,
2008, pp. 1–24. doi:10.1007/978-0- 387-68768-1_1.
URL https://doi.org/10.1007/978-0- 387-68768- 1_1
[56] W. Lu, M. Tavallaee, A. A. Ghorbani, Automatic discovery of
botnet communities on large-scale communication networks, in:
Proceedings of the 4th International Symposium on Informa-
tion, Computer, and Communications Security, ASIACCS ’09,
ACM, New York, NY, USA, 2009, pp. 1–10. doi:10.1145/
1533057.1533062.
URL http://doi.acm.org/10.1145/1533057.1533062
[57] W. Lee, S. J. Stolfo, K. W. Mok, A data mining frame-
work for building intrusion detection models, in: Proceedings
of the 1999 IEEE Symposium on Security and Privacy (Cat.
No.99CB36344), 1999, pp. 120–132. doi:10.1109/SECPRI.
1999.766909.
[58] I. Hafeez, A. Yi Ding, M. Antikainen, S. Tarkoma, Poster –
ioturva: Securing device-to-device communications for iot, in:
Proceedings of the 14th ACM International Conference on Mo-
bile Computing and Networking, MobiCom ’17, ACM, New
York, NY, USA, 2017. doi:10.1145/3117811.3131262.
URL https://doi.org/10.1145/3117811.3131262
[59] H. Garcia-Molina, M. Joglekar, A. Marcus, A. Parameswaran,
V. Verroios, Challenges in data crowdsourcing, IEEE Transac-
tions on Knowledge and Data Engineering 28 (4) (2016) 901–
911. doi:10.1109/TKDE.2016.2518669.
[60] M. Du, F. Li, G. Zheng, V. Srikumar, Deeplog: Anomaly detec-
tion and diagnosis from system logs through deep learning, in:
Proceedings of the 2017 ACM SIGSAC Conference on Computer
and Communications Security, CCS ’17, ACM, New York, NY,
USA, 2017, pp. 1285–1298. doi:10.1145/3133956.3134015.
URL http://doi.acm.org/10.1145/3133956.3134015
[61] B. J. Radford, B. D. Richardson, S. E. Davis, Sequence aggre-
gation rules for anomaly detection in computer network traffic,
CoRR abs/1805.03735. arXiv:1805.03735.
URL http://arxiv.org/abs/1805.03735
[62] P. Malhotra, L. Vig, G. Shroff, P. Agarwal, Long short term
memory networks for anomaly detection in time series, 2015.
[63] L. Bontemps, V. L. Cao, J. McDermott, N. Le-Khac, Collective
anomaly detection based on long short term memory recurrent
neural network, CoRR abs/1703.09752. arXiv:1703.09752.
URL http://arxiv.org/abs/1703.09752
[64] N. N. Thi, V. L. Cao, N. Le-Khac, One-class collective anomaly
detection based on long short-term memory recurrent neural
networks, CoRR abs/1802.00324. arXiv:1802.00324.
URL http://arxiv.org/abs/1802.00324
[65] A. Oprea, Z. Li, T. F. Yen, S. H. Chin, S. Alrwais, Detection of
early-stage enterprise infection by mining large-scale log data,
in: 2015 45th Annual IEEE/IFIP International Conference on
Dependable Systems and Networks, 2015, pp. 45–56. doi:10.
1109/DSN.2015.14.
[66] J. Sherry, S. Hasan, C. Scott, A. Krishnamurthy, S. Ratnasamy,
V. Sekar, Making middleboxes someone else’s problem: Network
processing as a cloud service, SIGCOMM Comput. Commun.
Rev. 42 (4) (2012) 13–24. doi:10.1145/2377677.2377680.
URL http://doi.acm.org/10.1145/2377677.2377680
[67] D. J. Miller, X. Hu, Z. Qiu, G. Kesidis, Adversarial learning: A
critical review and active learning study, CoRR abs/1705.09823.
arXiv:1705.09823.
URL http://arxiv.org/abs/1705.09823
20