
A Survey of Smart Home IoT Device Classification Using Machine Learning-Based Network Traffic Analysis


Abstract

Smart home IoT devices lack proper security, raising safety and privacy concerns. One-size-fits-all network administration is ineffective because of the diverse QoS requirements of IoT devices. Device classification can improve IoT administration and security: it identifies vulnerable and rogue devices and automates network administration by device type or function. Considering this, a promising research topic focusing on Machine Learning (ML)-based traffic analysis has emerged in order to uncover hidden patterns in IoT traffic and enable automatic device classification. This study analyzes these approaches to understand their potential and limitations. It starts by describing a generic workflow for IoT device classification. It then looks at the methods and solutions for each stage of the workflow. This mainly consists of i) an analysis of IoT traffic data acquisition methodologies and scenarios, as well as a classification of public datasets, ii) a literature review of IoT traffic feature extraction, categorizing and comparing popular features, as well as describing open-source feature extraction tools, and iii) a comparison of ML approaches for IoT device classification and how they have been evaluated. The findings of the analysis are presented in taxonomies with statistics showing literature trends. This study also explores and suggests undiscovered or understudied research directions.
Received 11 August 2022, accepted 31 August 2022, date of publication 8 September 2022, date of current version 19 September 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3205023
HOUDA JMILA, GREGORY BLANC, MUSTAFIZUR R. SHAHID, AND MARWAN LAZRAG
SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris, 91764 Palaiseau, France
Corresponding author: Houda Jmila (houda.jmila@telecom-sudparis.eu)
This work was supported in part by the Vulnerability and Attack Repository for IoT (VarIoT) Project under Grant TENtec n.28263632, and
in part by the Connecting Europe Facility of the European Union.
INDEX TERMS Classification, security, device, fingerprinting, identification, internet of things, machine
learning, network traffic, survey.
I. INTRODUCTION
In the last decade, the Internet of Things (IoT) has spread rapidly: according to IoT Analytics [1], the IoT market is expected to grow by 18% to 14.4 billion active connections in 2022. Researchers have suggested several definitions of the IoT, but almost all agree that it is a framework of sensors, industrial machines, video cameras, mobile phones, etc., collectively referred to as IoT devices, which can interact directly with one another or over the internet. IoT is used in smart environments (homes, cities, campuses, etc.) to help users understand and control their environment.
Despite its undeniable advantages, IoT expansion raises security and privacy concerns. Most IoT device manufacturers tend to prioritize the three Ps (prototyping, production, and performance) over security [2], which results in weak security designs for IoT devices. As revealed by WikiLeaks [3], poorly secured IoT devices are ideal targets for attackers seeking to obtain unauthorized access and infer sensitive information; for example, smart TVs were converted into listening devices. Attackers can also use compromised devices to inject malicious data and conduct large-scale attacks against third parties or other devices inside the network [4].

The associate editor coordinating the review of this manuscript and approving it for publication was Taehong Kim.
Automatically classifying devices is the first step toward securing IoT networks: it enables the detection of vulnerable devices and the enforcement of access control.
The growing diversity and heterogeneity of IoT devices, each with its own QoS requirements (cameras require more bandwidth than smart light bulbs, healthcare device traffic must be prioritized, and so on), make one-size-fits-all network management ineffective. IoT device classification enables network management automation: by setting QoS and network management policies per device type, each automatically classified device can be assigned to a class with predetermined policies.

FIGURE 1. The scope of the survey is highlighted in red. We focus on the classification of smart home IoT devices, also called consumer IoT devices, and analyze approaches using machine learning-based traffic analysis.

VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
H. Jmila et al.: Survey of Smart Home IoT Device Classification Using ML-Based Network Traffic Analysis
Note that the term device classification is often confused with several related tasks, namely i) traffic classification, ii) intrusion detection, iii) device identification, and iv) device fingerprinting. Traffic classification is a broad research field that involves classifying network traffic based on various parameters [5] (see Fig. 1). For instance, traffic can be classified as either legitimate or malicious based on attack patterns: this is called intrusion detection. It can also be classified by the device that generates it (device classification). Devices can be categorized into groups of similar devices, such as devices for energy management or devices for health monitoring, or according to their function, such as cameras, hubs, home assistants, etc. Device identification classifies devices more finely, according to their model or manufacturer, such as D-Link camera, Nest camera, Alexa home assistant, or Google Home Mini assistant. Device fingerprinting is the finest level of device classification. It gives each device instance (e.g., camera A and camera B are two instances of the Nest Camera) ‘‘a distinct fingerprint that is impossible to forge and independent of environmental changes and mobility’’ [6]. In this study, we focus on device classification as a specific case of traffic classification, broader than device identification and device fingerprinting.
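The granularity levels above form a hierarchy. Each coarser label can be derived from a finer one, and coarser levels merge more devices into the same class. The Python sketch below illustrates this with a hypothetical label record. The field names, the example devices, and the group names are illustrative assumptions and are not drawn from any dataset.

```python
from dataclasses import dataclass

# Hypothetical hierarchical label attached to one device instance.
@dataclass(frozen=True)
class DeviceLabel:
    group: str      # device classification by group, e.g. "health monitoring"
    function: str   # device classification by function, e.g. "camera"
    model: str      # device identification, e.g. "Nest Camera"
    instance: str   # device fingerprinting, e.g. "camera A"

# Granularity levels ordered from coarsest to finest.
LEVELS = ("group", "function", "model", "instance")

def label_at(label: DeviceLabel, level: str) -> str:
    """Return the class of a device at the requested granularity level."""
    if level not in LEVELS:
        raise ValueError(f"unknown granularity level: {level!r}")
    return getattr(label, level)

def n_classes(devices: list[DeviceLabel], level: str) -> int:
    """Number of distinct classes the devices fall into at a given level."""
    return len({label_at(d, level) for d in devices})

devices = [
    DeviceLabel("security", "camera", "Nest Camera", "camera A"),
    DeviceLabel("security", "camera", "Nest Camera", "camera B"),
    DeviceLabel("security", "camera", "D-Link camera", "camera C"),
    DeviceLabel("energy management", "smart plug", "Plug model X", "plug A"),
]

for level in LEVELS:
    print(level, n_classes(devices, level))
# prints: group 2, function 2, model 3, instance 4 (one per line)
```

The finer the level, the more classes a classifier must separate for the same set of devices. This is one reason why the choice of granularity affects classification cost.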
A simple way to classify IoT devices is to monitor their MAC addresses and DHCP negotiation [7]. Sivanathan et al. [8] outline the shortcomings of this method. First, IP and MAC addresses can easily be spoofed by other devices, making them unreliable identifiers. Furthermore, MAC addresses are not necessarily indicative of device manufacturers, and even if they were, there is no standard for recognizing device brands and types from them. To cope with this problem, researchers have examined IoT network traffic and observed that IoT devices perform very specific tasks [9]. For example, it is possible to turn a smart bulb on or off or change its brightness and light color; however, a smart bulb cannot stream videos or send emails. Therefore, we assume that IoT network traffic follows a stable and predictable pattern that may characterize the device. Machine learning can reveal hidden network traffic patterns and learn their characteristics, making device classification easier. This study explores IoT device classification using ML-based network traffic analysis. To characterize a device, we consider all the network traffic it generates. This traffic is device-specific rather than application-specific, because it comprises all the applications (tasks) executed by the device, which can be distinct.
According to [10], IoT devices can be divided into consumer, commercial, and industrial categories. Consumer IoT devices include personal devices, such as smartphones, and internet-connected home devices, such as cameras, home assistants, and smart lamps. Larger organizations employ commercial IoT devices for smart city deployments, transportation and electric car monitoring, health monitoring systems, etc. Industrial IoT devices, such as sensors, robots, and power plant controllers, improve process control and productivity. Some devices, like cameras and sensors, can belong to multiple categories. This survey focuses on consumer IoT devices, commonly called smart home devices. This choice is motivated by the rich and abundant literature on smart home devices. That literature exists because of i) the availability of data, in contrast to its confidentiality in the industrial world, and ii) the large number of smart home devices, which represent the largest share of the IoT market (63% according to Gartner [11]). Furthermore, many people, including those unaware of security issues, use smart home IoT devices, making their protection crucial.

FIGURE 2. Workflow of IoT device classification using ML-based traffic analysis. The input is the set of devices to be classified. First, raw traffic data is collected as pcap files and supplied to the feature extraction procedure, which creates feature vectors (in a text-based format) representing the raw traffic. ML algorithms use these files to classify the originating device of each sample. Classification results can be used in various contexts, including cyber security enforcement, network management, and malicious usage.
Other surveys have examined tasks related to IoT device classification, but none have focused on the topic of this study. Tahaei et al. [7] discussed IoT traffic classification, while [12] examined ML-based internet traffic classification; neither focused on device classification. Sanchez et al. [13] discussed device behavioral fingerprinting, but not for IoT devices specifically. Yadav et al. [6] provided a taxonomy of IoT device identification approaches. However, they did not focus on ML methods or on what data collection, feature extraction, and model learning require. To the best of our knowledge, no recent work exhaustively studies IoT classification datasets. No current work explores feature extraction methodologies and compares the most useful features for IoT device classification, and no previous work examines each step of the IoT classification process as we do.
For a comprehensive literature review, we analyzed papers from different digital libraries such as IEEE Xplore, ResearchGate, and Google Scholar. First, we performed a keyword search using terms related to i) ii) IoT devices, like ‘‘IoT devices,’’ ‘‘wearable devices,’’ and ‘‘IoT gadgets,’’ iii) classification, like ‘‘classification,’’ ‘‘clustering,’’ ‘‘identification,’’ and ‘‘fingerprinting,’’ iv) traffic analysis, like ‘‘traffic analysis,’’ ‘‘traffic classification,’’ ‘‘communication analysis,’’ ‘‘network characteristics,’’ ‘‘network packets,’’ and ‘‘network flows,’’ and v) machine learning, like ‘‘machine learning,’’ ‘‘deep learning,’’ ‘‘artificial intelligence,’’ ‘‘supervised learning,’’ ‘‘unsupervised clustering,’’ ‘‘automated,’’ and ‘‘intelligent.’’ Our search was limited to articles published in 2018-2022 to capture recent advancements. Second, we examined the reference lists and citations of the selected articles to find more papers. Third, we scanned titles and abstracts to reject items that did not fit the scope (task: classification; context: smart home; classification approach: ML-based traffic analysis). Finally, we conducted an in-depth evaluation of the remaining publications and removed articles that provided insufficient information on all stages of the classification procedure. At the end of this process, 58 papers were deemed pertinent to our investigation.
II. ANALYSIS STEPS AND CONTRIBUTIONS
Fig. 2 shows a general flowchart summarizing the multiple steps and actors that can be involved in IoT device classification using ML-based traffic analysis. The initial step is data acquisition, which consists of collecting raw traffic from devices in pcap files (the pcap file format is the de facto standard for packet captures). The second phase is feature extraction, which represents raw traffic as numerical or categorical information. This information is stored in text-based files (e.g., CSV (Comma-Separated Values) or plain text) that ML algorithms can use. The final stage is classification using machine learning algorithms. The classification result can be used for cyber security enforcement and network management, but also for malicious activities such as cyber attacks.
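The three stages can be sketched end to end in a few lines of Python. This is a toy illustration under stated assumptions, not a method from the surveyed papers:

- packets are assumed to be already parsed from pcap files into (source MAC, timestamp, size) tuples;
- the features (packet count and mean packet size per fixed time window) and the nearest-centroid classifier are deliberately simple stand-ins;
- the MAC addresses and device types are hypothetical.

```python
from collections import defaultdict
from math import dist
from statistics import mean

# Stage 1 (data acquisition): packets assumed already parsed from pcap files.
Packet = tuple[str, float, int]  # (source MAC, timestamp in s, size in bytes)

def extract_features(packets: list[Packet], window_s: float = 10.0):
    """Stage 2 (feature extraction): one [packet count, mean size] vector
    per (device, time window) pair."""
    groups = defaultdict(list)
    for mac, ts, size in packets:
        groups[(mac, int(ts // window_s))].append(size)
    return [(mac, [float(len(sizes)), float(mean(sizes))])
            for (mac, _), sizes in groups.items()]

def fit_centroids(samples):
    """Stage 3 (training): nearest-centroid model, one mean vector per label."""
    by_label = defaultdict(list)
    for label, vec in samples:
        by_label[label].append(vec)
    return {label: [mean(col) for col in zip(*vecs)]
            for label, vecs in by_label.items()}

def predict(centroids, vec):
    """Stage 3 (inference): label whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))

# Hypothetical ground truth: MAC address -> device type.
mac_to_type = {"aa:aa:aa:aa:aa:01": "camera", "bb:bb:bb:bb:bb:02": "smart bulb"}
train = ([("aa:aa:aa:aa:aa:01", t * 0.5, 1200) for t in range(40)]
         + [("bb:bb:bb:bb:bb:02", t * 5.0, 90) for t in range(8)])
samples = [(mac_to_type[mac], vec) for mac, vec in extract_features(train)]
model = fit_centroids(samples)
print(predict(model, [15.0, 1100.0]))  # camera
```

In practice, features would be normalized (here the mean packet size dominates the distance), and the classifier would be evaluated on held-out traffic. The survey discusses realistic choices for each stage.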
FIGURE 3. Table of contents and discussed questions.

To help develop more effective solutions for IoT device classification, this study investigates the literature on each stage of the process and attempts to answer the following research questions.
RQ1. How to design a practical data acquisition method for IoT device classification? Data acquisition is a crucial step that should enable the practical and realistic capture of the most relevant information about the environment. To design an effective and practical solution for the IoT device classification problem, it is essential to know: i) which devices should be used for data acquisition to represent a realistic smart home environment, ii) when to collect the traffic to capture the diversity of the devices’ operational modes, and iii) where to place the collection probe so as to capture traffic in an effective yet privacy-preserving manner.
RQ2. How to create an efficient feature extraction solution? Feature extraction is a critical step that must describe the collected traffic as accurately as possible to reflect its patterns. To develop an appropriate feature extraction technique for IoT device classification, it is necessary to know: i) how to represent a single data sample, either as a packet or as a flow of packets; in other words, at which level to extract features (packet level or flow level), ii) in the latter case, how to define a packet flow (by time interval, number of packets, or connection), and iii) which features are the most informative and discriminating, and how to compute them.
RQ3. How to build effective machine learning classifiers for IoT device classification? Classification using machine learning algorithms is the last, but not the least important, step. To answer this research question, it is essential to decide: i) the scope of a classifier (one classifier per device type or one multi-class classifier), ii) the learning strategy (supervised, unsupervised, or semi-supervised), and iii) the machine learning techniques to use (deep or shallow algorithms).
RQ4. How to choose the classification granularity? Device classification can be performed at different levels of granularity. It is crucial to understand the pros and cons of each classification level in order to choose the optimal granularity for each context and avoid unnecessary classification costs.
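RQ2 distinguishes packet-level samples from flow-level samples and lists three common ways to delimit a flow. The Python sketch below shows the three flow definitions side by side. Two simplifications are assumed:

- the packet tuples and their field order are chosen for illustration only;
- real flow meters also use inactivity timeouts, which are omitted here.

A packet-level approach would simply treat each packet tuple as one sample.

```python
from collections import defaultdict

# Assumed packet record: (timestamp_s, src_ip, src_port, dst_ip, dst_port, proto)

def flows_by_time(packets, interval_s):
    """Flow = all packets whose timestamps fall in the same fixed interval."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[int(pkt[0] // interval_s)].append(pkt)
    return list(flows.values())

def flows_by_count(packets, n):
    """Flow = each consecutive block of n packets (the last may be shorter)."""
    return [packets[i:i + n] for i in range(0, len(packets), n)]

def flows_by_connection(packets):
    """Flow = packets sharing the same bidirectional 5-tuple (connection)."""
    flows = defaultdict(list)
    for pkt in packets:
        _, src_ip, src_port, dst_ip, dst_port, proto = pkt
        # Sort the two endpoints so both directions map to the same key.
        endpoints = tuple(sorted([(src_ip, src_port), (dst_ip, dst_port)]))
        flows[(proto, endpoints)].append(pkt)
    return list(flows.values())

packets = [
    (0.1, "192.168.1.10", 50000, "203.0.113.5", 443, "TCP"),
    (0.2, "203.0.113.5", 443, "192.168.1.10", 50000, "TCP"),
    (1.5, "192.168.1.10", 50001, "192.0.2.53", 53, "UDP"),
    (2.7, "192.168.1.10", 50000, "203.0.113.5", 443, "TCP"),
]
print([len(f) for f in flows_by_time(packets, 1.0)])    # [2, 1, 1]
print([len(f) for f in flows_by_count(packets, 3)])     # [3, 1]
print([len(f) for f in flows_by_connection(packets)])   # [3, 1]
```

Under each definition, the same packets yield a different number and size of samples, which in turn changes the features computed from them.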
To the best of our knowledge, this is the first paper that covers all of the above-mentioned challenges and explores their impact on IoT device classification. In addressing these research questions, this survey makes the following contributions (Fig. 3, which serves as a table of contents, shows where and how each question is handled in this study):
• An analysis of the various applications of smart home IoT device classification.
• An in-depth examination of IoT traffic data collection strategies. This includes: i) a review of the devices used to represent a smart home setting, ii) a study of IoT traffic types (depending on device operation mode) and their utility for classifying devices, iii) a description of the architecture and the different traffic collection points (depending on the traffic probe location), with a discussion of how realistic they are, and iv) an evaluation of public datasets for IoT device classification.
• A thorough review of feature extraction approaches. This includes: i) exploring different feature types and comparing their significance and computation methodologies, ii) exploring deep learning-based automatic feature extraction, iii) describing open-source feature extraction tools, and iv) investigating feature dimensionality reduction for better IoT device classification.
• A comparison of machine learning approaches for IoT device classification and of how they were assessed in the literature.
• An examination and assessment of the various classification granularity levels.
• A summary of contributions in the form of taxonomies and statistics to highlight trends. The statistics were calculated based on a thorough review of each research article with respect to the taxonomies (see Tables 2 to 4 in the Appendix).¹
This document follows the classification process from bottom to top, except for the applications of IoT device classification, which are presented first for the sake of clarity.
III. THE DIFFERENT APPLICATIONS OF IoT DEVICE CLASSIFICATION
A. NETWORK AND SECURITY MANAGEMENT
Due to the variety of IoT devices, it is difficult to control them with a single policy. One solution is to define network and security management rules per device class and assign each device to a class with automated policies. Miettinen et al. [14] describe an interesting use case where newly introduced devices are classified, and the classification result is used to determine whether the device is vulnerable. The decision is based on a vulnerability assessment of the device type, carried out by consulting a vulnerability dataset. The device is then assigned one of the following isolation levels:

- strict, where the device can only interact with untrusted devices;
- restricted, where it can communicate with untrusted devices but has limited internet access;
- trusted, where the device is allowed to communicate with other trusted devices and has unrestricted internet access.

This mitigation approach allows vulnerable devices to cohabit with other devices without compromising their security.
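The isolation step of this use case can be sketched as a simple policy function. This is a minimal illustration, not the implementation of Miettinen et al. [14]. Two assumptions are made:

- the vulnerability dataset is modeled as a hypothetical dictionary from device type to placeholder vulnerability identifiers;
- the mapping rule is our own: known-vulnerable types are isolated strictly, unassessed or unidentified types are restricted, and types with no known vulnerability are trusted.

```python
from enum import Enum
from typing import Optional

class Isolation(Enum):
    STRICT = "strict"          # may only interact with untrusted devices
    RESTRICTED = "restricted"  # may talk to untrusted devices, limited internet access
    TRUSTED = "trusted"        # may talk to trusted devices, unrestricted internet access

def assign_isolation(device_type: Optional[str],
                     vuln_db: dict[str, list[str]]) -> Isolation:
    """Map a classified device type to an isolation level (hypothetical rule)."""
    if device_type is None or device_type not in vuln_db:
        # Classification failed or the type was never assessed: be cautious.
        return Isolation.RESTRICTED
    if vuln_db[device_type]:
        # At least one known vulnerability is recorded for this device type.
        return Isolation.STRICT
    return Isolation.TRUSTED

# Hypothetical vulnerability dataset with placeholder identifiers.
vuln_db = {"camera-model-x": ["example-vuln-1"], "smart-plug-y": []}
print(assign_isolation("camera-model-x", vuln_db))  # Isolation.STRICT
print(assign_isolation("smart-plug-y", vuln_db))    # Isolation.TRUSTED
print(assign_isolation(None, vuln_db))              # Isolation.RESTRICTED
```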
Note that detecting vulnerable devices in a smart home is crucial, since most IoT devices suffer from poor security design. An attacker can easily compromise them to gain unauthorized network access or to launch massive attacks. For instance, in 2016, the Mirai malware infected millions of IoT devices to launch DDoS (distributed denial-of-service) attacks [4]. The BYOD (Bring Your Own Device) trend allows employees to bring their own personal IoT devices to work and connect them to the corporate network. This extends the attack surface of companies, as compromised personal devices may inject malware into the corporate network and cross-contaminate other devices. Similarly, remote working has exposed professional devices to a less trustworthy environment, where they cohabit with possibly more vulnerable smart home devices.
¹A dynamic version of the taxonomy and websites is available at: https://gitlab.com/jmila/smart-home-iot-device-classification-using-machinelearning-based-network-traffic-analysis

As described above [14], blacklisting approaches detect vulnerable devices that should be disconnected from (blocked on) the network. IoT device classification can also be used to establish an automatic whitelisting system that ensures only authorized IoT devices can connect to the network, as proposed by Meidan et al. [15]. If the identified IoT device type is not in the whitelist, the organization's SIEM (Security Information and Event Management) system is alerted so that it can take appropriate action (e.g., disconnecting the device from the network).
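A whitelisting check of this kind reduces to a membership test on the predicted device type. The sketch below assumes a hypothetical `alert` callback standing in for the organization's SIEM integration. It is not the API of any particular SIEM product, nor the implementation of Meidan et al. [15].

```python
from typing import Callable

Alert = dict[str, str]

def enforce_whitelist(device_mac: str, predicted_type: str,
                      whitelist: set[str],
                      alert: Callable[[Alert], None]) -> bool:
    """Return True if the device may stay connected; otherwise raise an alert."""
    if predicted_type in whitelist:
        return True
    # Unauthorized type: hand the decision (e.g., disconnection) to the SIEM.
    alert({"mac": device_mac, "type": predicted_type,
           "suggested_action": "disconnect"})
    return False

alerts: list[Alert] = []  # stands in for the (hypothetical) SIEM queue
whitelist = {"camera", "smart bulb"}
enforce_whitelist("aa:bb:cc:dd:ee:01", "camera", whitelist, alerts.append)
enforce_whitelist("aa:bb:cc:dd:ee:02", "smart speaker", whitelist, alerts.append)
print(len(alerts))  # 1
```

The whitelist stores only authorized types, so its size does not depend on how many untrusted device types exist.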
Note that whitelisting is more scalable than blacklisting, whose list grows with the number of untrusted devices. Moreover, data from authorized (whitelisted) devices is easier to obtain. Nevertheless, a whitelist is less robust against adversarial attacks, as an attacker may mimic the behavior of an authorized device to evade the intrusion detection system.
B. MALICIOUS USAGE
IoT device classification can also be exploited by attackers to leak sensitive information about IoT devices and their users. For instance, Hafeez et al. [16] demonstrate that an adversary with access to upstream traffic from a smart home network can identify, with significant confidence, the device types and the user interactions with IoT devices. Dong et al. [17] study the case where an adversary attempts to infer the types of IoT devices behind a smart home network even when the traffic of all devices is merged behind the gateway using VPN (Virtual Private Network) and NAT (Network Address Translation) techniques.
Sensitive information revealing device types and user interactions can be used to infer user activities or home presence [16]. For example, if the smart lights remain off for a long period of time, this suggests that no one is at home, opening an opportunity for a break-in. Such passive attacks are hard to detect and mitigate. In this context, Hafeez et al. [16] propose a traffic morphing technique that helps hide the traffic of IoT devices, reducing the risk of such attacks.
IV. APPROACHES TO DATA ACQUISITION
This section describes the data acquisition methodologies found in the literature. To organize the findings, we present them along four axes. First, we examine the devices considered for data collection. Second, we analyze the IoT traffic types that can be captured. Third, we discuss data collection scenarios. Finally, we provide a comparative study of public datasets. The taxonomy in Fig. 4 illustrates the main outcomes of this section.
A. THE CLASSIFIED DEVICES
The input to the IoT device classification process is a list of devices to be classified. They can be both IoT and non-IoT devices, also referred to as single-purpose and multi-purpose devices respectively, since IoT devices are typically intended for a single specific task. An up-to-date list of the most common smart home IoT devices can be found on the website [18]. Examples of non-IoT devices include laptops, cell phones, and Android tablets.
In the literature, some approaches classify only IoT devices, and others classify both IoT and non-IoT devices.
https://www-public.telecom-sudparis.eu/~blanc_gr/survey/
FIGURE 4. Taxonomy of data-acquisition approaches: the approaches are classified according to i) the devices
under consideration: only IoT, or both IoT and non-IoT devices, ii) the operation mode of the devices, iii) the
probe location, and iv) whether a public dataset is utilized or the traffic is collected by the authors.
Percentages show how often each approach is used in the reviewed papers. This highlights the trends
discussed in Sec. IV.
Fig. 4 shows that the majority of reviewed papers (63%) consider the classification of IoT devices only. However, we think that this is not the most realistic scenario. Traffic must be collected from all the devices connected to the smart home network to ensure its security and automatic management. Since IoT and non-IoT devices cohabit in smart homes, both must be considered during the traffic collection process. Note, however, that classifying both IoT and non-IoT devices is more challenging, since IoT traffic is small and sparse compared to non-IoT traffic. As shown by Dong et al. [17], some IoT devices might easily be confused with non-IoT devices. For example, home assistants have diverse functions (compared to simple single-use devices like light bulbs), which makes their behavior very similar to that of non-IoT devices. To address this challenge, we suggest training ML algorithms with mixed (IoT and non-IoT) traffic to boost their generalization capabilities.
B. THE DIFFERENT TYPES OF IoT TRAFFIC
IoT devices generate three types of traffic depending on their operation mode:

- setup traffic (also called initial traffic) is generated by an IoT device during installation, also called registration or enrollment;
- interaction traffic (also called active traffic) is generated when a device interacts with the user or environment (e.g., a home assistant responding to a voice request from the user);
- idle traffic represents device activity in the absence of external stimulation. It includes routine communications between the device and the back-end server, as well as keep-alive or heartbeat signals.
1) THE SETUP TRAFFIC
When a new device with a new MAC address connects to the network, it follows a device- or provider-specific connection procedure [14]. In most situations, this operation is assisted by a smartphone, laptop, or PC application. The installation procedure typically involves: i) activating the device, ii) connecting with the provider's app, iii) transmitting WiFi credentials, and iv) resetting and connecting to the user's network using the credentials provided.
To collect the installation traffic, existing approaches record the first packets $\{p_1, p_2, p_3, \ldots, p_n\}$ exchanged between the device and the gateway. A decrease in the number of packets exchanged marks the end of the installation phase. To generate enough data, the installation process should be performed multiple times for each device, with a hard reset between successive installations [14].
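The end of the setup phase can be detected heuristically from packet timing. The Python sketch below shows one possible interpretation of the "decrease in packets exchanged" criterion, not the exact rule used in [14]. It counts packets in fixed windows starting at the device's first packet. It then stops at the first window after the activity peak whose count falls below a fraction of that peak. The window length and the 20% drop ratio are arbitrary assumptions.

```python
def setup_packets(timestamps: list[float], window_s: float = 1.0,
                  drop_ratio: float = 0.2) -> list[float]:
    """Return the timestamps considered part of the setup phase.

    `timestamps` are the sorted arrival times of the packets exchanged between
    a newly connected device and the gateway.
    """
    if not timestamps:
        return []
    t0 = timestamps[0]
    n_windows = int((timestamps[-1] - t0) // window_s) + 1
    counts = [0] * n_windows
    for ts in timestamps:
        counts[int((ts - t0) // window_s)] += 1
    peak = max(range(n_windows), key=counts.__getitem__)
    end = n_windows  # default: no drop observed, keep every packet
    for i in range(peak + 1, n_windows):
        if counts[i] < drop_ratio * counts[peak]:
            end = i  # first quiet window after the peak ends the setup phase
            break
    cutoff = t0 + end * window_s
    return [ts for ts in timestamps if ts < cutoff]

# 20 packets in the first two seconds, then sparse idle traffic.
trace = [i / 10 for i in range(20)] + [5.0, 9.0]
print(len(setup_packets(trace)))  # 20
```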
2) THE INTERACTION AND IDLE TRAFFIC
IoT devices mostly generate interaction and idle traffic. Interaction traffic can be triggered either i) by a direct user request, like adjusting light bulb color and intensity, or ii) by a change in the environment observed by the IoT device, such as a sensor that detects motion or a light bulb that detects an inhabitant [19]. Idle traffic mainly includes exchanges between the device and its cloud service during standby, such as heartbeat messages, regular status updates, or notifications [16]. IoT devices generate more traffic when active than in background (idle) mode [20]. This is reasonable, since user and environmental interactions stimulate diverse reactions [20].
3) WHICH TRAFFIC TYPE IS MOST SUITED FOR IoT DEVICE CLASSIFICATION?
The statistics detailed in Fig. 4 show that 86% of reviewed papers use idle and/or interaction traffic, while only 19% rely on setup traffic for device classification. The advantage of setup traffic over idle and interaction traffic is its stability, as the IoT device's behavior during configuration is the same regardless of the environment. Moreover, relying on setup traffic allows rapid recognition once the device is connected to the network. However, the initialization state may occur only rarely during the IoT device life cycle. As a result, setup traffic is scarce, sparse, and difficult to collect in real-world network monitoring. In contrast, idle and interaction traffic is more abundant and easier to collect, making it better suited for machine learning algorithms, especially deep learning.
C. DIFFERENT LOCATIONS FOR TRAFFIC PROBE
1) A TYPICAL NETWORK SETUP FOR CAPTURING IoT TRAFFIC
Fig. 5 shows a typical smart home network architecture. It includes IoT and non-IoT devices connected to an internet gateway through wireless or wired connections. At least two tools are required for traffic collection:
• a packet capture module, to capture the traffic as pcap records comprising entire packets, from the MAC layer to the application layer (e.g., tcpdump [21] or Wireshark [22]), and
• a storage module, to store the traffic data on a remote server or within the network.

FIGURE 5. A typical network configuration for capturing IoT traffic includes i) IoT and non-IoT devices connected to the gateway through wireless or wired connections and ii) packet capture and storage modules for collecting traffic. Two capture points are discussed in the literature: i) at the gateway and ii) after the gateway.
To label the ground truth, the MAC address in the packet header is used to reveal the identity of the device and label the data accordingly.
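Labeling by MAC address can be done with nothing more than the Python standard library. The sketch below makes the following assumptions:

- the input is a classic pcap file (not pcapng, and only the microsecond-resolution variant) captured on an Ethernet link;
- the ground-truth table contents and the capture file name in the comment are hypothetical;
- only frames sent by a device are attributed to it; traffic addressed to a device would be matched on the destination MAC instead.

The code extracts the source MAC address of each frame and maps it to a device name through the ground-truth table.

```python
import struct

def iter_src_macs(pcap: bytes):
    """Yield the source MAC address of every Ethernet frame in a classic pcap."""
    magic = pcap[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian host
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    (linktype,) = struct.unpack(endian + "I", pcap[20:24])
    if linktype != 1:  # 1 = LINKTYPE_ETHERNET
        raise ValueError(f"unsupported link type {linktype}")
    offset = 24  # size of the global header
    while offset + 16 <= len(pcap):
        # Record header: ts_sec, ts_usec, captured length, original length.
        _, _, incl_len, _ = struct.unpack(endian + "IIII", pcap[offset:offset + 16])
        frame = pcap[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) >= 12:
            yield frame[6:12].hex(":")  # bytes 6-11 hold the source MAC

def label_frames(pcap: bytes, mac_to_device: dict[str, str]) -> list[str]:
    """Ground-truth label of each frame, 'unknown' for unlisted MACs."""
    return [mac_to_device.get(mac, "unknown") for mac in iter_src_macs(pcap)]

# Usage (hypothetical file name and table):
# with open("capture.pcap", "rb") as f:
#     labels = label_frames(f.read(), {"aa:bb:cc:dd:ee:01": "camera"})
```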
2) TRAFFIC CAPTURE SCENARIOS
The literature considers two scenarios for collecting IoT traffic, depending on the location of the probe (capture point): i) at the gateway, i.e., from inside the home network, or ii) after the gateway, i.e., from outside the smart home. At the gateway, the captured traffic flows between the devices connected to the home network and the gateway, and it can be separated by IP or MAC address. In contrast, the traffic captured after the gateway contains traffic from all connected devices aggregated behind a single public IP address, due to the frequent use of NAT at gateways.
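The effect of probe placement under NAT can be illustrated with a toy model. The sketch below makes the following assumptions:

- packets are hypothetical dictionaries;
- the rewrite function only replaces the source IP with a single public address from a documentation range, and the source MAC with a hypothetical gateway interface address;
- real gateways also translate ports, and this is omitted.

```python
from collections import Counter

GATEWAY_PUBLIC_IP = "203.0.113.1"      # address from a documentation range
GATEWAY_WAN_MAC = "02:00:00:00:00:01"  # hypothetical gateway interface MAC

def seen_after_gateway(packet: dict) -> dict:
    """What a probe outside the home sees: NAT rewrites the source IP, and the
    link-layer source becomes the gateway itself. (Port translation omitted.)"""
    return {**packet, "src_ip": GATEWAY_PUBLIC_IP, "src_mac": GATEWAY_WAN_MAC}

def groups(packets: list[dict], key: str) -> Counter:
    """Count packets per distinct value of a header field."""
    return Counter(p[key] for p in packets)

inside = [
    {"src_mac": "aa:bb:cc:dd:ee:01", "src_ip": "192.168.1.10"},  # camera
    {"src_mac": "aa:bb:cc:dd:ee:02", "src_ip": "192.168.1.11"},  # smart bulb
    {"src_mac": "aa:bb:cc:dd:ee:01", "src_ip": "192.168.1.10"},  # camera
]
outside = [seen_after_gateway(p) for p in inside]
print(len(groups(inside, "src_ip")), len(groups(outside, "src_ip")))  # 2 1
print(len(groups(outside, "src_mac")))  # 1
```

At the gateway, separating traffic per device is a simple grouping on a header field. After the gateway, that grouping collapses to a single source, so the classifier has to infer devices from the aggregated traffic itself.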
3) WHICH PROBE LOCATION IS MORE PRACTICAL?
Approaches that gather traffic at the gateway assume the ability to intercept and sniff the traffic flowing inside the smart home. However, this clean and controlled experimental setup does not reflect most real-world use cases, where traffic is only seen from the outside. A typical application is when Internet Service Providers (ISPs) classify IoT traffic to identify the devices inside a smart home. They then allocate resources and configure appropriate security rules according to the device population and its vulnerabilities. But ISPs cannot intercept traffic inside the home network. It is therefore more realistic to collect traffic from outside the smart home, after the gateway. However, classifying devices based on such traffic is more challenging, because the original packet header fields, such as source IP address and port, are hidden. Moreover, the widely used VPN-enabled gateways encapsulate the original packets in an encrypted tunnel, hiding the traffic characteristics. This makes device classification even more challenging, and new solutions should be investigated.
Although realistic, this scenario is understudied. This sce- 459
nario is used in only four papers: [17], [23], [24], [25]. It is 460
worth noting that Meidan et al. [25] and Dong et al. [17] made 461
their datasets public so that more research could be done on 462
this topic. 463
D. PUBLIC DATASETS COMPARISON
Overall, 57% of the reviewed publications use public datasets, either exclusively or to complement or enrich their own data. Most of the datasets mentioned in this survey were created for IoT device classification. However, we also include datasets developed for other topics that contain IoT traffic and can be used for IoT device classification.
Table 1 summarizes the datasets listed below. To compare them, we specify for each: i) the devices used to generate the traffic (IoT only, or both IoT and non-IoT), ii) the operation mode of the devices (i.e., setup, interaction, idle), iii) the probe location (i.e., at or after the gateway), iv) the duration of the collection, v) the amount of traffic collected, and vi) a direct access link to the dataset.
TABLE 1. Publicly available datasets for IoT device classification.
1) IoTSentinel DATASET [14]
This dataset was collected to identify IoT devices based on their setup traffic. To generate enough traffic, the typical device configuration process was repeated 20 times for each device. During the setup process, all network traffic between the IoT devices and the gateway was recorded. A representative set of 31 IoT smart home devices available on the European market in the first quarter of 2016 was used. There are 27 different device types (4 types are represented by 2 devices each). Most of the devices were connected via WiFi or Ethernet, and some of them used ZigBee or Z-Wave.
2) UNSW DATASET [19], [27]
This dataset was published by UNSW researchers and covers various IoT research areas. In addition to traffic for IoT device classification, the dataset includes IoT attack traces, IoT MUD profiles, and IoT IPFIX records that can be useful for other IoT-related research topics (the relevance of MUD profiles to device classification is discussed in Sec. VIII).
In this paper, we focus on the traffic for IoT device classification. It was first published in [19] and has since evolved. The first version has been extensively used in the literature. The same authors later published an updated and more elaborate version in [27], which recent articles now use.
This study focuses on the IoT traffic traces reported in [27]. They were collected over 26 weeks, from October 1st, 2016 to April 13th, 2017, but only two weeks' worth of data is available for download.
3) IoTFinder [31] AND YourThings DATASETS [29]
The IoTFinder dataset was created to explore IoT device identification using DNS fingerprints. It therefore contains pcap files of DNS responses for 53 IoT devices from different vendors, collected from August 1st, 2019 to September 30th, 2019.
The YourThings dataset was created by the same authors to analyze the security properties of home-based IoT devices.
4) SHIoT DATASET [32]
This dataset was created for behavior-based IoT device classification. The testbed was implemented at the Faculty of Transport and Traffic Sciences in Zagreb. The dataset contains 144 pcap files, each covering 24 hours of traffic.
5) DADABox DATASET [34]
This dataset was created to compare approaches to classifying IoT devices. The testbed was developed at the University of Cambridge, where researchers sporadically interacted with the IoT devices. The dataset covers 41 different IoT devices, and the data was collected over a period of 27 weeks.
6) HomeMole DATASET [17]
This dataset was created to identify IoT devices behind VPN- and NAT-enabled gateways in smart homes. Three collection scenarios were developed:
i) a single-device environment, in which only one device is considered;
ii) a noisy environment with various IoT and non-IoT devices, where multiple devices may operate simultaneously at any given time, resulting in traffic aggregation; and
iii) a VPN environment, in which a VPN is enabled and traffic is collected both before and after the VPN.
7) IoT-deNAT [25]
This dataset was collected to detect vulnerable IoT devices behind a home NAT. To reduce processing and storage, only NetFlow [42] statistical aggregations are recorded instead of the raw traffic. NetFlow is a flow-level aggregation of information, usually a 5-tuple header and some counters.
8) THE MON(IOT)R DATASET [38]
This dataset examines information exposure by IoT devices. It contains data from 81 IoT devices deployed in two labs, one at Northeastern University in the United States and the other at Imperial College London in the United Kingdom. The data was collected over 30 days between September 2018 and February 2019. Different types of traffic are provided:
i) power traffic (487 samples), i.e., traffic generated by IoT devices when they are turned on,
ii) interaction traffic (32,030 samples),
iii) idle traffic covering an average of 8 hours per night for one week in each lab, and
iv) unlabeled traffic, generated when 36 participants used the IoT devices in a studio at their leisure during the data collection period.
Data labels include the name of the device, where it was used (the US or the UK), when and for how long it was used, and whether or not a VPN was used.
FIGURE 6. Taxonomy of feature extraction approaches. The approaches are classified according to i) the use of header or payload packet level features, ii) the stream definition, iii) the type of used stream level features (volume, protocol, time, or periodicity), iv) the use of automatic feature extraction (DL based), and v) the use of dimensionality reduction. Percentages show how often each approach is used in the reviewed papers. This highlights the trends discussed in Sec. V.
9) IoT-23 DATASET [40]
IoT-23 is a dataset containing benign and malicious IoT network activity, captured at the Czech Technical University. It contains 20 pcap files from infected IoT devices, labeled with the malware that infected them. It also contains 3 pcap files of benign network traffic generated by 3 IoT devices: a smart lamp, a voice assistant, and a smart door lock. The packet captures are labeled with the device that generated the traffic. As done in [43], the legitimate traffic can be used for IoT device classification.
10) HOW VALUABLE ARE PUBLIC DATASETS?
Public datasets make it possible to compare different solutions. Unfortunately, public datasets for IoT device classification are scarce (only 5 of the surveyed papers shared their datasets publicly) and lack diversity. Most provide idle and interaction traffic captured at the gateway, although this is not the most realistic scenario. As a result, researchers must collect their own data when examining new scenarios. For instance, Yu et al. [44] identify IoT devices by passively receiving broadcast and multicast packets, and had to collect their own data from different WiFi networks. As shown in Fig. 4, the most used datasets are UNSW (30%), IoTSentinel (15%), and YourThings (6%). In conclusion, additional datasets exploring new classification scenarios should be released, and more diversified IoT traffic needs to be collected, in order to boost research on IoT device classification.
V. FEATURE EXTRACTION METHODOLOGIES
This section describes feature extraction methodologies. First, we discuss packet-level feature extraction, examining and comparing the most commonly used header and payload features. Second, we analyze stream-level feature extraction. Third, we explore deep learning-based automatic feature extraction. Fourth, we provide a list of open-source feature extraction tools. Finally, we highlight feature dimensionality reduction approaches. Fig. 6 gives a taxonomy summarizing the approaches and trends.
Feature extraction is defined in [45] as ‘the process of defining a set of features (. . .) which will most efficiently or meaningfully represent the information that is important for analysis and classification.’ In our case, the feature extraction step consists of describing the network traffic in the way that retrieves the most information about the device.
In the majority of the examined articles, significant work has been dedicated to feature extraction. Existing approaches are diverse and heterogeneous, and the objective of this section is to summarize them in a logical and consistent manner.
Network traffic is the data flowing over a network. It is divided into packets, which are delivered over the network before being reassembled by the receiving computer or device. Traffic can be described either by individual packets or by streams of packets, also called flows (see Fig. 7).
These two approaches are known as packet-level and flow-level feature extraction methods, respectively. The following sections present the approaches in each category.
A. APPROACHES TO PACKET-LEVEL FEATURE EXTRACTION
These approaches describe each packet individually. A packet consists of a header and a payload. The header contains protocol information for a given layer, whereas the payload contains the data.
1) THE MOST IMPORTANT PACKET HEADER FEATURES
Extracting features from a packet header is straightforward and involves little overhead, since one just needs to parse the packet's header fields.
Depending on the layer and protocol, several fields can be present in the packet header. For example, the IPv4 header contains essential routing and delivery information and consists of 13 fields, including version, header length, type of service, total length, time to live, and protocol. Relying on source and destination IP addresses and ports for classification is not recommended due to potential spoofing issues, as mentioned in Sec. I.
FIGURE 7. The main methods for feature extraction: packet-level and stream-level. For stream-level approaches, three definitions are proposed for the stream.
The most important header features include i) the packet length, which is widely used for IoT device classification [46], and ii) the TCP window size. The window size is very useful for distinguishing between IoT and non-IoT devices, as it depends on the memory and processing speed of the device [47]. Small constrained devices, like sensors, have small window sizes, while more powerful devices, like video cameras and home assistants, have variable and larger window sizes [47].
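To make these header features concrete, the sketch below parses a raw IPv4 packet (without a link-layer header) using Python's standard `struct` module. It returns the version, header length, total length (the packet length), TTL, and protocol, plus the window size for TCP. It assumes a well-formed, unfragmented packet and performs no checksum validation. In practice, a packet-parsing library would typically be used.

```python
import struct


def ipv4_tcp_header_features(packet: bytes) -> dict:
    """Extract header features from a raw IPv4 packet (no link-layer header).

    Assumes a well-formed packet; no checksum or length validation is done.
    """
    # First 10 bytes: version/IHL, type of service, total length,
    # identification, flags/fragment offset, TTL, protocol.
    version_ihl, _tos, total_length, _ident, _frag, ttl, protocol = struct.unpack(
        "!BBHHHBB", packet[:10]
    )
    ihl = (version_ihl & 0x0F) * 4  # IP header length in bytes
    features = {
        "version": version_ihl >> 4,
        "header_length": ihl,
        "total_length": total_length,  # the "packet length" feature
        "ttl": ttl,
        "protocol": protocol,
        "tcp_window": None,
    }
    if protocol == 6:  # TCP
        # The 16-bit window field sits at byte offset 14 of the TCP header.
        (window,) = struct.unpack("!H", packet[ihl + 14 : ihl + 16])
        features["tcp_window"] = window
    return features


# Synthetic 40-byte TCP SYN: 20-byte IPv4 header + 20-byte TCP header.
ip_header = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
    bytes([192, 168, 1, 10]), bytes([203, 0, 113, 5]),
)
tcp_header = struct.pack("!HHIIBBHHH", 5000, 443, 0, 0, 0x50, 0x02, 1024, 0, 0)
print(ipv4_tcp_header_features(ip_header + tcp_header))
```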
2) THE MOST IMPORTANT PAYLOAD FEATURES
Typically, the payload carries the header and payload of the upper layer, which in our case is the application payload. It may contain textual information indicating the device's name, location, manufacturer, type, operating system, services, etc.
The length of the payload carried in a TCP segment reflects the length of the messages sent by a given device, which is device-specific [47]. The entropy of the payload has also been used as a discriminative feature [47], [48]. In [49], the distribution of payload bytes per flow is used for IoT device classification. However, encryption may make feature extraction from the payload impossible.
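Payload entropy can be sketched as the Shannon entropy of the byte-value distribution, measured in bits per byte. This is one common definition. The exact formulation used in [47], [48] may differ, for example in normalization.

```python
import math
from collections import Counter


def payload_entropy(payload: bytes) -> float:
    """Shannon entropy of a payload's byte values, in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)  # iterating over bytes yields ints 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


print(payload_entropy(b"GET /status HTTP/1.1"))  # text: relatively low
print(payload_entropy(bytes(range(256))))        # uniform bytes: 8.0
```

Encrypted or compressed payloads tend to approach 8 bits per byte. This is one reason entropy alone may not separate content types once traffic is encrypted.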
Note that processing each packet separately for feature extraction is time-consuming and computationally expensive, requiring large storage and processing resources. For example, a Google Chromecast generates 2,459,538 packets per day, compared to only 11,877 traffic flows [32]. Extracting features from packets is thus more expensive than extracting them from flows. Unsurprisingly, most research concentrates on flow-level features (81% of reviewed papers).
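The Chromecast figures above give the average number of packets per flow. This is a rough indication of how many fewer records a flow-level approach processes, assuming one feature vector per packet versus one per flow:

```latex
\frac{2{,}459{,}538~\text{packets/day}}{11{,}877~\text{flows/day}} \approx 207~\text{packets per flow}
```

For this device, flow-level extraction therefore handles roughly 200 times fewer records.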
B. STREAM-LEVEL FEATURE EXTRACTION METHODS
In this section, we discuss the different stream definitions, investigate and categorize the most important features, and examine the approaches to calculating them.
1) STREAM DEFINITION
Features can be extracted from a set of packets known as a ‘‘stream.’’ We have identified three main approaches to defining a stream:
i) a stream is a set of N consecutive packets, ordered by arrival time;
ii) a stream is a set of packets exchanged within a time window Δ; and
iii) a stream is a connection between a source and a destination, where packets are sent in both directions in a certain order.
More information on the approaches using each definition is presented below.
a: A STREAM AS A FINITE SEQUENCE OF N PACKETS
In this category, a fixed number N of consecutive packets sent and received by a single IoT device is used to construct a ‘‘signature,’’ also called a ‘‘fingerprint,’’ of the IoT device. 33% of surveyed papers use this definition. It is especially common among approaches that leverage setup traffic (cf. Sec. IV-B1) for device classification, since they rely on the first packets a device sends when connecting to the network. For example, in [14] and [50], the authors use the first 12 packets to identify an IoT device, and in [51], 30 packets are used. The authors of [52] extract features from a sequence of 20-21 packets.
Shahid et al. [9] consider N consecutive packets, where N varies between 2 and 10. 24% of surveyed papers use this definition.
Note that determining the optimal flow size N is challenging. Small flows allow for quick classification but may not be enough to characterize the device, whereas large flows can be time- and memory-consuming to analyze. Moreover, the appropriate value of N may vary from device to device, since IoT devices generate different quantities of data. A small number of packets may be enough to identify certain device types, while a greater number may be required for others. This is problematic because machine learning algorithms require a fixed input size. The authors of [14] added padding for devices that emit fewer packets than the required size. Furthermore, capturing the same number of packets for all devices may take a variable amount of time, as IoT objects do not generate traffic at the same rate. For example, packets generated by a camera can be captured in seconds, whereas capturing the same number of packets from a motion sensor takes longer [46]. This makes the data collection process complicated and time-consuming.
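A fixed-size fingerprint built from the first N packets can be sketched as follows, with padding for devices that emit fewer packets, as done in [14]. The per-packet feature vectors and the zero padding value are assumptions for illustration; [14] defines its own per-packet features.

```python
def first_n_fingerprint(packet_features, n, pad_value=0):
    """Build an n-row fingerprint from the first n packets of a device.

    packet_features: list of equal-length per-packet feature lists, in
    arrival order. Devices that emit fewer than n packets are padded with
    rows of pad_value, so every fingerprint has exactly n rows.
    """
    if not packet_features:
        raise ValueError("need at least one packet")
    width = len(packet_features[0])
    rows = [list(f) for f in packet_features[:n]]
    rows += [[pad_value] * width for _ in range(n - len(rows))]
    return rows


# Hypothetical per-packet features: (packet length, IP protocol number).
setup_packets = [[60, 6], [1500, 6], [90, 17]]
fingerprint = first_n_fingerprint(setup_packets, n=12)
print(len(fingerprint), fingerprint[:4])
```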
b: A STREAM AS A SET OF PACKETS EXCHANGED WITHIN A TIME WINDOW Δ
This consists of subdividing the captured traffic into distinct time windows of an appropriate duration Δ. For example, Fan et al. [53] extract the features every 30 minutes. Pinheiro et al. [46] use a window of one second to enable real-time device classification, and Hafeez et al. [16] use a 10-second time window. Le et al. [54] retrieve the DNS names requested by a device over time periods ranging from 10 minutes to 24 hours, and found that performance decreases as Δ decreases.
Note that, as for the previous category, the choice of the time window size is important and challenging. Long time windows give richer information about the device but risk increasing the classification delay and consuming more memory to store traffic attributes [46]. Moreover, they may produce very similar samples with little feature variation. They also yield fewer data samples for training and testing, which can make them unsuitable for deep learning-based classification approaches. Few and redundant samples may also introduce bias and overfitting. On the other hand, small time windows may allow real-time classification but may not contain enough information to reflect the characteristics of the device's behavior. Bai et al. [55] showed that a small segmentation window interval degrades the classification results compared to a larger one. In addition, setting the same interval for all devices can be inappropriate, as devices generate different quantities of traffic. For example, a motion sensor generates close to 140 packets per minute at most, whereas a camera generates up to 1900 packets per minute on average [55].
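The sketch below subdivides a capture into consecutive windows of duration Δ. It makes three simplifying assumptions: packets are `(timestamp, length)` pairs with timestamps in seconds, windows are aligned to the first packet, and empty windows are simply omitted.

```python
from collections import defaultdict


def split_into_windows(packets, delta):
    """Group (timestamp, length) packets into consecutive windows of delta seconds.

    Returns {window_index: [packet lengths]}; windows with no packets are absent.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    if not packets:
        return {}
    t0 = min(ts for ts, _ in packets)
    windows = defaultdict(list)
    for ts, length in packets:
        windows[int((ts - t0) // delta)].append(length)
    return dict(windows)


capture = [(0.0, 60), (0.5, 1500), (10.2, 80), (25.0, 90)]
print(split_into_windows(capture, delta=10))      # {0: [60, 1500], 1: [80], 2: [90]}
print(len(split_into_windows(capture, delta=1)))  # 3 non-empty 1-second windows
```

Each window yields one sample. A smaller Δ therefore produces more samples, but each carries less information, which is the trade-off discussed above.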
c: A STREAM AS A SET OF PACKETS BELONGING TO A CONNECTION
Because of the abovementioned issues, the largest share of reviewed papers (50%, see Fig. 6) use this definition of a stream. It is based on the RFC 2722 [56] traffic flow definition, which states that a flow is ‘an artificial logical equivalent to a call or connection.’ Thus, a flow is the ordered sequence of all packets sent and/or received from a particular source to a particular unicast, anycast, or multicast destination using specific ports and transport protocols.
More concretely, a flow can be defined as a set of packets sharing at least two of the following attributes: i) source IP address, ii) source port number, iii) destination IP address, iv) destination port number, v) protocol, and vi) service type.
Depending on the criteria used to define the flow, there are several variants. For Marchal et al. [20], a flow is a sequence of network packets sent by a given IoT device using a specified communication protocol. Sun et al. [49] describe a flow by the 5-tuple of source and destination IP addresses, source and destination port numbers, and protocol. Meidan et al. [25] also specify the service type (6-tuple).
Note that a collection of flows can also be used to describe the traffic. The authors of [49] combine features from several flows to provide a high-level characterization of device activities. Meidan et al. [57] demonstrated that using a set of consecutive flows gives better classification results, since it contains more information about the traffic. The different stream definitions are illustrated in the left part of Fig. 7.
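The sketch below groups packets into flows keyed by the 5-tuple, the definition used by Sun et al. [49]. Packets are assumed to be dictionaries with hypothetical `src_ip`, `src_port`, `dst_ip`, `dst_port`, and `proto` keys. Unlike real flow meters, this sketch applies no idle or active timeouts, so every packet sharing a 5-tuple lands in the same flow. Adding the service type to the key would give the 6-tuple of Meidan et al. [25].

```python
from collections import defaultdict


def group_flows(packets, bidirectional=True):
    """Group packets into flows keyed by the 5-tuple.

    With bidirectional=True, both directions of a connection share one key,
    obtained by ordering the two (ip, port) endpoints.
    """
    flows = defaultdict(list)
    for pkt in packets:
        a = (pkt["src_ip"], pkt["src_port"])
        b = (pkt["dst_ip"], pkt["dst_port"])
        if bidirectional and b < a:
            a, b = b, a  # canonical endpoint order for both directions
        flows[(a, b, pkt["proto"])].append(pkt)
    return dict(flows)


packets = [
    dict(src_ip="192.168.1.10", src_port=5000, dst_ip="203.0.113.5", dst_port=443, proto="TCP"),
    dict(src_ip="203.0.113.5", src_port=443, dst_ip="192.168.1.10", dst_port=5000, proto="TCP"),
    dict(src_ip="192.168.1.10", src_port=5353, dst_ip="224.0.0.251", dst_port=5353, proto="UDP"),
]
print(len(group_flows(packets)), len(group_flows(packets, bidirectional=False)))  # 2 3
```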
2) IMPORTANT STREAM-LEVEL FEATURES
In this section, we review the various stream-level features that are widely used for IoT device classification. We divide them into four categories:
i) volume features, which measure the volumetric properties of the stream;
ii) protocol features, which describe the protocols in the stream;
iii) time-related features, which measure the temporal aspects of the stream; and
iv) periodicity features, which reflect the stream's periodicity.
a: VOLUME FEATURES
Examples include packet length statistics, the number of packets or bytes in the entire flow or in a specific direction (incoming or outgoing traffic), the flow rate, etc. For instance, Pinheiro et al. [46] identify devices based on statistics of the packet length and the number of bytes generated by each device. Sivanathan et al. [58] use the average packet size and average rate per flow as two principal attributes. Volume features are very important and widely used (in 60% of reviewed papers).
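A few of the volume features listed above can be computed per flow as follows. The `(timestamp, length, direction)` packet representation and the feature names are assumptions for illustration; each cited paper defines its own feature set.

```python
import statistics


def volume_features(packets):
    """Volume features for one flow.

    packets: non-empty list of (timestamp_s, length_bytes, direction) tuples,
    where direction is 'out' (device to server) or 'in' (server to device).
    """
    lengths = [size for _, size, _ in packets]
    timestamps = [ts for ts, _, _ in packets]
    duration = max(timestamps) - min(timestamps)
    total_bytes = sum(lengths)
    return {
        "n_packets": len(packets),
        "n_bytes": total_bytes,
        "bytes_out": sum(size for _, size, d in packets if d == "out"),
        "bytes_in": sum(size for _, size, d in packets if d == "in"),
        "mean_pkt_len": statistics.mean(lengths),
        "std_pkt_len": statistics.pstdev(lengths),
        # Flow rate in bytes/s; undefined when all packets share one timestamp.
        "byte_rate": total_bytes / duration if duration > 0 else None,
    }


flow = [(0.0, 100, "out"), (1.0, 300, "in"), (2.0, 200, "out")]
print(volume_features(flow)["byte_rate"])  # 600 bytes over 2 s -> 300.0
```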
b: PROTOCOL FEATURES
Features can be extracted from traffic covering all protocols and layers, or from selected protocols only. In addition to the widely studied layer-2 to layer-4 protocols, the following application layer protocols have been examined:
• DNS features: The Domain Name System (DNS) is an essential Internet service, and is therefore important to IoT devices communicating with remote Cloud services. DNS features differentiate IoT from non-IoT devices, because IoT devices connect to a limited set of endpoints, mainly their provider's servers. This behavior can be captured by the number of unique DNS queries, as IoT devices issue fewer unique DNS queries than non-IoT devices [59], [60]. Moreover, devices can be identified by the domain names they communicate with [27].
The most frequently used DNS features are: i) the number of unique DNS queries, ii) the number of unique domain names, iii) the most frequently queried domain names, iv) the number of DNS packets, and v) the number of DNS errors. The papers [27], [53], [54], [59], [60], [61], and [62] exploited these features.
• TLS features: TLS/SSL is used by many IoT devices to secure Internet communication with servers. The TLS protocol consists of two layers: the handshake and record protocols. The handshake layer is the most interesting, as it comprises ‘‘text-in-the-clear’’ messages exchanged between devices and servers to create a secure channel and negotiate ciphers and encryption keys. Fan et al. [53] use the number of TLS handshakes as a feature. Sun et al. [49] analyze the unencrypted data of the TLS handshake. They exploit the plaintext data in the ClientHello, ServerHello, and Certificate messages to derive the following features: the list of proposed ciphersuites, the list of announced extensions, and the length of the public key. The authors observed less fluctuation in the distribution of ciphersuites and TLS extensions for IoT devices than for non-IoT devices, because IoT devices advertise a limited and fixed number of ciphersuites. Thangavelu et al. [61] used the following TLS features: the minimum, maximum, and mean of the TLS packet length, the flow duration, and the number of TCP keep-alive probes used in the TLS session. Valdez et al. [63] derive features from TLS session initialization messages (ClientHello and ServerHello), including negotiated ciphers, proposed cipher suites, server name, and destination endpoint.
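The sketch below derives the DNS features listed above from parsed DNS responses. The `(query_name, query_type, rcode)` tuples are a hypothetical representation of already-decoded DNS messages, where DNS response code 0 is NOERROR. Here, a "unique query" is taken to be a distinct (name, type) pair, which is one possible reading of that feature.

```python
from collections import Counter


def dns_features(responses, top_k=3):
    """DNS features for one device from its parsed DNS responses.

    responses: list of (query_name, query_type, rcode) tuples; rcode 0 is
    NOERROR and any other value is counted as an error.
    """
    # Normalize names: case-insensitive, ignore the trailing root dot.
    names = [name.lower().rstrip(".") for name, _, _ in responses]
    name_counts = Counter(names)
    unique_queries = {(name, qtype) for name, (_, qtype, _) in zip(names, responses)}
    return {
        "n_dns_packets": len(responses),
        "n_unique_queries": len(unique_queries),
        "n_unique_domains": len(name_counts),
        "top_domains": [name for name, _ in name_counts.most_common(top_k)],
        "n_dns_errors": sum(1 for _, _, rcode in responses if rcode != 0),
    }


responses = [
    ("api.vendor.example.", "A", 0),
    ("api.vendor.example", "AAAA", 0),
    ("time.vendor.example", "A", 0),
    ("missing.vendor.example", "A", 3),  # rcode 3 = NXDOMAIN
]
print(dns_features(responses, top_k=1))
```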
c: TIME-RELATED FEATURES
These features measure the temporal aspects of the flow. Examples include the inter-packet arrival time (IAT), i.e., the time interval between two consecutive received packets, the time a flow was active before becoming inactive, the time the last packet was switched [25], and the flow duration. For instance, in [27] and [59], the authors calculate the sleep time of a device, the average time interval between two consecutive DNS requests, and the NTP interval. Thangavelu et al. [61] consider the flow activity duration. Sun et al. [49] calculate the idle time, as it reflects the device's activity frequency.
It is worth noting that the IAT is one of the most useful time-related features, as it varies by device depending on the hardware and software configuration [64]. It is therefore widely used in the literature [9], [16], [49], [51], [53], [65], [66], [67], [68]. In particular, the classification of ZigBee, Z-Wave, and Bluetooth IoT devices is often based exclusively on IAT [65], [66].
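As an example of time-related features, the IAT statistics can be computed from packet timestamps as follows. The choice of summary statistics (minimum, maximum, mean, and population standard deviation) is an assumption; the cited works use various combinations.

```python
import statistics


def iat_features(timestamps):
    """Inter-arrival time (IAT) statistics for a device or flow.

    timestamps: packet arrival times in seconds (sorted internally).
    """
    ts = sorted(timestamps)
    iats = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not iats:
        raise ValueError("need at least two packets")
    return {
        "iat_min": min(iats),
        "iat_max": max(iats),
        "iat_mean": statistics.mean(iats),
        "iat_std": statistics.pstdev(iats),
    }


arrivals = [0.0, 0.5, 1.5, 3.5]  # IATs: 0.5, 1.0, 2.0 seconds
print(iat_features(arrivals)["iat_max"])  # 2.0
```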
d: PERIODICITY FEATURES
IoT devices generate background communications that present relatively constant and periodic patterns. Some researchers [20], [69] extract features from such periodic flows. To do this, they first discretize the flow into a binary time series signal indicating, for each second, whether packets are present in the traffic. Then, they use the Discrete Fourier Transform to identify the distinct periods of the signal. Once these are identified, statistical features are used to describe the periods in detail. Examples include the number of periods, the maximum and minimum period values, the averages of the occurrence of periods at the minimum period value, and the accuracy and stability of the inferred periods [20]. Note that approaches extracting periodicity features often use the time-window-based stream definition (Fig. 7). Only two papers, [20] and [69], use periodicity features.
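The DFT-based period detection described above can be sketched with NumPy. This is a simplified illustration, not the method of [20] or [69]. It returns a single dominant period. Its rule for handling harmonics, which keeps the lowest-frequency bin within `rel_tol` of the spectral peak, is our own assumption.

```python
import numpy as np


def dominant_period(timestamps, duration, rel_tol=0.9):
    """Estimate the dominant communication period (in seconds) of a flow.

    1. Discretize the flow into a binary signal: x[t] = 1 if at least one
       packet arrives during second t of the observation window, else 0.
    2. Compute the magnitude spectrum with the real-input DFT.
    3. Among bins whose magnitude is at least rel_tol * peak, take the lowest
       frequency bin k (harmonics of an impulse train are equally strong)
       and convert it to a period of duration / k seconds.
    """
    signal = np.zeros(duration)
    for ts in timestamps:
        if 0 <= ts < duration:
            signal[int(ts)] = 1.0
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    spectrum[0] = 0.0  # ignore the DC (mean) component
    if spectrum.max() == 0.0:
        return None  # constant signal: no periodic pattern
    strong_bins = np.flatnonzero(spectrum >= rel_tol * spectrum.max())
    return duration / int(strong_bins.min())


heartbeat = range(0, 600, 10)                   # one packet every 10 s
burst = [t for t in range(600) if t % 20 < 5]   # 5-s bursts every 20 s
print(dominant_period(heartbeat, 600), dominant_period(burst, 600))  # 10.0 20.0
```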
3) HOW ARE STREAM-LEVEL FEATURES CALCULATED?
We identified two approaches to calculating stream-level features: concatenation and statistics.
a: CONCATENATION
Stream-level features can be calculated by concatenating individual packet features. The authors of [51] define an n × 7 feature matrix with 7 packet header features per packet (n packets). Similarly, Wan et al. [68] describe a stream of p packets defining a device signature using p vector attributes.
In general, only approaches defining the stream as a set of N packets (see Sec. V-B1a) use this method, because concatenating a small number of packets is unlikely to create large signatures.
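Concatenation can be sketched as stacking per-packet header features into an n × 7 matrix, in the spirit of [51]. The seven field names below are hypothetical. Ports and addresses are deliberately left out, given the spoofing concerns mentioned earlier, and missing fields default to 0.

```python
# Hypothetical choice of 7 per-packet header fields.
HEADER_FIELDS = (
    "length", "ttl", "protocol", "ip_header_length",
    "tcp_flags", "tcp_window", "payload_length",
)


def concat_signature(packets, fields=HEADER_FIELDS):
    """Stack per-packet header features into an n x len(fields) matrix."""
    return [[pkt.get(field, 0) for field in fields] for pkt in packets]


def flatten(matrix):
    """Row-major flattening, for models that expect a 1-D input vector."""
    return [value for row in matrix for value in row]


packets = [
    {"length": 60, "ttl": 64, "protocol": 6, "ip_header_length": 20,
     "tcp_flags": 2, "tcp_window": 1024},
    {"length": 52, "ttl": 64, "protocol": 17, "ip_header_length": 20,
     "payload_length": 24},
]
signature = concat_signature(packets)
print(len(signature), len(signature[0]), len(flatten(signature)))  # 2 7 14
```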
b: STATISTICS
The second approach is to perform statistical calculations on packet-level features. Depending on whether the measured feature is numerical (e.g., TTL) or categorical (e.g., protocol type), different statistics can be generated, as described below.
For numerical features, researchers often calculate:
• The traditional minimum, maximum, mean, sum, standard deviation, and variance, which are widely used in the literature.
• The entropy, which measures the degree of disorder of a feature. It describes the nature of the data without focusing on the data itself. For example, the payload entropy indicates the information content of a packet: packets containing text data have lower payload entropy than packets carrying audio data [47]. The authors of [48] and [47] categorized IoT devices by payload entropy. Fan et al. [53] calculate the entropy of top DNS requests and packet lengths.
• The skewness [70] and kurtosis [71], which measure the asymmetry and the ‘‘tailedness’’ of the probability distribution, respectively. In [55], the authors use packet length skewness to characterize the distribution of packet lengths in a flow.
• The augmented Dickey-Fuller (ADF) test [72], which determines whether a given time series is stationary. It was used in [53] to capture how some devices send large packets in a short period of time, causing the packet length to shift substantially.
• The spectral density, which characterizes a stationary time series in the frequency domain. The authors of [73] use spectral analysis of packet length to record device communication patterns, differentiate IoT and non-IoT traffic, and determine the device class generating the packet flows.
Note that when the stream is defined by a time window, finer-grained statistics can be generated by computing the first, second, and third quartiles of numerical packet features, as [53] does for the ‘‘packet length.’’
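Several of the numerical statistics above, including skewness, kurtosis, and quartiles, can be computed with Python's standard `statistics` module. The sketch uses population (biased) moment estimators and the module's default quartile method. Other tools, such as SciPy, may use different conventions, so their values can differ slightly. The ADF test and spectral density are omitted, since they are typically computed with dedicated libraries.

```python
import statistics


def numeric_stats(values):
    """Summary statistics of a numerical packet feature (e.g., packet lengths).

    Skewness and kurtosis use population central moments; kurtosis is the
    plain (non-excess) value, equal to 3 for a normal distribution.
    Requires at least two values (for the quartiles).
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values), "max": max(values), "mean": mean, "sum": sum(values),
        "var": m2, "std": m2 ** 0.5,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
        "q1": q1, "q2": q2, "q3": q3,
    }


lengths = [60, 60, 60, 60, 1500]  # mostly small packets, one large one
stats = numeric_stats(lengths)
print(round(stats["skewness"], 2), stats["q2"])  # 1.5 60.0
```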
For categorical features, researchers often:
• List or count feature values. Huang et al. [51] use a binary vector encoding whether specific protocols exist in the traffic flow. In [49], [69], and [55], the authors count the types of protocols involved in the device's communication traffic.
• Determine the dominant values or their proportion. For example, Msadek et al. [67] identify the set of dominant (most used) protocols. Zhang et al. [69] count the proportion of TCP/UDP/ARP in the device communication flow.
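Both strategies for categorical features can be sketched for protocol labels: a binary presence vector, and counts, proportions, or dominant values. The protocol vocabulary is a hypothetical choice that fixes the vector layout, and the protocol labels are assumed to have been extracted beforehand.

```python
from collections import Counter

# Hypothetical protocol vocabulary that fixes the layout of the binary vector.
PROTOCOLS = ("ARP", "DNS", "HTTP", "MDNS", "NTP", "TCP", "TLS", "UDP")


def categorical_features(protocols_seen, vocabulary=PROTOCOLS, top_k=2):
    """Features of a categorical attribute (here, the protocol of each packet)."""
    counts = Counter(protocols_seen)
    total = sum(counts.values())
    return {
        # 1 if the protocol appears at least once in the stream, else 0.
        "presence": [1 if p in counts else 0 for p in vocabulary],
        "n_distinct": len(counts),
        # Share of packets per protocol (only protocols actually seen).
        "proportion": {p: counts[p] / total for p in vocabulary if p in counts},
        # Most frequent protocols, i.e., the dominant values.
        "dominant": [p for p, _ in counts.most_common(top_k)],
    }


seen = ["TCP", "TCP", "UDP", "ARP", "TCP", "DNS"]
features = categorical_features(seen)
print(features["presence"], features["proportion"]["TCP"])  # [1, 1, 0, 0, 0, 1, 0, 1] 0.5
```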
C. WHAT ABOUT AUTOMATIC FEATURE EXTRACTION?
While traditional ML algorithms require costly handcrafted features, deep learning approaches can automatically extract and learn suitable features for classification directly from raw data. As DL requires standardized input data of the same type and size for all samples, researchers first convert pcaps into a suitable model input. To do so, Greis et al. [74] take the packet captures (in pcap format) collected during the setup phase and transform the first 784 bytes of traffic into a 28 × 28 grey-scale image. Each pixel represents a grey value between 0 (black) and 255 (white). When a setup phase has fewer than 784 bytes, the remaining pixel values are set to 0 (black). Similarly, Kotak et al. [75] use the TCP payload to create greyscale images of the device's communication pattern.
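The byte-to-image conversion described for [74] can be sketched as below. It assumes `data` holds the raw bytes of the setup-phase traffic. Whether pcap file and record headers are included is not specified here. The result is a nested list of pixel values that could be passed to an image or tensor library.

```python
def bytes_to_image(data: bytes, side: int = 28):
    """Map the first side * side bytes of a capture to a grey-scale image.

    Each byte becomes one pixel value between 0 (black) and 255 (white).
    Captures shorter than side * side bytes (784 by default) are padded with
    0 (black) pixels. Returns a list of `side` rows of `side` pixel values.
    """
    size = side * side
    buffer = bytes(data[:size]).ljust(size, b"\x00")
    return [list(buffer[row * side:(row + 1) * side]) for row in range(side)]


image = bytes_to_image(b"\x16\x03\x01\x00\xa5")  # a very short capture
print(len(image), len(image[0]), image[0][:6])  # 28 28 [22, 3, 1, 0, 165, 0]
```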
Yin et al. [76] rely on traffic vectorization. They use the first 10 packets to characterize a flow, a number chosen because the average number of packets in most IoT flows is 10. The first 250 bytes of each packet are concatenated, so that a flow is described using 2,500 bytes of data (10 packets × 250 bytes). Streams with fewer than 10 packets are padded.
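The traffic vectorization of Yin et al. [76] can be sketched as follows. Zero padding, and truncating or padding each packet to exactly 250 bytes, are assumptions about details the description above leaves open.

```python
def vectorize_flow(packets, n_packets=10, bytes_per_packet=250):
    """Concatenate the first bytes_per_packet bytes of the first n_packets packets.

    packets: list of raw packet byte strings, in arrival order. Each packet is
    truncated or zero-padded to bytes_per_packet bytes, and flows with fewer
    than n_packets packets are zero-padded, so every flow maps to exactly
    n_packets * bytes_per_packet values (2,500 with the defaults).
    """
    vector = bytearray()
    for pkt in packets[:n_packets]:
        vector += bytes(pkt[:bytes_per_packet]).ljust(bytes_per_packet, b"\x00")
    missing = n_packets - min(len(packets), n_packets)
    vector += bytes(missing * bytes_per_packet)  # padding for short flows
    return list(vector)


flow = [b"\x01" * 300, b"\x02\x03"]  # two packets: one long, one short
vector = vectorize_flow(flow)
print(len(vector), vector[248:253])  # 2500 [1, 1, 2, 3, 0]
```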
Despite the benefits of these approaches, which simplify and automate feature extraction, transforming data into another format (image, vector, etc.) can lead to a loss of semantic information. Moreover, this strategy does not take into account expert knowledge, which can help find the most important features. Only a minority of the reviewed papers (12%: [74], [75], [76]) explored this solution.
D. OPEN-SOURCE FEATURE EXTRACTION TOOLS
This section describes the existing feature extraction tools found in the literature. The input of a feature extraction tool is network traffic in pcap format collected by a packet capture tool (e.g., tcpdump). The output is a set of text-based files (often CSV) containing feature vectors, one per observation.
CICFlowmeter [77] is an open-source feature extractor that produces more than 80 volume- and time-related features per TCP flow. The authors use two methods to measure these attributes. In the first, they measure time-related features over the full TCP flow, such as the time between packets or the time the flow remains active. In the second, they fix a time interval (e.g., 1 second) and measure volume-related attributes within it (e.g., bytes per second or packets per second).
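To make the two measurement modes concrete, the sketch below computes a handful of flow features in the spirit of CICFlowmeter: duration and inter-arrival times over the whole flow, and byte and packet counts per fixed 1-second window. It is not CICFlowmeter's code; the input format (a non-empty, time-sorted list of `(timestamp, size)` tuples) and the feature names are assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean


def flow_features(packets: list[tuple[float, int]], window: float = 1.0) -> dict:
    """Compute illustrative time- and volume-related features for one flow.

    `packets` is a non-empty, time-sorted list of (timestamp_s, size_bytes).
    Time-related features cover the whole flow; volume-related features are
    counted per fixed-length window (1 second by default).
    """
    times = [t for t, _ in packets]
    iats = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    start = times[0]
    n_windows = int((times[-1] - start) // window) + 1
    bytes_per_win: Counter = Counter()
    pkts_per_win: Counter = Counter()
    for t, size in packets:
        w = int((t - start) // window)  # index of the window containing t
        bytes_per_win[w] += size
        pkts_per_win[w] += 1
    return {
        "duration": times[-1] - start,
        "iat_mean": mean(iats) if iats else 0.0,
        "iat_max": max(iats, default=0.0),
        "bytes_per_window": [bytes_per_win[w] for w in range(n_windows)],
        "packets_per_window": [pkts_per_win[w] for w in range(n_windows)],
    }


feats = flow_features([(0.0, 60), (0.4, 1500), (1.2, 60), (2.9, 40)])
print(feats["bytes_per_window"])  # [1560, 60, 40]
```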
Bekerman et al. [78] present a feature extraction tool, implemented on top of Wireshark [22], that extracts 972 behavioral features across different protocols and network layers. The features describe observations at various granularities, namely i) a conversation window, ii) a group of sessions, iii) a session (e.g., a TCP session), and iv) a transaction, i.e., a request-response interaction between a client and a server.
Joy [79] extracts features from live network flows with a focus on the application layer. The main features are: IP packet lengths and arrival times; the sequence of TLS record lengths and arrival times; other unencrypted TLS data, such as the lists of offered and selected cipher suites; DNS names, addresses, and TTLs; and HTTP header elements.
E. FEATURE DIMENSIONALITY REDUCTION FOR BETTER CLASSIFICATION
Feature dimensionality reduction can improve classification accuracy and reduce computational cost. It is a pre-processing phase that identifies relevant features and removes irrelevant or redundant ones. Feature dimensionality reduction is not widely used in IoT device classification: only 30% of the reviewed papers apply this step. This is because most publications rely on expert knowledge to derive a small, accurate set of features, which makes feature reduction unnecessary. By contrast, articles using feature extraction tools (see Sec. V-D) generate a large number of features and then reduce them through dimensionality reduction.
VOLUME 10, 2022 97129
H. Jmila et al.: Survey of Smart Home IoT Device Classification Using ML-Based Network Traffic Analysis
FIGURE 8. Taxonomy of ML-based classification approaches. Percentages show how often each approach is used in the reviewed papers (the percentages do not always sum to 100 because some papers use algorithms from multiple categories).
Ghojogh et al. [80] review feature dimensionality reduction approaches and divide them into two groups: 1) feature extraction approaches, where features are projected onto a lower-dimensional space to obtain a new set of features, and 2) feature selection approaches, where the best subset of the original features is selected. Note that the term "feature extraction" is also used loosely in the literature to denote the process of describing observations by a vector of features (cf. Sec. V).
1) APPROACHES USING FEATURE EXTRACTION BASED DIMENSIONALITY REDUCTION
Thangavelu et al. [61] use a common feature extraction method called Principal Component Analysis (PCA). Fan et al. [53] use Convolutional Neural Network (CNN)-based dimensionality reduction. Similarly, Bao et al. [81] use autoencoders for dimensionality reduction. Autoencoders learn a mapping from high-dimensional observations to a lower-dimensional representation space such that the original observation can be reconstructed from the lower-dimensional representation [82]. Autoencoders are widely used for feature learning in general [80]. More broadly, representation learning [83] is a feature extraction method used to learn discriminative features automatically; it has not yet been explored for IoT device classification.
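For readers unfamiliar with PCA, the sketch below computes the first principal component by power iteration on the sample covariance matrix and projects observations onto it. It is written in pure Python to stay self-contained and is not the pipeline of [61]; in practice a library implementation such as scikit-learn's `PCA` would normally be used, and further components can be obtained by deflation.

```python
import random


def first_principal_component(X: list[list[float]], iters: int = 200, seed: int = 0):
    """Return (feature means, unit first principal component) of the rows of X.

    Uses power iteration on the sample covariance matrix, which converges when
    the largest eigenvalue is strictly greater than the second one. Assumes at
    least two observations and at least one non-constant feature.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # arbitrary non-zero start
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return means, v


def project(X, means, v):
    """Reduce each observation to one value: its coordinate along v."""
    return [sum((row[j] - means[j]) * v[j] for j in range(len(v))) for row in X]


X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # 2-D points on a line
means, pc1 = first_principal_component(X)
print(project(X, means, pc1))  # 1-D representation of the 4 observations
```

Here the two correlated features collapse to a single coordinate without losing information, which is the kind of redundancy PCA removes.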
2) APPROACHES USING FEATURE SELECTION BASED DIMENSIONALITY REDUCTION
According to Ghojogh et al. [80], there are two feature selection approaches: i) filter methods and ii) wrapper methods.
a: APPROACHES EMPLOYING FILTER METHODS
Such methods reduce the feature set by selecting the most discriminative features, independently of any classifier. The correlation criterion is one of the most widely used solutions: it computes the correlation between each feature and the label vector, and the features with the highest correlation (in absolute value) are selected. Sivanathan et al. [58] use Correlation-based Feature Subset (CFS) selection and Information Gain (IG). Similarly, Cvitic et al. [32] use CICFlowmeter for feature extraction (83 features) and then apply IG.
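The correlation criterion can be sketched as follows: each feature is scored by its absolute Pearson correlation with a numeric label vector, and the top-k features are kept. This assumes a binary (one-vs-rest) label encoded as 0/1, since correlating a feature with arbitrary multi-class label codes is not meaningful; CFS and IG, used in [58] and [32], are different criteria and are not shown. Function names are hypothetical.

```python
def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no linear information
    return sxy / (sxx * syy) ** 0.5


def select_by_correlation(X: list[list[float]], y: list[int], k: int):
    """Rank features by |correlation with y| and keep the k best (filter method)."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Toy data: feature 0 grows with the label, feature 1 is constant, feature 2 alternates.
X = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1]]
y = [0, 0, 1, 1]  # 1 = "camera", 0 = any other device (one-vs-rest)
selected, scores = select_by_correlation(X, y, k=1)
print(selected)  # [0]
```

Because the score ignores the downstream classifier, the same subset is reused whatever model is trained next, which is what distinguishes filters from wrappers.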
b: APPROACHES APPLYING WRAPPER METHODS
Such approaches select features based on the classifier's performance; thus, the selected set can vary from one classifier to another. For instance, in [84], the authors use a genetic-algorithm-based feature selection method. The genetic algorithm determines the smallest set of packet header features, across all network layers, that contributes significantly to the classification for a given classifier.
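Wrapper methods can be illustrated with greedy sequential forward selection, a simpler wrapper than the genetic algorithm of [84], substituted here for brevity. The `score` callback is hypothetical: it is assumed to train the chosen classifier on the given feature subset (including the empty subset, e.g., as a majority-class baseline) and return its validation accuracy.

```python
from typing import Callable, Sequence


def forward_selection(n_features: int,
                      score: Callable[[Sequence[int]], float],
                      min_gain: float = 1e-3) -> list[int]:
    """Greedy sequential forward selection, a simple wrapper method.

    At each step, the feature whose addition most improves the classifier's
    score is added; the search stops when no feature improves it by at least
    `min_gain`. Different classifiers (different `score` functions) can
    therefore yield different subsets.
    """
    selected: list[int] = []
    best = score(selected)  # baseline score with no features
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        trial = {j: score(selected + [j]) for j in candidates}
        j_best = max(trial, key=trial.get)
        if trial[j_best] - best < min_gain:
            break  # no remaining feature is worth adding
        selected.append(j_best)
        best = trial[j_best]
    return selected


# Toy score: features 0 and 2 are informative, 1 and 3 add nothing.
def toy_score(subset: Sequence[int]) -> float:
    return 0.5 + sum({0: 0.2, 2: 0.15}.get(j, 0.0) for j in subset)


print(forward_selection(4, toy_score))  # [0, 2]
```

Each call to `score` implies a full training run, which is why wrappers are usually more expensive than filter methods.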
VI. CLASSIFICATION
The aim of the classification step is to predict, for each traffic input represented by a feature vector $X = (x_1, \ldots, x_f)$, the class $c$ of the device that generated it. Different classification approaches have been explored in the literature. We classify them according to i) the number of classes (multi-class classifier or one-class classifier), ii) supervised or unsupervised approaches, and iii) shallow or deep learning algorithms. Fig. 8 illustrates the resulting taxonomy.
A. MULTI-CLASS VS ONE-CLASS CLASSIFIER
1) METHODS USING MULTI-CLASS CLASSIFIER
A single classifier is used for multi-class classification. The trained classification model outputs a vector of class membership probabilities $P^s = \{p^s_i\}_{1 \le i \le n}$, where $n$ is the number of device classes and $p^s_i$ denotes the likelihood that the inspected traffic sample $s$ comes from device class $c_i$. The traffic is labelled as originating from the device class with the highest probability. To capture unknown devices, a threshold parameter $t_r$ can be defined and fine-tuned using the validation dataset: if a probability $p^s_i$ exceeds the threshold ($p^s_i > t_r$), the traffic is classified as originating from device class $c_i$; otherwise, it is classified as unknown. A device can also be considered unknown if its feature vector matches more than one class with a low discriminative threshold (e.g., 0.5). This is the most popular method in the state of the art (90% of the reviewed papers).
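The rejection rule for unknown devices can be sketched as follows, assuming the classifier returns a dictionary mapping each device class to its probability. The threshold-tuning helper assumes that the validation set contains samples from devices held out of training, labeled `"unknown"`; both function names are hypothetical.

```python
def classify_with_rejection(probs: dict[str, float], t_r: float) -> str:
    """Return the most probable device class, or "unknown" if its
    probability does not exceed the threshold t_r."""
    best = max(probs, key=probs.get)
    return best if probs[best] > t_r else "unknown"


def tune_threshold(val_probs: list[dict[str, float]], val_labels: list[str],
                   candidates: list[float]) -> float:
    """Pick the candidate threshold with the best validation accuracy.

    Assumes validation samples from devices unseen in training are
    labeled "unknown"."""
    def accuracy(t_r: float) -> float:
        preds = [classify_with_rejection(p, t_r) for p in val_probs]
        return sum(p == y for p, y in zip(preds, val_labels)) / len(val_labels)
    return max(candidates, key=accuracy)


val_probs = [{"camera": 0.9, "plug": 0.1},
             {"camera": 0.55, "plug": 0.45},  # traffic from an unseen device
             {"camera": 0.2, "plug": 0.8}]
val_labels = ["camera", "unknown", "plug"]
t_r = tune_threshold(val_probs, val_labels, [0.5, 0.6, 0.7, 0.9])
print(t_r)  # 0.6
```

A threshold that is too low lets unseen devices through as known ones, while one that is too high rejects legitimate but less confident predictions.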
2) METHODS USING ONE-CLASS CLASSIFIER (A CLASSIFIER PER DEVICE)
A minority of the reviewed papers (14%) use this classification approach. In the following, we describe how this strategy
is employed in the literature. The dataset is split into numerous binary classification problems (each focusing on a single class against all other classes), and a binary classifier is trained for each device. Each classifier provides either i) a probability $p_i$ that the traffic was generated by device class $c_i$, or ii) a binary decision on whether the input matches the device type. In the first case, a threshold $t$ (cutoff value) must be set: if $p_i > t$, the traffic is labelled as originating from device class $c_i$, where $t$ is empirically set to maximize the classifier's accuracy [57]. In the second case, if a device is accepted by multiple classifiers, the conflict must be resolved, for example, by computing a distance-based metric between the sample to identify and a subset of samples from each class it matches [14], or by applying majority voting [15] to break the tie between multiple matches.
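The decision logic for per-device classifiers, with a threshold `t` and a distance-based tie-break, can be sketched as below. The tie-break metric (mean Euclidean distance to a few stored reference samples per class) is an assumption, since the exact metric used in [14] is not detailed here; the function name `per_device_decision` is hypothetical.

```python
import math


def per_device_decision(probs: dict[str, float], t: float,
                        references: dict[str, list[list[float]]],
                        sample: list[float]) -> str:
    """Combine the outputs of one binary classifier per device class.

    probs: device class -> probability from that class's own classifier.
    references: device class -> a few stored feature vectors of that class.
    """
    accepted = [c for c, p in probs.items() if p > t]
    if not accepted:
        return "unknown"  # rejected by every classifier: possibly a new device type
    if len(accepted) == 1:
        return accepted[0]

    # Conflict: keep the accepted class whose references are closest on average.
    def mean_distance(c: str) -> float:
        refs = references[c]
        return sum(math.dist(sample, r) for r in refs) / len(refs)

    return min(accepted, key=mean_distance)


refs = {"camera": [[10.0, 10.0], [12.0, 10.0]], "plug": [[1.0, 1.0], [2.0, 1.0]]}
print(per_device_decision({"camera": 0.9, "plug": 0.8}, 0.5, refs, [1.5, 1.0]))  # plug
```

The distance computation only runs when several classifiers accept the same sample, which matches the observation in [14] that tiebreaks dominate identification time when conflicts are frequent.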
Note that, with this strategy, classification accuracy can be increased by evaluating the classification results of more than one sample before choosing the device class. For example, in [57], the authors perform a majority vote on the classification results of several consecutive TCP sessions to determine, with 100% accuracy, whether they were generated by a given device. The optimal number of consecutive sessions is defined as the minimum number of sessions for which a majority vote yields zero false positives and zero false negatives on the test dataset.
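The session-level majority vote and the search for the optimal number of consecutive sessions can be sketched as follows. This assumes that the per-session decisions of one device's classifier are available for each test stream together with the stream's ground truth, that "consecutive sessions" means every sliding window of k sessions, and that a tie counts as a rejection; all names are hypothetical.

```python
from typing import Optional


def majority_vote(decisions: list[bool]) -> bool:
    """True if a strict majority of per-session decisions accept the device
    (a tie counts as a rejection)."""
    return sum(decisions) > len(decisions) / 2


def min_consecutive_sessions(streams: list[tuple[list[bool], bool]],
                             max_k: int) -> Optional[int]:
    """Smallest k for which a majority vote over every window of k consecutive
    sessions gives zero false positives and zero false negatives.

    streams: (per-session decisions of one device's classifier, ground truth)
    pairs, one per test traffic stream. max_k should not exceed the length of
    the shortest stream. Returns None if no k <= max_k works.
    """
    for k in range(1, max_k + 1):
        if all(majority_vote(decisions[i:i + k]) == truth
               for decisions, truth in streams
               for i in range(len(decisions) - k + 1)):
            return k
    return None


streams = [([True, False, True, True, True], True),      # the target device
           ([False, True, False, False, False], False)]  # another device
print(min_consecutive_sessions(streams, max_k=5))  # 3
```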
3) MULTI-CLASS CLASSIFIER VS ONE-CLASS CLASSIFIER
Maintaining a multi-class classification model is challenging in practice: when a new device type is added to the network, or when the behavior of existing device types legitimately changes (e.g., due to firmware upgrades by device manufacturers), the entire model must be re-trained for all classes [85]. By contrast, building a classifier per device avoids costly re-learning when a new device type is added. In addition, it allows the discovery of new devices: if a sample is rejected by all classifiers, it may be identified as a new device type. Another advantage is interpretability: when the number of features is large, one classifier per class yields a set of interpretable models instead of one complex model.
However, one-class classifier approaches are more computationally expensive, since the outputs of several classifiers must be computed. Moreover, managing conflicts can be time-consuming if a sample fits many device types; as reported in [14], most of the device type identification time is spent on tiebreaks. Finally, unbalanced training datasets can degrade classifier performance, since there are generally far fewer samples for one device type than for all the remaining device types combined. This issue can be addressed with under-sampling and over-sampling approaches [86].
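A minimal sketch of random over- and under-sampling, the simplest forms of the rebalancing approaches mentioned above, is given below; [86] may rely on more elaborate variants. The function name `rebalance` and the choice to balance every class to the size of the largest (or smallest) class are assumptions.

```python
import random
from collections import defaultdict


def rebalance(X: list, y: list, mode: str = "over", seed: int = 0):
    """Randomly over-sample minority classes (mode="over") or under-sample
    majority classes (mode="under") so that all classes have equal size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    sizes = [len(samples) for samples in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)
    X_out, y_out = [], []
    for label, samples in by_class.items():
        if mode == "over":
            # keep every original sample and add random duplicates
            chosen = samples + [rng.choice(samples) for _ in range(target - len(samples))]
        else:
            # keep a random subset, drawn without replacement
            chosen = rng.sample(samples, target)
        X_out += chosen
        y_out += [label] * len(chosen)
    return X_out, y_out


X = [[i] for i in range(10)]
y = ["camera"] * 8 + ["plug"] * 2  # e.g., a chatty camera vs. a sparse plug
X_over, y_over = rebalance(X, y, "over")
print(y_over.count("camera"), y_over.count("plug"))  # 8 8
```

Resampling should be applied to the training split only: duplicating samples before splitting would place copies of the same observation in both the training and test sets and inflate the measured accuracy.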
B. SUPERVISED VS UNSUPERVISED
ML-based classification algorithms are often divided into supervised and unsupervised approaches; the well-known advantages and limitations of each are briefly described below.
1) SUPERVISED CLASSIFICATION
In supervised classification, labeled datasets are split into training, validation, and test datasets, either chronologically or randomly. Temporal partitioning, however, better matches the real-world scenario, in which a classifier is trained on existing data and then tested on new data. Despite the cost of labeling and the difficulty of detecting new devices not included in training, supervised classification techniques are commonly employed in the IoT device classification literature (84% of the reviewed papers) owing to their high accuracy and ease of implementation.
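A chronological split can be sketched as follows, assuming each record carries its capture timestamp as its first element. The 60/20/20 proportions are illustrative, and the function name `temporal_split` is hypothetical.

```python
def temporal_split(records: list[tuple], train: float = 0.6, val: float = 0.2):
    """Split (timestamp, features, label) records chronologically, so the
    model is trained on older traffic and tested on newer traffic."""
    ordered = sorted(records, key=lambda r: r[0])
    n_train = round(len(ordered) * train)
    n_val = round(len(ordered) * val)
    return (ordered[:n_train],
            ordered[n_train:n_train + n_val],
            ordered[n_train + n_val:])


records = [(t, [float(t)], "camera") for t in [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]]
train_set, val_set, test_set = temporal_split(records)
print([r[0] for r in test_set])  # [8, 9]
```

Unlike a random split, every test record is newer than every training record, so the evaluation reflects how a deployed model faces traffic it has never seen.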
2) UNSUPERVISED CLASSIFICATION
Supervised techniques require labeled device class data. Labeling involves significant human effort, which is tedious and does not scale given the growing number of IoT devices. Unsupervised learning is more scalable since it minimizes human assistance, but it is harder to implement and its accuracy is likely to be lower than that of supervised approaches. Thus, only 16% of the reviewed papers use unsupervised classification approaches. For instance, the authors of [43] propose a classification method using semi-supervised generative adversarial networks (GANs).
C. SHALLOW AND DEEP LEARNING
Deep learning uses multiple layers of nonlinear processing units. All non-deep learning approaches are considered shallow learning, including most machine learning models developed before 2006 and neural networks with a single hidden layer.
Despite the advantages of deep learning, the majority of the reviewed papers (79%) still use shallow classification algorithms, probably because of their simplicity and ease of implementation, and because some shallow algorithms, such as decision trees, are intrinsically interpretable. Random Forest is a popular classifier owing to its accuracy and speed, but its classification time grows linearly with the number of classes, so it may not scale to a large number of device types.
D. EVALUATION SCENARIOS
Accuracy, precision, recall, F1 score, and ROC curves are classic evaluation metrics. Accuracy measures the ratio of correctly predicted observations to the total number of observations. Precision indicates the percentage of positive predictions that were correct. Recall indicates the percentage of positive cases that the classifier caught. The F1 score is the harmonic mean of precision and recall.
Most of the reviewed research papers (79%) focus on classic evaluation metrics. However, traditional evaluation does not accurately capture the performance and limitations of classification algorithms. For instance, accuracy gives equal weight to every sample, so it is dominated by the majority classes when the dataset is unbalanced (e.g., overall accuracy can reach 90% while most samples of the minority classes are misclassified). The performance of classifiers should therefore be assessed in different scenarios and through diverse metrics and measures. Below, we describe other metrics found in the literature to inspire other evaluation methodologies.
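The imbalance pitfall described above can be reproduced with a few lines of code. The sketch below computes accuracy together with per-class precision, recall, and F1 score, plus their unweighted (macro) averages across classes, one common way to give minority classes equal weight; the function name and the toy labels are hypothetical.

```python
def per_class_metrics(y_true: list[str], y_pred: list[str]):
    """Accuracy, per-class (precision, recall, F1), and macro averages."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    # Macro averages weight every class equally, whatever its number of samples.
    macro_recall = sum(r for _, r, _ in per_class.values()) / len(labels)
    macro_f1 = sum(f for _, _, f in per_class.values()) / len(labels)
    return accuracy, per_class, macro_recall, macro_f1


# 9 camera samples and 1 plug sample; the classifier always answers "camera".
y_true = ["camera"] * 9 + ["plug"]
y_pred = ["camera"] * 10
acc, per_class, macro_recall, macro_f1 = per_class_metrics(y_true, y_pred)
print(acc, per_class["plug"], macro_recall)  # 0.9 (0.0, 0.0, 0.0) 0.5
```

Accuracy reports 90% although the plug is never recognized; the per-class and macro figures expose the failure.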
1) MEASURING CLASSIFICATION AND LEARNING SPEED
Learning time matters, since classification models that learn rapidly are better suited to real conditions [20]. The classification time (the time required to classify one sample) is critical for instant device identification [14]. In [46], the authors evaluated the training time, the latency (i.e., the time spent performing device identification), and the throughput (i.e., the number of identifications per second).
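The three speed measures used in [46] can be collected with a simple timing harness such as the sketch below. The `train_fn` and `classify_fn` callbacks are hypothetical placeholders for a real model; timings measured this way are wall-clock, sequential, and hardware-dependent, so they should be reported together with the test platform.

```python
import time
from typing import Any, Callable, Sequence


def benchmark(train_fn: Callable[[], Any],
              classify_fn: Callable[[Any, Any], Any],
              samples: Sequence[Any]) -> tuple[float, float, float]:
    """Return (training time in s, mean latency in s/sample, throughput in
    identifications/s), measured sequentially on the current machine."""
    start = time.perf_counter()
    model = train_fn()
    training_time = time.perf_counter() - start

    start = time.perf_counter()
    for sample in samples:
        classify_fn(model, sample)
    elapsed = time.perf_counter() - start
    latency = elapsed / len(samples)
    throughput = len(samples) / elapsed if elapsed > 0 else float("inf")
    return training_time, latency, throughput


# Hypothetical stand-in model: a single learned threshold.
train_time, latency, throughput = benchmark(
    train_fn=lambda: {"threshold": 0.5},
    classify_fn=lambda model, x: x > model["threshold"],
    samples=[i / 1000 for i in range(1000)],
)
```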
2) MEASURING CPU, MEMORY CONSUMPTION AND COMPUTATIONAL COMPLEXITY
In [14], the authors measure the CPU used by the security gateway for classification and for the enforcement mechanism. In [87], the authors calculate the computational complexity of the different steps of their solution, namely feature extraction, clustering, and model training. The feature extraction cost is estimated to be $m \times O(n)$, where $m$ is the number of features and $n$ is the number of packets in the session. The cost of clustering is calculated based on the steps and loops of the proposed algorithm. The Random Forest training cost depends on the feature vector dimension, the number of decision trees, and the number of training samples.
3) VARYING EVALUATION SCENARIOS
Some papers measure how performance metrics vary across scenarios. For example, Huang et al. [51] test the scalability of their approach and show that accuracy diminishes as the number of device types grows. Meidan et al. [15] measure classification accuracy as a function of the number of consecutive sessions used for classification. Similarly, Song et al. [88] examine the relationship between identification accuracy and the number of packets required for classification. Bai et al. [55] measure classification results under different time window sizes and different ratios of training to testing data. Similarly, Marchal et al. [20] assess how accuracy evolves as the number of training samples changes.
4) ADDITIONAL EVALUATION METRICS
In addition to classic metrics, other evaluation criteria can be explored, for example: i) robustness to adversarial attacks [89], to evaluate classifier quality on ambiguous examples; ii) explainability [90], i.e., whether the result can be easily interpreted, to foster acceptance of ML-based solutions in IoT; and iii) transferability [91], i.e., whether a model learned in one context can be applied in another, to reduce learning costs and provide "out-of-the-box" tools.
VII. GRANULARITY OF CLASSIFICATION
In the literature, IoT devices are classified at different levels of granularity. Bezawada et al. [47] enumerate three classification levels: i) category, ii) type, and iii) instance (cf. Fig. 9). A device category is a grouping of similar devices; for instance, devices can be grouped by function, e.g., cameras, sensors, or home assistants. A device type, in contrast, designates a more specific device model within a general device category. For example, Google Home Mini (GHM) and Amazon Alexa are device types within the category of home assistants. Finally, a device instance is a physical device of a given device type. For example, two different GHMs in the same network are two instances of the GHM device type. In the following, we examine how these different levels of classification have been considered in the literature.
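Because the three levels form a hierarchy, one labeled dataset can be evaluated at any granularity by truncating each label. The sketch below uses the home assistants of Fig. 9; the `DeviceLabel` class is hypothetical, as is the instance name "GHM 1".

```python
from dataclasses import dataclass

LEVELS = ("category", "type", "instance")  # coarsest to finest


@dataclass(frozen=True)
class DeviceLabel:
    category: str     # e.g., "home assistant"
    device_type: str  # e.g., "Amazon Echo Dot"
    instance: str     # e.g., "Alexa 1"

    def at(self, level: str) -> tuple[str, ...]:
        """Label at the requested granularity. Finer levels keep the coarser
        fields so that identical names under different parents stay distinct."""
        depth = LEVELS.index(level) + 1
        return (self.category, self.device_type, self.instance)[:depth]


devices = [
    DeviceLabel("home assistant", "Amazon Echo Dot", "Alexa 1"),
    DeviceLabel("home assistant", "Amazon Echo Dot", "Alexa 2"),
    DeviceLabel("home assistant", "Google Home Mini", "GHM 1"),
]
for level in LEVELS:
    print(level, len({d.at(level) for d in devices}))  # 1, 2, then 3 classes
```

The number of distinct classes grows as the granularity becomes finer, which is one reason why finer levels are harder to classify.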
A. CLASSIFYING DEVICES BY CATEGORY
Different definitions of "category" have been proposed in the literature. The most widely used definition relies on "the main functionality (or purpose) of the device," e.g., refrigerator, TV, watch, or camera, as proposed in [15], [57], and [47]. For instance, in [55], devices are classified into hubs, electronics, cameras, and switches & triggers. In [92], four categories are discussed: IP cameras, smart on/off plugs, motion sensors, and temperature/environmental sensors. A broader definition is proposed in [93], where the authors classify devices according to their application domain into healthcare, multimedia, hubs, etc. Note that only 22% of the papers examined in this survey use this classification level.
As the number of IoT devices grows, so do their applications and features, requiring new device category definitions. To this end, Cvitic et al. [32], [94] propose classifying devices according to their "Cu" predictability index, which measures the "level of predictability of behavior" of a device. To do so, Cu measures the variation in the data received and sent by a device over a period of time. Devices that behave in roughly the same way over time are easily predictable, whereas devices whose behavior (and consequently the data received and sent) is modified by usage and user interaction are more difficult to predict. The authors derive four device categories based on Cu, thereby proposing a more general definition of the IoT device category.
B. CLASSIFYING DEVICES BY TYPE
This is the most common approach in the literature (81% of the surveyed papers). There are several ways to define the device type. For instance, in [14], a device type denotes the "combination of model and software version" of a particular device. In [44], a device type is defined by three parameters: the manufacturer, the manufacturer-type, and the manufacturer-type-model, e.g., "amazon-kindle-v2.0." In [81], a device type is defined by the manufacturer and model (e.g., for security cameras: Simple_Home XCS7_1001).
C. CLASSIFYING DEVICES BY INSTANCE
This is the finest level of granularity, where instances of the same device type must be distinguished. It is also the most difficult and expensive scenario. Note that, in the literature, the term fingerprint does not reflect the definition of device instance we propose in this survey but rather refers to device identification, i.e., the classification of devices based on their type. Therefore, proposed solutions
FIGURE 9. IoT device classification levels. IoT devices can be classified at different levels of granularity: i) category, ii) type, and iii) instance. In this example, home assistants and cameras are two different categories of IoT devices. GHM (Google Home Mini) and Amazon Echo Dot are two types of home assistants. Finally, Alexa 1 and Alexa 2 are two instances of the Amazon Echo Dot.
FIGURE 10. Taxonomy of classification granularity: the approaches are grouped by the granularity at which they classify devices. Percentages show how often each approach is used in the reviewed papers.
for device fingerprinting do not distinguish between device instances.
Instance-level classification has not been sufficiently explored in the literature; to the best of our knowledge, no existing solution addresses this scenario. But is such a classification really necessary? The answer depends on the use case. For example, when detecting vulnerable devices, instance-level classification is unnecessary, since instances of the same type share vulnerabilities. However, instance-level classification could be useful in other use cases. For example, in [95], the authors focus on 5G resource allocation and design a solution that automatically selects a 5G slice based on the type of IoT device connecting to the network, which is detected by classifying the radio signal shape. An extension based on instance-level classification could be envisaged, where two instances of the same IoT device used by two users with different rights are distinguished. This would enable better 5G resource management based on the user profile. Instance-level classification could also be useful to track a unique user's device.
D. HOW TO SELECT THE BEST CLASSIFICATION GRANULARITY?
The granularity of classification should be carefully set depending on the application scenario. Category-level classification may be sufficient in many situations. For instance, to ensure QoS by giving different priorities to flows (e.g., prioritizing traffic from healthcare devices during periods of high load), it is not necessary to know the manufacturer and software of the device. Although category-level classification of IoT devices is not very precise, it has the advantage of being scalable.
Device type classification is the most commonly used classification level in the literature because it offers a better trade-off between accuracy and ease of implementation. However, many results [14], [96] have shown that it is difficult to distinguish devices from the same manufacturer or with the same firmware version. This is because these devices usually have similar hardware and software architectures and communicate with the same remote cloud servers using the same protocols; thus, they often share very similar traffic patterns. Note that this problem is very close to the instance-level classification problem, which is still open.
VIII. KEY RESEARCH DIRECTIONS
In what follows, we consider research directions that have received little or no attention in the literature. Following the paper's rationale, we discuss challenges related to data acquisition, feature extraction, and machine learning. We address unbalanced datasets and possible solutions in Sec. VIII-A, the importance of minimizing feature extraction costs in Sec. VIII-B, and improving learning quality in Sec. VIII-C. In Sections VIII-D, VIII-E, and VIII-F, we discuss challenges related to scalability, deployment in practice, and lack of standardization, respectively.
A. THE PROBLEM OF UNBALANCED DATASETS
This is a common problem in many ML applications, but it is accentuated in IoT device classification by the heterogeneous behavior of IoT devices: some devices, like plugs, generate sparse traffic, while others, like cameras, generate large amounts of traffic. This makes detecting minority-class devices difficult. Bai et al. [55] report limited data for detecting hubs, and Hsu et al. [92] remark that it is difficult to distinguish smart plug traffic from IP camera traffic. Thus, having a balanced dataset is more important than the size of the dataset.
Solutions based on data augmentation can be considered during the training phase [48]. However, it is important to avoid introducing biases when over-representing minority classes; there is therefore a trade-off to consider to avoid overfitting the model.
B. REDUCING THE COST OF FEATURE EXTRACTION
It is essential to consider the cost of extracting features. Chakraborty et al. [97] distinguish three types of feature extraction costs: i) the computational cost, i.e., the computing resources used to calculate the features; ii) the memory cost, i.e., the memory used to store running feature values during computation; and iii) the privacy cost, related to privacy violations, especially for features extracted from the payload, which may contain sensitive information. Desai et al. [98] propose a framework for ranking features according to their
discriminatory power to differentiate between devices. They demonstrate that a small set of highly ranked features is sufficient to achieve an accuracy close to that obtained using all features.
Note that using a limited number of features reduces the feature extraction cost but can make the classification approach more vulnerable to adversarial attacks. Indeed, it is easier for an attacker to generate traffic that mimics the distribution of values taken by a single feature (e.g., packet size or IAT) to imitate the behavior of a particular IoT device and bypass the classifier. For instance, Shahid et al. [99] generate sequences of packet sizes representing bidirectional flows that look as if they were generated by a real smart device. However, it is more complex to bypass a classifier that takes several features into account, because it is difficult to generate traffic that matches the values of all these features at the same time.
C. IMPROVING THE QUALITY OF LEARNING
1) NEED FOR CONTINUOUS LEARNING
The IoT ecosystem and device behavior evolve rapidly; thus, classification models must be updated to reflect recent data trends. Kolcun et al. [34] note that the accuracy of IoT device classification models falls by 40% a few weeks after training, and argue that the models must be continuously updated to preserve their accuracy. It is therefore necessary to explore continuous learning ML pipelines that keep machine-learned models up to date [100]. As mentioned in Sec. VI-A3, techniques that train one classifier model per device are easier to re-train.
2) SCARCITY OF LABELED DATA
Fan et al. [53] note that collecting and labeling data is costly and time-consuming and cannot scale to the ever-growing IoT environment. When labeled data is scarce, however, supervised learning techniques perform poorly. Semi-supervised or unsupervised approaches are possible solutions. Fan et al. [53] proposed an IoT identification model based on semi-supervised learning. To do so, they i) judiciously choose the features describing the traffic, ii) perform CNN-based dimensionality reduction, and then iii) perform the classification using a two-layer neural network, first classifying the traffic into IoT and non-IoT and then identifying the class of the IoT devices. They achieved 99% accuracy using only 5% of labeled data.
Generating labeled synthetic data is another solution: e.g., generative adversarial networks (GANs) can generate synthetic data close to the real distribution of the training data by capturing the hidden class distribution. In addition, training classifiers with additional synthetic data points gives them better generalization ability [99].
3) RESILIENCE TO ADVERSARIAL ATTACKS
The vulnerability of ML algorithms to adversarial attacks has been demonstrated in several applications, and ML-based IoT device classification is no exception [101]. For example, malicious devices may attempt to mimic the traffic of a legitimate device in order to connect to the network. Fortunately, it is very difficult to do so while preserving the intended malicious functionality [102]. As discussed in [15], the rogue device must be able to generate similar requests to the manufacturer's servers and get similar responses, which is difficult to achieve if device authentication is required.
4) TRANSFERABILITY OF THE CLASSIFICATION MODEL
Kolcun et al. [34] reveal that the accuracy of classifiers degrades over time when they are evaluated on data collected outside the training set. Yet it is desirable that classifiers that perform well in one context can be used in another without expensive retraining. Transfer learning [103] is a promising solution that should be explored. For example, it would allow a manufacturer to build a model that learns the behavior of an IoT device and use that model in a smart home to identify the device with little retraining.
D. DISCUSSING SCALABILITY
Given the exponential growth in the number and types of IoT devices, it is crucial to design scalable solutions. Scalability must be considered at all stages of the solution design, as explained below.
1) Traffic collection: the collection must be quick, efficient, and non-exhaustive. For instance, data sampling [104] (i.e., taking sufficiently representative samples rather than the entire dataset) can be used to improve scalability. However, the sampling method must be chosen carefully, as it may be inappropriate for minority and sparse traffic classes, which brings us back to the unbalanced dataset problem discussed in Sec. VIII-A.
2) Feature extraction: feature extraction should not be complex, long, or costly, so it is important to choose a scalable method. For example, packet-level feature extraction is very time- and computation-consuming, and therefore not scalable. On the other hand, deep learning (cf. Sec. V-C) could be leveraged to simplify and automate the feature extraction process, and is therefore more likely to be scalable.
3) ML-based classification: the number of classifiers (one-class or multi-class classifier) should allow easy extension to new classes and avoid extensive updating of all the models, as discussed in Sec. VI-A3.
4) Classification granularity: Bai et al. [55] noticed a decrease in accuracy as the number of classes increases. One solution is to carefully choose the classification granularity according to the final application, as discussed in Sec. VII-D.
Moreover, with the emergence of edge computing, it is worth using the computing and storage capabilities of nearby edge servers to facilitate IoT device classification and make it more scalable. A first
attempt was proposed by Sun et al. [87], who designed an edge-based IoT device classification scheme. Transfer learning, discussed above, can also improve scalability by minimizing learning time.
E. DEPLOYMENT IN PRACTICE
We observe a gap between academic advances and market implementation, since the reviewed IoT device classification solutions are seldom (if ever) deployed.
Indeed, most proposed solutions have not been implemented in a realistic case study. Hence, their contribution to improving the security or management of IoT systems has not been evaluated, making their actual effectiveness uncertain. The lack of such evaluation scenarios is due to the difficulty of implementing and mastering realistic and usually complex ecosystems. In addition, the challenges discussed above need to be addressed to make the solutions more mature and ready for market implementation.
F. MUD AND STANDARDIZATION
Another solution for classifying and identifying IoT devices would be to use the Manufacturer Usage Description (MUD) [105]. MUD is a standard defined by the IETF [106] that allows IoT device manufacturers to publish device specifications, including intended communication patterns. IoT devices generally perform a specific function [107] and therefore have a recognizable communication pattern, which can be captured formally and concisely as a MUD profile [108]. Unfortunately, current IoT manufacturers do not yet support MUD specifications and mechanisms. Hamza et al.