Chapter

Analysis of IoT Device Network Traffic: Thinking Toward Machine Learning

Abstract

The proliferation and diversity of Internet of Things (IoT) devices increase the risks to IoT systems. This opens many challenging research areas, such as identifying authorized and unauthorized IoT devices, classifying device events, and detecting anomalies in the network traffic these devices generate. To address them, machine learning and deep learning algorithms are applied after a set of features has been extracted from the associated device's traffic. Identification accuracy and computational time are both very important factors, especially in resource-limited systems. Both depend primarily on which feature list is chosen. IoT network traffic analysis is therefore the most important and most difficult step in IoT device identification. In this paper, the traffic of six IoT home devices is analyzed using the Wireshark network protocol analyzer. A set of features and protocols is studied to gain insight into the important ones that can be used to create a device fingerprint for identification purposes in the IoT environment. The analysis shows that some features are unsuitable for adoption given the testbed devices' traffic, while others are expressive features that can be used to identify devices according to their manufacturers.

Article
It has been well established that the Internet of Things will bring an expansion in traffic volume and types. This will bring new challenges in terms of Quality of Service (QoS) and security, requiring innovative traffic management techniques. Traffic classification is a main network function that helps in managing both QoS and security. Different machine learning based methods have been applied for this aim. However, traditional machine learning methods rely on hand-crafted features, limiting the model's ability to learn. Deep Learning (DL), a branch of machine learning, is characterized by its representation learning ability. In this paper, we analyse two methods of data representation for DL-based classification: a raw packet-based representation and a quasi-raw flow-based representation. Different tests are performed to evaluate the robustness of these data representation methods. The tests include feature importance, model robustness, and anonymization tests. The results show that the raw data representation suffers from traffic anonymization and from the fact that many packet fields are data-dependent. On the other hand, the flow-based representation is sensitive to the number of packets used for classification and to traffic obfuscation.
Article
In recent years, with the rapid development of Internet of Things (IoT) technology, a large number of IoT devices such as network printers, webcams and routers have emerged in cyberspace. At the same time, the network security situation has become increasingly serious. Large-scale network attacks launched from terminal devices connected to the Internet occur frequently, causing adverse effects such as information leakage and property loss. Establishing a fingerprint generation system for IoT devices that accurately identifies the device type is of great significance for unified security control of the Internet of Things. We propose RAFM, a detection and identification system for IoT devices. RAFM consists of two major modules: automatic detection and fingerprinting. RAFM collects messages sent by different IoT devices by means of passive listening. Based on the differences in the header fields of different devices, it uses a series of multi-class classification algorithms to identify device types. Simulation experiments show that RAFM can achieve an average prediction accuracy of 93.75%.
Article
Security experts have demonstrated numerous risks imposed by Internet of Things (IoT) devices on organizations. Due to the widespread adoption of such devices, their diversity, standardization obstacles, and inherent mobility, organizations require an intelligent mechanism capable of automatically detecting suspicious IoT devices connected to their networks. In particular, devices not included in a white list of trustworthy IoT device types (allowed to be used within the organizational premises) should be detected. In this research, Random Forest, a supervised machine learning algorithm, was applied to features extracted from network traffic data with the aim of accurately identifying IoT device types from the white list. To train and evaluate multi-class classifiers, we collected and manually labeled network traffic data from 17 distinct IoT devices, representing nine types of IoT devices. Based on the classification of 20 consecutive sessions and the use of majority rule, IoT device types that are not on the white list were correctly detected as unknown in 96% of test cases (on average), and white listed device types were correctly classified by their actual types in 99% of cases. Some IoT device types were identified quicker than others (e.g., sockets and thermostats were successfully detected within five TCP sessions of connecting to the network). Perfect detection of unauthorized IoT device types was achieved upon analyzing 110 consecutive sessions; perfect classification of white listed types required 346 consecutive sessions, 110 of which resulted in 99.49% accuracy. Further experiments demonstrated the successful applicability of classifiers trained in one location and tested on another. In addition, a discussion is provided regarding the resilience of our machine learning-based IoT white listing method to adversarial attacks.
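The majority rule over consecutive sessions mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a per-session classifier has already produced one label per session. The function names, the non-overlapping windows, and the tie-breaking behaviour are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_label(session_labels: Sequence[str]) -> str:
    """Return the most frequent label among per-session predictions.

    Ties go to the label that appears first in the input (an assumption;
    the abstract does not say how ties are handled).
    """
    if not session_labels:
        raise ValueError("at least one session label is required")
    return Counter(session_labels).most_common(1)[0][0]


def windowed_majority(session_labels: Sequence[str], window: int = 20) -> List[str]:
    """Apply the majority rule to consecutive, non-overlapping windows.

    A trailing group shorter than `window` is ignored.
    """
    last_start = len(session_labels) - window
    return [
        majority_label(session_labels[start:start + window])
        for start in range(0, last_start + 1, window)
    ]
```

With a window of 20, a device whose sessions are mostly classified as one type is assigned that type even if a few individual sessions are misclassified.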
Article
In this work, we present an IoT botnet detection solution, EDIMA, consisting of a set of lightweight modules designed to be deployed at the edge gateway installed in home networks with the remaining modules expected to be implemented on cloud servers. EDIMA targets early detection of IoT botnets prior to the launch of an attack and includes a novel two-stage Machine Learning (ML)-based detector developed specifically for IoT bot detection at the edge gateway. The ML-based bot detector first employs supervised ML algorithms for aggregate traffic classification and subsequently Autocorrelation Function (ACF)-based tests to detect individual bots. The EDIMA architecture also comprises a malware traffic database, a policy engine, a feature extractor and a traffic parser. Performance evaluation results using our testbed setup with real-world IoT malware traffic as well as other public IoT datasets show that EDIMA achieves high bot scanning and bot-CnC traffic detection accuracies with very low false positive rates. The detection performance is also shown to be robust to an increase in the number of IoT devices connected to the edge gateway where EDIMA is deployed. Further, the runtime performance analysis of a Python implementation of EDIMA deployed on a Raspberry Pi reveals low bot detection delays and low RAM consumption. EDIMA is also shown to outperform existing detection techniques for bot scanning traffic and bot-CnC server communication.
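The abstract names Autocorrelation Function (ACF)-based tests but does not describe them. As a hedged illustration only, the sketch below computes the standard sample autocorrelation of a numeric series, for example packet counts per time bin (the choice of input is an assumption). It is not EDIMA's detector.

```python
from typing import List, Sequence


def autocorrelation(series: Sequence[float], max_lag: int) -> List[float]:
    """Return the sample autocorrelation for lags 0..max_lag.

    Uses the usual estimator: the lagged sum of products of deviations
    from the mean, divided by the total sum of squared deviations.
    """
    n = len(series)
    if n == 0 or not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series)")
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    squared_sum = sum(d * d for d in deviations)
    if squared_sum == 0:
        # Constant series: autocorrelation is undefined; by convention
        # report 1.0 at lag 0 and 0.0 elsewhere.
        return [1.0] + [0.0] * max_lag
    return [
        sum(deviations[t] * deviations[t + lag] for t in range(n - lag)) / squared_sum
        for lag in range(max_lag + 1)
    ]
```

A strongly periodic series, such as regular scanning bursts, shows large autocorrelation values at lags that match its period.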
Conference Paper
The Internet of Things (IoT) is the major technology of the 4th industrial revolution, in which various types of devices are connected together to work smartly without human intervention. IoT is expected to have a great impact on our social, economic, and commercial lives. IoT applications are expanding from the smart home and smart me to smart cities and the smart planet. However, the large number of devices interconnected through multiple protocols exposes IoT networks to security threats. Making the IoT devices themselves more secure is often not feasible because of their limited computational power. Hence, there is a need for advances in methods to secure IoT networks. Machine Learning (ML) models have been a hot topic in security research in recent years. Because IoT devices generate large volumes of data every day that can be used to train ML algorithms, ML could be a reasonable way to provide security for IoT systems. In this work, the main goal is to provide a broad survey of research on ML in the IoT security field. We briefly describe the security issues in IoT networks and their impact on the privacy of important data. We then shed light on different ML algorithms and models and discuss their individual advantages, disadvantages, and applications in IoT. Moreover, the ML models currently used for security purposes in IoT networks are discussed. We also discuss the limitations of using ML models to secure IoT networks, which could suggest new research directions.
Conference Paper
As the number of Internet of Things (IoT) devices and applications increases, the capacity of the IoT access networks is considerably stressed. This can create significant performance bottlenecks in various layers of an end-to-end communication path, including the scheduling of the spectrum, the resource requirements for processing the IoT data at the Edge and/or Cloud, and the attainable delay for critical emergency scenarios. Thus, it is necessary to classify or predict the time-varying traffic characteristics of IoT devices. However, this classification remains largely an open challenge. Most of the existing solutions are based on machine learning techniques, which nonetheless have a high computational cost and do not consider fine-grained flow characteristics. To this end, in this paper we design a two-stage classification framework that uses both network and statistical features to characterize IoT devices in the context of a smart city. We first clean and preprocess the data and then analyze the dataset to extract the network and statistical feature sets for different types of IoT devices. The evaluation results show that the proposed classification can achieve 99% accuracy as compared to other techniques, with a Matthews Correlation Coefficient of 0.96.
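For readers unfamiliar with the Matthews Correlation Coefficient (MCC) reported above, the following minimal sketch computes its standard binary form from confusion-matrix counts. The paper's task may be multi-class, so this shows only the two-class definition. The function name and the zero-denominator convention are assumptions.

```python
import math


def matthews_corrcoef_binary(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary MCC from true/false positive and negative counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Returns 0.0 when the denominator is zero (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

MCC ranges from -1 (total disagreement) to 1 (perfect prediction). Unlike plain accuracy, it stays informative when the classes are imbalanced.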
Chapter
The growing use of IoT devices in organizations has increased the number of attack vectors available to attackers due to the less secure nature of the devices. The widely adopted bring your own device (BYOD) policy which allows an employee to bring any IoT device into the workplace and attach it to an organization’s network also increases the risk of attacks. In order to address this threat, organizations often implement security policies in which only the connection of white-listed IoT devices is permitted. To monitor adherence to such policies and protect their networks, organizations must be able to identify the IoT devices connected to their networks and, more specifically, to identify connected IoT devices that are not on the white-list (unknown devices). In this study, we applied deep learning on network traffic to automatically identify IoT devices connected to the network. In contrast to previous work, our approach does not require that complex feature engineering be applied on the network traffic, since we represent the “communication behavior” of IoT devices using small images built from the IoT devices’ network traffic payloads. In our experiments, we trained a multiclass classifier on a publicly available dataset, successfully identifying 10 different IoT devices and the traffic of smartphones and computers, with over 99% accuracy. We also trained multiclass classifiers to detect unauthorized IoT devices connected to the network, achieving over 99% overall average detection accuracy.
Article
Distributed Denial-of-Service (DDoS) attacks launched from compromised Internet-of-Things (IoT) devices have shown how vulnerable the Internet is to large-scale DDoS attacks. Understanding the risks of these attacks requires learning about these IoT devices: where are they? how many are there? how are they changing? This paper describes three new methods to find IoT devices on the Internet: server IP addresses in traffic, server names in DNS queries, and manufacturer information in TLS certificates. Our primary methods (IP addresses and DNS names) use knowledge of servers run by the manufacturers of these devices. Our third method uses TLS certificates obtained by active scanning. We have applied our algorithms to a number of observations. With our IP-based algorithm, we report detections from a university campus over 4 months and from traffic transiting an IXP over 10 days. We apply our DNS-based algorithm to traffic from 8 root DNS servers from 2013 to 2018 to study AS-level IoT deployment. We find substantial growth (about 3.5×) in AS penetration for 23 types of IoT devices and a modest increase in device type density for ASes detected with these device types (at most 2 device types in 80% of these ASes in 2018). DNS also shows substantial growth in IoT deployment in residential households from 2013 to 2017. Our certificate-based algorithm finds 254k IP cameras and network video recorders from 199 countries around the world.
Chapter
The Internet of Things is now growing fast and presents huge opportunities for industry, users, and hackers. IoT service providers may face challenges from IoT devices that are developed with software and hardware originally designed for mobile computing and traditional computer environments. Thus, the first line of security defense for IoT service providers is to identify IoT devices and analyze their behavior before allowing them to use the service. In this work, we propose to use machine learning techniques to identify IoT devices. We also report experiments that illustrate the performance and potential of our techniques.
Conference Paper
Device Fingerprinting (DFP) is the identification of a device without using its network or other assigned identities, including its IP address, Medium Access Control (MAC) address, or International Mobile Equipment Identity (IMEI) number. DFP identifies a device using information from the packets that the device uses to communicate over the network. Packets are received at a router and processed to extract the information. In this paper, we work on DFP using Inter Arrival Time (IAT). IAT is the time interval between two consecutive received packets. It has been observed that the IAT is unique to a device because of the different hardware and software the device uses. Existing work on DFP uses statistical techniques to analyze the IAT and to derive information with which a device can be uniquely identified. This work presents a novel approach to DFP: plotting graphs of packet IATs, with each graph plotting 100 IATs, and then processing the resulting graphs to identify the device. This approach improves the efficiency of device identification, owing to the benchmark performance that deep learning libraries have achieved in image processing. We configured a Raspberry Pi to work as a router and installed our packet sniffer application on it. The packet sniffer application captured the packet information from the connected devices in a log file. We connected two Apple devices, an iPad4 and an iPhone 7 Plus, to the router and created IAT graphs for these two devices. We used a Convolutional Neural Network (CNN) to identify the devices and observed an accuracy of 86.7%.
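The IAT computation and the grouping into sets of 100 described in this abstract can be sketched as follows. This is a minimal illustration that assumes packet arrival timestamps, in seconds, have already been read from a capture log. The function names are hypothetical, and the plotting and CNN steps are omitted.

```python
from typing import List, Sequence


def inter_arrival_times(timestamps: Sequence[float]) -> List[float]:
    """Return the time gaps between consecutive packet arrivals."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def group_iats(iats: Sequence[float], size: int = 100) -> List[List[float]]:
    """Split IATs into consecutive groups of `size`, one group per graph.

    A trailing group with fewer than `size` values is dropped
    (an assumption; the abstract does not say how it is handled).
    """
    usable = len(iats) - len(iats) % size
    return [list(iats[start:start + size]) for start in range(0, usable, size)]
```

Each group returned by `group_iats` would correspond to one graph of 100 IATs that is later fed to the image classifier.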
Article
The Internet of Things (IoT) is being hailed as the next wave revolutionizing our society, and smart homes, enterprises, and cities are increasingly being equipped with IoT devices. Yet, operators of such smart environments may not even be fully aware of their IoT assets. In this paper, we address this challenge by developing a framework for IoT device classification using network traffic characteristics. First, we instrument a smart environment with 28 different IoT devices spanning cameras, lights, plugs, motion sensors and health-monitors. We collect and synthesize traffic traces from this infrastructure for a period of 6 months, a subset of which we release as open data for the community to use. Second, we present insights into the underlying network traffic characteristics using statistical attributes such as activity cycles, port numbers, signalling patterns and cipher suites. Third, we develop a multi-stage machine-learning-based classification algorithm and demonstrate its ability to identify specific IoT devices with over 99% accuracy. Finally, we discuss the trade-offs between cost, speed, and performance involved in deploying the classification framework in real-time. Our study paves the way for operators of smart environments to monitor their IoT assets for presence, functionality, and cyber-security without requiring any specialized devices or protocols.