Article

Independent comparison of popular DPI tools for traffic classification


Abstract

Deep Packet Inspection (DPI) is the state-of-the-art technology for traffic classification. According to the conventional wisdom, DPI is the most accurate classification technique. Consequently, most popular products, either commercial or open-source, rely on some sort of DPI for traffic classification. However, the actual performance of DPI is still unclear to the research community, since the lack of public datasets prevents the comparison and reproducibility of their results. This paper presents a comprehensive comparison of 6 well-known DPI tools, which are commonly used in the traffic classification literature. Our study includes 2 commercial products (PACE and NBAR) and 4 open-source tools (OpenDPI, L7-filter, nDPI, and Libprotoident). We studied their performance in various scenarios (including packet and flow truncation) and at different classification levels (application protocol, application and web service). We carefully built a labeled dataset with more than 750K flows, which contains traffic from popular applications. We used the Volunteer-Based System (VBS), developed at Aalborg University, to guarantee the correct labeling of the dataset. We released this dataset, including full packet payloads, to the research community. We believe this dataset could become a common benchmark for the comparison and validation of network traffic classifiers. Our results present PACE, a commercial tool, as the most accurate solution. Surprisingly, we find that some open-source tools, such as nDPI and Libprotoident, also achieve very high accuracy.


... As studies closely related to this research, we introduce studies on observing and classifying traffic [1,2,6,9-15], anomaly detection based primarily on traffic analysis [16-30], and traffic generation [31-39]. The taxonomy of traffic measurements and time-series traffic analyses, which are particularly relevant to this study, is presented in Table I. [Table I fragment: the taxonomy contrasts direct measurement (DPI [9]) and flow measurement with classical xFlow technologies (NetFlow [1], IPFIX [2], sFlow [11]), both of which yield meta-information of sampled flows with low processing load and high scalability but low applicability in packet-encapsulated environments; probabilistic data structures [12-14] and the Fast xFlow Proxy [6], which offer high applicability in packet-encapsulated environments such as large-scale carrier networks; and model-based prediction of complex time-series variations using ARIMA [19,20], LSTM [21], and CNN [22].] ...
... Methods for traffic measurement can be classified into two categories: direct measurement of packets and a flow measurement-based approach that obtains meta-information of sampled packets at regular intervals. A typical example of the former is deep packet inspection (DPI) [9], which accumulates pcap format data as big data. A machine learning approach has been proposed to analyze the data obtained by DPI [10]. ...
Article
Full-text available
A network anomaly detection method is proposed for large-scale, wide-range Internet Protocol (IP) networks. Because network behavior is projected onto communication traffic, anomaly detection can be achieved by properly analyzing the communication traffic flows. However, in wide-area IP networks, communication traffic flows are encapsulated by headers assigned by communication carriers and thus are observed as more macroscopic information. Therefore, accurately detecting the occurrence of anomalies in individual communication flows is difficult because the flow observation results obtained by flow measurement protocols such as IP Flow Information Export (IPFIX) are the result of superimposing various communication flows with different characteristics. In this study, we propose an anomaly-detection method based on time-series traffic flows. First, we decompose superimposed traffic flows into individual flows using our implemented system called the Fast xFlow Proxy, which can decompose traffic flows to a fine granularity. Our method detects anomalies in the decomposed flows based on a simple correlation analysis and dynamic threshold configuration. Our extensive simulation shows that, if we observe individual flows using the Fast xFlow Proxy, our method can detect anomalies caused by service failures with almost 100% accuracy. Our method can achieve an accuracy of approximately 80%-90% even in more difficult detection cases, such as small traffic fluctuations or noisy situations.
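To make the correlation-and-threshold idea in this abstract concrete, here is a minimal sketch of dynamic-threshold anomaly detection on a single decomposed traffic time series. It is an illustration only, not the authors' Fast xFlow Proxy pipeline; the window length, the factor k, and the synthetic series are our assumptions.

```python
import numpy as np

def detect_anomalies(flow, window=60, k=3.0):
    """Flag points where a flow deviates from its recent behavior.

    The threshold adapts dynamically to the rolling mean and standard
    deviation of the last `window` samples of the series.
    """
    flow = np.asarray(flow, dtype=float)
    anomalies = []
    for t in range(window, len(flow)):
        recent = flow[t - window:t]
        mu, sigma = recent.mean(), recent.std()
        # Dynamic threshold: k standard deviations around the rolling mean.
        if abs(flow[t] - mu) > k * max(sigma, 1e-9):
            anomalies.append(t)
    return anomalies

# Synthetic example: a steady flow with a sudden drop (e.g., service failure).
rng = np.random.default_rng(0)
series = rng.normal(100, 5, 600)
series[400:420] = 10  # simulated outage
print(detect_anomalies(series)[:5])  # indices where the outage begins
```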
... As a result, the traffic payload contributes little to VPN traffic classification. Therefore, VPN and TLS encryption will affect some traditional traffic classification technologies, such as deep packet inspection (DPI) [10] and port detection [11]. ...
... Then, whenever new traffic to be classified is encountered, its fingerprint is matched against the fingerprints in the database to obtain the final classification result. Johanna et al. [12] found that, unlike in normal certificates, the issuer and subject fields often do not contain information such as location or company name, but instead use random generic names formatted as www. + a base-32 code of 8-20 letters + .com or .net; using the issuer and subject fields of the certificate as fingerprints, Johanna et al. [10] successfully identified Tor traffic in a dataset containing Tor traffic. Fingerprint-based methods usually perform well in some fine-grained classification problems, such as website recognition. ...
... There are 185 trainable parameters in this layer. • MaxPooling1D_1: the second layer is a maximum pooling layer, with an output size of (20, 20, 10). A convolution layer followed by a maximum pooling layer is sometimes referred to as a convolution unit. ...
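For readers unfamiliar with the layer names in this snippet, the sketch below builds one such convolution unit (a Conv1D layer followed by a MaxPooling1D layer) in Keras. The filter count, kernel size, input shape, and class count are illustrative assumptions and do not reproduce the cited paper's architecture or its parameter counts.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # One "convolution unit": a 1-D convolution followed by max pooling.
    # 10 filters over 400 single-channel payload bytes are assumptions,
    # not the cited paper's actual configuration.
    tf.keras.layers.Conv1D(10, kernel_size=5, activation="relu",
                           input_shape=(400, 1)),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax"),  # 4 example classes
])
model.summary()  # prints per-layer output shapes and trainable parameters
```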
Article
Full-text available
Network traffic classification has great significance for network security, network management and other fields. However, in recent years, the use of VPN and TLS encryption has presented network traffic classification with new challenges. Due to the strong performance of deep learning in image recognition, many solutions have focused on deep learning-based methods and achieved positive results. A traffic classification method based on deep learning is provided in this paper, where the concept of the Packet Block, the aggregation of continuous packets in the same direction, is proposed. The features of Packet Blocks are extracted from network traffic and then transformed into images. Finally, convolutional neural networks are used to identify the application type of network traffic. The experiment is conducted using a captured OpenVPN dataset and the public ISCX-Tor dataset. The results show that the accuracy is 97.20% on the OpenVPN dataset and 93.31% on the ISCX-Tor dataset, which is higher than state-of-the-art methods. This suggests that our approach is able to meet the challenges of VPN and TLS encryption.
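As a rough illustration of the Packet Block concept described in this abstract, the sketch below aggregates consecutive same-direction packets of a flow. The block features kept here (direction, byte total, packet count) are our own simplification, not the paper's exact feature set or image encoding.

```python
def packet_blocks(packets):
    """Aggregate consecutive packets in the same direction into blocks.

    `packets` is a list of (direction, size) pairs, where direction is
    "up" or "down". Each block records the direction, total bytes, and
    packet count of one run of same-direction packets.
    """
    blocks = []
    for direction, size in packets:
        if blocks and blocks[-1]["dir"] == direction:
            blocks[-1]["bytes"] += size
            blocks[-1]["pkts"] += 1
        else:
            blocks.append({"dir": direction, "bytes": size, "pkts": 1})
    return blocks

flow = [("up", 120), ("up", 60), ("down", 1500), ("down", 1500), ("up", 40)]
print(packet_blocks(flow))
# -> three blocks: 2 up-packets, 2 down-packets, 1 up-packet
```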
... The Protocol and Application Classification Engine (PACE) from ipoque employs a combination of deep packet inspection (DPI) technologies, such as pattern matching, behavioral analysis, and statistical analysis. Thanks to this combination, PACE can successfully detect protocols with a low false-negative rate and almost no false positives, even when they employ advanced obfuscation and encryption techniques [34]. It helps network equipment and software companies add strong and proven layer-7 protocol management features to their products. ...
... Over 400 distinct application protocols are supported by Libprotoident, and this number will continue to expand in future releases [55]. Libprotoident's classification process relies on payload pattern matching, payload size, port numbers, and IP matching [34]. ...
Thesis
Master's thesis on the state of the art of deep packet inspection techniques.
... In addition to machine learning algorithms, Deep Packet Inspection (DPI) methods can be applied to traffic classification. Deep Packet Inspection is the state-of-the-art technology for traffic classification, since it is the most accurate technique [13]. For this reason, the most popular products, both commercial and open-source, often rely on DPI for traffic classification. ...
... For this reason, the most popular products, both commercial and open-source, often rely on DPI for traffic classification. However, the actual performance of DPI remains unclear, since the limited number of public datasets restricts the comparison and reproducibility of results [13]. ...
Article
Full-text available
The development of mobile networks and the future implementation of new standards, such as 5G and 6G, will lead to increased traffic volume in the network and the creation of new types of traffic. These new traffic types also demand specific service requirements. Currently, existing traffic processing methods are not adapted to such changes, which can impair the Quality of Service. A possible solution for improving the efficiency of information processing is to introduce new algorithms for classifying and prioritizing traffic. That is why this work focuses on analyzing the effectiveness of machine learning algorithms for solving the problem of real-time traffic classification in mobile networks. The classification accuracy and performance of the most common machine learning algorithms are analyzed, and the optimal algorithm is determined by the criterion of classification accuracy. The results of the comparative analysis showed that the best accuracy can be achieved with ANN algorithms (with 200 hidden network layers) and RF. The advantages of ANN include high efficiency and reliability of information processing and simple training, while the RF algorithm is a quick and powerful classifier but is hard to interpret and works poorly on small data. In addition, the importance of the dataset fields for classification was assessed. These improvements can be implemented both on end devices and base stations. They will improve the quality of classification, clustering, and processing of packets, which will generally increase the efficiency of the intelligent mobile network management system. Further development of the topic may involve using the studied algorithms to detect anomalies in traffic in order to increase the network's security.
... The evaluation of the proposed algorithms for the slice selection, along with the traffic classification algorithm, was carried out using Amarisoft software for the whole 5G network, i.e., 5GC, RAN, and UEs. Regarding the classifier, for all of the reasons explained in Section 2.3, as well as for the comparison made in [41,49], the nDPI traffic classifier was selected. In turn, five types of traffic were defined: Best-effort/default traffic, Control data traffic, Video traffic, IoT traffic, and WebData traffic. ...
... Comparison between different DPI algorithms [41]. ...
Article
Full-text available
Network slicing is a promising technique used in the smart delivery of traffic and can satisfy the requirements of specific applications or systems based on the features of the 5G network. To this end, an appropriate slice needs to be selected for each data flow to efficiently transmit data for different applications and heterogeneous requirements. To apply the slicing paradigm at the radio segment of a cellular network, this paper presents two approaches for dynamically classifying the traffic types of individual flows and transmitting them through a specific slice with an associated 5G quality-of-service identifier (5QI). Finally, using a 5G standalone (SA) experimental network solution, we apply the radio resource sharing configuration to prioritize traffic that is dispatched through the most suitable slice. The results demonstrate that the use of network slicing allows for higher efficiency and reliability for the most critical data in terms of packet loss or jitter.
... This will lead to a serious decline in the method's effectiveness. The identification method based on deep packet inspection (DPI) [4] implements traffic classification by defining regular expressions for different categories. The DPI method is effective for plaintext traffic and is useless for ciphertext traffic. ...
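The regular-expression flavour of DPI mentioned in this snippet can be sketched in a few lines. The signatures below are simplified illustrations (loosely inspired by L7-filter-style patterns), not production rules, and they only work on plaintext payloads, which is exactly the limitation the snippet points out.

```python
import re

# A toy signature table in the spirit of regex-based DPI.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD|PUT|DELETE) [^ ]+ HTTP/1\.[01]"),
    "ssh":  re.compile(rb"^SSH-\d\.\d"),
    "smtp": re.compile(rb"^220 [^\r\n]* SMTP"),
}

def classify_payload(payload: bytes) -> str:
    """Return the first protocol whose signature matches the payload."""
    for proto, pattern in SIGNATURES.items():
        if pattern.search(payload):
            return proto
    return "unknown"

print(classify_payload(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n"))
print(classify_payload(b"SSH-2.0-OpenSSH_8.9"))
print(classify_payload(b"\x17\x03\x03\x00\x40"))  # TLS records stay "unknown"
```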
Article
Full-text available
Darknet traffic classification is significantly important to network management and security. To achieve fast and accurate classification performance, this paper proposes an online classification model based on multimodal self-attention chaotic mapping features. On the one hand, the payload content of the packet is input into a network integrating CNN and BiGRU to extract local space-time features. On the other hand, flow-level abstract features processed by an MLP are introduced. To compensate for weak learning of indistinct features, a feature amplification module that uses logistic chaotic mapping to amplify fuzzy features is introduced. In addition, a multi-head attention mechanism is used to excavate the hidden relationships between different features. Besides, to better support new traffic classes, a class-incremental learning model is developed with a weighted loss function to achieve continuous learning with reduced network parameters. The experimental results on the public CICDarknetSec2020 dataset show that the accuracy of the proposed model is improved in multiple categories, while time and memory consumption is reduced by about 50%. Compared with existing state-of-the-art traffic classification models, the proposed model has better classification performance.
... Bujlow et al. conducted a comprehensive comparison of 6 commonly used DPI tools, including 2 commercial products (PACE and NBAR) and 4 open-source tools (OpenDPI, L7-filter, nDPI, and libprotoident). The test comparison results show that the PACE commercial tool has the best detection performance among the six tools, but some open-source tools, such as nDPI and libprotoident, can also achieve very high accuracy [16]. ...
Article
Full-text available
Traffic classification is widely used in network security and network management. Early studies have mainly focused on mapping network traffic to different unencrypted applications, but little research has been done on network traffic classification of encrypted applications, especially the underlying traffic of encrypted applications. To address the above issues, this paper proposes a network encrypted traffic classification model that combines attention mechanisms and spatiotemporal features. The model first uses the long short-term memory (LSTM) method to analyze continuous network flows and find the temporal correlation features between them. Second, the convolutional neural network (CNN) method is used to extract the high-order spatial features of the network flow, and the squeeze and excitation (SE) module is then used to weight and redistribute these high-order spatial features to obtain the key spatial features of the network flow. Finally, through the above three stages of training and learning, fast classification of network flows is achieved. The main advantages of this model are as follows: (1) the mapping relationship between network flows and labels is automatically constructed by the model, without manual intervention or hand-picked network features; (2) it has strong generalization ability and can quickly adapt to different network traffic datasets; and (3) it can handle encrypted applications and their underlying traffic with high accuracy. The experimental results show that the model can classify network traffic of encrypted and unencrypted applications at the same time, and in particular improves the classification accuracy of the underlying traffic of encrypted applications. In most cases, the accuracy exceeds 90%.
... One of the most widely used methods for classifying traffic is Deep Packet Inspection (DPI). Bujlow, Carela-Español, and Barlet-Ros (2015) compared several DPI tools for traffic classification; another study using DPI (Oklilas & Tasmi, 2017) was able to classify traffic on a WiFi network. Fan and Liu (2017) classified network traffic with the Support Vector Machine (SVM) and K-Means methods and compared the accuracy of the two methods. ...
Article
Limited network resources and the increasing number of internet users in the current digital era result in high traffic, which in turn decreases access speed to internet services. This is also a problem at Indo Global Mandiri University (UIGM) Palembang, where access to academic services has become slow. The purpose of this research is to identify types of network traffic patterns and then group and visualize those traffic types. The data in this study were captured in real time on the UIGM campus. The captured responses were extracted, and the extraction results were processed using the Support Vector Machine (SVM) method to group and visualize the data. The results distinguish traffic types by communication protocol, namely TCP and UDP, across six experiments: the first with 99.7% TCP and 0.1% UDP, the second with 97.6% TCP and 1.1% UDP, the third with 99.7% TCP and 0.2% UDP, the fourth with 97.5% TCP and 1.3% UDP, the fifth with 99.5% TCP and 0.2% UDP, and the sixth with 97.7% TCP and 1.1% UDP. Using the SVM method, several traffic types were identified: games at 0.4%, mail at 0.2%, multimedia at 0.4%, and web at 82.8%, while 15.5% of the traffic remains unrecognized. Keywords: Network Traffic, Classification, Support Vector Machine
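As a hedged illustration of the SVM grouping step described above, the sketch below trains a scikit-learn SVM on hand-made flow features. The feature choice, the numbers, and the class names are fabricated for demonstration and are not the study's data.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy flow records: [mean packet size (bytes), duration (s), packets/s],
# labeled by traffic class. All values are fabricated for illustration.
X = [[1200, 30.0, 80], [90, 0.5, 10], [1400, 120.0, 200], [60, 0.2, 5]]
y = ["multimedia", "mail", "web", "mail"]

# Scale features, then fit an RBF-kernel SVM, as is standard practice.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict([[1000, 25.0, 70]]))  # predicted class for an unseen flow
```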
... For instance, nDPI (Deri et al. 2014) is an open-source tool for traffic classification based on this method. In a comparison by Bujlow et al. (2015), nDPI and Libprotoident achieved the best classification performance among similar open-source software. ...
Article
Full-text available
Protecting users' privacy over the Internet is of great importance; however, it becomes harder and harder to maintain due to the increasing complexity of network protocols and components. Therefore, investigating and understanding how data are leaked from information transmission platforms and protocols can lead us to a more secure environment. In this paper, we propose a framework to systematically find the most vulnerable information fields in a network protocol. To this end, focusing on the transport layer security (TLS) protocol, we perform different machine-learning-based fingerprinting attacks on data collected from more than 70 domains (websites) to understand how and where this information leakage occurs in the TLS protocol. Then, by employing interpretation techniques developed in the machine learning community and applying our framework, we find the most vulnerable information fields in the TLS protocol. Our findings demonstrate that the TLS handshake (which is mainly unencrypted), the TLS record length appearing in the TLS application data header, and the IV field are, in that order, among the parts of this protocol that leak the most information.
... For this purpose, DPI systems such as Wireshark [36] and nDPI [37] can be used. In [38,39], several DPI systems were compared. To evaluate the performance of labeling tools, it has been proposed to test them on data for which a special program identified the applications that generated the data, which makes it possible to achieve a priori high labeling quality. ...
Article
This survey is devoted to the task of network traffic classification, specifically, to the use of machine learning algorithms in this task. The survey begins with the description of the task, its different statements, and possible real-world applications. It then proceeds to the description of the methods historically used for network traffic classification, as well as their limitations and evolution of traffic, making machine learning the main way to solve the problem. The most popular machine learning algorithms used in this task are described and accompanied with examples of research papers that provide insight into their advantages and disadvantages. The problem of feature selection is discussed with subsequent consideration of a more global problem of acquiring a suitable dataset for network traffic classification; examples of popular datasets and their descriptions are provided. The paper concludes with an overview of some current problems.
... However, this approach is already obsolete, since most applications adopt dynamic ports nowadays [5]. Packet-based identification mainly refers to identifying Internet traffic based on packet payload, namely Deep Packet Inspection; this approach requires protocol templates predefined by experts, which is almost impossible for unknown Internet traffic [6]. Moreover, packet-based analysis places strict requirements on computational capabilities. ...
Article
Full-text available
The identification of Internet protocols provides a significant basis for maintaining Internet security and improving Internet Quality of Service (QoS). However, the overwhelming development and updating of Internet technologies and protocols have led to large volumes of unknown Internet traffic, which seriously threaten the safety of the network environment. Since most unknown Internet traffic does not have any labels, it is difficult to adopt deep learning directly. Additionally, feature accuracy and the identification model also strongly affect identification accuracy. In this paper, we propose a surge period-based feature extraction method that helps remove the negative influence of background traffic in network sessions and acquire as many traffic flow features as possible. In addition, we establish an identification model of unknown Internet traffic based on JigClu, a self-supervised learning approach for training on unlabeled datasets. It is finally combined with a clustering method to realize further identification of unknown Internet traffic. The model demonstrated an accuracy of no less than 74% in identifying unknown Internet traffic on the public dataset ISCXVPN2016 under different scenarios. This work provides a novel solution for unknown Internet traffic identification, which is the most difficult task in identifying Internet traffic. We believe it is a great leap in Internet traffic identification and is of great significance to maintaining the security of the network environment.
... The applications are classified based on their characteristics. The patterns in the first step should be extracted and kept up to date, because this allows the method to track the evolution of the applications [5]. The drawback of DPI has become apparent as encrypted traffic is used increasingly; moreover, this method has a high computational overhead and does not respect users' privacy [6]. ...
Preprint
Full-text available
Nowadays, the number of Internet users is rising, and they need to be supplied with an acceptable quality of service (QoS). Network traffic classification is one of the essential functions that can lead an internet service provider (ISP) to provision the required network resources rationally. When facing new flows, improving network traffic classification accuracy can play a critical role in improving network performance, QoS, and security. In this paper, we propose a novel classification model, including (1) a deep autoencoder and (2) a classifier, to improve network traffic classification accuracy when facing new network flows. The deep autoencoder is designed and evaluated in this article with the mean square error (MSE) metric. The proposed deep autoencoder extracts effective features from the training set more accurately than other methods, such as manual feature engineering or a shallow neural network model. Distinct classifiers are added to the deep autoencoder to make it more accurate. Transfer learning is used to add these classifiers, namely logistic regression, random forest, decision tree, and Support Vector Machine (SVM), as a layer of the proposed model. The proposed deep classification model is evaluated with accuracy and F-score measures. The simulation results show that the proposed model has higher accuracy and F-score than a Convolutional Neural Network (CNN). The UNB ISCX VPN-nonVPN dataset is used for training and testing the model. Software Defined Network (SDN) architecture is used to deploy the proposed model, because this architecture has made networks more programmable and flexible than traditional closed networks.
... Traffic classification in computer networks is a fundamental process for several research areas related to distributed computing infrastructures, contributing to improved security, quality of service, and accounting of technological resources [Bujlow et al. 2015]. ...
Conference Paper
This article contributes to the classification of video streaming traffic by exploring concepts of Interval Fuzzy Logic. This approach extends related work by considering the uncertainties generated by variations in network conditions and the imprecision of the parameters that affect network flow behavior, which increases the complexity of achieving higher accuracy in network traffic identification. Evaluations using the interval logic approach for classifying video streaming traffic are presented, using applications and datasets to validate the proposal.
... Transport Layer Security (TLS), the most popular protocol for encrypting traffic, has been adopted by almost all famous video sites. As a result, traditional methods for plaintext network traffic, such as Deep Packet Inspection (DPI) [1], no longer work. Nevertheless, this does not mean that content analysis of encrypted video traffic is impossible. ...
Conference Paper
Full-text available
In order to detect the playback of illegal videos, it is necessary for supervisors to monitor the network by analyzing traffic from devices. However, many popular video sites, such as YouTube, have applied encryption to protect users' privacy, which makes it difficult to analyze network traffic. Many studies suggest that DASH (Dynamic Adaptive Streaming over HTTP) leaks information about video segmentation, which is related to the video content. Consequently, it is possible to analyze the content of encrypted video traffic without decryption. At present, most encrypted video traffic analysis adopts supervised learning methods, and there is little research on unsupervised methods. In reality, analysts are usually faced with unlabeled data, so the existing approaches will not work; encrypted video traffic analysis methods based on unsupervised learning are required. In this paper, we propose a clustering method based on Levenshtein distance for title analysis of encrypted video traffic. We also run a thorough set of experiments that verify the robustness and practicability of the method. To the best of our knowledge, this is the first work to apply cluster analysis to encrypted video traffic.
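To illustrate the clustering step this abstract describes, here is a minimal sketch that computes Levenshtein distances between per-video segment-size sequences and feeds them to hierarchical clustering. The fingerprints and the bucketing of segment sizes are fabricated assumptions; the paper's actual feature extraction is not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    d[:, 0] = np.arange(len(a) + 1)
    d[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1,
                          d[i - 1, j - 1] + cost)
    return d[len(a), len(b)]

# Each "fingerprint" is a sequence of segment-size buckets from one video's
# encrypted traffic (values fabricated for illustration).
fingerprints = [[3, 3, 7, 9], [3, 3, 7, 8], [1, 2, 2, 2], [1, 2, 2, 3]]
n = len(fingerprints)
# Condensed pairwise distance matrix for scipy's hierarchical clustering.
dists = [levenshtein(fingerprints[i], fingerprints[j])
         for i in range(n) for j in range(i + 1, n)]
labels = fcluster(linkage(dists, method="average"), t=2, criterion="maxclust")
print(labels)  # flows of the same title should fall in the same cluster
```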
... Therefore, a solution is needed to avoid losing these benefits when encrypted transport headers [8] or other HTTP tunnels become prevalent. Traditional methods of traffic analysis were based on DPI [9], which refers to a set of analysis tools aimed at extracting information from headers and payloads in order to classify flows. However, with the increasing number of new applications, which no longer have fixed port numbers that can be queried but adopt random port strategies, the accuracy of DPI is gradually declining. ...
Preprint
Automatic traffic classification is increasingly becoming important in traffic engineering, as the current trend of encrypting transport information (e.g., behind HTTP-encrypted tunnels) prevents intermediate nodes from accessing end-to-end packet headers. However, this information is crucial for traffic shaping, network slicing, and Quality of Service (QoS) management, for preventing network intrusion, and for anomaly detection. 3D networks offer multiple routes that can guarantee different levels of QoS. Therefore, service classification and separation are essential to guarantee the required QoS level to each traffic sub-flow through the appropriate network trunk. In this paper, a federated feature selection and feature reduction learning scheme is proposed to classify network traffic in a semi-supervised cooperative manner. The federated gateways of 3D network help to enhance the global knowledge of network traffic to improve the accuracy of anomaly and intrusion detection and service identification of a new traffic flow.
... Network traffic classification, aiming to identify the category of traffic from various applications or web services, is an important technique in network management and network security [4,34]. Recently, traffic encryption has been widely utilized to protect the privacy and anonymity of Internet users. ...
Preprint
Encrypted traffic classification requires discriminative and robust traffic representation captured from content-invisible and imbalanced traffic data for accurate classification, which is challenging but indispensable to achieve network security and network management. The major limitation of existing solutions is that they highly rely on the deep features, which are overly dependent on data size and hard to generalize on unseen data. How to leverage the open-domain unlabeled traffic data to learn representation with strong generalization ability remains a key challenge. In this paper,we propose a new traffic representation model called Encrypted Traffic Bidirectional Encoder Representations from Transformer (ET-BERT), which pre-trains deep contextualized datagram-level representation from large-scale unlabeled data. The pre-trained model can be fine-tuned on a small number of task-specific labeled data and achieves state-of-the-art performance across five encrypted traffic classification tasks, remarkably pushing the F1 of ISCX-Tor to 99.2% (4.4% absolute improvement), ISCX-VPN-Service to 98.9% (5.2% absolute improvement), Cross-Platform (Android) to 92.5% (5.4% absolute improvement), CSTNET-TLS 1.3 to 97.4% (10.0% absolute improvement). Notably, we provide explanation of the empirically powerful pre-training model by analyzing the randomness of ciphers. It gives us insights in understanding the boundary of classification ability over encrypted traffic. The code is available at: https://github.com/linwhitehat/ET-BERT.
Article
Full-text available
The internet is not a secure platform: third parties can easily track and monitor us, so a VPN is needed. A VPN helps make data more private and secure by routing traffic through encrypted tunnels. Unlike other VPNs, Q-VPN provides network-level protection: add-ons and malicious sites are automatically blocked, and the user is protected from network-level hijacking. It is an open-source VPN under the LGPL license, uses the WireGuard protocol for tunnelling, and provides a Tor facility.
Article
Traffic classification is essential in network-related areas such as network management, monitoring, and security. As the proportion of encrypted internet traffic rises, the accuracy of port-based and DPI-based traffic classification methods has declined. Methods based on machine learning and deep learning have effectively improved the accuracy of traffic classification, but they still suffer from inadequate extraction of traffic structure features and poor feature representativeness. This article proposes a model called Semi-supervision 2-Dimensional Convolution AutoEncoder (Semi-2DCAE). The model extracts the spatial structure features of the original network traffic with a 2-dimensional convolutional neural network (2D-CNN) and uses the autoencoder structure to downscale the data, so that different traffic features are represented as spectral lines in different intervals of a one-dimensional standard coordinate system, which we call the FlowSpectrum. The PReLU activation function is added to the model to ensure the stability of the training process. We use the ISCX-VPN2016 dataset to test the classification performance of the FlowSpectrum model. The experimental results show that the proposed model can characterize encrypted traffic features in a one-dimensional coordinate system and classify non-VPN encrypted traffic with an accuracy of up to 99.2%, about 7% better than the state-of-the-art solution, and VPN encrypted traffic with an accuracy of 98.3%, about 2% better than the state-of-the-art solution.
Article
Full-text available
Botnets are one of the most harmful cyberthreats; they can perform many types of cyberattacks and cause billions in losses to the global economy. Nowadays, vast amounts of network traffic are generated every second, so manual analysis is impossible. To be effective, automatic botnet detection should be done as fast as possible, but carrying this out is difficult on large bandwidths. To handle this problem, we propose an approach that is capable of carrying out ultra-fast network analysis (i.e., on windows of one second) without a significant loss in F1-score. We compared our model with three other literature proposals and achieved the best performance: an F1-score of 0.926 with a processing time of 0.007 ms per sample. We also assessed the robustness of our model on saturated networks and on large bandwidths. In particular, our model is capable of working on networks with a saturation of 10% packet loss, and we estimated the number of CPU cores needed to analyze traffic on three bandwidth sizes. Our results suggest that, using commercial-grade cores of 2.4 GHz, our approach would only need four cores for bandwidths of 100 Mbps and 1 Gbps, and 19 cores on 10 Gbps networks.
Article
With the development of the Industrial Internet of Things (IIoT), the complex traffic generated by large-scale IIoT devices presents challenges for traffic analysis. Most existing deep learning-based traffic analysis methods use a single flow for classification and can therefore be misled by irrelevant flows. Thus, it is necessary to use flow sequences for traffic analysis. However, existing models fail to effectively distinguish unimportant flows in a flow sequence, which affects classification performance. To address the above challenges, we propose a novel traffic classifier called the Flow Transformer, which performs traffic analysis on flow sequences and leverages a multi-head attention mechanism to strengthen the information interaction between related flows. Besides, an RF-based feature selection method is designed to select the optimal feature combination, preventing insignificant features from reducing the performance of the classifier. Experimental results on three real-world traffic datasets demonstrate that our method outperforms state-of-the-art methods by a large margin.
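The RF-based feature selection mentioned in this abstract can be sketched as ranking features by a random forest's impurity-based importances. The synthetic data and the top-3 cut-off below are our assumptions, not the paper's procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic flow features: only features 0 and 3 actually drive the label.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))                  # 8 candidate flow features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)

# Fit a forest, then rank features by impurity-based importance.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranked = np.argsort(rf.feature_importances_)[::-1]
print("features ranked by importance:", ranked)
print("top-3 selection:", sorted(ranked[:3]))  # should recover 0 and 3
```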
Chapter
As Internet of Things (IoT) technologies enter the consumer market, smart cleaning robots have gained high attention and usage in households. However, as the “privacy paradox” phenomenon states, consumers behave differently even if many claim to be concerned about smart robot privacy issues. In this paper, we describe our attempt to discover effective measures for average consumers to guard against potential privacy intrusions by cleaning robots. We define our target devices, provide an ideal smart home network topology and establish our threat model. We document network redirection and analytic methods we used during our research. We categorize existing privacy protection methods and describe their general procedures. We assess and evaluate the protection methods with regard to three aspects: protection effectiveness, functionality loss and consumer-friendliness. In the end, we perform a tabular qualitative comparison and develop our vision for privacy protection against cleaning robots. Keywords: Cleaning robot, Privacy, Protection, Consumer, Internet of Things
Article
Full-text available
Internet traffic classification aims to identify the kind of Internet traffic. With the rise of traffic encryption and multi-layer data encapsulation, some classic classification methods have lost their strength. In an attempt to increase classification performance, Machine Learning (ML) strategies have gained the scientific community's interest and have shown themselves promising for the future of traffic classification, mainly in the recognition of encrypted traffic. However, some of these methods have a high computational resource consumption, which makes them unfeasible for classification of large traffic flows or in real time. Methods using statistical analysis have been used to classify real-time traffic or large traffic flows; the main objective is to find statistical differences among flows or find a pattern in traffic characteristics through statistical properties that allow traffic classification. The purpose of this work is to address statistical methods for classifying Internet traffic that are little explored or unexplored in the literature. This work is not focused on statistical methodology in general; it focuses on statistical tools applied to Internet traffic classification. Thus, we provide an overview of statistical distances and divergences previously used, or with potential to be used, in the classification of Internet traffic. Then, we review previous works about Internet traffic classification using statistical methods, namely Euclidean, Bhattacharyya, and Hellinger distances, Jensen-Shannon and Kullback-Leibler (KL) divergences, Support Vector Machines (SVM), Correlation Information (Pearson Correlation), Kolmogorov-Smirnov and Chi-Square tests, and Entropy. We also discuss some open issues and future research directions on Internet traffic classification using statistical methods.
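Since this survey centers on statistical distances and divergences, a short sketch may help: below, KL divergence and Hellinger distance compare a flow's packet-size histogram against two reference profiles, and the flow would be assigned to the closer profile. The histograms are fabricated for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence between two histograms, smoothed to avoid log(0)."""
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def hellinger(p, q):
    """Hellinger distance between two normalized histograms."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

# Packet-size histograms (counts per size bucket) for two reference
# profiles and one unknown flow; values are fabricated.
web_profile  = [50, 30, 10, 5, 5]
bulk_profile = [5, 5, 10, 30, 50]
unknown      = [45, 35, 10, 6, 4]
print(kl_divergence(unknown, web_profile), kl_divergence(unknown, bulk_profile))
print(hellinger(unknown, web_profile), hellinger(unknown, bulk_profile))
# The smaller distance to web_profile would classify the flow as web-like.
```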
Article
Deep Packet Inspection (DPI) provides full visibility into network traffic by performing detailed analysis of both packet headers and packet payloads. Accordingly, DPI has critical importance, as it can be used in applications such as network security or government surveillance. In this paper, we provide an extensive survey on DPI. Differently from previous studies, we try to efficiently integrate DPI techniques into network analysis mechanisms by identifying performance-limiting parameters in the analysis of modern network traffic. Analysis of network traffic with complex behaviors is carried out by powerful hybrid systems that combine more than one technique. Therefore, DPI methods are studied together with other techniques used in the analysis of network traffic. Security applications of DPI in Internet of Things (IoT) and Software-Defined Networking (SDN) architectures are discussed, and Intrusion Detection System (IDS) mechanisms, in which DPI is applied as a component of a hybrid system, are examined. In addition, methods that perform inspection of encrypted network traffic are emphasized, and these methods are evaluated from the point of view of security, performance, and functionality. Future research issues are also discussed, taking into account the implementation challenges for all DPI processes.
Chapter
Full-text available
Every Android application requests a set of permissions at installation time, and these can be used in permission-based malware detection. Different ensemble strategies for categorising Android malware have recently received much more attention than traditional methodologies. In this paper, the classification performance of one of the primary ensemble approaches (stacking), implemented with R libraries, is evaluated in the context of Android malware. The presented technique preserves both desirable qualities of an ensemble technique: diversity and accuracy. The proposed technique produced significantly better results in terms of categorisation accuracy. Keywords: Stacking, Ensemble, Classification, Voting, Android malware
Chapter
In this paper, different types of attacks and the malicious activity behind them are presented, along with solutions to defend victims against these attacks. In today's scenario, where hackers type away incessantly on keyboards and juggle multiple computers to take down groups of individuals, users need to focus on identifying and remedying such malicious acts. Therefore, the paper covers detection through knowledge of packets in networks and the functionality of malicious attacks, the deep packet inspection (DPI) mechanism together with the CIA triad, and the prevention of new attacks by understanding the behavior of all kinds of attacks.
Article
The recent success of Artificial Intelligence (AI) is rooted in several concomitant factors, namely theoretical progress coupled with an abundance of data and computing power. Large companies can take advantage of a deluge of data typically withheld from the research community due to privacy or business sensitivity concerns, and this is particularly true for networking data. Therefore, the lack of high-quality data is often recognized as one of the main factors currently limiting networking research from fully leveraging the potential of AI methodologies. Following numerous requests we received from the scientific community, we release AppClassNet, a commercial-grade dataset for benchmarking traffic classification and management methodologies. AppClassNet is significantly larger than the datasets generally available to the academic community in terms of both the number of samples and classes, and reaches scales similar to the popular ImageNet dataset commonly used in the computer vision literature. To avoid leaking user- and business-sensitive information, we opportunely anonymized the dataset, while empirically showing that it still represents a relevant benchmark for algorithmic research. In this paper, we describe the public dataset and our anonymization process. We hope that AppClassNet can be instrumental for other researchers to address more complex commercial-grade problems in the broad field of traffic classification and management.
Conference Paper
Encrypted network traffic has made network classification complicated; therefore, we use deep learning to classify encrypted network traffic in this paper. The training and testing set is ISCX VPN-nonVPN, collected at UNB. The simulation results show that we can improve encrypted network traffic classification by nearly 2.5% when facing new flows, compared with similar research.
Article
Full-text available
In modern networks, network visibility is of utmost importance to network operators. Accordingly, granular network traffic classification quickly rises as an essential technology due to its ability to provide high network visibility. Granular network traffic classification categorizes traffic into detailed classes like application names and services. Application names represent parent applications, such as Facebook, while application services are the individual actions within the parent application, such as Facebook-comment. Most studies on granular classification focus on classification at the application name level. Besides that, evaluations in existing studies are also limited and utilize only static and immutable datasets, which are insufficient to reflect the continuous and evolving nature of real-world traffic. Therefore, this paper aims to introduce a granular classification technique, which is evaluated on streaming traffic. The proposed technique implements two Adaptive Random Forest classifiers linked together using a classifier chain to simultaneously produce classification at two granularity levels. Performance evaluation on a streaming testbed setup using Apache Kafka showed that the proposed technique achieved an average F1 score of 99% at the application name level and 88% at the application service level. Additionally, the performance benchmark on ISCX VPN non-VPN public dataset also maintained comparable results, besides recording classification time as low as 2.6 ms per packet. The results conclude that the proposed technique proves its advantage and feasibility for a granular classification in streaming traffic.
Chapter
Traffic flow classification is an important enabler in network design, capacity planning, identification of user requirements, and possible tracking of user population growth based on network usage. In this paper, results from Internet traffic flow characterization in a 1 Mbps community network, for a three-week snapshot representing three months of study, show that during peak traffic the network is overwhelmed and service degradation occurs. When the network is upgraded to 10 Mbps, bandwidth utilization immediately increases dramatically to close in on the new capacity, with 20% left unused during peak traffic. The situation gets worse one month later, when network utilization is only 3% away from the maximum capacity. Traffic categorization shows that the applications crossing the network are legitimate and acceptable. Since 10 Mbps is the capacity that is sustainable for the community and supported by existing technology, bandwidth management is essential to ensure the network remains usable and continues to provide an acceptable user experience. Keywords: Bandwidth, Traffic classification, Application signature
Chapter
Network traffic classification plays an important role in quality-of-service engineering. In recent years, it has become apparent that deep learning techniques are effective for this classification task, especially since classical approaches struggle to deal with encrypted traffic. However, deep learning models often tend to be computationally expensive, which weakens their suitability in low-resource community networks. This paper explores the computational efficiency and accuracy of two-dimensional convolutional neural network (2D-CNN) deep learning models for packet-based classification of traffic in a community network. We find that 2D-CNN models attain higher out-of-sample accuracy than traditional support vector machine classifiers and simpler multi-layer perceptron neural networks, given the same computational resource constraints. The improvement in accuracy offered by the 2D-CNNs comes at the cost of slower prediction speed, which weakens their relative suitability for use in real-time applications. However, we observe that by reducing the size of the input supplied to the 2D-CNNs, we can improve their prediction speed whilst maintaining higher accuracy than other, simpler models. Keywords: Network traffic classification, Convolutional neural networks, Deep learning, Community networks
Thesis
Full-text available
Internet traffic analysis is of key interest to network designers, as efficient analysis leads to more efficient and fault-tolerant networks. Internet traffic can be analysed in different ways to perform tasks such as classification and filtration. Deep packet inspection is a means of filtering network traffic by monitoring a stream of packets and identifying strings of data that appear common. Based on information contained in the packet headers, or other protocol-specific parameters, we can distinguish requests made by a specific web application. In this report, Snort is used as a tool for deep packet inspection. Network traffic can be analysed by creating Snort rules for web applications, and different policies can also be implemented in Snort for deep packet inspection.
Article
Full-text available
The identification of the nature of the traffic flowing through a TCP/IP network is a relevant target for traffic engineering and security-related tasks. Despite the privacy concerns it raises, Deep Packet Inspection (DPI) is one of the most successful current techniques. Nevertheless, the performance of DPI is strongly limited by computational issues related to the huge amount of data it needs to handle, both in terms of the number of packets and their length. One way to reduce the computational overhead of identification techniques is to sample the traffic being monitored. This paper addresses the sensitivity of OpenDPI, one of the most powerful freely available DPI systems, to sampled network traffic. Two sampling techniques are applied and compared: per-packet payload sampling and per-flow packet sampling. Based on the obtained results, some conclusions are drawn to show how far DPI methods could be optimised through traffic sampling.
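The two sampling strategies compared in this paper can be stated compactly; the sketch below shows both on a toy flow represented as a list of raw payloads. The truncation limits are illustrative assumptions, not the values evaluated in the paper.

```python
def per_packet_payload_sampling(flow, max_bytes=64):
    """Keep every packet, but truncate each payload to `max_bytes`."""
    return [pkt[:max_bytes] for pkt in flow]

def per_flow_packet_sampling(flow, max_packets=10):
    """Keep full payloads, but only the first `max_packets` packets."""
    return flow[:max_packets]

# A flow is a list of raw payloads (bytes); values are fabricated.
flow = [b"GET / HTTP/1.1\r\nHost: example.com\r\n" * 4] * 50
a = per_packet_payload_sampling(flow)
b = per_flow_packet_sampling(flow)
print(sum(map(len, a)), sum(map(len, b)))  # bytes the DPI engine must inspect
```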
Conference Paper
Full-text available
Open-source payload-based traffic classifiers are frequently used as a source of ground truth in the traffic classification research field. However, there have been no comprehensive studies that provide evidence that the classifications produced by these software tools are sufficiently accurate for this purpose. In this paper, we present the results of an investigation into the accuracy of four open-source traffic classifiers (L7 Filter, nDPI, libprotoident and tstat) using packet traces captured while using a known selection of common Internet applications, including streaming video, Steam and World of Warcraft. Our results show that nDPI and libprotoident provide the highest accuracy among the evaluated traffic classifiers, whereas L7 Filter is unreliable and should not be used as a source of ground truth.
Article
Full-text available
At present, accurate traffic classification usually requires the use of deep packet inspection to analyse packet payload. This requires significant CPU and memory resources and is invasive of network user privacy. In this paper, we propose an alternative traffic classification approach that is lightweight and only examines the first four bytes of packet payload observed in each direction. We have implemented it as an open-source library called libprotoident, which we evaluate by comparing its performance against existing traffic classifiers that use deep packet inspection. Our results show that our approach offers comparable (if not better) accuracy than tools that have access to full packet payload, yet requires fewer processing resources and is more acceptable, from a privacy standpoint, to network operators and users.
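A hedged sketch of the lightweight idea described in this abstract: record only the first four payload bytes observed in each direction of a flow and use that pair as a classification key. Real libprotoident also combines payload sizes and port numbers, which this toy version omits.

```python
def direction_signature(packets):
    """Return the first 4 payload bytes seen in each direction of a flow.

    `packets` is an ordered list of (direction, payload) pairs; the loop
    stops as soon as both directions have contributed a signature.
    """
    sig = {}
    for direction, payload in packets:
        if direction not in sig and payload:
            sig[direction] = payload[:4]
        if len(sig) == 2:
            break
    return sig

# First bytes of a TLS exchange (ClientHello/ServerHello record headers).
flow = [("out", b"\x16\x03\x01\x00\xc8"), ("in", b"\x16\x03\x03\x00\x5a")]
print(direction_signature(flow))
# {'out': b'\x16\x03\x01\x00', 'in': b'\x16\x03\x03\x00'}
```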
Article
Full-text available
This paper aims to show the impact on classification accuracy and the level of computational gain that could be obtained in applying deep packet inspection on truncated peer to peer traffic flows instead of complete ones. Using one of the latest open source classifiers, experiments were conducted to evaluate classification performance on full and truncated network flows for different protocols, focusing on the detection of peer to peer. Despite minor exceptions, all the results show that with the latest deep packet inspection classifiers, which may incorporate different helper technologies, inspecting the first packets at the beginning of each flow, may still provide concrete computational gain while an acceptable level of classification accuracy is maintained. The present paper discusses this tradeoff and provides some recommendations on the number of packets to be inspected for the detection of peer to peer flows and some other common application protocols. As such, a new sampling approach is proposed, which accommodates samples to the stateful classifier's algorithm, taking into consideration the characteristics of the protocols being classified.
Article
Full-text available
Traffic classification technology has increased in relevance this decade, as it is now used in the definition and implementation of mechanisms for service differentiation, network design and engineering, security, accounting, advertising, and research. Over the past 10 years the research community and the networking industry have investigated, proposed and developed several classification approaches. While traffic classification techniques are improving in accuracy and efficiency, the continued proliferation of different Internet application behaviors, in addition to growing incentives to disguise some applications to avoid filtering or blocking, are among the reasons that traffic classification remains one of many open problems in Internet research. In this article we review recent achievements and discuss future directions in traffic classification, along with their trade-offs in applicability, reliability, and privacy. We outline the persistently unsolved challenges in the field over the last decade, and suggest several strategies for tackling these challenges to promote progress in the science of Internet traffic classification.
Conference Paper
Full-text available
In various network tests we often need to use different trace files in order to get the most comprehensive result. This procedure requires multiple input files which were generated in different ways. In this paper we suggest a method for analyzing a traffic measurement and extracting the most typical user behaviors. We introduce the Traffic Descriptive Strings (TDS) which is a projection of measurement data. We present an algorithm which is able to score the similarities between two TDSs.
Article
Full-text available
Much of Internet traffic modeling, firewall, and intrusion detection research requires traces where some ground truth regarding application and protocol is associated with each packet or flow. This paper presents the design, development and experimental evaluation of gt, an open source software toolset for associating ground truth information with Internet traffic traces. By probing the monitored host's kernel to obtain information on active Internet sessions, gt gathers ground truth at the application level. Preliminary experimental results show that gt's effectiveness comes at little cost in terms of overhead on the hosting machines. Furthermore, when coupled with other packet inspection mechanisms, gt can derive ground truth not only in terms of applications (e.g., e-mail), but also in terms of protocols (e.g., SMTP vs. POP3).
Conference Paper
Full-text available
Enterprise and service provider customers develop, maintain and operate network infrastructure in order to support the applications required to perform their day-to-day tasks. These applications have certain requirements and expectations of the infrastructure, including access to public networks, and thus rely on quality of service (QoS) controls to manage network traffic. QoS controls are used to ensure non-critical applications do not hamper the operation of critical ones, all the while providing fair access to all legitimate applications. QoS systems are increasingly being used as firewalls, filtering bad traffic and allowing good traffic to traverse the network without delay. This paper investigates the effectiveness of protocol matching within current QoS classifiers and shows that even with the most up-to-date classifiers, "unknown" or unidentified traffic is still prevalent on a network; a serious concern for IT network administrators. This "unknown" traffic could consist of viruses, attempted exploits and other unauthorized connectivity from outside sources.
Conference Paper
Full-text available
Traffic classification approaches based on deep packet inspection (DPI) are considered very accurate; however, two major drawbacks are their invasiveness with respect to users' privacy and their significant computational cost. Both are a consequence of the amount of per-flow payload data - we refer to it as "deepness" - typically inspected by such algorithms. At the opposite extreme, the fastest and least data-eager traffic classification approach is based on transport-level ports, even though today it is mostly considered inaccurate. In this paper we propose a novel approach to traffic classification - named PortLoad - that takes the best of both worlds: the speed, simplicity and reduced invasiveness of port-based approaches on one side, and the classification accuracy of DPI on the other.
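The PortLoad algorithm itself is specified in the cited paper; the sketch below only illustrates the underlying idea, combining a port hint with inspection of just the first few payload bytes. The port table and byte signatures are hypothetical:

```python
# Sketch of the PortLoad idea (not the authors' implementation): consult
# the transport port first, then confirm with only the first few payload
# bytes instead of deep per-flow inspection. Signatures are illustrative.
PORT_HINTS = {80: "http", 25: "smtp", 110: "pop3"}
PAYLOAD_PREFIXES = {                       # hypothetical byte signatures
    b"GET ": "http", b"POST": "http",
    b"HELO": "smtp", b"EHLO": "smtp",
}
INSPECT_BYTES = 4                          # "deepness" kept deliberately small

def classify(dst_port: int, payload: bytes) -> str:
    hint = PORT_HINTS.get(dst_port)
    confirmed = PAYLOAD_PREFIXES.get(payload[:INSPECT_BYTES])
    # Agreeing port and payload give high confidence; the payload prefix
    # alone still catches applications running on non-standard ports.
    return confirmed or hint or "unknown"

print(classify(80, b"GET /index.html HTTP/1.1"))   # http
print(classify(8080, b"GET / HTTP/1.1"))           # http on an odd port
```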
Conference Paper
Full-text available
Recent research on Internet traffic classification algorithms has yielded a flurry of proposed approaches for distinguishing types of traffic, but no systematic comparison of the various algorithms. This fragmented approach to traffic classification research leaves the operational community with no basis for consensus on what approach to use when, and how to interpret results. In this work we critically revisit traffic classification by conducting a thorough evaluation of three classification approaches, based on transport layer ports, host behavior, and flow features. A strength of our work is the broad range of data against which we test the three classification approaches: seven traces with payload collected in Japan, Korea, and the US. The diverse geographic locations, link characteristics and application traffic mix in these data allowed us to evaluate the approaches under a wide variety of conditions. We analyze the advantages and limitations of each approach, evaluate methods to overcome the limitations, and extract insights and recommendations for both the study and practical application of traffic classification. We make our software, classifiers, and data available for researchers interested in validating or extending this work.
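As a minimal illustration of the kind of per-application scoring such a comparison relies on (not the authors' evaluation code), one can compute per-application recall of a classifier's output against payload-verified ground-truth labels; flow IDs, labels and predictions below are invented:

```python
# Minimal sketch of per-application recall against a ground truth,
# the basic measurement behind comparisons of traffic classifiers.
from collections import Counter

def per_app_recall(truth, predicted):
    total, correct = Counter(), Counter()
    for flow_id, app in truth.items():
        total[app] += 1
        if predicted.get(flow_id) == app:
            correct[app] += 1
    return {app: correct[app] / total[app] for app in total}

truth     = {1: "http", 2: "http", 3: "bittorrent", 4: "smtp"}
predicted = {1: "http", 2: "unknown", 3: "bittorrent", 4: "http"}
print(per_app_recall(truth, predicted))
# {'http': 0.5, 'bittorrent': 1.0, 'smtp': 0.0}
```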
Conference Paper
Full-text available
Interesting research in the areas of traffic classification, network monitoring, and application-oriented analysis cannot proceed without real traffic traces labeled with actual application information. However, hand-labeled traces are an extremely valuable but scarce resource in the traffic monitoring and analysis community, as a result of both privacy concerns and technical difficulties. Hardly any possibility exists for payloaded data to be released, while the impossibility of obtaining certain ground-truth application information from non-payloaded data has severely constrained the value of anonymized public traces. The usual way to obtain the ground truth is fragile, inefficient and not directly comparable from one work to another. This paper proposes a methodology and details the design of a technical framework that significantly boosts the efficiency of compiling application traffic ground truth. Further, a case study on a 30 minute real data trace is presented. In contrast with past work, this is an easy hands-on tool suite dedicated to saving users' time and labor, and it is freely available to the public.
Conference Paper
Full-text available
Detailed knowledge of the traffic mixture is essential for network operators and administrators, as it is a key input for numerous network management activities. Traffic classification aims at identifying the traffic mixture in the network. Several different classification approaches can be found in the literature. However, the validation of these methods is weak and ad hoc, because neither a reliable and widely accepted validation technique nor reference packet traces with well-defined content are available. In this paper, a novel validation method is proposed for characterizing the accuracy and completeness of traffic classification algorithms. The main advantages of the new method are that it is based on realistic traffic mixtures, and it enables a highly automated and reliable validation of traffic classification. As a proof-of-concept, it is examined how a state-of-the-art traffic classification method performs for the most common application types.
Chapter
Traffic classification has received increasing attention in recent years. It aims at offering the ability to automatically recognize the application that has generated a given stream of packets from the direct and passive observation of the individual packets, or stream of packets, flowing in the network. This ability is instrumental to a number of activities that are of extreme interest to carriers, Internet service providers and network administrators in general. Indeed, traffic classification is the basic block that is required to enable any traffic management operations, from differentiating traffic pricing and treatment (e.g., policing, shaping, etc.), to security operations (e.g., firewalling, filtering, anomaly detection, etc.).
Article
To overcome the drawbacks of existing methods for traffic classification (by ports, Deep Packet Inspection, statistical classification) a new system has been developed, in which data are collected and classified directly by clients installed on machines belonging to volunteers. Our approach combines the information obtained from the system sockets, the HTTP content types, and the data transmitted through network interfaces. It allows grouping packets into flows and associating them with particular applications or types of service. This paper presents the design of our system, implementation, the testing phase and the obtained results. The performed threat assessment highlights potential security issues and proposes solutions in order to mitigate the risks. Furthermore, it proves that the system is feasible in terms of uptime and resource usage, assesses its performance and proposes future enhancements. We released the system under The GNU General Public License v3.0 and published as a SourceForge project called Volunteer-Based System for Research on the Internet.
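VBS internals are documented in the cited paper; the following sketch only illustrates the flow-grouping step it performs, canonicalizing the 5-tuple so that both directions of a connection map to the same flow. The packet representation is hypothetical:

```python
# Minimal sketch of grouping packets into bidirectional flows by their
# canonicalized 5-tuple, the step performed before associating each flow
# with the application reported by the volunteer's client. The packet
# format here is illustrative, not the VBS wire format.
from collections import defaultdict

def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    # Sort the endpoints so both directions map to the same flow key.
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (proto,) + (a + b if a <= b else b + a)

flows = defaultdict(list)
packets = [  # (src_ip, src_port, dst_ip, dst_port, proto)
    ("10.0.0.1", 51000, "93.184.216.34", 80, "tcp"),
    ("93.184.216.34", 80, "10.0.0.1", 51000, "tcp"),  # reverse direction
]
for pkt in packets:
    flows[flow_key(*pkt)].append(pkt)

print(len(flows))  # 1: both directions grouped into a single flow
```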
Article
Network traffic analysis was traditionally limited to the packet header, because the transport protocol and application ports were usually sufficient to identify the application protocol. With the advent of port-independent, peer-to-peer, and encrypted protocols, the task of identifying application protocols became increasingly challenging, motivating the creation of tools and libraries for network protocol classification. This paper covers the design and implementation of nDPI, an open-source library for protocol classification using both packet header and payload. nDPI was extensively validated in various monitoring projects ranging from Linux kernel protocol classification, to analysis of 10 Gbit traffic, reporting both high protocol detection accuracy and efficiency.
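The snippet below is not the nDPI C API; it is merely a toy Python illustration of the header-plus-payload strategy the library implements, with made-up signatures. A real DPI engine keeps per-flow state and dissects protocols far more thoroughly:

```python
# Toy illustration of header-plus-payload classification (not nDPI's
# API): try payload signatures first, then fall back to header fields.
import re

SIGNATURES = [                      # illustrative payload patterns
    ("http",  re.compile(rb"^(GET|POST|HEAD) ")),
    ("ssh",   re.compile(rb"^SSH-\d\.\d")),
    ("tls",   re.compile(rb"^\x16\x03[\x00-\x03]")),  # TLS handshake record
]

def classify_packet(proto: str, dst_port: int, payload: bytes) -> str:
    for name, pattern in SIGNATURES:
        if pattern.match(payload):
            return name
    # Fall back to header information when the payload is inconclusive.
    if proto == "udp" and dst_port == 53:
        return "dns"
    return "unknown"

print(classify_packet("tcp", 443, b"\x16\x03\x01\x02\x00"))  # tls
print(classify_packet("udp", 53, b""))                       # dns
```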
Conference Paper
The validation of the different proposals in the traffic classification literature is a controversial issue. Usually, these works base their results on a ground truth built from private datasets and labeled by techniques of unknown reliability. This makes the validation and comparison with other solutions an extremely difficult task. This paper aims to be a first step towards addressing the validation and trustworthiness problem of network traffic classifiers. We perform a comparison between 6 well-known DPI-based techniques, which are frequently used in the literature for ground-truth generation. In order to evaluate these tools we have carefully built a labeled dataset of more than 500 000 flows, which contains traffic from popular applications. Our results present PACE, a commercial tool, as the most reliable solution for ground-truth generation. However, among the open-source tools available, nDPI and especially Libprotoident also achieve very high precision, while other, more frequently used tools (e.g., L7-filter) are not reliable enough and should not be used for ground-truth generation in their current form.
Conference Paper
To overcome the drawbacks of existing methods for traffic classification (by ports, Deep Packet Inspection, statistical classification) a new system was developed, in which the data are collected from client machines. This paper presents the design of the system, its implementation, initial runs and the obtained results. Furthermore, it proves that the system is feasible in terms of uptime and resource usage, assesses its performance and proposes future enhancements.
Conference Paper
Traffic identification is an important issue in the network industry. Due to the rapid increase of applications and protocols in the Internet, traffic identification based on TCP/UDP port numbers is no longer a practical approach. Deep packet inspection (DPI) thus becomes necessary, which scans the payload of a flow for certain patterns. In this paper, we analyze the architectures of two popular open-source DPI solutions, L7-filter and OpenDPI, along with their capabilities and limitations. Our extension to L7-filter, called L7-filter-U, which improves the detection accuracy on UDP flows, is also presented. Experiments on real-world traces show that OpenDPI has higher detection accuracy than L7-filter-U, which in turn performs better than L7-filter.
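The sketch below mimics the L7-filter matching model, testing the first application-data bytes of a flow against per-protocol regular expressions. The patterns shown are simplified stand-ins, not the project's actual pattern files:

```python
# Sketch of the L7-filter matching model: the first application-data
# bytes of a flow are tested against per-protocol regular expressions.
# The patterns below are simplified stand-ins for illustration only.
import re

L7_PATTERNS = {
    "http": re.compile(rb"http/(0\.9|1\.0|1\.1)", re.IGNORECASE),
    "ftp":  re.compile(rb"^220[\x09-\x0d -~]*ftp", re.IGNORECASE),
}
MAX_MATCH_BYTES = 2048  # only the start of each flow is inspected

def match_flow(first_bytes: bytes) -> str:
    data = first_bytes[:MAX_MATCH_BYTES]
    for proto, pattern in L7_PATTERNS.items():
        if pattern.search(data):
            return proto
    return "unknown"

print(match_flow(b"GET / HTTP/1.1\r\nHost: example.com\r\n"))  # http
```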
Article
Ground truth information for Internet traffic traces is often derived by means of port analysis and payload inspection (Deep Packet Inspection – DPI). In this paper we analyze the errors that DPI and port analysis commit when assigning protocol labels to traffic traces. We compare the ground truth provided by these approaches with that derived by gt, a tool that we developed, which provides error-free ground truth at the application level by construction. Experimental results demonstrate that, depending on the protocols composing a trace, ground truth information from port analysis and DPI can be incorrect for up to 91% and 26% of the labeled bytes, respectively.
Article
The traffic classification problem has recently attracted the interest of both network operators and researchers. Several machine learning (ML) methods have been proposed in the literature as a promising solution to this problem. Surprisingly, very few works have studied the traffic classification problem with Sampled NetFlow data. However, Sampled NetFlow is a widely deployed monitoring solution among network operators. In this paper we aim to fill this gap. First, we analyze the performance of current ML methods with NetFlow by adapting a popular ML-based technique. The results show that, although the adapted method is able to obtain accuracy similar to previous packet-based methods (≈90%), its accuracy degrades drastically in the presence of sampling. In order to reduce this impact, we propose a solution for network operators that is able to operate with Sampled NetFlow data and achieve good accuracy in the presence of sampling.
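As a toy illustration (not the paper's method), the following sketch trains a decision tree on NetFlow-style flow features and shows how packet sampling distorts the volume-based features; all data values are invented:

```python
# Toy illustration of ML classification from NetFlow-style flow records
# (packets, bytes, duration, port). Under packet sampling the volume
# features are themselves estimated from sampled counts, which is what
# degrades classification accuracy.
from sklearn.tree import DecisionTreeClassifier

# [packets, bytes, duration_s, dst_port] per flow; labels are the apps.
X_train = [[10, 8000, 2.0, 80], [500, 400000, 60.0, 6881],
           [12, 9500, 3.1, 80], [450, 380000, 55.0, 6881]]
y_train = ["http", "bittorrent", "http", "bittorrent"]

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Simulate 1-in-10 packet sampling by scaling the volume features.
sampled_flow = [50, 40000, 60.0, 6881]   # 500 pkts observed as ~50
print(clf.predict([sampled_flow]))       # may now be misclassified
```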
Conference Paper
Flow identification in backbone traffic is a crucial problem for network management to provide better service to users. However, application traffic on the backbone is in general hard to identify because of the traffic volume, the asymmetric nature of routes, and the difficulty of capturing full payloads. In this paper, we present our preliminary results on the basic abilities and limitations of four well-known flow identification algorithms (port-based heuristics and payload-based ones) applied to a packet trace called the MAWI trace, measured at a trans-Pacific link. First, we show that the identification ratio of the payload-based algorithms is 40-60% at the flow level; however, the accuracy of the identification in the payload-based algorithms strongly depends on the algorithm and the definition of the rules themselves. Next, we found that only 3% of the traffic is commonly identified by the payload-based algorithms when we apply them to unknown traffic reported by the port-based heuristics. Finally, we evaluate the dependency of the flow size on the identification ratio. These results emphasize the need for a more accurate and available identification algorithm.
Conference Paper
Fast-changing application types and behaviors require continual measurement of access networks. In this paper, we present the results of a 14-day measurement in an access network connecting 600 users with the Internet. Our application classification reveals a trend back to HTTP traffic, underlines the immense usage of flash videos, and unveils a participant of a botnet. In addition, flow and user statistics are presented, whose resulting traffic models can be used for simulation and emulation of access networks.
Web browsing traffic. Based on w3schools statistics [34], the most popular web browsers are: Chrome (48.4% of users), Firefox (30.2%), and Internet Explorer (14.3%).
Encrypted tunnel traffic. According to the reports from Palo Alto [28], they account for 9% of the total bandwidth, where 6% of total is SSL and 2% of total is SSH. SSL (Windows, Linux): collected while using various applications and web services. SSH (Linux).
Daniele De Sensi, Marco Danelutto, Luca Deri, DPI Over Commodity Hardware: Implementation of a Scalable Framework using FastFlow, Master's Thesis, Università di Pisa, Italy, 2012. <http://etd.adm.unipi.it/t/etd-02042013-101033/>.
M. Ott, Intelligent Network Based Application Recognition, US Patent 6,961,770, November 2005.
Palo Alto Networks, Application Usage and Threat Report, 2013. <https://www.paloaltonetworks.com/resources/whitepapers/application-usage-and-threat-report.html>.
S. Valenti, D. Rossi, A. Dainotti, A. Pescapè, A. Finamore, M. Mellia, Reviewing traffic classification, in: Data Traffic Monitoring and Analysis, Springer, Berlin Heidelberg, 2013, pp. 123–147. <http://dx.doi.org/10.1007/978-3-642-36784-7_6>.