Article

Network flow identification based on machine learning

Abstract

A machine learning approach based on the C4.5 algorithm is proposed for network traffic identification. The correlation-based feature selection algorithm and a genetic algorithm are used to select the attribute feature subset. N-fold cross-validation is combined with a separate test set to assess the classification results on current national broadband network traffic. Experiments demonstrate that network traffic can be successfully identified and analyzed without knowing the port number or the application-layer protocol label of the network flows in advance.
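The abstract does not say exactly how N-fold cross-validation and the test set are combined. A minimal sketch of one common reading follows: a test set is held out first, and N-fold cross-validation runs only on the remaining flows. The function names `split_holdout` and `n_folds`, the 20% test fraction and the fixed seed are illustrative assumptions, not details from the paper.

```python
import random

def split_holdout(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a test set (assumed protocol)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)
    # Returns (development indices, held-out test indices).
    return indices[n_test:], indices[:n_test]

def n_folds(indices, n):
    """Yield (train, validation) index lists for N-fold cross-validation."""
    # Every n-th index goes to the same fold, so fold sizes differ by at most one.
    folds = [indices[i::n] for i in range(n)]
    for i in range(n):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train, folds[i]

# Hypothetical usage: 100 flows, 20 held out, 10-fold validation on the other 80.
dev, test = split_holdout(100)
for train, validation in n_folds(dev, 10):
    pass  # train a classifier on `train`, score it on `validation`
# After model selection, the final classifier is scored once on `test`.
```

Under this reading, the cross-validation score guides feature and model selection, while the held-out test set gives an estimate that the selection process has never seen.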

Article
The massive demand for broadband mobile network services is already covered quite successfully by 3G and 4G cellular mobile systems. The challenges for 5G are more diverse: besides elevating mobile broadband to the next level, 5G must answer the demands of Ultra-Reliable Low Latency Communications (URLLC) and massive Machine Type Communications (mMTC) users. While generic, high-level targets for Key Performance Indicators (KPIs) are widely communicated, it is not yet well understood how the various demands can affect the traffic mixture. Both the radio and the core domains of the cellular network have to cope with traffic peaks and have to obey various Quality of Service (QoS) guarantees. To cover these gaps, traffic-related characteristics (data volume, signaling message types, and traffic peaks) should be determined, and this knowledge should be used during network planning, optimization, and service shaping. This paper aims to provide insights into user behavioral patterns for three key application areas: enhanced Mobile Broadband (eMBB), URLLC and mMTC. Since user behavior related to traffic volume and bursts is not expected to change suddenly, targeted data collection on current legacy mobile network links should provide good basic insight into future 5G usage, at least in terms of traffic patterns. We collected live pre-5G mobile network data and analyze it throughout this paper to reveal traffic patterns, and their distinguishing features, for the three key 5G application areas.
Article
To classify traffic quickly and accurately, an early traffic classification method (ETCM) is proposed. The method uses the payload sizes of three early packets and the server port number obtained from the TCP flow as flow features, and classifies the traffic with a support vector machine (SVM). The results show that, provided the extracted features are used and the training samples are selected without bias, ETCM identifies Internet traffic related to WEB, MAIL, BitTorrent and eMule efficiently and quickly.
Article
To solve the problem of online network traffic identification, a clustering algorithm and a traffic identification scheme are proposed. The scheme takes a small number of the initial data packets in each flow as a sub-flow, extracts statistical features from the sub-flows, and selects the best feature subset of the sub-flows with a correlation-based filter approach. The network traffic flows are clustered by an online density-based spatial clustering of applications with noise (DBSCAN) algorithm, and each cluster is mapped to an application type according to the dominant application in that cluster. Experiments show that the scheme can identify new application types and encrypted flows, and can be implemented in online network traffic classification.
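The cluster-to-application mapping described above can be sketched as a majority vote over the labelled training flows in each cluster. The clustering step itself is left out here. The function names, the `"unknown"` fallback and the use of `-1` as the noise label (the convention used by scikit-learn's DBSCAN) are assumptions for illustration, not details from the abstract.

```python
from collections import Counter, defaultdict

def map_clusters_to_apps(cluster_ids, known_labels, noise_id=-1):
    """Return {cluster_id: dominant application} from labelled training flows."""
    votes = defaultdict(Counter)
    for cluster_id, app in zip(cluster_ids, known_labels):
        if cluster_id != noise_id:  # noise points belong to no cluster
            votes[cluster_id][app] += 1
    # The most frequent application in a cluster becomes that cluster's type.
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}

def classify(cluster_id, mapping, unknown="unknown"):
    """Label a new flow by the application type of the cluster it fell into."""
    return mapping.get(cluster_id, unknown)

# Hypothetical cluster assignments and ground-truth labels for six training flows.
clusters = [0, 0, 0, 1, 1, -1]
labels = ["web", "web", "mail", "p2p", "p2p", "web"]
mapping = map_clusters_to_apps(clusters, labels)
```

A flow that lands in a cluster with no labelled members is reported as `"unknown"` in this sketch. This is one simple way to flag candidate new application types.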
Article
A P2P traffic identification model is constructed based on an ensemble learning algorithm that integrates the DTNB, ONER and BP neural network algorithms. Using network flow characteristics and the integrated classification algorithm for rule generation in machine learning, network traffic flows are divided into two types: P2P and non-P2P traffic. The model consists of three steps: obtaining network flow characteristics, selecting P2P traffic features, and establishing the flow classification model. The rationality of the model and the effectiveness of the proposed method are evaluated by combining T-fold cross-validation with test sets. The experimental results show that the average precision of traffic classification reaches 97.27% and that the model has relatively high P2P flow identification accuracy.
Article
To obtain peer-to-peer (P2P) flow information accurately, in real time and at low cost, a NetFlow-based P2P flow analysis system (NPFAS) is proposed. It loosely integrates separate systems for flow aggregation, flow identification and flow analysis in a distributed manner. By utilizing the key information about P2P flows provided by P2P identification systems, NPFAS can extract P2P flow statistics from NetFlow records generated by NetFlow-enabled devices and analyze them with existing analysis systems. NPFAS provides a feasible solution for analyzing P2P flows based on the Linux-Intel framework. Experiments with the prototype system have verified the feasibility and availability of NPFAS.
Article
A novel P2P traffic identification method based on a neural network ensemble is proposed. A P2P flow detection model is developed by using the correlation-based feature selection (CFS) algorithm to extract P2P flow characteristics and by combining six neural networks through a dynamic weighted integration method. Experimental comparison between the proposed model and traditional methods, such as a single BP neural network, decision tree, Bayesian classifier, and support vector machine, shows that the proposed method achieves better P2P traffic identification accuracy and stability.
Article
The population diversity of a conventional genetic algorithm can easily be destroyed, which leads to premature convergence. To solve this problem, a modified adaptive genetic algorithm (MAGA) is presented. It builds on the adaptive genetic algorithm (AGA) proposed by Srinivas and introduces a parameter that measures population diversity. In this way, the probabilities of crossover and mutation are adjusted automatically according to both the population diversity and the trends of the fitness values. Since MAGA and the back-propagation (BP) algorithm are good at searching for global and local optima respectively, an optimized BP neural network based on MAGA (MAGA+BP) is then presented for traffic classification. The Internet traffic dataset provided by the University of Cambridge is used for experimental validation. The results show that MAGA maintains population diversity better, overcomes the premature convergence of AGA, and improves the fitness value of the resulting optimum by 10.17%, and that MAGA+BP performs better on Internet traffic classification.
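For orientation, the baseline AGA rule of Srinivas and Patnaik that MAGA builds on can be sketched as below: crossover and mutation probabilities shrink for individuals near the best fitness and stay high for below-average ones. The function name, the default constants (k1 = k3 = 1.0, k2 = k4 = 0.5) and the guard for a fully converged population are assumptions of this sketch. MAGA's diversity parameter is not specified in the abstract and is therefore not modelled.

```python
def adaptive_rates(f_parent, f_individual, f_max, f_avg,
                   k1=1.0, k2=0.5, k3=1.0, k4=0.5):
    """Return (crossover probability, mutation probability) under baseline AGA.

    f_parent     -- the larger fitness of the two parents chosen for crossover
    f_individual -- fitness of the individual considered for mutation
    f_max, f_avg -- best and mean fitness of the current population
    """
    span = f_max - f_avg
    if span <= 0:
        # Population has converged; fall back to the constant rates (assumed guard).
        return k3, k4
    # Above-average solutions are disrupted less the closer they are to f_max.
    pc = k1 * (f_max - f_parent) / span if f_parent >= f_avg else k3
    pm = k2 * (f_max - f_individual) / span if f_individual >= f_avg else k4
    return pc, pm

# Hypothetical population with best fitness 10 and mean fitness 5.
best = adaptive_rates(10, 10, f_max=10, f_avg=5)     # best individual is protected
middle = adaptive_rates(7.5, 7.5, f_max=10, f_avg=5)
weak = adaptive_rates(3, 3, f_max=10, f_avg=5)       # below average: full rates
```

Because the rates depend only on fitness, a population whose fitness values cluster together can still lose diversity. The abstract names this as the weakness that MAGA's additional diversity parameter addresses.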
Conference Paper
A P2P traffic identification model based on machine learning is proposed. The fast correlation-based filter (FCBF) feature selection algorithm is used to select the subset of P2P flow attribute features. A P2P flow identification model is built based on a decision tree and FCBF, and 10-fold cross-validation is used to validate the proposed model. Experimental results show that P2P traffic identification based on a decision tree is feasible and that FCBF is a useful method for extracting features from P2P flows.
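FCBF ranks features by symmetric uncertainty, SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a normalized form of mutual information. A minimal sketch of that measure for discrete attributes follows. The function names and the toy values are illustrative assumptions, and continuous flow attributes would need to be discretized first.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), ranging from 0 to 1."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing can be shared
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)

# Hypothetical discretized attribute values against P2P / non-P2P labels.
labels = ["p2p", "p2p", "other", "other"]
informative = ["high", "high", "low", "low"]   # tracks the label exactly
irrelevant = ["a", "b", "a", "b"]              # independent of the label
```

In FCBF, features whose SU with the class falls below a threshold are dropped. A remaining feature is then removed as redundant when its SU with an already selected feature is at least as large as its SU with the class.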
Article
Acquiring P2P flow information in real time is very important for network management and security. To obtain P2P flow information accurately and at low cost, a novel P2P flow analysis system called NetFlowP2P is proposed, which integrates separate systems for flow aggregation, flow identification and flow analysis in a distributed manner. By utilizing the key information about P2P flows provided by P2P identification systems, NetFlowP2P can extract P2P flow statistics from NetFlow records generated by NetFlow-enabled devices and analyze them with existing analysis systems. NetFlowP2P provides a feasible solution for analyzing P2P flows based on the Linux-Intel framework. Experimental results from the prototype system have verified the feasibility and availability of NetFlowP2P.