Conference Paper

Identifying VPN Servers through Graph-Represented Behaviors

Article
Anonymous servers such as Tor and Shadowsocks are created to hide information about hosts while they surf the Internet. These servers are quite difficult to identify, which gives potential criminals opportunities to commit crimes. Hackers can also use these servers to threaten public network security, for example through DDoS and phishing attacks. Hence, the study of identifying these servers is crucial. Current work on detecting Shadowsocks servers is mostly based on features of the servers' data streams combined with machine learning. However, these are passive methods because they work only while the servers are in a connected state. Therefore, we propose a new system named ACER, in which AC stands for active and ER for expert, to detect these servers. In addition, we introduce the XGBoost algorithm to process the data stream and optimize detection. The method can recognize more Shadowsocks servers actively instead of passively monitoring the communication tunnel. In our experiments, the proposed framework achieves an accuracy of 94.63%, which is 1.20% higher than other existing solutions. We hope to provide a novel solution for researchers in this area and, at the same time, a detection scheme that network censors can use to block illegal servers.
Conference Paper
Global Internet users increasingly rely on virtual private network (VPN) services to preserve their privacy, circumvent censorship, and access geo-filtered content. Due to their own lack of technical sophistication and the opaque nature of VPN clients, however, the vast majority of users have limited means to verify a given VPN service's claims along any of these dimensions. We design an active measurement system to test various infrastructural and privacy aspects of VPN services and evaluate 62 commercial providers. Our results suggest that while commercial VPN services seem, on the whole, less likely to intercept or tamper with user traffic than other, previously studied forms of traffic proxying, many VPNs do leak user traffic, perhaps inadvertently, through a variety of means. We also find that a non-trivial fraction of VPN providers transparently proxy traffic, and many misrepresent the physical location of their vantage points: 5–30% of the vantage points, associated with 10% of the providers we study, appear to be hosted on servers located in countries other than those advertised to users.
Article
Software-defined networking (SDN) has emerged as a new network paradigm that promises control/data plane separation and centralized network control. While these features simplify network management and enable innovative networking, they give rise to persistent concerns about reliability. The new paradigm suffers from the disadvantage that various network faults may consistently undermine the reliability of such a network, and such faults are often new and difficult to resolve with existing solutions. To ensure SDN reliability, fault management, which is concerned with detecting, localizing, correcting and preventing faults, has become a key component in SDN networks. Although many SDN fault management solutions have been proposed, we find that they often resolve SDN faults from an incomplete perspective which may result in side effects. More critically, as the SDN paradigm evolves, additional fault types are being exposed. Therefore, comprehensive reviews and constant improvements are required to remain on the leading edge of SDN fault management. In this paper, we present the first comprehensive and systematic survey of SDN faults and related management solutions identified through advancements in both the research community and industry. We apply a systematic classification of SDN faults, compare and analyze existing SDN fault management solutions in the literature, and conduct a gap analysis between solutions developed in an academic research context and practical deployments. The current challenges and emerging trends are also noted as potential future research directions. This paper aims to provide academic researchers and industrial engineers with a comprehensive survey with the hope of advancing SDN and inspiring new solutions.
Conference Paper
Privacy has grown in popularity in the personal computing space, and this has influenced the IT industry. There is more demand for websites to use more secure and privacy-focused technologies such as HTTPS and TLS. This has had a knock-on effect of increasing the popularity of Virtual Private Networks (VPNs). There are now more VPN offerings than ever before, and some are exceptionally simple to set up. Unfortunately, this ease of use means that businesses need to be able to classify whether an incoming connection to their network comes from an original IP address or is being proxied through a VPN. One method to classify an incoming connection is to use machine learning to learn the general patterns of VPN and non-VPN traffic in order to build a model capable of distinguishing between the two in real time. This paper outlines a framework built on a multilayer perceptron neural network model capable of achieving this goal.
Article
Internet traffic classification has become more important with the rapid growth of the Internet and online applications. There have been numerous studies on this topic, which have led to many different approaches. Most of these approaches exploit predefined features extracted by an expert in order to classify network traffic. In contrast, in this study, we propose a deep learning based approach which integrates both the feature extraction and classification phases into one system. Our proposed scheme, called "Deep Packet," can handle both traffic categorization, in which the network traffic is categorized into major classes (e.g., FTP, P2P, etc.), and application identification, in which one identifies end-user applications (e.g., BitTorrent, Skype, etc.). Contrary to most of the current methods, Deep Packet can identify encrypted traffic and also distinguishes between VPN and non-VPN network traffic. After an initial pre-processing phase on the data, packets are fed to the Deep Packet framework, which embeds a stacked autoencoder and a convolutional neural network in order to classify network traffic. Deep Packet with CNN as its classification model achieved an F1 score of 0.95 in application identification and an F1 score of 0.97 in the traffic categorization task. To the best of our knowledge, Deep Packet outperforms all of the classification and categorization methods on the UNB ISCX VPN-nonVPN dataset.
Conference Paper
Traffic characterization is one of the major challenges in today's security industry. The continuous evolution and generation of new applications and services, together with the expansion of encrypted communications, make it a difficult task. Virtual Private Networks (VPNs) are an example of an encrypted communication service that is becoming popular, as a method for bypassing censorship as well as for accessing services that are geographically locked. In this paper, we study the effectiveness of flow-based time-related features to detect VPN traffic and to characterize encrypted traffic into different categories, according to the type of traffic, e.g., browsing, streaming, etc. We use two different well-known machine learning techniques (C4.5 and KNN) to test the accuracy of our features. Our results show high accuracy and performance, confirming that time-related features are good classifiers for encrypted traffic characterization.
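As a rough illustration of what flow-based time-related features can look like, the following minimal sketch derives a few inter-arrival time statistics from the packet timestamps of a single flow. The function name `flow_time_features` and the feature names are hypothetical and chosen for illustration; they are not the feature set used in the paper.

```python
from statistics import mean, pstdev


def flow_time_features(timestamps):
    """Compute simple time-related statistics for one flow.

    `timestamps` holds packet arrival times in seconds.
    The feature names below are illustrative assumptions.
    """
    ts = sorted(timestamps)
    if len(ts) < 2:
        raise ValueError("need at least two packets to compute inter-arrival times")
    # Inter-arrival times between consecutive packets
    iats = [later - earlier for earlier, later in zip(ts, ts[1:])]
    duration = ts[-1] - ts[0]
    return {
        "duration": duration,
        "iat_min": min(iats),
        "iat_max": max(iats),
        "iat_mean": mean(iats),
        "iat_std": pstdev(iats),  # population standard deviation
        "packets_per_second": len(ts) / duration if duration > 0 else 0.0,
    }


# Toy flow with four packets
features = flow_time_features([0.0, 0.5, 1.5, 2.0])
```

A feature vector of this kind could then be passed to any off-the-shelf classifier, such as a decision tree or a k-nearest-neighbors model.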
Article
VPN adoption has seen steady growth over the past decade due to increased public awareness of privacy and surveillance threats. In response, certain governments are attempting to restrict VPN access by identifying connections using “dual use” DPI technology. To investigate the potential for VPN blocking, we develop mechanisms for accurately fingerprinting connections using OpenVPN, the most popular protocol for commercial VPN services. We identify three fingerprints based on protocol features such as byte pattern, packet size, and server response. Playing the role of an attacker who controls the network, we design a two-phase framework that performs passive fingerprinting and active probing in sequence. We evaluate our framework in partnership with a million-user ISP and find that we identify over 85% of OpenVPN flows with only negligible false positives, suggesting that OpenVPN-based services can be effectively blocked with little collateral damage. Although some commercial VPNs implement countermeasures to avoid detection, our framework successfully identified connections to 34 out of 41 “obfuscated” VPN configurations. We discuss the implications of the VPN fingerprintability for different threat models and propose short-term defenses. In the longer term, we urge commercial VPN providers to be more transparent about their obfuscation approaches and to adopt more principled detection countermeasures, such as those developed in censorship circumvention research.
Chapter
With the increase of remote working during and after the COVID-19 pandemic, the use of Virtual Private Networks (VPNs) around the world has nearly doubled. Therefore, measuring the traffic and security aspects of the VPN ecosystem is more important now than ever. VPN users rely on the security of VPN solutions, to protect private and corporate communication. Thus a good understanding of the security state of VPN servers is crucial. Moreover, properly detecting and characterizing VPN traffic remains challenging, since some VPN protocols use the same port number as web traffic and port-based traffic classification will not help. In this paper, we aim at detecting and characterizing VPN servers in the wild, which facilitates detecting the VPN traffic. To this end, we perform Internet-wide active measurements to find VPN servers in the wild, and analyze their cryptographic certificates, vulnerabilities, locations, and fingerprints. We find 9.8M VPN servers distributed around the world using OpenVPN, SSTP, PPTP, and IPsec, and analyze their vulnerability. We find SSTP to be the most vulnerable protocol with more than 90% of detected servers being vulnerable to TLS downgrade attacks. Out of all the servers that respond to our VPN probes, 2% also respond to HTTP probes and therefore are classified as Web servers. Finally, we use our list of VPN servers to identify VPN traffic in a large European ISP and observe that 2.6% of all traffic is related to these VPN servers.
Chapter
Virtual Private Network (VPN) technology is now widely used in various scenarios such as telecommuting. The importance of VPN traffic identification for network security and management has increased significantly with the development of proxy technology. Unlike other tasks such as application classification, VPN traffic has only one flow problem. In addition, the development of encryption technology brings new challenges to VPN traffic identification. This paper proposes VT-GAT, a VPN traffic graph classification model based on Graph Attention Networks (GAT), to solve the above problems. Compared with existing VPN encrypted traffic classification techniques, VT-GAT solves the problem that previous techniques ignore the graph connectivity information contained in traffic. VT-GAT first constructs traffic behavior graphs by characterizing raw traffic data at packet and flow levels. Then it combines graph neural networks and attention mechanisms to extract behavioral features in the traffic graph data automatically. Extensive experimental results on the Datacon21 dataset show that VT-GAT can achieve over 99% in all classification metrics. Compared to existing machine learning and deep learning methods, VT-GAT improves F1-Score by about 3.02%–63.55%. In addition, VT-GAT maintains good robustness when the number of classification categories varies. These results demonstrate the usefulness of VT-GAT in the VPN traffic classification.
Article
The anonymous nature of darknets is commonly exploited for illegal activities. Previous research has employed machine learning and deep learning techniques to automate the detection of darknet traffic in an attempt to block these criminal activities. This research aims to improve darknet traffic detection by assessing a wide variety of machine learning and deep learning techniques for the classification of such traffic and for classification of the underlying application types. We find that a Random Forest model outperforms other state-of-the-art machine learning techniques used in prior work with the CIC-Darknet2020 dataset. To evaluate the robustness of our Random Forest classifier, we obfuscate select application type classes to simulate realistic adversarial attack scenarios. We demonstrate that our best-performing classifier can be degraded by such attacks, and we consider ways to effectively deal with such adversarial attacks.
Article
Identifying mobile applications (apps) from encrypted network traffic (also known as app fingerprinting) plays an important role in areas like network management, advertising analysis, and quality of service. Existing methods mainly extract traffic features from packet-level information (e.g., packet size sequence) and build classifiers to obtain good performance. However, packet-level information discriminates poorly on traffic that is common across apps (e.g., advertising traffic) and changes rapidly when apps are updated. As a result, the performance of these methods declines in these two real-world scenarios. In this paper, we propose FG-Net, a novel app fingerprinting method based on a graph neural network (GNN). FG-Net leverages a novel kind of information: the flow-level relationship, which is distinctive between different apps and stable across app versions. We design an information-rich graph structure, named FRG, to concisely embed both the raw packet-level information and the flow-level relationship of traffic. With FRG, we transform the problem of mobile encrypted traffic fingerprinting into a graph representation learning task, and we design a powerful GNN-based traffic fingerprint learner. We conduct comprehensive experiments on both public and private datasets. The results show that FG-Net outperforms the SOTAs in classifying traffic with about 18% common traffic. Without retraining, FG-Net is the most robust against updated traffic and increases accuracy by 5.5% compared with the SOTAs.
Article
Hybrid electrical/optical (E/O) switching data center network (DCN) has recently emerged as a promising paradigm for future DCN architectures. However, there exist two major challenges: 1) the traffic is a mixture of both stable and burst components due to the diverse and heterogeneous user demands; 2) current scheduling algorithms are mostly static and not designed for the complex structure of hybrid E/O switching DCN, provoking frequent burst traffic congestion and performance degradation. This paper endeavors to overcome the above challenges as follows. We first construct an error feedback-based spiking neural network (SNN) framework with high accuracy burst traffic prediction. We then design a prediction-assisted scheduling algorithm to handle the worst-case burst traffic. On the one hand, the error feedback-based SNN framework can significantly enhance the extraction of burst traffic features by mimicking the biological neuron system. On the other hand, prediction-assisted scheduling arranges the well-predicted traffic using a global evaluation factor and a traffic scaling factor. The simulation results reveal that our approach can efficiently integrate a spiking neural network into the traffic scheduling scheme and achieve satisfying performance with affordable computational complexity.
Article
Micro-burst traffic is not uncommon in data centers. It can cause packet dropping, which may result in serious performance degradation (e.g., the Incast problem). However, current approaches to mitigating micro-bursts are usually ad hoc and not based on a principled understanding of the underlying behaviors. Moreover, traditional studies focus on traffic burstiness in a single flow, while micro-burst traffic in data centers can occur with a high fan-in communication pattern, and its dynamic behavior is still unclear. To this end, in this paper, we re-examine micro-burst traffic in typical data center scenarios. We find that the evolution of a micro-burst is determined by both TCP's self-clocking mechanism and its congestion control algorithm. In addition, the dynamic behaviors of micro-bursts under various scenarios can all be described by the time derivative of the queue length evolution. Our observations also imply that conventional solutions like absorbing and pacing are ineffective at mitigating micro-burst traffic. Instead, senders need to respond rapidly to explicit signals of the queue buildup caused by micro-burst traffic rather than independently and ineffectually pacing themselves in isolation. Inspired by the findings and insights from experimental observations, we propose the Micro-burst-Aware Transport Control Protocol (MATCP), which leverages characteristic behaviors of micro-burst traffic derived from the time derivative of the queue occupancy. MATCP can suppress the sharp queue length increment by over 2× and reduce the tail query completion time by up to 84.4%.
Article
Quality of Service (QoS) is a continuing challenge in the telecommunications industry, mainly because of its impact on the provision of telco services. Traffic Classification, Traffic Marking, and Policing are the general stages of QoS management. Different approaches have focused on Traffic Classification and Traffic Marking, for which machine learning algorithms have arisen as promising techniques. However, Traffic Marking over time-related features has not been widely explored, especially for Virtual Private Network (VPN) traffic. Hence, a specific QoS classifier for VPN traffic based on per-hop behavior (PHB) for a specific domain was proposed. To this end, a baseline QoS-marked dataset was generated from characterized VPN traffic, on which several machine learning algorithms were compared and a T-Tester was performed. As a result, the Bagging-based learning model showed the best behavior in all scenarios, with the highest accuracy achieved being 94.42%. Consequently, a QoS classifier is an effective approach for traffic treatment in Differentiated Services (DiffServ) networks.
Article
We present a scalable approach for semi-supervised learning on graph-structured data that is based on an efficient variant of convolutional neural networks which operate directly on graphs. We motivate the choice of our convolutional architecture via a localized first-order approximation of spectral graph convolutions. Our model scales linearly in the number of graph edges and learns hidden layer representations that encode both local graph structure and features of nodes. In a number of experiments on citation networks and on a knowledge graph dataset we demonstrate that our approach outperforms related methods by a significant margin.
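The layer-wise propagation that such a first-order approximation leads to is commonly written as H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), where A is the adjacency matrix, I adds self-loops, and D is the degree matrix of A + I. The following minimal sketch implements one such layer in plain Python. The function name `gcn_layer`, the toy graph, and the weight values are hypothetical and serve only as an illustration; this is not the authors' implementation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer with symmetric normalization and ReLU."""
    n = len(adj)
    # Add self-loops: A~ = A + I
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # Node degrees of A~ (self-loops included)
    deg = [sum(row) for row in a_tilde]
    # Symmetric normalization: A^ = D^-1/2 A~ D^-1/2
    a_hat = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
             for i in range(n)]
    pre_activation = matmul(matmul(a_hat, features), weights)
    # ReLU non-linearity
    return [[max(v, 0.0) for v in row] for row in pre_activation]


# Toy example: 3-node path graph, one-hot node features, all-ones weights
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
features = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
weights = [[1.0, 1.0],
           [1.0, 1.0],
           [1.0, 1.0]]
hidden = gcn_layer(adj, features, weights)
```

Because the normalized adjacency matrix has one non-zero entry per edge (plus the self-loops), a sparse implementation of this product costs time proportional to the number of edges, which is consistent with the linear scaling stated in the abstract. The dense lists here are used only to keep the sketch short.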
Conference Paper
Censorship-circumvention systems are designed to help users bypass Internet censorship. As more sophisticated deep-packet-inspection (DPI) mechanisms have been deployed by censors to detect circumvention tools, activists and researchers have responded by developing network protocol obfuscation tools. These have proved to be effective in practice against existing DPI and are now distributed with systems such as Tor. In this work, we provide the first in-depth investigation of the detectability of in-use protocol obfuscators by DPI. We build a framework for evaluation that uses real network traffic captures to evaluate detectability, based on metrics such as the false-positive rate against background (i.e., non obfuscated) traffic. We first exercise our framework to show that some previously proposed attacks from the literature are not as effective as a censor might like. We go on to develop new attacks against five obfuscation tools as they are configured in Tor, including: two variants of obfsproxy, FTE, and two variants of meek. We conclude by using our framework to show that all of these obfuscation mechanisms could be reliably detected by a determined censor with sufficiently low false-positive rates for use in many censorship settings.
Article
Nmap Network Scanning is the official guide to the Nmap Security Scanner, a free and open source utility used by millions of people for network discovery, administration, and security auditing. From explaining port scanning basics for novices to detailing low-level packet crafting methods used by advanced hackers, this book suits all levels of security and networking professionals. A 42-page reference guide documents every Nmap feature and option, while the rest of the book demonstrates how to apply those features to quickly solve real-world tasks. Examples and diagrams show actual communication on the wire. Topics include subverting firewalls and intrusion detection systems, optimizing Nmap performance, and automating common networking tasks with the Nmap Scripting Engine. Hints and instructions are provided for common uses such as taking network inventory, penetration testing, detecting rogue wireless access points, and quashing network worm outbreaks. Nmap runs on Windows, Linux, and Mac OS X. Nmap's original author, Gordon "Fyodor" Lyon, wrote this book to share everything he has learned about network scanning during more than 11 years of Nmap development. Visit http://nmap.org/book for more information and sample chapters.
Toward a generic fault tolerance technique for partial network partitioning
  • Mohammed Alfatafta
  • Basil Alkhatib
  • Ahmed Alquraan
  • Samer Al-Kiswany
Threat modeling and circumvention of Internet censorship
  • David Fifield
Topological based classification using graph convolutional networks
  • Roy Abel
  • Idan Benami
  • Yoram Louzoun
Fraud Detection and Bot Detection Solutions | Detect Fraud with IPQS
  • Ipqualityscore
Chasing Shadows: A security analysis of the ShadowTLS proxy
  • Gaukas Wang
  • Jackson Sippe
  • Hai Chi
  • Eric Wustrow
How the Great Firewall of China detects and blocks fully encrypted traffic
  • Mingshi Wu
  • Jackson Sippe
  • Danesh Sivakumar
  • Jack Burg
  • Peter Anderson
  • Xiaokang Wang
  • Kevin Bock
  • Amir Houmansadr
  • Dave Levin
  • Eric Wustrow
Simplifying graph convolutional networks
  • Felix Wu
  • Amauri Souza
  • Tianyi Zhang
  • Christopher Fifty
  • Tao Yu
  • Kilian Weinberger