
ACER: detecting Shadowsocks server based on active probe technology


Abstract

Anonymous servers, such as Tor and Shadowsocks, are created to hide information about hosts while they surf the Internet. These servers are difficult to identify, which gives potential criminals opportunities to commit crimes. Hackers can also use them to threaten public network security, for example through DDoS and phishing attacks. Hence, identifying these servers is crucial. Current work on detecting Shadowsocks servers mostly relies on features of the servers' data streams combined with machine learning. However, these are passive methods: they work only while the servers are in a connected state. We therefore propose a new system named ACER, where AC stands for active and ER for expert, to detect these servers. We also introduce the XGBoost algorithm to process the data stream and optimize detection. The method recognizes more Shadowsocks servers by probing actively rather than passively monitoring the communication tunnel. In our experiments, the proposed framework achieves an accuracy of 94.63%, which is 1.20 percentage points higher than existing solutions. We hope to provide a novel solution for researchers in this area and, at the same time, a detection scheme that network censors can use to block illegal servers.
Journal of Computer Virology and Hacking Techniques (2020) 16:217–227
https://doi.org/10.1007/s11416-020-00353-z
ORIGINAL PAPER
Jiaxing Cheng¹ · Ying Li¹ · Cheng Huang¹ · Ailing Yu² · Tao Zhang³
Received: 26 August 2019 / Accepted: 2 March 2020 / Published online: 8 April 2020
© Springer-Verlag France SAS, part of Springer Nature 2020
Keywords: Shadowsocks · XGBoost · Active detection · Internet censorship
1 Introduction
Internet censorship is pervasive across the world, mainly for political reasons [1]. Internet censors use a great many techniques to identify and block access to information they deem objectionable [2]. In some countries, censorship even blocks legal Internet services; for example, the Chinese government blocked citizens' requests to Google.¹ With the help of proxy tools such as VPN, Tor and Shadowsocks, people who want to acquire useful information can circumvent Internet censorship. However, these proxies also give hackers a chance to circumvent Internet censorship and hide their identities when carrying out illegal attacks. Under these circumstances, accurately detecting proxy connections becomes essential to safeguarding cyber security.

¹ Great Firewall, https://en.wikipedia.org/wiki/Great_Firewall.

✉ Tao Zhang
zhangtao@stars.org.cn
1 College of Cybersecurity, Sichuan University, Chengdu 610065, China
2 Department of Computer Science, Boston University, Boston, MA 02215, USA
3 The Third Research Institute of the Ministry of Public Security, Shanghai 201204, China
The original methods of detecting proxy servers are based on proxy types, proxy protocols, HTTP headers, and similar attributes [3]. A common limitation of these methods is that they tend to confuse traffic in some proxy connections. Later, some researchers proposed a new method that uses machine learning to detect Shadowsocks and overcomes this shortcoming. The best-performing machine learning algorithm for detecting Shadowsocks so far is Random Forest. Detection based on network-layer features using Random Forest has reached an accuracy of 85% [4], and detection based on flow context and host behavior has achieved 93.43% with a method that is better suited to large-scale network environments [5].
Most ensemble learning [6] methods use CART trees [7] as the base learner. The Random Forest algorithm can overfit on noisy classification or regression problems. However, the XGBoost algorithm draws on the
... They detect Shadowsocks traffic using the Random Forest algorithm. Cheng et al. [13] propose an active method for Shadowsocks servers detection. They collect the IP and port of the server as a dataset, and then classify servers of the Shadowsocks using machine learning algorithm XGBoost. ...
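To make the idea of active detection in the excerpt above concrete, the following is a minimal sketch of a single probe against an IP and port. It is not the ACER probe design, which is not described in this preview; the function names `probe` and `probe_features`, the recorded fields, and the default timeout are all assumptions chosen for illustration. The sketch sends one payload and records how the endpoint reacts, so that the reactions to several payloads could later be fed to a classifier.

```python
import socket
import time


def probe(host: str, port: int, payload: bytes, timeout: float = 2.0) -> dict:
    """Send one payload to host:port and record how the endpoint reacts.

    The returned fields are hypothetical features, not the ACER feature set.
    """
    result = {"connected": False, "reply_len": 0, "closed_by_peer": False,
              "timed_out": False, "elapsed": 0.0}
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            result["connected"] = True
            sock.sendall(payload)
            try:
                data = sock.recv(4096)
                result["reply_len"] = len(data)
                # An empty read means the peer closed the connection.
                result["closed_by_peer"] = data == b""
            except socket.timeout:
                # The peer stayed silent until the timeout expired.
                result["timed_out"] = True
            except ConnectionResetError:
                result["closed_by_peer"] = True
    except OSError:
        # Connection refused, unreachable host, connect timeout, or send failure.
        pass
    result["elapsed"] = time.monotonic() - start
    return result


def probe_features(host: str, port: int, payloads: list) -> list:
    """Flatten the reactions to several payloads into one numeric feature row."""
    row = []
    for payload in payloads:
        r = probe(host, port, payload)
        row += [int(r["connected"]), r["reply_len"],
                int(r["closed_by_peer"]), int(r["timed_out"])]
    return row
```

A row produced by `probe_features` for each candidate server, together with a label, would form the kind of dataset that a tree-based classifier such as XGBoost could be trained on. Probes like this should only be sent to hosts one is authorized to test.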
Article
Anonymous proxies are used by criminals for illegal network activities, such as data theft and cyber attacks, because of their anonymity. Therefore, anonymous proxy traffic detection is essential for network security. In recent years, detection based on deep learning has become a hot research topic, since deep learning can automatically extract and select traffic features. To make (heterogeneous) network traffic fit the homogeneous input of typical deep learning algorithms, a major branch of existing studies converts network traffic into images for detection. However, such studies are commonly limited by the large image representation of network traffic, which results in very large storage and computational resource overhead. To address this limitation, a novel method for anonymous proxy traffic detection is proposed that reduces this overhead. Specifically, it converts the sequences of the size and inter-arrival time of the first N packets of a flow into images, and then categorizes the converted images using a one-dimensional convolutional neural network. Both proprietary and public datasets are used to validate the proposed method. The experimental results show that the converted images are at least 90% smaller than those of existing image-based deep learning methods. With substantially smaller image sizes, the method can still achieve F1 scores of up to 98.51% in Shadowsocks traffic detection and 99.8% in VPN traffic detection.
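The conversion step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the image layout or the normalization, so the two-row layout, the caps `MAX_SIZE` and `MAX_IAT`, the default N of 16, and the function name `flow_to_image` are hypothetical choices.

```python
from typing import List, Sequence, Tuple

MAX_SIZE = 1500   # assumed cap on packet size in bytes (typical Ethernet MTU)
MAX_IAT = 1.0     # assumed cap on inter-arrival time in seconds


def flow_to_image(packets: Sequence[Tuple[float, int]], n: int = 16) -> List[List[int]]:
    """Map the first n (timestamp, size) packets to a 2 x n grid of 0..255 pixels.

    Row 0 holds scaled packet sizes, row 1 holds scaled inter-arrival times.
    Flows shorter than n packets are zero-padded on the right.
    """
    head = list(packets[:n])
    sizes = [size for _, size in head]
    # The first packet has no predecessor, so its inter-arrival time is 0.
    iats = [0.0] if head else []
    iats += [later[0] - earlier[0] for earlier, later in zip(head, head[1:])]

    def scale(value: float, cap: float) -> int:
        clipped = min(max(value, 0.0), cap)
        return round(255 * clipped / cap)

    pad = [0] * (n - len(head))
    size_row = [scale(s, MAX_SIZE) for s in sizes] + pad
    iat_row = [scale(t, MAX_IAT) for t in iats] + pad
    return [size_row, iat_row]
```

For example, `flow_to_image([(0.0, 1500), (0.25, 300)], n=4)` yields two rows of four pixels each. A grid this small is the kind of compact input a one-dimensional convolutional network could consume, in contrast to images built from raw packet bytes.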
... us, a continuously increasing number of users protect their anonymity while browsing the Internet by utilizing anonymous network communication systems. However, current research [1][2][3][4][5][6][7][8][9][10] shows that privacy can be compromised even though clients use privacy-enhancing technologies such as Shadowsocks [11], I2P [12], Tor [13], Anonymizer [14], SSH, and VPN. Among several cyberattacks compromising anonymity, the website fingerprinting attack is one of the most representative ones. ...
Article
Website fingerprinting attacks allow attackers to determine the websites that users are linked to, by examining the encrypted traffic between the users and the anonymous network portals. Recent research demonstrated the feasibility of website fingerprinting attacks on Tor anonymous networks with only a few samples. Thus, this paper proposes a novel small-sample website fingerprinting attack method for SSH and Shadowsocks single-agent anonymity network systems, which focuses on analyzing homology relationships between website fingerprinting. Based on the latter, we design a Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BiLSTM) attack classification model that achieves 94.8% and 98.1% accuracy in classifying SSH and Shadowsocks anonymous encrypted traffic, respectively, when only 20 samples per site are available. We also highlight that the CNN-BiLSTM model has significantly better migration capabilities than traditional methods, achieving over 90% accuracy when applied on a new set of monitored sites with only five samples per site. Overall, our experiments demonstrate that CNN-BiLSTM is an efficient, flexible, and robust model for website fingerprinting attack classification.
Chapter
Identifying Shadowsocks (SS) traffic is very difficult; most existing work remains at the laboratory stage, and very few research results in this field have been published at home or abroad. ShadowsocksR (SSR) is an enhanced version of SS. It can disguise SS traffic as that of conventional protocols, such as HTTP or TLS, which makes SSR traffic even harder to identify. This paper proposes, for the first time, a method based on the XGBoost algorithm to identify SSR traffic. The experimental results show that the method recognizes SSR traffic well: the precision, recall, and accuracy are all above 95.3%.
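For readers comparing the figures reported across these abstracts, the three metrics named above follow directly from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the example counts in the usage note are hypothetical and are not taken from any of the papers.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, recall and accuracy from binary confusion-matrix counts.

    tp/fp: flows predicted as proxy traffic that are / are not proxy traffic.
    tn/fn: flows predicted as normal traffic that are / are not normal traffic.
    """
    total = tp + fp + tn + fn
    return {
        # Share of flagged flows that really are proxy traffic.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Share of real proxy flows that were flagged.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Share of all flows that were labelled correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
    }
```

With hypothetical counts `tp=90, fp=10, tn=80, fn=20`, precision is 0.90, recall is about 0.82, and accuracy is 0.85, which illustrates why accuracy alone can hide a weaker recall.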
Article
Cloud Virtual Private Server (VPS) services allow anonymous proxy services to be deployed rapidly and have become an important part of many anonymous proxy solutions. Anonymous systems such as ShadowSocks (SS), running as proxy services on VPSs from different cloud service providers, have become an important means for illegal network activists to engage in illegal network activities such as cyber-attacks and darknet transactions. It is difficult for local network administrators to supervise SS traffic from the cloud side, while from the local network the task faces the challenges of Invisible Negotiation Process and Data Transparent Transmission. In this paper, we present a novel SS detection method based on flow context and host behavior. The method not only identifies SS flows accurately but is also applicable to large-scale network environments. We extract 12-dimensional features from three aspects to build the detection model: the relationship between flows, hosts' flow behavior, and hosts' DNS behavior. Among them, the four features describing flow burst and the feature counting unassociated domain names are newly proposed in this paper. The method also uses big data statistical and association techniques. To verify its effectiveness, we first built a real SS running environment based on a campus network and two VPSs on two different public cloud platforms. We then conducted a series of experiments on NTCI-BDP, a big data platform built by our team. The experimental results show that our method achieves 93.43% accuracy on the experimental data sets and can effectively identify SS traffic.
Article
Traffic classification has been studied for two decades and applied to a wide range of applications from QoS provisioning and billing in ISPs to security-related applications in firewalls and intrusion detection systems. Port-based, data packet inspection, and classical machine learning methods have been used extensively in the past, but their accuracy has declined due to the dramatic changes in Internet traffic, particularly the increase in encrypted traffic. With the proliferation of deep learning methods, researchers have recently investigated these methods for traffic classification and reported high accuracy. In this article, we introduce a general framework for deep-learning-based traffic classification. We present commonly used deep learning methods and their application in traffic classification tasks. Then we discuss open problems, challenges, and opportunities for traffic classification.
Book
A comprehensive introduction to Support Vector Machines and related kernel methods. In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs (kernels) for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics. Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.
Conference Paper
Encryption is widely used across the internet to secure communications and ensure that information cannot be intercepted and read by a third party. However, encryption also allows cybercriminals to hide their messages and carry out successful malware attacks while avoiding detection. Further aiding criminals is the fact that web browsers display a green lock symbol in the URL bar when a connection to a website is encrypted. This symbol gives a false sense of security to users, who are in turn more likely to fall victim to phishing attacks. The risk of encrypted traffic means that information security researchers must explore new techniques to detect, classify, and take countermeasures against malicious traffic. So far there exists no approach for TLS detection in the wild. In this paper, we propose a method for identifying malicious use of web certificates using deep neural networks. Our system uses the content of TLS certificates to successfully identify legitimate certificates as well as malicious patterns used by attackers. The results show that our system is capable of identifying malware certificates with an accuracy of 94.87% and phishing certificates with an accuracy of 88.64%.
Conference Paper
We develop a means to detect ongoing per-country anomalies in the daily usage metrics of the Tor anonymous communication network, and demonstrate the applicability of this technique to identifying likely periods of internet censorship and related events. The presented approach identifies contiguous anomalous periods, rather than daily spikes or drops, and allows anomalies to be ranked according to deviation from expected behaviour. The developed method is implemented as a running tool, with outputs published daily by mailing list. This list highlights per-country anomalous Tor usage, and produces a daily ranking of countries according to the level of detected anomalous behaviour. This list has been active since August 2016, and is in use by a number of individuals, academics, and NGOs as an early warning system for potential censorship events. We focus on Tor; however, the presented approach is more generally applicable to usage data of other services, both individually and in combination. We demonstrate that combining multiple data sources allows more specific identification of likely Tor blocking events. We demonstrate our approach in comparison to existing anomaly detection tools, and against both known historical internet censorship events and synthetic datasets. Finally, we detail a number of significant recent anomalous events and behaviours identified by our tool.
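The distinction this abstract draws between contiguous anomalous periods and isolated daily spikes can be illustrated with a small sketch. It is not the authors' method, which the abstract does not specify: here the expected behaviour is simply the median of the series, deviation is measured in units of the median absolute deviation, and the names `anomalous_periods`, `threshold` and `min_len` are assumptions made for the example.

```python
from statistics import median
from typing import List, Tuple


def anomalous_periods(values: List[float], threshold: float = 3.0,
                      min_len: int = 2) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) runs of consecutive anomalous days.

    A day is anomalous when its deviation from the median exceeds `threshold`
    median absolute deviations. Runs shorter than `min_len` days are dropped,
    and the remaining runs are ranked by total deviation, highest first.
    """
    if not values:
        return []
    centre = median(values)
    # Robust spread estimate; fall back to 1.0 when the series is almost constant.
    mad = median(abs(v - centre) for v in values) or 1.0
    scores = [abs(v - centre) / mad for v in values]

    periods = []
    start = None
    for day, score in enumerate(scores + [0.0]):  # sentinel closes a trailing run
        if score > threshold:
            if start is None:
                start = day
        elif start is not None:
            if day - start >= min_len:
                periods.append((start, day - 1, sum(scores[start:day])))
            start = None
    return sorted(periods, key=lambda p: p[2], reverse=True)
```

On a daily series with one three-day surge and one single-day spike, this sketch reports only the three-day run, which mirrors the abstract's emphasis on sustained periods over one-off fluctuations.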
Conference Paper
Internet censorship is pervasive across the world. However, in some countries like China, even legal, nonpolitical services (e.g., Google Scholar) are incidentally blocked by extreme censorship machinery. Therefore, properly accessing legal Internet services under extreme censorship becomes a critical problem. In this paper, we conduct a case study on how scholars from a major university of China access Google Scholar through a variety of middleware. We characterize the common solutions (including VPN, Tor, and Shadowsocks) by measuring and analyzing their performance, overhead, and robustness to censorship. Guided by the study, we deploy a novel solution (called ScholarCloud) to help Chinese scholars access Google Scholar with high performance, ease of use, and low overhead. This work provides an insider's view of China's Internet censorship and offers a legal avenue for coexistence with censorship.