Journal of Computer Virology and Hacking Techniques (2020) 16:217–227
https://doi.org/10.1007/s11416-020-00353-z
ORIGINAL PAPER
ACER: detecting Shadowsocks server based on active probe technology
Jiaxing Cheng¹ · Ying Li¹ · Cheng Huang¹ · Ailing Yu² · Tao Zhang³
Received: 26 August 2019 / Accepted: 2 March 2020 / Published online: 8 April 2020
© Springer-Verlag France SAS, part of Springer Nature 2020
Abstract
Anonymous servers, such as Tor and Shadowsocks, are created to hide information about hosts while they surf the Internet. These servers are quite difficult to identify, which gives potential criminals opportunities to commit crimes. Hackers can also use these servers to threaten public network security, for example through DDoS and phishing attacks. Hence, the study of identifying these servers is crucial. Current work on detecting Shadowsocks servers is mostly based on features of the servers' data streams combined with machine learning. However, these are passive methods because they can only be applied while the servers are in a connected state. Therefore, we propose a new system named ACER, in which AC stands for active and ER for expert, to detect these servers. In addition, we introduce the XGBoost algorithm to process the data stream and optimize detection. The method can recognize more Shadowsocks servers actively, instead of passively monitoring the communication tunnel to identify them. In our experiments, the proposed framework achieved an accuracy of 94.63%, which is 1.20% higher than that of existing solutions. We hope to provide a novel solution for researchers in this area and, at the same time, a detection scheme that network censors can use to block illegal servers.
Keywords Shadowsocks · XGBoost · Active detection · Internet censorship
1 Introduction
Internet censorship is pervasive across the world, mainly for political reasons [1]. Internet censors implement a great number of ways to identify and block access to information they deem objectionable [2]. In some countries, censorship even blocks legal Internet services; for example, the Chinese government has blocked citizens' requests to Google.¹ With the help of proxy servers such as VPN, Tor and Shadowsocks, people who want to acquire useful information can circumvent Internet censorship. However, the use of these proxies also gives hackers a chance to circumvent Internet censorship and hide their identities when carrying out illegal attacks. Under these circumstances, accurately detecting proxy connections becomes essential to safeguarding cyber security.

¹ Great Firewall, https://en.wikipedia.org/wiki/Great_Firewall.

✉ Tao Zhang
zhangtao@stars.org.cn
¹ College of Cybersecurity, Sichuan University, Chengdu 610065, China
² Department of Computer Science, Boston University, Boston, MA 02215, USA
³ The Third Research Institute of Minister of Public Security, Shanghai 201204, China
The original methods of detecting proxy servers are based on proxy types, proxy protocols, HTTP headers, etc. [3]. A common limitation of these methods is that they tend to confuse traffic in some proxy connections. Some researchers then proposed a new method that combines machine learning to detect Shadowsocks and overcomes this shortcoming. The best-performing machine learning algorithm currently used for detecting Shadowsocks is Random Forest. Detection based on network-layer features using the Random Forest algorithm has reached an accuracy of 85% [4], while detection based on flow context and host behavior features has achieved 93.43% with a method that is better suited to large-scale network environments [5].
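As a point of reference for the figures quoted above, accuracy is the fraction of samples that a classifier labels correctly. The following minimal Python sketch illustrates the computation on a toy example; the label lists and the names `accuracy`, `ACER_ACCURACY` and `BASELINE_ACCURACY` are hypothetical and introduced only for illustration, not taken from the experiments. It also shows that the 1.20% improvement stated in the abstract appears to be a difference in percentage points relative to the 93.43% reported in [5], which is our reading rather than something the text states explicitly.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for illustration only: 1 = Shadowsocks server, 0 = other.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
toy_accuracy = accuracy(y_true, y_pred)  # 6 of the 8 labels match

# Accuracy figures quoted in the text, in percent.
ACER_ACCURACY = 94.63       # proposed framework (abstract)
BASELINE_ACCURACY = 93.43   # flow context and host behavior features [5]

# Assumption: the stated margin is an absolute difference in percentage points.
margin_points = round(ACER_ACCURACY - BASELINE_ACCURACY, 2)
```

Under that assumption, `margin_points` reproduces the 1.20 figure given in the abstract.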
Most ensemble learning [6] methods use CART [7] as the base learner. The Random Forest algorithm can overfit on noisy classification or regression problems. However, the XGBoost algorithm draws on the