This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.3000959, IEEE Access
Digital Object Identifier 00.0000
Understanding the Proxy Ecosystem: A Comparative Analysis of Residential and Open Proxies on the Internet
JINCHUN CHOI (1, 2), MOHAMMED ABUHAMAD (1, 2), AHMED ABUSNAINA (2), AFSAH ANWAR (2), SULTAN ALSHAMRANI (2), JEMAN PARK (2), DAEHUN NYANG (3), AND DAVID MOHAISEN (2)
(1) Inha University, Incheon, South Korea.
(2) University of Central Florida, Orlando, Florida, USA.
(3) Ewha Womans University, Seoul, South Korea.
Corresponding authors: DaeHun Nyang and David Mohaisen (e-mail: nyang@ewha.ac.kr, mohaisen@ucf.edu).
ABSTRACT Proxy servers act as intermediaries and gateways between users and other servers on the Internet, and have many beneficial applications for user privacy, including bypassing server-side blocking and regional restrictions. Despite these benefits, proxies are also used by adversaries to hide their identity and to launch attacks. As a result, many websites restrict access from proxies, and blacklists have emerged to identify those proxies and help block them. In this work, we explore the proxy ecosystem through a comparative study of the affinities and distributions of residential and open proxies. We compare the two types in several ways: country-level and city-level analyses that highlight their geospatial distributions, and an analysis of their similarities and differences against a large number of blacklists and the categories therein (e.g., spam and maliciousness), to understand their characteristics and attributes. We conclude that, although they serve the same goal, residential and open proxies have distinct characteristics, which warrants considering them separately when studying the role they play in the larger Internet ecosystem. Moreover, we highlight the correlation between the locality distribution of proxies and five country-level characteristics, such as Internet censorship, political stability, and Gross Domestic Product (GDP).
INDEX TERMS Residential Proxy, Open Proxy, Comparative Analysis, Geospatial Analysis, Blacklisting
I. INTRODUCTION
Considerable effort has recently gone into improving the privacy of users on the Internet, building an ecosystem around privacy-enhancing infrastructure. Protecting users' privacy is an important concern in technology and business alike, and users rely on several approaches to protect their own privacy [1, 2]. For instance, proxy servers are one of the easiest ways for users to strengthen their privacy by hiding their actual Internet Protocol (IP) address [3, 4]. Proxy servers (proxies, for short) act as intermediaries that relay online communication between users and Internet services (remote servers). With a proxy, users send their requests to the proxy rather than directly to the remote server (e.g., a web server). When a proxy receives a request from a user for a particular resource, it first searches its internal cache for that resource and returns it to the user if found. Otherwise, the proxy forwards the request to the server and passes the response back to the user. Caching reduces the need for direct communication between users and remote servers, which helps prevent network bottlenecks. Moreover, because packets are sent and received through the proxy, users avoid revealing their IP addresses to the remote servers.
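The cache-then-forward behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular proxy: the `fetch_from_origin` callable is a hypothetical stand-in for the upstream request, and the cache is an unbounded in-memory dictionary with no expiry or validation.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy forward proxy: serve from cache when possible, otherwise forward upstream."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # Hypothetical upstream fetcher mapping a resource URL to its response body.
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.origin_requests = 0  # number of requests that actually reached the server

    def handle(self, url: str) -> bytes:
        # 1. Search the internal cache for the requested resource.
        if url in self._cache:
            return self._cache[url]
        # 2. On a miss, forward the request; the server sees the proxy, not the user.
        self.origin_requests += 1
        body = self._fetch(url)
        # 3. Store the response and pass it back to the user.
        self._cache[url] = body
        return body


if __name__ == "__main__":
    proxy = CachingProxy(lambda url: f"content of {url}".encode())
    proxy.handle("http://example.com/index.html")
    proxy.handle("http://example.com/index.html")  # second request served from cache
    print(proxy.origin_requests)  # 1
```

Real caching proxies also honor HTTP cache-control headers and evict stale entries; the sketch omits this to show only the lookup order.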
Besides protecting privacy, proxies can also be used to evade Internet censorship. In certain regions, users may be censored by Internet providers and/or governments. Governments of various countries monitor their networks and block access to information and sites perceived as harmful (to the public or to the government). For citizens of those countries, a proxy is one option to bypass governmental censorship and retrieve the information they seek: accessing a website through a proxy in another country, rather than directly, makes the access less likely to be detected by the Internet censors [5].
A proxy that is open to the public is called an open proxy. Without any permission from the operator, users can use open proxies to protect their privacy and to access information that is otherwise restricted by local entities. Lists of available open proxies are continuously updated and widely posted on many websites [6, 7]. This public availability, however, makes open proxies easy to blacklist. Furthermore, most open proxies have data center IP addresses, so web service providers can easily recognize whether a request comes from a proxy or directly from a user [8].
On the other hand, while open proxies allow users to hide their IP addresses and protect their privacy, a compromised or rogue proxy can act maliciously while sitting between the user and the server. In particular, when end-to-end encryption is not used, a malicious proxy can manipulate the content of the transferred data or capture confidential information meant only for the user. Since the operators and policies of open proxies are in many cases not well defined, the security threats and implications for users of such open proxies can be significant [9, 10, 11].
Another type is the residential proxy, whose providers use IP addresses assigned by regular Internet Service Providers (ISPs), which makes requests from such a proxy look more discreet. In general, residential proxies are operated in a closed fashion: only paying users are allowed to use the group of proxies owned by the operator. Open and residential proxies share similar characteristics, although residential proxies differ in how they are managed, i.e., they are “generally” closed.
Motivation. Proxies help improve the privacy of users on the Internet, yet they are often used for malicious behavior, which motivates our research. Given that open proxies operate in different settings than residential proxies, their distribution, regional background, and behavior can help characterize the proxy ecosystem and its role in network security. For example, the usage of proxies is likely to be shaped by regional policies and characteristics, and analyzing it can contribute to understanding how proxies correlate with several regional-level characteristics and attributes. In this work, we analyze the geospatial distribution of both open and residential proxies at the country and regional (city) levels to reveal location affinities and to gain insight into their correlation with different country-level policies and attributes. We highlight the distribution of blacklisted proxies and its correlation with countries' policies, performance, and Internet speed. Using 27 blacklisting services, we highlight the variety of malicious activities of blacklisted proxies. We also provide a correlation analysis between the geospatial distribution of proxies and five country-level characteristics: Internet content censorship, Internet freedom, political stability, Internet speed, and gross domestic product. Our analysis shows that 79.11% of the open proxies and 86.04% of the residential proxies are blacklisted. Moreover, we investigated the behavior of the proxies and found that 28.23% of the open proxies and 16.85% of the residential proxies were used for spam. In addition, 6.97% of the open proxies and 0.27% of the residential proxies are associated with verified attacks.
Contribution. Our main contributions are as follows:
• We investigate the geolocation distribution of a large dataset that includes 1,045,468 open proxies and 6,419,987 residential proxies. We analyze the locality of the proxies at the country, city, and autonomous system levels.
• We analyze the behavior of the proxies using 27 different blacklisting services. We show that the majority of proxies are blacklisted, and that 28.23% of open proxies and 16.85% of residential proxies are used for spam. Moreover, we investigate the proxies associated with verified attacks: our analysis shows that 6.97% of open proxies and 0.27% of residential proxies participated in launching malicious attacks.
• We conduct correlation analyses between the locality distribution of proxies and five country-level characteristics, showing that Internet speed and Gross Domestic Product (GDP) are strongly and positively correlated with the number of proxies within a country.
Organization. The rest of the paper is organized as follows. In Section II, we highlight efforts toward understanding and analyzing the behavior of Internet proxies. In Section III, we describe the dataset used in this study, its preprocessing, the geolocation distribution of the proxies, and their behavior. In Section IV, we conduct a correlation study to understand the relationship between the distribution of the proxies and five factors: censorship, Internet freedom, political stability, Internet speed, and gross domestic product. Finally, we conclude our work in Section V.
II. RELATED WORK
Several recent studies have explored the proxy ecosystem by analyzing the behavior and performance of proxies, as well as the security aspects of such services [12]. While most of these studies addressed different aspects of open proxies, few have analyzed residential proxies, owing to the challenges in identifying them. Analyzing the distribution of both open and residential proxies and its relation to regional characteristics is the main goal of this study, which fills this gap in the current literature. This section highlights efforts toward understanding and analyzing the behavior of proxies.
Open Proxy. To understand the reliability and security of open proxies, Mani et al. [13] conducted a comprehensive study using a large-scale dataset of 107,000 listed open proxies and 13 million proxy requests over a 50-day period. The authors concluded that 92% of the listed open proxies are unresponsive to proxy requests. The study also found that a substantial number of open proxies exhibit some form of malicious behavior, e.g., modifying Hypertext Markup Language (HTML) content for cryptocurrency mining (cryptojacking), launching man-in-the-middle attacks, and fetching remote access Trojans and/or other forms of malware. Tsirantonakis et al. [14] proposed a framework that collects Hypertext Transfer Protocol (HTTP) proxies from different websites,
and tests them using decoy websites (dubbed honeysites). The study implemented a content modification detection technique that detects object modifications by operating at the level of the page's Document Object Model (DOM) tree. Applying this technique to a dataset of 19,473 open proxies, the authors reported that 5.15% of the proxies perform malicious content modification or injection. They also reported that 47% of the malicious proxies inject ads, 39% inject scripts to collect user data, and 12% redirect the user to malicious websites that contain malware. Despite such risks, Perino et al. [15] showed that the number of open proxy services is increasing drastically, while only a small fraction of the available proxies actually work. In their study, Perino et al. [15] reported that around 10% of the working proxies exhibit some form of malicious behavior.
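The idea behind DOM-level modification detection, comparing the structure of a page fetched directly with the same page fetched through a proxy, can be illustrated with Python's standard `html.parser`. This is a simplified sketch, not the framework of Tsirantonakis et al. [14]: it compares only the opening tags (plus the `src` of `<script>` tags) and reports elements added by the proxy, ignoring removals and text changes. The two HTML strings stand in for responses obtained directly and via a proxy.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

Element = Tuple[str, str]  # (tag name, script src or "")


class TagCollector(HTMLParser):
    """Records opening tags in document order; keeps the src of <script> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Element] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        src = (dict(attrs).get("src") or "") if tag == "script" else ""
        self.elements.append((tag, src))


def element_sequence(html: str) -> List[Element]:
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.elements


def injected_elements(direct_html: str, proxied_html: str) -> List[Element]:
    """Elements present in the proxied copy but not in the directly fetched copy."""
    remaining = element_sequence(direct_html)
    extra: List[Element] = []
    for element in element_sequence(proxied_html):
        if element in remaining:
            remaining.remove(element)  # matched an element of the original page
        else:
            extra.append(element)      # candidate injection by the proxy
    return extra


if __name__ == "__main__":
    direct = "<html><body><p>Hello</p></body></html>"
    proxied = ('<html><body><p>Hello</p>'
               '<script src="http://ads.example/inject.js"></script></body></html>')
    print(injected_elements(direct, proxied))
    # [('script', 'http://ads.example/inject.js')]
```

Comparing against a decoy page whose content the researcher controls, as honeysites do, avoids false alarms from pages that legitimately change between fetches.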
Chung et al. [16] studied end-to-end connectivity violations by proxy services, using Luminati to detect end-to-end violations of the Domain Name System (DNS), HTTP, and Hypertext Transfer Protocol Secure (HTTPS), and to detect when a host or an ISP performs content monitoring. Using more than 1.2 million nodes across 14,000 autonomous systems covering 172 countries, they found that 4.8% of nodes are subject to some type of end-to-end connectivity violation. The reliability of proxies can also be measured by the accuracy of their advertised locations. Recent studies such as Weinberg et al. [17] have shown that some proxy providers advertise a wide range of locations, while in fact their proxies are in certain countries where server costs are cheap. Weaver et al. [18] used Netalyzr and techniques based on traceroutes of the responses to TCP connections to detect the presence of proxies.
Residential Proxy. The first study examining the behavior of residential proxies is due to Mi et al. [8], who conducted an in-depth analysis of residential proxy services and servers, covering about six million residential IP addresses across 230 countries and 52,000 Internet Service Providers (ISPs). Their findings show that even though residential proxy providers claim that the proxy hosts willingly participate in providing the service, many proxies operate on compromised hosts. They also reported Potentially Unwanted Program (PUP) logs as well as other malicious activities, such as ads, phishing, and malware hosting.
III. DATA COLLECTION AND MEASUREMENT
A. PROXY DATA COLLECTION
For open proxies, we used the dataset provided by IP2Proxy [19], which makes up a large portion of our dataset. We also searched for websites listing open proxies and regularly collected proxy IP addresses from them as of November 2019. Residential proxies are not public, so it is difficult to obtain their IP addresses in a similar way. To this end, we obtained the residential proxy dataset of Mi et al. [8], who used an infiltration framework to collect 6,419,987 residential proxies distributed across more than 230 countries and more than 52,000 ISPs. The captured IPv4 addresses acting as residential proxies were observed through five residential proxy providers between July 2017 and March 2018. Figure 1(b) shows the locality distribution of the residential proxies, and Table 1 shows the top 10 countries/regions for those proxies.

FIGURE 1: Country distribution of open and residential proxies: (a) country distribution of open proxies; (b) country distribution of residential proxies. A darker shade of blue represents more proxies residing in the country. China and the US host the largest shares of open proxies, while Turkey and India host the highest numbers of residential proxies.
Open Proxy and Residential Proxy. We collected open proxy IP addresses by searching Google for “open proxy list” and selecting sites that are updated regularly or support direct collection. Table 2 summarizes the number of open proxy IP addresses collected from each source. A large proportion of the dataset comes from IP2Proxy [19], and the total number of unique IP addresses is 1,045,468. We observe that different lists provide the same IP addresses (55,348 duplicate entries in total). For residential proxies, we obtained the residential proxy dataset from Mi et al. [8]. This dataset consists of IPv4 addresses collected between July 2017 and March 2018 and contains a total of 6,419,987 IP addresses. The two datasets also overlap: 20,816 IP addresses exist in both the open and residential proxy datasets. After collecting the datasets, we conduct a geospatial analysis to obtain the distribution of open and residential proxies, categorizing the proxies by country-, city-, and autonomous system-level locality. We start by obtaining the geolocation and Autonomous System Number (ASN) of each IP address using the IP-to-region local dataset and the MaxMind online database [24].

TABLE 1: Country/region-level distribution of open and residential proxies. China and the USA contain approximately 29% of the open proxies, while neither is among the top 10 countries/regions for residential proxies. Similarly, Turkey contains 528,032 residential proxies, but only 5,040 open proxies.

Rank  | Open proxy: Country/Region | # Proxies | % Proxies | Residential proxy: Country/Region | # Proxies | % Proxies
1     | China       | 169,431   | 16.21% | Turkey      | 528,032   | 8.22%
2     | USA         | 131,302   | 12.56% | India       | 440,215   | 6.86%
3     | Thailand    | 88,624    | 8.48%  | Ukraine     | 331,091   | 5.16%
4     | Netherlands | 68,506    | 6.55%  | UK          | 320,375   | 4.99%
5     | Indonesia   | 60,140    | 5.75%  | Russia      | 264,863   | 4.13%
6     | Russia      | 57,675    | 5.52%  | Germany     | 234,291   | 3.65%
7     | Brazil      | 57,031    | 5.46%  | Netherlands | 228,707   | 3.56%
8     | Australia   | 33,903    | 3.24%  | Australia   | 221,853   | 3.46%
9     | Taiwan      | 27,609    | 2.64%  | Canada      | 217,633   | 3.39%
10    | Uruguay     | 27,330    | 2.61%  | Brazil      | 216,989   | 3.38%
Total | Worldwide   | 1,045,468 | 100%   | Worldwide   | 6,419,987 | 100%

TABLE 2: Websites that provide open proxy lists and the number of open proxy IP addresses collected from each. Many of the collected IP addresses are duplicated across lists.

Proxy List             | # Proxies
IP2PROXY [19]          | 1,041,455
MultiProxy [20]        | 2,230
clarketm [21]          | 1,500
checkerproxy.net [22]  | 50,190
proxybroker [23]       | 5,441
Total (unique)         | 1,045,468
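The deduplication and overlap figures above reduce to simple set operations. The sketch below assumes each list is loaded as a collection of IPv4 address strings with no repeats inside a single list; the sample addresses are made-up documentation addresses, not data from this study. Applied to the counts in Table 2, the per-list totals sum to 1,100,816 entries, which is 55,348 more than the 1,045,468 unique addresses.

```python
from typing import Dict, Iterable, Set, Tuple


def merge_lists(lists: Dict[str, Iterable[str]]) -> Tuple[Set[str], int]:
    """Union several proxy lists; return the unique set and the duplicate-entry count."""
    unique: Set[str] = set()
    total_entries = 0
    for addresses in lists.values():
        entries = list(addresses)
        total_entries += len(entries)
        unique.update(entries)
    # Every entry beyond the first occurrence of an address is a duplicate.
    return unique, total_entries - len(unique)


if __name__ == "__main__":
    # Hypothetical sample lists using documentation (TEST-NET) addresses.
    open_lists = {
        "list_a": ["192.0.2.1", "192.0.2.2", "198.51.100.7"],
        "list_b": ["192.0.2.2", "203.0.113.9"],
    }
    residential = {"203.0.113.9", "203.0.113.10"}

    open_proxies, duplicates = merge_lists(open_lists)
    print(len(open_proxies), duplicates)       # 4 1
    print(sorted(open_proxies & residential))  # ['203.0.113.9']

    # The same arithmetic on the per-list counts reported in Table 2.
    table2 = [1_041_455, 2_230, 1_500, 50_190, 5_441]
    print(sum(table2) - 1_045_468)             # 55348
```

The intersection step corresponds to the 20,816 addresses that appear in both the open and residential datasets.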
B. GEOSPATIAL ANALYSIS
Country-level Distribution of Proxies. Figure 1 shows the country distribution of open and residential proxies, where a darker shade of blue indicates a higher number of proxies in the given country. Figure 1(a) shows the country-level distribution of open proxies, with China and the US accounting for a large proportion (28.77% of all open proxies combined). The distribution of residential proxies, shown in Figure 1(b), differs from that of open proxies: Turkey and India hold a large portion (15.08%), followed by Ukraine and the United Kingdom. Table 1 lists the top 10 countries/regions for open and residential proxies in our data collection. Open proxies are concentrated in the top two countries, whereas residential proxies are more dispersed across Russia, European countries, and South America. The top 10 countries account for nearly 70% of open proxies, but for less than 50% (46.8%) of residential proxies.
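The concentration comparison can be reproduced directly from the counts in Table 1. The helper below is a small illustrative sketch that divides the summed top-k counts by the worldwide total.

```python
from typing import Sequence


def top_k_share(counts: Sequence[int], total: int, k: int = 10) -> float:
    """Fraction of all proxies that fall in the k largest localities."""
    return sum(sorted(counts, reverse=True)[:k]) / total


# Top-10 country counts from Table 1.
open_top10 = [169_431, 131_302, 88_624, 68_506, 60_140,
              57_675, 57_031, 33_903, 27_609, 27_330]
residential_top10 = [528_032, 440_215, 331_091, 320_375, 264_863,
                     234_291, 228_707, 221_853, 217_633, 216_989]

print(f"{top_k_share(open_top10, 1_045_468):.1%}")         # 69.0%
print(f"{top_k_share(residential_top10, 6_419_987):.1%}")  # 46.8%
print(f"{top_k_share(open_top10, 1_045_468, k=2):.1%}")    # 28.8% (China + USA)
```

With the full per-country counts rather than only the top 10, the same function gives the top-k share for any k.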
City-level Distribution of Proxies. The city-level distributions of open and residential proxies resemble the country-level distributions, with one notable exception. Although China tops the country-level distribution of open proxies, only one Chinese city (Hangzhou) appears among the top 10 cities, as shown in Table 3. This indicates that open proxies are scattered across many cities in China; nearly 300 Chinese cities appear in our dataset. Figure 2(a) shows the city distribution of open proxies, where circles represent the number of proxies in each city and the top 10 cities are highlighted with larger, red circles. To better illustrate the distribution within cities, we restrict the view to a specific country. Figure 3(a) shows the distribution of open proxies in China: not only Hangzhou but also other Chinese cities host a large number of open proxies. The ten large circles in this figure represent the top 10 Chinese cities by number of open proxies, all of which are among the top 30 cities for open proxies overall. This indicates that open proxies in China are distributed among major cities, such as Hangzhou, Nanchang, Nanjing, Guangzhou, and Beijing.

FIGURE 2: City distribution of open and residential proxies: (a) city distribution of open proxies; (b) city distribution of residential proxies. The circle size reflects the number of proxies. In general, open and residential proxies are evenly distributed across Europe; residential proxies are particularly concentrated in Istanbul and Ankara.

TABLE 3: City-level distribution of open and residential proxies. Bangkok and Amsterdam contain approximately 13% of the open proxies. While China ranks first in the number of open proxies, only one of its cities (Hangzhou) is in the top 10, indicating that proxies are widely distributed across the country. Most of the residential proxies in Turkey reside in Istanbul and Ankara (88.66%), which rank first and third.

Rank  | Open proxy: City | # Proxies | % Proxies | Residential proxy: City | # Proxies | % Proxies
1     | Bangkok     | 68,094    | 6.51% | Istanbul    | 266,390   | 4.15%
2     | Amsterdam   | 66,908    | 6.40% | Montevideo  | 211,860   | 3.30%
3     | Jakarta     | 54,476    | 5.21% | Ankara      | 201,771   | 3.14%
4     | Sao Paulo   | 38,617    | 3.69% | Amsterdam   | 192,072   | 2.99%
5     | Brisbane    | 30,894    | 2.96% | Mexico City | 180,613   | 2.81%
6     | Montevideo  | 27,330    | 2.61% | Kiev        | 151,372   | 2.36%
7     | Nonthaburi  | 20,402    | 1.95% | Jakarta     | 147,214   | 2.29%
8     | Tehran      | 18,036    | 1.73% | Bangalore   | 138,312   | 2.15%
9     | Hangzhou    | 17,631    | 1.69% | Sao Paulo   | 130,247   | 2.03%
10    | Moscow      | 17,532    | 1.68% | Brisbane    | 113,339   | 1.77%
Total | Worldwide   | 1,045,468 | 100%  | Worldwide   | 6,419,987 | 100%
Another example is the United States, which also hosts a high percentage of open proxies. It ranks second in the country-level distribution of open proxies, yet none of its cities appears in the top 10 of the city-level distribution. This means that open proxies in the United States are spread across many cities. Figure 3(b) shows the city-level distribution of open proxies in the United States: a large number of open proxies are distributed throughout the country, especially in the densely populated areas in the east and west. Despite the large number of open proxies in the United States, only three US cities are in the top 30, ranked 17th, 23rd, and 27th.
The city-level distribution of residential proxies is more interesting. Figure 2(b) presents the city-level distribution of residential proxies. Cities in the Netherlands and other European countries show a distribution similar to that of open proxies in Figure 2(a). On the other hand, the two Turkish cities, Istanbul and Ankara, host an inconspicuous number of open proxies, yet they rank first and third, respectively, in the number of residential proxies. To learn more about Turkey's residential proxy distribution, we plot the distribution of proxies within Turkey in Figure 4(a). As mentioned earlier, Istanbul and Ankara have a large share, which may reflect the concentration of Turkey's population and economic activity in these two cities.
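The within-country concentration reported for Turkey (88.66% in the caption of Table 3) is the ratio of the two cities' counts in Table 3 to the country total in Table 1. The sketch below shows that computation, assuming per-proxy (country, city) geolocation records of the kind a geolocation lookup returns; the record format and sample values are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def city_share(records: Iterable[Tuple[str, str]], country: str, cities: Iterable[str]) -> float:
    """Share of a country's proxies located in the given cities.

    records: (country, city) pairs, one per proxy IP address (assumed format).
    """
    per_city = Counter(city for c, city in records if c == country)
    total = sum(per_city.values())
    return sum(per_city[city] for city in cities) / total if total else 0.0


if __name__ == "__main__":
    sample = [("TR", "Istanbul"), ("TR", "Ankara"), ("TR", "Izmir"), ("IN", "Bangalore")]
    print(round(city_share(sample, "TR", ["Istanbul", "Ankara"]), 3))  # 0.667

# Direct check against the published counts: Istanbul + Ankara over all of Turkey.
print(round((266_390 + 201_771) / 528_032 * 100, 2))  # 88.66
```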
We also notice that India has the second-largest number of residential proxies at the country level; four Indian cities are in the top 30 of the city-level distribution of residential proxies, and 2.15% of the world's residential proxies are located in Bangalore. Figure 4(b) shows the city-level distribution of residential proxies in India. The four cities mentioned above are represented by large circles, and the remaining proxies are widely distributed across other cities.
Distribution of proxies over ASs. We also analyzed the ASes hosting the IP addresses of open and residential proxies, as summarized in Table 4. ASN 4134, which has the largest share of open proxies, serves China, which is also the country with the largest number of open proxies. ASN 4837, which accounts for 2.42% of the open proxies, is also a Chinese AS. ASN 23969, ASN 45758, and ASN 131090, which serve Thailand, account for 4.43%, 1.89%, and 1.37% of open proxies, respectively; together they host 80,466 proxies, i.e., 91% of Thailand's open proxies. This indicates that three ASes serve most of Thailand's open proxies. In the AS-level distribution of residential proxies, ASN 47331 and ASN 9121 serve Turkey and account for 3.67% and 1.86% of all residential proxies, respectively. As noted earlier, Turkey has the largest number of residential proxies. The ASes covering India in the AS-level distribution are ASN 9829 and ASN 24560, which host more than 50% of the residential proxies located in India.
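The AS-level statements combine the per-AS counts in Table 4 with the per-country totals in Table 1. The sketch below recomputes them, under the assumption (implicit in the text) that every proxy in these ASes is located in the corresponding country.

```python
from typing import Dict


def as_share(as_counts: Dict[int, int], country_total: int) -> float:
    """Fraction of a country's proxies hosted by the listed ASes."""
    return sum(as_counts.values()) / country_total


# Open proxies in Thailand: ASNs 23969, 45758, 131090 (Table 4) vs 88,624 (Table 1).
thailand = {23969: 46_304, 45758: 19_802, 131090: 14_360}
# Residential proxies in India: ASNs 9829 and 24560 (Table 4) vs 440,215 (Table 1).
india = {9829: 150_951, 24560: 94_237}

print(sum(thailand.values()), f"{as_share(thailand, 88_624):.1%}")  # 80466 90.8%
print(f"{as_share(india, 440_215):.1%}")                             # 55.7%
```

The Thailand figure of 90.8% rounds to the 91% stated above, and the India figure confirms the "more than 50%" claim.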
C. BLACKLISTS AND MALICIOUS BEHAVIOR
Users can employ proxies to hide their identities. Although proxies are important for privacy assurance, they can also pose a challenge to web security and to administrators. Administrators need to enforce access control on their servers, knowing their customers and defending against fraudulent access [25]. Common methods for access control include manual and automated solutions. Manual blocking requires understanding the types of proxies and maintaining an up-to-date list of proxy IP addresses¹.
With this in mind, we attempt to identify the blacklisted IP addresses. To do so, we begin by collecting a list of blacklisting services; in total, we assemble 27 such services. Leveraging those services, we then categorize the blacklisted proxies based on their intent, such as spammer, zombie risk, and probable spammer. We argue that a proxy IP address involved in malicious activities will, at some point in time, be blacklisted and have its intent identified. However, proxy IP addresses are known to be dynamic: an IP address involved in an attack today may later be assigned to a harmless service. For this reason, blacklisting services allow the holder of a blacklisted IP address to appeal the listing. Taking these factors into consideration, we aim to understand the distribution of, patterns in, and associations among the blacklisted proxies. In this study, we limit ourselves to the categories that strictly identify a proxy as having been involved in spamming or attacks. In this section, we describe the blacklisting services we leverage and how we group the blacklisted proxies into classes for further analysis.

¹ Proxy IP addresses change on a daily basis.

TABLE 4: AS-level distribution of open and residential proxies. Here, ASN 4134 covers China and contains 10.92% of the open proxies. ASN 23969 is in Thailand and contains 4.43% of the open proxies. Similarly, ASN 47331 is in Turkey and covers 3.67% of the residential proxies.

Rank  | Open proxy: ASN | # Proxies | % Proxies | Residential proxy: ASN | # Proxies | % Proxies
1     | 4134      | 114,116   | 10.92%  | 47331     | 235,474   | 3.67%
2     | 23969     | 46,304    | 4.43%   | 8151      | 154,318   | 2.40%
3     | 7713      | 39,158    | 3.75%   | 9829      | 150,951   | 2.35%
4     | 209       | 30,599    | 2.93%   | 9121      | 119,218   | 1.86%
5     | 4837      | 25,284    | 2.42%   | 7713      | 102,055   | 1.59%
6     | 3462      | 25,041    | 2.40%   | 3320      | 98,045    | 1.53%
7     | 45758     | 19,802    | 1.89%   | 24560     | 94,237    | 1.47%
8     | 14061     | 18,898    | 1.81%   | 25019     | 81,128    | 1.26%
9     | 8048      | 15,379    | 1.47%   | 2856      | 78,754    | 1.23%
10    | 131090    | 14,360    | 1.37%   | 12389     | 75,589    | 1.18%
Total | Worldwide | 1,045,468 | 100.00% | Worldwide | 6,419,987 | 100.00%

FIGURE 3: The city-level distribution of open proxies: (a) across China; (b) across the United States. The circle size reflects the number of proxies.
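Grouping blacklisted proxies into classes and computing per-class percentages (such as the share of proxies used for spam) can be sketched as below. The category labels and the label-to-class mapping are hypothetical placeholders; each blacklisting service uses its own labels and return codes. An address is counted once per class even if several services flag it, and labels outside the studied categories are ignored.

```python
from typing import Dict, Iterable, Mapping, Set

# Hypothetical mapping from service-specific labels to the classes used in the analysis.
LABEL_TO_CLASS = {
    "spammer": "spam",
    "probable spammer": "spam",
    "auth attack": "attack",
}


def class_shares(labels_by_ip: Mapping[str, Iterable[str]], total_proxies: int) -> Dict[str, float]:
    """Fraction of all proxies that carry at least one label in each class."""
    ips_per_class: Dict[str, Set[str]] = {}
    for ip, labels in labels_by_ip.items():
        for label in labels:
            cls = LABEL_TO_CLASS.get(label)
            if cls is not None:  # ignore labels outside the studied categories
                ips_per_class.setdefault(cls, set()).add(ip)
    return {cls: len(ips) / total_proxies for cls, ips in ips_per_class.items()}


if __name__ == "__main__":
    flagged = {
        "192.0.2.1": ["spammer", "probable spammer"],  # counted once for "spam"
        "192.0.2.2": ["auth attack"],
        "192.0.2.3": ["dialup"],                       # not a studied category
    }
    print(class_shares(flagged, total_proxies=4))  # {'spam': 0.25, 'attack': 0.25}
```

The denominator is the full proxy population (e.g., all 1,045,468 open proxies), not only the blacklisted ones.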
Blacklist Services. To help users identify the different intents of proxies, multiple online services make their lists of blacklisted proxies public and classify the listed IP addresses according to the threat they pose to a destination web service. For example, the Realtime Blackhole List all.spam-rbl.fr classifies proxies into categories such as spammer and zombie risk. These services also update their lists frequently; e.g., all.spam-rbl.fr updates its list 10 times a day.
Realtime Blackhole List (RBL). RBL maintains lists of IP addresses that are likely to be used for spam, with separate lists depending on the source. We use the list that contains all listed IP addresses, and we identify the intent of each proxy from the return code that the service's API gives in response to our query for that proxy.
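Lists of this kind are commonly queried over DNS, following the DNSBL convention described in RFC 5782: the octets of the IPv4 address are reversed, the list's zone is appended, and an A-record answer (typically an address in 127.0.0.0/8) means the address is listed, while a non-existent-domain answer means it is not. The sketch below assumes that convention and uses the `all.spam-rbl.fr` zone named above only as an example. The meaning of each return code is service-specific, so the code returns the raw answer rather than guessing a category; the lookup itself requires network access.

```python
import ipaddress
import socket
from typing import Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name, e.g. 192.0.2.1 -> 1.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # validates the address
    return ".".join(reversed(octets)) + "." + zone


def dnsbl_lookup(ip: str, zone: str = "all.spam-rbl.fr") -> Optional[str]:
    """Return the list's answer (e.g. '127.0.0.2') if listed, or None otherwise."""
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN means "not listed"; note that resolver failures also land here.
        return None


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.1", "all.spam-rbl.fr"))  # 1.2.0.192.all.spam-rbl.fr
```

Many DNSBL operators rate-limit or block queries from large public resolvers, so a bulk measurement would typically go through a local resolver and respect each service's usage policy.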
Spamrats. Spamrats maintains multiple APIs based on the intent of the source, each maintaining a set of blacklisted IP addresses. Among these APIs, we use those that store IP addresses shown to be involved in spamming attacks or AUTH attacks. In an AUTH attack, a malicious user tries to authenticate with credentials obtained from breaches; such attacks particularly target users who reuse their credentials across different services.
Weighted Private Block List (WPBL). WPBL detects spam passively, with no crowd-sourced or manual additions. It also suggests securing the host and fixing misconfigurations to eliminate spam, and provides a lookup facility to help users de-list themselves.
Uceprotect. Uceprotect maintains APIs that list IP addresses with wrong, missing, or generic reverse DNS (PTR) records; dial-up connections (typically suggesting a home or other user with a dynamic connection); computers with exploited or exploitable security holes (e.g., open proxies, open relays, vulnerable web servers, virus-infected hosts); or addresses assigned to well-known spammers. We limit ourselves to the proxies that the service lists as known spam sources.
(a) City distribution of Residential Proxies in Turkey.
(b) City distribution of Residential Proxies in India.
FIGURE 4: City distribution of residential proxies in Turkey and India. The circle size reflects the number of proxies.
Justspam. Justspam checks whether an IP is listed by other well-known or independent blacklisting services, which it claims is a safe way to prevent false positives.
Sorbs.net. Sorbs.net maintains multiple APIs with lists of IPs by intent, such as open HTTP proxy servers, IPs with vulnerabilities abusable by spammers, known spam sources (last 48 hours, 28 days, one year, or anytime), hijacked networks, etc. It also lists service providers that support spammers on a “third strike and you are out” basis. We limit ourselves to the lists that include spam and attack sources.
Junkemailfilter. Junkemailfilter maintains blacklist, yellowlist, brownlist, and whitelist IP address lists. We limit ourselves to the blacklist.
Korea services. This service lists most IP address ranges (network addresses) assigned to Korea by APNIC, as well as any older ARIN ranges with a history of spam.
Spamhaus. This popular service lists verified spammers, including those in its Register of Known Spam Operations (ROKSO), as well as addresses associated with illegal third-party exploits, worms, and trojan horses.
DBUDB.com. IPs are added to the DBUDB.com database automatically, with no provision for manual additions. An IP address is added when the events recorded for that IPv4 address substantially indicate that the message content was spam, a scam, a virus, or other malware. IPs are added within 10 minutes of an outbreak; data is collected in real time and the zone is updated every 10 minutes.
Limitations. The residential proxies dataset was collected between July 2017 and March 2018, while our blacklisting analysis was conducted in 2019. This gap between observation and analysis could introduce false alarms in the number of blacklisted residential proxies, since an IP address could be associated with a residential proxy for a specific period of time (e.g., during data collection) and only later be associated with malicious activities (e.g., during the blacklisting analysis). The reported results do not account for this scenario, because we could not determine the period during which each IP acted as a residential proxy, and the blacklisting services do not provide the date on which an IP address was added to a given blacklist. Given the large scale of the proxy dataset used in this study, consisting of 1,045,468 open proxies and 6,419,987 residential proxies (7,465,455 proxies in total), we expect the impact of this limitation to be less pronounced, and the analysis therefore still provides insights into the general behavior of the proxy ecosystem.
Country-level Analysis. Leveraging the blacklist services, we check whether an open or residential proxy is present in any of the above blacklists. For the listed proxies, we then check whether each is a proven spam source, whether it has been shown to be involved in an attack, and whether it has a vulnerability that can be exploited for future spam activities. Table 5 shows the results for open proxies. China has the highest number of IPs included in the blacklisting services, i.e., 94.24% of all the open proxies in the country. It also has the highest number of proxies shown to be involved in spam activities and attacks, and ranks second by the number of vulnerable sources, although less than one percent of its proxies are vulnerable. On the other hand, Iran ranks tenth among the most blacklisted source countries, with 93% of its open proxies blacklisted, but ranks sixth by the number of both attack sources and vulnerable sources. Other noteworthy countries and regions are Thailand and Taiwan, with 99.42% and 98.37% of their open proxies blacklisted, respectively. Conversely, although the USA ranks third by the number of blacklisted open proxies, these represent only 55.5% of its open proxies, making it the least blacklisted country by percentage among the top 10 in Table 5. Turning to residential proxies, the analysis in Table 6 reveals that every country (except Ukraine, Uruguay, and Australia) in
TABLE 5: Country/region-level distribution of blacklisted open proxies. Percentages are relative to the total number of open proxies within each country/region. As shown, 99.42% of the proxies in Thailand are blacklisted. China and Thailand together contain 29.96% of the blacklisted open proxies worldwide.
Rank Country/Region Blacklisted Spam Attack Vulnerable
# % # % # % # %
1 China 159,681 94.24% 67,040 39.57% 28,495 16.82% 1,613 0.95%
2 Thailand 88,115 99.42% 23,592 26.62% 8,844 9.98% 666 0.75%
3 USA 72,879 55.50% 18,955 14.44% 3,952 3.01% 726 0.55%
4 Indonesia 54,191 90.11% 14,318 23.81% 1,505 2.50% 3,121 5.19%
5 Brazil 47,671 83.58% 16,093 28.22% 4,743 8.32% 1,306 2.29%
6 Russia 44,277 76.77% 15,408 26.72% 2,705 4.69% 1,403 2.43%
7 Netherlands 43,310 63.22% 22,037 32.17% 3,913 5.71% 1,201 1.75%
8 Taiwan 27,160 98.37% 5,174 18.74% 351 1.27% 94 0.34%
9 Australia 26,402 77.87% 13,446 39.66% 2,656 7.83% 1,574 4.64%
10 Iran 23,189 92.98% 10,940 43.87% 2,999 12.02% 1,218 4.88%
Total Worldwide 827,106 79.11% 295,152 28.23% 72,914 6.97% 21,035 2.01%
TABLE 6: Country-level distribution of blacklisted residential proxies. Percentages are relative to the total number of residential proxies within each country. Turkey contains 9.05% of the blacklisted residential proxies, with a blacklisting rate of 97.68%.
Rank Country/Region Blacklisted Spam Attack Vulnerable
# % # % # % # %
1 Turkey 515,767 97.68% 39,424 7.47% 191 0.04% 4,204 0.80%
2 Indonesia 432,780 98.31% 152,261 34.59% 603 0.14% 30,025 6.82%
3 UK 289,049 90.22% 11,608 3.62% 420 0.13% 635 0.20%
4 Ukraine 271,088 81.88% 33,759 10.20% 330 0.10% 5,289 1.60%
5 Russia 239,733 90.51% 47,501 17.93% 522 0.20% 7,229 2.73%
6 Germany 224,281 95.73% 17,365 7.41% 74 0.03% 1,035 0.44%
7 Mexico 203,584 99.29% 27,650 13.49% 60 0.03% 3,668 1.79%
8 Brazil 198,764 91.60% 19,571 9.02% 743 0.34% 3,539 1.63%
9 Uruguay 190,228 89.79% 25,587 12.08% 792 0.37% 3,608 1.70%
10 Australia 175,270 79.00% 45,465 20.49% 822 0.37% 11,965 5.39%
Total Worldwide 5,700,244 86.04% 1,081,779 16.85% 17,596 0.27% 165,328 2.58%
the top 10 countries with the highest number of residential IP addresses has more than 90% of its IPs blacklisted by one or more of the services, with Turkey, Indonesia, Germany, and Mexico exceeding 95%.
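The aggregation behind Tables 5 and 6 can be sketched as follows, assuming each proxy has already been resolved to a country and to a set of intent labels from the blacklist lookups; the input format and label names are hypothetical. Every percentage is taken over all proxies in the country, which is how the table rates are computed.

```python
from collections import Counter, defaultdict

# Column names are illustrative; they mirror the Spam / Attack / Vulnerable
# columns of Tables 5 and 6.
CATEGORIES = ("spam", "attack", "vulnerable")


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


def summarize(proxies):
    """Aggregate per-proxy blacklist labels into per-country table rows.

    `proxies` yields (country, labels) pairs, where `labels` is the set of
    intents reported by the blacklist lookups. An empty set means the proxy
    was not found on any list; any non-empty set counts as blacklisted.
    """
    totals = Counter()
    counts = defaultdict(Counter)
    for country, labels in proxies:
        totals[country] += 1
        if labels:
            counts[country]["blacklisted"] += 1
        for category in CATEGORIES:
            if category in labels:
                counts[country][category] += 1

    # Percentages are relative to all proxies in the country, so category
    # columns may overlap and need not add up to the blacklisted column.
    return {
        country: {
            column: (counts[country][column], percent(counts[country][column], total))
            for column in ("blacklisted",) + CATEGORIES
        }
        for country, total in totals.items()
    }


if __name__ == "__main__":
    # Share of worldwide blacklisted open proxies held by China and Thailand,
    # using the counts reported in Table 5.
    print(percent(159_681 + 88_115, 827_106))  # 29.96
```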
We also observe four countries (Indonesia, Russia, Brazil, and Australia) in the top 10 for both blacklisted open and residential proxies. In Mexico, 99.3% of residential IP addresses are blacklisted by at least one blacklisting service, with 13.5% of its IP addresses flagged for spam activities, 0.03% for launching attacks, and 1.8% for being vulnerable to future spam activities. Among the countries in Table 6, Indonesia and Australia have the most vulnerabilities that may lead to their involvement in future spam activities. Moreover, India, Vietnam, and Korea are the top three countries with the highest number of residential IPs with proven spam activities. Thailand, Vietnam, and Mauritius are the top three countries by representation of proven attacks, and India, Indonesia, and Australia are the top three countries with the most vulnerable IP addresses. India has the most residential IPs involved in spam and the highest number of vulnerable IPs that could be exploited for spam activities in the future.
City-level Analysis. Table 7 shows the top 10 cities with blacklisted open proxies. While Bangkok has the highest number of blacklisted open proxies, Bangkok, Nonthaburi, Hangzhou, Nanchang, and Nanjing all have more than 99% of their open proxies blacklisted. However, only Nanjing has over 90% of its proxies involved in proven spam activity. In contrast, although 99.3% of the open proxies in Bangkok are blacklisted by the aforementioned services, only 28.74% of them are proven to carry out spam activities. Moreover, 23.56% of Hangzhou's open proxies are involved in attacks (second only to Nanjing's 37.77%), and around 47% are involved in spam activities. On the other hand, Table 8 shows the cities with the most blacklisted residential IPs. All of them except Montevideo, Amsterdam, and Kiev have more than 90% of their residential IPs blacklisted, and five have more than 95% blacklisted. Interestingly, Bengaluru (Bangalore), India has 99.91% of its residential IP addresses blacklisted by one or more of the services; more than 41% of its residential IPs (the highest among these cities) are proven to be used for spam activities, and more than 7% are vulnerable to future spam campaigns. Additionally, Brisbane has more than
TABLE 7: City-level distribution of blacklisted open proxies. Percentages are relative to the total number of open proxies within each city. Bangkok contains 8.18% of the blacklisted open proxies, with a blacklisting rate of 99.30%. Note that half of the cities in this list have a blacklisting rate higher than 99%.
Rank City Blacklisted Spam Attack Vulnerable
# % # % # % # %
1 Bangkok 67,620 99.30% 19,572 28.74% 7,790 11.44% 595 0.87%
2 Jakarta 49,607 91.06% 12,171 22.34% 1,192 2.19% 2,640 4.85%
3 Amsterdam 42,477 63.49% 21,694 32.42% 3,839 5.74% 1,190 1.78%
4 Sao Paulo 32,364 83.81% 11,536 29.87% 3,314 8.58% 887 2.30%
5 Brisbane 24,734 80.06% 12,847 41.58% 2,550 8.25% 1,536 4.97%
6 Montevideo 21,687 79.35% 5,468 20.01% 1,299 4.75% 424 1.55%
7 Nonthaburi 20,396 99.97% 3,963 19.42% 1,043 5.11% 54 0.26%
8 Hangzhou 17,578 99.70% 8,288 47.01% 4,154 23.56% 169 0.96%
9 Nanchang 17,330 99.98% 4,400 25.39% 2,377 13.71% 108 0.62%
10 Nanjing 16,904 99.86% 15,270 90.21% 6,393 37.77% 289 1.71%
Total Worldwide 827,106 79.11% 295,152 28.23% 72,914 6.97% 21,035 2.01%
TABLE 8: City-level distribution of blacklisted residential proxies. Percentages are relative to the total number of residential proxies within each city. Istanbul and Ankara top the list, both with a blacklisting rate of more than 97%. As shown, Mexico City is ranked fourth, with 99.75% of its residential proxies blacklisted.
Rank City Blacklisted Spam Attack Vulnerable
# % # % # % # %
1 Istanbul 259,481 97.41% 21,665 8.13% 94 0.04% 2,381 0.89%
2 Ankara 197,607 97.94% 12,767 6.33% 79 0.04% 1,015 0.50%
3 Montevideo 190,228 89.79% 25,587 12.08% 792 0.37% 3,608 1.70%
4 Mexico City 180,158 99.75% 25,798 14.28% 47 0.03% 3,343 1.85%
5 Jakarta 145,197 98.63% 41,965 28.51% 251 0.17% 13,153 8.93%
6 Amsterdam 144,624 75.30% 40,241 20.95% 917 0.48% 7,096 3.69%
7 Bangalore 138,194 99.91% 56,800 41.07% 151 0.11% 9,855 7.13%
8 Kiev 130,989 86.53% 11,024 7.28% 125 0.08% 1,057 0.70%
9 Sao Paulo 117,910 90.53% 12,468 9.57% 543 0.42% 2,317 1.78%
10 Brisbane 105,951 93.48% 43,865 38.70% 809 0.71% 11,838 10.44%
Total Worldwide 5,700,244 86.04% 1,081,779 16.85% 17,596 0.27% 165,328 2.58%
10.4% of its residential IPs vulnerable, with around 39% of its IPs being proven spam sources. If exploited, these vulnerable residential IPs could make Brisbane the world's next leading spam-source city.
ASN-level Analysis. Similarly, as shown in Table 9, some ASNs (e.g., 4134) have all of their open proxies blacklisted, yet only 47.66% of those proxies are proven to be involved in spam activities and 21.95% in attacks. This may be because some blacklisting services, such as Uceprotect, blacklist all the IP addresses belonging to the worst-performing ASNs, which may also explain the 100% blacklisting of open proxies belonging to ASNs 4837 and 45758. Notice also that all but three ASNs (209, 3462, and 28573) have a blacklisting rate greater than 99%. Additionally, ASN 131090 has almost 32% of its open proxies involved in attacks, and more than 5% of the open proxies in ASN 7713 are vulnerable to future spam activities. Table 10 shows the top 10 ASNs with the most blacklisted residential IPs. Residential proxies follow trends very similar to open proxies: all the ASNs in the table have a blacklisting rate above 90%, and seven out of ten have more than 99% of their IPs blacklisted. Additionally, ASN 24560, with 99.96% blacklisting, has 32.5% of its residential IPs proven to be involved in spamming and 4% vulnerable to future spam campaigns. A common characteristic of the residential IPs is their low proven attack rate (0.27% worldwide), despite the large number of residential IPs in our dataset and their substantial representation in spam activities.
Takeaways. Although the United States has the third largest number of blacklisted open proxies, these represent only 55.50% of all of its open proxies, the lowest blacklisting percentage among the top 10 countries in Table 5. We also observed that, among the countries in Table 6, Indonesia and Australia have the highest number of vulnerable residential proxies, which may lead to them being used for spamming and malicious activities in the future. Both countries also have a high percentage of residential proxies involved in spamming: 34.59% for Indonesia and 20.49% for Australia. Of the open proxies in Nanjing, 99.86% are blacklisted and more than 90% have been involved in proven spam activities, highlighting a possible geographical concentration of malicious efforts. In addition, several cities and ASes have proxy blacklisting
TABLE 9: AS-level distribution of blacklisted open proxies. Percentages are relative to the total number of open proxies within each AS. As shown, multiple ASes have a blacklisting rate of 100%; for instance, ASN 4134 contains 13.80% of the blacklisted open proxies. Note that most of the reported ASNs are in China and Thailand.
Rank ASN Blacklisted Spam Attack Vulnerable
# % # % # % # %
1 4134 114,116 100.00% 54,382 47.66% 25,052 21.95% 1211 1.06%
2 23969 46,241 99.86% 11,006 23.77% 3,624 7.83% 279 0.60%
3 7713 38,799 99.08% 6,439 16.44% 185 0.47% 1,998 5.10%
4 209 25,903 84.65% 342 1.12% 42 0.14% 20 0.07%
5 4837 25,284 100.00% 6,073 24.02% 1,642 6.49% 172 0.68%
6 3462 24,754 98.85% 4,836 19.31% 268 1.07% 65 0.26%
7 45758 19,802 100.00% 3,907 19.73% 1,037 5.24% 50 0.25%
8 8048 15,329 99.67% 1,795 11.67% 28 0.18% 63 0.41%
9 131090 14,357 99.98% 8,036 55.96% 4,548 31.67% 188 1.31%
10 28573 13,380 97.95% 843 6.17% 133 0.97% 99 0.72%
Total Worldwide 827,106 79.11% 295,152 28.23% 72,914 6.97% 21,035 2.01%
TABLE 10: AS-level distribution of blacklisted residential proxies. Percentages are relative to the total number of residential proxies within each AS. Here, ASN 47331 contains 4.09% of the blacklisted residential proxies, with a blacklisting rate of 99.04%.
Rank ASN Blacklisted Spam Attack Vulnerable
# % # % # % # %
1 47331 233,215 99.04% 14,901 6.33% 85 0.04% 482 0.20%
2 8151 154,078 99.84% 8,162 5.29% 17 0.01% 493 0.32%
3 9829 150,576 99.75% 50,889 33.71% 165 0.11% 8,310 5.51%
4 9121 117,330 98.42% 8,104 6.80% 49 0.04% 1,177 0.99%
5 7713 101,337 99.30% 16,496 16.16% 67 0.07% 5,931 5.81%
6 3320 97,410 99.35% 1,297 1.32% 10 0.01% 139 0.14%
7 24560 94,204 99.96% 30,618 32.49% 89 0.09% 3,731 3.96%
8 25019 80,683 99.45% 13,462 16.59% 46 0.06% 2,317 2.86%
9 12389 74,824 98.99% 11,216 14.84% 146 0.19% 1,618 2.14%
10 2856 74,274 94.31% 1,761 2.24% 30 0.04% 39 0.05%
Total Worldwide 5,700,244 86.04% 1,081,779 16.85% 17,596 0.27% 165,328 2.58%
ratios higher than 99%, suggesting possible region-wide blacklisting behavior.
IV. DATA ANALYSIS
A. LOCALITY CHARACTERIZATION
This study highlights the distribution of Internet proxies across countries and cities around the globe. We aim to characterize the relationship between this locality distribution and country characteristics in terms of performance, policies, and political stability. In particular, we study the correlation between the proxy locality distribution and five characteristics, namely censorship, Internet freedom (best and worst), political stability, Internet speed, and GDP. We report the correlation using three measures: Pearson, Spearman, and Kendall's tau.
B. CORRELATION MEASURES
Correlation describes the relationship between two features in a given dataset, capturing both its strength and its direction (i.e., whether the features are positively or negatively related). The correlation coefficient takes values between -1 and +1: values close to +1 or -1 indicate a strong positive or negative correlation, respectively, while a value around 0 indicates no linear or monotonic association (depending on the coefficient) between the features. Three correlation coefficients are commonly used to measure such relationships: Pearson, Spearman, and Kendall.
Pearson Correlation. Pearson correlation measures the strength of the linear relationship between two features and is computed using the following formula:
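$$
r_{xy} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}} \tag{1}
$$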
where $r_{xy}$ denotes the Pearson correlation coefficient between feature $x$ and feature $y$, $n$ represents the number of samples in the dataset, and $x_i$ and $y_i$ are the values of $x$ and $y$ for the $i$-th sample.
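Equation (1) can be transcribed directly into Python. The sketch below is a plain standard-library illustration, not the authors' implementation; raising an error for a constant feature is an assumption of the sketch.

```python
import math


def pearson(x, y):
    """Pearson's r computed with the raw-sums form of Equation (1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("r is undefined when a feature is constant")
    return numerator / denominator


if __name__ == "__main__":
    print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, approximately 1.0
```

The raw-sums form mirrors Equation (1) but can lose precision when values are large; subtracting the means first is numerically safer and gives the same coefficient.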
FIGURE 5: The correlation values of the number of proxies and the ranking of the country. C: censorship, IFB: Internet freedom
(best 30), IFW: Internet freedom (worst 30), PS: political stability, IS: Internet speed, GDP: Gross domestic product, All: open and
residential proxies, OP: open proxies, RP: residential proxies, BL: blacklisted proxies, BL-OP: blacklisted open proxies, BL-RP:
blacklisted residential proxies.
Spearman Rank Correlation. Spearman correlation equals the Pearson correlation between the rank values of the two features. While Pearson's correlation measures linear relationships, Spearman's correlation measures monotonic relationships, whether linear or non-linear. The Spearman rank correlation is calculated as follows:
$$
\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \tag{2}
$$
where $\rho$ denotes the Spearman rank correlation, $d_i$ represents the difference between the ranks of the $i$-th sample in the two features, and $n$ represents the number of samples. Equation (2) holds when there are no tied ranks.
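A minimal Python sketch of Equation (2), assuming no tied values; the `ranks` helper rejects ties rather than averaging them.

```python
def ranks(values):
    """1-based ranks of `values`; Equation (2) assumes there are no ties."""
    if len(set(values)) != len(values):
        raise ValueError("Equation (2) assumes no tied values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's rho via Equation (2): 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # y = x**3 is non-linear but monotonic, so rho is exactly 1.
    print(spearman([1, 2, 3, 4], [1, 8, 27, 64]))  # 1.0
```

With ties, the usual approach is to assign average ranks and compute the Pearson correlation of the ranks, which is the definition given above.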
Kendall Rank Correlation. Kendall correlation is a non-parametric measure of the strength of dependence between two features. It is used as an alternative to Pearson's (parametric) correlation when the data fail one or more of that test's assumptions, or when the sample is small and has many tied ranks. Kendall's rank correlation is calculated as follows:
$$
\tau = \frac{n_c - n_d}{n(n-1)/2} \tag{3}
$$
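where $n_c$ and $n_d$ denote the numbers of concordant and discordant pairs, respectively: a pair of samples $(i, j)$ is concordant if $x$ and $y$ order the two samples the same way, and discordant if they order them oppositely.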
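Equation (3) can be computed by enumerating all $n(n-1)/2$ pairs, as in the O(n²) Python sketch below. Pairs tied in either feature count as neither concordant nor discordant, which corresponds to the tau-a variant; libraries such as SciPy default to tau-b, which adjusts the denominator for ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau as in Equation (3): (n_c - n_d) / (n(n-1)/2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both features order samples i and j the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 is a tie in x or y and is counted in neither group.
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # 9 concordant and 6 discordant pairs out of 15.
    print(kendall_tau_a([1, 2, 3, 4, 5, 6], [50, 10, 40, 20, 30, 1000]))  # 0.2
```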
C. DATA HANDLING AND PREPROCESSING
Since the range of values in the data varies widely, some of the measurements may not work properly without normalization. For example, if one of the features has a wide range of values, this may cause some of the statistical measures to fail. Therefore, all features should be normalized to the same range so that each feature contributes approximately proportionately to the final result.
Normalization. Data normalization scales a set of independent values into a predefined range, most often from 0 to 1, without distorting differences in the ranges of values. Many functions can perform such scaling, such as min-max normalization and z-score normalization. In our measurements, we use min-max normalization to rescale the data to the range [0, 1] using the following formula:
$$
x_{\text{new}} = \frac{x_{\text{old}} - x_{\min}}{x_{\max} - x_{\min}} \tag{4}
$$
where $x_{\text{old}}$ is the original value of $x$, $x_{\text{new}}$ is its normalized value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values in the given dataset.
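Equation (4) in Python, as a minimal sketch; raising an error for a constant feature (where $x_{\max} = x_{\min}$ and the division is undefined) is an assumption of this sketch rather than a step described in the study.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] with Equation (4)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("min-max normalization is undefined for a constant feature")
    span = x_max - x_min
    return [(x - x_min) / span for x in values]


if __name__ == "__main__":
    print(min_max_normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```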
Discretization. Discretization is the process of mapping continuous values to a predefined set of interval labels. In this study, we map the continuous data to five discrete values, from 0.2 to 1.0 in steps of 0.2, each representing the upper end of the interval in which a value falls; e.g., values within the ranges [0, 0.2] and (0.2, 0.4] are assigned the values 0.2 and 0.4, respectively, and so on.
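The interval mapping can be sketched with `bisect` from the Python standard library. The `BIN_UPPER_BOUNDS` tuple encodes the five right-closed intervals described above; rejecting values outside [0, 1] is an assumption of this sketch.

```python
from bisect import bisect_left

# Upper ends of the intervals [0, 0.2], (0.2, 0.4], (0.4, 0.6], (0.6, 0.8], (0.8, 1.0].
BIN_UPPER_BOUNDS = (0.2, 0.4, 0.6, 0.8, 1.0)


def discretize(value):
    """Map a normalized value in [0, 1] to the upper end of its interval."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("expected a min-max normalized value in [0, 1]")
    # bisect_left finds the first bound >= value, so a value equal to a
    # bound stays in the interval that the bound closes.
    return BIN_UPPER_BOUNDS[bisect_left(BIN_UPPER_BOUNDS, value)]


if __name__ == "__main__":
    print([discretize(v) for v in (0.0, 0.2, 0.25, 0.61, 1.0)])  # [0.2, 0.2, 0.4, 0.8, 1.0]
```

Because the intervals are right-closed, a value exactly equal to a bound (e.g., 0.4) maps to that bound; values produced by floating-point arithmetic can land slightly above a bound and move to the next interval.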
D. PROXY ANALYSIS
Censorship. Tech.co [26], a media resource for tech news and product reviews, has published a list of the 30 most Internet-censored countries based on monitoring policies and people's exposure to Internet content and privacy tools (e.g., VPNs). It lists Turkmenistan, North Korea, China, Eritrea, and Iran as the five most Internet-censored countries. We study the correlation between countries' policies on Internet censorship and the locality distribution of proxies. Figure 5 shows a strong positive correlation between censorship and the number of open proxies within the 30 most Internet-censored countries in our dataset. Since China hosts 16.21% of all the open proxies in the dataset, this correlation might be driven by that concentration. In general, censorship shows little correlation with the distribution of proxies: the Pearson correlation coefficient is 0.21 for the total distribution of proxies.
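To illustrate how a single dominant country can drive a Pearson correlation, the sketch below uses hypothetical toy data (not values from this study): six countries with a censorship score and a proxy count, where the last country holds most of the proxies. Spearman's coefficient, which depends only on ranks, is much less affected.

```python
import math


def pearson(x, y):
    """Mean-centred Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank(values):
    """1-based ranks; the toy data has no ties, so no tie averaging is done."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical toy data (NOT from the study): censorship score and proxy count
# per country; the last country dominates the proxy count.
censorship = [1, 2, 3, 4, 5, 6]
proxies = [50, 10, 40, 20, 30, 1000]

if __name__ == "__main__":
    print(round(pearson(censorship[:-1], proxies[:-1]), 3))  # -0.3 without the dominant country
    print(round(pearson(censorship, proxies), 3))  # 0.646
    print(round(spearman(censorship, proxies), 3))  # 0.257
```

Removing the dominant country makes Pearson's r negative, while adding it flips r to a moderately strong positive value; Spearman's rho stays weak in both cases.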
Internet Freedom. According to the Freedom on the Net 2019 report [27], Iceland, Estonia, Canada, Germany, and the United States rank highest in Internet freedom, while China, Iran, Syria, Cuba, and Vietnam are perceived
as the worst. The report is based on an assessment by 70 analysts using 21 questions that address Internet access, freedom of expression, and other privacy aspects. We obtained the complete ranking of countries by Internet freedom. To examine the relationship between Internet freedom and the locality distribution of proxies, we measured the correlation between the number of proxies and the rankings of the 30 best and the 30 worst countries in Internet freedom. Figure 5 shows that there is no correlation between Internet freedom and the distribution of proxies.
Political Stability. To explore the correlation between the distribution of proxies and countries' political stability, we obtained the full ranking of countries by political stability from the World Bank. The World Bank, one of the largest sources of funding and knowledge for developing countries, ranks countries by a political stability index ranging from -3 (weak) to 2.5 (strong). The highest score is assigned to Monaco (1.61 points), while the lowest is assigned to Yemen (-3 points). We observe that the data source assigns Yemen the score of -3 as a sign of either data unavailability or severe political instability due to the ongoing war in the region. The results in Figure 5 show no correlation between countries' political stability and the distribution of proxies.
Internet Speed. We explore the correlation between Internet speed and the distribution of proxies, using the ranking of countries by Internet speed from Speedtest by Ookla [28]. We observe a positive correlation between Internet speed and the distribution of proxies in general, and this positive correlation is also observed for blacklisted residential proxies. This is in line with the intuition that proxies tend to be located in countries with faster Internet connections.
GDP. Finally, we explore the correlation between GDP and the distribution of proxies. GDP is a monetary indicator that measures the market value of all goods and services produced in a certain time period. We obtained the countries' GDP ranking data from the International Monetary Fund [29] for the year 2020. The analysis shows a strong positive correlation between GDP and proxy localities, especially for residential proxies. This may be because countries with higher GDP often maintain the operational infrastructure needed to host Internet proxies.
V. CONCLUSION
Internet proxies act as intermediaries and gateways between users and servers, and are often used to protect users' privacy and hide their identity. Proxies are also used to circumvent policy-enforced regional restrictions on Internet access, allowing users to use the Internet freely. However, adversaries may use them to launch attacks, collect users' data, and inject ads and files. In this study, we highlight this through a comprehensive study of two types of proxies, namely open and residential proxies. By studying a dataset of 1,045,468 open proxies and 6,419,987 residential proxies, we found that 79.11% of the open proxies are blacklisted by different blacklisting services, with 28.23% labeled as spam proxies and 6.97% labeled as proxies used to launch attacks. Similarly, our analysis shows that 86.04% of the residential proxies are blacklisted despite their efforts to hide their identity, with 16.85% labeled as spam and 0.27% associated with adversarial attacks. Further, we found that the distribution of proxies is positively correlated with GDP and Internet speed at the country level. While Internet proxies are considered a privacy-preserving way to access the Internet, this study, along with several studies in the literature, highlights the malicious use of proxies and the risks of using them.
Acknowledgement. This work is supported by the National
Research Foundation under Grant NRF-2016K1A1A291275.
REFERENCES
[1] M. Mukherjee, R. Matam, L. Shu, L. A. Maglaras, M. A. Ferrag, N. Choudhury, and V. Kumar, “Security and privacy in fog computing: Challenges,” IEEE Access, vol. 5, pp. 19293–19304, 2017.
[2] S. Yu, “Big privacy: Challenges and opportunities of privacy study in the age of big data,” IEEE Access, vol. 4, pp. 2751–2763, 2016.
[3] Avast, “What is a proxy server and how does it work?” Accessed: Nov. 2020. [Online]. Available: https://www.avast.com/c-what-is-a-proxy-server, 2020.
[4] J. Castellà-Roca, A. Viejo, and J. Herrera-Joancomartí, “Preserving user’s privacy in web search engines,” Computer Communications, vol. 32, no. 13-14, pp. 1541–1551, 2009.
[5] H. Yu, E. Lee, and S. Lee, “Symbiosis: Anti-censorship and anonymous web-browsing ecosystem,” IEEE Access, vol. 4, pp. 3547–3556, 2016.
[6] Didsoft, “Free proxy list,” Accessed: Nov. 2019. [Online]. Available: https://bit.ly/2vzLCYI, 2019.
[7] ProxyNova.com, “Nova proxy switcher,” Accessed: Nov. 2019. [Online]. Available: https://bit.ly/2OguC0d, 2019.
[8] X. Mi, X. Feng, X. Liao, B. Liu, X. Wang, F. Qian, Z. Li, S. A. Alrwais, L. Sun, and Y. Liu, “Resident evil: Understanding residential IP proxy as a dark service,” in 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019, 2019, pp. 1185–1201.
[9] W. Scott, R. Bhoraskar, and A. Krishnamurthy, “Understanding open proxies in the wild,” Chaos Communication Camp, 2015.
[10] K. Steding-Jessen, N. L. Vijaykumar, and A. Montes, “Using low-interaction honeypots to study the abuse of open proxies to send spam,” INFOCOMP Journal of Computer Science, vol. 7, no. 1, pp. 44–52, 2008.
[11] G. Tyson, S. Huang, F. Cuadrado, I. Castro, V. C. Perta, A. Sathiaseelan, and S. Uhlig, “Exploring HTTP header manipulation in-the-wild,” in Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, April 3-7, 2017, 2017, pp. 451–458.
[12] S. Kanchan and N. S. Chaudhari, “SRCPR: signrecrypting proxy re-signature in secure VANET groups,” IEEE Access, vol. 6, pp. 59282–59295, 2018.
[13] A. Mani, T. Vaidya, D. Dworken, and M. Sherr, “An Extensive Evaluation of the Internet’s Open Proxies,” in Proceedings of the 34th Annual Computer Security Applications Conference, 2018, pp. 252–265.
[14] G. Tsirantonakis, P. Ilia, S. Ioannidis, E. Athanasopoulos, and M. Polychronakis, “A large-scale analysis of content modification by open HTTP proxies,” in 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018, 2018.
[15] D. Perino, M. Varvello, and C. Soriente, “Proxytorrent: Untangling the free HTTP(S) proxy ecosystem,” in Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018, 2018, pp. 197–206.
[16] T. Chung, D. R. Choffnes, and A. Mislove, “Tunneling for transparency: A large-scale analysis of end-to-end violations in the internet,” in Proceedings of the 2016 ACM on Internet Measurement Conference, IMC 2016, Santa Monica, CA, USA, November 14-16, 2016, 2016, pp. 199–213.
[17] Z. Weinberg, S. Cho, N. Christin, V. Sekar, and P. Gill, “How to catch when proxies lie: Verifying the physical locations of network proxies with active geolocation,” in Proceedings of the Internet Measurement Conference 2018, IMC 2018, Boston, MA, USA, October 31 - November 02, 2018, 2018, pp. 203–217.
[18] N. Weaver, C. Kreibich, M. Dam, and V. Paxson, “Here be web proxies,” in Passive and Active Measurement - 15th International Conference, PAM 2014, Los Angeles, CA, USA, March 10-11, 2014, Proceedings, 2014, pp. 183–192.
[19] IP2Location.com, “IP2Proxy,” Accessed: Nov. 2019. [Online] Available: https://bit.ly/2RJGgmg, 2019.
[20] M. Developers, “MultiProxy,” Accessed: Nov. 2019. [Online] Available: https://bit.ly/2UdNoJp, 2019.
[21] P. list Developers, “Proxy list,” Accessed: Nov. 2019. [Online] Available: https://bit.ly/2OePuEU, 2019.
[22] checkerproxy.net, “checkerproxy,” Accessed: Nov. 2019. [Online] Available: https://bit.ly/31cMnTn.
[23] P. Developers, “proxybroker,” Accessed: Nov. 2019. [Online] Available: https://bit.ly/2S54ir6, 2019.
[24] MaxMind, “MaxMind,” Accessed: Nov. 2019. [Online] Available: https://www.maxmind.com/, 2019.
[25] M. Wander, C. Boelmann, L. Schwittmann, and T. Weis, “Measurement of globally visible DNS injection,” IEEE Access, vol. 2, pp. 526–536, 2014.
[26] “Tech.co: Internet Censorship Rankings,” Accessed: Feb. 2020. [Online] Available: https://tech.co/vpn/internet-censorship-rankings, 2019.
[27] “Freedom on The Net: Global Internet Freedom Ranking,” Accessed: Feb. 2020. [Online] Available: https://www.freedomonthenet.org/report/freedom-on-the-net/2019/the-crisis-of-social-media, 2019.
[28] “Ookla: Internet Speeds By Country,” Accessed: Feb. 2020. [Online] Available: https://www.speedtest.net/global-index, 2020.
[29] “International Monetary Fund: Countries GDP Ranking,” Accessed: Feb. 2020. [Online] Available: https://www.imf.org/, 2020.
JINCHUN CHOI is a Ph.D. candidate at the Department of Computer Science at the University of Central Florida and the Department of Computer Information Science of Inha University (joint Ph.D. program). He obtained his B.E. and M.S. degrees from Inha University in 2011 and 2014, respectively. His research interests include network and IoT security.
MOHAMMED ABUHAMAD is a Ph.D. candidate and an active member of the Information Security Research Laboratory (ISRL) at Inha University, South Korea, and the Security Analytics Research Lab (SEAL), University of Central Florida. He received the B.S. degree in computer science from The IUG in 2007, and the M.S. degree in artificial intelligence from The National University of Malaysia in 2013. His research interests include software security, machine learning, authentication, privacy, and deep learning.
AHMED ABUSNAINA is a Ph.D. student in the Department of Computer Science at the University of Central Florida. He obtained his B.Sc. in Computer Engineering from An-Najah National University, Palestine, in 2018. His research interests include software security, machine learning, and adversarial machine learning.
AFSAH ANWAR is a Ph.D. student in the Department of Computer Science at the University of Central Florida. He obtained his B.S. from Jamia Millia Islamia University, New Delhi, India, in 2014. Before starting his Ph.D., Afsah worked as a Data Analyst (C) for Apple. His research interests include binary analysis, vulnerability analysis, and malware analysis.
SULTAN ALSHAMRANI is a Ph.D. student and an active member of the Security Analytics Research Lab (SEAL), University of Central Florida. He received the B.S. degree in computer science from the University of Tabuk, Saudi Arabia, in 2014, and the M.S. degree in Computer Science from Loyola University Chicago, Illinois, USA, in 2018. His research interests include data mining, natural language processing, and deep learning.
JEMAN PARK obtained his B.Sc. degree in Computer and Communication Engineering from Korea University, Seoul, South Korea, in 2016. He is currently a Ph.D. candidate at the Department of Computer Science and Engineering at the University of Central Florida. His work focuses on privacy, computer security, and systems.
DAEHUN NYANG received a B.Eng. degree in electronic engineering from Korea Advanced Institute of Science and Technology, and M.S. and Ph.D. degrees in computer science from Yonsei University, Korea, in 1994, 1996, and 2000, respectively. He was a senior member of the engineering staff at the Electronics and Telecommunications Research Institute, Korea, from 2000 to 2003. Since 2003, he has been a full professor in the Computer Information Engineering Department of Inha University, Korea, where he is also the founding director of the Information Security Research Laboratory. He is a member of the board of directors and the editorial board of ETRI Journal and of the Korean Institute of Information Security and Cryptology. Dr. Nyang’s research interests include AI-based security, network security, traffic measurement, privacy, usable security, biometrics, and cryptography.
DAVID MOHAISEN earned his M.Sc. and Ph.D. degrees from the University of Minnesota in 2012. He is currently an Associate Professor at the University of Central Florida, where he directs the Security and Analytics Lab (SEAL). Before joining UCF in 2017, he was an Assistant Professor at SUNY Buffalo (2015–2017) and a Senior Research Scientist at Verisign Labs (2012–2015). His research interests are in the areas of networked systems and their security, online privacy, and measurements. He is an Editor in Chief of EAI Transactions on Security and Safety, and an Associate Editor of IEEE Transactions on Mobile Computing, Elsevier Computer Networks, and ETRI Journal (Wiley). He is a senior member of ACM (2018) and IEEE (2015).