An Empirical Analysis of Phishing Blacklists
Steve Sheng
Carnegie Mellon University
Engineering and Public Policy
Pittsburgh, PA 15213
xsheng@andrew.cmu.edu
Brad Wardman
University of Alabama
Computer Science
Birmingham, Alabama 35294
brad.wardman@gmail.com
Gary Warner
University of Alabama
Computer Science
Birmingham, Alabama 35294
gar@cis.uab.edu
Lorrie Faith Cranor
Carnegie Mellon University
Computer Science & EPP
Pittsburgh, PA 15213
lorrie@cs.cmu.edu
Jason Hong
Carnegie Mellon University
School of Computer Science
Pittsburgh, PA 15213
jasonh@cs.cmu.edu
Chengshan Zhang
Carnegie Mellon University
Heinz School of Public Policy
Pittsburgh, PA 15213
zhangcs03@gmail.com

CEAS 2009 - Sixth Conference on Email and Anti-Spam, July 16-17, 2009, Mountain View, California, USA
ABSTRACT
In this paper, we study the effectiveness of phishing black-
lists. We used 191 fresh phish that were less than 30 minutes
old to conduct two tests on eight anti-phishing toolbars. We
found that 63% of the phishing campaigns in our dataset
lasted less than two hours. Blacklists were ineffective when
protecting users initially, as most of them caught less than
20% of phish at hour zero. We also found that blacklists
were updated at different speeds, and varied in coverage, as
47% - 83% of phish appeared on blacklists 12 hours from the
initial test. We found that two tools using heuristics to com-
plement blacklists caught significantly more phish initially
than those using only blacklists. However, it took a long
time for phish detected by heuristics to appear on blacklists.
Finally, we tested the toolbars on a set of 13,458 legitimate
URLs for false positives, and did not find any instance of
mislabeling for either blacklists or heuristics. We present
these findings and discuss ways in which anti-phishing tools
can be improved.
1. INTRODUCTION
Phishing is a widespread problem that is impacting both
business and consumers. In November 2007, MessageLabs
estimated that 0.8% of emails going through their system
each day (about 3.3 billion) were phishing emails [23]. Mi-
crosoft Research recently estimated that 0.4% of recipients
were victimized by phishing attacks [12]. The annual cost to
consumers and businesses due to phishing in the US alone
is estimated to be between $350 million and $2 billion [14,
26].
To reduce phishing damage, stakeholders have enacted
their own countermeasures. Internet service providers, mail
providers, browser vendors, registrars and law enforcement
all play important roles. Due to the strategic position of
the browser and the concentration of the browser market,
web browser vendors play a key role. Web browsers are at a
strategic position at which they can warn users directly and
effectively. In addition, the browser market is fairly con-
centrated with two browsers (Internet Explorer and Fire-
fox) accounting for 95% of the total market [27]. Solutions
that these two browsers implement provide the majority of
users with a defense against phishing. A recent labora-
tory study shows that when Firefox 2 presented phishing
warnings, none of the users entered sensitive information
into phishing websites [10]. This study also recommended
changes to Internet Explorer’s phishing warnings, and Mi-
crosoft has already acted on some of them to improve IE 8’s
warning mechanism.
For browsers to truly realize their potential to protect
users, their warnings need to be accurate (low false posi-
tives) and timely. Currently, most browsers with integrated
phishing protection or anti-phishing browser toolbars rely
on blacklists of phish and, sometimes, heuristics to detect
phishing websites. Perhaps because toolbar vendors are
striving to avoid potential lawsuits from mislabeling web-
sites, blacklists are favored over heuristics due to their low
false positives.
In this paper, we study the effectiveness of phishing black-
lists. We used 191 fresh phish that were less than 30 minutes
old to conduct two tests on eight phishing toolbars. We
found that 63% of the phishing campaigns in our dataset
lasted less than two hours. Blacklists were ineffective when
protecting users initially, as most of them caught less than
20% of phish at hour zero. We also found that blacklists
were updated at different speeds, and varied in coverage, as
47% - 83% of phish appeared on blacklists 12 hours from the
initial test. We found that two tools using heuristics to com-
plement blacklists caught significantly more phish initially
than those using only blacklists. However, it took a long
time for phish detected by heuristics to appear on blacklists.
Finally, we tested the toolbars on a set of 13,458 legitimate
URLs for false positives, and did not find any instance of
mislabeling for either blacklists or heuristics.
To the best of our knowledge, this paper is the first at-
tempt to quantitatively measure the length of phishing cam-
paigns and the update speed and coverage of phishing black-
lists. Based on these measurements, we discuss opportuni-
ties for defenders, and propose ways that phishing blacklists
can be improved.
The remainder of the document is organized as follows:
section 2 introduces the background and related work, sec-
tion 3 discusses the test setup, section 4 presents our results,
and section 5 discusses ways in which phishing blacklists and
toolbars can be improved.
2. BACKGROUND AND RELATED WORK
Efforts to detect and filter phish can be implemented at
the phishing e-mail level and at the phishing website level.
To prevent phishing emails from reaching potential victims,
traditional spam-filter techniques such as bayesian filters,
blacklists, and rule based rankings can be applied. Re-
cently, some phishing-specific filters were developed as well
[1, 11]. In addition to these efforts, some protocols have
been proposed to verify the identities of email senders [9,
33]. Although these efforts are promising, many users re-
main unprotected. Filtering techniques are imperfect, and
many phishing emails still arrive in users' inboxes. Thus, we
need to detect phishing websites as well.
Generally speaking, research to detect phish at the web-
site level falls into two categories: heuristic approaches,
which use HTML or content signatures to identify phish,
and blacklist-based methods, which leverage human-verified
phishing URLs to reduce false positives. Our research on
blacklist measurement contributes to understanding the ef-
fectiveness of blacklists to filter phish at the website level.
2.1 Anti-Phishing Heuristics
Most heuristics for detecting phishing websites
use HTML, website content, or URL signatures to identify
phish. Machine learning algorithms are usually applied to
build classification models over the heuristics to classify new
webpages. For example, Garera et al. identified a set of fine-
grained heuristics from phishing URLs alone [13]. Ludl et
al. discovered a total of 18 properties based on the page
structure of phishing webpages [21]. Zhang et al. proposed
a content-based method using TF-IDF and six other heuris-
tics to detect phish [39]. Pan et al. proposed a method to
compile a list of phishing webpage features by extracting
selected DOM properties of the webpage, such as the page
title, meta description field, etc. [29]. Finally, Xiang and
Hong described a hybrid phish detection method with an
identity-based detection component and a keyword-retrieval
detection component [35]. These methods achieve true pos-
itive rates between 85% and 95%, and false positive rates
between 0.43% and 12%.
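To make this approach concrete, the following is a minimal sketch, in Python, of a heuristics-plus-classifier pipeline: hand-crafted URL features feed a machine-learned model. The specific features, training examples, and choice of scikit-learn's logistic regression are illustrative assumptions, not the feature sets or models of the systems cited above.

    import re
    from urllib.parse import urlparse

    from sklearn.linear_model import LogisticRegression

    def url_features(url):
        # Illustrative URL heuristics; real systems use far richer features.
        host = urlparse(url).hostname or ""
        return [
            len(url),                                   # long URLs are suspicious
            host.count("."),                            # many subdomain levels
            1 if re.fullmatch(r"[\d.]+", host) else 0,  # raw IP address as host
            1 if "@" in url else 0,                     # user-info obfuscation trick
            sum(kw in url.lower() for kw in ("login", "verify", "account")),
        ]

    # Hypothetical labeled data: 1 = phish, 0 = legitimate.
    training = [
        ("http://192.0.2.7/paypal/login.php", 1),
        ("http://secure-verify.example.net/account/update", 1),
        ("https://www.paypal.com/signin", 0),
        ("https://www.example.com/", 0),
    ]
    X = [url_features(u) for u, _ in training]
    y = [label for _, label in training]
    model = LogisticRegression().fit(X, y)

    print(model.predict([url_features("http://203.0.113.9/ebay/verify")]))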
The heuristics approach has pros and cons. Heuristics
can detect attacks as soon as they are launched, without the
need to wait for blacklists to be updated. However, attackers
may be able to design their attacks to avoid heuristic detec-
tion. In addition, heuristic approaches may produce false
positives, incorrectly labeling a legitimate site as phishing.
Several tools such as Internet Explorer 7 and Symantec’s
Norton 360 include heuristics in their phishing filters. Our
research examines the accuracy of these heuristics in terms
of their ability to detect phish and avoid false positives. In
addition, we examine how anti-phishing tools use heuristics
to complement their blacklists.
2.2 Phishing blacklists
Another method web browsers use to identify phish is to
check URLs against a blacklist of known phish. Blacklist
approaches have long been used in other areas.
Blacklists of known spammers have been one of the pre-
dominant spam filtering techniques. There are more than
20 widely used spam blacklists in use today. These black-
lists may contain IP addresses or domains used by known
spammers, IP addresses of open proxies and relays, country
and ISP netblocks that send spam, RFC violators, and virus
and exploit attackers [18].
Although a spam blacklist of known IP addresses or do-
main names can be used to block the delivery of phishing
emails, it is generally inadequate to block a phishing web-
site. One reason is that some phishing websites are hosted
on hacked domains. It is therefore not possible to block the
whole domain because of a single phish on that domain. So a
blacklist of specific URLs is a better solution in the phishing
scenario.
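The sketch below illustrates the distinction with hypothetical URLs: a spam-style domain blacklist would block every page on a hacked domain, while a URL-level blacklist blocks only the phishing page itself.

    from urllib.parse import urlparse

    domain_blacklist = {"hacked-blog.example.org"}
    url_blacklist = {"http://hacked-blog.example.org/~tmp/paypal/login.html"}

    def blocked_by_domain(url):
        return urlparse(url).hostname in domain_blacklist

    def blocked_by_url(url):
        return url in url_blacklist

    legit = "http://hacked-blog.example.org/recipes/"
    phish = "http://hacked-blog.example.org/~tmp/paypal/login.html"
    print(blocked_by_domain(legit), blocked_by_domain(phish))  # True True (overblocks)
    print(blocked_by_url(legit), blocked_by_url(phish))        # False True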
Compiling and distributing a blacklist is a multi-step pro-
cess. First, a blacklist vendor enters into contracts with var-
ious data sources for suspicious phishing emails and URLs to
be reviewed. These data sources may include emails that are
gathered from spam traps or detected by spam filters, user
reports (e.g., Phishtank or APWG), or verified phish com-
piled by other parties such as takedown vendors or financial
institutions. Depending on the quality of these sources, ad-
ditional verification steps may be needed. Verification often
relies on human reviewers. The reviewers can be a dedicated
team of experts or volunteers, as in the case of Phishtank. To
further reduce false positives, multiple reviewers may need
to agree on a phish before it is added to the blacklist. For
example, Phishtank requires votes from four users in order
to classify a URL in question as a phish.
Once the phish is confirmed, it is added to the central
blacklist. In some instances, the blacklist is downloaded
to local computers. For example, in Firefox 3, blacklists of
phish are downloaded to browsers every 30 minutes [32]. Do-
ing so provides the advantage of reducing network queries,
but performance may suffer between blacklist updates.
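A minimal sketch of this local-blacklist design, assuming a 30-minute refresh interval as in the Firefox 3 example; the fetch callable is a hypothetical stand-in for a vendor's actual download protocol.

    import time

    UPDATE_INTERVAL = 30 * 60  # seconds between blacklist downloads

    class LocalBlacklist:
        def __init__(self, fetch):
            self.fetch = fetch        # callable returning a set of phishing URLs
            self.entries = set()
            self.last_update = 0.0

        def lookup(self, url):
            if time.time() - self.last_update > UPDATE_INTERVAL:
                self.entries = self.fetch()      # one bulk network request
                self.last_update = time.time()
            # Lookups between refreshes need no network query, but a phish
            # added to the central list moments ago stays invisible until
            # the next download.
            return url in self.entries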
A number of these blacklists are used in integrated browser
phishing protection [4, 15, 25], and in web browser toolbars
[6, 7, 28]. Although blacklists have low false positive rates,
they generally require human intervention and verification,
which may be slow and prone to human error. Yet this is the
most commonly used method to block phish. Our research
investigates the speed of blacklist updates and the accuracy
of blacklists.
2.3 Related Work
Several authors have studied the effectiveness of phishing
toolbars. In November 2006, Ludl et al. used 10,000 phishing
URLs from Phishtank to test the effectiveness of the black-
lists maintained by Google and Microsoft [21]. They found
that the Google blacklist contained more than 90% of the
live phishing URLs, while Internet Explorer contained only
67% of them. The authors concluded that blacklist-based
solutions were “quite effective in protecting users against
phishing attempts.” One limitation of this study is that the
freshness of the data feed was not reported. We overcome
this weakness by using a fresh phish feed less than 30 min-
utes old and by using an automated testbed to visit phishing
websites nine times in 48 hours to study the coverage and
update speed of blacklists. We arrive at a different conclu-
sion from this paper.
In a related study, Zhang et al. [38] tested the effective-
ness of 10 popular anti-phishing tools in November 2006
using data from Phishtank and APWG. Using 100 URLs
from each source and 516 legitimate URLs to test for false
positives, they found that only one tool was able to consis-
tently identify more than 90% of phishing URLs correctly,
but with a false positive rate of 42%. Of the remaining tools,
only one correctly identified over 60% of phishing URLs from
both sources. This study had a similar weakness to the first
study, and it also used a small sample of URLs to test for
false positives. We based our study on this setup, but made
the following improvements. First, we used a source of fresh
phish less than 30 minutes old. Second, we extended the
methodology by separately analyzing phish caught by heuristics
versus blacklists. Third, we tested phish nine times over 48
hours to study the coverage and update speed of blacklists.
Finally, we used a much larger sample to test for false positives.
Other researchers have studied the effectiveness of spam
blacklists [18, 30, 16]. For example, Ramachandran et al.
measured the effectiveness of eight spam blacklists in real
time by analyzing a 17-month trace of spam messages col-
lected at a “spam trap” domain [30]. In their study, when-
ever a host spammed their domain, they examined whether
that host IP was listed in a set of DNSBLs in real time. They
found that about 80% of the received spam was listed in at
least one of eight blacklists, but even the most aggressive
blacklist had a false negative rate of about 50%.
In addition to the research described above, a number of
industry efforts have measured the effectiveness of phishing
toolbars as well [24, 22, 17].
3. METHODOLOGY
In this section we describe our anti-phishing testbed, ex-
plain how we collected phishing URLs for testing, and de-
scribe our evaluation methodology.
3.1 Anti-phishing Testbed
We used the anti-phishing testbed developed by Zhang et
al. [39]. The testbed has a client-and-server architecture. It
includes a task manager and set of workers, each of which
is responsible for evaluating a single tool. During the test,
the task manager first retrieved a list of potential phish-
ing sites to test against. The task manager then sent each
URL to a set of workers, each of which was running a sep-
arate tool. To reduce the number of machines needed, we
ran each worker on a virtual machine. Each worker down-
loaded the specified web page, examined whether its tool
had labeled the web page as phishing or not using a simple
image-based comparison algorithm, and returned that value
back to the task manager. The image-based comparison al-
gorithm works as follows: each tool has several known states
(e.g., a red icon if it has detected a phishing site and a green
icon if it has not), and each tool can be set up to be in a
known location in the web browser. We capture screenshots
of the tools and compare relevant portions of those images
to screenshots of the tools in each of their known states. The
task manager aggregated all of the results from the workers
and tallied overall statistics, including true positives, true
negatives, false positives, false negatives, and sites that no
longer exist.
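As an illustration, the sketch below shows one way such an image-based comparison could be implemented with the Pillow library: crop a fixed icon region from the screenshot and test it against reference screenshots of each known state. The region coordinates, file names, and exact-pixel-match criterion are assumptions for illustration, not the testbed's actual code.

    from PIL import Image, ImageChops

    ICON_REGION = (5, 80, 37, 112)  # (left, upper, right, lower), hypothetical

    def region_matches(screenshot, template):
        a = screenshot.crop(ICON_REGION).convert("RGB")
        b = template.crop(ICON_REGION).convert("RGB")
        # getbbox() returns None when the difference image is all black,
        # i.e., the two regions are pixel-identical.
        return ImageChops.difference(a, b).getbbox() is None

    def classify(screenshot_path):
        shot = Image.open(screenshot_path)
        for state, template_path in [("phishing", "red_state.png"),
                                     ("legitimate", "green_state.png")]:
            if region_matches(shot, Image.open(template_path)):
                return state
        return "unknown"  # e.g., page still loading or tool in another state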
3.2 Phishing Feed
We obtained the phishing URLs for this study from the
University of Alabama (UAB) Phishing Team’s data reposi-
tory. UAB has relationships with several sources who share
Table 1: The top 10 brands that appear in our data set. Total phish: 191

Institution victimized   # of phish   Percentage
Abbey                    47           24.9%
Paypal                   21           11.1%
Lloyds TSB               17           9.0%
Bank of America          14           7.4%
Halifax                  13           6.9%
Capital One              11           5.8%
New Egg Bank             11           5.8%
HSBC                     7            3.7%
eBay                     6            3.2%
Wachovia                 6            3.2%
Wellsfargo               6            3.2%
their spam as part of the UAB Spam Data Mine. One of
the largest sources is a spam-filtering company that provides
services to organizations ranging from small businesses to
Fortune 500 companies located in more than 80 countries. This com-
pany reviews well over one billion emails each day and uses
a combination of keyword searching and proprietary heuris-
tics to identify potential phish. They then extract the URLs
from these emails and send these URLs to UAB in batches
every four minutes.
UAB manually tested the URLs they received from the
spam-filtering company to determine if they were phishing
URLs. If a URL was a phish and had not been reported to
UAB before, it was put on a list to be tested by the testbed.
UAB sent this list to the testbed every 20 minutes. (Sometimes
randomization was introduced into URLs to attempt to defeat
exact matching; we do not consider two URLs unique if they
differ only in the attribute portion of the URL.) The testbed
began testing each batch of URLs within 10 minutes of receipt.
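A small sketch of this de-duplication rule: two URLs count as the same phish when they differ only in the attribute (query) portion. The example URLs are hypothetical.

    from urllib.parse import urlparse

    def dedup_key(url):
        p = urlparse(url)
        return (p.scheme, p.netloc, p.path)  # drop the query and fragment

    a = "http://phish.example.com/login.php?id=Ab3xQ"
    b = "http://phish.example.com/login.php?id=Zk91w"
    print(dedup_key(a) == dedup_key(b))  # True: one phish, not two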
Because UAB received phish URLs every four minutes,
they were able to label each URL with the four-minute time
segments in which it was seen. Thus they could identify
the first segment in which a URL was seen and identify
subsequent time segments in which the same URL was re-
ported. This approach to recording phishing URLs allows us
to determine the length of each spam campaign — the time
period over which phishers send out emails with the same
phishing URL. If the spam campaign lasts for only one day,
the effectiveness of anti-phishing tools on subsequent days
is not as important as effectiveness on day one. While some
users will read phishing emails days after the initial email
send time, most users will read phishing emails within a few
hours. Thus the most critical time to protect is when emails
are still being actively sent by the spammer.
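As an illustration, the campaign length for a URL can be computed as the span between the first and the last report batch in which it appears; the timestamps below are hypothetical values on the four-minute reporting grid.

    from datetime import datetime

    # url -> timestamps of the report batches in which it appeared
    reports = {
        "http://phish.example.com/login.php": [
            datetime(2008, 10, 2, 9, 0),
            datetime(2008, 10, 2, 9, 4),
            datetime(2008, 10, 2, 10, 32),
        ],
    }

    def campaign_hours(timestamps):
        return (max(timestamps) - min(timestamps)).total_seconds() / 3600

    for url, ts in reports.items():
        print(url, f"{campaign_hours(ts):.2f} hours")  # 1.53 hours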
We collected and tested a total of 191 verified phishing
URLs during this study. Table 1 lists the top 10 brands
that appear in our data set.
3.3 Evaluation Procedure
Tools tested: We tested eight anti-phishing toolbars that
use various blacklists and heuristics. They are Microsoft
Internet Explorer 7 (7.0.5730.11), Internet Explorer 8
(8.0.6001.18241), Firefox 2 (2.1.0.16), Mozilla Firefox 3
(3.0.1), Google Chrome (0.2.149.30), Netcraft toolbar (1.8.0),
McAfee Siteadvisor (2.8.255, free version), and Symantec
Norton 360 (13.3.5).
[Figure 1 plots the percentage of total URLs against the length of phishing campaign, in days (left panel) and in hours (right panel), for the March - April dataset (5491 URLs) and the testbed dataset (191 URLs).]
Figure 1: Length of phishing campaign, measured as the time between the first and last appearance of the
phish in our source report. The graph on the left shows length of phishing campaigns in days. The graph on
the right shows length of phishing campaigns in hours for those campaigns that last one day or less.
Except for Internet Explorer 7 and Symantec, all of these
tools use blacklists only. Those two toolbars that use heuris-
tics to complement their blacklists trigger different warnings
when a phish is detected by heuristics versus blacklist. We
configured all tools with their default settings, except for
Firefox 2, for which we used the “Ask Google” option
to query the central blacklist server every time instead of
downloading phishing blacklists every 30 minutes (this
feature is no longer available in versions after Firefox 2
update 19).
Testbed setup: We configured four PCs, each with an Intel
Core 2 4300 CPU at 1.80 GHz. Each PC ran two instances of
VMware, each configured with 720MB of RAM and an 8GB
hard drive. For each toolbar, we ran the task manager
and workers on the same machine to avoid network latency.
Since some of the toolbars use local blacklists, we left every
browser open for six to eight hours before each test to down-
load blacklists, and we left the browser open for 10 minutes
between every run during the test. We chose the eight-hour
period because the necessary blacklists would download re-
liably within this time. Thus we are investigating the best-
case scenario for blacklist effectiveness.
Test period: We ran the test for two to three hours on
October 2, 8, and 9, 2008 and on December 3, 4, 5, and 15,
2008. During this time, batches of new unique phish were
sent to the testbed every 20 minutes. The testbed began
testing them 10 minutes after receiving the phish, leaving a
total lapse time of approximately 30 minutes. Each worker
opened up the desired browser with toolbars for 30 seconds
before taking the screenshot. For each URL, we tested the
toolbars’ performance at hours 0, 1, 2, 3, 4, 5, 12, 24, and 48.
We cleared the browser cache every hour. We collected and
tested 90 URLs in October and 101 URLs in December.
Post verification: After the data was compiled, we man-
ually reviewed every website that toolbars labeled as legiti-
mate. This step was necessary because some host companies
did not issue 404 errors when taking down a phish. Instead,
they replaced it with their front page. In such cases, the
toolbar marked the website as legitimate when in fact the
phishing website had been taken down.
4. RESULTS
4.1 Length of Phishing Campaign
We define the length of a phishing campaign (LPC) as
the time lapse between the first time a phish appeared in
our source report and the last time that phish appeared in
our source report. As mentioned in Section 3.2, we received
reports from our source every 4 minutes.
Of the 191 phish we used to test phishing blacklists, 127
(66%) had an LPC of less than 24 hours, indicating that
their corresponding phishing campaigns lasted less than
24 hours. A total of 25 URLs had an LPC between 24 and
48 hours, and the remaining URLs had an LPC between 3
and 23 days. Examining the first day’s data more closely,
we found that 109 URLs were spammed only in a two-hour
period, accounting for 63% of the URLs in this dataset.
To validate our finding, we calculated the LPC for 5491
phish provided by the same source and verified by UAB
from February 17 through April 13, 2009. Similar to our
testbed dataset result, we found that 66% of these phish
had an LPC less than 24 hours, 14.5% had an LPC between
24 and 48 hours, and the remaining 19% of URLs had an
LPC between 3 and 47 days. We found that 44% of the
URLs had an LPC less than two hours. Figure 1 shows the
combined LPC results for our two datasets.
It is important to note that the LPC does not necessarily
correspond to the time a phishing site is live. In fact, we
found that, compared to the length of a phishing campaign,
taking websites down generally happens much more slowly.
By hour 2, 63% of phishing campaigns in our dataset were
finished, but only 7.9% of those phish were taken down. As
shown in Table 2, on average, 33% of the websites were
taken down within 12 hours, around half were taken down
after 24 hours, and 27.7% were still alive after 48 hours.
Our LPC findings demonstrate the freshness of our data
and show that current takedown efforts lag behind phishing
campaigns. In the test conducted by Ludl et al., 64% of the
phish were already down when they conducted their test
[21], whereas in our sample, only 2.1% of phish were already
down at the time of our initial test.
4.2 Blacklist Coverage
In this section, we present the results of two tests per-
formed in October and December of 2008 (Figures 2 and
3). We found that blacklists were ineffective when protect-
Table 2: Website takedown rate vs. length of phishing campaign (LPC). LPC is measured as the time between the first and last appearance of the phish in our source report. The website takedown rate at each hour is the number of phish taken down by that hour divided by total phish.

Hours   % of websites taken down   % of phishing campaigns finished
0       2.1%                       0%
2       7.9%                       63%
4       17.8%                      67%
5       19.9%                      70%
12      33.0%                      72%
24      57.6%                      75%
48      72.3%                      90%
ing users initially, as most of them caught less than 20%
of phish at hour zero. We also found that blacklists were
updated at different speeds, and varied in coverage, as 47%
to 83% of phish appeared on blacklists 12 hours from the
initial test in October.
At any given hour, we define the coverage of the blacklist
as:

    coverage = (no. of phish appearing on the blacklist) / (total phish - phish that were taken down)
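In code, the metric is a simple ratio over live phish; the counts below are hypothetical.

    def coverage(on_blacklist, total, taken_down):
        # Taken-down phish are excluded: a dead page no longer needs blocking.
        return on_blacklist / (total - taken_down)

    # e.g., 30 of 90 phish listed, 7 already taken down:
    print(f"{coverage(30, 90, 7):.1%}")  # 36.1%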
We found that coverage rates of some of the blacklists were
highly correlated. Firefox 2, 3 and Google Chrome appear
to use the same blacklists. Internet Explorer 7 and 8 also
share a blacklist. In our analysis, we combined the results
for those tools that use the same blacklists.
In our October test, all of the blacklists contained less
than 20% of the phish initially. New phish appeared on
the blacklists every hour, suggesting that the blacklists were
updated at least once every hour.
One notable improvement is the Symantec blacklist. At
hour 0, their blacklist caught as many phish as the others,
but at hour 1 it caught 73% of the phish, two to three times
more than the rest of the toolbars. This difference remains
statistically significant until 12 hours from the initial test
(ANOVA, p < 0.05). One
possible explanation is that Symantec uses results from their
heuristics to facilitate rapid blacklist updates [2].
We observed that the coverage of the Firefox and Netcraft
blacklists is consistently highly correlated. Five hours af-
ter our initial test in October, 91% of the URLs that ap-
peared in the Netcraft blacklist also appeared in the Firefox
blacklist, and 95% of the URLs that appeared in the Fire-
fox blacklist also appeared in Netcraft. The two blacklists
are consistently highly correlated every hour except for our
initial test in December. This suggests that the two black-
lists overlap in some of their data sources or have data
sources with similar characteristics. Other blacklists were
less correlated: phish on the Internet Explorer blacklist
appeared on the Firefox blacklist only 45% of the time, and
73% vice versa, suggesting they use different feeds without
much overlap.
We found that the Firefox blacklist was more comprehen-
sive than the IE blacklist up to the first 5 hours, and the
Symantec blacklists performed significantly better than the
rest of the toolbars from hour 2 to 12. After 12 hours, the
[Figure 2 plots the percentage of phish in each blacklist (Firefox 2/3 and Chrome; IE7/IE8; NetCraft; McAfee Siteadvisor; Symantec Norton 360) against hours after receiving the URL (0-48).]
Figure 2: Percentage of phish caught by various
blacklists in October 2008 data. This percentage
is defined as the total number of phish on the black-
list divided by the total phish that were alive. URLs
that were taken down at each hour were excluded
in the calculation. Total phish at hour 0 was 90.
[Figure 3 plots the percentage of phish in each blacklist (Firefox 2/3 and Chrome; IE7/IE8; NetCraft; McAfee Siteadvisor; Symantec Norton 360) against hours after receiving the URL (0-48).]
Figure 3: Percentage of phish caught by various
blacklists in December 2008 data. This percentage
is defined as the total number of phish on the black-
list divided by the total phish that were alive. URLs
that were taken down at each hour were excluded
in the calculation. Total phish at hour 0 was 101.
differences were no longer statistically significant. Figure 2
shows this result in detail.
In our December dataset, we observed similar trends in
terms of coverage for some toolbars. However, Firefox and
Netcraft performed much better here than in October. The
Firefox blacklist contained 40% of phish initially and by hour
2, 97% of phish were already on the blacklist. One reason for
this difference could be that during this period, the two tools
acquired new sources that were similar to our feed. Finally,
we did not observe statistically significant improvement in
the other toolbars.
Finally, we examined phish that the IE 8 blacklist and
Firefox blacklist missed five hours after our initial test in
October. We observed that at hour 5 the IE 8 blacklist
missed 74 phish, of which 73% targeted foreign financial
institutions. The Firefox blacklist missed 28 phish, of which
64% targeted foreign financial institutions. However, given
our limited sample size, we did not observe a statistically
significant difference in the speed at which phish targeting
US institutions and foreign institutions were added to the
blacklist. There were some notable differences between the
phish missed by the IE8 blacklist and Firefox. For example,
IE8 missed 21 Abbey Bank phish while Firefox missed only
4 Abbey Bank phish.
4.3 False Positives
We compiled a list of 13,458 legitimate URLs to test for
false positives. The URLs were compiled from three sources,
detailed below.
A total of 2,464 URLs were compiled by selecting the login
pages of sites using Google’s inurl search operator. Specifically,
we used Google to search for pages where one of the follow-
ing login-related strings appears in the URL: login, logon,
signin, signon, login.asp. A script was used to visit each
URL to determine whether it was running and whether it in-
cluded a submission form. These pages were selected to see
whether tools can distinguish phishing sites from the legiti-
mate sites they commonly spoof. Zhang et al. also used this
technique to gather their samples [38].
A total of 994 URLs were compiled by extracting URLs from
1,000 emails reported to APWG on August 20, 2008. From
the extracted URLs, we removed those that were down at
the time of testing and those used in spam campaigns, as
identified by the spam URL blacklist service uribl.com. This
left us with 1,076 URLs, comprising a mix of phish, malware,
spam, and legitimate sites. We manually checked each of
these URLs and removed the phishing URLs, leaving 994
verified non-phishing URLs. We ran the test for false posi-
tives within 24 hours of retrieval. The list was selected
because it represented a source of phishing feeds that many
blacklist vendors use, and thus we would expect it to yield
more false positives than other sources. While spam mes-
sages may be unwanted by users, the URLs in these messages
should not be classified as phishing URLs.
Similarly, we compiled 10,000 URLs by extracting non-
phishing URLs from the list of spam/phish/malware URLs
sent to UAB’s spam data mine on December 1-15, 2008. We
tested these URLs within one week of retrieval. Again, this
represents a source of phishing feeds that blacklist vendors
would likely receive, and thus we would expect this source
to have more false positives than other sources.
We did not find a single instance of a legitimate login site
being mislabeled as a phish. Among the 1,012 URLs from APWG,
there was one instance where a malware website was labeled
as a phish by the Firefox blacklist. Finally we did not find
any false positives in the 10,000 URLs from the UAB spam
data mine.
Compared with previous studies [38], our study tested an
order of magnitude more legitimate URLs for false positives,
yet our findings on false positives are the same: phishing
blacklists have close to zero false positives.
Our results differ from a 2007 HP research study [24] in
which the author obtained the Google blacklist and checked
each entry to see if it was a false positive. This study re-
ports that the Google blacklist contains 2.62% false posi-
tives. However, the methodology for verifying false posi-
tives is not fully explained and the list of false positives is
not included in the report. In our test of false positives, we
manually verified each URL labeled as a phish and double-
checked it with one of the known repositories of phish on
the Internet.
It is also possible that Google changed their techniques
Table 3: Accuracy and false positives of heuristics

                     Detected by blacklist   Detected by    False
                     at hour 0               heuristics     positives
IE7 - Oct 08         23%                     41%            0.00%
Symantec - Oct 08    21%                     73%            0.00%
IE7 - Dec 08         15%                     25%            0.00%
Symantec - Dec 08    14%                     80%            0.00%
or sources for phishing URLs since 2007. For future work,
we would like to verify the Google blacklist using the same
method used in the HP study [24]. However, Google’s black-
list is no longer publicly available.
4.4 Accuracy of Heuristics
Heuristics are used in Symantec’s Norton 360 toolbar and
Internet Explorer 7. In this section, we report on their per-
formance.
We found that tools that use heuristics were able to detect
significantly more phish than those that use only blacklists.
At hour 0, Symantec’s heuristics detected 70% of phish,
while Internet Explorer 7’s heuristics caught 41% of phish.
This is two to three times the number of phish caught by
the blacklists in that period. Furthermore, the heuristics
triggered no false positives for the 13,458 URLs we tested.
Table 3 summarizes these results.
We also found that IE 7 and Symantec use heuristics
somewhat differently. Both tools display a transient and
less severe warning for possible phish detected by heuris-
tics. However, Symantec’s toolbar introduced a feedback
loop. When a user visits a possible phish that is detected
by heuristics and is not on the blacklist, the URL is
sent to Symantec for human review [2]. In our test, 95%
of the phish detected by Symantec heuristics appeared on
the Symantec blacklist at hour 1, while none of the phish
detected by IE7 heuristics appeared on the IE blacklist at
hour 1.
This feedback loop is important at the user interface level.
If a phish is detected by heuristics, toolbars display less se-
vere, passive warnings to avoid potential liability. However,
once the phish is verified as a phishing site by a human, tool-
bars can block the content of the web page completely (ac-
tive warnings). A recent laboratory study [10] showed that
users only heed active phishing warnings and ignore passive
warnings.
4.5 Total Protection
Finally, we consider the protection offered to users by phish-
ing toolbars. We define the protection rate as:

    protection rate = (phish on blacklist + phish detected by heuristics + phish taken down) / (total phish)
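As with coverage, this is a simple ratio; a user is counted as protected if the phish is blacklisted, flagged by heuristics, or already taken down. The counts below are hypothetical.

    def protection_rate(on_blacklist, by_heuristics, taken_down, total):
        return (on_blacklist + by_heuristics + taken_down) / total

    print(f"{protection_rate(18, 37, 2, 90):.1%}")  # 63.3%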
Figures 4 and 5 present our findings. We found that at
hour 0, tools that use heuristics to complement blacklists of-
fered much better protection than tools that use only black-
lists. By hour 48, a large fraction of phishing sites had been
taken down, and the tools we tested detected most of the live
phishing sites. In the December test we found that by hour 48
most tools offered near-perfect protection.
[Figure 4 plots the protection rate of each tool (Firefox 2/3 and Chrome; IE 7; IE 8; NetCraft; McAfee Siteadvisor; Symantec Norton 360) and the fraction of sites taken down against hours after receiving the URL (0-48).]
Figure 4: Protection rate for the October run of 91
phishing URLs. Protection rate is defined as total
number of phish caught by blacklist or heuristic plus
phish taken down divided by the total number of
phish.
[Figure 5 plots the protection rate of each tool (Firefox 2/3 and Chrome; IE 7; IE 8; NetCraft; McAfee Siteadvisor; Symantec Norton 360) and the fraction of sites taken down against hours after receiving the URL (0-48).]
Figure 5: Protection Rate for the December run of
101 phishing URLs. Protection rate is defined as to-
tal number of phish caught by blacklist or heuristic
plus phish taken down divided by the total number
of phish.
5. DISCUSSION
5.1 Limitations
There are a few limitations to our study. First, all of
our URLs came from a single anti-spam vendor, so the
URLs received may not be representative of all phish.
Second, all the URLs were detected by a spam vendor and
presumably never reached users protected by that vendor.
However, as not all users are protected by commercial spam
filters, it is important that browsers also detect these phish-
ing URLs. Third, these URLs were extracted only from
email and did not include other attack vectors such as
instant-messenger phishing.
5.2 Opportunities for Defenders
The window of opportunity for defenders can be defined
as the length of the phishing campaign plus the time lapse
between the time a user receives a phishing email and the
time the user opens the email. Users are protected if they
either do not receive any phish or if, by the time they click on
a phish, the website is blocked by browsers or taken down.
As shown in Section 4.1, 44% of phishing campaigns lasted
less than 2 hours. Recent research shows that, for a non-
negligible portion of the Internet population, the time be-
tween when a user receives and opens a phishing email is
less than two hours. For example, Kumaraguru et al. sent
simulated phishing emails to students and staff at a U.S.
University and educated them once they clicked on the link
in the email. They found that 2 hours after the phishing
emails were sent, at least half the people who would eventu-
ally click on the phishing link had already done so; after 8
hours, nearly everyone (90%) who would click had already
done so [19]. Their study also found that people with techni-
cal skills were just as likely to fall for phish as their non-
technical counterparts. In a recent national survey, AOL
asked 4,000 email users aged 13 and older about their email
usage. The survey found that 20% of respondents check
their email more than 10 times a day, and 51% check their
email four or more times a day (up from 45% in 2007) [3].
Assuming that those who check their email do so at a uni-
form rate over 16 waking hours, 20% of people check their
email at least once every hour and a half (16 hours / 10 or
more checks), and 51% check their email at least once every
four hours.
These findings suggest that the critical window of opportu-
nity for defense is between the start of a phishing campaign
and 2 to 4 hours later.
Our findings have several implications for phishing coun-
termeasures. First, anti-phishing efforts should be more
focused on upstream protections such as blocking phish at
the email gateway level. At the browser level, this effort
should be focused on updating blacklists more quickly
and making better use of heuristic detection. Second, more
research and industry development efforts to effectively edu-
cate users (e.g., [20, 34]) and to design trusted user interfaces
(e.g., [8, 36, 31, 37]) are needed to overcome the problem of
limited initial blacklist coverage.
5.3 Improving blacklists
The first step to improving blacklists is earlier detection
of more phishing URLs. As shown in Figure 6, potential
phishing URLs can be gathered from URLs extracted from
spam/phishing filters at e-mail gateways, URLs extracted
from users’ reports of phishing emails or websites, and phish-
ing websites identified by toolbar heuristics. Each of these
sources has different coverage. We first discuss ways to
improve each source.
E-mail gateway filters are the first point of contact with
phishing emails. Given the limited window of opportunity
for defenders, as discussed in section 4.1, vendors should
focus their gathering efforts here. However, regular spam
filters alone are not sufficient, as their output contains a lot
of spam that would require much human effort to filter. To
improve detec-
tion of phish at this level, we recommend using spam filters
as the first line of defense, and then applying heuristics de-
veloped to detect phishing websites as a second layer. Once
a suspicious URL is flagged by both layers, it should be
submitted for human review. As residential email accounts
and business email accounts receive a different distribution
of emails, to get the widest coverage vendors should collect
URLs from a variety of sources.
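A sketch of this layered gathering pipeline follows; the spam filter, the phishing heuristics, and the sample emails are deliberately naive stand-ins for production components.

    import re

    def is_spam(email):
        return "click here" in email.lower()  # stand-in gateway spam filter

    def looks_like_phish(url):
        return any(kw in url.lower() for kw in ("login", "verify"))  # stand-in heuristic

    def extract_urls(email):
        return re.findall(r"https?://\S+", email)

    def gather_candidates(emails):
        review_queue = []
        for email in emails:
            if not is_spam(email):
                continue                      # layer 1: spam filter
            for url in extract_urls(email):
                if looks_like_phish(url):     # layer 2: phishing heuristics
                    review_queue.append(url)  # layer 3: queue for human review
        return review_queue

    print(gather_candidates([
        "Click here to verify: http://phish.example.net/verify/account.php",
        "Meeting notes: http://intranet.example.com/wiki",
    ]))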
User reports of phishing emails and websites are likely
to contain phish that spam filters missed. Therefore user
Figure 6: High-level view of sources of URLs for phishing blacklists. Potential phishing URLs can be collected
from (1) URLs extracted from spam/phishing filters at mail exchange gateways, (2) URLs extracted from
user reports of phishing email, (3) phishing websites identified by heuristics, and finally (4) user reports of
phishing websites.
reports should be used to complement email gateway spam
filter data. However, users may lack incentives to report and
verify phish. User incentives (e.g., points, prizes) may help
overcome this problem.
Finally, we recommend browser anti-phishing tools use
heuristics to improve their blacklists. This method is analo-
gous to early warning systems for disease outbreaks. When
a user visits a possible phish that is detected by heuristics
and is not on the blacklist, the tool can send the URL for
human review and add the URL to the blacklist once ver-
ified. This system would be likely to succeed because
some users check their email much more frequently
than others [3].
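A sketch of this feedback loop at the browser level; the heuristic score, the threshold, and the review mechanism are assumptions for illustration.

    blacklist = set()
    review_queue = []

    def on_page_visit(url, heuristic_score):
        if url in blacklist:
            return "active warning: block the page"
        if heuristic_score > 0.8:        # threshold is an assumption
            review_queue.append(url)     # submit for human review
            return "passive warning: proceed with caution"
        return "no warning"

    def on_review_verdict(url, is_phish):
        if is_phish:
            blacklist.add(url)           # later visitors get the active warning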
5.4 Use of heuristics
As shown in Sections 4.4 and 4.5, the two tools using
heuristics to complement blacklists caught significantly more
phish initially than those using only blacklists. Given the
short length of phishing campaigns, there is great value in
using heuristics. However, vendors may be concerned about
the greater possibility of false positives when using heuristics
and potential liability for mislabeling websites.
In a 2005 court case, Associated Banc-Corp sued Earth-
Link after the EarthLink anti-phishing software ScamBlocker
blocked the bank’s legitimate page [5]. EarthLink was able to
fend off the suit on the basis that it was using a blacklist of
phish provided by a third party, and thus could not be held
liable as a publisher of erroneous information under a
provision of the Communications Decency Act. However, if
a toolbar uses heuristics to detect and block a phish that
turns out to be a false positive, the toolbar vendor may be
regarded as “a publisher” under CDA, and thus not immu-
nized.
In our testing, we did not detect any false positives trig-
gered by either the blacklists or heuristics. However, it is
the potential of false positives that worries vendors. To
overcome this liability issue, we recommend vendors first
use heuristics to detect phish and then have experts verify
them. We also encourage more discussion about the liability
associated with providing phishing blacklists and heuristics.
So far, there has been no test case on this matter. Lack of
clarity on these matters could further reduce vendors’ incen-
tives to apply heuristics. Major vendors such as Microsoft
and Mozilla, which offer protection to the majority of users,
do not lose money directly from phishing. However, if they
implement heuristics and get sued, they could potentially
lose millions of dollars in restitution and legal fees.
6. ACKNOWLEDGMENTS
This work was supported in part by the National Science
Foundation under grant CCF-0524189, and by the Army Re-
search Office grant number DAAD19-02-1-0389. The views
and conclusions contained in this document are those of the
authors and should not be interpreted as representing the
official policies, either expressed or implied, of the National
Science Foundation or the U.S. government.
7. REFERENCES
[1] S. Abu-Nimeh, D. Nappa, X. Wang, and S. Nair. A
comparison of machine learning techniques for
phishing detection. In eCrime ’07: Proceedings of the
anti-phishing working groups 2nd annual eCrime
researchers summit, pages 60–69, New York, NY,
USA, 2007. ACM.
[2] Andy Patrizio. Symantec readies phishing protection
software. August 7, 2006. Visited Jan 1, 2009.
http://www.smallbusinesscomputing.com/news/
article.php/3624991.
[3] AOL Press Release. It’s 3 a.m. Are you checking your
email again? July 30, 2008. Visited Jan 1, 2009.
http://corp.aol.com/press-releases/2008/07/
it-s-3-am-are-you-checking-your-email-again.
[4] Apple Inc. Safari features. Visited Jan 1, 2009. http:
//www.apple.com/safari/features.html#security.
[5] Associated Banc-Corp v. EarthLink, Inc.
Memorandum and Order, 05-C-0233-S.
http://www.iplawobserver.com/cases/2005-09-14_
Associated_Banc_Corp_CDA_Section_230.pdf.
[6] N. Chou, R. Ledesma, Y. Teraguchi, and J. C.
Mitchell. Client-side defense against web-based
identity theft. In Proceedings of The 11th Annual
Network and Distributed System Security Symposium
(NDSS ’04), 2004.
[7] Cloudmark Inc. Visited Jan 1, 2009.
http://www.cloudmark.com/desktop/download/.
[8] R. Dhamija and J. D. Tygar. The battle against
phishing: Dynamic Security Skins. In SOUPS ’05:
Proceedings of the 2005 symposium on Usable privacy
and security, pages 77–88, New York, NY, USA, 2005.
ACM Press.
[9] DKIM Signatures, RFC 4871. Visited Jan 1, 2009.
http://dkim.org/specs/rfc4871-dkimbase.html.
[10] S. Egelman, L. F. Cranor, and J. Hong. You’ve been
warned: an empirical study of the effectiveness of web
browser phishing warnings. In CHI ’08: Proceeding of
the twenty-sixth annual SIGCHI conference on Human
factors in computing systems, pages 1065–1074, New
York, NY, USA, 2008. ACM.
[11] I. Fette, N. Sadeh, and A. Tomasic. Learning to detect
phishing emails. In WWW ’07: Proceedings of the 16th
international conference on World Wide Web, pages
649–656, New York, NY, USA, 2007. ACM Press.
[12] D. Florencio and C. Herley. A large-scale study of web
password habits. In WWW ’07: Proceedings of the
16th international conference on World Wide Web,
pages 657–666, New York, NY, USA, 2007. ACM
Press.
[13] S. Garera, N. Provos, M. Chew, and A. D. Rubin. A
framework for detection and measurement of phishing
attacks. In WORM ’07: Proceedings of the 2007 ACM
workshop on Recurring malcode, pages 1–8, New York,
NY, USA, 2007. ACM.
[14] Gartner Research. Number of Phishing E-Mails Sent
to U.S. Adults Nearly Doubles in Just Two Years.
Press Release, 2006.
http://www.gartner.com/it/page.jsp?id=498245.
[15] Google Inc. Google Safe Browsing for Firefox. Visited
Jan 1, 2009. http:
//www.google.com/tools/firefox/safebrowsing/.
[16] Jeff Makey. Blacklists compared. April 11, 2009.
Retrieved April 14, 2009.
http://www.sdsc.edu/~jeff/spam/cbc.html.
[17] John E. Dunn. IE 7.0 tops study of anti-phishing
tools. September 29, 2006, Techworld. Retrieved
April 1, 2009. http://www.techworld.com/security/news/
index.cfm?newsID=6995&pagtype=sam.
[18] J. Jung and E. Sit. An empirical study of spam traffic
and the use of dns black lists. In IMC ’04: Proceedings
of the 4th ACM SIGCOMM conference on Internet
measurement, pages 370–375, New York, NY, USA,
2004. ACM.
[19] P. Kumaraguru, J. Cranshaw, A. Acquisti, L. Cranor,
J. Hong, M. A. Blair, and T. Pham. School of phish:
A real-world evaluation of anti-phishing training. In
Under Review, 2009.
[20] P. Kumaraguru, Y. Rhee, A. Acquisti, L. F. Cranor,
J. Hong, and E. Nunge. Protecting people from
phishing: the design and evaluation of an embedded
training email system. In CHI ’07: Proceedings of the
SIGCHI conference on Human factors in computing
systems, pages 905–914, New York, NY, USA, 2007.
ACM Press.
[21] C. Ludl, S. Mcallister, E. Kirda, and C. Kruegel. On
the effectiveness of techniques to detect phishing sites.
In DIMVA ’07: Proceedings of the 4th international
conference on Detection of Intrusions and Malware,
and Vulnerability Assessment, pages 20–39, Berlin,
Heidelberg, 2007. Springer-Verlag.
[22] Matthew Broersma. Firefox 2 tops IE 7 in
anti-phishing study. November 15, 2006, Techworld.
Retrieved April 1, 2009. http://www.techworld.com/
security/news/index.cfm?newsid=7353.
[23] MessageLabs. MessageLabs Intelligence: 2007 Annual
Security Report. MessageLabs Intelligence, 2007.
http://www.messagelabs.com/mlireport/MLI_2007_
Annual_Security_Report.pdf.
[24] Michael Sutton. A tour of the Google blacklist.
January 4, 2007. Visited Jan 1, 2009.
http://www.communities.hp.com/securitysoftware/
blogs/msutton/archive/2007/01/04/
A-Tour-of-the-Google-Blacklist.aspx?jumpid=reg_R1002_USEN.
[25] Microsoft Corporation. Phishing filter: Help protect
yourself from online scams.
http://www.microsoft.com/protect/products/
yourself/phishingfilter.mspx.
[26] T. Moore and R. Clayton. Examining the Impact of
Website Take-down on Phishing. In eCrime ’07:
Proceedings of the 2007 e-Crime Researchers summit,
pages 1–13, New York, NY, USA, 2007. ACM Press.
[27] Net Applications, Inc. Browser market share, Q4
2008. Visited Jan 1, 2009. http://marketshare.
hitslink.com/report.aspx?qprid=0&qpmr=15&qpdt=
1&qpct=3&qpcal=1&qptimeframe=Q&qpsp=39.
[28] Netcraft Inc. Netcraft anti-phishing toolbar. Visited
Jan 1, 2009. http://toolbar.netcraft.com/.
[29] Y. Pan and X. Ding. Anomaly based web phishing
page detection. Computer Security Applications
Conference, Annual, 0:381–392, 2006.
[30] A. Ramachandran and N. Feamster. Understanding
the network-level behavior of spammers. In
SIGCOMM ’06: Proceedings of the 2006 conference on
Applications, technologies, architectures, and protocols
for computer communications, pages 291–302, New
York, NY, USA, 2006. ACM.
[31] B. Ross, C. Jackson, N. Miyake, D. Boneh, and J. C.
Mitchell. Stronger password authentication using
browser extensions. In USENIX Security, 2005.
[32] F. Schneider, N. Provos, R. Moll, M. Chew, and
B. Rakowski. Phishing protection: Design
documentation. Visited Jan 1, 2009.
https://wiki.mozilla.org/Phishing_Protection:
_Design_Documentation.
[33] Sender Policy Framework Specifications (RFC 4408).
Visited Jan 1, 2009.
http://www.openspf.org/Specifications.
[34] S. Sheng, B. Magnien, P. Kumaraguru, A. Acquisti,
L. F. Cranor, J. Hong, and E. Nunge. Anti-phishing
phil: the design and evaluation of a game that teaches
people not to fall for phish. In SOUPS ’07:
Proceedings of the 3rd symposium on Usable privacy
and security, pages 88–99, New York, NY, USA, 2007.
ACM.
[35] G. Xiang and J. I. Hong. A hybrid phish detection
approach by identity discovery and keywords retrieval.
In To appear in WWW ’09: Proceedings of the 18th
international conference on World Wide Web, New
York, NY, USA, 2009. ACM Press.
[36] Z. E. Ye, S. Smith, and D. Anthony. Trusted paths for
browsers. ACM Trans. Inf. Syst. Secur., 8(2):153–186,
2005.
[37] K.-P. Yee and K. Sitaker. Passpet: convenient
password management and phishing protection. In
SOUPS ’06: Proceedings of the second symposium on
Usable privacy and security, pages 32–43, New York,
NY, USA, 2006. ACM Press.
[38] Y. Zhang, S. Egelman, L. Cranor, and J. Hong.
Phinding Phish: An Evaluation of Anti-Phishing
Toolbars. In Proceedings of the ISOC Symposium on
Network and Distributed System Security. Internet
Society, 2007.
[39] Y. Zhang, J. I. Hong, and L. F. Cranor. Cantina: a
content-based approach to detecting phishing web
sites. In WWW ’07: Proceedings of the 16th
international conference on World Wide Web, pages
639–648, New York, NY, USA, 2007. ACM Press.
... Traditional detection methods include blacklist [3] and whitelist [4], [5] techniques that block emails from known malicious sources or only allow emails from trusted senders. However, phishing email attacks are evolving to bypass these traditional detection methods by closely mimicking legitimate communications and using sophisticated evasion techniques, such as dynamic URLs and AI-generated content [6]. ...
Preprint
Phishing attacks remain a critical cybersecurity threat. Attackers constantly refine their methods, making phishing emails harder to detect. Traditional detection methods, including rule-based systems and supervised machine learning models, either rely on predefined patterns like blacklists, which can be bypassed with slight modifications, or require large datasets for training and still can generate false positives and false negatives. In this work, we propose a multi-agent large language model (LLM) prompting technique that simulates debates among agents to detect whether the content presented on an email is phishing. Our approach uses two LLM agents to present arguments for or against the classification task, with a judge agent adjudicating the final verdict based on the quality of reasoning provided. This debate mechanism enables the models to critically analyze contextual cue and deceptive patterns in text, which leads to improved classification accuracy. The proposed framework is evaluated on multiple phishing email datasets and demonstrate that mixed-agent configurations consistently outperform homogeneous configurations. Results also show that the debate structure itself is sufficient to yield accurate decisions without extra prompting strategies.
... Browser-based phishing detection such as Google Safe Browsing (GSB) [52] and Microsoft Defender SmartScreen [39] identify and alert users of potential threats. This type of phishing detection, due to its scale and always-on nature, serves a particularly important role [47]. Furthermore, organizations have increasingly adopted "takedown" services that work to remove phishing websites, curtailing their malicious activities [4,5]. ...
... The general idea of using a filter list is widely prevalent in computer science, especially with Web technologies. Approaches to blocking unwanted behavior can be found in other areas as well [17,46,52]. ...
Article
Full-text available
Filter lists are used by various users, tools, and researchers to identify tracking technologies on the Web. These lists are created and maintained by dedicated communities. Aside from popular blocking lists (e.g., EasyList), the communities create region-specific block-lists that account for trackers and ads that are only common in these regions. The lists aim to keep the size of a general blocklist minimal while protecting users against region-specific trackers. In this paper, we perform a large-scale Web measurement study to understand how different region-specific filter lists (e.g., a block-list specifically designed for French users) protect users when visiting websites. We define three privacy scenarios to understand when and how users benefit from these regional lists and what effect they have in practice. The results show that although the lists differ significantly, the number of rules they contain is unrelated to the number of blocked requests. We find that the lists' overall efficacy varies notably. Filter lists also do not meet the expectation that they increase user protection in the regions for which they were designed. Finally, we show that the majority of the rules on the lists were not used in our experiment and that only a fraction of the rules would provide comparable protection for users.
... Y. Zhang et al. [15] introduced a content-based method for phishing detection by analyzing web page textual content, generating TF-IDF-based lexical signatures, and leveraging Google search for domain name comparisons. S. Sheng et al. [16] evaluated phishing blacklists and highlighted their limited initial detection accuracy (<20% within the first hour). They also proposed faster updates combined with heuristic integration for enhanced early detection. ...
Article
Full-text available
Phishing webpage detection is critical in combating cyber threats, yet distinguishing between benign and phishing webpages remains challenging due to significant feature overlap in the representation space. This study introduces a reinforced Triplet Network to optimize disentangled representation learning tailored for phishing detection. By employing reinforcement learning, the method enhances the sampling of anchor, positive, and negative examples, addressing a core limitation of traditional Triplet Networks. The disentangled representations generated through this approach provide a clear separation between benign and phishing webpages, substantially improving detection accuracy. To achieve comprehensive modeling, the method integrates multimodal features from both URLs and HTML DOM Graph structures. The evaluation leverages a real-world dataset comprising over one million webpages, meticulously collected for diverse and representative phishing scenarios. Experimental results demonstrate a notable improvement, with the proposed method achieving a 6.7% gain in the F1 score over state-of-the-art approaches, highlighting its superior capability and the dataset’s critical role in robust performance.
... However, for URLs, the situation is different: technical countermeasures (e.g., URL blacklists, machine learning techniques) lag behind attackers' increasing sophistication [8,9]. Mainstream user interfaces offer little help to users besides showing the full URL in address bars and small tool-tips. ...
Preprint
The most widespread type of phishing attack involves email messages with links pointing to malicious content. Despite user training and the use of detection techniques, these attacks are still highly effective. Recent studies show that it is user inattentiveness, rather than lack of education, that is one of the key factors in successful phishing attacks. To this end, we develop a novel phishing defense mechanism based on URL inspection tasks: small challenges (loosely inspired by CAPTCHAs) that, to be solved, require users to interact with, and understand, the basic URL structure. We implemented and evaluated three tasks that act as "barriers" to visiting the website: (1) correct click-selection from a list of URLs, (2) mouse-based highlighting of the domain-name URL component, and (3) re-typing the domain-name. These tasks follow best practices in security interfaces and warning design. We assessed the efficacy of these tasks through an extensive online user study with 2,673 participants from three different cultures, native languages, and alphabets. Results show that these tasks significantly decrease the rate of successful phishing attempts, compared to the baseline case. Results also showed the highest efficacy for difficult URLs, such as typo-squats, with which participants struggled the most. This highlights the importance of (1) slowing down users while focusing their attention and (2) helping them understand the URL structure (especially, the domain-name component thereof) and matching it to their intent.
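The domain-name tasks above hinge on users isolating the registrable domain from a full URL. A rough sketch of that parsing step follows (a deliberately naive heuristic; production code would consult the Public Suffix List, e.g., via the tldextract package):

```python
from urllib.parse import urlparse

def registrable_domain(url: str) -> str:
    """Naive guess at the registrable domain: the last two host
    labels. This fails for multi-label suffixes like .co.uk; real
    code should use the Public Suffix List instead."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host

# A typosquat-style URL whose subdomain mimics a well-known brand.
print(registrable_domain("https://paypal.com.secure-login.example/verify"))
# -> 'secure-login.example', not 'paypal.com'
```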
... Anti-phishing blacklists are a method used by most modern browsers to verify whether the URLs users access are associated with known phishing websites [52]. These blacklists are among the most important mechanisms for protecting Internet users from phishing attacks [16,19]. ...
Article
Full-text available
Anti-phishing blacklists are a critical first line of defense for end-users against phishing attacks, yet they remain vulnerable to sophisticated evasion techniques. Despite the well-documented shortcomings of these systems, limited research has delved into the specific vulnerabilities that allow phishers to bypass detection and persist in their deceptive practices. This study addresses this gap by investigating three novel, human-centric evasion mechanisms employed by phishing websites: Virtual Machine Detection (VMD)-based evasion, Cache-based evasion, and HTTP Client Hints (CHs)-based evasion. These techniques exploit behavioral differences between legitimate users and Anti-Phishing Entities (APEs) to avoid detection. Through real-world experiments using a scalable evaluation system, the effectiveness of these mechanisms was rigorously assessed. Results show that VMD and CHs-based evasion achieved a zero blacklisting rate on tested browsers, while Cache-based evasion reduced detection by 40%. A user study further validated the practical impact of these evasion techniques. Additionally, the proposed evaluation system demonstrated 100% accuracy in identifying blacklisted and non-blacklisted websites, maintaining consistent performance across varying workloads. This research introduces innovative cloaking techniques and a Combined Effectiveness Metric (CEM) to evaluate the resilience of APEs against such sophisticated phishing tactics. The findings contribute significantly to the development of more robust defenses, strengthening the ability of APEs and malicious Uniform Resource Locator (URL) scanners to detect and mitigate emerging phishing threats.
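To make the Client Hints-based mechanism concrete, the sketch below (a simplified Flask handler, not the authors' system) serves different content depending on whether a request carries the Sec-CH-UA header that mainstream Chromium-based browsers send by default; an automated scanner that omits it would only ever see the benign page.

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def landing():
    # Chromium-based browsers send Sec-CH-UA by default; many
    # automated crawlers and headless fetchers do not.
    if request.headers.get("Sec-CH-UA"):
        return "content shown only to likely human visitors"
    return "benign-looking page shown to suspected scanners"

if __name__ == "__main__":
    app.run()
```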
Article
In this paper, we present PhiShield, a spam filter system designed to offer real-time email collection and analysis at the end node. Most existing spam detection systems focus more on detection accuracy than on usability and privacy. PhiShield enhances both by carefully choosing its deployment location, which allows it to achieve personalization and proactive defense. PhiShield is implemented as a browser extension and is compatible with third-party email services such as Gmail; because it runs in the browser, it assesses emails before a user clicks on them. When a phishing email is detected, it offers proactive prevention by showing a personalized report rather than the content of the phishing email. It therefore gives users transparency about phishing mechanisms and helps them mitigate phishing risks in practice. We test various locally trained Artificial Intelligence (AI)-based detection models and show that a Long Short-Term Memory (LSTM) model is suitable for practical phishing email detection (>98% accuracy rate) at a reasonable training cost. This means that an organization or user can develop their own private detection rules and use them to supplement a third-party email service. We implement PhiShield to show the scalability and practicality of our solution and provide a performance evaluation over approximately 300,000 emails from various sources.
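The abstract gives no code, but an LSTM text classifier of the kind described can be sketched in a few lines of Keras. The vocabulary size, sequence length, and layer widths below are arbitrary placeholders, not PhiShield's actual configuration, and the training data is random dummy input.

```python
import numpy as np
from tensorflow.keras import layers, models

VOCAB = 20000   # placeholder vocabulary size
MAXLEN = 200    # placeholder tokenized email length

model = models.Sequential([
    layers.Embedding(VOCAB, 64),            # word ids -> dense vectors
    layers.LSTM(64),                        # sequence summary
    layers.Dense(1, activation="sigmoid"),  # P(phishing)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Dummy tokenized emails: integer word ids; label 1 = phishing.
x = np.random.randint(0, VOCAB, size=(32, MAXLEN))
y = np.random.randint(0, 2, size=(32,))
model.fit(x, y, epochs=1, verbose=0)
```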
Conference Paper
Full-text available
This paper presents quantitative data about SMTP traffic to MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) based on packet traces taken in December 2000 and February 2004. These traces show that the volume of email increased by 866% between 2000 and 2004. Local mail hosts utilizing black lists generated over 470,000 DNS lookups, which accounted for 14% of all DNS lookups observed on the border gateway of CSAIL on a given day in 2004. In comparison, DNS black list lookups accounted for merely 0.4% of lookups in December 2000. The distribution of the number of connections per remote spam source is Zipf-like in 2004, but not in 2000. This suggests that black lists may be ineffective at fully stemming the tide of spam. We examined seven popular black lists and found that 80% of the spam sources we identified were listed in some DNS black list. Some DNS black lists appear to be well-correlated with others, which should be considered when estimating the likelihood that a host is a spam source.
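The DNS black list lookups counted above follow a simple convention: reverse the IPv4 octets, append the list's zone, and resolve; an answer means the address is listed, while NXDOMAIN means it is not. A minimal sketch follows (the zone name is one well-known example; query policies of real lists vary):

```python
import socket

def dnsbl_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """Check an IPv4 address against a DNS black list.
    1.2.3.4 on zone Z is queried as 4.3.2.1.Z; a successful
    A-record resolution means 'listed'."""
    query = ".".join(reversed(ip.split("."))) + "." + zone
    try:
        socket.gethostbyname(query)
        return True
    except socket.gaierror:  # NXDOMAIN and similar failures
        return False

# 127.0.0.2 is the conventional test address most lists mark as listed.
print(dnsbl_listed("127.0.0.2"))
```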
Conference Paper
Full-text available
Phishing attacks, in which criminals lure Internet users to websites that impersonate legitimate sites, are occurring with increasing frequency and are causing considerable harm to victims. In this paper we describe the design and evaluation of an embedded training email system that teaches people about phishing during their normal use of email. We conducted lab experiments contrasting the effectiveness of standard security notices about phishing with two embedded training designs we developed. We found that embedded training works better than the current practice of sending security notices. We also derived sound design principles for embedded training systems.
Conference Paper
Full-text available
Many popular web browsers now include active phishing warnings, since research has shown that passive warnings are often ignored. In this laboratory study we examine the effectiveness of these warnings and examine if, how, and why they fail users. We simulated a spear phishing attack to expose users to browser warnings. We found that 97% of our sixty participants fell for at least one of the phishing messages that we sent them. However, we also found that when presented with the active warnings, 79% of participants heeded them, which was not the case for the passive warning that we tested, where only one participant heeded the warnings. Using a model from the warning sciences we analyzed how users perceive warning messages and offer suggestions for creating more effective phishing warnings. Keywords: phishing, warning messages, mental models, usable privacy and security.
Conference Paper
Full-text available
Banks and other organisations deal with fraudulent phishing websites by pressing hosting service providers to remove the sites from the Internet. Until they are removed, the fraudsters learn the passwords, personal identification numbers (PINs) and other personal details of the users who are fooled into visiting them. We analyse empirical data on phishing website removal times and the number of visitors that the websites attract, and conclude that website removal is part of the answer to phishing, but it is not fast enough to completely mitigate the problem. The removal times have a good fit to a lognormal distribution, but within the general pattern there is ample evidence that some service providers are faster than others at removing sites, and that some brands can get fraudulent sites removed more quickly. We particularly examine a major subset of phishing websites (operated by the 'rock-phish' gang) which accounts for around half of all phishing activity and whose architectural innovations have extended their average lifetime. Finally, we provide a ballpark estimate of the total loss being suffered by the banking sector from the phishing websites we observed.
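To illustrate the lognormal fit mentioned in the abstract, the snippet below fits scipy's lognorm distribution to a set of synthetic site-removal times in hours; with real takedown data, the fitted parameters would characterize the long tail of slow removals.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic removal times (hours); real data would come from
# takedown monitoring, as in the cited study.
removal_hours = rng.lognormal(mean=3.0, sigma=1.0, size=500)

# Fit a lognormal with location fixed at zero, then report the
# median lifetime implied by the fit (scale = exp(mu)).
shape, loc, scale = stats.lognorm.fit(removal_hours, floc=0)
print(f"sigma={shape:.2f}, median lifetime={scale:.1f} hours")
```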
Article
We describe a browser extension, PwdHash, that transparently produces a different password for each site, improving web password security and defending against password phishing and other attacks. Since the browser extension applies a cryptographic hash function to a combination of the plaintext password entered by the user, data associated with the web site, and (optionally) a private salt stored on the client machine, theft of the password received at one site will not yield a password that is useful at another site. While the scheme requires no changes on the server side, implementing this password method securely and transparently in a web browser extension turns out to be quite difficult. We describe the challenges we faced in implementing PwdHash and some techniques that may be useful to anyone facing similar security issues in a browser environment.
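The core idea is easy to sketch: derive the site password from the master password and the site's domain, so a password phished at one domain is useless everywhere else. The snippet below is a toy illustration of that idea, not PwdHash's actual construction or encoding rules.

```python
import base64
import hashlib
import hmac

def site_password(master: str, domain: str, salt: str = "") -> str:
    """Toy per-site password: HMAC-SHA256 of the domain keyed by the
    master password (plus optional client-side salt), truncated.
    PwdHash's real scheme differs in details such as encoding."""
    digest = hmac.new((master + salt).encode(), domain.encode(),
                      hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest)[:12].decode()

# The same master password yields unrelated per-site passwords.
print(site_password("hunter2", "bank.example"))
print(site_password("hunter2", "phish.example"))
```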
Article
Phishing is a form of identity theft that combines social engineering techniques and sophisticated attack vectors to harvest financial information from unsuspecting consumers. Often a phisher tries to lure her victim into clicking a URL pointing to a rogue page. In this paper, we focus on studying the structure of URLs employed in various phishing attacks. We find that it is often possible to tell whether or not a URL belongs to a phishing attack without requiring any knowledge of the corresponding page data. We describe several features that can be used to distinguish a phishing URL from a benign one. These features are used to model a logistic regression filter that is efficient and has a high accuracy. We use this filter to perform thorough measurements on several million URLs and quantify the prevalence of phishing on the Internet today.
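A compressed sketch of this approach: extract a few purely lexical features from each URL and fit a logistic regression over labeled examples. The features and training URLs below are illustrative stand-ins for the richer feature set the paper describes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def url_features(url: str) -> list:
    # Toy lexical features: length, dot count, digit count,
    # presence of '@', and hyphen count (common phishing tells).
    return [len(url), url.count("."), sum(c.isdigit() for c in url),
            int("@" in url), url.count("-")]

urls = [
    ("https://www.example.com/", 0),
    ("https://mail.example.org/inbox", 0),
    ("http://192.0.2.7/secure-update/login.php", 1),
    ("http://example.com.account-verify.example.net/@signin", 1),
]
X = np.array([url_features(u) for u, _ in urls])
y = np.array([label for _, label in urls])

clf = LogisticRegression().fit(X, y)
test = np.array([url_features("http://paypa1-login.example/")])
print("P(phishing) =", clf.predict_proba(test)[0, 1])
```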
Conference Paper
A client-side solution to protect users against web-based identity theft is presented in (CLTM04) by Chou et al., in the form of an Internet browser plug-in called SpoofGuard. This tool is aimed at average Internet users who do not take enough precautions in their browsing habits and could be easily tricked into giving out sensitive data about themselves. The tool takes into consideration several characteristics of typical web spoofing attacks that have been found to successfully deceive users in the past, and calculates a score to determine whether a page is authentic or a spoof. SpoofGuard is an easy-to-install plug-in that does not impose a large performance penalty on the average user; however, it may not be easy to configure, and server-side efforts should still be made to better prevent web spoofing.
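SpoofGuard's scoring idea can be caricatured as a weighted sum of binary page checks compared against a threshold. The checks, weights, and threshold below are invented for illustration; the actual plug-in combines a different and more elaborate set of tests.

```python
# Hypothetical spoof indicators for a page, as booleans.
checks = {
    "password_field_on_http": True,    # credentials sent without TLS
    "domain_resembles_brand": True,    # e.g., small edit distance to a bank
    "images_hosted_elsewhere": False,  # logos hotlinked from the real site
    "recently_registered_domain": True,
}

# Invented weights; SpoofGuard uses its own tests and tuning.
weights = {
    "password_field_on_http": 0.35,
    "domain_resembles_brand": 0.30,
    "images_hosted_elsewhere": 0.15,
    "recently_registered_domain": 0.20,
}

score = sum(weights[name] for name, fired in checks.items() if fired)
THRESHOLD = 0.5
print("spoof" if score >= THRESHOLD else "probably authentic", score)
```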
Conference Paper
There are many applications available for phishing detection. However, unlike spam prediction, there are only a few studies that compare machine learning techniques for predicting phishing. The present study compares the predictive accuracy of several machine learning methods, including Logistic Regression (LR), Classification and Regression Trees (CART), Bayesian Additive Regression Trees (BART), Support Vector Machines (SVM), Random Forests (RF), and Neural Networks (NNet), for predicting phishing emails. A data set of 2889 phishing and legitimate emails is used in the comparative study. In addition, 43 features are used to train and test the classifiers.
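Mechanically, a study of this kind reduces to running several classifiers over the same feature matrix under cross-validation. A skeleton of that comparison using scikit-learn stand-ins for the cited models follows (BART has no scikit-learn implementation and is omitted; the data is a random placeholder for the 2889-email, 43-feature set):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder for the real feature matrix and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 43))
y = rng.integers(0, 2, size=300)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "CART": DecisionTreeClassifier(),
    "SVM": SVC(),
    "RF": RandomForestClassifier(),
    "NNet": MLPClassifier(max_iter=500),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```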
Conference Paper
Phishing is an electronic online identity theft in which the attackers use a combination of social engineering and web site spoofing techniques to trick a user into revealing confidential information. This information is typically used to make an illegal economic profit (e.g., by online banking transactions, purchase of goods using stolen credentials, etc.). Although simple, phishing attacks are remarkably effective. As a result, the number of successful phishing attacks has been continuously increasing and many anti-phishing solutions have been proposed. One popular and widely-deployed solution is the integration of blacklist-based anti-phishing techniques into browsers. However, it is currently unclear how effective such blacklisting approaches are at mitigating phishing attacks in real life. In this paper, we report our findings on analyzing the effectiveness of two popular anti-phishing solutions. Over a period of three weeks, we automatically tested the effectiveness of the blacklists maintained by Google and Microsoft with 10,000 phishing URLs. Furthermore, by analyzing a large number of phishing pages, we explored the existence of page properties that can be used to identify phishing pages.
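Measurement studies like this one, and like the blacklist-latency experiments in the paper under discussion, share a simple skeleton: poll each URL against the blacklist on a fixed schedule and record when it first appears. A sketch of that harness follows, where is_blacklisted is a hypothetical stub standing in for a real blacklist query (e.g., a vendor's lookup API).

```python
import time

def is_blacklisted(url: str) -> bool:
    """Hypothetical stub: replace with a real blacklist query,
    e.g., a call to a vendor's lookup API."""
    return False

def measure_coverage(urls, rounds=12, interval_s=3600):
    """Poll each URL once per interval; record the round (hour)
    at which it first appears on the blacklist."""
    first_seen = {}
    for r in range(rounds):
        for url in urls:
            if url not in first_seen and is_blacklisted(url):
                first_seen[url] = r
        coverage = len(first_seen) / len(urls)
        print(f"hour {r}: {coverage:.0%} of URLs blacklisted")
        time.sleep(interval_s)
    return first_seen

# Example: measure_coverage(["http://phish.example/login"], rounds=2, interval_s=1)
```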