Asirra: a CAPTCHA that exploits interest-aligned manual image categorization.
ABSTRACT We present Asirra (Figure 1), a CAPTCHA that asks users to identify cats out of a set of 12 photographs of both cats and dogs. Asirra is easy for users; user studies indicate it can be solved by humans 99.6% of the time in under 30 seconds. Barring a major advance in machine vision, we expect computers will have no better than a 1/54,000 chance of solving it. Asirra’s image database is provided by a novel, mutually beneficial partnership with Petfinder.com. In exchange for the use of their three million images, we display an “adopt me” link beneath each one, promoting Petfinder’s primary mission of finding homes for homeless animals. We describe the design of Asirra, discuss threats to its security, and report early deployment experiences. We also describe two novel algorithms for amplifying the skill gap between humans and computers that can be used on many existing CAPTCHAs.
Jeremy Elson, John R. Douceur, Jon Howell
Over the past few years, an increasing number of public web
services have attempted to prevent exploitation by bots and auto-
mated scripts, by requiring a user to solve a Turing-test challenge
(commonly known as a CAPTCHA¹ or HIP²) before using the service.
Because the challenges must be easy to generate but difficult
(for non-humans) to solve, all CAPTCHAs rely on some secret
information that is known to the challenger but not to the agent
being challenged. For our purposes, we can divide CAPTCHAs
into two classes depending on the scope of this secret. In Class I
CAPTCHAs, the secret is merely a random number, which is fed
into a publicly known algorithm to yield a challenge, somewhat
analogous to a public-key cryptosystem. Class II CAPTCHAs em-
ploy both a secret random input and a secret high-entropy database,
somewhat analogous to a one-time-pad cryptosystem. A critical
problem in building a Class II CAPTCHA is populating the database
with a sufficiently large set of classified, high-entropy entries.
¹ “Completely Automated Public Turing test to tell Computers and Humans
Apart.” CAPTCHA is a trademark of Carnegie Mellon University.
² “Human Interaction Proof”
CCS’07, October 29–November 2, 2007, Alexandria, Virginia.
Copyright 2007 ACM 978-1-59593-703-2/07/0011 ...$5.00.
Figure 1: An Asirra challenge. The user selects each of the 12 images
that depict cats. As the mouse is hovered over each thumbnail, a larger
image and “Adopt me” link appear. “Adopt me” first invalidates the
challenge, then takes the user to that animal’s page on Petfinder.com.
Class I CAPTCHAs have many virtues. They can be concisely
described in a small amount of software code; they have no long-term
secret that requires guarding; and they can generate a practically
unbounded set of unique challenges. On the other hand,
their most common realization, a challenge to recognize distorted
text, evinces a disturbingly narrow gap between human and non-human
success rates. Optical character recognition algorithms are
competitive with humans in recognizing distinct characters, which
has led researchers toward increasing the difficulty of segmenting
an image into distinct character regions. However, this increase
in difficulty affects humans as well. Although laboratory
experiments suggest that humans can segment text characters
accurately, CAPTCHAs deployed on commercial public web sites
continue to use cleanly segmented challenges (e.g., Fig. 2a), some
with difficult segmentation challenges (Fig. 2c). The owners of
commercial web sites have apparently decided that a user’s success
at navigating a CAPTCHA depends not only on whether they are
able to solve the challenge, but also on whether they are willing
to put forth the effort. Informal discussions with MSN and other
web site owners suggest even relatively simple challenges can drive
away a substantial number of potential customers.
Class II CAPTCHAs have the potential to overcome the main
weaknesses described above. Because they are not restricted to
challenges that can be generated by a low-entropy algorithm, they
can exercise a much broader range of human ability, such as
recognizing features of photographic images captured from the physical
world. Such challenges evince a broad gulf between human and
non-human success rates, not only because general machine vision
is a much harder problem than text recognition, but also because
image-based challenges can be made less bothersome to humans
without drastically degrading their efficacy at blocking automatons.
Figure 2: A gallery of text CAPTCHAs. Simple text challenges, such
as a (register.com), are still common despite recent defeat by optical
character recognition. Researchers have begun to focus on schemes
that make letter segmentation difficult, as seen in b (Carnegie Mellon)
and c (Microsoft Research). Webmasters, wary of what users
will tolerate, dial back researchers’ noise parameters, seen in d
(Microsoft Hotmail) and e (Yahoo! Mail).
A significant issue in building a Class II CAPTCHA is populating
the secret database. Existing approaches take one of two
directions: (a) mining a public database or (b) providing
entertainment as an incentive for manual image categorization. Examples
of the first group include the seminal work by Chew and Tygar,
which used Google Image Search; hotcaptcha, which references
the HotOrNot database; and KittenAuth, which
draws images from Wikimedia Commons. A problem with
these approaches is that the public source of categorized images is
small or available to attackers, so a small, fixed amount of effort
spent reconstructing the private database can return the ability to
solve an unbounded number of challenges. The second direction
was pioneered by the ESP-PIX CAPTCHA, whose database is
populated as a deliberate side effect of playing the ESP Game,
a very clever mechanism for enticing people to label images
accurately. Although potentially powerful, it is not yet clear whether
this approach will yield a sufficiently large set of categorized
images. Furthermore, many of the images in the current implementation
are rather abstract, which may make the challenge difficult
enough to drive away users.
In this paper, we present a new direction for populating image
databases for Class II CAPTCHAs, namely re-purposing a large,
continually evolving, private database of images that are manually
categorized. Although this may seem trivial, it is not a priori clear
why the owner of such a database would be willing to release the
images for use in Turing-test challenges. The answer is that there
can exist (and, in at least one instance, does exist) an alignment of
interests between a database owner and web-service owners wish-
ing to secure their sites. Both parties can benefit from selective,
wide-scale display of categorized images: the latter for security
and the former for advertising.
We present Asirra³, a CAPTCHA that asks users to categorize
photographs depicting either cats or dogs. An example is shown in
Figure 1. Asirra’s strength comes from an innovative partnership
with Petfinder.com, the world’s largest web site devoted to finding
homes for homeless animals. Petfinder has a database of over
three million cat and dog images, each of which is categorized with
very high accuracy by human volunteers working in thousands of
animal shelters throughout the United States and Canada. Petfinder
has granted ongoing access to its database, which grows by nearly
10,000 images daily, to the Asirra project. In exchange, Asirra
provides a small “Adopt me” link beneath each photo, promoting
Petfinder’s primary mission of exposing adoptable pets to potential
new owners. This partnership is mutually beneficial, and also
produces the dual social benefits of improving computer security and
animal welfare.
This paper describes Asirra and an analysis of its strengths and
weaknesses. We also report our deployment experience, and the
results of two user studies involving 332 test subjects.
Asirra is easy for users; it can be solved by humans 99.6% of the
time in under 30 seconds (Section 6, Table 1). Barring a major ad-
vance in machine vision or compromise of our database, we expect
computers will have no better than a 1/54,000 chance of solving it
(Section 6, Table 2). Anecdotally, users seem to find the experience
of using Asirra much more enjoyable than a text-based CAPTCHA
that provides equal security.
The organization of this paper is as follows. In Section 2, we
review related work in more detail. In Section 3, we describe the
design of Asirra. §3.1 describes user experiments we performed
to quantify humans’ performance. §3.2 explores the other side of
the equation—potential attacks on Asirra, and how they can be re-
sisted. We developed two algorithms that can be used to improve
virtually all CAPTCHAs, including those that are text-based; these
improvements are described in Section 4. In Section 5 we describe
our scalable Asirra implementation. Finally, in Section 6, we sum-
marize our contributions and offer conclusions.
Asirra is available free at www.asirra.com.
2. RELATED WORK
Since the concept of a CAPTCHA was widely introduced by von
Ahn in 2000, hundreds of design variations have appeared. By
far, most are text-based: The computer generates a challenge by se-
lecting a sequence of letters, rendering them, distorting the image,
and adding noise. Text CAPTCHAs are popular because they are
simple, small, and easy to design and implement. Challenges as
short as four characters are robust against random guessing; there
are 36^4 ≈ 1.7 million possible four-character challenges consisting
of case-insensitive letters and digits.
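The size of this challenge space is easy to check. The short Python sketch below is only an illustration of the arithmetic; the function name `challenge_space` is ours, not part of any CAPTCHA library.

```python
import string

# Case-insensitive letters plus digits: 26 + 10 = 36 symbols.
ALPHABET = string.ascii_lowercase + string.digits


def challenge_space(alphabet_size: int, length: int) -> int:
    """Number of distinct challenges of the given length."""
    return alphabet_size ** length


total = challenge_space(len(ALPHABET), 4)
print(total)      # 1679616, i.e. about 1.7 million
print(1 / total)  # chance that one uniformly random guess is correct
```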
Unfortunately, computers can do far better than guess randomly.
Simard et al. showed that Optical Character Recognition (OCR)
can achieve human-like accuracy, even when letters are distorted,
as long as the image can be reliably segmented into its constituent
letters. Mori and Malik demonstrated that von Ahn’s original
GIMPY CAPTCHA can be solved automatically 92% of the time.
Consequently, recent text-based CAPTCHAs have focused on
making image segmentation difficult. Figure 2c shows a challenge
designed by Chellapilla et al., who claim it is hard for OCR
because the noise confounds known segmentation techniques.
Microsoft’s Hotmail (free email service) deployed it; however, due
to usability concerns, they later selected noise parameters
demonstrated in Figure 2d. Yahoo’s current CAPTCHA, shown in
Figure 2e, seems to have suffered a similar fate. The noise is not suffi-
seem either too easy to be secure, or too difficult to be tolerated by
³ “Animal Species Image Recognition for Restricting Access”
2.1 Image Classification CAPTCHAs
Text-based CAPTCHAs seem to universally suffer from an un-
fortunate property: Making them hard for computers also makes
them hard for humans. This has led some researchers to use pho-
tographs as CAPTCHAs instead. Because general machine vision
is a much harder problem than character recognition, there are op-
portunities to find and exploit larger gaps in the capabilities of hu-
mans and computers.
Chew and Tygar were among the first to describe using
labelled photographs to generate a CAPTCHA. They generated a
database of labelled images by feeding a list of easily-illustrated
words to Google Image Search. Unfortunately, this technique
does not yield well-classified results due to Google’s method of in-
ferring photo contents based on surrounding descriptive text. To
use Chew and Tygar’s example, the word pumpkin may refer to ei-
ther a large vegetable or someone’s pet cat Pumpkin. Because of
these errors, they manually cull bad images from their collection.
This is devastating to the security of the scheme. A database small
enough to be manually constructed by researchers is also small
enough to be manually re-constructed by an attacker.
Of course, applying automation to database construction is in-
herently problematic. If a researcher can populate a database by
writing a program to automatically classify images, an attacker can
write a program to beat the CAPTCHA by performing the same
A novel solution to this problem is described by von Ahn et al.:
They were able to entice humans to manually describe images by
framing the task as a game. Their “ESP Game” awards points to
teams of non-communicating players who can both pick the same
label for a random image, encouraging them to use the most obvious
label. Their PIX CAPTCHA displays four images from
the ESP Game database that have the same label, then challenges
the user to guess the label from a menu of 70 possibilities.
PIX is clever, but has several potential problems. First, its scale
seems insufficient. By solving PIX repeatedly, it is not hard to get
repeated images, making the database easy to reconstruct by an
attacker. Perhaps more fundamentally, it has a fixed menu
of only 70 object classes. This makes it a potential target for brute
force attacks (though potentially defensible using our token bucket
scheme; see Section 4.2). Even with a large number of categorized
images, it may be difficult to add a large number of classes. As the
number of classes goes up, so does the number of words that could
reasonably be used to describe a set of photos. Finally, PIX photos
are sometimes abstract, making it potentially difficult or frustrating
as a CAPTCHA.
A fascinating use of a large-scale human-generated database is
the site HotCaptcha.com. HotCaptcha displays nine photographs
of people and asks users to select the three which are “hot.” Its
database comes from HotOrNot.com, a popular web site that in-
vites users to post photos of themselves and rate others’ photos as
“hot” or “not.” HotCaptcha is clever in its use of a pre-existing mo-
tivation for humans to classify photos at a large scale. However,
humans may have difficulty solving it because the answers are sub-
jective and culturally relative; beauty has no ground truth. It is also
offensive to many people, making it difficult for serious web sites
Finally, worthy of mention is the similar-seeming KittenAuth
project. Like Asirra, KittenAuth authenticates users by asking
them to identify photos of kittens. However, this is a coincidental
and superficial similarity. KittenAuth is trivial to defeat because it
has a database of fewer than 100 manually selected kitten photos.
An attacker can expose (indeed, already has exposed) the database by
manually solving the KittenAuth challenge a few dozen times. An
arbitrary number of challenges can then be solved using an image
comparator robust to simple image distortions.
Asirra surmounts the image-generation problem in a novel way:
by forming a partnership with Petfinder.com, the world’s largest
web site devoted to finding homes for homeless pets. Asirra gen-
erates challenges by displaying 12 images from a database of over
or dogs. Nearly 10,000 more are added every day by volunteers at
animal shelters throughout the United States and Canada. The size
and accuracy of this database is fundamental to the security pro-
vided by Asirra; it is what differentiates our work from previously
proposed image-based CAPTCHAs.
In exchange for access to Petfinder’s database, Asirra provides
an unobtrusive “Adopt me” link beneath each photo. This promotes
Petfinder’s primary mission of exposing adoptable pets to potential
new owners. To maximize the probability of successful adoptions,
Asirra will employ IP geolocation to determine the user’s approximate
region and preferentially display pets that are nearby. The
security implications of this feature are discussed in Section 3.2.1.
Asirra has several attractive features:
• Humans can solve it quickly (§3.1.2) and accurately (§3.1.3).
• Computers cannot solve it easily (§3.2).
• Unlike many image-based CAPTCHAs which are abstract
or subjective, Asirra’s challenges are concrete, inoffensive
(cute, by some accounts), require no specialized or cultur-
ally biased knowledge, and have definite ground truth. This
makes Asirra less frustrating for humans. Some beta-testers
found it fun. The four-year-old child of one asked several
times to “play the cat and dog game again.”
• It promotes an additional social benefit: finding homes for
Asirra also has several disadvantages:
• Most CAPTCHAs are implemented as stand-alone program
libraries that can be integrated into a web site without intro-
ducing external dependencies. In contrast, Asirra, like PIX,
is both an algorithm and a database; there is only one in-
stance of it. Therefore, Asirra must be implemented as an
administratively centralized web service that is used to gen-
erate and verify CAPTCHAs on-demand for every site that
wishes to use it. (Our scalable implementation is described
in Section 5.)
• Asirra may abruptly lose its security if the database is com-
promised. For example, an attacker may hire cheap labor to
classify all three million images. If this does happen, we may
not even be aware of the attack.
• A typical Asirra challenge requires more screen space than a
traditional text-based CAPTCHA.
• Like virtually all other CAPTCHAs, Asirra is not accessible
to those with visual impairments (§3.3).
Figure 3: Size of an individual image vs. classification accuracy in our
test population. (Plot: average accuracy as a function of image area in pixels².)
ies. We also collected data from our live Asirra deployment, which
is described in Section 5.
The user studies, in total, displayed 23,208 cat and dog images to
332 test subjects. The subjects were Microsoft employees, friends,
and family, recruited via postings to internal mailing lists. Our
experiment displayed a random sequence of cat and dog images,
one at a time. Users were asked to identify the species depicted in
each image as it appeared. We used 300 random images from the
Petfinder database, each scaled to a random size as described in the
next section. The users’ response time and accuracy were recorded.
To match the conditions of a real CAPTCHA, we implemented the
asked users to participate from their own home or office comput-
The real Asirra web service went live in March of 2007. In the
subsequent 6 weeks, it served about 100,000 challenges. Most of
these were from our own web page that demonstrates Asirra. How-
ever, several dozen web sites (blogs, free services, etc.) have ex-
perimented with integrating Asirra and were responsible for about
13,000 “real” challenges.
In the following sections, we characterize various aspects of Asirra’s
performance, based on both the outcome of our user experiments
and data from our deployment.
Best Image Size
play?” We expected larger images to exhibit higher accuracy and
faster response time. However, smaller images have the advantage
of taking up less screen space, making Asirra easier to integrate
visually with the rest of a web page. Smaller images are also faster to
download, which is especially important for users with slow Internet
To find the best image size, our first user experiment randomly
varied the size of the images from a minimum of 225 total pixels
(about 15x15 pixels square) to a maximum of 30,000 pixels (about
175x175 pixels square). We used total pixels instead of linear
dimension as our metric because most images are not square. We
collected data from 18,311 displays of images to 185 users.
Figure 3 plots image size vs. average accuracy. Each graph seg-
pixels seems to be the sweet spot: larger images show no
improvement. (We quantify classification accuracy in Section 3.1.3).
Figure 4: Size of an individual image vs. time required to classify it in
our test population. (Plot: median response time in ms as a function of
image area in pixels², x-axis from 0 to 30,000.)
Typical Response Time
We also used our first experiment to evaluate the effect of image
size on response time. Figure 4 shows the results. We plot the
median response time because it is robust against outliers (e.g., when a
subject receives a phone call during the experiment). 10,000 pixel
images are typically classified in about 900msec. There does seem
to be a slight (50msec) speed benefit to displaying images larger
than 10,000 pixels. It is also interesting to note from Figures 3 and
4 that smaller images cause degradation in both response time and
accuracy. That is, users spend longer looking at an image but still
end up getting it wrong.
Budgeting 900msec per image, plus a few extra seconds to un-
solving a single, 12-image Asirra challenge.
After our first experiment made the best image size clear, we ran
a second user experiment focusing exclusively on images scaled to
10,000 pixels. Our goal was to collect a large number of samples
at our target image size. We collected data from 4,717 displays of
images to 147 users. The overall accuracy rate was 98.5%.
The Asirra challenge has 12 images. Based on a 98.5% per-
image accuracy, 83.4% of users should be able to pass Asirra after
one challenge, 97.2% after two challenges, and 99.5% after three.
However, Asirra uses a novel scheme called the Partial Credit Al-
gorithm, described in Section 4.1, which significantly increases the
probability of users passing Asirra while only marginally improv-
ing the yield of a bot. With PCA, we expect 99.6% of users will
solve Asirra after two challenges.
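The pass rates quoted above (without partial credit) follow from simple arithmetic. The sketch below reproduces them under two assumptions that are ours rather than the paper's: that errors on individual images are independent, and that successive challenges are independent. It does not model the Partial Credit Algorithm.

```python
def pass_probability(per_image_accuracy: float, images: int, attempts: int) -> float:
    """Probability of solving at least one challenge within `attempts` tries.

    Assumes every image is classified independently with the same accuracy,
    and that a challenge is passed only if all images are classified correctly.
    """
    p_single_challenge = per_image_accuracy ** images
    return 1.0 - (1.0 - p_single_challenge) ** attempts


for attempts in (1, 2, 3):
    rate = pass_probability(0.985, 12, attempts)
    print(f"{attempts} challenge(s): {100 * rate:.1f}%")
# 1 challenge(s): 83.4%
# 2 challenge(s): 97.2%
# 3 challenge(s): 99.5%
```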
Extracting a good estimate of users’ error rates in the wild is
somewhat more difficult. In our controlled experiment, users were
focused exclusively on image categorization, so errors were clearly
attributable to task difficulty. In contrast, the deployed CAPTCHA
is integrated into web forms with many other fields and informa-
tion. Any click of the form’s “Submit” button causes Asirra to
score the challenge, even if the user had some other intent in mind;
26% of scoring requests received by our server had no response
at all (no cats selected). Of non-null responses scored, 66% were
scored as correct. Note that in this accounting, a user who success-
fully solves Asirra on her second try counts as 50% accuracy (one
right plus one wrong response). In addition, since Asirra is new,
users may sometimes be tinkering: “Oh, what happens if I get one
Figure 5: Though rare, some images at Petfinder.com are confusing
to humans. (left) A photo depicting a cat and dog together. (right) A
Automatic Database Pruning
The classification errors made by users are not uniformly
distributed: some images are confusing, even to humans. Figure 5
shows two examples.
If Asirra achieves only modest popularity, our accuracy figures
are likely to stay close to those described in the previous section.
However, if Asirra receives widespread use (tens of thousands of
challenges generated per day), we will have enough data to make
automatic, statistically significant judgements about confusing
images. There will be cases when a user ultimately succeeds in
passing Asirra, but may have made errors along the way. Once Asirra
decides a user is human, it will be able to mark the previously-incorrect
images as “possibly confusing.” Images that are marked
beyond a threshold can then be removed from the database.
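The pruning idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the Asirra implementation: the class name `ConfusionTracker`, the session format, and the threshold value are all hypothetical, and a real deployment would need the statistical significance test the text alludes to.

```python
from collections import Counter


class ConfusionTracker:
    """Counts how often images are misclassified by users later judged human."""

    def __init__(self, threshold: int = 3):  # hypothetical threshold
        self.threshold = threshold
        self.marks = Counter()

    def record_session(self, missed_per_attempt, eventually_passed: bool) -> None:
        """missed_per_attempt: one set of image ids per failed attempt.

        Marks are recorded only when the user ultimately passed, so failed
        bot sessions do not influence the pruning decision.
        """
        if not eventually_passed:
            return
        for missed in missed_per_attempt:
            self.marks.update(missed)

    def images_to_prune(self) -> set:
        """Images marked 'possibly confusing' more often than the threshold."""
        return {image for image, count in self.marks.items() if count > self.threshold}


tracker = ConfusionTracker(threshold=2)
for _ in range(3):
    tracker.record_session([{"img_a"}], eventually_passed=True)
tracker.record_session([{"img_b"}], eventually_passed=True)
tracker.record_session([{"img_c"}] * 10, eventually_passed=False)
print(tracker.images_to_prune())  # {'img_a'}
```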
Interest in the “adopt me” link
In both user experiments, we placed a small “Adopt me” link
below the cat or dog displayed in each trial. We did not give users
any instructions regarding this link or tell them what it did. Our
goal was to test how often Internet users involved in some unrelated
task would be motivated (by curiosity, cuteness of a photo, etc.)
to click it. This test is important because it suggests whether our
partnership with Petfinder is viable.
In our controlled experiments, 27 users (7.5% of the test
population) clicked “Adopt me”. One of our beta-testers adopted a beagle.
The public’s interest in the link has been lower. In the 6 weeks
following Asirra’s release, we issued 13,334 “real” challenges
(that is, challenges integrated with pages other than our own
demonstration pages). 279, or 2.1%, led to clicks on “Adopt me”.
We discuss the security implications of “Adopt me” in §3.2.1.
Before discussing the security of Asirra, it is useful to review
the threat model. CAPTCHAs are an unusual area of security in
that we are not trying to provide absolute guarantees, only slow
down attackers. By definition, anyone can “break” a CAPTCHA
by devoting a small amount of human effort to it. A CAPTCHA
therefore is successful, essentially, if it forces an automated attack
to cost more than 30 seconds’ worth of a human’s time in a part
of the world where labor is cheap. The generally accepted figure
in the literature [13, 2] seems to be that CAPTCHAs should admit
bots with less than 1/10,000 probability.
Attacks on “Adopt Me”
The most common first question we get when people see Asirra
is, “Doesn’t the adopt-me link defeat your security?” This
misconception is understandable; each pet’s page on Petfinder.com
describes it as being either a cat or dog. However, this is not
actually a security hole. The adopt-me links are not direct links to
Petfinder.com; they lead to the Asirra web service. Asirra provides
a redirection to Petfinder.com only after it has marked the challenge
as invalid. (A new challenge is then fetched and displayed.) Asirra
rejects attempts to solve invalidated challenges. In addition, Asirra
only permits redirection for a single adopt-me link per 12-image
challenge. The number of allowed redirections per IP address per
day is also limited, to prevent adopt-me from becoming a vector for
revealing large portions of the database.
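The redirection rules just described can be sketched as follows. This is an illustrative model only: the names `Challenge` and `AdoptMeGate`, the daily limit of 5, and the choice to leave a challenge valid when a redirect is refused are assumptions of ours, not details taken from the Asirra service.

```python
from dataclasses import dataclass


@dataclass
class Challenge:
    challenge_id: str
    cat_indices: frozenset  # positions of the images that depict cats
    valid: bool = True


class AdoptMeGate:
    """Hypothetical model of the adopt-me redirection policy."""

    def __init__(self, daily_limit_per_ip: int = 5):  # assumed limit
        self.daily_limit_per_ip = daily_limit_per_ip
        self.redirects_today = {}  # ip -> redirects granted today

    def request_redirect(self, challenge: Challenge, ip: str) -> bool:
        """Return True if the caller may be redirected to the pet's page."""
        if not challenge.valid:
            return False  # at most one redirect per challenge
        if self.redirects_today.get(ip, 0) >= self.daily_limit_per_ip:
            return False  # per-IP daily limit reached
        challenge.valid = False  # invalidate BEFORE revealing the species
        self.redirects_today[ip] = self.redirects_today.get(ip, 0) + 1
        return True

    def score(self, challenge: Challenge, selected) -> bool:
        """Invalidated challenges are always rejected."""
        return challenge.valid and frozenset(selected) == challenge.cat_indices


gate = AdoptMeGate(daily_limit_per_ip=1)
first = Challenge("c1", frozenset({0, 3}))
print(gate.request_redirect(first, "203.0.113.7"))  # True
print(gate.score(first, {0, 3}))                    # False: challenge was invalidated
```

Invalidating the challenge before issuing the redirect is the step that keeps the link from leaking an answer to a challenge that can still be solved.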
Adoption links also have a second, more subtle effect on secu-
rity. Currently, the images shown as a challenge are selected at
random from the entire database of pet images. To maximize its
utility to Petfinder, we would ultimately like to restrict the chal-
lenge to pets that are close to the user based on IP geolocation,
and currently available for adoption. These constraints reduce the
usable database for a given user to just a few thousand images,
small enough to be easily exploitable. Our plan, therefore, is to
allow the first few challenges per day from an IP address to use
the restricted database; subsequent challenges will be drawn from
the complete collection. We have not yet implemented this feature,
so our challenges currently draw images from Petfinder’s complete
image pool, including the history of pets previously available for
The simplest attack on Asirra is brute force: Give a random so-
lution to challenges until one succeeds. If an attacker has no basis
for a decision (that is, a 50% probability of success for each image)
brute force will succeed with probability 1/4,096 for a 12-image
challenge. This is a large enough slowdown that it becomes easy
to detect and evade such an attack. We use a token bucket scheme,
described in detail in Section 4.2. Briefly, the scheme penalizes IP
addresses that get many successive wrong answers by forcing them
to answer two challenges correctly within 3 attempts before gain-
ing a ticket. With this scheme in place, attackers can expect only
one service ticket per 5.6 million guesses.
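Both figures in this paragraph can be reproduced with exact rational arithmetic. The sketch below assumes each image is guessed independently with probability 1/2, and it reads "two challenges correctly within 3 attempts" as at least two successes among three independent random attempts. That reading is our assumption; it happens to give a value close to the 5.6 million quoted above.

```python
from fractions import Fraction
from math import comb


def random_guess_probability(images: int = 12) -> Fraction:
    """Chance of guessing every image in one challenge correctly."""
    return Fraction(1, 2 ** images)


def at_least(k: int, n: int, p: Fraction) -> Fraction:
    """Probability of at least k successes in n independent trials."""
    return sum(
        (comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1)),
        Fraction(0),
    )


p = random_guess_probability()
penalized = at_least(2, 3, p)
print(p)                     # 1/4096
print(round(1 / penalized))  # roughly 5.6 million
```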
Machine Vision Attacks
While random guessing is the easiest form of attack, various
forms of image recognition can allow an attacker to make guesses
that are better than random. Asirra’s strength, however, comes not
only from the size of its image database but also from its diversity.
Photos have a wide variety of backgrounds, angles, poses, lighting, and
so forth, factors that make accurate automatic classification
difficult. As Figure 6 demonstrates, the variations between photos are
large, but visual differences between cats and dogs are often subtle.
Based on a survey of machine vision literature and vision
experts at Microsoft Research, we believe classification accuracy of
better than 60% will be difficult without a significant advance in
the state of the art. For example, the best known algorithms for
facial recognition can achieve in excess of 90% accuracy under
controlled conditions, but are not robust to occlusions and variations
in pose similar to those seen in our database [5, 18]. The 2006
PASCAL Visual Object Classes Challenge included a competition
to identify photos as containing several classes of objects, two
of which were Cat and Dog. Although cats and dogs were easily
distinguishable from other classes (e.g., “bicycle”), they were
frequently confused with each other.
One attack on our database proposed by vision researchers was
color histogram analysis. We implemented and tested this
technique. We first characterized each of 150 random images as a
15-feature color vector, computing histograms of each fifth of the red,