Exploiting Predictability in Click-based Graphical Passwords∗
P.C. van Oorschot and Julie Thorpe†
We provide an in-depth study of the security of click-based graphical password schemes like PassPoints
(Wiedenbeck et al., 2005), by exploring popular points (hot-spots), and examining strategies to predict
and exploit them in guessing attacks. We report on both short- and long-term user studies: one lab-
controlled, involving 43 users and 17 diverse images, the other a field test of 223 user accounts. We
provide empirical evidence that hot-spots do exist for many images, some more so than others. We
explore the use of “human-computation” (in this context, harvesting click-points from a small set of
users) to predict these hot-spots. We generate two “human-seeded” attacks based on this method: one
based on a first-order Markov model, another based on an independent probability model. Within 100
guesses, our first-order Markov model-based attack finds 4% of passwords in one image’s data set, and
10% of passwords in a second image’s data set. Our independent model-based attack finds 20% within
2^33 guesses in one image’s data set and 36% within 2^31 guesses in a second image’s data set. These are
all for a system whose full password space has cardinality 2^43. We also evaluate our first-order Markov
model-based attack with cross-validation of the field study data, which finds an average of 7-10% of user
passwords within 3 guesses. We also begin to explore some click-order pattern attacks, which we found
improve on our independent model-based attacks. Our results suggest that these graphical password
schemes (with parameters as originally proposed) are vulnerable to offline and online attacks, even on
systems that implement conservative lock-out policies.
Traditional text-based authentication suffers from a well-known limitation: many users tend to choose pass-
words that have predictable patterns, allowing for successful guessing attacks. As an alternative, graphical
passwords require a user to remember an image (or parts of an image) in place of a word. They have
been largely motivated by the well-known fact that people remember images better than words , and
implied promises that the password spaces of various image-based schemes are not only sufficiently large to
resist guessing attacks, but that the effective password spaces (from which users actually choose) are also
sufficiently large. The latter, however, is not well established.
Many different types of graphical passwords have been proposed to date; among the more popular ap-
proaches in the literature is PassPoints [47, 46, 45, 1, 13]. It and other click-based graphical password
schemes [2, 22, 38, 9, 5] require users to click on a sequence of points on one or more background images.
PassPoints usability studies have been performed to determine the optimal amount of error tolerance based
on click-point accuracy [46, 8], login and creation times, login error rates, memorability, and general percep-
tion [46, 47, 8]. An important remaining question for such schemes is: how secure are they? This issue has
previously remained largely unaddressed, despite speculation that the security of these schemes likely suffers
from hot-spots – areas of an image that are more probable than others for users to click.
The issue of whether hot-spots exist is tightly related to that of the security; if commonly preferred points
exist, then they could be exploited in a number of ways. We confirm the existence of hot-spots, and show
that some images are more susceptible to hot-spotting than others. Our work involves two user studies.
The first (lab) study used 17 diverse images. In the second (field) study, involving 223 user accounts over a
minimum of seven weeks, we explored two of these images in greater depth. We analyzed our lab study data
∗Manuscript received November 7, 2008; revised July 28, 2010; accepted August 4, 2010. Parts of this work appeared
previously in  and in the Ph.D. thesis  of the second author.
†Authors listed alphabetically. Contact author: Julie Thorpe (firstname.lastname@example.org). She is with the Faculty of Business and
Information Technology, University of Ontario Institute of Technology (UOIT), Oshawa, Ontario, Canada. P.C. van Oorschot
is with the School of Computer Science, Carleton University, Ottawa, Ontario, Canada, (e-mail: email@example.com).
using estimates of formal measures of security to make an informed decision of which two images to use in
the field study.
We explore how an attacker might predict the hot-spots we observed for use in an offline dictionary attack.
Rather than using image processing to predict hot-spots (see discussion under Related Work), we instead
use “human computation” , which relies on people to perform tasks that computers (at least currently)
find difficult. Human-computation can produce a human-computed data set; our human-computed data set
is our lab study data set, which effectively indexes the click-points that people would initially choose as part
of a password. We process this data set to determine a set of points that are more commonly preferred, to
create a human-seeded attack. A human-seeded attack can be generally defined as an attack generated by
using data collected from people.
We create three different predictive graphical dictionaries  (i.e., based on available information related
to the user’s login task, gathered from sources outside of the target password database itself, where a target
password database is the set of user passwords under attack): two based on different styles of human-seeded
attacks, and another based on click-order patterns. We evaluate these dictionaries, and also combined
human-seeded and click-order pattern attacks, using our field study data set. We also perform a 10-fold
cross-validation analysis with our field study database to train and test one style of human-seeded attack
(based on a first-order Markov model), providing a sense of how well an attacker might do with these methods
and an ideal human-computed data set for training.
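The cross-validation procedure described above can be outlined as follows. This is an illustrative sketch, not the paper's implementation: `build_dictionary` stands in for any method (such as the first-order Markov model) that turns a set of training passwords into an ordered guess list, and `frequency_dictionary` is a toy example of such a builder.

```python
import random
from collections import Counter

def cross_validate_attack(passwords, build_dictionary, folds=10, max_guesses=100):
    """Estimate attack success by k-fold cross-validation: train a guessing
    dictionary on k-1 folds and measure the fraction of held-out passwords
    found within max_guesses. Passwords are any hashable encoding (e.g.,
    tuples of discretized click-points)."""
    shuffled = passwords[:]
    random.shuffle(shuffled)
    fold_size = len(shuffled) // folds
    rates = []
    for i in range(folds):
        test = shuffled[i * fold_size:(i + 1) * fold_size]
        train = shuffled[:i * fold_size] + shuffled[(i + 1) * fold_size:]
        guesses = set(build_dictionary(train)[:max_guesses])
        rates.append(sum(p in guesses for p in test) / len(test))
    return sum(rates) / len(rates)

def frequency_dictionary(train):
    """Toy dictionary builder: guess training passwords by frequency."""
    return [p for p, _ in Counter(train).most_common()]
```

Averaging the per-fold success rates, as the sketch does, gives the kind of "average of 7-10% within N guesses" figure reported above.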
Our contributions include an in-depth study of hot-spots in click-based (and cued-recall) graphical pass-
word schemes, and the impact of these hot-spots on security through two separate user studies. We ex-
plore predictive methods of generating attack dictionaries for click-based graphical passwords. Perhaps our
most interesting contribution is proposing and exploring the use of human-computation to create graphical
dictionaries; we conjecture that this method is generalizable to other types of graphical passwords (e.g.,
recognition-based) where users are given free choice.
The remainder of this paper proceeds as follows. Section 2 presents relevant background and terminology.
Section 3 describes our user studies and hot-spot analysis. Section 4 describes algorithms and methods for
creating predictive attacks. Section 5 presents results for all attacks examined herein. Section 6 discusses
related work, and we conclude with Section 7.
2 Background and Terminology
Click-based graphical passwords require users to log in by clicking a sequence of points on one or more
background images. Many variations are possible (see Section 6), depending on the number of images and
what points a user is allowed to select. We study click-based graphical passwords by allowing clicks anywhere
on a single image (i.e., PassPoints-style). To allow password verification, user-entered passwords must be
encoded in some standard format. Assuming that the encoding (e.g., robust discretization
 or centered discretization) is followed by some form of hashing to preclude trivial attacks, offline attacks
 are still possible if hashed values are intercepted by an attacker and can be used as verifiable text, or if
the attacker obtains a file of system-side verification values.
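As a concrete illustration of this verification pipeline, the sketch below uses a naive fixed-grid discretization followed by SHA-256. The grid size, serialization, and hash choice are illustrative assumptions only; real schemes use robust or centered discretization, which additionally handle clicks near cell boundaries.

```python
import hashlib

CELL = 19  # grid cell size matching a 19x19 tolerance square (assumption)

def encode_password(points):
    """Hash a click-point sequence after mapping each point to a grid cell.
    Naive fixed-grid discretization (illustration only): unlike robust or
    centered discretization, a click near a cell boundary can fall into a
    different cell on a later login and fail verification."""
    cells = ["%d:%d" % (x // CELL, y // CELL) for x, y in points]
    return hashlib.sha256(",".join(cells).encode()).hexdigest()
```

On login, the system re-encodes the entered points and compares against the stored hash; an attacker who obtains such hashes can mount exactly the offline attack described above, hashing candidate click sequences until one matches.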
We use the following terminology. Assume a user chooses a given click-point c as part of his or her
password. The tolerable error or tolerance t is the error (in pixels) allowed for a click-point entered on a
subsequent login to be accepted as c. This defines a tolerance region (T-region) centered on c, which, for
our experimental implementation using t = 9 pixels, is a 19 × 19 pixel square. A cluster is a set of one or
more click-points that lie within a T-region. Note that clusters arise when the data from multiple users is
combined, rather than a single user clicking multiple times in the same area. Our algorithm for computing
clusters is described in Section 3.2.1. The number of click-points falling within a cluster is its size. A
hot-spot is indicated by a cluster that is larger than expected by random choice, in an experiment which
produces click-points across a set of T-regions. To aid visualization and indicate relative sizes for clusters
of size at least two, on figures we sometimes represent the underlying cluster by a shaded circle or halo
with halo diameter proportional to its size (similar to population density diagrams). An alphabet is a set of
distinct T-regions; our experimental implementation, using 451×331 pixel images, results in an alphabet of
at least m = 414 non-overlapping T-regions. Using passwords composed of 5 clicks on an alphabet of size
414 provides the system with only 2^43 entries in the full theoretical password space; however, increasing the
number of clicks, size of the image, and/or decreasing the tolerance square size would allow for comparable
security to traditional text passwords. We study an implementation with these particular parameters as they
are close to those used in other studies [45, 47] that show them to have acceptable usability.
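The password-space arithmetic above can be checked directly from the stated parameters:

```python
import math

m, clicks = 414, 5        # T-region alphabet size and clicks per password
space = m ** clicks       # full theoretical password space
bits = math.log2(space)   # about 43.5, i.e., roughly 2^43 passwords
# For comparison, a random 8-character password over the 95 printable
# ASCII symbols has log2(95**8), about 52.6 bits; one extra click
# (6 clicks) would bring this scheme to roughly 52 bits.
```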
3 User Studies
As mentioned, we conducted two user studies: a single session lab study with 43 users and 17 images, and a
long-term field study with 223 user accounts and two images. We use the lab study data as an indicator of
the degree of hot-spotting for each image, and as our human-computed data set. We use the field study data
to test our attacks. Further details of the lab and field studies are in Section 3.1, the hot-spotting results in
Section 3.2, and the user studies’ limitations are discussed in Section 3.3.
3.1 Experimental Methodology
We report on the methodology for the short-term lab study in Section 3.1.1 and the long-term field study in
Section 3.1.2.
3.1.1 Lab Study
Here we report the details of a university-approved 43-user study of click-based graphical passwords in
a controlled lab environment. Each user session was conducted individually and lasted about one hour.
Participants were all university students who were not studying (or experts in) computer security. Each user
was asked to create a click-based graphical password on 17 different images (most of these are reproduced
in Figures 1 and 11; others are available upon request). Four of the images are from a previous click-based
graphical password study by Wiedenbeck et al. ; the other 13 were selected to provide a range of values
based on two image processing measures that we expected to reflect the amount of detail: the number of
segments found from image segmentation  and the number of corners found from corner detection .
Seven of the 13 images were chosen to be those we “intuitively” believed would encourage fewer hot-spots;
this is in addition to the four chosen in earlier research by others  using intuition (no further details were
provided on their image selection methodology).
We implemented a browser-based lab tool for this study. Each user was provided a brief explanation of
what click-based graphical passwords are, and given two images to practice creating and confirming such
passwords. To keep the parameters as consistent as possible with previous usability experiments1 of such
passwords , we used 5 click-points for each password, an image size of 451×331 pixels, and a 19×19 pixel
square of error tolerance. Wiedenbeck et al.  used a tolerance of 20 × 20, allowing 10 pixels of tolerated
error on one side and 9 on the other. For consistent error tolerance on all sides, we approximate this using
19×19. Users were instructed to choose a password by clicking on 5 points, with no two the same. Although
the software did not enforce this condition, subsequent analysis showed that the effect on the resulting cluster
sizes was negligible for all images except pcb. For pcb, considering all click-points produced 6 clusters of size
≥ 5, but counting at most one click from each user produced 3 clusters of size ≥ 5. We did not assume
a specific encoding scheme (e.g., robust discretization  or other grid-based methods ); the concept of
hot-spots and user choice of click-points is general enough to apply across all encoding schemes. To allow
for detailed analysis, we stored and compared the actual click-points.
Once users had a chance to practice a few passwords, the main part of the lab experiment began. For
each image, users were asked to create a click-based graphical password that they could remember but that
others will not be able to guess, and to pretend that it is protecting their bank information. After initial
creation, users were asked to confirm their password to ensure they could repeat their click-points. On
successful confirmation, users were given 3D mental rotation tasks  as a distractor for at least 30 seconds
(to remove the password from their visual working memory, and thus simulate the effect of the passage of
time). After this period of memory tasks, users were provided the image again and asked to log in using
their previously selected password. If users could not confirm after two failed attempts during password
creation/confirmation or log in after one failed attempt, they were permitted to reset their password for that
1The usability aspects of this study are reported separately .
image and try again. If users did not like the image and felt they could not create and remember a password
on it, they were permitted to skip the image. Only two of the 17 images had a significant number of skips:
paperclips and bee. This suggests some passwords for these images were not repeatable, and we suspect our
results for these images would show lower relative security in practice.
To avoid any dependence on the order of images presented, each user was presented a random (but
unique from other users) shuffled ordering of the 17 images used. Since most users did not make it through
all 17 images, the number of graphical passwords created per image ranged from 32 to 40, for the 43 users.
Two users had an inaccurate mouse, but we do not expect this to affect our present focus – the location of
selected click-points. This short-term lab study was intended to collect data on initial user choice; although
the mental rotation tasks work to remove the password from working memory, this study does not account
for any effect caused by password resets over time due to forgotten passwords. For this reason, we use the
long-term field study (Section 3.1.2) which does account for this effect, as the primary data set for testing
the success of our attack dictionaries.
3.1.2 Field Study
Here we describe a university-approved field study of 223 user accounts on two different background images.
We collected click-based graphical password data to evaluate the security of this style of graphical passwords
against various attacks. We used the entropy and expected guesses measures from our lab study to choose
two images that would apparently offer different levels of security (although both are highly detailed): pool
and cars (see Figure 1). The lab study showed that of the images used in previous studies , the pool
image had the closest to a middle ranking in terms of the amount of clustering (see Section 3.2.2). The
lab study also showed that the cars image had nearly the least amount of clustering among the 17 images
tested. Both images had a low number of skips in the lab study, indicating that they did not cause problems
for users with password creation. We chose the pool image so we had an image from previous studies and
also had an amount of clustering that was not extremely high or low (it was closest to the middle rank of
the images examined). We chose the cars image to give this scheme the best chance we could in terms of
security.

Figure 1: Images used in lab study: (a) cars (originally from ); (b) pool (originally from [46, 47]).
Our web-based implementation of PassPoints was used by three first-year undergraduate classes: two
first year courses for computer science students, and a first year course for non-computer science students
enrolled in a science degree. The students used the system for at least 7 weeks to gain access to their course
notes, tutorials, and assignment solutions. For comparison with previous usability studies, and our lab study,
we used an image size of 451 × 331 pixels. After the user entered their username and course, the screen
displayed their background image and a small black square above the image to indicate their tolerance square
size. For about half of users (for each image), a 19×19 T-region was used, and for the other half, a 13×13
T-region.2 The system enforced that each password had 5 clicks and that no click-point was within t = 9
pixels of another (vertically and horizontally). Each user was assigned an image at random. To complete
initial password creation, users had to successfully confirm their password once. After initial creation, users
were permitted to reset their password at any time using a previously set secret question and answer.
Users were permitted to login from any machine (home, school, or other), and were provided an online
FAQ and help. The users were asked that they keep in mind that their click-points are a password, and that
while they will need to pick points they can remember, they should not pick points that someone else will be
able to guess. Each class was also provided a brief overview of the system, explaining that their click-points
in subsequent logins must be within the tolerance shown by a small square above the background image,
and that the input order matters. In order to have some confidence that the passwords we analyze have
some degree of memorability, we only use the final passwords created by each user that were demonstrated
as successfully recalled in at least one subsequent login (after the initial create and confirm). We also only
use data from 223 out of 378 accounts, as this was the number that provided explicit consent as required by
university policy. These 223 user accounts map to 189 distinct users as 34 users in our study belonged to
two classes; all but one of these users were assigned a different image for each account, and both accounts
for a given user were set to have the same error tolerance. Of the 223 user accounts, 114 used pool and 109
used cars as a background image.
3.2 Hot-Spot Results
We present the hot spots found in both the lab and field studies. How we compute hot-spots is described
in Section 3.2.1, as well as the hot-spots discovered in the lab study. A comparison of hot-spotting across
different lab study images is provided in Section 3.2.2. Finally, the hot-spots discovered in the field study
are presented in Section 3.2.3.
3.2.1 Hot-Spots Computed from Lab Study Data
We collected data from the in-lab study as described in Section 3.1.1, and used a clustering algorithm (see
below) to determine a set V of (non-empty) clusters and their sizes.
Clustering Algorithm. To calculate clusters (the size of which defines hot-spots) based on any user data
set of raw click-points, we assign all of the observed user click-points to clusters as follows. Let R be the raw
(unprocessed) set of click-points, M a list of temporary clusters, and V the final resulting set of clusters. M
and V are initially empty.
1. For each c_k ∈ R, let C_k be a temporary cluster containing click-point c_k. Temporarily assign all user
click-points in R within c_k’s T-region to C_k. Add C_k to M. Each c_k ∈ R can thus be temporarily
assigned to multiple clusters C_k.
2. Sort all clusters in M by size, in decreasing order.
3. Greedily make permanent assignments of click-points to clusters as follows. Let C* be the largest
cluster in M. Permanently assign each click-point c_k ∈ C* to C*, then delete each c_k ∈ C* from all
other clusters in M. Delete C* from M, and add C* to V. Repeat until M is empty.
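A minimal implementation of this greedy procedure might look as follows. This is a sketch under our own assumptions (the paper's code is not shown), and it re-selects the largest candidate after each round of deletions rather than relying on a one-time sort:

```python
def compute_clusters(points, t=9):
    """Greedy clustering of pooled click-points (illustrative sketch).
    points: list of (x, y) click-points from multiple users.
    Returns a list of clusters (lists of points), largest first."""
    n = len(points)
    # Step 1: for each point, the indices of all points in its T-region
    # (Chebyshev distance <= t, i.e., a (2t+1) x (2t+1) square).
    neighborhoods = [
        [j for j, (x, y) in enumerate(points)
         if abs(x - points[i][0]) <= t and abs(y - points[i][1]) <= t]
        for i in range(n)
    ]
    unassigned = set(range(n))
    clusters = []
    # Steps 2-3: repeatedly take the candidate cluster with the most
    # still-unassigned points, make it permanent, remove its members.
    while unassigned:
        members = max((set(nb) & unassigned for nb in neighborhoods), key=len)
        clusters.append([points[j] for j in sorted(members)])
        unassigned -= members
    return clusters
```

Since every unassigned point belongs to its own neighborhood, each iteration assigns at least one point, so the loop terminates with every click-point in exactly one cluster.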
This process determines the set V of (non-empty) clusters and their sizes.
To begin comparing the 17 images studied, Figure 2 shows the sizes of the top five most popular clusters,
and the total number of popular clusters.
Given the cluster sizes, we then calculate the observed “probability” p_j (based on our user data set) of
cluster j being clicked, as cluster size divided by total clicks observed. When the probability p_j of a certain
cluster is sufficiently high, we can place a confidence interval around it for future populations (of users who
are similar in background to those in our study) using formula (1) as discussed below.
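As a small worked example of these probability estimates (the cluster sizes below are hypothetical, not data from our studies):

```python
def cluster_probabilities(cluster_sizes):
    """Observed probability p_j that a single click lands in cluster j:
    cluster size divided by the total number of clicks observed. The
    sizes passed in must cover all clusters, including singletons."""
    total = sum(cluster_sizes)
    return [s / total for s in cluster_sizes]

# Hypothetical data: 3 large clusters plus 140 singleton clusters,
# for 160 observed clicks in total.
sizes = [9, 7, 4] + [1] * 140
p = cluster_probabilities(sizes)
# For 5-click passwords, the chance that a password uses cluster j is
# approximated by 5 * p_j (a union bound over the five clicks).
top = 5 * p[0]   # 5 * (9/160) = 0.28125
```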
Each probability p_j estimates the probability of a cluster being clicked for a single click. For 5-click
passwords, we approximate the probability that a user chooses cluster j in a password by 5p_j. Note that the
2Analysis showed little difference between the points chosen for these different tolerance groups.