KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS VOL. X, NO. X, XXX 201X 1
Copyright © 2011 KSII
An Image-Based CAPTCHA Scheme Exploiting
Human Appearance Characteristics
Sajida Kalsoom1, Sheikh Ziauddin1 and Abdul Rehman Abbasi2
1 Department of Computer Science,
COMSATS Institute of Information Technology,
Park Road, Islamabad, Pakistan
[e-mail: {sajida.kalsoom, sheikh.ziauddin}@comsats.edu.pk]
2 Advanced Computing Laboratory
Karachi Institute of Power Engineering (KINPOE)
Paradise Point, Karachi, Pakistan
[e-mail: abdulrehman.abbasi@ait.ac.th]
*Corresponding author: Sajida Kalsoom
Abstract
CAPTCHAs are automated tests designed to prevent bots from misusing computing and information resources. Typical text-based CAPTCHAs have proven vulnerable to malicious automated programs. In this paper, we present an image-based CAPTCHA scheme using easily identifiable human appearance characteristics that overcomes the weaknesses of current text-based schemes. We propose and evaluate two applications of our scheme involving 25 participants. Both applications use the same characteristics but different classes for those characteristics. Application 1 is optimized for security while Application 2 is optimized for usability. Experimental evaluation shows promising results, with an 83% human success rate for Application 2 compared to 62% for Application 1.
Keywords: Human Interactive Proof, CAPTCHA, Usability, Security, Human appearance
characteristics
1. Introduction
Human Interactive Proofs (HIPs) are used to differentiate between human users and automated programs by requiring some kind of interaction from the user that is difficult for a program to imitate. CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a class of HIPs involving challenge-response tests that verify that a response comes from a human and not a machine. The process involves a computer program that generates and grades a test, and a user who submits a response (i.e., solves the test). If the response is correct, the user is classified as human. The main purpose of a CAPTCHA is to protect web services from automated scripting-based attacks. Practical applications include preventing computer scripts from voting in online polls, creating email accounts, posting messages in discussion forums, and misusing online shopping and online games, to name a few.
A typical text-based CAPTCHA scheme consists of a query asking the user to read and type a randomly generated character and/or numeric sequence. If the user correctly inputs the characters and/or numerals, he or she passes the test and is authorized as a human user; otherwise access is denied. However, such schemes are highly vulnerable to known attacks such as optical character recognition (OCR) based attacks, dictionary attacks and segmentation attacks. An alternative is an image-based CAPTCHA scheme, which is not vulnerable to such attacks. In this paper, we propose and evaluate an image-based CAPTCHA scheme using easily identifiable human appearance characteristics.
In our proposed method, the challenge consists of an image randomly selected from a database, and the user is asked to associate characteristics with the displayed image from an available list of characteristics. To pass the test, the user has to associate all (five, in our case) characteristics correctly. We test the proposed method with two applications that trade off usability against security, and report our results in terms of their strengths and weaknesses.
The rest of the paper is organized as follows. In Section 2, we review the related work. In
Section 3, we describe the proposed method. Next, we provide an experimental evaluation in
Section 4. Security of the scheme is analyzed in Section 5. Some limitations and their potential
solutions are discussed in Section 6 and finally we conclude the paper in Section 7.
2. Related Work
The first known practical CAPTCHA was designed by Broder and colleagues (though they did not use the term CAPTCHA in their work) to prevent automatic URL submission to the AltaVista search engine [1]. Their CAPTCHA consisted of an image of text that is easy for average humans to read but difficult for OCR software. The first use of the term CAPTCHA is due to Blum, von Ahn, and Langford at Carnegie Mellon University [2], who developed the Gimpy CAPTCHA to prevent unauthorized advertising in Yahoo's online chat rooms. The challenge consists of multiple words with different kinds of distortion and clutter, making it difficult for OCR software to read the text. A Gimpy challenge consists of 10 words, and the user has to recognize 3 of these 10 words to pass the test. Yahoo later started using a simpler version of Gimpy named EZ-Gimpy. In EZ-Gimpy, the challenge is a single distorted word and the user has to type that word correctly. Figure 1 and Figure 2 show sample Gimpy and EZ-Gimpy challenges, respectively.
Currently there exist a number of CAPTCHA schemes that can be categorized as text-based, speech/audio-based, image-based and video-based. Each has its own merits and demerits. In the remainder of this section, we review a number of such schemes, with a focus on image-based CAPTCHAs. Pessimal Print [3] is a text-based CAPTCHA scheme. The challenge consists of a low-quality text image that is readable by humans but difficult for machines to read. The authors selected medium-length words (five to eight characters) to prevent template-matching and feature-based attacks (shorter words are good candidates for template-matching attacks, while longer words are more vulnerable to feature-based attacks). A word is randomly selected from a list, and then a typeface and image-degradation parameters are independently selected at random. They used the Baird degradation model [4] to degrade the words. Results show that the words were human readable, but OCR software was unable to read any of them. Chew and Baird [5] proposed a text-based CAPTCHA named BaffleText based on human Gestalt perception abilities. They used pronounceable non-dictionary words to prevent dictionary attacks and image-masking degradation to defend against image restoration attacks. Their experimental evaluation shows a human success rate of 79%.
Figure 1. A sample Gimpy challenge (Courtesy Greg Mori [6]). The challenge contains 10 distorted
words. The user has to identify any 3 words correctly to pass the challenge.
Figure 2. A sample EZ-Gimpy challenge presented to the user while creating a new Yahoo email
account. The user has to type all characters correctly to pass the challenge.
A text-based CAPTCHA named ScatterType [7] was presented in 2005 to withstand segmentation attacks. In ScatterType, text images are pseudo-randomly generated, fragmented using horizontal and vertical cuts, and scattered using horizontal and vertical displacements. The text strings are English-like but non-dictionary words, to resist lexical attacks. A user study was conducted with 57 users who submitted 4275 inputs at different levels of difficulty. Results show a human pass rate of 7.7% for the most difficult level and 81.3% for the easiest level, with an average success rate of 53%.
Most conventional text-based CAPTCHAs are vulnerable to known attacks, e.g., dictionary attacks, OCR attacks and segmentation attacks. A number of text-based CAPTCHAs have been broken using AI attacks [8, 9, 10, 11]. Additionally, text-based CAPTCHAs suffer from a universality problem, i.e., they are language dependent. To overcome these problems, researchers have presented image-based solutions. Rui and Liu [12] proposed a CAPTCHA scheme named ARTiFACIAL, which is based on detecting a human face and its features. The user is presented with a face embedded in a cluttered background. He or she has to identify the complete face and then click six of its points: two mouth corners and four eye corners. The authors used 3 face detectors and one facial-feature detector to check the resistance of their system against automated attacks; both types of detectors showed a low success rate. The scheme was evaluated with 34 human users. Though the human success rate was very high, some users were not comfortable with the challenges, as conscious effort was required to pass the tests.
Baird and Bentley [13] proposed a family of image-based implicit CAPTCHAs. In their work, the challenge is an image along with an instruction for the user to click on a particular position (e.g., click on the mountain top). The major problem with such tests is that they cannot be created automatically and need human effort to design. Datta et al. [14] proposed a system named IMAGINATION for image-based CAPTCHA generation. Their CAPTCHA is a double-round click-and-annotate process in which the user has to click four times in all. The user is given a composite image formed from eight images, and he or she has to click near the geometric center of the image to be annotated. If the click is valid, i.e., near one of the centers, an image is presented to the user along with a list of word choices generated by controlled annotation. The user then has to select the appropriate word from the list for the given image. The process is repeated once more, and completing both rounds correctly means the challenge is passed.
Wen-Hung Liao [15] proposed another image-based CAPTCHA. An image is presented to the user with two non-overlapping blocks of the image exchanged with each other; to pass the challenge, the user has to click on the exchanged region. Asirra was presented by Elson et al. [16]. The challenge consists of images of cats and dogs, selected randomly from a manually tagged and updated database, and the user has to identify all images of cats in the displayed set. Gossweiler et al. [17] presented an image-rotation-based CAPTCHA: different orientations of the same image are presented, and the user has to identify the image's upright orientation. The success rate is 84% when tested with three images. Kim et al. [18] proposed a CAPTCHA scheme that improves on Gossweiler et al.'s idea: instead of rotating the whole image, different sub-images are selected and rotated, and the user has to find the correct orientation of those sub-images to pass the test.
In addition to text-based and image-based CAPTCHAs, a few schemes have been designed for visually impaired users. Holman et al. [19] proposed a CAPTCHA that includes both audio and visual data. The user is provided with an image of an object and its sound; to pass the challenge, the user has to associate the appropriate word with the given image/sound from a provided list of words. The CAPTCHA is suitable for users with either a visual or a hearing disability because a combination of audio and visual data is given to the user. A prototype was evaluated with both blind and sighted users, and feedback from both groups was encouraging. A potential limitation is the limited number of easily identifiable image and sound combinations.
3. Proposed Scheme
As mentioned earlier, a CAPTCHA is a challenge-response game in which a challenge is given by the system and the response is provided by a human user. The challenge in our proposed CAPTCHA scheme is a human image along with a list of values (classes) for five relatively easily identifiable human appearance characteristics. The user's response is to associate the correct values with those characteristics for the given image.
3.1 Participants of the Study
We conducted a study for usability testing of the proposed image-based CAPTCHA scheme. Twenty-five undergraduate university students participated. We had two applications with the same characteristics but different classes for those characteristics. Testing started with a 5-7 minute demo to familiarize users with the system. Users were then called one by one to test the system. For each application, each user was asked to take the CAPTCHA test 5 times, i.e., submit 5 inputs per application.
3.2 Our Scheme
We present an image-based CAPTCHA scheme using human appearance characteristics.
Human face images are collected from widely used search engine Google and other publicly
available databases. We label the images manually. The human characteristics we used are:
gender, hair type, hair color, ethnicity, and (facial) expression. These characteristics are
chosen since we feel that these are conveniently identifiable. Each characteristic is classified
into two or more classes, e.g., gender has two classes: male and female while expression has 6
classes: anger, disgust, fear, joy, sad and surprise.
The working of the proposed scheme is as under. An image is given to the user along with a
list of above-mentioned characteristics and corresponding classes. The user is asked to
associate characteristics values to the image from the given list. If the user associates all
characteristics correctly, he or she is considered a human; otherwise the user is considered a
machine. Formally, our CAPTCHA challenge can be described as follows. Let I be the image
shown to the user in a particular CAPTCHA challenge, let Cmi be the value of ith characteristic
of I as assigned manually in the database (we call this manually assigned class), let Cui be the
value of ith characteristic of I as assigned by the user responding to CAPTCHA challenge (we
call this user assigned class), then we say that a CAPTCHA challenge is passed iff Cmi = Cui
for all i (in our scheme i={1,2,3,4,5})
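The pass condition above can be sketched in a few lines, assuming the manually assigned and user assigned classes are stored as ordered lists (the function and variable names here are illustrative, not part of the paper):

```python
def challenge_passed(manual_classes, user_classes):
    """Return True iff C_i^m == C_i^u for every characteristic i."""
    return len(manual_classes) == len(user_classes) and all(
        cm == cu for cm, cu in zip(manual_classes, user_classes)
    )

# Example with the five characteristics in order:
# gender, hair type, hair color, ethnicity, expression.
manual = ["Female", "Long", "Black", "Asian", "Joy"]
print(challenge_passed(manual, ["Female", "Long", "Black", "Asian", "Joy"]))  # True
print(challenge_passed(manual, ["Female", "Long", "Black", "Asian", "Sad"]))  # False
```

A single wrong characteristic fails the whole challenge, matching the all-or-nothing rule above.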
We tested our scheme for two applications detailed below.
3.2.1 Application 1
This is the first of the two applications we designed for usability testing of our proposed image-based CAPTCHA. We use the following human appearance characteristics, classified into the classes shown in parentheses.
1. Gender (Male, Female)
2. Hair type (Long, Short)
3. Hair color (Black, Brown/Golden/Blonde, White/Grey)
4. Ethnicity (White/Caucasian, Black/African, Asian)
5. Expression (Anger, Disgust, Fear, Joy, Sad, Surprise)
This application works as follows: the user is presented with an image, randomly selected by the system from the database, and is asked to associate the corresponding characteristics from a list with that image. Figure 3 displays our CAPTCHA as shown to the user.
If the user submits all correct values for the given challenge, the test is considered passed. If any of the user assigned classes does not match the corresponding manually assigned class, the test is considered failed and the user is asked to retry. For each new attempt, a fresh challenge is displayed. The number of retries can be either unlimited or restricted, depending on the type of application and the desired security level.
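A minimal sketch of the restricted-retry policy mentioned above, assuming failures are counted per client IP address (the limit and the bookkeeping structure are our own illustrative choices, not prescribed by the scheme):

```python
from collections import defaultdict

MAX_RETRIES = 5  # illustrative limit; the scheme leaves this configurable

failed_attempts = defaultdict(int)  # failed challenge attempts per client IP

def may_retry(ip):
    """Return True if the client is still allowed a fresh challenge."""
    return failed_attempts[ip] < MAX_RETRIES

def record_failure(ip):
    """Count one failed challenge attempt for this client."""
    failed_attempts[ip] += 1

# Example: after 5 recorded failures, further retries are refused.
for _ in range(5):
    record_failure("198.51.100.7")
print(may_retry("198.51.100.7"))  # False
```

A production deployment would also expire the counters over time; the sketch only shows the blocking decision itself.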
Figure 3. An example of the proposed CAPTCHA challenge (Application 1)
The user is also provided with a "See Example" option, as seen in Figure 4. This option provides sample images of the different classes for a given characteristic. These sample images may help users make their decisions if they are in doubt about a particular characteristic. Figure 4 shows the sample images for short and long hair when help is requested for the characteristic "Hair Type".
Figure 4. Sample images for hair type using “See Example" help
A user can also request a fresh challenge if he or she feels that the current challenge is not convenient to solve. After the user submits a response, he or she is informed of acceptance if all associated classes are correct, and rejection otherwise.
3.2.2 Application 2
In many applications, usability is a major requirement, and some security may be sacrificed to achieve the desired level of usability. For such scenarios, we implement our scheme with the same characteristics but reduced classes for ethnicity and expression. The new set of classes is as follows.
1. Gender (Male, Female)
2. Hair type (Long, Short)
3. Hair color (Black, Brown/Golden/Blonde, White/Grey)
4. Ethnicity (Black/African, White/Caucasian)
5. Expression (Happy, Not Happy)
In Application 2, ethnicity is classified into 2 classes instead of 3, and expression into 2 classes instead of 6. The classification of the other 3 characteristics remains unchanged. We intuitively felt that the most difficult task for users in Application 1 would be correctly classifying expressions into 6 different categories; the results of Application 1 (presented in Section 4) confirm this assumption. Other than the difference in the number of classes, Application 2 works exactly the same as Application 1. Figure 5 schematically shows the characteristics and classes used in Application 2.
Figure 5. CAPTCHA challenge of the proposed scheme (Application 2). Ethnicity and expression
classes are reduced to 2 in each case.
4. Experimental Evaluation
4.1 Application 1
As mentioned earlier, we conducted a study with 25 participants to evaluate the usability of the proposed scheme. Each user took the CAPTCHA challenge 5 times, resulting in a total of 125 attempts. For Application 1, 77 attempts were successful while 48 were incorrect submissions, giving a human success rate of 62%.
In some cases, multiple misclassifications were made in a single incorrect response, resulting in a total of 60 misclassifications across all characteristics. Table 1 shows the distribution of user responses with respect to the number of misclassifications per attempt. As can be seen from the table, among incorrect submissions, most users made just 1 misclassification, while the worst case contained 3 misclassifications. Figure 6 schematically shows the number of misclassifications made against each characteristic (for both applications).
The results of Application 1 show that the expression characteristic was the most difficult for users, resulting in 28 incorrect submissions. Ethnicity misclassification is also on the higher side, while the other 3 characteristics have relatively low misclassification rates.
Table 1. Number of misclassifications vs. number of attempts made by users

  Number of misclassifications    Attempts (Application 1)    Attempts (Application 2)
  0                               77                          104
  1                               39                          17
  2                               6                           4
  3                               3                           0
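The reported success rates follow directly from the attempt counts in Table 1; a quick arithmetic check (the dictionary layout is ours, not the paper's):

```python
# Attempts per number of misclassifications, taken from Table 1.
attempts = {
    "Application 1": {0: 77, 1: 39, 2: 6, 3: 3},
    "Application 2": {0: 104, 1: 17, 2: 4, 3: 0},
}

for app, dist in attempts.items():
    total = sum(dist.values())      # 125 attempts per application
    rate = 100.0 * dist[0] / total  # attempts with zero errors pass
    print(f"{app}: {dist[0]}/{total} passed = {rate:.1f}%")
```

This yields 61.6% and 83.2%, which round to the 62% and 83% quoted in the text.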
Figure 6. Number of misclassifications made by the users against all characteristics
4.2 Application 2
As expected, users found it quite difficult to classify multiple expressions accurately in Application 1. In Application 2, we reduced the number of expression classes from 6 to 2 (Happy and Not Happy). In addition, we also reduced the ethnicity classes from 3 to 2, leaving the Asian class out.
As in Application 1, 25 users each submitted 5 responses, for a total of 125 submissions. 104 attempts were successful while 21 were incorrect; the human success rate therefore increased to 83% for Application 2. There were a total of 25 misclassifications across all characteristics. Table 1 shows the distribution of user responses with respect to the number of misclassifications per attempt. As in Application 1, most users made just 1 misclassification, with at most 2 misclassifications in any attempt. Figure 6 shows the misclassifications made against each characteristic (for both applications).
In Application 2, the highest number of errors, 9, was recorded for the hair color characteristic. Expression was misclassified 6 times, followed by 5 incorrect submissions each for ethnicity and hair type.
4.3 Discussion on Results
Perceptions of individuals may vary quite significantly; for example, hair that one individual considers long may not seem long to another. In spite of these perception differences, the overall results are quite encouraging. In Application 1, users classified all the characteristics correctly in 62% of challenges, which improved to 83% in Application 2. Next, we discuss the results for each characteristic in some detail.
Gender: There was just one wrong submission in this category in Application 1. After analyzing our database, we found that the user most likely selected the wrong option for that particular image by mistake.
Hair Type: There were 7 misclassifications for hair type in Application 1 and 5 in Application 2. Our analysis reveals that one reason for misclassification is that some users did not consider the gender of the person in the image when choosing the hair type. Typically, females have longer hair, so hair of the same length may be classified as long for a male but short for a female; some users did not take this into account. This type of failure can be reduced by training users to consider gender when associating a hair type with an image.
Hair Color: For hair color, we have 11 and 9 misclassifications in Applications 1 and 2, respectively. Our analysis shows that lighting effects might have influenced hair color selection. For example, in some images a shadow was present on one side of the head, resulting in different hair color perceptions among users. In addition, there were a few images in which individuals had dyed their hair and consequently had more than one shade in their hair, again making the classification task difficult. The error rate can be reduced further by selecting the image dataset carefully.
Ethnicity: Ethnicity was most often misclassified in Application 1, with 13 misclassifications compared to 5 in Application 2. One reason could be users' lack of geographic and social knowledge. Some users could not correctly classify the ethnicity of Chinese and Japanese individuals and categorized all of them as "White/Caucasian" on the basis of their facial color. Many users also did not use the "See Example" option to get help on ethnicity through sample images.
Expression: As can be seen in Figure 6, the expression characteristic caused the single largest number of misclassifications among all characteristics in Application 1. The errors arose because it was difficult for users to uniquely classify the given expression within a relatively long list of expression classes (6 classes: Anger, Disgust, Fear, Joy, Sad, and Surprise). Figure 7 shows the number of misclassifications for each expression class. For Application 1, the figure shows 2 values for each class: 1) when the specific class is misclassified as any other class, and 2) when any other class is misclassified as that specific class. For example, "Joy to Any other" represents misclassifications where the manually assigned class was Joy but the user assigned class was any of the other 5 classes; similarly, "Any other to Joy" represents cases where the manually assigned class was not Joy but the user assigned class was Joy. The results show that the Surprise expression was the most difficult for users to differentiate: there were 15 errors where either Surprise was misclassified as some other expression or some other expression was misclassified as Surprise. We expect that with more training and a more carefully selected image dataset, the overall success rate can be increased.
From the above discussion, it is evident that Application 2 produced better usability results than Application 1. However, in Application 2 we reduced the number of classes, which lowers security against brute-force attacks. Usability and security are interdependent: more security generally means less usability and vice versa. The same is true for our scheme: Application 1 is more secure than Application 2 but has a lower human success rate of 62%, compared to 83% for Application 2.
Figure 7. Number of misclassifications involving different expression classes in Application 1. The Surprise expression proved the most difficult for users to classify.
5. Security Analysis of the Proposed Scheme
First, we discuss brute-force attacks against our scheme. A brute-force effort requires 216 attempts to attack Application 1 and 48 attempts for Application 2. Although this search space is not very large, the purpose of a CAPTCHA is to prevent misuse of web services, and a bot that needs 216 attempts to pass a challenge will most probably not defeat that purpose. Additionally, the challenge is refreshed after every attempt, so the attacker is not attacking the same image repeatedly. Besides this, we can block a user after a fixed number of attempts from the same IP address. To increase the search space further, 2 different images can be given to the user to classify; this increases the brute-force effort against Application 1 from 216 to 46656, but it also decreases the usability of the system.
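The brute-force figures above are simply products of the per-characteristic class counts from Section 3; a quick check (the list layout is ours):

```python
from math import prod

# Classes per characteristic, in order:
# gender, hair type, hair color, ethnicity, expression.
app1_counts = [2, 2, 3, 3, 6]  # Application 1 (Section 3.2.1)
app2_counts = [2, 2, 3, 2, 2]  # Application 2 (Section 3.2.2)

print(prod(app1_counts))       # 216 possible answers for Application 1
print(prod(app2_counts))       # 48 possible answers for Application 2
print(prod(app1_counts) ** 2)  # 46656 with two independent images
```

With two images the spaces multiply because both must be answered correctly in one submission.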
Next, we consider artificial intelligence attacks. The proposed scheme poses multiple machine challenges, namely gender, hair type, hair color, ethnicity and expression classification. For some of these challenges, researchers have presented algorithms giving reasonably accurate results on selected datasets. For example, gender classification is a two-class problem, and considerable research has been done on automatically estimating human gender from facial images [20, 21, 22, 23, 24, 25, 26]. Similarly, research has been done on ethnicity detection, treating it as either a two-class problem [23, 25, 27] (Asian/Non-Asian) or a three-class problem [26, 28] (Asian/Caucasian/African). Facial expression recognition is also an active area of AI research [29, 30, 31]. Though the above-mentioned systems show good classification results, they typically work only with much less challenging images than those in our dataset. Our database is extremely challenging for AI applications, as we have images with lighting effects, cluttered backgrounds, side poses, etc., which makes the classification task much more difficult for machines than for humans.
To the best of our knowledge, there is no AI algorithm available for classifying the remaining two characteristics. In addition, it seems intuitively difficult to train a system to classify the hair color of images with lighting effects. Similarly, it is difficult for any software program or bot to identify the length of hair in an image. For example, a person may have long hair that is tied up; a human can still "see" its length but, most likely, a machine will not classify the length correctly. In short, though AI algorithms exist for detecting individual human appearance characteristics, passing a CAPTCHA challenge as a whole in the proposed scheme is difficult for such algorithms, due to the involvement of multiple AI problems in a single challenge and the presence of challenging images in our dataset.
6. Limitations of the Proposed Scheme and Potential Solutions
The major limitation of the proposed scheme is the manual tagging of images, by which we mean that one has to manually associate values (classes) with the given characteristics for each image in the database. Unfortunately, this manual tagging approach is not feasible for real-life applications with large datasets because it costs time and human resources. In this section, we propose a solution to overcome this limitation.
In the proposed solution, the user's response to a challenge is used not only to verify whether he or she has passed the test but also to tag those images in the database that have not been manually tagged. Initially, a subset of the images is manually tagged, while the tagging of the remaining images (or new images subsequently added to the system) is automated through users' interaction with the system, without the need to tag those images manually. Instead of one image, two images are now shown to the user. The user is asked to associate characteristics with both images: one to pass the challenge (the challenge image) and the other to automatically tag the image in the database (the database image). Figure 8 shows this modified CAPTCHA as displayed to the user. The intuition is that if a user classifies all characteristics correctly for the challenge image, it is quite likely that he or she will also classify all characteristics correctly for the database image. To gain more confidence in the classification of the database image, we only mark an image as tagged in the database if all the values supplied by users match over two consecutive appearances of that image. The following description explains the process in more detail.
For the proposed solution to work, we associate a state with each image in the database. An image can be in one of three states: i) untagged, ii) semi-tagged, or iii) tagged. At the beginning, a subset of images is (manually) tagged and all other images are untagged. In each CAPTCHA challenge, a challenge image and a database image (both randomly selected) are shown to the user. The challenge image is always tagged, while the database image is either untagged or semi-tagged. The user has to tag (classify) both images. If the user does not classify the challenge image correctly, the system displays a failure message and ignores the classification provided for the database image. On the other hand, if the user classifies the challenge image correctly, the system checks whether the database image is semi-tagged or untagged. If the database image is untagged, its status is changed to semi-tagged and the user-assigned classes are stored in the database against that image. Alternatively, if the database image is semi-tagged, its status is changed to either tagged or untagged: if all the classes assigned by the current user match those stored in the database, the status is changed to tagged and the classification is taken as final for that image; if any class does not match the stored one, the image is marked as untagged and its stored values are removed from the database. Figure 9 schematically illustrates this idea in the form of a flowchart.
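The state transitions described above can be expressed as a small state machine. The following Python sketch is our own illustration, not code from the paper: names such as `ImageRecord` and `process_response` are hypothetical, and we assume the user-assigned classes for an image are represented as a dictionary mapping each characteristic to its selected class.

```python
from enum import Enum


class State(Enum):
    UNTAGGED = "untagged"
    SEMI_TAGGED = "semi-tagged"
    TAGGED = "tagged"


class ImageRecord:
    """Per-image record kept in the database (hypothetical structure)."""

    def __init__(self):
        self.state = State.UNTAGGED
        self.pending_classes = None  # classes stored at the semi-tagged stage
        self.final_classes = None    # classes fixed once the image is tagged


def process_response(challenge_classes, user_challenge, db_image, user_db):
    """Apply one user response to the tagging state machine.

    challenge_classes: ground-truth classes of the (already tagged) challenge image
    user_challenge:    classes the user assigned to the challenge image
    db_image:          ImageRecord of the database image (untagged or semi-tagged)
    user_db:           classes the user assigned to the database image
    Returns True if the CAPTCHA is passed, False otherwise.
    """
    if user_challenge != challenge_classes:
        # Failed CAPTCHA: ignore the classification of the database image.
        return False

    if db_image.state is State.UNTAGGED:
        # First successful classification: store it and wait for confirmation.
        db_image.state = State.SEMI_TAGGED
        db_image.pending_classes = user_db
    elif db_image.state is State.SEMI_TAGGED:
        if user_db == db_image.pending_classes:
            # Two consecutive matching classifications: accept as final.
            db_image.state = State.TAGGED
            db_image.final_classes = user_db
        else:
            # Mismatch: discard the stored values and start over.
            db_image.state = State.UNTAGGED
            db_image.pending_classes = None
    # A tagged image is never shown as a database image, so no branch is needed.
    return True
```

Note that a mismatch resets the image all the way to untagged rather than keeping the newer answer; this matches the scheme's requirement that the final label come from two consecutive identical classifications.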
Figure 8. CAPTCHA challenge of the proposed scheme for auto-tagging (Application 1). The challenge image is on the left while the database image is on the right.
[Figure 9 flowchart: Start → show two random images to the user → get classification from the user → challenge image classified correctly? No: show failure message. Yes: is the database image semi-tagged? No: change the status of the database image to semi-tagged. Yes: do all user-assigned classes match the stored ones (Cui = Cmi for all i)? Yes: change the status to tagged; No: change the status to untagged → show success message → End]
Figure 9. A flowchart illustrating the flow of control in our modified CAPTCHA to facilitate auto-tagging of images.
7. Conclusion
In this paper, we proposed a user-friendly image-based CAPTCHA scheme based on human appearance characteristics. We considered only those characteristics whose values are relatively unambiguous. We evaluated the proposed idea with two applications and presented results comparing their security and usability features. Our results indicate that some level of usability must be sacrificed to achieve a higher level of security, and vice versa. Our CAPTCHA challenge is not time consuming, as users click with the mouse instead of typing on the keyboard, which also avoids potential errors caused by misspellings and synonyms. Universality can be achieved by adding a translator to dynamically convert the labels into a user-specified language. With the two versions of the proposed scheme, we achieved human success rates of 62% and 83%, respectively.
Sajida Kalsoom completed her MS in Computer Science at COMSATS Institute of Information Technology (CIIT) in Islamabad, Pakistan. She is currently working as a lecturer at CIIT, Islamabad. Her research interests include network security and computer vision.

Sheikh Ziauddin is an assistant professor in the Department of Computer Science, COMSATS Institute of Information Technology, Islamabad, Pakistan. He received his PhD in Computer Science from the Asian Institute of Technology in Bangkok, Thailand. His research interests include biometrics, image processing, cryptography, and computer security.

Abdul Rehman Abbasi is a principal engineer at Karachi Institute of Power Engineering, Karachi, Pakistan. He completed his PhD in Mechatronics Engineering at the Asian Institute of Technology in Bangkok, Thailand. His research interests include robotics, machine vision, and machine learning.