
Behavioral Biometrics and Machine Learning to Secure Website Logins: 6th International Symposium, SSCC 2018, Bangalore, India, September 19–22, 2018, Revised Selected Papers


Abstract

In a world dominated by e-commerce and electronic transactions, the business value of a secure website is immeasurable. With the ongoing wave of Artificial Intelligence and Big Data, hackers have far more sophisticated tools at their disposal to orchestrate identity fraud on login portals. Such attacks bypass static security rules, so protecting against them requires machine learning based ‘intelligent’ security algorithms. This paper explores the use of client behavioral biometrics to secure website logins. A client’s mouse dynamics, keystrokes and click patterns during login are used to build a customized security model for each user that can distinguish that user from any impersonator. Combined with existing protocols, such a model provides enhanced security for the user’s profile even if the credentials are compromised. The module first collects relevant behavioral data from the client side when a new account is created. The collection module can easily be integrated with any web application without impacting website performance. Once sufficient login data has been collected, a biometric-based fraud detection algorithm is created that secures the account against future impersonators. The algorithms we chose are the Multilayer Perceptron, Support Vector Machine and Adaptive Boosting, whose outcomes are polled to give the prediction. We find that such a model shows good performance (accuracy, precision and recall) for different train:test splits. Moreover, the model is easy to implement for any web-based authentication, is scalable, and can be fully automated, provided a dataset like ours can be created from client activity on the web application of interest.
Falaah Arif Khan
DES India, Dell
Bangalore, India
Falaah_Arif_Khan@dell.com
Sajin Kunhambu
DCS DCP India, Dell
Bangalore, India
Sajin_Kunhambu@DELL.com
K Chakravarthy G
DCS DCP India, Dell
Bangalore, India
k_chakravarthy_g@dell.com
Keywords: behavioral biometrics, machine learning, artificial intelligence, login fraud, intelligent security, keystroke, mouse movements, multilayer perceptron, support vector machine, adaptive boosting
I. INTRODUCTION
Netbots and other ‘intelligent’ tools at the disposal of malicious users make it possible to perform fraudulent logins on websites, leaving user information open to misuse. In an era where online transactions drive sales, such attacks cost businesses millions of dollars. Dictionary and other brute-force attacks easily bypass static security rules and put user information in malicious hands. Once credentials have been compromised, intruders can perform multiple subsequent malicious logins that go virtually undetected during authentication. The most relevant, yet generally overlooked and underused, client-side information is behavioral: the mouse dynamics, keystrokes and click patterns of the user. We propose a model, based on the user's behavioral biometrics during login, that secures user accounts even if credentials have been compromised. The model first collects relevant behavioral data from the client side at login to create a unique template. A fraud detection model is then created. It consists of three separate modules, namely the Multilayer Perceptron, Support Vector Machine and Adaptive Boosting, whose outcomes are polled to give a combined prediction in real time while the user is logging in.
We identified the following key requirements for this work. First, the model must account for the nature of the dataset: while the detection algorithm is being created, the account has only the default level of security and is therefore susceptible to attacks by imposters, so the model must be buildable from a reasonably small amount of data. Second, the model must not be computationally expensive: fraud detection has to happen in real time, and evaluating the model should not affect login performance on the website. Lastly, the model should be easily scalable; by fixing the model architecture, we allow the creation of the detection model to be automated for each new user/account on the website.
II. LITERATURE REVIEW
Research on security applications that use the client's behavioral information for authentication has identified two sources of relevant data, mouse movements and keystrokes, which together are termed behavioral biometrics. A myriad of applications rely on behavioral biometrics, using logs of mouse movements and keystrokes either in isolation or in combination.
Such applications range from re-authentication [1], to replacing conventional password-based logins [2], to adding further layers of security [4, 5, 6]. The authors of [1] use behavioral biometrics for re-authentication rather than as the first wall of security. The authors of [2] propose a two-level security system, in which a keystroke-based template is the first level of authentication and mouse movements are the second. However, such a system is not based on passive authentication, in which the biometrics complement existing security protocols. Instead, the authors use a keystroke template as the entity against which authentication is performed. Similarly, a template of a unique mouse movement is used as a ‘password’ at the second level of security. Another application of such a model is seen in [4], where the authors use it for Data Loss Prevention by predicting the identity of a data creator. This can be contrasted with [5] and [6], which focus more on user profile identification in web applications.
Looking at the features extracted from the client’s behavioral biometrics in all these applications, we see certain fundamental similarities. Most of the literature that uses mouse movements [1, 4, 6] identifies 8 classes into which each mouse event can be classified, based on the relative direction of movement. We find this feature engineering to be well researched and shown to be meaningful, and hence we extract similar features from mouse event data. Work on keystroke biometrics mostly focuses on monograms and digrams, i.e. dwell time and flight time. Researchers, however, have not tried to extract features from the click patterns of the user, so we see our work as contributing new features that can be extracted from login activity.
There is a fundamental lack of open-source datasets for such applications. Moreover, since this is a customized form of security for a specific web application, most authors choose to create their own data. In [2], the authors explain how, for data collection, each user was made to log in 10 times, from which features were extracted and a template was created. In [4], the model is implemented as a software agent that resides on the user’s desktop. To collect data for model creation, the authors explain how an organization can mandate its employees to install this agent and run it in the background of the operating system every time they use a computer. Such organizationally mandated data collection enables the agent to record and analyze the user’s keystroke and mouse movement behavior beyond login activity alone.
For data collection, the authors of [5] made each user log in to a website either as themselves (genuine user) or as other users (intruder). For each login session, the login type was recorded. Credentials were shared with all users to allow impersonation attacks. In total, 24 users with different backgrounds and computer skills participated in the data collection, giving rise to logs of 193 legitimate visits and 101 intrusive visits. In [6], a total of 25 subjects were asked to come up with a new password. Each subject, or owner, typed this password 150 to 400 times over a period of several days, and the last 75 timing vectors collected were set aside for testing. The remaining timing vectors were used to train the network. A total of 15 imposters were given all 21 passwords and asked to type each password five times, resulting in 75 imposter test vectors for each password. Combined with the owner's 75 test vectors set aside earlier, this gave a total of 150 test vectors per password.
All these papers log a significant amount of data, which gave us a benchmark for how much login data would be required to create a reasonably good model.
We also carried out preliminary analysis on the few publicly available datasets. For mouse movements, we used the Balabit Mouse Dynamics Challenge dataset [7]. The goal of the challenge was to protect a set of users from unauthorized use of their accounts by learning the characteristics of how they use their mouse. The dataset contains timing and positioning information of the mouse pointers of different users, from multiple sessions on a web application. To collect the data, a network monitoring device was placed between the client and the remote computer, inspecting all traffic as described by the RDP protocol. This included the user's mouse interactions transmitted from the client to the server during the remote session. Hence, the dataset contains the following fields: record timestamp, the elapsed time (in seconds) since the start of the session as recorded by the network monitoring device; client timestamp, the elapsed time (in seconds) since the start of the session as recorded by the RDP client; button, the current condition of the mouse buttons; state, additional information about the current state of the mouse; x, the x coordinate of the cursor on the screen; and y, the y coordinate of the cursor on the screen. Working on this dataset helped us understand which features can be meaningful for a model based on mouse movements.
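To make the field list concrete, here is a minimal sketch (not the challenge organisers' tooling) that reads one session in this layout and derives the distance, elapsed time and speed of each cursor segment. It assumes the CSV header spells the fields exactly as listed above and treats the client timestamp as the clock; both are assumptions to check against the actual files, and the three sample rows are invented for illustration.

```python
import csv
import io
import math


def movement_speeds(csv_text):
    """Return (distance, dt, speed) for each pair of consecutive rows of one session.

    Assumes the header uses the field names listed above and that the
    client timestamp (seconds) is the clock to trust.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(float(r["client timestamp"]), float(r["x"]), float(r["y"])) for r in reader]
    segments = []
    for (t0, x0, y0), (t1, x1, y1) in zip(rows, rows[1:]):
        dt = t1 - t0
        if dt <= 0:  # skip repeated or out-of-order timestamps
            continue
        distance = math.hypot(x1 - x0, y1 - y0)
        segments.append((distance, dt, distance / dt))
    return segments


# Invented rows in the assumed layout, for illustration only.
sample = """record timestamp,client timestamp,button,state,x,y
0.0,0.0,NoButton,Move,100,200
0.1,0.1,NoButton,Move,103,204
0.3,0.3,NoButton,Move,103,204
"""
print(movement_speeds(sample))
```

Per-segment speeds like these are the raw material for the direction-class averages used later in our own feature engineering.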
We then worked on the open-source keystroke datasets. The Benchmark Data Set by Kevin Killourhy and Roy Maxion, released as an accompaniment to [8], contains timing data for 51 typists all typing the same password. The dataset contains the flight and dwell times for this predefined password, so no preprocessing or feature engineering was required before using it. The BeiHang Keystroke Dynamics Database, released as an accompaniment to [9], was another dataset we worked on extensively before building our own dataset and model. It contains 2057 test samples and 556 training samples from 117 subjects, divided into two subsets based on the collection environment. The keystrokes of a session are read as a sequence of (P_i, R_i) pairs, where P_i and R_i are the press and release times of the i-th key of the password. With this dataset we were free to extract as many n-gram features as we wanted.
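The (P_i, R_i) representation makes the classic timing features short list expressions. The sketch below assumes all times share one unit and uses one common convention for flight time (release of key i to press of key i+1, which can be negative when keys overlap); other work measures press-to-press or release-to-release instead. The function names are hypothetical.

```python
def keystroke_timings(press, release):
    """Timing features from per-key press times P_i and release times R_i.

    dwell[i]   = R_i - P_i       (how long key i is held down)
    flight[i]  = P_{i+1} - R_i   (release of key i to press of key i+1)
    digraph[i] = P_{i+1} - P_i   (press-to-press latency of consecutive keys)
    """
    if len(press) != len(release):
        raise ValueError("press and release must have the same length")
    dwell = [r - p for p, r in zip(press, release)]
    flight = [p_next - r for r, p_next in zip(release, press[1:])]
    digraph = [p_next - p for p, p_next in zip(press, press[1:])]
    return dwell, flight, digraph


def ngram_latency(press, n):
    """Press-to-press latency spanning n consecutive keys (n=2 gives digraphs)."""
    return [press[i + n - 1] - press[i] for i in range(len(press) - n + 1)]


# Example: three keys, times in milliseconds.
dwell, flight, digraph = keystroke_timings([0, 150, 320], [90, 240, 400])
print(dwell, flight, digraph, ngram_latency([0, 150, 320], 3))
```

Because every n-gram latency is derived from the same two arrays, extending the feature set to longer n-grams costs nothing beyond another call.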
Our work on these datasets confirmed the documented success of feature engineering on client biometrics.
One measure of success for such a project is its behavior across users. The authors of [1] verify the scalability of a biometric-based solution by demonstrating that it works for multiple users and across environments. The authors of [2] take into account that their model, while analyzing the user’s keystroke and mouse movement behavior, needs to track keystrokes and mouse movements without interfering with the user’s work. Hence, they show the practical implementation of such models as a software agent that resides on the user’s desktop.
Another practical consideration is the variation in the passwords that users choose. The authors of [5] note that only limited work has been done on free-text detection, and they make accommodations for free text and free mouse movements, as well as for the fact that many web sessions are very short.
Further practical considerations are addressed in [6], where the imbalance between data labels is identified. The authors highlight that this is a binary classification problem (owner vs. imposters), yet only the patterns of one class, the owner’s, are available in advance. Since there are millions of potential imposters, it is not practical to obtain enough patterns from all kinds of imposters. Nor is it feasible to publicize credentials in order to collect potential imposters' timing vectors. Hence, they propose that the only solution is to build a model of the owner's keystroke dynamics and use it to detect imposters through some similarity measure. They show that our problem is one of a “partially exposed environment”, or “novelty detection”.
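The owner-only setting described in [6] can be illustrated with a very simple novelty detector: enrol only the owner's feature vectors, then score a new attempt by its distance from the owner's mean. The sketch uses a scaled Manhattan distance, one of the detectors compared in [8]; it is not the auto-associative network of [6], and the threshold is an assumption that would have to be tuned, for example on held-out genuine sessions.

```python
from statistics import mean


class ScaledManhattanDetector:
    """Owner-only novelty detector: train on genuine sessions, score by distance to their mean."""

    def fit(self, owner_vectors):
        columns = list(zip(*owner_vectors))
        self.mu = [mean(col) for col in columns]
        # Mean absolute deviation per feature scales each term; tiny floor avoids division by zero.
        self.mad = [mean(abs(v - m) for v in col) or 1e-9 for col, m in zip(columns, self.mu)]
        return self

    def score(self, vector):
        """Larger score = further from the owner's typical behaviour."""
        return sum(abs(v - m) / d for v, m, d in zip(vector, self.mu, self.mad))

    def is_imposter(self, vector, threshold):
        return self.score(vector) > threshold


owner_sessions = [[1, 10], [3, 14], [2, 12]]  # toy two-feature vectors
detector = ScaledManhattanDetector().fit(owner_sessions)
print(detector.score([2, 13]), detector.is_imposter([5, 20], threshold=5.0))
```

No imposter data is needed to fit the detector, which is exactly the property the partially exposed setting requires; the cost is that the decision boundary is only as good as the chosen threshold.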
Another consideration raised by the authors of [6] is the registration of a new password, which requires new data to be collected before a new model can be created. During this period the proposed identity verification cannot be used, so the conventional password system maintains an ordinary level of security. The length of the collection period can be determined dynamically by monitoring the variability of typing patterns. Moreover, a separate model must be constructed for each password or user, and whenever a user changes his or her password, a new model has to be built.
The detection algorithms used in the literature vary. The authors of [2] build a classifier that checks the similarity between the pattern to be verified and the template of prototypes (created from the collected logs), using the Distance Pattern between the feature vector of the pattern and the prototype. In [4], separate Support Vector Classifiers are used for mouse dynamics features and for keystroke features. The authors of [5] show the applicability of Bayesian Networks to this problem. The authors of [6] use an Auto-Associative Neural Network for novelty detection. [3] summarizes other similar works, covering techniques ranging from the Monte Carlo approach for data collection to Gaussian probability density functions, direction similarity measures and parallel decision trees for classification.
The authors of [15] suggest that there are keystroke classes, akin to blood types, into which people can be classified, and propose a clustering model for keystroke data.
Since the performance of these algorithms differed according to their inherent biases and variances, we identified the need for polling, i.e. an ensemble, of conventional algorithms.
III. DATASET AND FEATURES
A. Dataset Description
The dataset was created by mimicking login activity on a dummy login page, with 8 different typists (1 true user and 7 imposters) all entering the same credentials into the portal. A total of 102 login sessions were recorded, of which 65 were sessions of the true user and 37 were fraudulent login attempts. The behavioral information collected was:
Mouse coordinates at each time instant
Keystrokes: timestamps of each key-press and key-release event
Timestamps of all clicks
Key code of each key pressed
B. Features Extracted
The mouse activity of each session was divided into N minibatches. Within each minibatch, each mouse movement was classified into one of 8 classes based on the relative direction of movement. These classes are shown in Fig. 1 and described in Table I. After categorization, features were extracted for each class as averages of the attributes logged across the minibatch, namely:
Average speed in the x direction, per class
Average speed in the y direction, per class
Average speed, per class
Average distance covered, per class
Percentage of mouse movements logged in each direction
Fig. 1. Categorization of mouse movements
TABLE I. CLASSES OF MOUSE MOVEMENTS
Class | Angle (degrees)
1 | 0-45
2 | 45-90
3 | 90-135
4 | 135-180
5 | 180-225
6 | 225-270
7 | 270-315
8 | 315-360
This gives 5 features per class per minibatch, i.e. 40 features per minibatch since there are 8 classes. Choosing N (the number of minibatches) as 4, we obtained 160 features from the mouse activity logs.
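A minimal sketch of this mouse feature extraction follows. Several details are not specified above and are assumptions here: angles are measured counter-clockwise from the +x axis with the screen's y axis flipped, each sector of Table I is half-open (class 1 is [0, 45)), minibatches hold equal numbers of movement events, per-axis speeds are absolute values, and a class with no movements in a minibatch contributes zeros.

```python
import math

N_CLASSES = 8  # 45-degree sectors, Table I


def direction_class(dx, dy):
    """Map a cursor displacement to a Table I class (1-8).

    Assumption: angles are counter-clockwise from +x with y pointing up, so the
    screen's downward-growing y is negated first; sectors are half-open.
    """
    angle = math.degrees(math.atan2(-dy, dx)) % 360.0
    return min(int(angle // 45), N_CLASSES - 1) + 1  # guard against rounding to 360.0


def mouse_features(points, n_batches=4):
    """points: (t, x, y) samples for one session -> 5 * 8 * n_batches features."""
    moves = []  # (class, |vx|, |vy|, distance, dt) per movement
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        dt, dx, dy = t1 - t0, x1 - x0, y1 - y0
        if dt <= 0 or (dx == 0 and dy == 0):
            continue  # skip pauses and repeated timestamps
        moves.append((direction_class(dx, dy), abs(dx) / dt, abs(dy) / dt, math.hypot(dx, dy), dt))

    size = math.ceil(len(moves) / n_batches) if moves else 0
    features = []
    for b in range(n_batches):
        batch = moves[b * size:(b + 1) * size]
        for c in range(1, N_CLASSES + 1):
            sel = [m for m in batch if m[0] == c]
            if not sel:
                features.extend([0.0] * 5)  # class unused in this minibatch
                continue
            k = len(sel)
            features.extend([
                sum(m[1] for m in sel) / k,         # average speed in x
                sum(m[2] for m in sel) / k,         # average speed in y
                sum(m[3] / m[4] for m in sel) / k,  # average speed
                sum(m[3] for m in sel) / k,         # average distance covered
                100.0 * k / len(batch),             # % of minibatch movements in this class
            ])
    return features
```

With `n_batches=4` the function always returns 4 × 8 × 5 = 160 values, regardless of how long the session is, which is what keeps the classifier input size fixed across users.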
The click times give an approximation of how long the user takes to log in, under the simple assumption that the first click selects the username field and the last click submits the entered credentials. Hence, we also extracted the login time from the click patterns as a feature.
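Under this first-click/last-click assumption, the login time is simply the span of the click timestamps. The fallback value for sessions with fewer than two clicks is a hypothetical choice, not something stated in the text.

```python
def login_time(click_times):
    """Elapsed time from the first click (assumed: username field) to the last (assumed: submit)."""
    if len(click_times) < 2:
        return 0.0  # hypothetical fallback: not enough clicks to estimate a duration
    return max(click_times) - min(click_times)


print(login_time([3.0, 1.5, 7.25]))
```

A user who submits with the Enter key instead of a final click would make this an underestimate, which is one reason the feature is only an approximation.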
For the keystrokes, we first divided the typing activity by the kind of key pressed: shift-altered keys, lowercase keys, control keys or other keys. Each keystroke was assigned the corresponding category described in Table II.
TABLE II. KEY CATEGORIES
Category | Description
1 | Uppercase: A-Z and special characters that require a preceding Shift
2 | Lowercase: a-z, numbers
3 | Control: Tab, Backspace, Delete, arrow keys
4 | Others
Next, we split the session's keystrokes into those for typing the username and those for the password. For each of these we extracted:
Mean flight time, per key category
Mean dwell time, per key category
This gives 2 features per input per category; with 2 inputs (username and password) that is 4 features per key category, and with 4 key categories a total of 16 features. We further extracted the mean and standard deviation of the dwell and flight times for each input across all categories, which gives another 8 features. Finally, we recorded the distribution of keystrokes across categories (as percentages), which gives 4 more features. The total number of features extracted from the keystroke logs is therefore 28. Hence, the feature vector for each session has length 189 (160 mouse features, 1 login-time feature and 28 keystroke features).
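The keystroke features can be sketched as follows. The key names assume browser `KeyboardEvent.key`-style strings, the shifted-symbol set assumes a US layout, and flight time is taken as the gap from the previous key's release to the current key's press within the same input; the text does not pin down these choices, so treat them as assumptions.

```python
import string
from statistics import mean, pstdev

# Assumption: US keyboard layout and KeyboardEvent.key-style key names.
SHIFTED_SYMBOLS = set('~!@#$%^&*()_+{}|:"<>?')
CONTROL_KEYS = {"Tab", "Backspace", "Delete", "ArrowLeft", "ArrowRight", "ArrowUp", "ArrowDown"}


def key_category(key):
    """Return the Table II category (1-4) of a key name."""
    if key in CONTROL_KEYS:
        return 3
    if len(key) == 1 and (key in string.ascii_uppercase or key in SHIFTED_SYMBOLS):
        return 1
    if len(key) == 1 and (key in string.ascii_lowercase or key in string.digits):
        return 2
    return 4


def _avg(values):
    return mean(values) if values else 0.0


def _std(values):
    return pstdev(values) if values else 0.0


def input_features(events):
    """events: (key, press_time, release_time) tuples for one input field, in typing order.

    Returns 8 per-category values (mean flight, mean dwell for categories 1-4)
    and 4 summary values (mean/std of dwell, mean/std of flight).
    """
    dwell = {c: [] for c in range(1, 5)}
    flight = {c: [] for c in range(1, 5)}
    for i, (key, press, release) in enumerate(events):
        c = key_category(key)
        dwell[c].append(release - press)
        if i > 0:  # flight: previous key's release -> this key's press (assumed definition)
            flight[c].append(press - events[i - 1][2])
    per_category = []
    for c in range(1, 5):
        per_category += [_avg(flight[c]), _avg(dwell[c])]
    all_dwell = [d for c in range(1, 5) for d in dwell[c]]
    all_flight = [f for c in range(1, 5) for f in flight[c]]
    summary = [_avg(all_dwell), _std(all_dwell), _avg(all_flight), _std(all_flight)]
    return per_category, summary


def keystroke_features(username_events, password_events):
    """28 keystroke features: 16 per-category means, 8 summary statistics, 4 category shares."""
    u_cat, u_sum = input_features(username_events)
    p_cat, p_sum = input_features(password_events)
    keys = [key for key, _, _ in username_events + password_events]
    counts = [sum(1 for k in keys if key_category(k) == c) for c in range(1, 5)]
    total = len(keys) or 1  # avoid dividing by zero for an empty session
    shares = [100.0 * n / total for n in counts]
    return u_cat + p_cat + u_sum + p_sum + shares
```

Concatenating the 160 mouse features, the login time and these 28 values gives the 189-dimensional vector used for N = 4.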
IV. METHODS AND RESULTS
A. Classifier Architectures
The classifiers for detecting fraudulent logins were implemented in Python using the sklearn library.
Multilayer Perceptron: a neural network with 2 hidden layers, each containing 250 neurons with tanh activation. Training used the Adam optimizer with a minibatch size of 1 sample and an initial learning rate of 0.001, which was updated adaptively.
Support Vector Machine: the libsvm implementation with a polynomial kernel of degree 3.
Adaptive Boosting: an ensemble of decision trees, each with a maximum depth of 200, created using the AdaBoost module of the sklearn library.
Since the model must not degrade the performance of the web application, its computational complexity needs to be considered. Choosing machine learning models with simple architectures over deep learning models was a conscious decision, to ensure that the detection API, which essentially polls these three classifiers, does not become too computationally expensive.
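The three classifiers and their polling map naturally onto scikit-learn. The sketch below mirrors the hyperparameters listed above and assumes that polling means a hard majority vote, which `VotingClassifier(voting="hard")` implements; `build_detector` is a hypothetical name. One deviation is deliberate: scikit-learn's `learning_rate="adaptive"` option only applies to the `sgd` solver, so with Adam the step-size adaptation comes from the optimizer itself and that option is left out.

```python
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_detector(random_state=0):
    """Three-classifier ensemble polled by hard (majority) voting."""
    mlp = MLPClassifier(
        hidden_layer_sizes=(250, 250),  # 2 hidden layers of 250 neurons
        activation="tanh",
        solver="adam",
        batch_size=1,                   # one sample per update
        learning_rate_init=0.001,
        random_state=random_state,
    )
    svm = SVC(kernel="poly", degree=3)  # SVC is built on libsvm
    # The tree is passed positionally because the keyword changed from
    # base_estimator to estimator across scikit-learn versions.
    ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=200), random_state=random_state)
    return VotingClassifier(
        estimators=[("mlp", mlp), ("svm", svm), ("ada", ada)],
        voting="hard",  # each model casts one vote; the majority label wins
    )
```

A fitted detector can be persisted with `joblib.dump` and reloaded with `joblib.load` for a real-time API. Because the MLP and the polynomial SVM are sensitive to feature scale, wrapping each in a pipeline with `StandardScaler` may be worth trying, although the text above does not say whether scaling was used.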
B. Results
Our dataset consisted of biometrics from 102 user logins. We applied random train:test splits to this data and report the average accuracy over 50 such splits, to understand how much data is needed to create an effective model. Tables III and IV give the results for two feature-vector lengths, obtained by varying the number of mouse-movement minibatches N (N = 4 gives 189 features; N = 5 gives 229).
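The evaluation protocol (average accuracy over 50 random splits at each train:test ratio) can be sketched with scikit-learn's `ShuffleSplit` and `cross_val_score`. The function name is hypothetical; with only 37 imposter sessions, `StratifiedShuffleSplit` would be a reasonable alternative that keeps the class ratio equal in every split.

```python
from sklearn.model_selection import ShuffleSplit, cross_val_score


def mean_split_accuracy(model, X, y, test_size, n_splits=50, random_state=0):
    """Mean test accuracy of `model` over `n_splits` independent random train/test splits."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=random_state)
    # cross_val_score clones and refits the model on every split.
    scores = cross_val_score(model, X, y, cv=splitter, scoring="accuracy")
    return float(scores.mean())
```

Looping `test_size` over 0.2, 0.3, 0.4 and 0.5 for each classifier reproduces the row structure of Tables III and IV.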
TABLE III. COMPARISON OF ACCURACIES FOR DIFFERENT TRAIN:TEST SPLITS, WITH N=4
Train:Test Split | MLP accuracy | SVM accuracy | AdaBoost accuracy
80:20 | 0.883 | 0.969 | 0.961
70:30 | 0.879 | 0.954 | 0.946
60:40 | 0.873 | 0.949 | 0.936
50:50 | 0.854 | 0.949 | 0.937
TABLE IV. COMPARISON OF ACCURACIES FOR DIFFERENT TRAIN:TEST SPLITS, WITH N=5
Train:Test Split | MLP accuracy | SVM accuracy | AdaBoost accuracy
80:20 | 0.895 | 0.973 | 0.947
70:30 | 0.900 | 0.971 | 0.947
60:40 | 0.902 | 0.969 | 0.942
50:50 | 0.892 | 0.983 | 0.952
Since all three classifiers achieved good accuracy, we saved the best-performing model of each type, to be loaded and used in our real-time detection API. Table V summarizes the performance of these chosen models on further metrics.
TABLE V. SUMMARY OF SAVED MODELS
Performance Metric | MLP | SVM | AdaBoost
Accuracy | 0.952 | 1 | 0.952
Precision | 1 | 1 | 0.928
Recall | 0.933 | 1 | 1
V. CONCLUSIONS
A. Analysis
The results of our work are promising. Tables III and IV show reasonably reliable performance even on the 50:50 split, the smallest training fraction tested, where the training set holds about 51 of the 102 sessions (around 30 of them from the true user if the split preserves the class ratio). This is far less data than the sizes reported previously. Moreover, the application that makes predictions with the loaded models described in Table V performed very well in testing and did not affect the performance of the website.
Another promising outcome of this study is the simplicity of the architectures involved. The support vector machine showed the best performance on our dataset, which suggests that the data obtained after feature engineering can be separated well by a polynomial kernel. Since the extracted features remain the same no matter who the user is, how fast or slow he or she types, or how much he or she moves the mouse around the screen, it is reasonable to expect that a similar architecture will work for creating an SVM-based detection algorithm for any user.
Similarly, for the multilayer perceptron, taking 4 minibatches of mouse movements per session (N=4) gives 189 features, and our results show that a network with 2 hidden layers of 250 neurons each works well. The number of features remains fixed across all users and is independent of the length of the password. Consequently, the same architecture should yield a comparably performing MLP for a new user. The same argument applies to Adaptive Boosting with classification trees.
These results suggest that the entire process of model creation can be automated, since with a fixed architecture the creation of the detection API becomes purely data-dependent. Given a minimum required accuracy, the deployment of these modules into a new user’s detection API can be automated by collecting data and retraining the models until the specified accuracy is reached.
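The retraining loop can be sketched as a function that keeps accumulating labelled sessions until a freshly trained model meets the accuracy target. Everything here is hypothetical: `fit_and_score` stands for whatever training and validation procedure is used, and `min_sessions` is an arbitrary floor. Measuring accuracy this way still needs labelled imposter sessions, which is the data-collection problem discussed in the challenges below.

```python
def train_until_accurate(session_stream, fit_and_score, min_accuracy, min_sessions=20):
    """Collect labelled login sessions and retrain until the model is accurate enough.

    session_stream: iterable of (feature_vector, label) pairs in arrival order.
    fit_and_score:  hypothetical callable(X, y) -> (model, accuracy).
    Returns (model, accuracy, n_sessions_used), or (None, None, n) if the stream ends first.
    """
    X, y = [], []
    for features, label in session_stream:
        X.append(features)
        y.append(label)
        if len(X) < min_sessions:
            continue  # too little data to train and evaluate meaningfully
        model, accuracy = fit_and_score(X, y)
        if accuracy >= min_accuracy:
            return model, accuracy, len(X)
    return None, None, len(X)
```

Retraining after every new session keeps the sketch simple; a live deployment would more likely retrain in batches, and until the target is met the account simply keeps its default level of security.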
B. Challenges faced
The biggest challenge in creating this model, and one that is sure to arise when scaling such models up on live websites, is data collection. While the technology required to log such behavioral biometrics is widely available and easy to build, sources of data are scarce. A new user on a web platform will provide the positive samples of the dataset in his or her first few logins. However, it will be virtually impossible to gather negative samples: even if imposters use the profile, there is no way of labelling those logs as negative samples. Hence, further work is needed to turn this research into an approach that uses unsupervised, or at least semi-supervised, learning frameworks that perform novelty detection. As with any increase in the client data logged by a business, a review of the legal implications may also be necessary and could prove a challenge.
Another major challenge for this application is the variety of platforms and environments from which a user can type. Casual typing, account sharing by multiple people or one-handed typing can all pose challenges to our model. To make the model more generalizable, care should be taken to obtain such samples during training, so that the model has seen these exceptional positive samples.
C. Future Work
The results of this study enable the easy implementation of an effective security protocol for web-based applications. The use of biometrics makes the security customizable, granting protection against human imposters as well as netbots and other malicious scripts. Moreover, such a paradigm provides conventional protection against attacks that seek to discover credentials, such as brute-force attacks, as well as against imposters who already possess the user’s credentials. Such a security protocol would be robust against a wide range of attackers and attacks. The results of this study can be built upon by repeating the experiment with more users and a larger dataset.
Using the same feature engineering, the results of this study can be applied to website authentication according to transactional importance. For premium account holders, a customized model can be created for each user. This is intuitive, as both the amount of activity on these accounts (and hence the quantity of data available) and the importance of securing them will be higher. For regular users, clustering into broad biometric classes can be done. Checking the cluster into which an incoming client sample falls against the expected cluster for those credentials might be sufficient for regular accounts.
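The cluster check for regular accounts could look like the following sketch. It assumes cluster centroids have already been learned from biometric feature vectors (for example with k-means, in the spirit of the clustering in [15]) and that each set of credentials stores the index of its expected cluster; both the centroids and the stored index are hypothetical.

```python
import math


def nearest_cluster(sample, centroids):
    """Index of the centroid closest to `sample` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(sample, centroids[i]))


def matches_expected_cluster(sample, centroids, expected_cluster):
    """Treat a login as consistent if it lands in the cluster recorded for these credentials."""
    return nearest_cluster(sample, centroids) == expected_cluster


centroids = [[0.0, 0.0], [10.0, 10.0]]  # toy centroids for two broad biometric classes
print(matches_expected_cluster([9.0, 9.0], centroids, expected_cluster=1))
```

This trades per-user precision for a single shared model, which matches the intent of reserving customized models for premium accounts.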
Another extension of this work would be to replace the conventional models employed here with semi-supervised learning frameworks, to overcome the problem of collecting negatively labelled samples. A deep neural network could replace the proposed MLP; given our prioritization of web application performance, the computational cost of using deep architectures in such an application should be studied further. Work can also be done to replace the existing credential-based security protocols with biometric-based ones, as opposed to using these intelligent security protocols as an accompaniment to static rules.
ACKNOWLEDGMENT
We thank our managers, Mukund and Swami, for their unwavering support. We also extend hearty thanks to all the interns at Dell, Hyderabad, who took part in data collection. Without the data there could have been no machine learning, and so your contribution does not go unnoticed.
We dedicate this project to the Python community for all the
extraordinary work they do in creating new useful libraries for
developers, while maintaining requisite documentation and
user support on existing libraries. The work of this study, like
the work of countless others, would not have been possible
without their unwavering dedication to the Pythonic way.
REFERENCES
[1] Nan Zheng, Aaron Paloski, and Haining Wang, “An efficient user verification system via mouse movements”, Proceedings of the 18th ACM Conference on Computer and Communications Security, pp. 139-150, 2011.
[2] Swati Gurav, Rutuja Gadekar, Snehal Mhangore, “Combining keystroke and mouse dynamics for user authentication”, International Journal of Emerging Trends & Technology in Computer Science (IJETTCS), Volume 6, Issue 2, March-April 2017, pp. 055-058, ISSN 2278-6856.
[3] Rohan V. Ponkshe, Vikrant Chole, “Keystroke and mouse dynamics: a review on behavioral biometrics”, International Journal of Computer Science and Mobile Computing, pp. 41345, 2015.
[4] Jain-Hhing Wu, Chih-Ta Lin, Yuh-Jye Lee and Song-Kong Chong, “Keystroke and mouse movement profiling for data loss prevention”.
[5] Issa Traore, Isaac Woungang, Mohammad S. Obaidat, Youssef Nakkabi, and Iris Lai, “Combining mouse and keystroke dynamics biometrics for risk based authentication in web environments”.
[6] S. Cho, C. Han, D. Han, H. Kim, “Web based keystroke dynamics identity verification using neural network”, Journal of Organizational Computing and Electronic Commerce, 10 (4) (2000) 295-307.
[7] Balabit mouse dynamics challenge data set, Fülöp, Á., Kovács, L., Kurics, T., Windhager-Pokol, E. (2016).
[8] Kevin S. Killourhy and Roy A. Maxion, “Comparing anomaly detectors for keystroke dynamics”, Proceedings of the 39th Annual International Conference on Dependable Systems and Networks (DSN-2009), pp. 125-134, Estoril, Lisbon, Portugal, June 29-July 2, 2009.
[9] Yilin Li, Baochang Zhang Cao, Sanqiang Zhao, Yongsheng Gao, Jianzhaung Liu, “Study on the BeiHang keystroke dynamics database”, International Joint Conference on Biometrics (IJCB), pp. 1-5, 2011.
[10] F. Monrose, A. Rubin, “Authentication via keystroke dynamics”, ACM Conference on Computer and Communications Security (1997) 48-56.
[11] Shivani Hashia, Chris Pollett, Mark Stamp, “On using mouse movements as a biometric”, International Conference on User Science and Engineering (i-USEr), pp. 206-211, Dec 2011.
[12] Zach Jorgensen and Ting Yu, “On mouse dynamics as a behavioral biometric for authentication”, IEEE Systems Journal, Volume 8, Issue 2, pp. 262-284, June 2013.
[13] Hugo Gamboa, Ana Fred, “A behavioral biometric system based on human-computer interaction”, Proc. SPIE 5404, Biometric Technology for Human Identification, 381, August 25, 2004.
[14] P. S. Teh, A. B. J. Teoh, T. S. Ong and C. Tee, “Keystroke dynamics in password authentication enhancement”, Expert Systems with Applications 37 (2010) 8618-8627, Elsevier, 2010.
[15] Shing-hon Lau, Roy Maxion, “Clusters and Markers for Keystroke Typing Rhythms”, Learning from Authoritative Security Experiment Results (LASER), 2014.
... So on the one hand, one would need many sessions in order to train a robust model and on the other hand, the model should be good enough to decide whether the session is legitimate or not for a very brief window of user interaction. The best published results in behavioral biometrics for logins achieve good accuracy (over 95% [21], [4]) but require at least 50 sessions for training, which is not acceptable in many practical scenarios. ...
... In a preliminary study, Khan et al. [4] investigated biometrics of website logins. The training data-set consisted of one user with 65 legitimate dummy website logins and 37 attacks. ...
Chapter
The rise in popularity of web and mobile applications brings about a need of robust authentication systems. Behavioral Biometrics Authentication has emerged as a complementary risk-based authentication approach which aims at profiling users based on their interaction with computers/smartphones. In this work we propose a novel approach based on Siamese Neural Networks to perform a few-shot verification of user’s behavior. We develop our approach to authenticate either human-computer or human-smartphone interaction. For computer interaction, our approach learns from mouse and keyboard dynamics, while for smartphone interaction it learns from holding patterns and touch patterns. The proposed approach requires only one model to authenticate all the users of a system, as opposed to the one model per user paradigm. This is a key aspect with respect to the scalability of our approach. The proposed model exhibits a few-shot classification accuracy of up to 99.8% and 90.8% for mobile and web interactions, respectively. We also test our approach on a database that contains over 100K interactions collected in the wild.KeywordsRisk-based authenticationBehavioral biometricsDeep learningSiamese networks
In this paper we describe a new behavioural biometric technique based on human-computer interaction. We developed a system that captures the user interaction via a pointing device and uses this behavioural information to verify the identity of an individual. Using statistical pattern recognition techniques, we developed a sequential classifier that processes user interaction and considers the user identity genuine if a predefined accuracy level is achieved, classifying the user as an impostor otherwise. Two statistical models for the features were tested, namely Parzen density estimation and a unimodal distribution. The system was tested with different numbers of users in order to evaluate the scalability of the proposal. Experimental results show that normal user interaction with the computer via a pointing device carries behavioural information with discriminating power that can be exploited for identity authentication.
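A sequential classifier built on Parzen density estimates can be sketched as follows. This is an assumption-laden illustration, not the authors' method: each stroke is reduced to a single hypothetical scalar feature (say, mean speed), Gaussian Parzen windows model the genuine and impostor feature distributions, and the "predefined accuracy level" is approximated by a Wald-style sequential test that accumulates the log-likelihood ratio until it crosses an upper or lower bound (here ±4.6 ≈ ln 100, an arbitrary choice).

```python
import math


def parzen_density(x, samples, h=0.1):
    """Parzen-window density estimate at x with a Gaussian kernel of bandwidth h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def sequential_decision(strokes, genuine, impostor,
                        upper=4.6, lower=-4.6, h=0.1, eps=1e-12):
    """Process strokes one at a time, accumulating the log-likelihood ratio
    log p(x | genuine) - log p(x | impostor). Stop as soon as the evidence
    crosses a bound; returns (decision, number_of_strokes_used)."""
    llr = 0.0
    for n, x in enumerate(strokes, start=1):
        llr += (math.log(parzen_density(x, genuine, h) + eps)
                - math.log(parzen_density(x, impostor, h) + eps))
        if llr >= upper:
            return "genuine", n
        if llr <= lower:
            return "impostor", n
    return "undecided", len(strokes)
```

The sequential form lets a clearly genuine or clearly foreign session be decided after very few strokes, while ambiguous sessions keep collecting evidence.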
This paper considers the effectiveness of using mouse movements as a biometric. Two authentication schemes are proposed, one for initial login of users and another for passively monitoring a computer for suspicious usage patterns. Error rates for both schemes were calculated and compared to prior work.
Biometric authentication verifies a user based on his or her inherent, unique characteristics: who you are. In addition to physiological biometrics, behavioral biometrics has proven very useful in authenticating a user. Mouse dynamics, with their unique patterns of mouse movements, are one such behavioral biometric. In this paper, we present a user verification system using mouse dynamics that is both accurate and efficient enough for practical use. The key feature of our system lies in using much more fine-grained (point-by-point) angle-based metrics of mouse movements for user verification. These new metrics are relatively distinctive from person to person and independent of the computing platform. Moreover, we use support vector machines (SVMs) for accurate and fast classification. Our technique is robust across different operating platforms, and no specialized hardware is required. The efficacy of our approach is validated through a series of experiments. Our experimental results show that the proposed system can verify a user accurately and in a timely manner, and that the induced system overhead is minor.
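Point-by-point angle metrics can be computed directly from a sampled cursor trajectory. The sketch below shows two illustrative metrics, the direction of each segment and the angle formed at each interior point of three consecutive samples; the paper's exact metric definitions may differ, and the function names are hypothetical. One property that plausibly supports platform independence is that angles do not change under uniform scaling of coordinates (e.g., a different screen resolution).

```python
import math


def direction_angles(points):
    """Angle (radians) of each segment P_i -> P_{i+1} relative to the x-axis."""
    return [math.atan2(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]


def curvature_angles(points):
    """Angle (radians, in [0, pi]) at each interior point B between vectors BA
    and BC, for every consecutive triplet (A, B, C). A straight line gives pi."""
    out = []
    for (ax, ay), (bx, by), (cx, cy) in zip(points, points[1:], points[2:]):
        v1 = (ax - bx, ay - by)
        v2 = (cx - bx, cy - by)
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        out.append(abs(math.atan2(cross, dot)))
    return out
```

In a full system these per-point values would typically be aggregated per movement (for example into histograms or summary statistics) before being fed to an SVM; that step is not shown here.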
The idea of using one's behavior with a pointing device, such as a mouse or a touchpad, as a behavioral biometric for authentication purposes has gained increasing attention over the past decade. A number of interesting approaches based on the idea have emerged in the literature and promising experimental results have been reported; however, we argue that limitations in the past experimental evaluations of these approaches raise questions about their true effectiveness in a practical setting. In this paper, we review existing authentication approaches based on mouse dynamics and shed light on some important limitations regarding how the effectiveness of these approaches has been evaluated in the past. We present the results of several experiments that we conducted to illustrate our observations and suggest guidelines for evaluating future authentication approaches based on mouse dynamics. We also discuss a number of avenues for additional research that we believe are necessary to advance the state of the art in this area.
Data leakage is a serious problem for many large organizations. To inform users about confidential data, many prevalent data leakage prevention (DLP) solutions rely on scanning the content of the relevant files. This approach requires the capability to parse various file formats, so the risk of a data breach persists for unsupported file formats. To address this issue, we propose in this paper an active behavior-based DLP model that hooks the keyboard and mouse application programming interfaces (APIs) to track and profile user behavior. This model has two major advantages: (1) it can help discover sensitive data without parsing file formats, and (2) a data creator can be identified according to his or her keystroke and mouse movement behavior. Since this model is based on profiling user behavior, it eliminates the risk of data leakage from unsupported file formats and can identify the creator of a file. The experiments demonstrate the effectiveness of the proposed model: the data-creator identification method achieves an accuracy of 92.64%, which is promising considering that keystroke and mouse-movement features are handled together.
Existing risk-based authentication systems rely on basic web communication information such as the source IP address or the velocity of transactions performed by a specific account or originating from a certain IP address. Such information can easily be spoofed, which calls into question the robustness and reliability of these systems. In this paper, we propose a new online risk-based authentication system that provides more robust user identity information by combining mouse dynamics and keystroke dynamics biometrics in a multimodal framework. Experimental evaluation of our proposed model with 24 participants yields an Equal Error Rate (EER) of 8.21%, which is promising considering that we are dealing with free text and free mouse movements, and that many web sessions tend to be very short.
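One common way to combine two biometric modalities is score-level fusion: normalize each modality's match score so they are on a comparable scale, then take a weighted sum. The sketch below assumes z-score normalization, an equal weighting, and a "challenge" outcome (for example, a step-up authentication request) when the fused score is low; all of these are illustrative choices and not necessarily the framework used in the paper.

```python
def zscore(score, mean, std):
    """Normalize a raw match score using statistics of that modality's scores."""
    return (score - mean) / std


def fuse(mouse_score, key_score, mouse_stats, key_stats, w_mouse=0.5):
    """Weighted-sum fusion of z-normalized scores (higher = more likely genuine).
    mouse_stats and key_stats are hypothetical (mean, std) pairs per modality."""
    zm = zscore(mouse_score, *mouse_stats)
    zk = zscore(key_score, *key_stats)
    return w_mouse * zm + (1.0 - w_mouse) * zk


def decide(fused, threshold=0.0):
    """Risk-based outcome: accept low-risk sessions, challenge the rest."""
    return "accept" if fused >= threshold else "challenge"
```

Fusing at the score level keeps each modality's model independent, so a very short session with little mouse data can still be scored mostly through its keystrokes by adjusting `w_mouse`.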
Password typing is the most widely used identity verification method in World Wide Web-based electronic commerce. Due to its simplicity, however, it is vulnerable to imposter attacks. Keystroke dynamics and password checking can be combined to result in a more secure verification system. We propose an autoassociator neural network that is trained with the timing vectors of the owner's keystroke dynamics and then used to discriminate between the owner and an imposter. An imposter typing the correct password can be detected with very high accuracy using the proposed approach. This approach can be effectively implemented by a Java applet and used in the World Wide Web.
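An autoassociator is trained to reproduce its own input, using only the owner's timing vectors; at verification time, a large reconstruction error signals a typing rhythm the network has not learned. The sketch below is a minimal pure-Python illustration under assumed settings (one sigmoid hidden layer with two units, a linear output, plain stochastic gradient descent, and made-up timing vectors in seconds); the paper's architecture and training details may differ.

```python
import math
import random


def train_autoassociator(samples, hidden=2, lr=0.05, epochs=2000, seed=0):
    """Train a tiny autoassociator (sigmoid hidden layer, linear output) to
    reproduce the owner's timing vectors; returns a reconstruction function."""
    rng = random.Random(seed)
    n = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n)]
    b2 = [0.0] * n

    def forward(x):
        h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
             for row, b in zip(w1, b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(w2, b2)]
        return h, y

    for _ in range(epochs):
        for x in samples:
            h, y = forward(x)
            dy = [2.0 * (yi - xi) for yi, xi in zip(y, x)]  # d(squared error)/dy
            # Hidden-layer gradient, computed before w2 is updated.
            dh = [sum(dy[i] * w2[i][j] for i in range(n)) * h[j] * (1.0 - h[j])
                  for j in range(hidden)]
            for i in range(n):
                for j in range(hidden):
                    w2[i][j] -= lr * dy[i] * h[j]
                b2[i] -= lr * dy[i]
            for j in range(hidden):
                for k in range(n):
                    w1[j][k] -= lr * dh[j] * x[k]
                b1[j] -= lr * dh[j]

    return lambda x: forward(x)[1]


def reconstruction_error(model, x):
    """Squared reconstruction error; unusually large values suggest an imposter."""
    return sum((yi - xi) ** 2 for yi, xi in zip(model(x), x))
```

A decision threshold could then be set from the owner's own reconstruction errors (for instance, some multiple of their maximum); choosing that margin trades false rejections of the owner against missed imposters.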
Keystroke dynamics (the analysis of typing rhythms to discriminate among users) has been proposed for detecting impostors (i.e., both insiders and external attackers). Since many anomaly-detection algorithms have been proposed for this task, it is natural to ask which are the top performers (e.g., to identify promising research directions). Unfortunately, we cannot conduct a sound comparison of detectors using the results in the literature because evaluation conditions are inconsistent across studies. Our objective is to collect a keystroke-dynamics data set, to develop a repeatable evaluation procedure, and to measure the performance of a range of detectors so that the results can be compared soundly. We collected data from 51 subjects typing 400 passwords each, and we implemented and evaluated 14 detectors from the keystroke-dynamics and pattern-recognition literature. The three top-performing detectors achieve equal-error rates between 9.6% and 10.2%. The results, along with the shared data and evaluation methodology, constitute a benchmark for comparing detectors and measuring progress.
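The equal-error rate is the operating point at which the false-reject rate (genuine attempts flagged as anomalous) equals the false-accept rate (impostor attempts passed as genuine). The sketch below pairs a simple anomaly detector in the style of a scaled Manhattan distance (per-feature absolute deviation divided by the owner's mean absolute deviation) with a threshold sweep that estimates the EER. It is an illustrative assumption, not the benchmark's reference code; the function names are hypothetical.

```python
def train_scaled_manhattan(train):
    """Per-feature mean and mean absolute deviation of the owner's vectors."""
    n, dims = len(train), len(train[0])
    mean = [sum(v[i] for v in train) / n for i in range(dims)]
    # Guard against a zero deviation so the scaling never divides by zero.
    mad = [max(sum(abs(v[i] - mean[i]) for v in train) / n, 1e-9)
           for i in range(dims)]
    return mean, mad


def anomaly_score(x, model):
    """Scaled Manhattan distance from the owner's mean; higher = more anomalous."""
    mean, mad = model
    return sum(abs(xi - mi) / ai for xi, mi, ai in zip(x, mean, mad))


def equal_error_rate(genuine, impostor):
    """Sweep every observed score as a threshold (accept if score <= t) and
    return the average of FAR and FRR where the two rates are closest."""
    best = None
    for t in sorted(genuine + impostor):
        frr = sum(s > t for s in genuine) / len(genuine)
        far = sum(s <= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0)
    return best[1]
```

With finite score sets, FAR and FRR rarely cross exactly, so averaging the two rates at the closest threshold is a common approximation.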
This paper describes a novel technique to strengthen password authentication systems by incorporating multiple types of keystroke dynamics information under a fusion framework. We exploit four types of latency as keystroke features, and two methods to calculate the similarity score between two given latency vectors. A two-layer fusion approach is proposed to enhance the overall performance of the system, achieving an Equal Error Rate (EER) of about 1.401%. We also introduce two additional modules to increase the flexibility of the proposed system. These modules aim to accommodate exceptional cases, for instance, when a legitimate user is unable to provide his or her normal typing pattern due to reasons such as hand injury.
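The abstract does not define its four latency types. A common convention, assumed here, derives them from the press and release timestamps of each pair of consecutive keys: press-to-press (PP), press-to-release (PR), release-to-press (RP) and release-to-release (RR). The sketch below extracts these four vectors; the function name and input format are hypothetical.

```python
def latencies(keys):
    """keys: list of (press_ms, release_ms) tuples in typing order.
    Returns four latency vectors, one value per consecutive key pair
    (assumed convention: PP, PR, RP, RR between key i and key i+1)."""
    feats = {"PP": [], "PR": [], "RP": [], "RR": []}
    for (p1, r1), (p2, r2) in zip(keys, keys[1:]):
        feats["PP"].append(p2 - p1)  # press of key i to press of key i+1
        feats["PR"].append(r2 - p1)  # press of key i to release of key i+1
        feats["RP"].append(p2 - r1)  # release to next press; negative on key overlap
        feats["RR"].append(r2 - r1)  # release of key i to release of key i+1
    return feats
```

Each latency type yields its own vector, which is what allows a fusion scheme to compute separate similarity scores per type and combine them afterwards.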
In an effort to confront the challenges brought forward by the networking revolution of the past few years, we present improved techniques for authorized access to computer system resources and data. More than ever before, the Internet is changing computing as we know it. The possibilities of this global network seem limitless; unfortunately, with this global access come increased chances of malicious attack and intrusion. Alternatives to traditional access control measures are in high demand. In what follows we present one such alternative: computer access via keystroke dynamics. A database of 42 profiles was constructed from keystroke patterns gathered from various users performing structured and unstructured tasks. We study the performance of a system for recognizing these users, and present a toolkit for analyzing system performance under varying criteria.
Keywords: Biometrics, keystroke dynamics, pattern recognition, computer security.
1 Introduction
Today's society ...