G. Ayoade, F. Araujo, K. Al-Naami, A.M. Mustafa, Y. Gao, K.W. Hamlen, and L. Khan. Automating
Cyberdeception Evaluation with Deep Learning. In Proc. 53rd Hawaii Int. Conf. System Sciences (HICSS).
January 2020.
Automating Cyberdeception Evaluation with Deep Learning
Gbadebo Ayoade, Frederico Araujo, Khaled Al-Naami,
Ahmad M. Mustafa, Yang Gao, Kevin W. Hamlen, Latifur Khan
The University of Texas at Dallas
{gbadebo.ayoade, khaled.al-naami, ahmad.mustafa, yang.gao, hamlen, lkhan}@utdallas.edu
IBM Research
frederico.araujo@ibm.com
Abstract
A machine learning-based methodology is proposed
and implemented for conducting evaluations of cyberde-
ceptive defenses with minimal human involvement. This
avoids impediments associated with deceptive research on
humans, maximizing the efficacy of automated evaluation
before human subjects research must be undertaken.
Leveraging recent advances in deep learning, the ap-
proach synthesizes realistic, interactive, and adaptive
traffic for consumption by target web services. A case
study applies the approach to evaluate an intrusion detec-
tion system equipped with application-layer embedded
deceptive responses to attacks. Results demonstrate that
synthesizing adaptive web traffic laced with evasive attacks
powered by ensemble learning, online adaptive metric
learning, and novel class detection to simulate skillful
adversaries constitutes a challenging and aggressive test
of cyberdeceptive defenses.
1. Introduction
Cyberdeceptive defenses are increasingly vital for
protecting organizational and national critical infrastruc-
tures from asymmetric cyber threats. Market forecasts
predict an over $2 billion industry for cyberdeceptive products by 2022 [1], including major product releases by
Rapid7, TrapX, LogRhythm, Attivo, Illusive Networks,
Cymmetria, and many others in recent years [2].
These new defense layers are rising in importance
because they enhance conventional defenses by shifting
asymmetries that traditionally burden defenders back
on attackers. For example, while conventional defenses
invite adversaries to find just one critical vulnerability to
successfully penetrate the network, deceptive defenses
challenge adversaries to discern which vulnerabilities
among a sea of apparent vulnerabilities (many of them
traps) are real. As attacker-defender asymmetries increase
with the increasing complexity of networks and software,
deceptive strategies for leveling those asymmetries will
become increasingly essential for scalable defense.
Robust evaluation methodologies are a critical step in
the development of effective cyberdeceptions; however,
cyberdeception evaluation is frequently impeded by the
difficulty of conducting experiments with appropriate
human subjects. Capturing the diversity, ingenuity, and
resourcefulness of real APTs tends to require enormous
sample sizes of rare humans having exceptional skills
and expertise. Human deception research raises many
ethical dilemmas that can lead to long, difficult approval
processes [3]. Even when these obstacles are surmounted,
such studies are extremely difficult to replicate (and there-
fore to confirm), and results are often difficult to interpret
given the relatively unconstrained, variable environments
that are the contexts of real-world attacks.
Progress in cyberdeceptive defense hence demands ef-
ficient methods of conducting preliminary yet meaningful
evaluations without humans in the loop. Human subject
evaluation can then be reserved as a final, high-effort
validation of the most promising, mature solutions.
Toward this goal, this paper proposes and critiques a
machine learning-based approach for evaluating cyberde-
ceptive software defenses without human subjects. Al-
though it is extremely difficult to emulate human decision-
making automatically for synthesizing attacks, our ap-
proach capitalizes on the observation that in practice cyber
attackers rely heavily upon mechanized tools for offense.
For example, human bot masters rely primarily upon re-
ports delivered by automated bots to assess attack status
and reconnoiter targets, and they submit relatively simple
commands to the botnet to carry out complex killchains
that are largely mechanized as malicious software. In
such scenarios, deceiving the mechanized agents goes a
long way toward deceiving their human masters. Automat-
ing the machine-versus-machine part of the deception
evaluation is therefore both feasible and useful.
We therefore propose an evaluation methodology that
leverages machine learning to (1) generate realistic streams
of synthetic traffic comprised of benign interactions and attacks based on real threat data and vulnerabilities, and (2) automatically adapt the synthetic traffic in an effort to evade observed (possibly deceptive) responses to the attacks. The goal is to obtain the maximum evaluative power of adaptive deceptive defenses without explicit human adversarial engagement. (The implementation and datasets used in this paper are available at https://github.com/cyberdeception/deepdig.)

Figure 1: Deceptive IDS training overview

As a case study, we apply our technique to evaluate a network-level intrusion detection system (IDS) equipped with embedded honeypots at the application layer. Our contributions include:
• We present the design of a framework for replay and generation of web traffic that statistically mutates and injects scripted attacks into the output streams to more effectively train, test, and evaluate deceptive, concept-learning IDSes.

• We evaluate our approach on large-scale network and system events gathered via simulation over a test bed built atop production web software, including the Apache web server, OpenSSL, and PHP.

• We propose an adaptive deception detector that detects outliers in concept-evolving data streams, enabling evaluations to cope with adaptive defenses.
Section 2 first characterizes deceptive defenses that
are suitable evaluation subjects of our approach. Section 3
details our technical approach, followed by our evaluation
case study and findings in Section 4. Related work is
highlighted in Section 5 and Section 6 concludes.
2. Background
2.1. Deception-enhanced Intrusion Detection
Our evaluation approach targets IDS defenses en-
hanced with deceptive attack-responses (e.g., [4, 5, 6]).
Figure 1 depicts the general approach. Unlike conventional
intrusion detection, deception-enhanced IDSes incremen-
tally build a model of legitimate and malicious behavior
based on audit streams and attack traces collected from
successful deceptions. The deceptions leverage user inter-
actions at the network, endpoint, or application layers to
solicit extra communication with adversaries and waste
their resources, misdirect them, or gather intelligence.
This augments the classifier with security-relevant feature
extraction capabilities not available to typical network
intrusion detectors.
For example, honey-patches [4, 7, 8] introduce appli-
cation layer deceptions by selectively replacing software
security patches with decoy vulnerabilities. Attempted
exploits transparently redirect the attacker’s session to a
decoy environment where the exploit is allowed to succeed.
This allows the system to observe subsequent phases of
the attacker’s killchain without risk to genuine assets.
2.2. Challenges in IDS Evaluation
One of the major challenges for evaluation of deceptive
IDSes is the general inadequacy of static attack datasets,
which cannot react to deceptive interactions. Testing de-
ceptive defenses with these datasets renders the deceptions
useless, missing their value against reactive threats.
To mitigate this problem, a method of dynamic attack
synthesis is required. A suitable solution must learn a
model of how adversarial agents are likely to react based
on their reactions to similar feedback during real-world
interactions mined from real attack data. The accuracy of
such predictions depends upon the complexity of deceptive
responses and the decision logic of the adversaries. For
example, when defensive responses are binary (viz. accept
or reject) or a finite list of error messages, accurate predic-
tion is more feasible than when the output space is large
(e.g., arbitrary textual messages). Likewise, automated
agents tend to have high predictability (e.g., learnable by
emulating their software logic on desired inputs), whereas
human agents are far more difficult to predict.
3. Technical Approach
We aim to quantitatively assess the resiliency of adap-
tive, deceptive, concept-learning defenses for web services
against adaptive adversaries. Our approach therefore dif-
fers from works that measure only absolute IDS accuracy.
We first present our approach for generating web traf-
fic to replay normal and malicious user behavior, which
we harness to automatically generate training and test
datasets for attack classification (§3.1). We then discuss
the testing harness and analysis used to investigate the
effects of different attack classes and varying numbers of
attack instances on the predictive power and accuracy of
intrusion detection (§3.2).
3.1. Traffic Analysis
Our evaluation methodology seeks to create realistic,
end-to-end workloads and attack killchains to function-
ally test cyberdeceptive defenses embedded in commodity
server applications and process decoy telemetry for feature
extraction and IDS model evolution.

Figure 2: Overview of (a) automated workload generation for cyberdeception evaluation, and (b) deceptive IDS training and testing.

Figure 2a shows an
overview of our traffic generation framework. It streams
encrypted legitimate and malicious workloads onto end-
points enhanced with embedded deceptions, resulting
in labeled audit streams and attack traces (collected at
decoys) for training set generation.
Workload Generation. Rather than evaluating deception-enhanced IDSes with existing, publicly available intrusion datasets (which are inadequate for the reasons outlined in §2.2), our evaluation interleaves attack and nor-
mal traffic following prior work on defense-in-depth [8,9],
and injects benign payloads as data into attack packets
to mimic evasive attack behavior. The generated traffic
contains attack payloads against realistic exploits (e.g.,
weaponizing recent CVEs for reconnaissance and initial
infection), and our framework automatically extracts la-
beled features from the monitoring network and system
traces to (re-)train the classifiers.
Legitimate workload. The framework uses both real
user sessions and automated simulation of various user
actions to compose legitimate traffic. Real interactions
comprise web traffic that is monitored and recorded as
audit pcap data in the targeted operational environment
(e.g., regular users in a local area network). The recorded
sessions are replayed by our framework and streamed as
normal workload onto endpoints embedding deceptions.
These regular data streams are enriched with simulated interactions, created by using Selenium [10] to automate complex user actions on typical web applications (e.g., clicking buttons, filling out forms, navigating a web page).
feeds from online data sources, such as the BBC text
corpus [11], online text generators [12] for personally
identifiable information (e.g., usernames, passwords), and
product names to populate web forms. To ensure diversity,
we statistically sample the data sources to obtain user
input values and dynamically generate web content. For
example, blog title and body are statistically sampled from
the BBC text corpus, while product names are picked from
the product names data source.
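To illustrate how such sampling drives the Selenium automation, the following minimal sketch posts a blog entry populated from a sampled corpus document. It is a simplified illustration: the endpoint URL, element IDs, corpus paths, and the helper names (`sample_blog_entry`, `post_blog_entry`) are hypothetical placeholders, not our exact implementation.

```python
import random
from selenium import webdriver
from selenium.webdriver.common.by import By

def sample_blog_entry(corpus):
    """Statistically sample one document; title on the first line, body after."""
    doc = random.choice(corpus)
    title, _, body = doc.partition("\n")
    return title.strip(), body.strip()

def post_blog_entry(driver, base_url, corpus):
    """Automate a 'blog posting' user action with sampled content."""
    title, body = sample_blog_entry(corpus)
    driver.get(base_url + "/wp-admin/post-new.php")
    driver.find_element(By.ID, "title").send_keys(title)
    driver.find_element(By.ID, "content").send_keys(body[:500])
    driver.find_element(By.ID, "publish").click()

if __name__ == "__main__":
    # Placeholder paths standing in for the BBC text corpus [11].
    corpus = [open(p).read() for p in ["bbc/001.txt", "bbc/002.txt"]]
    driver = webdriver.Firefox()
    try:
        post_blog_entry(driver, "http://testbed.example", corpus)
    finally:
        driver.quit()
```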
Our implementation defines different customizable
user activities that can be repeated with varying data feeds
and scheduled to simulate different workload profiles
and temporal patterns. These include web page brows-
ing, e-commerce website navigation, blog posting, and
interacting with a social media web application. The
setup includes common web software stacks, such as CGI
web applications and PHP-based Wordpress applications
hosted on a monitored Apache web server.
Attack workload. Attack traffic is generated based on
real vulnerabilities. The procedure harnesses a collection
of scripted attacks (crafted using Bash, Python, Perl, or
Metasploit scripts) to inject malicious client traffic against
endpoints in the tested environment. Attacks can be easily
extended and tailored to specific test scenarios during
evaluation design, without modifications to the frame-
work, which automates and schedules attacks according
to parametric statistical models defined by the targeted
evaluation (e.g., prior probability of an attack, attack rates,
reconnaissance pattern).
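As a simplified illustration of such parametric scheduling (the parameter values, script names, and target URL below are hypothetical, not those used in our experiments), attack sessions can be interleaved with benign ones as a Poisson arrival process governed by a configurable attack prior:

```python
import random
import subprocess
import time

# Illustrative parameters: prior probability that a session is an attack,
# and the mean session arrival rate of the Poisson process.
ATTACK_PRIOR = 0.01        # P(A): fraction of sessions that are attacks
SESSIONS_PER_SEC = 0.5     # mean arrival rate

# Scripted exploits (placeholder paths); each script attacks the test bed.
EXPLOITS = ["attacks/cve-2014-0160.sh", "attacks/cve-2012-1823.sh"]

def launch_session(target):
    if random.random() < ATTACK_PRIOR:
        subprocess.Popen(["bash", random.choice(EXPLOITS), target])
    else:
        subprocess.Popen(["python", "normal_workload.py", target])  # Selenium actions

while True:
    launch_session("http://testbed.example")
    # Exponential inter-arrival times realize the Poisson arrival process.
    time.sleep(random.expovariate(SESSIONS_PER_SEC))
```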
In the case study reported in
§
4, multiple exploits for
recent CVEs were scripted to carry out different mali-
cious activities (i.e., different attack payloads), such as
leaking password files and invoking shells on the remote
web server. These vulnerabilities are important as attack
vectors because they range from sensitive data exfiltration
to complete control and remote code execution. The post-
infection payloads execute tasks such as tool acquisition,
basic environment reconnaissance (e.g., active scanning
with Nmap, passive inspection of system logs), password
file access, root certificate exfiltration, and attempts at
gaining access to other machines in the network.
Monitoring & Threat Data Collection.
Our frame-
work tracks two lifecycle events associated with moni-
tored decoys: upon a decoy hit, the framework records
the timestamp that denotes the beginning of an attack
session (i.e., when a security condition is met). After the
corresponding abort event arrives (i.e., session disconnec-
tion), the monitoring component extracts the session trace
(delimited by the two events), labels it, and stores the trace
outside the decoy for subsequent feature extraction. Since
the embedded deceptions should only host attack sessions,
precisely collecting and labeling their traces (at both the
network and OS level) is effortless using this strategy.
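The trace extraction step can be sketched as follows. This is a minimal illustration assuming a generic timestamped event record; the actual framework operates on pcap/scap streams delimited by the same two lifecycle events.

```python
from dataclasses import dataclass

@dataclass
class Event:
    ts: float        # epoch timestamp of the packet or syscall record
    data: bytes      # raw monitored record

def extract_attack_trace(events, hit_ts, abort_ts):
    """Label the slice of the decoy's event stream between the decoy-hit
    and session-abort lifecycle events as an attack trace."""
    trace = [e for e in events if hit_ts <= e.ts <= abort_ts]
    return {"label": "attack", "events": trace}
```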
Our approach distinguishes between three separate
input data streams: (1) the audit stream, collected at the
target honey-patched server, (2) attack traces, collected
at decoys, and (3) the monitoring stream, the actual test
stream collected from regular servers. Each of these
streams contains network packets and OS events captured
at each server environment. To minimize performance
impact, we use two powerful and efficient software moni-
tors: sysdig (to track system calls and modifications made
to the file system), and tcpdump (to monitor ingress and
egress of network packets). Specifically, monitored data
is stored outside decoy environments to avoid possible
tampering with collected data.
Our monitoring and data collection solution is designed
to scale for large, distributed on-premise and cloud deploy-
ments. The host-level telemetry leverages a mainstream
kernel module [13] that implements non-blocking event
collection and memory-mapped event buffer handling
for minimal computational overhead. This architecture
allows system events to be safely collected (without sys-
tem call interposition) and compressed by a containerized
user space agent that is oblivious to other objects and
resources in the host environment. The event data streams
originated from the monitored hosts are exported to a high-
performance, distributed S3-compatible object storage
server [14], designed for large-scale data infrastructures.
3.2. Data Analysis
Using the continuous audit stream and incoming at-
tack traces as labeled input data, our approach enables
concept-learning IDSes to incrementally build supervised
models that are able to capture legitimate and malicious
behavior. As illustrated in Figure 2b, the raw training set
(composed of both audit stream and attack traces) is piped
into a feature extraction component that selects relevant,
non-redundant features and outputs feature vectors—audit
data and attack data—that are grouped and queued for
subsequent model update. Since the initial data streams
are labeled and have been preprocessed, feature extraction
becomes very efficient and can be performed automati-
cally. This process repeats periodically according to an
administrator-specified policy.
Network Packet Analysis.
Each packet transmitted and
received forms the basic unit of information flow for our
packet-level analysis. Bidirectional (Bi-Di) features are
extracted from the patterns observed on this network data.
Because encrypted network traffic is opaque, features are extracted only from TCP packet headers. Packet data length
and transmission time are extracted from network sessions.
We extract histograms of packet lengths, time intervals,
and directions. To reduce the dimension of the generated
features, we apply bucketization to group TCP packets
into correlation sets based on frequency of occurrence.
Uni-burst features include burst size, time, and count
of groups of packets transmitted consecutively in one
TCP window. Bi-burst features include time and size
attributes of consecutive groups of packets transmitted in
two consecutive TCP windows.
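The following sketch (simplified; the `(timestamp, length, direction)` tuple format and the bucket size are illustrative assumptions, and packets are assumed time-ordered) shows how such header-only features can be computed from a session's packet sequence:

```python
from collections import Counter
from itertools import groupby

def bidi_features(packets, bucket=50):
    """packets: time-ordered (timestamp, tcp_length, direction) tuples,
    direction = +1 (client->server) or -1 (server->client)."""
    # Histogram of bucketized packet lengths, split by direction.
    lens = Counter((d, (l // bucket) * bucket) for _, l, d in packets)
    # Uni-bursts: maximal runs of consecutive packets in the same direction.
    bursts = []
    for d, grp in groupby(packets, key=lambda p: p[2]):
        grp = list(grp)
        bursts.append((d,
                       sum(l for _, l, _ in grp),   # burst size
                       grp[-1][0] - grp[0][0],      # burst time
                       len(grp)))                   # burst count
    # Bi-bursts: attributes of pairs of consecutive bursts.
    bi_bursts = list(zip(bursts, bursts[1:]))
    return lens, bursts, bi_bursts
```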
System Call Analysis.
In order to capture events from
within the host, we extract features from system-level OS
events. Event types include open, read, select, etc., with the corresponding process name. Leveraging N-gram feature extraction, we build a histogram of N-gram occurrences, where an N-gram is a contiguous sequence of system call events. We consider four types of N-grams: uni-events, bi-events, tri-events, and quad-events are sequences of 1–4 consecutive system call events (respectively).
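A minimal sketch of this histogram construction, assuming sysdig events reduced to (process, syscall) pairs:

```python
from collections import Counter

def ngram_histogram(events, n_values=(1, 2, 3, 4)):
    """Histogram of uni-/bi-/tri-/quad-event N-grams over a syscall trace.
    `events` is a sequence of (process_name, syscall) pairs."""
    hist = Counter()
    for n in n_values:
        for i in range(len(events) - n + 1):
            hist[tuple(events[i:i + n])] += 1
    return hist

# Example:
# ngram_histogram([("apache2", "open"), ("apache2", "read"), ("apache2", "select")])
```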
3.3. Classification
Ensemble SVM. After feature extraction, we leverage SVM to classify both Bi-Di and N-Gram features. SVM uses a convex optimization approach that maps data that is not linearly separable into a higher-dimensional space where it becomes linearly separable. In that space, SVM separates positive (attack) and negative (benign) training instances by a hyperplane with the maximum possible margin. Prediction is assigned based on which side of the hyperplane an instance resides.
The models built from Bi-Di and N-Gram are com-
bined into an ensemble to obtain a better predictive model.
Rather than concatenating the features from both Bi-Di
and N-Gram, which has the drawback of introducing
normalization issues, the ensemble combines multiple
classifiers to obtain a better outcome by majority voting.
In our case, for each classification output by the classifier
models, we obtain the predicted label and the confidence
probability of each of the individual classifiers. The out-
come of the classifier with the maximum confidence is
picked for the predicted instance.
Confidence is rated using Platt scaling [15], which uses the following sigmoid function to compute the classification confidence:

$$P(y = 1 \mid x) = \frac{1}{1 + \exp(Af(x) + B)} \tag{1}$$

where $y$ is the label, $x$ is the testing vector, $f(x)$ is the SVM output, and $A$ and $B$ are scalar parameters learned using maximum likelihood estimation (MLE). This yields a probability measure of how confident a classifier is about assigning a label to a testing point.
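The max-confidence ensemble can be sketched with scikit-learn as follows. This is a simplified stand-in: SVC's `probability=True` option applies Platt scaling internally, and the random matrices below merely substitute for our extracted Bi-Di and N-Gram feature vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-ins for extracted Bi-Di and N-Gram feature matrices (hypothetical shapes).
X_bidi, X_ngram = rng.normal(size=(200, 40)), rng.normal(size=(200, 60))
y = rng.integers(0, 2, size=200)   # 1 = attack, 0 = benign

# One SVM per feature view; probability=True enables Platt-scaled confidences.
clf_bidi = SVC(kernel="rbf", probability=True).fit(X_bidi, y)
clf_ngram = SVC(kernel="rbf", probability=True).fit(X_ngram, y)

def ensemble_predict(x_bidi, x_ngram):
    """Return the label from whichever view's classifier is most confident."""
    best_conf, best_label = -1.0, None
    for clf, x in ((clf_bidi, x_bidi), (clf_ngram, x_ngram)):
        probs = clf.predict_proba(x.reshape(1, -1))[0]
        if probs.max() > best_conf:
            best_conf, best_label = probs.max(), clf.classes_[probs.argmax()]
    return best_label

print(ensemble_predict(X_bidi[0], X_ngram[0]))
```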
Online Adaptive Metric Learning. OAML [16] is a recently developed deep learning approach that improves instance separation by transforming input features to a new latent space, in which similar instances are drawn closer together and dissimilar instances are pushed farther apart. It extends online similarity metric learning (OML) [17, 18, 19, 20, 21], which employs pairwise and triplet constraints: a pairwise constraint relates two similar (or dissimilar) instances, while a triplet constraint $(A, B, C)$ combines similar instances $A$ and $B$ with a dissimilar instance $C$.
We choose OAML since non-adaptive OML usually
learns a pre-selected linear metric (e.g., Mahalanobis
distance [22]) that lacks the complexity to learn non-linear
semantic similarities among class instances, which are
prevalent in intrusion detection scenarios. In addition,
using a non-adaptive method results in a fixed metric
which suffers from bias to a specific dataset. OAML
overcomes these disadvantages by adapting its metric
learning model to accommodate more constraints in the
observed data. Its metric function learns a dynamic latent
space from the Bi-Di and N-Gram feature spaces, which
can include both linear and highly non-linear functions.
OAML leverages artificial neural networks (ANNs) consisting of a set of hidden layers whose outputs are fed as input to independent metric-embedding layers (MELs). Each MEL outputs an $n$-dimensional vector in an embedded space that clusters similar instances. The importance of the model generated by each MEL is determined by a metric weight assigned to it. The output of this embedding is used as input to a $k$-NN classifier, as detailed below.
Problem Setting. Let $S = \{(x_t, x_t^+, x_t^-)\}_{t=1}^{T}$ be a sequence of triplet constraints sampled from the data, where $x_t, x_t^+, x_t^- \in \mathbb{R}^d$, and $x_t$ (anchor) is similar to $x_t^+$ (positive) but dissimilar to $x_t^-$ (negative). The goal of OAML is to learn a model $F : \mathbb{R}^d \mapsto \mathbb{R}^{d'}$ such that $\|F(x_t) - F(x_t^+)\|_2 \ll \|F(x_t) - F(x_t^-)\|_2$. Given these parameters, the objective is to learn a metric model with adaptive complexity while satisfying the constraints. The complexity of $F$ must be adaptive so that its hypothesis space is automatically modified.
Overview. Consider a neural network with $L$ hidden layers, where the input layer and each hidden layer are connected to an independent MEL. Each embedding layer learns a latent space where similar instances are clustered and dissimilar instances are separated.

Figure 3: OAML network structure. Each layer $L_i$ is a linear transformation whose output feeds a rectified linear unit (ReLU) activation. Embedding layers $E_i$ connect to the input or hidden layers. Linear model $E_0$ maps the input feature space to the embedding space.

Figure 3 illustrates our ANN. Let $E_\ell \in \{E_0, \ldots, E_L\}$ denote the $\ell$th metric model in OAML (i.e., the network branch from the input layer to the $\ell$th MEL). The simplest OAML model $E_0$ represents a linear transformation from the input feature space to the metric embedding space. A weight $\alpha^{(\ell)} \in [0,1]$ is assigned to $E_\ell$, measuring its importance in OAML.
For a triplet constraint $(x_t, x_t^+, x_t^-)$ that arrives at time $t$, its metric embedding $f^{(\ell)}(x_t^*)$ generated by $E_\ell$ is

$$f^{(\ell)}(x_t^*) = h^{(\ell)} \Theta^{(\ell)} \tag{2}$$

where $h^{(\ell)} = \sigma(W^{(\ell)} h^{(\ell-1)})$, with $\ell \geq 1$, $\ell \in \mathbb{N}$, and $h^{(0)} = x_t^*$. Here $x_t^*$ denotes any anchor $x_t$ (positive $x_t^+$ or negative $x_t^-$ instance), and $h^{(\ell)}$ is the activation of the $\ell$th hidden layer. The learned metric embedding $f^{(\ell)}(x_t^*)$ is limited to a unit sphere (i.e., $\|f^{(\ell)}(x_t^*)\|_2 = 1$) to reduce the search space and accelerate training.
During the training phase, for every arriving triplet $(x_t, x_t^+, x_t^-)$, we first retrieve the metric embedding $f^{(\ell)}(x_t^*)$ from the $\ell$th metric model using Eq. 2. A local loss $\mathcal{L}^{(\ell)}$ for $E_\ell$ is evaluated by calculating the similarity and dissimilarity errors based on $f^{(\ell)}(x_t^*)$. Thus, the overall loss introduced by this triplet is given by

$$\mathcal{L}_{\mathrm{overall}}(x_t, x_t^+, x_t^-) = \sum_{\ell=0}^{L} \alpha^{(\ell)} \mathcal{L}^{(\ell)}(x_t, x_t^+, x_t^-) \tag{3}$$

Parameters $\Theta^{(\ell)}$, $\alpha^{(\ell)}$, and $W^{(\ell)}$ are learned during the online learning phase. The final optimization problem to solve in OAML at time $t$ is therefore:

$$\begin{aligned} \underset{\Theta^{(\ell)},\, W^{(\ell)},\, \alpha^{(\ell)}}{\text{minimize}} \quad & \mathcal{L}_{\mathrm{overall}} \\ \text{subject to} \quad & \|f^{(\ell)}(x_t^*)\|_2 = 1, \quad \ell = 0, \ldots, L. \end{aligned} \tag{4}$$

We evaluate the similarity and dissimilarity errors using an adaptive-bound triplet loss (ABTL) constraint [16] to estimate $\mathcal{L}^{(\ell)}$ and update $\Theta^{(\ell)}$, $W^{(\ell)}$, and $\alpha^{(\ell)}$.
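The network structure and weighted per-MEL loss can be sketched in PyTorch as follows. This is a simplified illustration: it substitutes the standard triplet margin loss for ABTL [16] and omits the Hedge-based online update of the weights $\alpha^{(\ell)}$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OAMLNet(nn.Module):
    """Sketch of the adaptive metric network: a linear MEL E0 over the raw
    input plus one MEL per hidden layer, each projected onto the unit sphere."""
    def __init__(self, d_in, d_hid, d_emb, n_hidden=1):
        super().__init__()
        self.hidden = nn.ModuleList()
        self.mels = nn.ModuleList([nn.Linear(d_in, d_emb)])  # E0: linear model
        prev = d_in
        for _ in range(n_hidden):
            self.hidden.append(nn.Linear(prev, d_hid))
            self.mels.append(nn.Linear(d_hid, d_emb))
            prev = d_hid
        # Hedge weights alpha^(l); their multiplicative online update is omitted.
        self.alpha = torch.full((n_hidden + 1,), 1.0 / (n_hidden + 1))

    def embed(self, x):
        """Return the list of per-MEL embeddings f^(l)(x), each unit-normalized."""
        outs = [F.normalize(self.mels[0](x), dim=1)]
        h = x
        for layer, mel in zip(self.hidden, self.mels[1:]):
            h = F.relu(layer(h))                      # h^(l) = ReLU(W^(l) h^(l-1))
            outs.append(F.normalize(mel(h), dim=1))   # ||f^(l)(x)||_2 = 1
        return outs

def overall_loss(net, anchor, pos, neg, margin=0.5):
    """Weighted sum of per-MEL triplet losses (Eq. 3); a standard triplet
    margin loss stands in for the adaptive-bound triplet loss (ABTL)."""
    per_mel = [F.triplet_margin_loss(fa, fp, fn, margin=margin)
               for fa, fp, fn in zip(net.embed(anchor), net.embed(pos), net.embed(neg))]
    return sum(a * l for a, l in zip(net.alpha, per_mel))

# Example online step on one triplet (random stand-in data):
net = OAMLNet(d_in=6000, d_hid=200, d_emb=64, n_hidden=1)
opt = torch.optim.Adam(net.parameters(), lr=0.3)
xa, xp, xn = (torch.randn(1, 6000) for _ in range(3))
opt.zero_grad(); overall_loss(net, xa, xp, xn).backward(); opt.step()
```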
Novel Class Detection.
Novel classes may appear at any
time in real-world monitoring streams (e.g., new attacks
and new deceptions). To cope with such concept-evolving
data streams, we include a deception-enhanced novel class
detector that extends traditional classifiers with automatic
detection of novel classes before the true labels of the
novel class instances arrive.
Data stream classification. Novel class detection observes
that data points belonging to a common class are closer to
each other (cohesion), yet far from data points belonging to
other classes (separation). Building upon ECSMiner [23,
24], our approach segments data streams into equal, fixed-
sized chunks, each containing a set of monitoring traces,
efficiently buffering chunks for online processing. When
a buffer is examined for novel classes, the classification
algorithm looks for strong cohesion among outliers in the
buffer and large separation between outliers and training
data. When strong cohesion and separation are found, the
classifier declares a novel class.
Training & model update. A new classifier is trained on each chunk and added to a fixed-size ensemble of $M$ classifiers, leveraging audit and attack instances (traces). After each iteration, the set of $M+1$ classifiers is ranked based on prediction accuracy on the latest data chunk, and only the top $M$ classifiers remain in the ensemble. The ensemble is continuously updated following this strategy and thus models the most recent concept in the incoming data stream, alleviating adaptability issues associated with concept drift [23]. Unlabeled instances are classified by majority vote of the ensemble's classifiers.
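A sketch of this update and voting logic, assuming scikit-learn-style classifiers with `fit`/`score`/`predict` methods (an illustration, not our exact implementation):

```python
from collections import Counter

def update_ensemble(ensemble, new_clf, chunk_X, chunk_y, M):
    """Keep only the M classifiers scoring best on the newest labeled chunk,
    so the ensemble tracks the most recent concept in the stream."""
    candidates = ensemble + [new_clf]
    ranked = sorted(candidates, key=lambda c: c.score(chunk_X, chunk_y),
                    reverse=True)
    return ranked[:M]

def ensemble_predict(ensemble, x):
    """Classify an unlabeled instance by majority vote of the ensemble."""
    votes = Counter(clf.predict([x])[0] for clf in ensemble)
    return votes.most_common(1)[0][0]
```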
Classification model. Each classifier in the ensemble uses $k$-NN classification, deriving its input features from the Bi-Di and N-Gram feature set models. Rather than storing all data points of the training chunk in memory, which is prohibitively inefficient, we optimize space utilization and time performance by using a semi-supervised clustering technique based on Expectation Maximization (E-M) [25]. This minimizes both intra-cluster dispersion and cluster impurity, and caches a summary of each cluster (centroid and frequencies of data points belonging to each class), discarding the raw data points.
Feature transformation. To make the learned representations robust to partial corruption of the input patterns and improve classification accuracy, abstract features are generated from the original feature space during training via a stacked denoising autoencoder (DAE) [26, 27], using the instances of the first few chunks in the data stream. The stacked DAE builds a deep neural network that captures the statistical dependencies between the inputs by reconstructing a clean input from a corrupted version of it, thus forcing the hidden layers to discover more robust features (yielding better generalization) and preventing the classifier from learning the identity function (while preserving the information about the input).
Figure 4: Overview of feature transformation

Figure 4 illustrates our approach. The first step creates a corrupted version $\tilde{x}$ of input $x \in \mathbb{R}^d$ using additive Gaussian noise [28]. In other words, a random value $v_k$ is added to each feature in $x$: $\tilde{x}_k = x_k + v_k$, where $k \in \{1, \ldots, d\}$ and $v_k \sim \mathcal{N}(0, \sigma^2)$ (cf. [29]). The output of the training phase is a set of weights $W$ and bias vectors $b$. We keep the learned weights and biases to transform the feature values of subsequent instances of the stream. After transforming the features of stream instances, these are fed back into our novel class detector for training.
One-class SVM Ensemble. Our approach builds an ensemble of one-class SVM classifiers. One-class SVM is an unsupervised learning method that learns the decision boundary of the training instances and predicts whether a new instance falls inside it. We train one classifier per class: if the training data consists of instances of $k$ classes, the ensemble contains $k$ one-class SVM classifiers, each trained on one class's instances. During classification, once a new unlabeled instance $x$ emerges, we classify it using all the one-class SVM classifiers in the ensemble.

We build our ensemble using the first few chunks of instances. During classification of the stream, once instances of a novel class emerge, we train a new one-class SVM classifier with the novel class instances and add the new classifier to the ensemble.
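This per-class ensemble can be sketched with scikit-learn's OneClassSVM as follows; the max-score decision rule is one plausible way to combine the per-class boundary scores, shown here for illustration rather than as our exact rule:

```python
from sklearn.svm import OneClassSVM

class OneClassEnsemble:
    """One OneClassSVM per observed class; new classifiers are added
    as novel classes emerge in the stream."""
    def __init__(self, nu=0.5):
        self.nu = nu
        self.models = {}   # class label -> trained one-class SVM

    def add_class(self, label, X_class):
        self.models[label] = OneClassSVM(kernel="rbf", nu=self.nu).fit(X_class)

    def classify(self, x):
        # decision_function gives the signed distance to each class boundary;
        # pick the class whose boundary the instance fits best.
        scores = {lbl: m.decision_function([x])[0]
                  for lbl, m in self.models.items()}
        return max(scores, key=scores.get)
```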
Attacker Evasion. To properly challenge deceptive defenses, it is essential to simulate adversaries who adapt and obfuscate their behaviors based on the responses they observe to their attacks. Attackers employ various evasion techniques to bypass protections, including packet size padding, packet timing sequence morphing, and modifying data distributions to resemble legitimate traffic.
In our study, we considered three encrypted traffic
evasion techniques published in the literature: Pad-to-
MTU [30], Direct Target Sampling [31], and Traffic Mor-
phing [31]. Pad-to-MTU (pMTU) adds extra bytes to each
packet length until it reaches the Maximum Transmission
Unit (1500 bytes in the TCP protocol). Direct Target
Sampling (DTS) is a distribution-based technique that
uses statistical random sampling from benign traffic fol-
lowed by attack packet length padding. Traffic Morphing
(TM) is similar to DTS but it uses a convex optimization
methodology to minimize the overhead of padding. Each
of these is represented using the traffic modeling approach detailed in §3.1 and analyzed using the machine learning approaches detailed above.
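The padding-based techniques can be sketched as follows. This is a simplification: only packet lengths are morphed here, while timing morphing and TM's convex optimization step are omitted.

```python
import random

MTU = 1500  # maximum transmission unit

def pad_to_mtu(attack_lengths):
    """Pad-to-MTU (pMTU): pad every attack packet length to the MTU."""
    return [MTU for _ in attack_lengths]

def direct_target_sampling(attack_lengths, benign_lengths):
    """DTS: statistically sample a target length from benign traffic and
    pad each attack packet up to it (padding can only grow a packet)."""
    return [max(l, random.choice(benign_lengths)) for l in attack_lengths]
```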
4. Case Study
As a case study of our evaluation approach, we applied it to test DEEPDIG [8], an IDS platform protecting deceptively honey-patched [4] web servers. DEEPDIG is an anomaly-based IDS that improves its detection model over time by feeding attack traces that trigger honey-patch traps back into a classifier. This core feature makes it an advanced, intelligent defense that cannot be properly evaluated using static datasets.
4.1. Implementation
We implemented our evaluation framework atop 64-bit
Linux. The data generation component is implemented us-
ing Python and Selenium [10]. The monitoring controller
is 350 lines of node.js code, and leverages tcpdump [32],
editcap [33], and sysdig [13] for network and system
call tracing and preprocessing. The machine learning
modules are implemented in Python using 1200 lines of
scikit-learn [34] code for data preprocessing and feature
generation. The novel class detection component comprises about 250 lines of code that interface with the Theano deep learning library [35] and ECSMiner [23]. Finally, the OAML module was implemented in 500 lines of code atop the PyTorch [36] deep learning framework.
4.2. Experimental Setup
The traffic generator was deployed on a separate host to
avoid interference with the test bed server. To account for
operational and environmental differences, our framework
simulated different workload profiles (according to time
of day), against various target configurations (including
different background processes and server workloads),
and network settings, such as TCP congestion controls.
In total, we generated 42 GB of (uncompressed) network
packets and system events over a period of three weeks.
After feature extraction, the training data comprised 1800
normal instances and 1600 attack instances. Monitoring or
testing data consisted of 3400 normal and attack instances
gathered at unpatched web servers, where the distribution
of normal and attack instances varies per experiment.
In the experiments, we measured the true positive rate
(tpr), where true positive represents the number of actual
attack instances that are classified as attacks; false positive
rate (fpr), where false positive represents the number of
actual benign instances classified as attacks; accuracy
Table 1: Base detection rate percentages for an approximate targeted attack scenario ($P_A = 1\%$) [37]

Classifier      tpr     fpr     acc     F2      bdr
1SVM Bi-Di 77.78 41.23 68.96 59.69 1.87
1SVM N-Gram 84.88 5.11 88.57 88.38 14.47
VNG++ 46.81 0.83 69.25 52.31 36.29
Panchenko 47.69 0.17 70.04 53.24 73.92
Bi-Di OML 91.00 0.01 91.14 90.00 98.92
N-Gram OML 65.00 0.01 88.58 80.00 98.50
Bi-Di SVM 79.00 0.78 89.88 78.69 50.57
N-Gram SVM 92.42 0.01 96.89 93.84 99.05
Ens-SVM 93.63 0.01 97.00 94.89 99.06
(acc); and the $F_2$ score of the classifier, where the $F_2$ score is interpreted as a weighted average of precision and recall, reaching its best value at 1 and worst at 0. We also calculated a base detection rate (bdr) to estimate the success of intrusion detection (§4.3).
Model Parameters. In our experiments, SVM uses an RBF kernel with cost $1.3 \times 10^5$ and gamma $1.9 \times 10^{-6}$. OAML employs a ReLU network with $n = 200$, $L = 1$, $k = 5$, a learning rate of $0.3$, learning-rate decay of $1 \times 10^{-4}$, and the ADAM optimizer. One-class SVM uses an RBF kernel and $\nu = 0.5$. Novel class detection uses the DAE denoising autoencoder with $L = 2$, input feature size $6000$, first layer $= 2/3$ of input size, second layer $= 1/3$ of input size, and additive Gaussian noise with $\sigma = 1.1$.
4.3. IDS Evaluation
Using the dataset shown in Table 2, we trained and assessed the individual performances of the classifiers presented in §3.3 and two other state-of-the-art supervised approaches, VNG++ [30] and Panchenko (P) [38], which are widely used in the literature on encrypted traffic analysis [39]. To obtain different baselines, 1SVM, VNG++, and Panchenko were trained non-deceptively (i.e., trained exclusively on normal data, as outlier detectors), while the OML and SVM classifiers were trained atop DEEPDIG.

Table 1 summarizes our results, which confirm our intuition that deceptively-trained IDSes are able to curtail false positives and achieve better detection rates than non-deceptive outlier detectors by 25.1–97.2%. Figure 5 also illustrates the performances of the different IDS approaches when trained incrementally with the first 1–16 attack classes. Specifically, the results shown in Fig. 5(a)–(d) underscore perennial challenges encountered in conventional anomaly-based intrusion detection: reduced detection accuracy and high incidence of false alarms. Conversely, Ens-SVM is able to achieve high accuracy after being trained with just a few attack classes (Fig. 5(e)–(f)).
Base Detection Analysis. We measure the success of detecting intrusions assuming a realistic scenario in which attacks are only a small fraction of the interactions.
Table 2: Summary of attack workload
# Attack Type Description Software
1 CVE-2014-0160 Information leak OpenSSL
2 CVE-2012-1823 System remote hijack PHP
3 CVE-2011-3368 Port scanning Apache
4–10 CVE-2014-6271 System hijack (7 variants) Bash
11 CVE-2014-6271 Remote password file read Bash
12 CVE-2014-6271 Remote root directory read Bash
13 CVE-2014-0224 Session hijack and info leak OpenSSL
14 CVE-2010-0740 DoS via NULL pointer deref OpenSSL
15 CVE-2010-1452 DoS via request lacking path Apache
16 CVE-2016-7054 DoS via heap buffer overflow OpenSSL
17–22 CVE-2017-5941 System hijack (6 variants) Node.js
Although risk-level attribution for cyber attacks is difficult to quantify in general, we use the results of a recent study [37] to approximate the probability of attack occurrence for targeted attacks against business and commercial organizations. The study's model assumes a determined attacker leveraging one or more exploits of known vulnerabilities to penetrate a typical organization's internal network, and approximates the prior of a directed attack as $P_A = 1\%$ based on real-world threat statistics.
To estimate the success of the IDS, we use the base detection rate (bdr) [40], expressed using Bayes' theorem:

$$P(A \mid D) = \frac{P(A)\,P(D \mid A)}{P(A)\,P(D \mid A) + P(\neg A)\,P(D \mid \neg A)} \tag{5}$$

where $A$ and $D$ are random variables denoting targeted attacks and their detection by the classifier, respectively. Table 1 presents the accuracy values and bdr for each classifier, assuming $P(A) = P_A$. The numbers expose
a practical problem with the defense that is typical in
intrusion detection research: Despite having high accu-
racy values, the IDS is ineffective when confronted with
extremely low base detection rates. This is in part due
to its inability to eliminate false positives in operational
contexts where the attacks are such a tiny fraction of the
total traffic available for learning.
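Equation 5 reduces to a one-line computation when tpr and fpr stand in for $P(D \mid A)$ and $P(D \mid \neg A)$:

```python
def base_detection_rate(tpr, fpr, p_attack=0.01):
    """Bayes base detection rate P(A|D) from Eq. 5, with tpr = P(D|A)
    and fpr = P(D|~A), under attack prior P(A) = p_attack."""
    return (p_attack * tpr) / (p_attack * tpr + (1 - p_attack) * fpr)

# E.g., Ens-SVM in Table 1: base_detection_rate(0.9363, 0.0001) ~= 0.99,
# consistent (up to rounding) with the reported bdr of 99.06%.
```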
4.4. Resistance to Attack Evasion Techniques
Table 3 shows the results of the deceptive defense
against our evasive attack techniques compared with re-
sults when no evasion is attempted. In each experiment,
the classifier is trained and tested with 1800 normal in-
stances and 1600 morphed attack instances.
Our evaluation shows that the tpr drops slightly and the fpr increases with the introduction of attacker evasion techniques, indicating that the system resists some of the evasions but not all. We conclude that more frequent classifier retraining may be needed to accommodate the drop in performance. This may be challenging, since shorter retraining intervals yield fewer data points with which to retrain the classifiers and maintain their detection performance.
Table 3: Detection performance in adversarial settings
Evasion technique tpr fpr acc F2
No evasion 93.63 0.01 97.00 99.06
pMTU 75.84 0.96 85.78 79.57
DTS 82.78 6.02 87.58 84.91
TM 79.29 6.17 85.52 81.91
Table 4: Novel attack class detection performance
Features Classifier tpr fpr
Bi-Di OneSVM 44.06 31.88
DAE & OneSVM 76.54 85.61
ECSMiner 74.91 26.66
DAE & ECSMiner 84.73 0.01
N-Gram OneSVM 54.25 45.13
DAE & OneSVM 80.09 71.49
ECSMiner 76.36 34.89
DAE & ECSMiner 89.67 2.95
4.5. Novel Class Detection Accuracy
To test the ability of our novel class classifier to detect
novel classes emerging in the monitoring stream, we split
the input stream into equal-sized chunks. A chunk of
100 instances is classified at a time where one or more
novel classes may appear along with existing classes. We
measured the tpr (total incremental number of actual novel
class instances classified as novel classes) and the fpr
(total number of existing class instances misclassified as
belonging to a novel class).
Table 4 shows the results for OneSVM and ECSMiner.
Here ECSMiner outperforms OneSVM in all measures.
For example, for Bi-Di features, ECSMiner observes an
fpr of 26.66% while OneSVM reports an fpr of 31.88%,
showing that the binary-class nature of ECSMiner is
capable of modeling the decision boundary better than
OneSVM. To achieve better accuracy, we augmented
ECSMiner with extracted deep abstract features using
our stacked denoising autoencoder approach (DAE &
ECSMiner). For DAE, we used two hidden layers (where the number of units in the first hidden layer is 2/3 of the original feature count, and the number of units in the second hidden layer is 1/3 of the first hidden layer's units). For the additive Gaussian noise, which is used for data corruption, we assigned $\sigma = 1.1$. As a result, fpr was reduced to a
minimum (0.01%), showing a substantial improvement
over ECSMiner. Notice that using the abstract features
with OneSVM does not help as shown in the table.
While effective in detecting concept drifts, our novel
class detection technique requires a (semi-)manual la-
beling of novel class instances. In our future work, we
plan to investigate how to automatically assign labels
(e.g., deceptive vs. non-deceptive defense response) to
previously unseen classes.
Figure 5: Baseline evaluations: (a)–(b) tpr and fpr for OneSVM Bi-Di and OneSVM N-Gram; (c)–(d) tpr and fpr for VNG++ and Panchenko; (e)–(f) Ens-SVM classification tpr and fpr for 0–16 attack classes.
5. Related Work
Deception-enhanced IDS.
Our evaluation methodology
is designed to assess adaptive, deception-enhanced IDS
systems protecting web services. Examples from the
literature include shadow honeypots [41,42], Argos [43],
Honeycomb [44], and DAW [45].
Synthetic Attack Generation.
Our approach was in-
spired by WindTunnel [9], which is a synthetic data gener-
ation framework for evaluating (non-deceptive) security
controls. WindTunnel acquires data from network, system
call, file access, and database queries and evaluates which
of the data sources provides better signal for detection
remote attacks. The DETER [46] testbed provides a frame-
work for designing repeatable experiments for evaluating
security of computer systems.
6. Conclusion
Effective evaluation of cyberdeceptive defenses is no-
toriously challenging. Our attempts to conduct such an
evaluation without resorting to human subjects experimentation indicate that dynamic, synthetic attack generation
powered by deep learning is a promising approach. In
particular, a combination of ensemble learning leveraging
multiple classifier models, online adaptive metric learning,
and novel class detection suffices to model aggressively
adaptive adversaries who respond to deceptions in a va-
riety of ways. Our case study evaluating an advanced,
deceptive IDS shows that the resulting synthetic attacks
can expose both strengths and weaknesses in modern
embedded-deception defenses.
Acknowledgments
The research reported herein was supported in part by
ONR award N00014-17-1-2995; NSA award H98230-15-
1-0271; AFOSR award FA9550-14-1-0173; an endowment
from the Eugene McDermott family; NSF FAIN awards
DGE-1931800, OAC-1828467, and DGE-1723602; NSF
awards DMS-1737978 and MRI-1828467; an IBM fac-
ulty award (Research); and an HP grant. Any opinions,
recommendations, or conclusions expressed are those of
the authors and not necessarily of the aforementioned
supporters.
References
[1]
Mordor Intelligence, “Global cyber deception market,”
tech. rep., Mordor Intelligence, 2018.
[2]
G. Sadowski and R. Kau, “Improve your threat detec-
tion function with deception technologies,” Tech. Rep.
G00382578, Gartner, March 2019.
[3]
D. Baumrind, “IRBs and social science research: The costs
of deception,” IRB: Ethics & Human Research, vol. 1,
no. 6, pp. 1–4, 1979.
[4]
F. Araujo, K. W. Hamlen, S. Biedermann, and S. Katzen-
beisser, “From patches to honey-patches: Lightweight
attacker misdirection, deception, and disinformation,” in
Proc. ACM Conf. Computer and Communications Security,
pp. 942–953, 2014.
[5]
J. Avery and E. H. Spafford, “Ghost patches: Fake patches
for fake vulnerabilities,” in Proc. IFIP Int. Conf. ICT
Systems Security and Privacy Protection, pp. 399–412,
2017.
[6]
S. Crane, P. Larsen, S. Brunthaler, and M. Franz, “Booby
trapping software,” in Proc. New Security Paradigms Work.,
pp. 95–106, 2013.
[7]
F. Araujo, M. Shapouri, S. Pandey, and K. Hamlen, “Ex-
periences with honey-patching in active cyber security
education,” in Proc. Work. Cyber Security Experimentation
and Test, 2015.
[8]
F. Araujo, G. Ayoade, K. Al-Naami, Y. Gao, K. W. Hamlen,
and L. Khan, “Improving intrusion detectors by crook-
sourcing,” in Proc. Annual Computer Security Applications
Conf., December 2019.
[9]
N. Boggs, H. Zhao, S. Du, and S. J. Stolfo, “Synthetic
data generation and defense in depth measurement of
web applications,” in Proc. Int. Sym. Recent Advances in
Intrusion Detection, pp. 234–254, 2014.
[10]
Selenium, “Selenium browser automation.” http://www.
seleniumhq.org, 2016.
[11]
D. Greene and P. Cunningham, “Practical solutions to
the problem of diagonal dominance in kernel document
clustering,” in Proc. Int. Conf. Machine learning, pp. 377–
384, 2006.
[12]
Mockaroo, “Product data set.” www.mockaroo.com, 2018.
[13]
Sysdig, “Linux system exploration and troubleshooting
tool.” https://github.com/draios/sysdig, 2019.
[14] MinIO, “MinIO object storage.” https://min.io, 2019.
[15]
J. C. Platt, Probabilities for SV Machines, ch. 5, pp. 61–74.
Neural Information Processing, MIT Press, 2000.
[16]
Y. Gao, Y.-F. Li, S. Chandra, L. Khan, and B. Thuraising-
ham, “Towards self-adaptive metric learning on the fly,” in
Proc. Int. World Wide Web Conf., pp. 503–513, 2019.
[17]
W. Li, Y. Gao, L. Wang, L. Zhou, J. Huo, and Y. Shi,
“OPML: A one-pass closed-form solution for online metric
learning,” Pattern Recognition, vol. 75, pp. 302–314, 2018.
[18]
G. Chechik, V. Sharma, U. Shalit, and S. Bengio, “Large
scale online learning of image similarity through ranking,”
J. Machine Learning Research, vol. 11, pp. 1109–1135,
2010.
[19]
P. Jain, B. Kulis, I. S. Dhillon, and K. Grauman, “Online
metric learning and fast similarity search,” in Proc. Annual
Conf. Neural Information Processing Systems, pp. 761–768,
2008.
[20]
R. Jin, S. Wang, and Y. Zhou, “Regularized distance metric
learning: Theory and algorithm,” in Proc. Annual Conf.
Neural Information Processing Systems, pp. 862–870,
2009.
[21]
C. Breen, L. Khan, and A. Ponnusamy, “Image classi-
fication using neural networks and ontologies,” in Proc.
Int. Work. Database and Expert Systems Applications,
pp. 98–102, 2002.
[22]
S. Xiang, F. Nie, and C. Zhang, “Learning a mahalanobis
distance metric for data clustering and classification,” Pat-
tern Recognition, vol. 41, no. 12, pp. 3600–3612, 2008.
[23] M. M. Masud, T. M. Al-Khateeb, L. Khan, C. Aggarwal,
J. Gao, J. Han, and B. Thuraisingham, “Detecting recurring
and novel classes in concept-drifting data streams,” in Proc.
Int. IEEE Conf. Data Mining, pp. 1176–1181, 2011.
[24]
T. Al-Khateeb, M. M. Masud, K. M. Al-Naami, S. E.
Seker, A. M. Mustafa, L. Khan, Z. Trabelsi, C. Aggarwal,
and J. Han, “Recurring and novel class detection using
class-based ensemble for evolving data stream,IEEE
Trans. Knowledge and Data Engineering, vol. 28, no. 10,
pp. 2752–2764, 2016.
[25]
M. M. Masud, J. Gao, L. Khan, J. Han, and B. Thurais-
ingham, “A practical approach to classify evolving data
streams: Training with limited amount of labeled data,” in
Proc. Int. Conf. Data Mining, pp. 929–934, 2008.
[26]
P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol,
“Extracting and composing robust features with denois-
ing autoencoders,” in Proc. Int. Conf. Machine Learning,
pp. 1096–1103, 2008.
[27]
P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A.
Manzagol, “Stacked denoising autoencoders: Learning
useful representations in a deep network with a local de-
noising criterion,” J. Machine Learning Research, vol. 11,
pp. 3371–3408, 2010.
[28]
M. Chen, K. Q. Weinberger, F. Sha, and Y. Bengio,
“Marginalized denoising auto-encoders for nonlinear repre-
sentations,” in Proc. Int. Conf. Machine Learning, pp. 1476–
1484, 2014.
[29]
Y. Bengio, Learning Deep Architectures for AI. Now
Foundations and Trends, 2009.
[30]
K. P. Dyer, S. E. Coull, T. Ristenpart, and T. Shrimpton,
“Peek-a-boo, I still see you: Why efficient traffic analysis
countermeasures fail,” in Proc. IEEE Sym. Security &
Privacy, pp. 332–346, 2012.
[31]
C. V. Wright, S. E. Coull, and F. Monrose, “Traffic morph-
ing: An efficient defense against statistical traffic analysis,”
in Proc. IEEE Network and Distributed Security Sym.,
pp. 237–250, 2009.
[32]
TcpDump, “Network packet capture and analyzer.” www.
tcpdump.org, 2019.
[33]
Linux Manual, editcap: Edit and/or Translate the Format
of Capture Files, 2019. https://linux.die.net/man/1/editcap.
[34]
Scikit-learn, “Scikit-learn: Machine learning in Python.”
https://scikit-learn.org, 2011.
[35]
Theano Development Team, “Theano: A Python frame-
work for fast computation of mathematical expressions,
arXiv, vol. abs/1605.02688, 2016.
[36]
PyTorch, “An open source deep learning framework.” https:
//pytorch.org, 2019.
[37]
D. Dudorov, D. Stupples, and M. Newby, “Probability anal-
ysis of cyber attack paths against business and commercial
enterprise systems,” in Proc. IEEE European Intelligence
and Security Informatics Conf., pp. 38–44, 2013.
[38]
A. Panchenko, L. Niessen, A. Zinnen, and T. Engel, “Web-
site fingerprinting in onion routing based anonymization
networks,” in Proc. Annual ACM Work. Privacy in the
Electronic Society, pp. 103–114, 2011.
[39]
T. Kovanen, G. David, and T. Hämäläinen, “Survey: Intrusion detection systems in encrypted traffic,” in Int. Conf. Internet of Things, Smart Spaces, and Next Generation Networks and Systems, pp. 281–293, 2016.
[40]
M. Juarez, S. Afroz, G. Acar, C. Diaz, and R. Greenstadt,
“A critical evaluation of website fingerprinting attacks,” in
Proc. ACM Conf. Computer and Communications Security,
pp. 263–274, 2014.
[41]
K. G. Anagnostakis, S. Sidiroglou, P. Akritidis, K. Xinidis,
E. Markatos, and A. D. Keromytis, “Detecting targeted at-
tacks using shadow honeypots,” in Proc. USENIX Security
Sym., 2005.
[42]
K. G. Anagnostakis, S. Sidiroglou, P. Akritidis, M. Poly-
chronakis, A. D. Keromytis, and E. P. Markatos, “Shadow
honeypots,” Int. J. Computer and Network Security, vol. 2,
no. 9, pp. 1–15, 2010.
[43]
G. Portokalidis, A. Slowinska, and H. Bos, “Argos: An
emulator for fingerprinting zero-day attacks for advertised
honeypots with automatic signature generation,” ACM
SIGOPS Operating Systems Review, vol. 40, no. 4, pp. 15–
27, 2006.
[44]
C. Kreibich and J. Crowcroft, “Honeycomb – creating
intrusion detection signatures using honeypots,” ACM
SIGCOMM Computer Communication Review, vol. 34,
no. 1, pp. 51–56, 2004.
[45]
Y. Tang and S. Chen, “Defending against internet worms:
A signature-based approach,” in Proc. Annual Joint Conf.
IEEE Computer and Communications Societies, pp. 1384–
1394, 2005.
[46]
T. Benzel, R. Braden, D. Kim, C. Neuman, A. Joseph,
K. Sklower, R. Ostrenga, and S. Schwab, “Experience
with DETER: A testbed for security research,” in Proc.
Int. Conf. Testbeds and Research Infrastructures for the
Development of Networks and Communities, 2006.
... Previous research has discussed that the configuration of honeypots can either encourage or discourage attacks and identified some relevant design elements Frederick et al. [2012]. Deep learning has been presented as one method of automatic evaluation of cyber deception techniques Ayoade et al. [2020]. The accumulated research has demonstrated that techniques designed to deceive users are feasible. ...
Thesis
The threat of cyber attacks is a growing concern across the world, leading to an increasing need for sophisticated cyber defense techniques. The Tularosa Study was designed and conducted to understand how defensive deception, both cyber and psychological, affects cyber attackers, Ferguson-Walter et al. [2019c]. More specifically, for this empirical study, cyber deception refers to a decoy system and psychological deception refers to false information about the presence of defensive deception techniques on the network. Over 130 red teamers participated in a network penetration test over two days in which we controlled both the presence of and explicit mention of deceptive defensive techniques. To our knowledge, this represents the largest study of its kind ever conducted on a skilled red team population. In addition to the abundant host and network data collected, we conducted a battery of questionnaires, e.g., experience, personality; and cognitive tasks, e.g., fluid intelligence, working memory; as well as physiological measures, e.g., galvanic skin response (GSR), heart rate, to be correlated with the cyber events at a later date. The design and execution of this study and the lessons learned are a major contribution of this thesis. I investigate the effectiveness of decoy systems for cyber defense by comparing performance across all experimental conditions. Results support a new finding that the combination of the presence of deception and the true information that deception is present has the greatest effect on cyber attackers, when compared to a control condition in which no deception was used. Evidence of cognitive biases in the red teamers' behavior is then detailed and explained, to further support our theory of oppositional human factors (OHF). The final chapter discusses how elements of the experimental design contribute to the validity of assessing the effectiveness of cyber deception and reviews trade-offs and lessons learned.
Article
Full-text available
With information systems worldwide being attacked daily, analogies from traditional warfare are apt, and deception tactics have historically proven effective as both a strategy and a technique for defense. Defensive Deception includes thinking like an attacker and determining the best strategy to counter common attack strategies. Defensive Deception tactics are beneficial at introducing uncertainty for adversaries, increasing their learning costs, and, as a result, lowering the likelihood of successful attacks. In cybersecurity, honeypots, honeytokens, camouflage, and moving target defense commonly employ Defensive Deception tactics. Deceptive and anti-deceptive technologies have been created for a variety of purposes. However, there is a critical need for a broad, comprehensive, and quantitative framework that can help us deploy advanced deception technologies. Computational intelligence provides an appropriate set of tools for creating advanced deception frameworks. Computational intelligence comprises two significant families of artificial intelligence technologies: deep learning and machine learning. These strategies can be used in various situations in Defensive Deception technologies. This survey focuses on Defensive Deception tactics deployed with the help of deep learning and machine learning algorithms. Prior work has yielded the insights, lessons, and limitations presented in this study. It culminates with a discussion about future directions, which helps address important gaps in present Defensive Deception research.
Conference Paper
Full-text available
Advanced persistent threats (APT) have increased in recent times as a result of the rise in interest by nation-states and sophisticated corporations in obtaining high-profile information. Typically, APT attacks are more challenging to detect since they leverage zero-day attacks and common benign tools. Furthermore, these attack campaigns are often prolonged to evade detection. We leverage an approach that uses a provenance graph to obtain execution traces of host nodes in order to detect anomalous behavior. Using the provenance graph, we extract features that are then used to train an online adaptive metric learner. Online metric learning is a deep learning method that learns a function to minimize the separation between similar instances and maximize the separation between dissimilar instances. We compare our approach with baseline models and show that our method outperforms them, increasing detection accuracy on average by 11.3% and true positive rate (TPR) on average by 18.3%.
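As a rough illustration of the provenance-graph front end, the sketch below derives a per-node feature vector as a histogram of incident edge types, which could then be fed to the metric learner. The edge types and events are invented for the example; the deployed system's feature set is richer.

from collections import Counter

EDGE_TYPES = ["fork", "exec", "read", "write", "connect", "send", "recv"]

def node_features(events, node):
    # Histogram of edge types incident to `node`: a simple stand-in for
    # the provenance-derived features fed to the metric learner.
    counts = Counter(etype for src, etype, dst in events if node in (src, dst))
    return [counts.get(t, 0) for t in EDGE_TYPES]

events = [("bash", "fork", "curl"), ("curl", "connect", "10.0.0.5"),
          ("curl", "write", "/tmp/payload"), ("bash", "exec", "/tmp/payload")]
print(node_features(events, "curl"))   # -> [1, 0, 0, 1, 1, 0, 0]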
Conference Paper
Full-text available
Good quality similarity metrics can significantly facilitate the performance of many large-scale, real-world applications. Existing studies have proposed various solutions to learn a Mahalanobis or bilinear metric in an online fashion by either restricting distances between similar (dissimilar) pairs to be smaller (larger) than a given lower (upper) bound or requiring similar instances to be separated from dissimilar instances with a given margin. However, these linear metrics learned by leveraging fixed bounds or margins may not perform well in real-world applications, especially when data distributions are complex. We aim to address the open challenge of "Online Adaptive Metric Learning" (OAML) for learning adaptive metric functions on-the-fly. Unlike traditional online metric learning methods, OAML is significantly more challenging since the learned metric could be non-linear and the model has to be self-adaptive as more instances are observed. In this paper, we present a new online metric learning framework that attempts to tackle the challenge by learning an ANN-based metric with adaptive model complexity from a stream of constraints. In particular, we propose a novel Adaptive-Bound Triplet Loss (ABTL) to effectively utilize the input constraints, and present a novel Adaptive Hedge Update (AHU) method for online updating of the model parameters. We empirically validate the effectiveness and efficacy of our framework on various applications such as real-world image classification, facial verification, and image retrieval.
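To make the online metric learning setting concrete, the following is a minimal sketch using a linear embedding and a fixed-margin triplet loss. It is a simplified stand-in for the framework's ANN-based metric, Adaptive-Bound Triplet Loss, and Adaptive Hedge Update, all of which are substantially more involved.

import numpy as np

def triplet_step(L, anchor, pos, neg, margin=1.0, lr=0.01):
    # One SGD step on the hinge loss max(0, ||La-Lp||^2 - ||La-Ln||^2 + margin).
    d_pos = L @ (anchor - pos)
    d_neg = L @ (anchor - neg)
    if d_pos @ d_pos - d_neg @ d_neg + margin > 0:  # constraint violated
        grad = 2 * (np.outer(d_pos, anchor - pos) - np.outer(d_neg, anchor - neg))
        L = L - lr * grad   # pull the positive closer, push the negative away
    return L

rng = np.random.default_rng(0)
L = np.eye(4)                        # start from the identity metric
for _ in range(200):                 # constraints arrive as a stream
    a = rng.normal(size=4)
    p = a + 0.1 * rng.normal(size=4)                              # same-class neighbor
    n = a + np.array([3.0, 0, 0, 0]) + 0.1 * rng.normal(size=4)   # other-class instance
    L = triplet_step(L, a, p, n)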
Article
Full-text available
Theano is a Python library that allows one to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano has been actively and continuously developed since 2008; multiple frameworks have been built on top of it, and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
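A minimal example of the workflow Theano supports: declare symbolic variables, build an expression, derive its gradient automatically, and compile both into a callable function.

import theano
import theano.tensor as T

x = T.dvector('x')          # symbolic vector of doubles
y = T.sum(x ** 2)           # expression to evaluate
gy = T.grad(y, x)           # symbolic gradient, derived automatically

f = theano.function([x], [y, gy])   # compiled for CPU or GPU
print(f([1.0, 2.0, 3.0]))   # -> [array(14.0), array([2., 4., 6.])]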
Conference Paper
Full-text available
Measuring security controls across multiple layers of defense requires realistic data sets and repeatable experiments. However, data sets that are collected from real users often cannot be freely exchanged due to privacy and regulatory concerns. Synthetic datasets, which can be shared, have in the past had critical flaws or at best been one-time collections of data focusing on a single layer or type of data. We present a framework for generating synthetic datasets with normal and attack data for web applications across multiple layers simultaneously. The framework is modular and designed for data to be easily recreated in order to vary parameters and allow for inline testing. We build a prototype data generator using the framework to generate nine datasets with data logged simultaneously on four layers: network, file accesses, system calls, and database. We then test nineteen security controls spanning all four layers to determine their sensitivity to dataset changes, compare performance even across layers, compare synthetic data to real production data, and calculate the combined defense-in-depth performance of sets of controls.
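The multi-layer idea can be sketched schematically: one simulated request emits correlated, ground-truth-labeled records on every monitored layer at once. The sketch below is an assumption-laden toy, not the authors' generator; the layer names and record fields are invented.

import json, time

def simulate_request(src, path, malicious=False):
    # One simulated request produces correlated records on four layers,
    # all carrying the same ground-truth label (illustration only).
    ts = time.time()
    return {
        "network":  {"ts": ts, "src": src, "uri": path, "label": malicious},
        "file":     {"ts": ts, "access": "/var/www" + path, "label": malicious},
        "syscall":  {"ts": ts, "calls": ["open", "read", "write"], "label": malicious},
        "database": {"ts": ts, "query": "SELECT * FROM pages WHERE uri='%s'" % path,
                     "label": malicious},
    }

logs = [simulate_request("10.0.0.7", "/index.html"),
        simulate_request("10.0.0.9", "/admin' OR '1'='1", malicious=True)]
print(json.dumps(logs[1]["database"], indent=2))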
Conference Paper
Offensive and defensive players in the cyber security sphere constantly react to either party’s actions. This reactive approach works well for attackers but can be devastating for defenders. This approach also models the software security patching lifecycle. Patches fix security flaws, but when deployed, can be used to develop malicious exploits. To make exploit generation using patches more resource intensive, we propose inserting deception into software security patches. These ghost patches mislead attackers with deception and fix legitimate flaws in code. An adversary using ghost patches to develop exploits will be forced to use additional resources. We implement a proof of concept for ghost patches and evaluate their impact on program analysis and runtime. We find that these patches have a statistically significant impact on dynamic analysis runtime, increasing time to analyze by a factor of up to 14x, but do not have a statistically significant impact on program runtime.
Article
To achieve a low computational cost when performing online metric learning for large-scale data, we present a one-pass closed-form solution, namely OPML, in this paper. Specifically, the proposed OPML first adopts a one-pass triplet construction strategy, which aims to use only a very small number of triplets to approximate the representation ability of the whole set of triplets obtained by batch-manner methods. Then, OPML employs a closed-form solution to update the metric for newly arriving samples, which leads to low space (i.e., $O(d)$) and time (i.e., $O(d^2)$) complexity, where $d$ is the feature dimensionality. In addition, an extension of OPML (namely COPML) is further proposed to enhance robustness when, as in real cases, the first several samples come from the same class (i.e., the cold-start problem). In the experiments, we systematically evaluated our methods (OPML and COPML) on three typical tasks, including UCI data classification, face verification, and abnormal event detection in videos, with the aim of fully evaluating the proposed methods under different sample sizes, different feature dimensionalities, and different feature extraction approaches (i.e., hand-crafted and deeply learned). The results show that OPML and COPML obtain promising performance at a very low computational cost. The effectiveness of COPML under the cold-start setting is also experimentally verified.
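The one-pass triplet construction can be sketched as follows. This is an illustration in the spirit of OPML, not the paper's exact procedure, and the closed-form metric update itself is omitted: keeping only the most recent exemplar per class means each arriving sample yields at most one triplet, rather than the cubic number of triplets a batch method would enumerate.

def one_pass_triplets(stream):
    # stream: iterable of (feature_vector, class_label) pairs.
    last = {}                # class label -> most recent exemplar
    for x, label in stream:
        pos = last.get(label)
        negs = [v for k, v in last.items() if k != label]
        if pos is not None and negs:
            yield (x, pos, negs[-1])   # at most one triplet per new sample
        last[label] = x

stream = [([0.1, 0.2], "benign"), ([0.9, 0.8], "attack"),
          ([0.2, 0.1], "benign"), ([0.8, 0.9], "attack")]
print(list(one_pass_triplets(stream)))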
Conference Paper
An intrusion detection system (IDS) traditionally inspects the payload information of packets. This approach is not valid for encrypted traffic, as the payload information is not available. There are two approaches, with different detection capabilities, to overcoming the challenges of encryption: traffic decryption or traffic analysis. This paper presents a comprehensive survey of the research related to IDSs for encrypted traffic. The focus is on traffic analysis, which does not require traffic decryption. One major limitation of the surveyed research is that most of it concentrates on detecting the same limited types of attacks, such as brute force or scanning attacks. Both the security enhancements to be derived from using an IDS and the security challenges introduced by encrypted traffic are discussed. By categorizing the existing work, a set of conclusions and proposals for future research directions is presented.
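Traffic analysis of the kind surveyed works from side channels rather than payloads. Below is a small sketch, with assumed record fields, of the flow statistics such an IDS might compute from an encrypted trace: packet counts, directional byte volumes, and timing.

import numpy as np

def flow_features(packets):
    # packets: list of (timestamp, size, direction) with direction +1
    # (client->server) or -1 (server->client); fields are illustrative.
    ts = np.array([p[0] for p in packets])
    sizes = np.array([p[1] * p[2] for p in packets])   # signed sizes
    gaps = np.diff(ts) if len(ts) > 1 else np.array([0.0])
    return {"n_pkts": len(packets),
            "up_bytes": int(sizes[sizes > 0].sum()),
            "down_bytes": int(-sizes[sizes < 0].sum()),
            "mean_gap": float(gaps.mean()),
            "burstiness": float(gaps.std())}

print(flow_features([(0.00, 120, +1), (0.02, 1400, -1),
                     (0.03, 1400, -1), (0.50, 80, +1)]))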
Article
Recent studies on Website Fingerprinting (WF) claim to have found highly effective attacks on Tor. However, these studies make assumptions about user settings, adversary capabilities, and the nature of the Web that do not necessarily hold in practical scenarios. The following study critically evaluates these assumptions by conducting the attack where the assumptions do not hold. We show that certain variables that are usually omitted from the current WF model, for example, the user's browsing habits and differences in the location and version of the Tor Browser Bundle, have a significant impact on the efficacy of the attack. We also empirically show how prior work succumbs to the base rate fallacy in the open-world scenario. We address this problem by augmenting our classification method with a verification step. We conclude that even though this approach reduces the number of false positives by over 63%, it does not completely solve the problem, which remains an open issue for WF attacks.
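The verification step can be illustrated with a toy nearest-centroid classifier that abstains unless its best match is clearly closer than the runner-up, trading some recall for far fewer open-world false positives. The features, centroids, and threshold here are invented for the example; the authors' verifier is more sophisticated.

import numpy as np

def classify_with_verification(x, centroids, accept_ratio=0.8):
    # centroids: {site: mean feature vector}. Accept the best match only
    # if it is clearly closer than the second-best candidate.
    dists = {site: np.linalg.norm(x - c) for site, c in centroids.items()}
    best, second = sorted(dists, key=dists.get)[:2]
    if dists[best] / (dists[second] + 1e-9) < accept_ratio:
        return best        # verified prediction
    return None            # likely an unmonitored site: abstain

centroids = {"siteA": np.array([10.0, 2.0]), "siteB": np.array([3.0, 8.0])}
print(classify_with_verification(np.array([9.5, 2.2]), centroids))  # 'siteA'
print(classify_with_verification(np.array([6.0, 5.0]), centroids))  # None (rejected)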