Anomaly Detection in Cyber-Physical Systems: A Formal Methods
Approach
Austin Jones, Zhaodan Kong, Calin Belta
Abstract As the complexity of cyber-physical systems in-
creases, so does the number of ways an adversary can disrupt
them. This necessitates automated anomaly detection methods
to detect possible threats. In this paper, we extend our recent
results in the field of inference via formal methods to develop
an unsupervised learning algorithm. Our procedure constructs
from data a signal temporal logic (STL) formula that describes
normal system behavior. Trajectories that do not satisfy the
learned formula are flagged as anomalous. STL can be used to
formulate properties such as “If the train brakes within 500
m of the platform at a speed of 50 km/hr, then it will stop
in at least 30 s and at most 50 s.” STL gives a more human-
readable representation of behavior than classifiers represented
as surfaces in high-dimensional feature spaces. STL formulae
can also be used for early detection via online monitoring and
for anomaly mitigation via formal synthesis. We demonstrate
the power of our method with a physical model of a train’s
brake system. To our knowledge, this paper is the first instance
of formal methods being applied to anomaly detection.
I. INTRODUCTION
Cyber-physical systems (CPSs) integrate physical pro-
cesses with computational resources via communication net-
works. In light of high-profile attacks, such as the Maroochy
water breach [1], there has been a surge of interest in
understanding how an adversary can disrupt a cyber-physical
system and how such attacks can be identified and potentially
mitigated [2], [3]. In all the cited works, the designer of
the controllers and/or estimators is assumed to have perfect
knowledge of the physical systems under consideration,
which are assumed to be described by linear models. These
assumptions are not consistent with the growing complexity
of modern CPSs and the involvement of agents, such as
humans, whose behavior is generally quite hard to predict.
In this paper, we apply formal methods to an anomaly de-
tection framework to identify whether or not a given CPS is
under attack. Anomaly detection is the problem of detecting
patterns from data that do not conform to expected behavior
[4]. In our case, we are looking for patterns in the output of
a CPS that lead us to believe that the underlying dynamics of
the system have changed due to attack. Tools from machine
learning, such as Gaussian processes, have been adapted to
anomaly detection [5]. In general, existing techniques infer
a surface embedded in a high-dimensional feature space that
separates normal and anomalous data. However, it is hard
to interpret the meanings of the surfaces, especially in the
context of prediction, knowledge base construction and on-
line monitoring (i.e. determining on-line whether a behavior
is anomalous).
Austin Jones and Calin Belta are with the Division of Systems Engineering. Zhaodan Kong and Calin Belta are with the Department of Mechanical Engineering at Boston University, Boston, MA 02115. Email: {austinmj, zhaodan, cbelta}@bu.edu
Our approach to the problem circumvents the over-
specificity of model-based CPS security methods and the
low usability of existing anomaly detection techniques. We
present a model-free unsupervised learning algorithm for
inferring a signal temporal logic (STL) formula from system
output data that can be used to classify data as normal or
anomalous. STL can express system properties that include
time bounds and bounds on physical system parameters, e.g.
“If the boat remains in region A while maintaining its speed
below 10 kph for 10 min., it is guaranteed to reach the
port within 15 min.” STL formulae are easy to formulate in
natural language and have a rigorous mathematical definition,
meaning they can be used for both human-in-the-loop and
automated on-line monitoring.
Our procedure is an extension of our previous work [6]
(which, in turn, was inspired by [7]), in which we developed
a supervised learning algorithm for inferring formulae to
distinguish between desirable (e.g. “the car successfully stops
before hitting an obstruction”) and undesirable (e.g. “The car
strikes the obstruction”) behavior. We defined a fragment of
STL, called reactive STL (rSTL), whose formulae can also
indicate possible causes for each set of behaviors (e.g. “If
the speed of the car is greater than 15 m/s within 0.5s of
brake application, the obstruction will be struck”). By using
the concept of a robustness degree [8], [9], we showed how
to perform a directed search over this set. In contrast, in this
paper, we address the problem of finding such a formula
when the system output data is not labeled. We use many of
the same concepts and theoretical results from [6], but due
to the different problem formulations, we use a fragment of
rSTL that does not require a causal structure.
We use two case studies, a simple academic example and
a more realistic model of an electronically controlled pneu-
matic train brake system (adapted from [10]), to demonstrate
that our algorithm is able to correctly identify anomalies.
Although the focus of our paper is on detecting anomalies,
we use the case studies to also demonstrate how the inferred
formula can be used for on-line monitoring. Further research
will approach this integration in a more formal and rigorous
manner.
II. MATHEMATICAL PRELIMINARIES
A signal $x$ is a map $x : \mathbb{R}^+ \to X \subseteq \mathbb{R}^n$. We denote the value of $x$ at time $t$ as $x(t)$ and the suffix of signal $x$ from time $t$ as $x[t]$. In this paper, given a system $S$ (e.g. a set of ODEs or a hybrid automaton), we call the set of its trajectories the language of $S$, denoted $L(S)$.
53rd IEEE Conference on Decision and Control
December 15-17, 2014. Los Angeles, California, USA
978-1-4673-6088-3/14/$31.00 ©2014 IEEE
Signal temporal logic (STL) [11] is a predicate temporal
logic defined over signals. In this work, we focus on the frag-
ment called inference STL (iSTL), which is a generalization
of reactive STL (rSTL), a fragment we previously defined
in [6]. iSTL differs from rSTL in that the syntax does not
require every formula to contain Boolean implication ($\Rightarrow$).
The syntax of iSTL is defined as
$$\phi ::= F_{[0,T)}(\phi_c) \qquad (1a)$$
$$\phi_c ::= F_{[a,b)}\ell \mid G_{[a,b)}\ell \mid \phi_c \wedge \phi_c \mid \phi_c \vee \phi_c \qquad (1b)$$
where $T$ is the maximum duration of $x$, $[a, b)$ is a time interval, $\ell$ is a rectangular predicate of the form $(x^{(i)} \sim c)$, $\sim \in \{<, \geq\}$, $c \in \mathbb{R}$, and $x^{(i)}$ is a one-dimensional element of the signal $x$. $\wedge$ and $\vee$ are conjunction and disjunction, respectively. $F_{[a,b)}$ and $G_{[a,b)}$ are the temporal operators “finally” (“eventually”) and “globally” (“always”), respectively. The external operator $F_{[0,T)}$ means that we are searching for properties that can occur at any point in a signal.
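The semantics of iSTL is defined recursively as
$$\begin{aligned}
x[t] \models (x^{(i)} \sim c) \quad &\text{iff} \quad x^{(i)}(t) \sim c \\
x[t] \models \phi_1 \wedge \phi_2 \quad &\text{iff} \quad x[t] \models \phi_1 \text{ and } x[t] \models \phi_2 \\
x[t] \models \phi_1 \vee \phi_2 \quad &\text{iff} \quad x[t] \models \phi_1 \text{ or } x[t] \models \phi_2 \\
x[t] \models F_{[a,b)}\phi \quad &\text{iff} \quad \exists t' \in [t+a, t+b) \text{ s.t. } x[t'] \models \phi \\
x[t] \models G_{[a,b)}\phi \quad &\text{iff} \quad x[t'] \models \phi \;\; \forall t' \in [t+a, t+b)
\end{aligned} \qquad (2)$$
A signal $x$ satisfies an iSTL formula $\phi$ if $x[0] \models \phi$. The language of an STL formula $\phi$, $L(\phi)$, is the set of all signals that satisfy $\phi$.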
The robustness degree of a signal $x$ with respect to an iSTL formula $\phi$ at time $t$ is given as $r(x, \phi, t)$, where $r$ can be calculated recursively via the quantitative semantics [8], [9]
$$\begin{aligned}
r(x, (x^{(i)} \geq c), t) &= x^{(i)}(t) - c \\
r(x, (x^{(i)} < c), t) &= c - x^{(i)}(t) \\
r(x, \phi_1 \wedge \phi_2, t) &= \min(r(x, \phi_1, t), r(x, \phi_2, t)) \\
r(x, \phi_1 \vee \phi_2, t) &= \max(r(x, \phi_1, t), r(x, \phi_2, t)) \\
r(x, F_{[a,b)}\phi, t) &= \max_{t' \in [t+a, t+b)} r(x, \phi, t') \\
r(x, G_{[a,b)}\phi, t) &= \min_{t' \in [t+a, t+b)} r(x, \phi, t')
\end{aligned}$$
The robustness degree of the entire signal is denoted as $r(x, \phi) = r(x, \phi, 0)$. If $r(x, \phi)$ is large and positive (negative), then $x$ satisfies (violates) $\phi$ and a large perturbation to $x$ would be required in order for the resulting signal $x'$ to violate (satisfy) $\phi$. If $r(x, \phi) \approx 0$, then if even a small perturbation is applied to $x$, whether or not $x'$ satisfies $\phi$ is unpredictable.
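The recursive quantitative semantics can be sketched for a uniformly sampled scalar signal as follows. This is only an illustration under stated assumptions, not the authors' implementation: the nested-tuple formula encoding, the function name `robustness`, the sampling step `dt`, and the clipping of time windows to the available samples are all choices made here.

```python
from typing import Sequence

# Hypothetical formula encoding (an assumption of this sketch):
#   ("pred", op, c)                 rectangular predicate, op in {">=", "<"}
#   ("and", f1, f2), ("or", f1, f2) conjunction and disjunction
#   ("F", a, b, f), ("G", a, b, f)  temporal operators over [a, b)


def robustness(y: Sequence[float], dt: float, phi: tuple, k: int = 0) -> float:
    """Robustness of the sampled signal y w.r.t. phi at time t = k * dt."""
    kind = phi[0]
    if kind == "pred":
        _, op, c = phi
        return y[k] - c if op == ">=" else c - y[k]
    if kind in ("and", "or"):
        left = robustness(y, dt, phi[1], k)
        right = robustness(y, dt, phi[2], k)
        return min(left, right) if kind == "and" else max(left, right)
    if kind in ("F", "G"):
        _, a, b, sub = phi
        lo = k + round(a / dt)
        hi = min(k + round(b / dt), len(y))  # half-open window, clipped to the data
        values = [robustness(y, dt, sub, j) for j in range(lo, hi)]
        if not values:
            # Empty window: max over nothing is -inf, min over nothing is +inf.
            return float("-inf") if kind == "F" else float("inf")
        return max(values) if kind == "F" else min(values)
    raise ValueError(f"unknown operator {kind!r}")


ramp = [0.0, 1.0, 2.0, 3.0, 4.0]  # samples at t = 0, 1, 2, 3, 4
print(robustness(ramp, 1.0, ("F", 0, 3, ("pred", ">=", 2.5))))
```

A positive return value means the sampled signal satisfies the formula with that margin. The naive recursion re-evaluates subformulae and is meant for readability, not speed.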
Inference parametric signal temporal logic (iPSTL) [12] is an extension of iSTL where the bound $c$ and the endpoints of the time intervals $[a, b)$ are parameters instead of constants. We denote them as scale parameters $\pi = [\pi_1, \ldots, \pi_{n_\pi}]$ and time parameters $\tau = [\tau_1, \ldots, \tau_{n_\tau}]$, respectively. A full parameterization is given as $[\pi, \tau]$. The syntax and semantics of iPSTL are the same as those for iSTL. To avoid confusion, we will use $\phi$ to denote an iSTL formula and $\varphi$ to refer to an iPSTL formula. A valuation $\theta$ is a mapping that assigns real values to the parameters appearing in an iPSTL formula. A valuation $\theta$ of an iPSTL formula $\varphi$ induces an STL formula $\phi_\theta$. For example, if $\varphi = F_{[\tau_1,\tau_2)}(x_1 \geq \pi_1)$ and $\theta([\pi_1, \tau_1, \tau_2]) = [0, 0, 3]$, then $\phi_\theta = F_{[0,3)}(x_1 \geq 0)$.
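The way a valuation turns a parametric formula into a concrete one can be illustrated with a small substitution routine. The nested-tuple encoding, the parameter names, and the function `instantiate` are assumptions made for this sketch only.

```python
def instantiate(phi, theta):
    """Replace every parameter name found in theta by its assigned value."""
    if isinstance(phi, tuple):
        return tuple(instantiate(part, theta) for part in phi)
    if isinstance(phi, str):
        # Operator and variable names are not keys of theta, so they pass through.
        return theta.get(phi, phi)
    return phi


# Parametric formula F_[tau1, tau2)(x1 >= pi1), written as nested tuples.
varphi = ("F", "tau1", "tau2", ("pred", "x1", ">=", "pi1"))
theta = {"pi1": 0.0, "tau1": 0.0, "tau2": 3.0}
print(instantiate(varphi, theta))
```

With this valuation the result is the tuple encoding of $F_{[0,3)}(x_1 \geq 0)$, matching the example in the text.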
In [6], we showed that the set of iSTL and the set of iPSTL formulae admit partial orders. Further, the set of all iPSTL formulae can be organized into a directed acyclic graph (DAG) where there exists a path from $\varphi_1$ to $\varphi_2$ if $\forall x, \theta$, $r(x, \phi_1) \leq r(x, \phi_2)$. Therefore, if we use the robustness degree as a fitness measure, we can search for an iSTL formula that best fits a given set of data by iteratively using the DAG to perform a search over the set of iPSTL formulae and using continuous optimization methods in order to find its optimal valuation.
III. PROBLEM FORMULATION
A. Cyber-Physical Systems Under Normal Operation
We denote a cyber-physical system under normal operation (i.e. operating as intended in the absence of an attack) as a system $S_N$. A trajectory of $S_N$ is a signal $x : \mathbb{R}^+ \to X$, where $X \subseteq \mathbb{R}^n$ is the (possibly) high-dimensional physical state space of the CPS. The operation of the system is observed via an output signal
$$y(t) = g(x(t), t, w), \qquad (3)$$
where $w$ is a noise process. This concept is illustrated in the following scenario, which will serve as a running example throughout this paper.
Example 1 (Normal system). Consider a train using an electronically-controlled pneumatic (ECP) braking system. The train has 3 cars, each of which has its own braking system. The state of the train system is defined by the velocity $v$ of the train. Our model of the train system is modified from [10]. See Section V-B for more details.
In this model, the braking system is automated to keep the velocity below unsafe speeds and above low speeds to ensure that the train reaches its destination. If $v(t)$ exceeds a threshold $v_{max}$ (28.5 m/s), the velocity of the un-braked system increases (shown in green in Fig. 1). Each of the brakes responds to the threshold crossing after a random time delay by engaging. The brakes decrease the velocity of the train (shown in black in Fig. 1) until it passes a second threshold $v_{min}$ (20 m/s). After random delays, the brakes disengage.
The system is observed via the output signal
$$y(t) = v(t) + w, \qquad (4)$$
where $w$ is a white noise process with variance 0.3. Some traces of the output signal are shown in Fig. 1. Note that there is considerable variability in the signals due to noisy inputs and delays between brake engagement and disengagement, but they all maintain $v(t)$ below $v_{max}$ and above $v_{min}$ for most of the time.
[Figure 1: four panels, each plotting v (m/s) against t (s) for t from 0 to 100 s.]
Fig. 1. Four output signals of the velocity of the train controlled by three ECP braking systems when all three brakes are functioning normally. The colors refer to modes of the hybrid automaton that describes the system (given in Section V-B): blue indicates the system oscillates between low and high velocities, green indicates the system is moving too fast, and black indicates the system is being braked.
[Figure 2: four panels titled imax = 3, imax = 2, imax = 1, and imax = 0, each plotting v (m/s) against t (s) for t from 0 to 100 s.]
Fig. 2. Outputs of the train velocity system under different attack scenarios. An adversary has the ability to disable one, two, or three of the train's brakes in order to deregulate its velocity. The variable $b$ is the number of brakes affected by attack.
B. Cyber-Physical Systems Under Attack
In the previous subsection, we described cyber-physical
systems under normal operation. However, we are interested
in the case in which an adversary can affect the sensors
or actuators of the system in order to disrupt its normal
operation. Given a system $S_N$ under normal operation, we define a system with attack vulnerabilities as a system $S_T$ such that $L(S_N) \subseteq L(S_T)$. That is, $S_T$ behaves normally (produces the same outputs as $S_N$) when no attacks take place and behaves qualitatively differently if an adversary disrupts it.
Example 1 (Attacked system). An adversary is able to disable each of the brakes of the train. A few sample outputs from the system $S_T$ are shown in Fig. 2.
The behavior of the system depends on how much access the adversary has, i.e. the number of brakes $b$ that can be disabled. All three attacked outputs behave qualitatively differently from each other, but they all clearly violate the desired invariant behavior demonstrated in Fig. 1. Although the difference between normal and attacked outputs is visually obvious, it is difficult to quantify the difference between the two sets of behaviors due to a large variability in each output set. Our procedure is able to separate these two sets automatically with no a priori knowledge of system dynamics or attack models.
C. Problem Definition
We are interested in determining whether a cyber-physical
system is under attack or not. As illustrated in Example 1,
a CPS may exhibit a wide range of behaviors. Thus we
must compare individual system executions to some global
property that all normal executions of the system need to
satisfy. The logic iSTL (defined in Section II) is well-
suited for compactly and precisely describing CPS behavior.
Logical operators describe how different components of the
output signal interact. Temporal operators describe how the
system changes over time. The bounds given by rectangular
predicates and the time bounds on the temporal operators
incorporate physical parameters into the description. Further,
the set of iSTL formulae may be searched in an efficient and
principled manner as demonstrated in Section IV.
Example 1 (iSTL properties). From examining the outputs shown in Fig. 1, it is clear that each output satisfies the iSTL formula
$$F_{[0,100)}(F_{[0,10)}(y > 25) \wedge G_{[0,30)}(y < 30)). \qquad (5)$$
In plain English, this means “At some point, the output exceeds 25 m/s within the next 10 s while remaining below 30 m/s for the next 30 s.”
Given a model $S_T$ of a cyber-physical system, it is in
general difficult to determine analytically an iSTL formula
that describes only the normal behaviors of the system.
Further, in many applications, explicit models of the CPS
are unavailable for analysis. Therefore, we propose to find
such a formula directly from system output data.
In the ideal case, we would be able to construct a formula
that can perfectly distinguish normal and attacked behaviors.
However, since the CPS under consideration may involve
process and observation noise, and only a finite number of
traces are available, we focus our attention on the more
realistic goal given by Problem 1.
Problem 1. A CPS with attack vulnerabilities $S_T$ produces a set of trajectories $\{x_i\}_{i=1}^{N}$. Given the set $\{y_i\}_{i=1}^{N}$ with $y_i$ being the observed output associated with $x_i$, find an iSTL formula $\phi$ such that the misclassification rate, given by
$$\frac{|\{y_i \mid y_i \models \phi,\ x_i \notin L(S_N)\}| + |\{y_i \mid y_i \not\models \phi,\ x_i \in L(S_N)\}|}{N}, \qquad (6)$$
is minimized.
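As a concrete reading of (6), the rate counts attacked outputs that satisfy the formula (missed detections) plus normal outputs that violate it (false alarms), divided by the number of outputs. The sketch below assumes the ground-truth labels are available for evaluation only, since the inference itself is unsupervised; the function and argument names are hypothetical.

```python
from typing import Sequence


def misclassification_rate(satisfies: Sequence[bool], is_normal: Sequence[bool]) -> float:
    """Fraction of outputs on which the formula and the ground truth disagree.

    satisfies[i] -- True if output y_i satisfies the formula
    is_normal[i] -- True if trajectory x_i came from normal operation
    """
    if len(satisfies) != len(is_normal) or not satisfies:
        raise ValueError("need two non-empty sequences of equal length")
    missed = sum(1 for s, n in zip(satisfies, is_normal) if s and not n)
    false_alarms = sum(1 for s, n in zip(satisfies, is_normal) if not s and n)
    return (missed + false_alarms) / len(satisfies)


# One missed attack and one false alarm among four outputs.
print(misclassification_rate([True, True, False, False], [True, False, True, False]))
```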
That is, we want to find a formula that has a high correct
detection rate (correctly flags outputs from systems under
attack as anomalous) and a low false alarm rate (rarely
flags outputs from the system under normal operation as
anomalous). In order to simplify the problem, we make the following two key assumptions. First, attacks happen infrequently, i.e., given an output $y_i$ from a CPS $S_T$, the a priori probability that it was produced by $S_T$ under attack is low. Second, the outputs of a system under attack differ qualitatively from the outputs of a system under normal operation. Otherwise, it is impossible to infer any classifier to separate the two sets of outputs. These assumptions are plausible for real-world scenarios and are commonly made in other anomaly detection problems [4].
IV. SOLUTION
Since Problem 1 is an unsupervised learning problem,
we use some notions from classical unsupervised learning
to aid in our approach. In particular, we consider one-class
support vector machines (SVMs). A one-class SVM is an
optimization that, given a set of data, lifts the data to a
higher-dimensional feature space and constructs a surface in
this space that separates normal data from anomalous data
[13]. We adapt the objective function used in one-class SVM
and map Problem 1 to the following optimization.
$$\min_{\phi_\theta,\ \epsilon} \; d(\phi_\theta) + \frac{1}{\nu N} \sum_{i=1}^{N} \mu_i - \epsilon \qquad (7)$$
such that
$$\mu_i = \begin{cases} 0 & r(y_i, \phi_\theta) > \epsilon/2 \\ \epsilon/2 - r(y_i, \phi_\theta) & \text{else} \end{cases} \quad \forall i, \qquad (8)$$
where $\phi_\theta$ is an iSTL formula, $\epsilon$ is the “gap” in signal space between outputs identified as normal and outputs defined as anomalous, $\nu$ is the upper bound of the a priori probability that the CPS is under attack, $\mu_i$ is a slack variable, which is positive if $y_i$ does not satisfy $\phi_\theta$ with minimum robustness $\epsilon/2$, and the function $d$ is a “tightness” function that penalizes the size of $L(\phi_\theta)$.
Minimizing the sum of the $\mu_i$ minimizes the number of traces the learned formula $\phi_\theta$ classifies as anomalous. This is consistent with a low prior attack probability. Maximizing the gap $\epsilon$ maximizes the separation between normal and anomalous outputs. Minimizing the function $d(\phi_\theta)$ prevents the learned formula from trivially describing all observed signals, i.e. finding a formula s.t. $y_i \models \phi_\theta$, $\forall x_i \in L(S_T)$.
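To make the trade-off in (7) and (8) concrete, the sketch below evaluates the cost for a fixed formula from the robustness degrees of the outputs. It assumes the objective has the one-class-SVM form, tightness plus scaled slack minus the gap, as the surrounding discussion of maximizing the gap suggests; the names `slack`, `objective`, `eps`, `nu`, and `tightness` are choices made here.

```python
from typing import Sequence


def slack(r: float, eps: float) -> float:
    """Slack of (8): zero when the robustness r clears the margin eps / 2."""
    return 0.0 if r > eps / 2 else eps / 2 - r


def objective(robustness_values: Sequence[float], eps: float, nu: float,
              tightness: float) -> float:
    """Cost of (7) for one formula, assuming the form d + sum(mu) / (nu * N) - eps."""
    n = len(robustness_values)
    total_slack = sum(slack(r, eps) for r in robustness_values)
    return tightness + total_slack / (nu * n) - eps


# Two outputs: one comfortably satisfies the formula, one sits on its boundary.
print(objective([1.0, 0.0], eps=0.5, nu=0.1, tightness=0.2))
```

A smaller `nu` makes each slack unit more expensive, which matches the assumption that attacks, and hence outputs allowed to violate the formula, are rare.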
Solving (7) requires searching over the set of continuous variables ($\theta$ and $\epsilon$) as well as over the discrete set of iPSTL formula structures (the structure $\varphi$ of $\phi_\theta$). We showed
in our previous work [6] how the set of reactive PSTL
(rPSTL) formulae may be efficiently searched and developed
an algorithm to solve the supervised learning problem by
iterating between the discrete structure search and continuous
variable optimization via simulated annealing. Algorithm 1
is an adaptation of this procedure to solve (7).
Algorithm 1 begins by organizing all of the iPSTL formulae of length 1, e.g., the set of all formulae $O_{[\tau_1,\tau_2)}(x_i \sim \pi)$ where $O \in \{F, G\}$, $x_i \in V$, and $\sim \in \{<, \geq\}$, into a DAG
(DAGInitialization). The set of formulae in the graph is then
organized into a ranked list (ListInitialization). The ranking
of the formulae in the list is generated randomly during
initialization, as no a priori knowledge of the fitness of the
formulae exists.
After the graph and list are initialized, the iterative learning procedure begins. The parameter estimation loop (lines 9-13) iterates over formulae structures $\varphi$ in the list List from lowest rank to highest rank (Line 10). ParameterEstimation uses simulated annealing to solve (7) to find the optimal values of $\theta$ and $\epsilon$ for each formula structure $\varphi$.
Algorithm 1 Anomaly detection algorithm. $L_{max}$ is the limit of the length of the mined formula. $V$ is the set of variables that can appear in rectangular predicates. $d$ is a tightness function. $\delta$ is an acceptable performance threshold.
1: function FindFormula($L_{max}$, $V$, $\{y_i\}_{i=1}^{N}$, $d$, $\delta$)
2:   for $i = 1$ to $L_{max}$ do
3:     if $i = 1$ then
4:       $G_1 \leftarrow$ DAGInitialization($V$);
5:       List $\leftarrow$ ListInitialization($G_1$);
6:     else
7:       $G_i \leftarrow$ PruningAndGrowing($G_{i-1}$);
8:       List $\leftarrow$ Ranking($G_i \setminus G_{i-1}$,);
9:     while List $\neq \emptyset$ do
10:      $\varphi \leftarrow$ List.pop();
11:      ($\theta$, cost, $\epsilon$) $\leftarrow$ ParameterEstimation($\{y_i\}_{i=1}^{N}$, $\varphi$, $d$);
12:      if cost $\leq \delta$ then
13:        return ($\varphi$, $\theta$).
14:  return MinimumCostNode($G_{L_{max}}$);
The ParameterEstimation procedure uses the heuristic tightness function $d$ when calculating the objective function in (7). In this paper, we use the heuristic given by Algorithm 2. The subroutine Normalize normalizes the value of a parameter to $[0, 1]$. For each linear predicate in $\varphi$, Tightness penalizes the size of the parameter $\tau_1$, as for monitoring purposes we prefer formulae that describe behaviors of the early parts of the system's outputs. If $\sim$ is $<$, the size of $\pi$ is penalized, as $L(x_i < \pi)$ grows with $\pi$. If $\sim$ is $\geq$, then small values of $\pi$ are penalized. The sum of the penalized quantities is normalized over the interval $[0, 1]$ so that the magnitude of the output remains invariant with the length $\|\varphi\|$. This value is then multiplied by a constant $\lambda$ that takes into account the total number of trajectories and the ranges of $\theta$.
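Algorithm 2 Tightness Function
1: function Tightness($\theta$, $\varphi$)
2:   $k = 0$
3:   for all $(\tau_1, \tau_2, \pi) \in \theta$ such that $O_{[\tau_1,\tau_2)}(x_i \sim \pi)$ in $\varphi$ do
4:     tightness[$k$] = Normalize($\tau_1$); $k$++;
5:     if $\sim$ is $<$ then
6:       tightness[$k$] = Normalize($\pi$); $k$++;
7:     else
8:       tightness[$k$] = 1 - Normalize($\pi$); $k$++;
9:   return $\lambda \, \frac{\sum_k \text{tightness}[k]}{\text{length(tightness)}}$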
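A runnable reading of Algorithm 2 might look like the sketch below. The paper does not say how Normalize maps a parameter to $[0, 1]$, so min-max scaling over user-supplied ranges is assumed here; the argument names `terms`, `lam`, `t_range`, and `pi_range` are likewise hypothetical.

```python
from typing import List, Sequence, Tuple

Term = Tuple[float, float, float, str]  # (tau1, tau2, pi, op) with op in {"<", ">="}


def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max scaling to [0, 1]; an assumed form of the Normalize subroutine."""
    return (value - lo) / (hi - lo)


def tightness(terms: Sequence[Term], lam: float,
              t_range: Tuple[float, float], pi_range: Tuple[float, float]) -> float:
    """Average penalty over all temporal terms of a formula, scaled by lam."""
    penalties: List[float] = []
    for tau1, _tau2, pi, op in terms:
        # Late start times are penalized: early behavior is preferred for monitoring.
        penalties.append(normalize(tau1, *t_range))
        scaled = normalize(pi, *pi_range)
        # "x < pi" is looser for large pi; "x >= pi" is looser for small pi.
        penalties.append(scaled if op == "<" else 1.0 - scaled)
    return lam * sum(penalties) / len(penalties)


print(tightness([(1.0, 2.0, 0.5, "<")], lam=2.0, t_range=(0.0, 4.0), pi_range=(0.0, 1.0)))
```

Dividing by the number of penalties keeps the value comparable between short and long formulae, which is the normalization the text describes.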
If the optimal cost cost from ParameterEstimation is small enough (less than $\delta$), the current formula is returned. If no acceptable solution is found, the set of iPSTL formulae is searched (Lines 6-8). At the $i$th iteration, PruningAndGrowing uses logical rules to grow the graph of searched formulae to include iPSTL formulae of length $i+1$. This function also prunes a constant proportion of formulae of length $i$ whose optimal cost found by parameter estimation was too high. This prevents the algorithm from searching over formulae that we can assume have poor performance. After the graph is grown, the formulae of length $i+1$ are organized into a list ranked according to the performance of their parents in the DAG. When the formulae are iterated over in Lines 9-13, the formulae with lowest rank (those whose parents performed best) are considered first. This iterative process of parameter estimation and formula structure search continues until either a formula with low enough cost is found or the length $L_{max}$ is reached.
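The alternation between structure search and parameter estimation can be sketched as a generic loop. This is a simplified skeleton under stated assumptions, not the authors' implementation: the DAG is abstracted into a caller-supplied `grow` function, `estimate` stands in for ParameterEstimation, and `keep_fraction` is a hypothetical pruning proportion.

```python
from typing import Callable, Hashable, List, Optional, Tuple

Scored = Tuple[float, Hashable, object]  # (cost, structure, valuation)


def find_formula(l_max: int,
                 initial: List[Hashable],
                 grow: Callable[[List[Scored]], List[Hashable]],
                 estimate: Callable[[Hashable], Tuple[object, float]],
                 delta: float,
                 keep_fraction: float = 0.5) -> Tuple[Hashable, object, float]:
    """Return (structure, valuation, cost) of the first acceptable or best formula."""
    best: Optional[Scored] = None
    candidates = list(initial)  # length-1 structures in arbitrary order
    for length in range(1, l_max + 1):
        scored: List[Scored] = []
        for structure in candidates:
            theta, cost = estimate(structure)  # continuous optimization step
            if cost <= delta:
                return structure, theta, cost  # good enough, stop early
            scored.append((cost, structure, theta))
            if best is None or cost < best[0]:
                best = (cost, structure, theta)
        if not scored or length == l_max:
            break
        scored.sort(key=lambda item: item[0])  # best parents first
        kept = scored[:max(1, int(len(scored) * keep_fraction))]  # prune the rest
        candidates = grow(kept)  # children are visited in their parents' order
    if best is None:
        raise ValueError("no candidate structures supplied")
    return best[1], best[2], best[0]


# Toy usage: structures are strings and costs come from a lookup table.
costs = {"a": 3.0, "b": 2.0, "ab": 0.5, "bb": 1.5}
result = find_formula(2, ["a", "b"],
                      grow=lambda kept: [item[1] + "b" for item in kept],
                      estimate=lambda s: (None, costs.get(s, 9.0)),
                      delta=1.0, keep_fraction=1.0)
print(result)
```

If no structure reaches the threshold `delta`, the loop falls back to the lowest-cost structure seen, which plays the role of MinimumCostNode in Algorithm 1.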
V. CASE STUDIES
A. Linear System
1) Model: We first test our implementation of Algorithm 1 on a system $S_T$ whose dynamics evolve according to
$$\dot{x} = 0.03x + w \qquad (9)$$
under normal operation or
$$\dot{x} = -0.03x + w \qquad (10)$$
under attacked operation, where $w$ is a white noise process with variance 0.025. For simplicity, $y(t) = x(t)$.
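Trajectories of this kind can be generated with a simple Euler-Maruyama scheme, sketched below. Several details are assumptions of the sketch rather than facts from the paper: the initial condition `x0 = 1.0`, the step `dt`, the way the noise variance is scaled per step, and the sign convention that the normal system grows at rate 0.03 while the attacked system decays at the same rate.

```python
import math
import random
from typing import List, Optional


def simulate(rate: float, t_end: float = 3.0, dt: float = 0.01, x0: float = 1.0,
             noise_var: float = 0.025, seed: Optional[int] = None) -> List[float]:
    """Euler-Maruyama samples of dx/dt = rate * x + w on [0, t_end]."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(round(t_end / dt)):
        x = xs[-1]
        # Assumed discretization: per-step noise has variance noise_var * dt.
        dw = rng.gauss(0.0, math.sqrt(noise_var * dt))
        xs.append(x + rate * x * dt + dw)
    return xs


normal = simulate(0.03, seed=1)     # rate assumed for normal operation
attacked = simulate(-0.03, seed=1)  # rate assumed for attacked operation
print(len(normal), len(attacked))
```

Setting `noise_var=0.0` gives the deterministic trend, which is handy for checking that the two modes drift in opposite directions.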
2) Formula inference: We generated 200 different trajectories of $S_T$, shown in Fig. 3(a). We ran our inference algorithm on a training set of 100 of the trajectories, 4 of which represented an attack (red), and reserved the other 100 (with 7 attacks) for testing the output of Algorithm 1.
From the training set, our implementation of Algorithm 1 inferred the formula
$$F_{[0,3.0)}(G_{[0.5,2.0)}(y > 0.9634)). \qquad (11)$$
In plain English, this is “At some point within the first 3.0 s, the output $y$ exceeds 0.9634 at every point between 0.5 and 2.0 s in the future”. We used 3 simulated annealing cycles with 15 samples per cycle. The computation time was 130 s on an 8 core PC with 2.1 GHz processors and 8 GB RAM.
The threshold 0.9634 from formula (11) is indicated with a blue line in Fig. 3(a). The formula (11) successfully separates the normal and attacked outputs, i.e. the misclassification rate for the training set was 0 and the test set had a misclassification rate of 0.01. The single missed attack had a robustness degree of 0.0018 with respect to the inferred formula, meaning it was “barely” missed.
3) Monitoring: Fig. 3(b) shows the robustness degree of each $y_i$ at time $t$ with respect to the parameterized formula
$$\phi(t) = F_{[0,t)}(G_{[0.5,\min(t,2.0))}(y > 0.9634)) \qquad (12)$$
for $t > 0.1$. This serves as a rudimentary on-line monitor by quantifying how close an output signal $y_i$ observed up to time $t$ is to satisfying or violating the part of (11) that can be checked up to time $t$. It can be seen that all of the outputs initially have positive robustness, meaning they do not yet violate the formula. However, while the normal trajectories become more positively robust with respect to $\phi(t)$ over time, the robustness of the attack trajectories steadily declines.
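One possible way to evaluate such a monitor on a sampled prefix is sketched below. It is an assumption-laden illustration: only samples observed before time `t` are used, inner windows are clipped to the observed prefix, and the function returns `None` while no inner window contains any sample, a convention chosen here because the paper does not specify how the earliest instants are handled.

```python
from typing import Optional, Sequence


def monitor(y: Sequence[float], dt: float, t: float, c: float = 0.9634,
            a: float = 0.5, b: float = 2.0) -> Optional[float]:
    """Robustness of F_[0,t)(G_[a,min(t,b))(y > c)) on the samples seen before t."""
    n = min(round(t / dt), len(y))  # number of samples observed so far
    width = round(min(t, b) / dt)   # upper end of the inner window, in samples
    best: Optional[float] = None
    for s in range(n):              # outer "finally" over start times in [0, t)
        lo = s + round(a / dt)
        hi = min(s + width, n)      # inner "globally" window, clipped to the prefix
        if lo < hi:
            worst = min(y[j] - c for j in range(lo, hi))
            best = worst if best is None else max(best, worst)
    return best


steady = [1.0] * 301                # a flat output sampled every 0.01 s
print(monitor(steady, 0.01, 3.0))   # margin above the threshold
print(monitor(steady, 0.01, 0.3))   # too early for any verdict
```

Calling `monitor` repeatedly with increasing `t` yields one robustness curve per output, which is the kind of trace plotted in a monitor figure.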
[Figure 3: (a) panel titled “Linear systems”, plotting y(t) against t; (b) panel titled “Monitor”, plotting r(y, φ(t)) against t.]
Fig. 3. (a) Outputs of system (9) (green) and (10) (red). The blue line indicates the threshold 0.9634 used in the inferred formula (11). (b) A simple monitor of the formula $\phi(t)$ given in (12).
B. Braked Train
1) Model: In this section, we apply our algorithm to the train braking scenario [10] used in Example 1. We model the train as a classical hybrid automaton [14]. A hybrid automaton produces continuous trajectories $x : \mathbb{R}^+ \to X \subseteq \mathbb{R}^n$. A trajectory $x$ evolves according to dynamics which depend on the current discrete mode $q \in Q$ (denoted by a vertex of a graph) of the automaton. The mode of the system changes (denoted by edges of a graph) if a guard condition over the state of the system is satisfied. During a transition, the state of the system may change discontinuously according to a reset relation. Here we denote guards in black text and reset relations in red text.
The hybrid automaton $H_T$ which describes the total model of the train consists of 3 identical braking subsystems with modes $Q_{b_k} = \{q_{b_k,j}\}_{j=1}^{5}$ that describe the state of each brake and a velocity subsystem with modes $Q_v = \{q_{v,j}\}_{j=1}^{3}$ which describes the dynamics of the train's velocity. The subsystem associated with brake 1 is shown in Fig. 4(a). The noise processes $n_1, \ldots, n_5$ are Gaussian processes with variances 1, 0.1, 0.3, 3, and 3, respectively. The brake remains in mode $q_{b_1,1}$ during acceleration until $v(t)$ exceeds a threshold $v_{max}$. At this point, the system transitions to the delay mode $q_{b_1,2}$ before moving to the braked mode $q_{b_1,3}$. After the velocity is decreased below $v_{min}$, the system transitions to a second delay state $q_{b_1,4}$ before returning to mode $q_{b_1,1}$. An adversary can disable the brakes (denoted by the exogenous event attack1), which forces the system to transition from $q_{b_1,2}$ to a failure mode $q_{b_1,5}$. Brakes in the other two cars can similarly be disabled.
The velocity subsystem is shown in Fig. 4(b). The velocity of the train begins in mode $q_{v,1}$ (blue in Figs. 1 and 2) and accelerates to $v_{max}$. The dynamics of the train then shift to the higher velocity mode $q_{v,2}$ (green). Once at least one brake engages, the system transitions to a decelerating mode (black).
2) Formula inference: We used the model given in the previous section to generate 50 outputs. 43 of the trajectories were from normal operation and 7 were from an attacked operation. We only considered attacks in which all of the brakes were disabled ($b = 3$). Our algorithm inferred the formula
$$F_{[0,100)}(F_{[10,69)}(y < 24.9) \wedge F_{[13.9,44.2)}(y > 17.66)). \qquad (13)$$
[Figure 4: panels (a) and (b).]
Fig. 4. (a) ECP braking subsystem of the first car in the train. (b) Velocity subsystem of the entire train.
[Figure 5: plot titled “Monitor”, showing r(y, φ(t)) against t for t from 0 to 100.]
Fig. 5. On-line monitoring of train output with respect to (14).
In plain English, (13) means “At some point, the output dips below 24.9 m/s between 10 s and 69 s in the future and exceeds 17.66 m/s between 13.9 s and 44.2 s in the future.” This is consistent with the oscillatory nature of the velocity output under normal behavior, as the velocity must increase and decrease over a window.
The formula (13) perfectly separates the data, i.e. the
misclassification rate is 0. The formula was inferred using 15
simulated annealing cycles with 15 sample points per cycle.
The computation time was 154 s on the same PC as described
in Section V-A.
3) Monitoring: Fig. 5 shows the robustness degree of the train's output signal with respect to
$$\phi(t) = F_{[0,t)}(F_{[10,\min(t,69))}(y < 24.9) \wedge F_{[13.9,\min(t,44.2))}(y > 17.66)). \qquad (14)$$
The robustness of the normal outputs is shown in green and the robustness of the attacked outputs is shown in red. As Fig. 5 shows, many of the normal outputs initially have negative robustness. However, as time goes on, the robustness measure improves for all of the normal outputs. In contrast, the robustness of most of the attacked outputs remains low and worsens over time. By time $t = 40$, all but two of the attacked outputs are clearly separated from the normal outputs.
VI. CONCLUSION
In this paper, we consider a general framework for
anomaly detection for cyber-physical systems security. In
place of using classical anomaly detection tools, we apply
a formal methods approach to the problem. We designed
and implemented an algorithm which is able to infer a data
classifier in the form of a signal temporal logic formula
from unlabeled data. The inferred formula can be interpreted
in natural language and can be used in the future for on-
line monitoring. We demonstrated our approach using two
case studies, including a model of a train under attack. Our
approach was able to classify the attacked and normal outputs
for both case studies with low misclassification rates. Further,
we used the formula to test a simple on-line monitor. Results
indicate that the monitors provide early warning for systems
under attack.
ACKNOWLEDGMENT
This work was partially supported by ONR under grants ONR MURI N00014-10-10952, ONR MURI N00014-09-1051 and ONR N00014-14-1-0554, and supported by NSF under grant NSF CNS-1035588.
REFERENCES
[1] J. Slay and M. Miller, Lessons learned from the maroochy water
breach. Springer, 2007.
[2] F. Pasqualetti, F. Dorfler, and F. Bullo, “Cyber-physical attacks in
power networks: Models, fundamental limitations and monitor design,
in Decision and Control and European Control Conference (CDC-
ECC), 2011 50th IEEE Conference on. IEEE, 2011, pp. 2195–2201.
[3] A. Teixeira, D. P´
erez, H. Sandberg, and K. H. Johansson, “Attack
models and scenarios for networked control systems,” in Proceedings
of the 1st international conference on High Confidence Networked
Systems. ACM, 2012, pp. 55–64.
[4] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A
survey,” ACM Computing Surveys (CSUR), vol. 41, no. 3, p. 15, 2009.
[5] K. Kowalska and L. Peel, “Maritime anomaly detection using gaussian
process active learning,” in Information Fusion (FUSION), 2012 15th
International Conference on. IEEE, 2012, pp. 1164–1171.
[6] Z. Kong, A. Jones, A. Medina Ayala, E. Aydin Gol, and C. Belta,
“Temporal logic inference for classification and prediction from data,
in The 17th International Conference on Hybrid Systems: Computation
and Control (HSCC), Berline, Germany, 2014.
[7] X. Jin, A. Donze, J. Deshmukh, and S. Seshia, “Mining requirements
from closed-loop control models,” in Hybrid Systems: Computation
and Control (HSCC), 2013.
[8] G. E. Fainekos and G. J. Pappas, “Robustness of temporal logic spec-
ifications for continuous-time signals,” Theoretical Computer Science,
vol. 410, no. 42, pp. 4262–4291, 2009.
[9] A. Donz´
e and O. Maler, “Robust satisfaction of temporal logic over
real-valued signals,” in Formal Modeling and Analysis of Timed
Systems. Springer, 2010, pp. 92–106.
[10] A. P. Sistla, M. ˇ
Zefran, and Y. Feng, “Monitorability of stochastic
dynamical systems,” in Computer Aided Verification. Springer, 2011,
pp. 720–736.
[11] O. Maler and D. Nickovic, “Monitoring temporal properties of
continuous signals,” Formal Techniques, Modelling and Analysis of Timed
and Fault-Tolerant Systems, pp. 71–76, 2004.
[12] E. Asarin, A. Donzé, O. Maler, and D. Nickovic, “Parametric
identification of temporal properties,” in Runtime Verification. Springer,
2012, pp. 147–160.
[13] H. J. Shin, D.-H. Eom, and S.-S. Kim, “One-class support vector
machines: an application in machine fault detection and classification,”
Computers & Industrial Engineering, vol. 48, no. 2, pp. 395–408,
2005.
[14] J. Lygeros, K. Johansson, S. Sastry, and M. Egerstedt, “On the
existence of executions of hybrid automata,” in Decision and Control,
1999. Proceedings of the 38th IEEE Conference on, vol. 3, 1999, pp.
2249–2254.