
Optimal detection and error exponents for hidden multi-state processes via random duration model approach


Dragana Bajović¹, Kanghang He², Lina Stanković², Dejan Vukobratović¹, and Vladimir Stanković²
Abstract. We study detection of random signals corrupted by noise that over time switch their values
(states) from a finite set of possible values, where the switchings occur at unknown points in time.
We model such signals by means of a random duration model that to each possible state assigns a
probability mass function which controls the statistics of durations of that state occurrences. Assuming
two possible signal states and Gaussian noise, we derive optimal likelihood ratio test and show that it
has a computationally tractable form of a matrix product, with the number of matrices involved in the
product being the number of process observations. Each matrix involved in the product is of dimension
equal to the sum of durations spreads of the two states, and it can be decomposed as a product of
a diagonal random matrix controlled by the process observations and a sparse constant matrix which
governs transitions in the sequence of states. Using this result, we show that the Neyman-Pearson error
exponent is equal to the top Lyapunov exponent for the corresponding random matrices. Using theory
of large deviations, we derive a lower bound on the error exponent. Finally, we show that this bound is
tight by means of numerical simulations.
Keywords. Multi-state processes, random duration model, hypothesis testing, error exponent, large deviations principle, threshold effect, Lyapunov exponent.
¹D. Bajović and D. Vukobratović are with the Department of Power, Electronic and Communications Engineering, Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia (e-mail: {dbajovic,}).
²K. He, L. Stanković and V. Stanković are with the Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, G1 1XW, UK (e-mail: {kanghang.he, vladimir.stankovic, lina.stankovic}).
December 27, 2017 DRAFT
arXiv:1712.09061v1 [cs.IT] 25 Dec 2017
The problem of detecting a signal hidden in noise is investigated. The signal to be detected is
characterised as having a constant magnitude in any one state and can transition to multiple states over
time. Each occurrence of a particular state has a random duration, modelled as a discrete random variable
which takes values in a finite set of integers, according to a certain probability mass function associated
with that state. For each given state, the durations of its occurrences over time are independent and identically
distributed random variables, independent of the durations of other states.
Our main motivation for studying the described model comes from the non-intrusive appliance load
monitoring (NILM) problem, i.e., detecting one or more particular appliance states, each of unknown
duration, within an aggregate power signal, as obtained from smart meters. With the large-scale roll-out
of smart meters worldwide, there has been increased interest in NILM, i.e., disaggregating total household
energy consumption measured by the smart meter down to appliance level using purely software tools
[1]. NILM can enrich energy feedback and support smart home automation [2], appliance retrofit
decisions, and demand response measures [3].
Despite significant research efforts in developing efficient NILM algorithms (see [3], [4], [5], [6], [7] and
references therein), NILM is still a challenge, especially at low sampling rates, on the order of seconds and
minutes. One obstacle is the lack of standardised performance measures and of appropriate theoretical bounds
on the detectability of appliance usage, which could help estimate the performance of various algorithms. A
particularly challenging problem is the detection of multi-state appliances, i.e., appliances whose power
consumption switches over one appliance runtime through several different values. Examples of such
appliances are a dish-washer or a washing machine, where the chosen program or setting and possibly
also the appliance load (e.g., with the washing machine) determines duration that the appliance spends
in each state. The difficulty there arises from the fact that the program and the load, unknown from the
perspective of NILM, are non-deterministic, i.e., they vary each time the same appliance is run, which makes
it difficult to detect which state the appliance is in. The aggregate signal minus the appliance load is
considered noise for the detection problem.
The above model is also representative of signals occurring in a range of other applications. In
econometrics, examples of duration signals include marital or employment status, or in general the time
an individual spends in a certain state [8]. Further examples from econometrics are time to currency
alignment or time to transactions in stock market [9]. In communication systems theory, pulse-duration
modulated (PDM) signals, which transmit information encoded into the pulse duration, have two possible
signal states: the positive-value state is a pulse whose duration is proportional to the information symbol
to be encoded, and the zero-value state occurs in between any two pulses. The probability distribution of the
state duration is then controlled by the probability distribution on the set of information symbols to
be transmitted. Further binary state examples are random telegraph signals, where the signal switches
between two values in a random manner¹, and the activity pattern of a certain mobile user in a cellular
communication system.
In this paper, we are interested in deriving optimal detection tests for detecting multi-state signals
with random duration structure hiding in noise. We consider binary models, where occurrences of two
possible states are interleaved in time. Further, we are interested in characterizing the performance of optimal
detection tests measured in terms of the Neyman-Pearson error exponent. Works on detecting multi-state
signals hidden in noise that are most closely related to our work include [10], [12] and [13]. However, in contrast to the
random duration model that we propose, these references model multi-state signals in noise as hidden
Markov chains. Reference [10] considers random telegraph signals modelled as binary Markov chains
and derives the corresponding optimal detection test in the form of a product of certain measurement
defined matrices. Reference [12] considers detection of a random walk on a graph, and derives bounds
on the error exponent for the Neyman-Pearson detection test. Reference [13] uses the method of types to
generalize the results from [12] to non-homogeneous setting where different nodes have different signal-
to-noise ratios (SNR) with respect to the walk. Furthermore, reference [13] proves that the derived bound
on the error exponent has a convex optimization form.
Contributions. In this paper, we show that the optimal detection test, seemingly combinatorial in nature,
admits a simple, linear recursion form of a product of matrices of dimension equal to the sum of the
duration spreads for the two states. Using the preceding result, we show that the Neyman-Pearson error
exponent for this problem is given by the top Lyapunov exponent [14] for the matrices that define
the recursion. The matrices have a structure of an interleaved random diagonal and (sparse) constant
component that defines transitions from one state pattern to another. Thus, we reveal that a similar
structural effect as with the error exponent for hidden Markov processes occurs here as well [10],[13].
Finally, using the theory of large deviations [15], we derive a lower bound on the error exponent and
demonstrate by numerical simulations that the derived bound is very close to the true error exponent.
Paper outline. Section II states the problem setup and Section III gives the preliminaries. Section IV gives the main results on the form of the optimal likelihood ratio test. Section V provides the lower bound on the error exponent, while Section VI proves this result. Finally, numerical results are given in Section VII and Section VIII concludes the paper.
¹We remark that there are other stochastic models in the literature for the random telegraph signal, e.g., the Poisson model, or the hidden Markov chain model [10], [11].
Notation. For an arbitrary integer $n$, $S^{n-1}$ denotes the probability simplex in $\mathbb{R}^n$; $e_1$ denotes the first canonical vector (the $n$-dimensional vector with $1$ only in the first position and zeros in all other positions), and $\mathbf{1}$ the vector of all ones, where we remark that the dimension should be clear from the context; $A_0$ denotes the lower shift matrix (the $0/1$ matrix with ones only on the first subdiagonal). We denote the Gaussian distribution of mean value $\mu$ and standard deviation $\sigma$ by $\mathcal{N}(\mu, \sigma^2)$; by $p[1, n]$ an arbitrary distribution over the first $n$ integers; by $\mathcal{U}[1, n]$ the uniform distribution over the first $n$ integers; $\log$ denotes the natural logarithm.
We consider the problem of detecting a signal corrupted by noise that randomly switches from one state $m$ to another, where $m = 1, 2, \ldots, M$, and in each state the signal has a certain magnitude $\mu_m$. The duration that the signal spends in a given state $m$ is modelled as a discrete random variable on a given support set $[1, \Delta_m]$, with a certain probability mass function (pmf) defined by the vector $p_m \in S^{\Delta_m - 1}$. In this work, we consider the case $M = 2$ and we assume that for each state $m$ we know the corresponding value of the observed signal $\mu_m$. Without loss of generality, we assume that $\mu_2 > \mu_1 \geq 0$. For each sampling time $t = 1, 2, \ldots$, let $S^t = \{S_1, \ldots, S_t\}$ denote the sequence of states until time $t$ of the signal that we wish to detect, where $S_k \in \{1, 2\}$ for each $k = 1, \ldots, t$; similarly, we denote $S = \{S_1, S_2, \ldots\}$. We assume that, with probability one, the first state is $S_1 \equiv 1$, and, for the purpose of analysis, we set $S_0 \equiv 2$. Let $X_k$ denote the signal measurement at sample time $k$, $k = 1, \ldots, t$, and, for each $t$, collect all measurements up to time $t$ in the vector $X^t = (X_1, \ldots, X_t)$. We assume that each measurement is corrupted by zero-mean additive Gaussian noise $\mathcal{N}(0, \sigma^2)$, with standard deviation $\sigma > 0$.
The sequence of switching times. For the sequence of states $S_1, S_2, \ldots$, we define the sequence of times $\{T_1, T_2, \ldots\}$ at which the signal switches from one state to another, i.e.,
$$T_{i+1} = \max\{k \geq T_i + 1 : S_k = S_{T_i + 1}\}, \quad \text{for } i = 0, 1, 2, \ldots \tag{1}$$
where we set $T_0 \equiv 0$; that is, $T_{i+1}$ is the last time index of the run of equal states that starts at time $T_i + 1$. We call each time window $[T_i + 1, T_{i+1}]$, $i = 0, 1, 2, \ldots$, a phase, and note that during any phase the sequence $S$ stays in the same state. Since $S_1 \equiv 1$, all odd-numbered intervals $[T_0 + 1, T_1], [T_2 + 1, T_3], \ldots$, where the ordering is with respect to the order of appearance, are state-1 phases, and all even-numbered intervals $[T_1 + 1, T_2], [T_3 + 1, T_4], \ldots$ are state-2 phases.
Random duration model. For $n = 1, 2, \ldots$, we denote by $D_{1,n}$ the difference process
$$D_{1,n} = T_{2n-1} - T_{2n-2}, \tag{2}$$
or, in words, for each $n$, $D_{1,n}$ is the duration of the $n$-th state-1 phase in the sequence $S$. We assume that the durations of state-1 phases are independent and identically distributed (i.i.d.), with support equal to the set of all integers in the finite interval $[1, \Delta_1]$, and with pmf given by the vector $p_1 = (p_{11}, p_{12}, \ldots, p_{1\Delta_1}) \in S^{\Delta_1 - 1}$. Similarly, we define
$$D_{2,n} = T_{2n} - T_{2n-1} \tag{3}$$
to be the duration of the $n$-th state-2 phase in the sequence $S_1, S_2, \ldots$, for $n = 1, 2, \ldots$; we assume that the $D_{2,n}$'s are i.i.d., with support equal to the set of all integers in the interval $[1, \Delta_2]$, and with pmf given by the vector $p_2 = (p_{21}, p_{22}, \ldots, p_{2\Delta_2}) \in S^{\Delta_2 - 1}$. We also assume that the durations of state-1 and state-2 phases are mutually independent.
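Hypothesis testing problem. Using the preceding definitions, we model the signal detection problem as the following binary hypothesis testing problem:
$$\begin{aligned} H_0 &: X_k \sim \mathcal{N}(0, \sigma^2) \\ H_1 &: X_k \sim \begin{cases} \mathcal{N}(\mu_1, \sigma^2), & \text{if } S_k = 1 \\ \mathcal{N}(\mu_2, \sigma^2), & \text{if } S_k = 2 \end{cases}, \quad \text{for } k = 1, \ldots, t, \end{aligned} \tag{4}$$
where $D_{1,n} \sim p_1[1, \Delta_1]$ are i.i.d., $D_{2,n} \sim p_2[1, \Delta_2]$ are i.i.d., the $D_{1,n}$'s and $D_{2,n}$'s are independent, and $S_1 \equiv 1$. We remark that the model above easily generalizes to the case when the signals $X_k$ are, under both hypotheses, shifted by some $\mu_0 \in \mathbb{R}$, i.e., when, under $H = H_0$, $X_k \sim \mathcal{N}(\mu_0, \sigma^2)$ and, under $H = H_1$, $X_k \sim \mathcal{N}(\mu_{S_k} + \mu_0, \sigma^2)$; see the example of the appliance detection problem later in this section. The latter hypothesis testing problem reduces to the one in (4) by means of the change of variables $Y_k = X_k - \mu_0$.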
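The generative model under $H_1$ can be sketched in a few lines of code. The following is a minimal sketch and not the authors' simulation code: the function names `sample_states` and `sample_observations`, the use of NumPy, and the example parameter values are assumptions made purely for illustration.

```python
import numpy as np

def sample_states(t, p1, p2, rng):
    """Draw S_1..S_t: alternating state-1 / state-2 phases with i.i.d. durations."""
    pmfs = {1: np.asarray(p1, float), 2: np.asarray(p2, float)}
    states, m = [], 1                      # the first phase is a state-1 phase
    while len(states) < t:
        # duration D in {1, ..., Delta_m}, drawn from the pmf of the current state
        d = int(rng.choice(len(pmfs[m]), p=pmfs[m])) + 1
        states.extend([m] * d)
        m = 3 - m                          # switch 1 <-> 2
    return np.array(states[:t])            # truncate the last phase at time t

def sample_observations(states, mu1, mu2, sigma, rng):
    """X_k ~ N(mu_{S_k}, sigma^2), independently across k."""
    means = np.where(states == 1, mu1, mu2)
    return means + sigma * rng.standard_normal(len(states))

rng = np.random.default_rng(0)
s = sample_states(20, [0.5, 0.5], [0.2, 0.3, 0.5], rng)   # Delta_1 = 2, Delta_2 = 3
x = sample_observations(s, mu1=2.0, mu2=5.0, sigma=1.0, rng=rng)
```

Observations under $H_0$ are obtained in the same sketch by drawing `sigma * rng.standard_normal(t)` directly, without any state sequence.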
Illustration: Multiphase appliance detection. Suppose that we wish to detect the event that a certain appliance in a household is switched on. We consider classes of appliances whose signature signals exhibit a multistate (multiphase) type of behavior, such as switching from high to low signal values, where the durations of phases of the same signal level can differ within a single appliance run-time and also across different run-times of the same appliance. Examples of appliances whose signatures fall into this class are a dishwasher and a washer-dryer. This problem can be modelled by the hypothesis testing problem (4), where $\mu_1$ corresponds to the appliance consumption in the low state and $\mu_2$ corresponds to the appliance consumption in the high state. In this scenario, there is an underlying baseline load which can also be modelled as a Gaussian random variable of expected value $\mu_0$ and variance $\sigma^2$. Since the same baseline load is present both under $H_0$ and $H_1$, to cast the described appliance detection problem in the format given in (4), we simply subtract the value $\mu_0$ from the observed consumption signal $X_k$.
Likelihood ratio test and Neyman-Pearson error exponent. We denote the probability laws corresponding to $H_0$ and $H_1$ by $P_0$ and $P_1$, respectively. Similarly, the expectations with respect to $P_0$ and $P_1$ are denoted by $E_0$ and $E_1$, respectively. The probability density functions of $X^t$ under $H_1$ and $H_0$ are denoted by $f_{1,t}(\cdot)$ and $f_{0,t}(\cdot)$. It will also be of interest to introduce the conditional probability density function of $X^t$ given $S^t = s^t$ (i.e., the likelihood function), which we denote by $f_{1,t|S^t}(\cdot|s^t)$, for any $s^t$. Finally, the likelihood ratio at time $t$ is denoted by $L_t$, and at a given realization of $X^t$ it is computed by
$$L_t(X^t) = \frac{f_{1,t}(X^t)}{f_{0,t}(X^t)}.$$
It is well known that the optimal detection test (both in the Neyman-Pearson and in the Bayes sense) for problem (4) is the likelihood ratio test. Conditioning on the state realizations until time $t$, $S^t = s^t$, and denoting for short $P(s^t) = P_1(S^t = s^t)$, we have
$$L_t(X^t) = \sum_{s^t \in \mathcal{S}^t} P(s^t)\, \frac{\prod_{k=1}^{t} \frac{1}{\sqrt{2\pi}\sigma}\, e^{-\frac{(\mu_{s_k} - X_k)^2}{2\sigma^2}}}{\prod_{k=1}^{t} \frac{1}{\sqrt{2\pi}\sigma}\, e^{-\frac{X_k^2}{2\sigma^2}}}. \tag{5}$$
In this paper our goal is to find a computationally tractable form for the optimal likelihood ratio test and also to characterize its asymptotic performance when the number of samples $X_k$ grows large. In particular, with respect to performance characterization, we wish to compute the error exponent for the probability of a miss, under a given bound $\alpha$ on the probability of false alarm:
$$\lim_{t \to +\infty} -\frac{1}{t} \log P^{\alpha}_{\mathrm{miss},t} =: \zeta, \tag{6}$$
where $P^{\alpha}_{\mathrm{miss},t}$ is the minimal probability of a miss among all decision tests that have probability of false alarm bounded by $\alpha$. By results from detection theory, e.g., [16], [17], the $\zeta$ in (6) is given by the asymptotic Kullback-Leibler rate in (7), provided that this limit exists (with $X^t$ generated under $H_0$):
$$\zeta = \lim_{t \to +\infty} -\frac{1}{t} \log L_t(X^t). \tag{7}$$
We prove the existence of the limit in (7) in Lemma 7 in Section V further ahead. An illustration of the identity (6) is given in Figure 1, which shows that both sequences $-\frac{1}{t} \log P^{\alpha}_{\mathrm{miss},t}$ and $-\frac{1}{t} \log L_t(X^t)$ are convergent and, moreover, that they converge to the same value: the asymptotic Kullback-Leibler rate for the two hypotheses defined in (4). For further details on this simulation see Section VII.
Fig. 1: Simulation setup: $\Delta = 3$, $p_1, p_2 \sim \mathcal{U}([1, \Delta])$, $\mu_1 = 2$, $\mu_2 = 5$, $\sigma = 10$, $\alpha = 0.01$. The green full line plots the evolution of $-\frac{1}{t} \log L_t$; the blue dotted line plots the evolution of $-\frac{1}{t} \log P^{\alpha}_{\mathrm{miss},t}$; and the red dashed line plots the estimated slope of the probability of a miss values (in the logarithmic scale) calculated for values until $t = 300$ observations.
In this section we introduce a number of quantities related to the sequences $s^t \in \mathcal{S}^t$, $t = 1, 2, \ldots$, and give certain results pertaining to these quantities that will be useful for our analysis.
Statistics for the durations of phases. For each $t$, we define the sets of discrete times until time $t$ in which the signal was in states 1 and 2, which we respectively denote by $\mathcal{T}_1$ and $\mathcal{T}_2$:
$$\mathcal{T}_1(s^t) = \{1 \leq k \leq t : s_k = 1\}, \tag{8}$$
$$\mathcal{T}_2(s^t) = \{1 \leq k \leq t : s_k = 2\}. \tag{9}$$
We denote the cardinalities of $\mathcal{T}_1$ and $\mathcal{T}_2$ by $\tau_1$ and $\tau_2$, respectively, i.e., $\tau_1 \equiv |\mathcal{T}_1|$ and $\tau_2 \equiv |\mathcal{T}_2|$. Note that the functions $\mathcal{T}_1$ and $\mathcal{T}_2$ are, strictly speaking, dependent on time $t$ (this dependence is reflected in their domain sets $\mathcal{S}^t$, which clearly change with time $t$). However, for easier readability, we suppress this dependence in the notation, as we also do for all the subsequently defined quantities.
For each $t$ and each $s^t$, we also introduce $N_1$ and $N_2$ to count the number of state-1 and state-2 phases, respectively, in the sequence $s^t$:
$$N_1(s^t) = |\{1 \leq k \leq t : s_{k-1} = 2,\ s_k = 1\}| \tag{10}$$
$$N_2(s^t) = |\{1 \leq k \leq t : s_{k-1} = 1,\ s_k = 2\}|, \tag{11}$$
where, since the first phase is a state-1 phase, we set $s_0 \equiv 2$. We remark that, for any sequence $s^t$, if the last state is $s_t = 2$, then $N_1(s^t) = N_2(s^t)$, and if $s_t = 1$, then $N_1(s^t) = N_2(s^t) + 1$. Finally, $N(s^t)$ is the total number of phases in $s^t$, $N \equiv N_1 + N_2$.
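We further define the sets $\mathcal{T}_{mn}(s^t)$ that contain the time indices of the $n$-th state-$m$ phase, $n = 1, \ldots, N_m(s^t)$, $m = 1, 2$. Note that, for each $m = 1, 2$, $\bigcup_{n=1}^{N_m(s^t)} \mathcal{T}_{mn}(s^t) = \mathcal{T}_m$. We now increase the granularity of the counts $N_1$ and $N_2$ and define
$$N_{1d}(s^t) = \sum_{n=1}^{N_1(s^t)} 1_{\{|\mathcal{T}_{1n}| = d\}}(s^t), \quad \text{for } d = 1, \ldots, \Delta_1, \tag{12}$$
$$N_{2d}(s^t) = \sum_{n=1}^{N_2(s^t)} 1_{\{|\mathcal{T}_{2n}| = d\}}(s^t), \quad \text{for } d = 1, \ldots, \Delta_2; \tag{13}$$
i.e., in words, the vectors $(N_{m1}, \ldots, N_{m\Delta_m})$, $m = 1, 2$, represent histograms of the state-1 and state-2 phase durations. It is easy to see that $N_m = \sum_{d=1}^{\Delta_m} N_{md}$, for $m = 1, 2$. Also, for each time $t$ and each sequence $s^t$, the total number of state-1 and state-2 occurrences must sum up to $t$, and therefore $\sum_{d=1}^{\Delta_1} d\, N_{1d}(s^t) + \sum_{d=1}^{\Delta_2} d\, N_{2d}(s^t) = t$.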
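The counts $\tau_m$, $N_m$ and $N_{md}$ can be read off a state sequence by grouping it into runs of equal states. The sketch below is illustrative only; the helper name `phase_histograms` and the example sequence are assumptions, and the last phase is counted with the length observed up to time $t$.

```python
from itertools import groupby

def phase_histograms(s, delta):
    """Return (tau, N, Nd) for a state sequence s with entries in {1, 2}.

    tau[m]   : number of time indices spent in state m
    N[m]     : number of state-m phases
    Nd[m][d-1]: number of state-m phases of (observed) length d
    """
    tau = {1: 0, 2: 0}
    N = {1: 0, 2: 0}
    Nd = {1: [0] * delta, 2: [0] * delta}
    for m, run in groupby(s):              # consecutive equal states form one phase
        d = len(list(run))
        tau[m] += d
        N[m] += 1
        Nd[m][d - 1] += 1
    return tau, N, Nd

s = [1, 1, 2, 2, 2, 1, 2, 2, 1, 1]          # hypothetical sequence, t = 10
tau, N, Nd = phase_histograms(s, delta=3)

# sum_d d*N_1d + sum_d d*N_2d, which should equal t
total = sum((d + 1) * n for m in (1, 2) for d, n in enumerate(Nd[m]))
```

For this sequence the last state is 1, so the sketch also reproduces the remark that $N_1 = N_2 + 1$ in that case.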
Figure 2 shows an example of simulated signals under hypothesis $H_1$ with $\Delta = 10$, $\mu_1 = 3$, $\mu_2 = 5$ and $\sigma = 0.05$, generated using the random duration model, and marks the switching times $T$, the difference process durations $D_{k,i}$ and the numbers $N_{k,d}$ of state-$k$ phases of duration $d$. We can see from the figure that $D_{1,1} = T_1 - T_0 = 8$, in accordance with eq. (2), and that only one state-1 phase lasts for 8 samples; hence $N_{1,8} = 1$. Similarly, from eq. (3) and the figure, $D_{2,1} = T_2 - T_1 = 8$ and $D_{2,3} = T_6 - T_5 = 8$. Thus $N_{2,8} = 2$, since two state-2 phases last for 8 samples.
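To simplify the notation, let $o(s^t)$ return the duration of the last phase in the sequence $s^t$, and note also that $s_t$ returns the type of the last phase in $s^t$. The next lemma computes the probability of a given sequence $s^t$, $P(s^t) = P_1(S^t = s^t)$.
Lemma 1. For any sequence $s^t$, there holds
$$P(s^t) = \frac{p^+_{s_t o(s^t)}}{p_{s_t o(s^t)}} \prod_{d=1}^{\Delta_1} p_{1d}^{N_{1d}(s^t)} \prod_{d=1}^{\Delta_2} p_{2d}^{N_{2d}(s^t)}, \tag{14}$$
where by $p^+_{ml}$ we denote for short $p^+_{ml} = p_{ml} + p_{m\,l+1} + \ldots + p_{m\Delta_m}$, for $l = 1, 2, \ldots, \Delta_m$ and $m = 1, 2$.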
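The probability of a state sequence can be checked numerically by treating completed phases and the last, possibly unfinished, phase separately: every completed phase of length $d$ in state $m$ contributes $p_{md}$, and the last phase contributes the tail sum $p^+_{md}$. The sketch below follows this reasoning; the function name `sequence_probability` and the example pmfs are assumptions for illustration, not part of the paper.

```python
from itertools import groupby, product

def sequence_probability(s, p1, p2):
    """P_1(S^t = s) under the random duration model with S_1 = 1."""
    if s[0] != 1:
        return 0.0                          # the first state is 1 with probability one
    pmf = {1: p1, 2: p2}
    runs = [(m, len(list(g))) for m, g in groupby(s)]
    prob = 1.0
    for m, d in runs[:-1]:                  # completed phases: exact duration d
        if d > len(pmf[m]):
            return 0.0                      # longer than the support allows
        prob *= pmf[m][d - 1]
    m, d = runs[-1]                         # last phase: duration is at least d
    prob *= sum(pmf[m][d - 1:])             # tail sum p^+_{m,d} (0 if d > Delta_m)
    return prob

p1, p2 = [0.5, 0.5], [0.2, 0.3, 0.5]        # hypothetical pmfs
t = 6
total = sum(sequence_probability(s, p1, p2) for s in product((1, 2), repeat=t))
```

Summing over all $2^t$ candidate sequences, `total` should equal one up to rounding, since infeasible sequences receive probability zero.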
Fig. 2: Example of simulated signals with $\Delta = 10$, $\mu_1 = 3$, $\mu_2 = 5$ and $\sigma = 0.05$, and various $T$, $D_{k,i}$, and $N_{k,d}$.
The proof of Lemma 1 is given in the Appendix. Besides the function $P$, which returns the exact probability of occurrence of a sequence $s^t$, it will also be of interest to define a related function $P' : \{1, 2\}^t \mapsto \mathbb{R}$, defined through $P$ by leaving out the first factor in (14), i.e., $P'(s^t) = \frac{p_{s_t o(s^t)}}{p^+_{s_t o(s^t)}} P(s^t)$ (note that the assumption that $p_1, p_2 > 0$ (entrywise) ensures that $P'$ is always well defined). Let $p_{\min} = \min\{p_{md} : m = 1, 2,\ d = 1, \ldots, \Delta_m\}$ and note that, for any $m$ and $d$, $p_{md} \leq p^+_{md} \leq 1$ (this relation can be easily seen from the definition of $p^+_{md}$). Thus, the following relation holds between $P$ and $P'$:
$$P'(s^t) \leq P(s^t) \leq \frac{1}{p_{\min}} P'(s^t). \tag{15}$$
For increasing $t$, the two functions will have equal exponents, that is, the effect of the factor $\frac{1}{p_{\min}}$ will vanish, and thus in our subsequent analyses we will use the analytically more appealing function $P'$. Further, to simplify the analysis, in what follows we will assume that $\Delta_1 = \Delta_2 =: \Delta$.
We let $\mathcal{S}^t$ denote the set of all feasible sequences of states $s^t$ of length $t$, i.e., the sequences for which $P_1(S^t = s^t) > 0$; we let $C_t$ denote the cardinality of $\mathcal{S}^t$. When $p_1$ and $p_2$ are strictly greater than zero, it can be shown that $C_t$ equals the number of ways in which the integer $t$ can be partitioned, as an ordered sum, with parts bounded by $\Delta$. This number is known as the $\Delta$-generalized Fibonacci number, and is computed via the following recursion:
$$C_t = C_{t-1} + \ldots + C_{t-\Delta}, \tag{16}$$
with the initial condition $C_1 = 1$. The recursion in (16) is linear and hence can be represented in the form $\tilde{C}_t = A \tilde{C}_{t-1}$, where $\tilde{C}_t = [C_t\ C_{t-1}\ \ldots\ C_{t-\Delta+1}]^\top$ and $A$ is a square, $\Delta \times \Delta$ matrix; it can be shown that $A$ is equal to $A = e_1 \mathbf{1}^\top + A_0$, where, we recall, $A_0$ is the lower shift matrix of dimension $\Delta$. The growth rate of $C_t$ is given by the largest zero of the characteristic polynomial of $A$, as the next result, which we borrow from [18], asserts.
Lemma 2. [Asymptotics for the $\Delta$-generalized Fibonacci number [18]] For any $\Delta$, there exists $t_0 = t_0(\Delta)$ such that
where $\psi$ is the unique positive zero of the following polynomial: $\psi^{\Delta} - \psi^{\Delta-1} - \ldots - 1 = 0$.
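The count of feasible sequences and its growth rate are easy to check numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the convention $C_0 = 1$ and $C_t = 0$ for $t < 0$ is assumed in order to start the recursion, and the brute-force count simply enumerates all sequences that start in state 1 and contain no run longer than $\Delta$.

```python
from itertools import product
import numpy as np

def count_recursive(t, delta):
    """C_t from the recursion C_t = C_{t-1} + ... + C_{t-delta}."""
    C = [1]                                 # assumed convention: C_0 = 1
    for n in range(1, t + 1):
        C.append(sum(C[max(0, n - delta):n]))
    return C[t]

def count_bruteforce(t, delta):
    """Number of sequences in {1,2}^t with s_1 = 1 and all runs of length <= delta."""
    count = 0
    for tail in product((1, 2), repeat=t - 1):
        s = (1,) + tail
        run, ok = 1, True
        for a, b in zip(s, s[1:]):
            run = run + 1 if a == b else 1
            if run > delta:
                ok = False
                break
        count += ok
    return count

def growth_rate(delta):
    """Largest real zero of psi^delta - psi^(delta-1) - ... - 1."""
    roots = np.roots([1.0] + [-1.0] * delta)
    return max(r.real for r in roots if abs(r.imag) < 1e-9)

counts = [count_recursive(t, 3) for t in range(1, 6)]
psi = growth_rate(3)
```

For $\Delta = 2$ the polynomial reduces to $\psi^2 - \psi - 1$, whose positive zero is the golden ratio, in line with the ordinary Fibonacci recursion.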
A. Sequence types
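Duration fractions. For $d = 1, 2, \ldots, \Delta$, let $V_{m,d}$ denote the number of times along a given sequence of states that a state-$m$ phase had length $d$, normalized by time $t$, i.e.,
$$V_{m,d}(s^t) = \frac{N_{m,d}(s^t)}{t}, \quad m = 1, 2. \tag{18}$$
For each sequence $s^t$, we define its type as the $2 \times \Delta$ matrix $V := [V_1(s^t)^\top; V_2(s^t)^\top]$, where $V_m(s^t) = (V_{m,1}(s^t), \ldots, V_{m,\Delta}(s^t))$, for $m = 1, 2$. Recalling $N_1$ and $N_2$ from (10) and (11), which, respectively, count the number of state-1 and state-2 phases along $s^t$, we see that $N_m = t\, \mathbf{1}^\top V_m$, $m = 1, 2$.
It will also be of interest to define the fractions of time $\Theta_1$ and $\Theta_2$ that a given sequence of states was in states 1 and 2, respectively,
$$\Theta_m(s^t) = \frac{\tau_m(s^t)}{t}, \quad m = 1, 2. \tag{19}$$
It is easy to verify that $\Theta_m = \sum_{d=1}^{\Delta} d\, V_{m,d}$, for $m = 1, 2$.
Let $\mathcal{V}^t$ denote the set of all $2 \times \Delta$-tuples of feasible occurrences of type $V$ at time $t$:
$$\mathcal{V}^t = \{\nu = (\nu_1, \nu_2) : \nu = V(s^t), \text{ for some } s^t \in \mathcal{S}^t\}. \tag{20}$$
Note that, as they are defined as normalized versions of the quantities $N_{md}(s^t)$, the $V_{md}(s^t)$'s also inherit the properties of the $N_{md}$'s: 1) $\sum_{d=1}^{\Delta} \left(d\, V_{1d}(s^t) + d\, V_{2d}(s^t)\right) = 1$; 2) $0 \leq \mathbf{1}^\top V_1(s^t) - \mathbf{1}^\top V_2(s^t) \leq 1/t$. As $t \to +\infty$, for every $s^t \in \mathcal{S}^t$, the difference between $\mathbf{1}^\top V_1(s^t)$ and $\mathbf{1}^\top V_2(s^t)$ decreases. Motivated by this, we introduce the set
$$\mathcal{V} = \left\{\nu \in \mathbb{R}^{2\Delta}_+ : \mathbf{1}^\top \nu_1 = \mathbf{1}^\top \nu_2,\ q^\top \nu_1 + q^\top \nu_2 = 1\right\}, \tag{21}$$
where $q = (1, 2, \ldots, \Delta)^\top$, so that $q^\top \nu_m = \sum_{d=1}^{\Delta} d\, \nu_{m,d}$.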
For each $t$ and $\nu \in \mathcal{V}^t$, define the set $\mathcal{S}^t_\nu$ that collects all sequences $s^t \in \mathcal{S}^t$ whose type is $\nu$:
$$\mathcal{S}^t_\nu = \{s^t \in \mathcal{S}^t : V(s^t) = \nu\} \tag{22}$$
(note that if $\nu \notin \mathcal{V}^t$, then the set $\mathcal{S}^t_\nu$ would be empty). The set $\mathcal{S}^t_\nu$ therefore consists of all sequences with the following properties: 1) the first phase is a state-1 phase; 2) the total number of state-1 phases is $\mathbf{1}^\top \nu_1 t$, where the total number of such phases of duration exactly $d$ is given by $\nu_{1,d}\, t$; and 3) the total number of state-2 phases is $\mathbf{1}^\top \nu_2 t$, where the total number of such phases of duration exactly $d$ is given by $\nu_{2,d}\, t$.
Let $C_{t,\nu}$ denote the cardinality of $\mathcal{S}^t_\nu$. This number is equal to the number of ways in which one can order $\mathbf{1}^\top \nu_1 t$ state-1 phases (of different durations), where each new ordering has to give rise to a different pattern of state occurrences, times the corresponding number for state-2 phases. Since for any $d$, any permutation of the $\nu_{m,d}\, t$ phases, each of which is of length $d$, gives the same sequence pattern, $C_{t,\nu}$ is given by the number of permutations with repetitions for state-1 phases times the number of permutations with repetitions for state-2 phases:
$$C_{t,\nu} = \frac{(\mathbf{1}^\top \nu_1 t)!}{(\nu_{1,1} t)! \cdot \ldots \cdot (\nu_{1,\Delta_1} t)!} \cdot \frac{(\mathbf{1}^\top \nu_2 t)!}{(\nu_{2,1} t)! \cdot \ldots \cdot (\nu_{2,\Delta_2} t)!}. \tag{23}$$
From (23) the following result regarding the growth rate of $C_{t,\nu}$ easily follows (e.g., by Stirling's approximation bounds).
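Lemma 3. For any $\epsilon > 0$ there exists $t_1 = t_1(\epsilon)$ such that for all $t \geq t_1$
$$e^{t(H(\nu_1) + H(\nu_2) - \epsilon)} \leq C_{t,\nu} \leq e^{t(H(\nu_1) + H(\nu_2) + \epsilon)}, \tag{24}$$
where $H : \mathbb{R}^{\Delta}_+ \mapsto \mathbb{R}$ is defined as
$$H(\lambda) = -\sum_{d=1}^{\Delta} \lambda_d \log \frac{\lambda_d}{\mathbf{1}^\top \lambda}, \tag{25}$$
where $\lambda_d$ denotes the $d$-th element of an arbitrary vector $\lambda \in \mathbb{R}^{\Delta}_+$.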
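The exponential growth rate of the multinomial count can be compared numerically with the entropy-like term. The sketch below is illustrative only: it assumes the form $H(\lambda) = -\sum_d \lambda_d \log(\lambda_d / \mathbf{1}^\top\lambda)$, which is what Stirling's approximation yields for a product of two multinomial coefficients, and the function names and the example histograms are hypothetical.

```python
from math import lgamma, log

def log_count(N1, N2):
    """log of the product of two multinomial coefficients built from the
    phase-duration histograms N_m = nu_m * t (entry d-1 counts phases of length d)."""
    def log_multinomial(N):
        return lgamma(sum(N) + 1) - sum(lgamma(n + 1) for n in N)
    return log_multinomial(N1) + log_multinomial(N2)

def H(lam):
    """Assumed form: H(lam) = -sum_d lam_d * log(lam_d / sum(lam))."""
    total = sum(lam)
    return -sum(l * log(l / total) for l in lam if l > 0)

# hypothetical histograms: equal numbers of state-1 and state-2 phases
N1, N2 = [1000, 2000], [1500, 1500]
t = sum((d + 1) * n for N in (N1, N2) for d, n in enumerate(N))
nu1 = [n / t for n in N1]
nu2 = [n / t for n in N2]

rate_exact = log_count(N1, N2) / t          # (1/t) log C_{t,nu}
rate_entropy = H(nu1) + H(nu2)              # H(nu_1) + H(nu_2)
```

In this example the two rates differ only by a term that shrinks as $t$ grows, which is the content of the two-sided bound with slack $\epsilon$.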
We end this section by giving some well-known results from the theory of large deviations that we
will use in our analysis of detection problem (4).
B. Varadhan’s lemma and large deviations principle
Large deviations principle.
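Definition 4 (Large deviations principle [15] with probability 1). Let $\mu^\omega_t$ be a sequence of Borel random measures on $\mathcal{B}(\mathbb{R}^D)$, defined on the probability space $(\Omega, \mathcal{F}, P)$. Then, $\mu^\omega_t$, $t = 1, 2, \ldots$, satisfies the large deviations principle with probability one, with rate function $I$, if the following two conditions hold:
1) for every closed set $F$ there exists a set $\Omega^\star_F \subseteq \Omega$ with $P(\Omega^\star_F) = 1$, such that for each $\omega \in \Omega^\star_F$,
$$\limsup_{t \to +\infty} \frac{1}{t} \log \mu^\omega_t(F) \leq -\inf_{x \in F} I(x); \tag{26}$$
2) for every open set $E$ there exists a set $\Omega^\star_E \subseteq \Omega$ with $P(\Omega^\star_E) = 1$, such that for each $\omega \in \Omega^\star_E$,
$$\liminf_{t \to +\infty} \frac{1}{t} \log \mu^\omega_t(E) \geq -\inf_{x \in E} I(x). \tag{27}$$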
We give here the version of Varadhan's lemma which involves a sequence of random probability measures and the large deviations principle (LDP) with probability one.
Lemma 5 (Varadhan's lemma [15]). Suppose that the random sequence of measures $\mu^\omega_t$ satisfies the LDP with probability one, with rate function $I$; see Definition 4. Then, if for the function $F$ the tail condition below holds with probability one,
$$\lim_{B \to +\infty} \limsup_{t \to +\infty} \frac{1}{t} \log \int_{x : F(x) \geq B} e^{t F(x)}\, d\mu^\omega_t(x) = -\infty, \tag{28}$$
then, with probability one,
$$\lim_{t \to +\infty} \frac{1}{t} \log \int_{x} e^{t F(x)}\, d\mu^\omega_t(x) = \sup_{x \in \mathbb{R}^D} F(x) - I(x). \tag{29}$$
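From (5) and (14), it is easy to see that the likelihood ratio can be expressed through the defined quantities as:
$$\begin{aligned} L_t(X^t) &= \sum_{s^t \in \mathcal{S}^t} P(s^t)\, e^{\sum_{m=1}^{2} \left( \frac{\mu_m}{\sigma^2} \sum_{k \in \mathcal{T}_m(s^t)} X_k - \tau_m(s^t) \frac{\mu_m^2}{2\sigma^2} \right)} \\ &= \sum_{s^t \in \mathcal{S}^t} \frac{p^+_{s_t o(s^t)}}{p_{s_t o(s^t)}}\, e^{\sum_{m=1}^{2} \sum_{d=1}^{\Delta_m} N_{md}(s^t) \log p_{md}} \times e^{\sum_{m=1}^{2} \left( \frac{\mu_m}{\sigma^2} \sum_{k \in \mathcal{T}_m(s^t)} X_k - \tau_m(s^t) \frac{\mu_m^2}{2\sigma^2} \right)}. \end{aligned} \tag{30}$$
The expression in (30) is combinatorial, and its straightforward implementation would require computing $C_t$ summands, a number that grows exponentially with $t$ (at the rate determined by $\psi$, see Lemma 2). This is prohibitive when the observation interval $t$ is large. In this paper, we unveil a simple, linear recursion form for the likelihood ratio $L_t(X^t)$, for $t = 1, 2, \ldots$. We give this result in the next lemma. To shorten the notation, we introduce functions $f_m : \mathbb{R} \mapsto \mathbb{R}$, which we define by $f_m(x) := \frac{1}{\sigma^2}\left(\mu_m x - \frac{\mu_m^2}{2}\right)$, for $x \in \mathbb{R}$ and $m = 1, 2$. Recall that $e_1$ denotes the first canonical vector in $\mathbb{R}^{\Delta}$ (the $\Delta$-dimensional vector with 1 only in the first position and zeros in all other positions), and $\mathbf{1}$ denotes the vector of all ones in $\mathbb{R}^{\Delta}$.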
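Lemma 6. Let $\Lambda_k = \left[{\Lambda^1_k}^\top, {\Lambda^2_k}^\top\right]^\top$ evolve according to the following recursion
$$\Lambda_{k+1} = A_{k+1} \Lambda_k, \tag{31}$$
with the initial condition $\Lambda_1 = \left[e^{f_1(X_1)} e_1^\top,\ e^{f_2(X_1)} e_1^\top\right]^\top$, and where, for $k \geq 2$, the matrix $A_k = [A^{11}_k\ A^{12}_k;\ A^{21}_k\ A^{22}_k]$ is defined by
$$\begin{aligned} A^{11}_k &= e^{f_1(X_k)} A_0, \\ A^{12}_k &= e^{f_1(X_k)} e_1 p_2^\top, \\ A^{21}_k &= e^{f_2(X_k)} e_1 p_1^\top, \\ A^{22}_k &= e^{f_2(X_k)} A_0, \end{aligned} \tag{32}$$
and $A_0$ is, we recall, the lower shift matrix of dimension $\Delta$. Then, the likelihood ratio $L_t(X^t)$ is, for each $t \geq 1$, computed by
$$L_t(X^t) = \sum_{d=1}^{\Delta} p^+_{1d} \Lambda^1_{t,d} + p^+_{2d} \Lambda^2_{t,d}, \tag{33}$$
where $\Lambda^m_{t,d}$ is the $d$-th element of $\Lambda^m_t$, for $d = 1, \ldots, \Delta$ and $m = 1, 2$.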
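Remark. We note that the matrix $A_k$ can be further decomposed as
$$A_k = D_k M_0, \quad M_0 = \begin{bmatrix} A_0 & e_1 p_2^\top \\ e_1 p_1^\top & A_0 \end{bmatrix}, \tag{34}$$
$$D_k = \mathrm{diag}\left(\left[e^{f_1(X_k)} \mathbf{1}^\top,\ e^{f_2(X_k)} \mathbf{1}^\top\right]^\top\right), \quad k = 1, 2, \ldots, \tag{35}$$
i.e., $D_k$ is a random diagonal matrix of size $2\Delta$, modulated by the $k$-th measurement $X_k$, and $M_0$ is a sparse, constant matrix of the same dimension, which defines transitions from the current state pattern to the one in the next time step.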
Proof intuition. The intuition behind this recursive form is the following. We break the sum in (30) into groups of sequences $s^t$ whose last phases are of the same type. For sequences that end with state $m = 1$, $\Lambda^1_{t,d}$ represents the contribution to the overall likelihood ratio $L_t(X^t)$ of all such sequences whose last phase is of length $d$, and similarly for $\Lambda^2_{t,d}$. Once the vectors $\Lambda^1_t$ and $\Lambda^2_t$ are defined, their update is simple. Consider the value $\Lambda^1_{t+1,d}$, where $d > 1$; this value corresponds to the likelihood ratio contribution of all sequences $s^{t+1}$ that end with a state-1 phase of duration $d$. Since $d > 1$, the only possible way to get a sequence of that form is to have a sequence at time $t$ that ends with the same state, where the duration of the last phase is $d - 1$. This translates to the update $\Lambda^1_{t+1,d} = e^{f_1(X_{t+1})} \Lambda^1_{t,d-1}$, where the choice of $f_1$ in the exponent is due to the fact that the last state is $s_{t+1} = 1$; see also the first line in (32). On the other hand, if $d = 1$, then the state at time $t$ must have been $m = 2$. The duration of this previous phase could have been arbitrary, from $d = 1$ to $d = \Delta$. Hence $\Lambda^1_{t+1,1}$ is computed as the sum $\sum_{d=1}^{\Delta} p_{2d}\, e^{f_1(X_{t+1})} \Lambda^2_{t,d}$, where the probabilities $p_{2d}$ are used to mark that the previous phase is completed; see the second line in (32). The analysis for $\Lambda^2_{t+1,d}$ is similar. The formal proof of Lemma 6 is given in the Appendix.
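The update rules described above can be checked against the direct sum over all state sequences for short observation windows. The sketch below is a hedged illustration rather than the authors' implementation: the function names are hypothetical, $\Delta_1 = \Delta_2$ is assumed, $f_m(x) = \mu_m x/\sigma^2 - \mu_m^2/(2\sigma^2)$ is the per-sample log-likelihood ratio, and the recursion is started from $S_1 = 1$ only (the state-2 block is initialised to zero), which is an assumption of this sketch chosen to match the brute-force sum over sequences whose first state is 1.

```python
import numpy as np
from itertools import groupby, product

def f(x, mu, sigma):
    """Per-sample log-likelihood ratio of N(mu, sigma^2) against N(0, sigma^2)."""
    return mu * x / sigma**2 - mu**2 / (2 * sigma**2)

def lr_bruteforce(x, p1, p2, mu1, mu2, sigma):
    """Sum over all sequences s with s_1 = 1 of P(s) * prod_k exp(f_{s_k}(x_k))."""
    pmf, mu = {1: p1, 2: p2}, {1: mu1, 2: mu2}
    total = 0.0
    for tail in product((1, 2), repeat=len(x) - 1):
        s = (1,) + tail
        runs = [(m, len(list(g))) for m, g in groupby(s)]
        prob = 1.0
        for m, d in runs[:-1]:              # completed phases
            prob *= pmf[m][d - 1] if d <= len(pmf[m]) else 0.0
        m, d = runs[-1]                     # last, possibly unfinished phase
        prob *= sum(pmf[m][d - 1:])
        total += prob * np.exp(sum(f(xk, mu[sk], sigma) for xk, sk in zip(x, s)))
    return total

def lr_recursive(x, p1, p2, mu1, mu2, sigma):
    """Same quantity via the linear recursion on (Lambda^1, Lambda^2)."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    delta = len(p1)                         # assumes Delta_1 = Delta_2
    lam1, lam2 = np.zeros(delta), np.zeros(delta)
    lam1[0] = np.exp(f(x[0], mu1, sigma))   # sketch assumption: start in state 1
    for xk in x[1:]:
        # entry d > 1: extend the current phase; entry 1: previous phase completed
        new1 = np.concatenate(([p2 @ lam2], lam1[:-1]))
        new2 = np.concatenate(([p1 @ lam1], lam2[:-1]))
        lam1 = np.exp(f(xk, mu1, sigma)) * new1
        lam2 = np.exp(f(xk, mu2, sigma)) * new2
    tail1 = np.cumsum(p1[::-1])[::-1]       # p^+_{1d}
    tail2 = np.cumsum(p2[::-1])[::-1]       # p^+_{2d}
    return tail1 @ lam1 + tail2 @ lam2

x = [0.4, -1.2, 2.5, 3.1, 0.7, 1.9]         # hypothetical measurements
args = ([0.5, 0.3, 0.2], [0.2, 0.3, 0.5], 1.0, 2.0, 1.5)
brute, fast = lr_bruteforce(x, *args), lr_recursive(x, *args)
```

The brute-force sum visits $2^{t-1}$ sequences, whereas the recursion costs on the order of $\Delta$ operations per sample, which is the practical benefit of the matrix-product form.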
A. Error exponent ζas Lyapunov exponent
From Lemma 6 we see that $L_t$ can be represented as a linear function of the matrix product $\Pi_t := A_t \cdot \ldots \cdot A_1$,
where the $A_k$ are matrices of the form (32), and $p^+ = \left[{p^+_1}^\top, {p^+_2}^\top\right]^\top$, where the $d$-th entry of $p^+_m$ equals $p^+_{md}$, for $m = 1, 2$, $d = 1, 2, \ldots, \Delta$. Each $A_k$ is modulated by the measurement $X_k$ obtained at time $k$. Since the $X_k$'s, $k = 1, 2, \ldots$, are i.i.d., it follows that the matrices $A_k$ are i.i.d. as well. Applying a well-known result from the theory of random matrices, see Theorem 2 in [19], to the sequence $A_k$, it follows that the sequence of the negative values of the normalized log-likelihood ratios $-\frac{1}{t} \log L_t$, $t = 1, 2, \ldots$, converges to the Lyapunov exponent of the matrix product $\Pi_t$. This result is given in Lemma 7 and proven in the Appendix.
Lemma 7. With probability one,
$$\lim_{t \to +\infty} \frac{1}{t} \log \|\Pi_t\| = \lim_{t \to +\infty} \frac{1}{t} E_0[\log \|\Pi_t\|], \tag{38}$$
and thus, with probability one,
$$\zeta = \lim_{t \to +\infty} -\frac{1}{t} \log \|\Pi_t\| = \lim_{t \to +\infty} -\frac{1}{t} E_0[\log L_t]. \tag{39}$$
Lemma 7 asserts that the error exponent for hypothesis testing problem (4) equals the top Lyapunov exponent for the sequence of products $\Pi_t$. Computation of the Lyapunov exponent (e.g., for i.i.d. matrices) is a well-known problem in random matrix theory and the theory of random dynamical systems, proven to be very difficult to solve, see, e.g., [14]. We instead search for tractable lower bounds that tightly approximate $\zeta$. We base our method for approximating $\zeta$ on the right-hand side identity in (39).
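Although the exponent has no closed form, it can be estimated by simulation: generate observations under $H_0$, run the likelihood ratio recursion, and evaluate $-\frac{1}{t}\log L_t$ for large $t$. The sketch below is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, $\Delta_1 = \Delta_2$ is assumed, the recursion is started from $S_1 = 1$, and the vectors are rescaled at every step (with the logarithm of the scale accumulated separately) purely to avoid numerical underflow. The parameter values mirror the setup quoted for Fig. 1.

```python
import numpy as np

def log_likelihood_ratio(x, p1, p2, mu1, mu2, sigma):
    """log L_t(X^t) via the linear recursion, with per-step rescaling."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    delta = len(p1)                          # assumes Delta_1 = Delta_2
    f = lambda xk, mu: mu * xk / sigma**2 - mu**2 / (2 * sigma**2)
    lam1, lam2 = np.zeros(delta), np.zeros(delta)
    lam1[0] = 1.0                            # sketch assumption: S_1 = 1
    log_scale = f(x[0], mu1)                 # log of everything factored out so far
    for xk in x[1:]:
        new1 = np.concatenate(([p2 @ lam2], lam1[:-1])) * np.exp(f(xk, mu1))
        new2 = np.concatenate(([p1 @ lam1], lam2[:-1])) * np.exp(f(xk, mu2))
        scale = new1.sum() + new2.sum()      # renormalise to keep entries O(1)
        lam1, lam2 = new1 / scale, new2 / scale
        log_scale += np.log(scale)
    tail1 = np.cumsum(p1[::-1])[::-1]        # p^+_{1d}
    tail2 = np.cumsum(p2[::-1])[::-1]        # p^+_{2d}
    return log_scale + np.log(tail1 @ lam1 + tail2 @ lam2)

rng = np.random.default_rng(1)
sigma, t = 10.0, 5000
x = sigma * rng.standard_normal(t)           # observations generated under H_0
p = np.ones(3) / 3                           # uniform durations on [1, 3]
zeta_hat = -log_likelihood_ratio(x, p, p, 2.0, 5.0, sigma) / t
```

A single run gives a noisy estimate; averaging `zeta_hat` over independent runs, or increasing `t`, reduces the fluctuation.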
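Our first step for computing the limit in (39) is a natural one. Since $\mu_1 \geq 0$ is the guaranteed signal level (recall that $\mu_2 > \mu_1 \geq 0$), we assume that the signal was at all times at state 1, and remove the corresponding components of the signal-to-noise ratio (SNR) $\frac{\mu_1^2}{2\sigma^2}$ and the signal sum $\sum_{k=1}^{t} X_k$ from the likelihood ratio. This manipulation then gives us a lower bound on the error exponent. By doing so, we arrive at an equivalent problem to problem (4) just with $\mu_1 = 0$. Mathematically, we have
$$\begin{aligned} L_t(X^t) &= \sum_{s^t \in \mathcal{S}^t} P(s^t)\, e^{\frac{1}{\sigma^2}\left(\mu_1 \sum_{k=1}^{t} X_k + (\mu_2 - \mu_1) \sum_{k \in \mathcal{T}_2(s^t)} X_k\right) - (t - \tau_2(s^t)) \frac{\mu_1^2}{2\sigma^2} - \tau_2(s^t) \frac{\mu_2^2}{2\sigma^2}} \\ &= e^{\frac{\mu_1}{\sigma^2} \sum_{k=1}^{t} X_k - t \frac{\mu_1^2}{2\sigma^2}} \sum_{s^t \in \mathcal{S}^t} P(s^t)\, e^{\frac{\mu_2 - \mu_1}{\sigma^2} \sum_{k \in \mathcal{T}_2(s^t)} X_k - \tau_2(s^t) \frac{\mu_2^2 - \mu_1^2}{2\sigma^2}}. \end{aligned} \tag{40}$$
Taking the logarithm, dividing by $t$, and computing the expectation with respect to hypothesis $H_0$, we obtain
$$-\frac{1}{t} E_0\left[\log L_t(X^t)\right] = \frac{\mu_1^2}{2\sigma^2} - \frac{1}{t} E_0\left[\log \sum_{s^t \in \mathcal{S}^t} P(s^t)\, e^{\frac{\mu_2 - \mu_1}{\sigma^2} \sum_{k \in \mathcal{T}_2(s^t)} X_k - \tau_2(s^t) \frac{\mu_2^2 - \mu_1^2}{2\sigma^2}}\right], \tag{41}$$
where we used that $E_0[X_k] = 0$, for all $k$, see (4). Taking the limit as $t \to +\infty$, we obtain
$$\zeta = \frac{\mu_1^2}{2\sigma^2} + \eta, \tag{42}$$
where $\eta$ is given by the following limit
$$\eta = \lim_{t \to +\infty} -\frac{1}{t} E_0\left[\log \sum_{s^t \in \mathcal{S}^t} P(s^t)\, e^{\frac{\mu_2 - \mu_1}{\sigma^2} \sum_{k \in \mathcal{T}_2(s^t)} X_k - \tau_2(s^t) \frac{\mu_2^2 - \mu_1^2}{2\sigma^2}}\right], \tag{43}$$
the existence of which is guaranteed by (39), in Lemma 7. From now on, we focus on computing $\eta$.
Before we proceed, we make a simplification in the expression for $\eta$ by replacing the term $P(s^t)$ with its analytically more appealing proxy $P'(s^t)$, see (15). Applying inequality (15) in (43) and using the fact that $\frac{1}{t} \log p_{\min} \to 0$, as $t \to +\infty$, we obtain that the limit in (43) does not change when we replace $P(s^t)$ with $P'(s^t)$, i.e.,
$$\eta = \lim_{t \to +\infty} -\frac{1}{t} E_0\left[\log \sum_{s^t \in \mathcal{S}^t} P'(s^t)\, e^{\frac{\mu_2 - \mu_1}{\sigma^2} \sum_{k \in \mathcal{T}_2(s^t)} X_k - \tau_2(s^t) \frac{\mu_2^2 - \mu_1^2}{2\sigma^2}}\right]. \tag{44}$$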
For λR, and pS1, introduce the relative entropy function D(λ||p) := P
d=1 λd
1>λlog λd
Theorem 8. There holds $\eta^\star + \frac{\mu_1^2}{2\sigma^2} \leq \zeta$, where $\eta^\star$ is the optimal value of the following optimization problem
minimize G(ν)
subject to H(ν1) + H(ν2)ξ2
ν∈ V
where G(ν) = D(ν1||p1) + D(ν2||p2) + θ2
σ2, for νR2∆
Guaranteed error exponent. Since each of the terms in the objective function of (45) is non-negative, its
optimal value is lower bounded by 0. Using relation (42), we obtain that the value of the error exponent
is lower bounded by the value of SNR in state 1, $\mu_1^2/(2\sigma^2)$, i.e.,
$$\zeta \geq \frac{\mu_1^2}{2\sigma^2}. \tag{46}$$
The preceding bound holds for any choice of parameters $\Delta$, $p_1$, $p_2$, $\mu_1$ and $\mu_2$. This result is very
intuitive, as it mathematically formalizes the reasoning that, no matter which configuration of states occurs, signal
level $\mu_1$ is always guaranteed, and hence the corresponding value of error exponent $\mu_1^2/(2\sigma^2)$ is
ensured. In that sense, any appearance of state 2 (i.e., signal level $\mu_2 > \mu_1$) can only increase the error exponent.
Special case $\mu_1 = 0$ and detectability condition. When the signal level in state 1 equals zero, then,
since the statistics of $X_k$ for $S_k = 1$ is the same as its statistics under $H_0$, effectively we can have
information on the state of nature $H_1$ only when state $S_k = 2$ occurs. Denoting $\mu = \mu_2$, optimization
problem (45) then simplifies to:
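$$\begin{array}{ll}
\text{minimize} & D(\nu_1\|p_1) + D(\nu_2\|p_2) + \dfrac{(\theta_2\mu - \xi)^2}{2\theta_2\sigma^2}\\[4pt]
\text{subject to} & H(\nu_1) + H(\nu_2) \geq \dfrac{\xi^2}{2\theta_2\sigma^2}\\[4pt]
& \nu \in \mathcal{V}.
\end{array} \tag{47}$$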
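From (47) we obtain the following condition for detectability of process $S_k$:
$$H(p_1) + H(p_2) \geq \frac{q^\top p_2}{q^\top p_1 + q^\top p_2}\,\frac{\mu^2}{2\sigma^2}, \tag{48}$$
i.e., if the inequality above holds, then the optimal value of optimization problem (47) is zero. To see
why this holds, note that the point $(\nu_1, \nu_2, \xi) \in \mathbb{R}^{2\Delta+1}$, where $\nu_m = p_m/(q^\top p_1 + q^\top p_2)$, $m = 1, 2$,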
and $\xi = \frac{q^\top p_2}{q^\top p_1 + q^\top p_2}\,\mu$, under which the cost function of (47) vanishes, belongs, under
condition (48), to the constraint set of (47). Thus, under condition (48), the lower bound on the error exponent
$\eta$ is zero, indicating that the process $S_k$ is not detectable. To further illustrate this condition, note that the
left-hand side corresponds to the entropy of the process $S_k$, and the right-hand side corresponds to the
expected, i.e., long-run, SNR of the measured signal ($q^\top p_2/(q^\top p_1 + q^\top p_2)$ is the expected fraction of
time that the process was in state 2, and $\mu^2/(2\sigma^2)$ is the SNR for this state). Condition (48) therefore asserts
that, if the entropy of the process $S_k$ is too high compared to the expected, or long-run, SNR, then it is
not possible to detect its presence. Intuitively, if the dynamics of the phase durations is too stochastic,
then it is not possible to estimate the locations of state 2 occurrences in order to perform the likelihood
ratio test. On the other hand, if the SNR is very high (e.g., the level $\mu$ is high compared to the
process noise $\sigma^2$) then, whenever state 2 occurs, the signal will make a sharp increase and can therefore
be easily detected. The condition in this sense quantitatively characterizes the threshold between the two
physical quantities which makes detection possible.
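As a small numerical illustration, the noise level at which condition (48) holds with equality can be computed directly from the two duration pmfs. The sketch below assumes that $H(p_m)$ is the Shannon entropy of $p_m$ in nats and that $q = (1, 2, \ldots, \Delta)^\top$; the function names are ours and are not part of the paper.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats (assumed form of H(p_m))."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

def detectability_threshold(p1, p2, mu):
    """Noise std sigma* for which H(p1) + H(p2) = frac2 * mu^2 / (2 sigma^2)."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    q1 = np.arange(1, len(p1) + 1)          # possible durations of state 1
    q2 = np.arange(1, len(p2) + 1)          # possible durations of state 2
    frac2 = (q2 @ p2) / (q1 @ p1 + q2 @ p2) # long-run fraction of time in state 2
    h = entropy(p1) + entropy(p2)
    if h == 0.0:
        return float("inf")                 # deterministic durations: equality is never reached
    return float(mu * np.sqrt(frac2 / (2.0 * h)))

# Uniform durations on {1, 2} and mu = 1
sigma_star = detectability_threshold([0.5, 0.5], [0.5, 0.5], 1.0)
```

For uniform pmfs on $\{1, \ldots, \Delta\}$ this reduces to $\mu/(2\sqrt{2\log\Delta})$, since the two entropies sum to $2\log\Delta$ and the process spends half of the time in state 2.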
A. Reformulation of (47)
In this subsection we show that optimization problem (47) admits a simplified form, obtained by
suppressing the dependence on $\xi$ through inner minimization over this variable. To simplify the notation,
introduce $H(\nu) = H(\nu_1) + H(\nu_2)$ and $R(\nu) = q^\top\nu_2\,\frac{\mu^2}{2\sigma^2}$; note that the function $R$ has the
physical meaning of the expected SNR of the $S_t$ process that we wish to detect, for a given sequence type $\nu$.
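Lemma 9. Suppose that $H(p_1) + H(p_2) < \frac{q^\top p_2}{q^\top p_1 + q^\top p_2}\,\frac{\mu^2}{2\sigma^2}$. Then, optimization problem (47)
is equivalent to the following optimization problem:
$$\begin{array}{ll}
\text{minimize} & D(\nu_1\|p_1) + D(\nu_2\|p_2) + \left(\sqrt{H(\nu)} - \sqrt{R(\nu)}\right)^2\\[4pt]
\text{subject to} & H(\nu) \leq R(\nu)\\[4pt]
& \nu \in \mathcal{V}.
\end{array} \tag{49}$$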
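Proof. Fix $\nu \in \mathcal{V}$. To remove the dependence on $\xi$ in (47), for any given fixed $\nu \in \mathcal{V}$, we need to solve
$$\begin{array}{ll}
\text{minimize} & (\theta_2\mu - \xi)^2\\[4pt]
\text{subject to} & H(\nu) \geq \dfrac{\xi^2}{2\theta_2\sigma^2},
\end{array} \tag{50}$$
where, as before, we denote $\theta_2 = q^\top\nu_2$. Since $\mu > 0$, and the constraint set is defined only through the
square of $\xi$, the optimal solution of (50) is achieved for $\xi \geq 0$. Thus, (50) is equivalent to
$$\begin{array}{ll}
\text{minimize} & (\theta_2\mu - \xi)^2\\[4pt]
\text{subject to} & 0 \leq \xi \leq \sigma\sqrt{2\theta_2 H(\nu)}.
\end{array} \tag{51}$$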
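The solution of (51) is given by: 1) $\xi^\star = \theta_2\mu$, if $\theta_2\mu \leq \sigma\sqrt{2\theta_2 H(\nu)}$; and 2) $\xi^\star = \sigma\sqrt{2\theta_2 H(\nu)}$, otherwise.
Hence, to solve (47) we can partition its constraint set $\mathcal{V} = \mathcal{V}_1 \cup \mathcal{V}_2$ according to these two cases,
where $\mathcal{V}_1 = \left\{\nu \in \mathcal{V} : H(\nu) \geq \theta_2\frac{\mu^2}{2\sigma^2}\right\}$ and $\mathcal{V}_2 = \left\{\nu \in \mathcal{V} : H(\nu) \leq \theta_2\frac{\mu^2}{2\sigma^2}\right\}$, solve the corresponding
two optimization problems, and finally find the minimum among the two obtained optimal values.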
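Consider first the case $\nu \in \mathcal{V}_1$. Since in this case $\xi^\star = \theta_2\mu$, plugging in this value in (51), we have
that the optimization problem (47) with $\mathcal{V}$ reduced to $\mathcal{V}_1$ simplifies to:
$$\begin{array}{ll}
\text{minimize} & D(\nu_1\|p_1) + D(\nu_2\|p_2)\\[4pt]
\text{subject to} & \nu \in \mathcal{V}_1.
\end{array} \tag{52}$$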
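If $H(p) \geq \frac{q^\top p_2}{q^\top p_1 + q^\top p_2}\,\frac{\mu^2}{2\sigma^2}$, then the point $\frac{1}{q^\top p_1 + q^\top p_2}\,p$ belongs to $\mathcal{V}_1$, where $p = (p_1, p_2)$, and hence
the optimal solution to (52) equals $\frac{1}{q^\top p_1 + q^\top p_2}\,p$, with the corresponding optimal value equal to 0.
Suppose now that $H(p) < \frac{q^\top p_2}{q^\top p_1 + q^\top p_2}\,\frac{\mu^2}{2\sigma^2}$. We show that in this case the solution to (52) must be at the
boundary of the constraint set, in the set of points $\left\{\nu \in \mathcal{V} : H(\nu) = \theta_2\frac{\mu^2}{2\sigma^2}\right\}$.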
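We prove the above claim. Since the entropy function $H$, see eq. (25), is concave, the constraint set
$\mathcal{V}_1$ is convex, and since KL divergence $D$ is convex, we conclude that the problem in (52) is convex.
Also, it can be shown that the Slater point exists [20]. Therefore, the solution to (52) is characterized by the
corresponding Karush-Kuhn-Tucker (KKT) conditions:
$$\begin{aligned}
(1 + \lambda)\log\frac{\nu_{1d}}{1^\top\nu_1} - \log p_{1d} &= 0, \quad \text{for } d = 1, \ldots, \Delta,\\
(1 + \lambda)\log\frac{\nu_{2d}}{1^\top\nu_2} - \log p_{2d} + \lambda\, d\,\frac{\mu^2}{2\sigma^2} &= 0, \quad \text{for } d = 1, \ldots, \Delta,\\
H(\nu) &\geq q^\top\nu_2\,\frac{\mu^2}{2\sigma^2},\\
\lambda &\geq 0,\\
\lambda\left(H(\nu) - q^\top\nu_2\,\frac{\mu^2}{2\sigma^2}\right) &= 0,\\
\nu &\in \mathcal{V}.
\end{aligned} \tag{53}$$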
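From the fourth and fifth condition, we have that either $\lambda = 0$, or that $\lambda > 0$ and $H(\nu) = q^\top\nu_2\,\frac{\mu^2}{2\sigma^2}$.
Suppose that $\lambda = 0$. Then, from the first two KKT conditions we have that the solution $\nu$ must satisfy
$\nu_{md}/1^\top\nu_m = p_{md}$, for $m = 1, 2$, $d = 1, \ldots, \Delta$. However, this contradicts the third condition
(recall that we assumed that $H(p) < \frac{q^\top p_2}{q^\top p_1 + q^\top p_2}\,\frac{\mu^2}{2\sigma^2}$). Therefore, the solution to (52) must belong to the set
$\left\{\nu \in \mathcal{V} : H(\nu) = \frac{q^\top p_2}{q^\top p_1 + q^\top p_2}\,\frac{\mu^2}{2\sigma^2}\right\}$. Since this set intersects with the set $\mathcal{V}_2$, we conclude that,
when $H(p) < \frac{q^\top p_2}{q^\top p_1 + q^\top p_2}\,\frac{\mu^2}{2\sigma^2}$, the optimal solution to (47) is found by optimizing over
the smaller set $\mathcal{V}_2 \subseteq \mathcal{V}$, i.e., (47) is equivalent to
$$\begin{array}{ll}
\text{minimize} & D(\nu_1\|p_1) + D(\nu_2\|p_2) + \dfrac{\left(\theta_2\mu - \xi^\star(\nu)\right)^2}{2\theta_2\sigma^2}\\[4pt]
\text{subject to} & \nu \in \mathcal{V}_2,
\end{array} \tag{54}$$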
where $\xi^\star(\nu) = \sigma\sqrt{2\theta_2 H(\nu)}$. Simple algebraic manipulations reveal that the third term in the objective
above is equal to $\left(\sqrt{H(\nu)} - \sqrt{R(\nu)}\right)^2$. Finally, set $\mathcal{V}_2$ is precisely the constraint set in (49), and hence
the claim of the lemma follows.
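The inner minimization over $\xi$ used in the proof can be checked numerically. The sketch below assumes that the $\xi$-dependent term of the objective has the form $(\theta_2\mu - \xi)^2/(2\theta_2\sigma^2)$, which is consistent with the two cases for $\xi^\star$ and with the closed form $(\sqrt{H(\nu)} - \sqrt{R(\nu)})^2$; the function names are ours.

```python
import numpy as np

def inner_minimum(theta2, h, mu, sigma):
    """Minimise (theta2*mu - xi)^2 / (2*theta2*sigma^2) over 0 <= xi <= sigma*sqrt(2*theta2*h)."""
    xi_max = sigma * np.sqrt(2.0 * theta2 * h)
    xi_star = min(theta2 * mu, xi_max)      # clip the unconstrained minimiser theta2*mu
    value = (theta2 * mu - xi_star) ** 2 / (2.0 * theta2 * sigma ** 2)
    return xi_star, value

def closed_form(theta2, h, mu, sigma):
    """Zero when H >= R, and (sqrt(H) - sqrt(R))^2 otherwise, with R = theta2*mu^2/(2*sigma^2)."""
    r = theta2 * mu ** 2 / (2.0 * sigma ** 2)
    return 0.0 if h >= r else (np.sqrt(h) - np.sqrt(r)) ** 2

# Example values: theta2 = 0.5, H(nu) = 0.2, mu = 1, sigma = 0.3
xi_star, value = inner_minimum(0.5, 0.2, 1.0, 0.3)
```

In the example, $R(\nu) > H(\nu)$, so the constraint is active and the minimizer sits at the upper end of the feasible interval.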
Sum of conditionals as an expectation. For each st∈ St, introduce
and note that, for each $s^t$ and under $H = H_0$, $X_{s^t}$ is a Gaussian random variable of mean zero and variance
equal to $\tau_2(s^t)/t^2 = \theta_2(s^t)/t$. The idea is to view the sum in (44) as an expectation of a certain function
$g_X : \mathcal{S}^t \mapsto \mathbb{R}$ defined over the set $\mathcal{S}^t$ of all possible sequences $s^t$, parameterized by the random family (i.e.,
vector) $X = \{X_{s^t} : s^t \in \mathcal{S}^t\}$. More precisely, consider the probability space with the set of outcomes
$\mathcal{S}^t$, where an element $s^t$ of $\mathcal{S}^t$ is drawn uniformly at random, and hence with probability $1/C_t$,
where, we recall, $C_t = |\mathcal{S}^t|$; denote the corresponding expectation by $E_U$. We see that the sum under the
logarithm in (44) equals
$$\sum_{s^t \in \mathcal{S}^t} g_X(s^t) = C_t\,E_U\left[g_X(s^t)\right], \tag{56}$$
where it is easy to see that gX(st) = P0(st)et(µ2µ1)
2σ2, for st∈ St.
Using further the type Vdefined in Subsection III-A, we can express gX(st)as
gX(st) = et(µ2µ1)
Vmd(st) log pmd .(57)
Induced measure. We see that function gXdepends on stonly through type Vof the sequence and the
values of vector X. More precisely, define F:R2∆ ×R7→ Ras
F(ν, ξ) = µ2µ1
νmd log pmd.(58)
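Then, for any $s^t$, $g_X(s^t) = e^{tF(V(s^t), X_{s^t})}$. For each vector $X$, let then $Q^X_t : \mathcal{B}(\mathbb{R}^{2\Delta+1}) \mapsto \mathbb{R}$ denote the
probability measure induced by $(V(s^t), X_{s^t})$, for the assumed uniform measure on $\mathcal{S}^t$:
$$Q^X_t(B) := \frac{\sum_{s^t \in \mathcal{S}^t} 1_{\{(V, X) \in B\}}(s^t)}{C_t}, \tag{59}$$
for arbitrary $B \in \mathcal{B}(\mathbb{R}^{2\Delta+1})$. It is easy to verify that $Q^X_t$ is indeed a probability measure. Also, we note
that, for any fixed $t$ and $X$, $Q^X_t$ is discrete, supported on the discrete set $\{(V(s^t), X_{s^t}) : s^t \in \mathcal{S}^t\}$; note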
that the latter set is a subset of $\mathcal{V}_t \times \{X_{s^t} : s^t \in \mathcal{S}^t\}$, the Cartesian product of the set of all feasible types at
time $t$ with the set of all elements of vector $X$.
Let $E_Q$ denote the expectation with respect to measure $Q^X_t$. Then, we have $E_U\left[g_X(S^t)\right] = E_Q\left[e^{tF(V,X)}\right]$.
Going back to (56), and using the result of Lemma 2, we obtain for $\eta$ given in (44):
$$\eta = -\log\psi + \lim_{t\to+\infty} -\frac{1}{t}\,E_0\left[\log E_Q\left[e^{tF(V,X)}\right]\right], \tag{60}$$
where, we recall, $E_0$ is the expectation with respect to probability $P_0$ that corresponds to the $H_0$ state of
nature, under which measurements $X_k$, and hence vector $X$, are generated.
If the measures $Q^X_t$ were sufficiently nice such that they satisfied the LDP and the moderate growth
condition (28), then one could apply Varadhan's lemma to compute the exponential growth of the
expectation on the right-hand side of (60). However, the measures $Q^X_t$ are very difficult to analyze
due to the correlations between different elements of $X$, which couple the indicator functions in (59). Hence,
we resort to a bound on $\eta$, which we derive by replacing vector $X$ by a vector $Z$ with the same
marginal statistics, but with the added feature that its elements are mutually independent. More precisely,
for each $t$ we introduce a family of independent Gaussian variables $Z = \{Z_{s^t} : s^t \in \mathcal{S}^t\}$. Further, for
each $s^t$ the corresponding element of the family, $Z_{s^t}$, is Gaussian with the same mean and variance as $X_{s^t}$:
expected value equal to 0, and variance equal to $\mathrm{Var}[Z_{s^t}] = \theta_2(s^t)/t$. Denote by $P$ and $E$, respectively,
the probability function and the expectation corresponding to the family $\{Z_{s^t} : s^t \in \mathcal{S}^t\}$, $t = 1, 2, \ldots$.
Then, the following result holds; the proof is based on Slepian's lemma [21], and it can be found in
the Appendix.
Lemma 10. For each $t$, there holds
$$E\left[\log E_Q\left[e^{tF(V,Z)}\right]\right] \geq E_0\left[\log E_Q\left[e^{tF(V,X)}\right]\right], \tag{61}$$
where the inner left-hand side expectation is with respect to the measures $Q^Z_t$ and the inner right-hand
side expectation is with respect to the measures $Q^X_t$.
The next result asserts that $Q^Z_t$ satisfies the LDP with probability one and computes the corresponding
rate function. To simplify the notation, denote $q = (1, 2, \ldots, \Delta)^\top$.
Theorem 11. For every measurable set $G$, the sequence of measures $Q^Z_t$, $t = 1, 2, \ldots$, with probability
one satisfies the LDP upper bound (26) and the LDP lower bound (27), with the same rate function
$I : \mathbb{R}^{2\Delta+1} \mapsto \mathbb{R}$, equal for all sets $G$, which for $\nu \in \mathcal{V}$ for which $H(\nu_1) + H(\nu_2) \geq J_\nu(\xi)$ is given by
$$I(\nu, \xi) = \log\psi - H(\nu_1) - H(\nu_2) + J_\nu(\xi), \tag{62}$$
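and equals $+\infty$ otherwise, and where, for any $\nu \in \mathcal{V}$, function $J_\nu : \mathbb{R} \mapsto \mathbb{R}$ is defined as $J_\nu(\xi) := \frac{1}{2 q^\top\nu_2\,\sigma^2}\,\xi^2$.
The proof of Theorem 11 is given in the Appendix.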
Having the large deviations principle for the sequence $Q^Z_t$, we can invoke Varadhan's lemma to compute
the limit of the scaled values in (60). Applying Lemma 5 (the details of the moderate growth condition (28)
for $Q^Z_t$ are given in the Appendix), we obtain that, with probability one,
$$\lim_{t\to+\infty}\frac{1}{t}\log E_Q\left[e^{tF(V,Z)}\right] = \sup_{(\nu,\xi)\in\mathbb{R}^{2\Delta+1}} F(\nu, \xi) - I(\nu, \xi). \tag{63}$$
It can be shown that the sequence under the preceding limit is uniformly integrable; the proof of this
result is very similar to the proof of a similar result in the context of hidden Markov models, given in
Appendix E of [12], hence we omit the proof here. Thus, the limit of the sequence values and the limit
of their expected values coincide, i.e.,
$$\lim_{t\to+\infty}\frac{1}{t}\,E\left[\log E_Q\left[e^{tF(V,Z)}\right]\right] = \lim_{t\to+\infty}\frac{1}{t}\log E_Q\left[e^{tF(V,Z)}\right]. \tag{64}$$
Combining with (60), (61), and (63), we finally obtain
$$\eta \geq -\log\psi - \sup_{(\nu,\xi)\in\mathbb{R}^{2\Delta+1}} F(\nu, \xi) - I(\nu, \xi). \tag{65}$$
It remains to show that the value of the above supremum equals the value of the optimization problem (45).
Using the definition of $I$, we have that $I(\nu, \xi) = +\infty$ for any $(\nu, \xi)$ such that $H(\nu) < J_\nu(\xi)$ or such that
$\nu \notin \mathcal{V}$. Since the supremum is surely not achieved at these points, set $\mathbb{R}^{2\Delta+1}$ in (65) can be replaced by
$\{(\nu, \xi) \in \mathcal{V} \times \mathbb{R} : H(\nu) \geq J_\nu(\xi)\}$. Using the definitions of $F$ and $I$, we have
F(ν, ξ)I(ν, ξ ) =
νmd log pmd νmd log νmd
2σ2log ψ. (66)
Cancelling out the term log ψin the preceding equation with the one in (65), and recognizing that
d=1 νmd log pmd νmd log νmd =D(νm||pm), we see that problem (45) is equivalent to the one
in (65). This completes the proof of Theorem 8.
In this section we report our numerical results to demonstrate tightness of the developed performance
bounds. We also illustrate our methodology on the problem of detecting one single run of a dish-washer,
where we use real-world data to estimate the state values for a dish-washer.
In the first set of simulations, we consider the setup in which µ1>0and we compare the error
exponents obtained via simulations to the guaranteed lower bound (46). We simulate a two-state signal,
$X_t$, as an i.i.d. Gaussian random variable with standard deviation $\sigma$ and mean $\mu_1 = 2$ and $\mu_2 = 5$ in
states 1 and 2, respectively. The duration of each state is random, uniformly distributed between 1 and
$\Delta = 3$. The observation interval is $t \in [1, T]$, where $T = 200$. In the absence of the signal, the data is
distributed according to the Gaussian distribution with mean $\mu_0 = 0$ and the same standard deviation $\sigma$.
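A minimal sketch of how such a two-state signal could be generated is given below. The choice of the initial state (drawn at random unless specified) and the function names are our assumptions; they are not stated in the setup above.

```python
import numpy as np

def simulate_states(T, delta, rng, first_state=None):
    """Alternate between states 1 and 2; each phase lasts a uniform number of samples in {1,...,delta}."""
    states = np.empty(T, dtype=int)
    s = int(rng.integers(1, 3)) if first_state is None else first_state  # assumed initial-state rule
    k = 0
    while k < T:
        d = int(rng.integers(1, delta + 1))  # phase duration, upper bound inclusive
        states[k:k + d] = s                  # slicing truncates the last phase at T
        k += d
        s = 3 - s                            # switch 1 <-> 2
    return states

def simulate_observations(states, mu1, mu2, sigma, rng):
    """Add i.i.d. zero-mean Gaussian noise of std sigma to the state-dependent signal level."""
    means = np.where(states == 1, mu1, mu2)
    return means + sigma * rng.normal(size=states.shape)

rng = np.random.default_rng(0)
states = simulate_states(T=200, delta=3, rng=rng)
x_h1 = simulate_observations(states, mu1=2.0, mu2=5.0, sigma=10.0, rng=rng)  # signal present
x_h0 = 10.0 * rng.normal(size=200)                                           # noise only
```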
To estimate the receiver operating characteristics (ROC) curves, we use $J = 100000$ Monte Carlo
simulation runs for each hypothesis. For each hypothesis and each simulation run, we compute the
values $L_t(X^t)$, for $t = 1, 2, 3, \ldots, T$, using the linear recursion from Lemma 6. Then, for each $t$, to obtain
the corresponding ROC curve, we first find the minimal and maximal values $\underline{L}_{t,m}$ and $\overline{L}_{t,m}$, respectively,
across $J$ runs for each hypothesis $m$, and change the detection threshold $\gamma$ with a small step size from
$\underline{L}_{t,1} - \beta$ to $\overline{L}_{t,0} + \beta$, where $\beta$ is a carefully chosen bound. For each $t$ and $\gamma$ the probability of false alarm
$P_{\mathrm{fa}}$, or false positive, i.e., wrongly determining that the signal is present, is calculated as
$$P^{\gamma}_{\mathrm{fa},t} = \frac{\sum_{j=1}^{J} 1\left(L_t(X^t_{(j)}) \geq \gamma\right)}{J},$$
where $1(\cdot)$ is an indicator function that returns 1 if the corresponding condition is true and 0 otherwise,
and $X^t_{(j)}$ is the $j$-th realisation of the sequence $X^t$ under $H_0$. The probability of a miss $P_{\mathrm{miss}}$, or false
negative, that is, declaring that the signal is not present, though it is, is calculated as:
$$P^{\gamma}_{\mathrm{miss},t} = \frac{\sum_{j=1}^{J} 1\left(L_t(X^t_{(j)}) < \gamma\right)}{J},$$
where now $X^t_{(j)}$ is the $j$-th realisation of the sequence under $H_1$.
We set the bound $\alpha = 0.01$ and find $P^{\alpha}_{\mathrm{miss},t} = P^{\gamma^\star}_{\mathrm{miss},t}$, where $\gamma^\star$ resulted in the highest probability of a
miss that satisfied $P^{\gamma^\star}_{\mathrm{fa},t} \leq \alpha$.
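The two empirical error probabilities can be sketched as follows. The threshold selection rule here is an assumption on our part: we return the probability of a miss at the smallest threshold on the grid whose false alarm rate does not exceed $\alpha$ (the usual Neyman-Pearson operating point), and the grid construction and function names are illustrative.

```python
import numpy as np

def error_rates(l0, l1, gamma):
    """Empirical false alarm and miss rates at threshold gamma (decide 'signal' when L >= gamma)."""
    p_fa = float(np.mean(np.asarray(l0) >= gamma))    # statistic values simulated under H0
    p_miss = float(np.mean(np.asarray(l1) < gamma))   # statistic values simulated under H1
    return p_fa, p_miss

def miss_at_false_alarm_level(l0, l1, alpha, num_thresholds=1000):
    """Scan thresholds in increasing order and stop at the first one with P_fa <= alpha."""
    l0 = np.asarray(l0, dtype=float)
    l1 = np.asarray(l1, dtype=float)
    lo = min(l0.min(), l1.min())
    hi = max(l0.max(), l1.max())
    grid = np.append(np.linspace(lo, hi, num_thresholds), np.nextafter(hi, np.inf))
    for gamma in grid:
        p_fa, p_miss = error_rates(l0, l1, gamma)
        if p_fa <= alpha:
            # P_fa is non-increasing and P_miss is non-decreasing in gamma,
            # so the first feasible threshold gives the smallest miss rate on the grid.
            return float(gamma), p_miss
    raise RuntimeError("unreachable: the last threshold exceeds every sample")
```

In practice the statistic is usually handled in the log domain, since $L_t$ itself can overflow or underflow for large $t$.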
To investigate the dependence of the slope on the SNR, we fix signal levels $\mu_1$ and $\mu_2$, and pmf's $p_1$
and $p_2$ as described above, and we vary the standard deviation of noise $\sigma$. For each different value of $\sigma$,
we compute the values of $P^{\alpha}_{\mathrm{miss},t}$, for $t = 1, \ldots, T$, and apply linear regression on the sequence of values
$\log P^{\alpha}_{\mathrm{miss},t}$ for all observation times $t$ for which the probability of a miss was non-zero. This gives an
estimate for the error exponent (i.e., the slope) for the probability of a miss under a fixed value of $\sigma$,
which we denote by $S_\sigma$.
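The slope estimate described above amounts to an ordinary least-squares line fit. The sketch below uses `numpy.polyfit` and returns the negated slope, so that a decaying probability of a miss gives a positive exponent; this sign convention and the function name are our choices.

```python
import numpy as np

def error_exponent_from_slope(p_miss):
    """Fit a line to log P_miss versus t over the times with non-zero P_miss; return minus the slope."""
    p_miss = np.asarray(p_miss, dtype=float)
    t = np.arange(1, len(p_miss) + 1)
    keep = p_miss > 0                       # log is undefined where no miss was observed
    slope, _intercept = np.polyfit(t[keep], np.log(p_miss[keep]), 1)
    return float(-slope)

# Synthetic check: an exactly exponential decay with rate 0.05, followed by zeros
t = np.arange(1, 201)
p = np.concatenate((0.8 * np.exp(-0.05 * t), np.zeros(20)))
rate = error_exponent_from_slope(p)
```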
Figure 3 plots the probability of a miss (in the logarithmic scale) vs. the number of samples $t$ for
five different values of $\sigma$, namely $\sigma = 10, 15, 20, 25, 30$. We observe that for large observation intervals
$t$ the curves are close to linear, as predicted by the theory, see Lemma 7. Further, as $\sigma$ increases the
magnitude of the slope decreases, becoming very close to 0 for large values of $\sigma$. Figure 4 compares
the error exponent $S_\sigma$ obtained from simulations with the theoretical bound calculated using (46). The
theoretical curve is plotted in red dashed line, while the numerical curve $S_\sigma$ is plotted in blue full line.
For comparison, we also plot the curve $\mu_2^2/(2\sigma^2)$, which corresponds to the best possible error exponent
for the studied setup, obtained when the signal throughout the whole observation interval stays at the
higher signal value $\mu_2 > \mu_1$; this curve is plotted in green dotted line. It can be seen from the figure
that the numerical error exponent curve is at all points sandwiched between the lower bound (46) curve
$\mu_1^2/(2\sigma^2)$ and the curve $\mu_2^2/(2\sigma^2)$. Also, the difference between the numerical error exponent and the
lower bound (46) decreases as $\sigma$ increases, where the differences become negligible for large $\sigma$, showing
that our bound is tight for large values of $\sigma$.
Fig. 3: Simulation setup: $\Delta = 3$, $p_1, p_2 \sim \mathcal{U}([1, \Delta])$, $\mu_1 = 2$, $\mu_2 = 5$, $\alpha = 0.01$. Evolution of probability
of a miss, in the logarithmic scale, for $\sigma = 5, 10, 15, 20, 25$.
In the second set of experiments, we consider the setup where the signal level in state 1 is zero, $\mu_1 = 0$,
and $\mu_2 = \mu = 1$; similarly as in the previous setup, we consider uniform distributions $p_1, p_2 \sim \mathcal{U}([1, \Delta])$,
with $\Delta = 2$. We compare the numerical error exponent with the one obtained as a solution to optimization
problem (49). To solve (49), we apply random search over $10^6$ different vectors from set $\mathcal{V}$, and pick
the point which gives the smallest value of the objective (and satisfies the constraint in (49)).
Fig. 4: Simulation setup: $\Delta = 3$, $p_1, p_2 \sim \mathcal{U}([1, \Delta])$, $\mu_1 = 2$, $\mu_2 = 5$, $\alpha = 0.01$. $\sigma$ varies from 5 to 50.
Blue full line plots the numerical error exponent estimated from the slope of $\log P^{\alpha}_{\mathrm{miss},t}$, vs. $\sigma$. Red dashed
line plots the theoretical bound $\mu_1^2/(2\sigma^2)$ in (46). Green dotted line plots the function $\mu_2^2/(2\sigma^2)$.

Figure 5 plots probability of a miss vs. number of samples $t$ for 5 different values of $\sigma$, in the interval
from 0.2 to 0.6. Again, we can observe that linearity emerges with the increase of $\sigma$. Figure 6, top,
compares the error exponent estimated from the slope in Figure 5 with the theoretical bound calculated from
solving (49). We can see from the plot that the two lines are very close to each other. In fact, the
numerical values are slightly below the lower bound values. This seemingly contradictory effect is a
consequence of the following. As the probability of a miss curves have a concave shape in this simulation
setup (which can be observed from Figure 5), their slopes continuously increase with the increase of the
observation interval. As a consequence, the linear fitting performed on the whole observation interval
underestimates the slope, as it also tries to fit the region of values where concavity is more
prominent. To further investigate this effect, we performed linear fitting of the probability of a miss curves
only for a region of higher values of $t$, where emergence of linearity is already evident. In particular,
for each different value of $\sigma$, we apply linear fitting on $[\frac{4}{5}t_{\max}, t_{\max}]$, where $t_{\max}$ is the maximal
$t$ for which the probability of a miss is non-zero, and we plot the results in Figure 6, bottom. It can
be seen from the figure that the numerical curve got closer to the theoretical curve, indicating that the
bound in (49) is very tight or even exact. Finally, it can be seen from Figure 6 (top and bottom) that the
value of $\sigma$ for which the error exponent is equal to zero matches the threshold predicted by the theory,
$\sigma^\star = \mu/(2\sqrt{2\log\Delta}) = 0.4247$, obtained from detectability condition (48).
Fig. 5: Simulation setup: $\Delta = 2$, $p_1, p_2 \sim \mathcal{U}([1, \Delta])$, $\mu_1 = 0$, $\mu_2 = 1$, $\alpha = 0.01$. Plots of probability of
a miss in the logarithmic scale for $\sigma = 0.3, 0.33, 0.37, 0.4, 0.45$.

In the final set of simulations, we demonstrate applicability of the results to estimate the number of
samples needed to detect an appliance run from the smart meter data. To do that, we use measurements of
a dishwasher from the REFIT dataset [2]. The REFIT dataset contains 2 years of appliance measurements from
20 houses. The monitored dishwasher is a two-state appliance, with mean power values of $\mu_1 = 2200$ W,
$\mu_2 = 66$ W and standard deviations of $\sigma_1 = 36.6$ W and $\sigma_2 = 18.2$ W, in states 1 and 2, respectively. The
mean value of background noise, which is also the base-load in that house, is $\mu_0 = 90$ W, with standard
deviation $\sigma_0 = 16.6$ W. We down-sampled dishwasher data with $\Delta = 10$ to simulate the influence of
noise including base-load and unknown appliances on detecting the appliance. The simulation results are
shown in Figure 7 as plots of $P^{\alpha}_{\mathrm{miss},t}$ vs. $t$ for several values of $\sigma$ between the measured $\sigma_1$ and $\sigma_2$.
As expected, the probability of a miss decreases with the increase of the number of samples $t$. Furthermore,
the number of samples needed for successful detection is about 10.
Fig. 6: Simulation setup: $\Delta = 2$, $p_1, p_2 \sim \mathcal{U}([1, \Delta])$, $\mu_1 = 0$, $\mu_2 = 1$, $\alpha = 0.01$. $\sigma$ varies from 0.2 to
0.6. Blue full line plots the numerical error exponent estimated from the slope of $\log P^{\alpha}_{\mathrm{miss},t}$, vs. $\sigma$, by linear
fitting. Top: linear fitting performed on the whole interval $[1, t_{\max}]$; bottom: linear fitting performed on
$[\frac{4}{5}t_{\max}, t_{\max}]$. Red dashed line plots the theoretical bound calculated by solving (49).

Fig. 7: Simulation setup: $\Delta = 10$, $p_1, p_2 \sim \mathcal{U}([1, \Delta])$, $\mu_1 = 66$, $\mu_2 = 2200$, $\sigma = 90$, $\alpha = 0.01$. Plots of
probability of a miss for 5 different $\sigma$ values.

We studied the problem of detecting a multi-state signal hidden in noise, where the durations of state
occurrences vary over time in a nondeterministic manner. We modelled such a process via a random
duration model that, for each state, assigns a (possibly distinct) probability mass function to the duration
of each occurrence of that state. Assuming Gaussian noise and a process with two possible states, we
derived the optimal likelihood ratio test and showed that it has the form of a linear recursion of dimension equal
to the sum of the duration spreads of the two states. Using this result, we showed that the Neyman-Pearson
error exponent is equal to the top Lyapunov exponent for the linear recursion, the exact computation of
which is a well-known hard problem. Using the theory of large deviations, we provided a lower bound
on the error exponent. We demonstrated the tightness of the bound with numerical results. Finally, we
illustrated the developed methodology in the context of NILM, applying it to the problem of detecting
multi-state appliances from the aggregate power consumption signal.
Proof of Lemma 1. Fix an arbitrary sequence $s^t$. Let $n_1 = N_1(s^t)$, $n_2 = N_2(s^t)$ and $n = N(s^t)$ denote,
respectively, the number of state-1 phases, state-2 phases, and the total number of phases in $s^t$. Let the
durations of state-1 phases (by the order of appearance) in $s^t$ be $d_{11}, d_{12}, \ldots, d_{1n_1}$, and the durations of
state-2 phases be $d_{21}, d_{22}, \ldots, d_{2n_2}$. Recall that $o(s^t)$ denotes the duration of the last phase in $s^t$. Then, if
$s_t = 1$, we have
$$\begin{aligned}
P(s^t) &= P_1(S^t = s^t)\\
&= P_1(D_{11} = d_{11}, D_{21} = d_{21}, \ldots, D_{1n_1} \geq d_{1n_1})\\
&= P_1(D_{1n_1} \geq d_{1n_1}) \prod_{l=1}^{n_1 - 1} P_1(D_{1l} = d_{1l}) \prod_{l=1}^{n_2} P_1(D_{2l} = d_{2l}),
\end{aligned} \tag{67}$$
where the second equality follows from the fact that the last phase is state 1 and that with the knowledge
of only up to time $t$ it is not certain whether this last phase lasts longer than $d_{1n_1}$, i.e., stretches over time
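$t$; the last equality follows from the fact that the $D_{mn}$'s are i.i.d. for each $m$ and mutually independent for
different $m$. Adding the missing factor $P_1(D_{1n_1} = d_{1n_1})$ in the product, and dividing the middle term
in (67) by the same factor, yields
$$P(s^t) = \frac{P_1(D_{1n_1} \geq d_{1n_1})}{P_1(D_{1n_1} = d_{1n_1})} \prod_{m=1}^{2} \prod_{l=1}^{n_m} P_1(D_{ml} = d_{ml}). \tag{68}$$
A similar formula can be obtained for the case when $s_t = 2$. Note now that, for every $d = 1, \ldots, \Delta_m$,
$P_1(D_{mn_m} \geq d) = p_{md} + p_{m\,d+1} + \ldots + p_{m\Delta_m} =: p^+_{md}$, for $m = 1, 2$. Grouping, for each state, the product
terms with equal durations, and denoting $n_{1d} = N_{1d}(s^t)$, for $d = 1, \ldots, \Delta_1$, and $n_{2d} = N_{2d}(s^t)$, for
$d = 1, \ldots, \Delta_2$, we obtain that
$$P(s^t) = \frac{p^+_{s_t\,o(s^t)}}{p_{s_t\,o(s^t)}} \prod_{m=1}^{2} \prod_{d=1}^{\Delta_m} p_{md}^{\,n_{md}}. \tag{69}$$
This completes the proof of the lemma.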
Proof of Lemma 6. Consider (30) and note that P(st)can be expressed as P(st) = p+
where, we recall o:St7→ Zis an integer-valued function which returns the duration of the last phase
in a sequence st. We break the sum in (30) as follows,
Lt(Xt) =
md X
k=1 fsk(Xk),(70)
To prove the lemma, it suffices to show that, for each m, d, t, the Λm
t,d’s are equal to the corresponding
summands in (70),
t,d := X
k=1 fsk(Xk).(71)
To prove the previous claim, fix m= 1. For t= 1, it is easy to see that Σ1
1,1=ef1(X1), and, since, when
t= 1, there cannot be sequences with last phase longer than 1, we have Σ1
1,d = 0 for all 2d.
Analogous identities can be derived for m= 2. Thus, we have proved that, for t= 1, the summands
t,d = Λm
t,d, for each dand m.
Consider now an arbitrary fixed t2. Consider m= 1 and d= 1. This pair of parameter values
corresponds to sequences that end with state 1with phase of length 1. We thus obtain that st1= 2, and
we can represent this set of sequences as:
st∈ St:st= 1, o(st)=1
l=1 st1,1:st1∈ St1, st1= 2, o(st1) = l.(72)
Hence, we can write Σ1
t,1as follows:
l=1 X
k=1 fsk(Xk)
where in the first equality we used that, when o(st1) = l,P0(st1) = p2lP0(st1l)and the last equality
follows by the definition of Σ2
t,l,l= 1,2, ..., in (71).
Consider now m= 1 and d2. Since the last dstates must be state 1, we can represent this set of
sequences as:
st∈ St:st=st1=. . . =std+1 = 1, o(st) = d=
st1,1:st1∈ St1, st1=. . . =st1(d1)+1 = 1,
o(st1) = d1.(74)
Thus, we can write Σ1
t,d as follows:
t,d =ef1(Xt)X
k=1 fsk(Xk)
where, we note that in the first equality we used that P0(std) = P0(st1(d1)).
Representing (73) and (75) in a matrix form (we remark that derivations for m= 2 are analogous),
we recover recursion (31). Since we proved that the initial conditions are equal, i.e., Σm
1= Λm
1, for
m= 1,2, we proved that Σm
t= Λm
tfor all t, which proves the claim of the lemma.
Proof of Lemma 7. To prove the claim, we apply Theorem 2 from [19]. Note that since matrices Ak
are i.i.d., they are stationary and ergodic, and hence they are also metrically transitive, see, e.g., [22].
Therefore the assumptions of the theorem are fulfilled. We now show that the condition of the theorem
holds, i.e., we show that
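where $\log^+ = \max\{\log, 0\}$. It is easy to verify that $\|A_k\| \leq e^{\max_{m=1,2}|f_m(X_k)|}\,C_{M_0}$, where $C_{M_0} = \|M_0\|$.
Thus, we have
$$\log^+\|A_k\| \leq \log^+\left(C_{M_0}\,e^{\max_{m=1,2}|f_m(X_k)|}\right) \leq \log^+ C_{M_0} + \max_{m=1,2}|f_m(X_k)|. \tag{77}$$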
Since $X_k$ is Gaussian, and $f_1$ and $f_2$ are linear functions, we have that $f_1(X_k)$ and $f_2(X_k)$ are Gaussian.
Therefore, the expectation of the right-hand side of the preceding equation is finite (which can be seen
by bounding $E_0[|f_1(X_k)|] \leq \sqrt{E_0\left[f_1^2(X_k)\right]} < +\infty$, and similarly for $m = 2$). Hence, the condition (76)
follows. By Theorem 2 from [19] we therefore have that
$$\lim_{t\to+\infty}\frac{1}{t}\log\|\Pi_t\| = \lim_{t\to+\infty}\frac{1}{t}\,E\left[\log\|\Pi_t\|\right], \tag{78}$$
which proves (38). To prove (39), we note that $L_t = p^{+\top}\,\Pi_t\,1_{2\Delta}$, where $p^+ > 0$. Thus, there exist
constants $c$ and $C$ such that $c\,\|\Pi_t\| \leq L_t \leq C\,\|\Pi_t\|$ [23]. The claim now follows from the preceding
sandwich relation between $L_t$ and $\|\Pi_t\|$.
Proof of Theorem 11.
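Fix $t \geq 1$ and fix $\nu \in \mathcal{V}_t$. For $D \subseteq \mathbb{R}$, introduce
$$Q^Z_{t,\nu}(D) := \frac{\sum_{s^t \in \mathcal{S}^t :\, V(s^t) = \nu} 1_{\{Z_{s^t} \in D\}}}{C_{t,\nu}}, \tag{79}$$
where, we recall, $C_{t,\nu}$ is the number of type-$\nu$ feasible sequences of length $t$. Let $B = C \times D$ be a box
in $\mathbb{R}^{2\Delta+1}$, where $C$ is a box in $\mathbb{R}^{2\Delta}$ and $D = [a, b]$ is an interval in $\mathbb{R}$. Then, we have
$$Q^Z_t(B) = \sum_{\nu \in C \cap \mathcal{V}_t} \frac{C_{t,\nu}}{C_t}\,Q^Z_{t,\nu}(D). \tag{80}$$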
From (80) it follows that, for each $t$, for any $\nu \in C \cap \mathcal{V}_t$ there holds
$$\frac{C_{t,\nu}}{C_t}\,Q^Z_{t,\nu}(D) \leq Q^Z_t(B). \tag{81}$$
Further, note that, for each $\nu \in \mathcal{V}_t$, the corresponding elements of the random vector $Z$, $\{Z_{s^t} : V(s^t) = \nu\}$,
are i.i.d., Gaussian, with mean 0 and variance equal to $q^\top\nu_2\,\sigma^2/t$. Thus, $Q^Z_{t,\nu}(D)$ is binomial with $C_{t,\nu}$
trials and probability of success $E\left[Q^Z_{t,\nu}(D)\right] = q_{t,\nu}(D)$ equal to
$$q_{t,\nu}(D) = \int_{a \leq x \leq b} \frac{\sqrt{t}}{\sqrt{2\pi\,q^\top\nu_2}\,\sigma}\,e^{-\frac{t\,x^2}{2\,q^\top\nu_2\,\sigma^2}}\,dx. \tag{82}$$
Using the well-known bounds on the Q-function [24], the following bounds on $q_{t,\nu}(D)$, for an arbitrary
interval $D$, are straightforward to show.
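Lemma 12. Fix $\epsilon > 0$. Then, for any $D = [a, b]$, $a < b$, there holds
$$e^{-t\epsilon}\,e^{-t\inf_{a \leq \eta \leq b} J_\nu(\eta)} \leq q_{t,\nu}(D) \leq e^{t\epsilon}\,e^{-t\inf_{a \leq \eta \leq b} J_\nu(\eta)} \tag{83}$$
for each $\nu \in \mathcal{V}_t$, and all $t$ sufficiently large.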
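The exponential decay stated in Lemma 12 can be observed numerically for a zero-mean Gaussian with variance $\theta_2\sigma^2/t$, where $\theta_2 = q^\top\nu_2$. The sketch below assumes the quadratic rate $J_\nu(\xi) = \xi^2/(2\theta_2\sigma^2)$ implied by the density in (82), and uses `math.erfc` so that small tail probabilities are computed without cancellation; the function names are ours.

```python
import math

def interval_probability(a, b, theta2, sigma, t):
    """P(a <= Z <= b) for Z ~ N(0, theta2 * sigma^2 / t), as a difference of upper tails."""
    s = sigma * math.sqrt(theta2 / t)       # standard deviation of Z

    def upper_tail(x):
        return 0.5 * math.erfc(x / (s * math.sqrt(2.0)))

    return upper_tail(a) - upper_tail(b)

def rate(xi, theta2, sigma):
    """Assumed Gaussian rate function J(xi) = xi^2 / (2 * theta2 * sigma^2)."""
    return xi ** 2 / (2.0 * theta2 * sigma ** 2)

def empirical_rate(a, b, theta2, sigma, t):
    """-(1/t) log q_t(D); approaches inf over [a, b] of J as t grows."""
    return -math.log(interval_probability(a, b, theta2, sigma, t)) / t

# Interval D = [1, 2], theta2 = 0.5, sigma = 1: the infimum of J over D is J(1) = 1
gap_200 = empirical_rate(1.0, 2.0, 0.5, 1.0, 200) - rate(1.0, 0.5, 1.0)
gap_400 = empirical_rate(1.0, 2.0, 0.5, 1.0, 400) - rate(1.0, 0.5, 1.0)
```

The gap between the empirical and the limiting rate shrinks as $t$ grows; for much larger $t$ the probability underflows in double precision, so a log-domain tail approximation would be needed.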
We next show that the random measures QZ
t,ν approach their expected values qt,ν as tincreases.
Lemma 13. Fix an arbitrary  > 0.
1) With probability one,
t,ν (D)qt,ν (D)et,(84)
for all ν∈ Vt, for all tsufficiently large.
2) Let νt∈ Vt,t= 1,2, ..., be a sequence of types converging to ν?∈ V. Then, with probability one,
for all tsufficiently large
t,νt(D)qt,ν?(D)(1 ).(85)
The proof of part 1 of Lemma 13 can be obtained by considering separately the cases: 1) inf(ν,ξ)BJν(ξ)
H(ν)<0and 2) inf(ν,ξ)BJν(ξ)H(ν)0. Then, in each of the two cases the claim can be obtained
by a corresponding application of Markov’s inequality on a conveniently defined sequence of sets in .
In case 1), we use At=ω:QZ
t,ν (D)qt,ν (D)et,for some νC∩ Vt. Applying the union bound,
together with the fact that the cardinality of $\mathcal{V}_t$ is polynomial in $t$ ($|\mathcal{V}_t| \leq (t + 1)^{2\Delta}$), we obtain from
condition 1) that the probabilities $P(A_t)$ decay exponentially with $t$. The claim in 1 then follows by the
Borel-Cantelli lemma. Similar arguments can be derived for case 2), where in the place of set $A_t$, set
ν1{ZstD}(st)1,for some νC∩ Vtois used. For details we refer the reader to
Section V-A in [13].
By defining Ct=nω:QZ
qt,νt(D)1, for some νC∩ Vtoand applying Chebyshev’s inequality,
the proof of part 2 can be derived similarly as in the proof of part 1. For details, see the proof of
Lemma 13 in [13].
Having the preceding technical results, we are now ready to prove the LDP for the sequence QZ
t. We
first prove the LDP upper bound, and then turn to the LDP lower bound.
Proof of the LDP upper bound. We break the proof of the LDP upper bound into the following steps.
In the first step, we show that the LDP upper bound holds with probability one for all boxes in $\mathbb{R}^{2\Delta+1}$.
In the second step, we extend the claim to all compact sets via the standard finite cover argument [15].
Finally, in the third step, we move from compact sets to closed sets by using the fact that $I$ has compact
domain.
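Step 1: LDP for boxes. Let $B = C \times D$ be an arbitrary closed box in $\mathbb{R}^{2\Delta+1}$, where $C$ is a box in
$\mathbb{R}^{2\Delta}$ and $D$ is a closed interval in $\mathbb{R}$. To prove the LDP upper bound for box $B$, we need to show that
there exists a set $\Omega^\star_1 = \Omega^\star_1(B)$ which has probability one, $P(\Omega^\star_1) = 1$, such that for every $\omega \in \Omega^\star_1$, there
holds
$$\limsup_{t\to+\infty}\frac{1}{t}\log Q^Z_t(B) \leq -I(B), \tag{86}$$
where $I(B) := \inf_{(\nu,\xi) \in B} I(\nu, \xi)$. To this end, fix $\epsilon > 0$. Applying Lemma 2, Lemma 3, Lemma 12, and
part 1 of Lemma 13, together with (80), we have
$$\begin{aligned}
Q^Z_t(B) &\leq \sum_{\nu_t \in C \cap \mathcal{V}_t} e^{4t\epsilon}\,e^{-t\inf_{\xi \in D} J_{\nu_t}(\xi) + tH(\nu_t) - t\log\psi} &(87)\\
&\leq |\mathcal{V}_t|\,e^{4t\epsilon}\,e^{-t\log\psi - t\inf_{\nu \in C \cap \mathcal{V}}\left(\inf_{\xi \in D} J_\nu(\xi) - H(\nu)\right)}, &(88)
\end{aligned}$$
which holds with probability one for all $t$ sufficiently large. Taking the logarithm, dividing by $t$, taking the
limit $t \to +\infty$, and letting $\epsilon \to 0$, the upper bound for boxes follows.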
Step 2: LDP for compact sets. The extension of the upper bound to all compact sets in $\mathbb{R}^{2\Delta+1}$ can be
done by picking an arbitrary compact set $F$, covering it with a family of boxes $B$ of the form as in Step
1, where a ball of a conveniently chosen size is assigned to each point of $F$, and finally extracting a
finite cover of $F$. As this is a standard argument in the proof of LDP upper bounds, we omit the details
of the proof here and refer the reader to [15] (see, e.g., the proof of Cramér's theorem in $\mathbb{R}^d$, Chapter
2.2.2 in [15]).
Step 3: LDP for closed sets Since the rate function has compact domain, LDP upper bound for compact
sets implies LDP upper bound for closed sets. This completes the proof of the upper bound.
Proof of the LDP lower bound. Let $U$ be an arbitrary open set in $\mathbb{R}^{2\Delta+1}$. To prove the LDP lower
bound we need to show that there exists a set $\Omega^\star_2 = \Omega^\star_2(U)$ which has probability one, $P(\Omega^\star_2) = 1$, such
that for every $\omega \in \Omega^\star_2$, there holds
$$\liminf_{t\to+\infty}\frac{1}{t}\log Q^Z_t(U) \geq -I(U). \tag{89}$$
Since $I$ is non-negative at any point of its domain, it follows that $I(U)$ can either be a finite non-negative
number or $+\infty$. In the latter case the lower bound holds trivially, hence we focus on the case $I(U) < +\infty$.
For any point $\nu \in \mathcal{V}$, we define a sequence of types $\nu_t \in \mathcal{V}_t$ converging to $\nu$, by picking, for each
$t \geq 1$, an arbitrary closest neighbor of $\nu$ in the set $\mathcal{V}_t$ (since $\mathcal{V}_t$ gets denser with $t$, the sequence $\nu_t$
indeed converges to $\nu$), i.e.,
$$\nu_t \in \arg\min_{\nu' \in \mathcal{V}_t} \|\nu' - \nu\|. \tag{90}$$
Now note that by the fact that $I(U)$ is an infimal value, for any $\delta > 0$ there must exist $(\nu, \xi) \in U$
such that $I(\nu, \xi) \leq I(U) + \delta$. If for $(\nu, \xi)$ there holds $H(\nu) - J_\nu(\xi) > 0$, we assign $\nu^\star = \nu$ and $\xi^\star = \xi$.
Otherwise, we can decrease $\xi$ in absolute value to a new point $\xi'$ such that $(\nu, \xi')$ still belongs to $U$ (note
that this is feasible due to the fact that $U$ is open), and for which the strict inequality $H(\nu) - J_\nu(\xi') > 0$
holds. Assigning $\xi^\star = \xi'$ we prove the existence of $(\nu^\star, \xi^\star) \in U$ such that
$$I(\nu^\star, \xi^\star) \leq I(U) + \delta. \tag{91}$$
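Let $\nu_t$ denote a sequence of points obtained from (90) converging to $\nu^\star$. Since $U$ is open, there exists
a box $B$ centered at $(\nu^\star, \xi^\star)$ that entirely belongs to $U$. This implies that there exists a closed interval
$D \subseteq \mathbb{R}$ such that, for sufficiently large $t$, $\{\nu_t\} \times D \subseteq U$. By the inequality in (81), it follows that
$$Q^Z_t(U) \geq Q^Z_t(\{\nu_t\} \times D) = \frac{C_{t,\nu_t}}{C_t}\,Q^Z_{t,\nu_t}(D). \tag{92}$$
Combining the lower bound on $q_{t,\nu_t}(D)$ from Lemma 12 with part 2 of Lemma 13, we obtain that for
sufficiently large $t$,
$$Q^Z_t(U) \geq q_{t,\nu_t}(D)\,(1 - \epsilon)\,\frac{C_{t,\nu_t}}{C_t} \geq e^{-3t\epsilon}\,e^{-t\inf_{\xi \in D} J_{\nu^\star}(\xi) + tH(\nu_t) - t\log\psi}\,(1 - \epsilon).$$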
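Taking the logarithm and dividing by $t$, we obtain
$$\frac{1}{t}\log Q^Z_t(U) \geq -3\epsilon - \inf_{\xi \in D} J_{\nu^\star}(\xi) + H(\nu_t) - \log\psi + \frac{\log(1 - \epsilon)}{t}. \tag{93}$$
As $t \to +\infty$, $\nu_t \to \nu^\star$, and by the continuity of $H$ we have that $H(\nu_t) \to H(\nu^\star)$. Thus, taking the limit
in (93) yields
$$\begin{aligned}
\liminf_{t\to+\infty}\frac{1}{t}\log Q^Z_t(U) &\geq -3\epsilon - \inf_{\xi \in D} J_{\nu^\star}(\xi) + H(\nu^\star) - \log\psi\\
&\geq -3\epsilon - I(\nu^\star, \xi^\star),
\end{aligned}$$
where in the last inequality we used the fact that $\xi^\star \in D$. The latter bound holds for all $\epsilon > 0$, and hence
taking the supremum over all $\epsilon > 0$ yields
$$\begin{aligned}
\liminf_{t\to+\infty}\frac{1}{t}\log Q^Z_t(U) &\geq -I(\nu^\star, \xi^\star)\\
&\geq -\inf_{(\nu,\xi) \in U} I(\nu, \xi) - \delta.
\end{aligned}$$
Recalling that $\delta$ was chosen arbitrarily, the lower bound is proven.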
Proof of Lemma 10. For reference, we state here Slepian's lemma, which we use in the proof.
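Lemma 14. (Slepian's lemma [21]) Let the function $\phi: \mathbb{R}^L \mapsto \mathbb{R}$ satisfy
$$\lim_{\|x\| \to +\infty} \phi(x)\, e^{-\alpha \|x\|^2} = 0, \quad \text{for all } \alpha > 0. \tag{94}$$
Suppose that $\phi$ has nonnegative mixed derivatives,
$$\frac{\partial^2 \phi}{\partial x_l\, \partial x_m} \ge 0, \quad \text{for } l \ne m. \tag{95}$$
Then, for any two independent zero-mean Gaussian vectors $X$ and $Z$ taking values in $\mathbb{R}^L$ such that $\mathbb{E}_X[X_l^2] = \mathbb{E}_Z[Z_l^2]$ and $\mathbb{E}_X[X_l X_m] \ge \mathbb{E}_Z[Z_l Z_m]$, there holds $\mathbb{E}_X[\phi(X)] \ge \mathbb{E}_Z[\phi(Z)]$, where $\mathbb{E}_X$ and $\mathbb{E}_Z$, respectively, denote the expectation operators on the probability spaces on which $X$ and $Z$ are defined.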
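As a sanity check on the direction of the inequality in Lemma 14, the following sketch evaluates both expectations numerically in one illustrative case. The test function $\phi(x) = e^{x_1 + x_2}$, the correlation value 0.5, and the integration grid are assumptions made here for illustration; none of them is taken from the paper. This $\phi$ has a positive mixed derivative and satisfies the growth condition (94), and for unit-variance coordinates with correlation $\rho$ the expectation has the closed form $e^{1+\rho}$, which the code uses as a reference value.

```python
import math


def gaussian_expectation(phi, rho, half_width=8.0, step=0.05):
    """Approximate E[phi(X1, X2)] for a zero-mean, unit-variance Gaussian
    pair with correlation rho, using a midpoint rule on a square grid.

    The pair is built from independent standard normals (g1, g2) as
    X1 = g1 and X2 = rho * g1 + sqrt(1 - rho^2) * g2.
    """
    n = int(round(2 * half_width / step))
    s = math.sqrt(1.0 - rho * rho)
    nodes = [-half_width + (i + 0.5) * step for i in range(n)]
    # Standard normal density times the cell width at each node.
    weights = [math.exp(-0.5 * g * g) / math.sqrt(2.0 * math.pi) * step
               for g in nodes]
    total = 0.0
    for g1, w1 in zip(nodes, weights):
        for g2, w2 in zip(nodes, weights):
            total += w1 * w2 * phi(g1, rho * g1 + s * g2)
    return total


def phi(x1, x2):
    """Illustrative test function with a positive mixed derivative."""
    return math.exp(x1 + x2)


# X has positively correlated coordinates, Z has independent ones;
# both have unit variances, as required by the lemma.
e_x = gaussian_expectation(phi, 0.5)
e_z = gaussian_expectation(phi, 0.0)

if __name__ == "__main__":
    print("E[phi(X)] =", e_x, " closed form:", math.exp(1.5))
    print("E[phi(Z)] =", e_z, " closed form:", math.exp(1.0))
    print("E[phi(X)] >= E[phi(Z)]:", e_x >= e_z)
```

The example only confirms the ordering for one function and one pair of covariance structures; it is not a substitute for the lemma itself.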
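Proof. For each fixed $t$ define the function $\phi_t: \mathbb{R}^{C_t} \mapsto \mathbb{R}$,
$$\phi_t(x) := \log \sum \cdots$$
where $x_{s^t}$ is the element of the vector $x = \{x_{s^t} : s^t \in \mathcal{S}_t\} \in \mathbb{R}^{C_t}$ whose index is $s^t$, and where each function $\gamma_{s^t}$ is defined through the function $g_X$, given in (56), as $\gamma_{s^t}(x_{s^t}) := \log(g_x(s^t))$. Since each $\gamma_{s^t}(x_{s^t})$, $s^t \in \mathcal{S}_t$, grows linearly in $x$, condition (94) is fulfilled.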
Further, it is straightforward to show that the mixed second partial derivatives of $\phi_t$ are always non-negative, and hence condition (95) is also fulfilled.
We next verify the conditions of the lemma on the vectors $X$ and $Z$. Since, for the same sequence $s^t$, the corresponding $X_{s^t}$ and $Z_{s^t}$ have the same Gaussian distribution (of mean zero and variance equal to $q^\top V_2(s^t)\,\sigma^2/t$), there holds $\mathbb{E}_0[X_{s^t}^2] = \mathbb{E}[Z_{s^t}^2]$. Further, it is easy to see that $\mathbb{E}_0[X_{s^t} X_{s^{t\prime}}] = \sum_{k:\, s_k = s'_k} \cdots$. On the other hand, since $Z_{s^t}$ and $Z_{s^{t\prime}}$ are independent for $s^t \ne s^{t\prime}$, and they are both zero mean, we have $\mathbb{E}[Z_{s^t} Z_{s^{t\prime}}] = 0$. Therefore, the last condition of Slepian's lemma is fulfilled. Hence, the claim of Lemma 10 follows.
Proof of the moderate growth of $Q_t^Z$
Proof. The conditions that define the set $\mathcal{V}$ imply that $1/(2\Delta) \le q^\top \nu_2 \le 1$, $\nu \ge 0$, and hence $\mathcal{V}$ is compact. Further, the condition $H(\nu_1) + H(\nu_2) \ge \frac{1}{q^\top \nu_2}\,\xi^2$, which defines the domain of the rate function $I$, implies $2 \log \Delta \ge \xi^2$. Thus, $\xi$ must be bounded in order for $I$ to be finite, which, combined with the fact that $\nu$ must belong to the compact set $\mathcal{V}$, shows that $I$ has a compact domain. Let $B_0$ be a box that contains the domain of $I$. Since the function $F$ is continuous, it achieves its maximum on $B_0$, which we denote by $M_0 := \max_{(\nu,\xi) \in B_0} F(\nu, \xi)$. It follows that for each $M \ge M_0$, with probability one, the integral in (28) equals zero for all $t$ sufficiently large. Thus, condition (28) is fulfilled.
[1] G. W. Hart, Nonintrusive Appliance Load Data Acquisition Method: Progress Report. Massachusetts Institute of Technology, Energy Laboratory and Electric Power Research Institute, 1984. [Online]. Available: https:
[2] D. Murray, L. Stankovic, and V. Stankovic, “An electrical load measurements dataset of United Kingdom households from a two-year longitudinal study,” Scientific Data, vol. 4, p. 160122, 2017.
[3] J. Liao, G. Elafoudi, L. Stankovic, and V. Stankovic, “Non-intrusive appliance load monitoring using low-resolution smart
meter data,” in 2014 IEEE International Conference on Smart Grid Communications (SmartGridComm), Nov 2014, pp.
[4] K. He, L. Stankovic, J. Liao, and V. Stankovic, “Non-intrusive load disaggregation using graph signal processing,” IEEE
Transactions on Smart Grid, vol. PP, no. 99, pp. 1–1, 2017.
[5] B. Zhao, L. Stankovic, and V. Stankovic, “On a training-less solution for non-intrusive appliance load monitoring using
graph signal processing,” IEEE Access, vol. 4, pp. 1784–1799, 2016.
[6] O. Parson, S. Ghosh, M. J. Weal, and A. Rogers, “Non-intrusive load monitoring using prior models of general appliance types,” in AAAI, 2012.
[7] J. Z. Kolter and T. Jaakkola, “Approximate inference in additive factorial HMMs with application to energy disaggregation,” in Artificial Intelligence and Statistics, 2012, pp. 1472–1482.
[8] C. Uggen, “Work as a turning point in the life course of criminals: A duration model of age, employment, and recidivism,”
American Sociological Review, vol. 65, no. 4, pp. 529–546, Aug. 2000.
[9] J. R. Russell and R. F. Engle, “A discrete-state continuous-time model of financial transactions prices and times: The
autoregressive conditional multinomial-autoregressive conditional duration model,” Journal of Business and Economic
Statistics, vol. 23, no. 2, pp. 166–180, April 2005.
[10] M. Ting, A. O. Hero, D. Rugar, C.-Y. Yip, and J. A. Fessler, “Near-optimal signal detection for finite-state Markov signals with application to magnetic resonance force microscopy,” IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2049–2062, June 2006.
[11] M. Ting and A. O. Hero, “Detection of a random walk signal in the regime of low signal to noise ratio and long observation
time,” in 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, vol. 3, May 2006.
[12] A. Agaskar and Y. M. Lu, “Optimal detection of random walks on graphs: Performance analysis via statistical physics,” April 2015,
[13] D. Bajović, J. M. F. Moura, and D. Vukobratovic, “Detecting random walks on graphs with heterogeneous sensors,” July
[14] J. N. Tsitsiklis and V. D. Blondel, “The Lyapunov exponent and joint spectral radius of pairs of matrices are hard—when
not impossible—to compute and to approximate,” Mathematics of Control, Signals and Systems, vol. 10, no. 1, pp. 31–40,
March 1997. [Online]. Available:
[15] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. Boston, MA: Jones and Bartlett, 1993.
[16] Y. Sung, L. Tong, and H. V. Poor, “Neyman-Pearson detection of Gauss-Markov signals in noise: closed-form error exponent
and properties,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1354–1365, April 2006.
[17] P.-N. Chen, “General formulas for the Neyman-Pearson type-II error exponent subject to fixed and exponential type-I error
bounds,” IEEE Transactions on Information Theory, vol. 42, no. 1, pp. 316–323, Jan 1996.
[18] I. Flores, “Direct calculation of k-generalized Fibonacci numbers,” Fibonacci Quarterly, vol. 5, no. 3, pp. 259–266, 1967.
[19] H. Furstenberg and H. Kesten, “Products of random matrices,” Ann. Math. Statist., vol. 31, no. 2, pp. 457–469, June 1960. [Online]. Available:
[20] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, United Kingdom: Cambridge University Press, 2004.
[21] O. Zeitouni, “Gaussian Fields,” March 2016, Lecture notes. Courant institute, New York.
[22] C. Shalizi, “Stochastic processes,” 2007, Lecture notes. Stochastic processes. Carnegie Mellon University.
[23] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, United Kingdom: Cambridge University Press, 1990.
[24] A. F. Karr, Probability. Springer Texts in Statistics. New York: Springer-Verlag, 1993.