
Optimal detection and error exponents for hidden multi-state processes via random duration model approach

Dragana Bajović¹, Kanghang He², Lina Stanković², Dejan Vukobratović¹, and Vladimir Stanković²

Abstract. We study detection of random signals corrupted by noise that over time switch their values (states) within a finite set of possible values, where the switchings occur at unknown points in time. We model such signals by means of a random duration model that assigns to each possible state a probability mass function which controls the statistics of the durations of that state's occurrences. Assuming two possible signal states and Gaussian noise, we derive the optimal likelihood ratio test and show that it has a computationally tractable form of a matrix product, where the number of matrices involved in the product equals the number of process observations. Each matrix involved in the product is of dimension equal to the sum of the duration spreads of the two states, and it can be decomposed as a product of a diagonal random matrix controlled by the process observations and a sparse constant matrix which governs the transitions in the sequence of states. Using this result, we show that the Neyman-Pearson error exponent is equal to the top Lyapunov exponent for the corresponding random matrices. Using the theory of large deviations, we derive a lower bound on the error exponent. Finally, we show that this bound is tight by means of numerical simulations.

Keywords. Multi-state processes, random duration model, hypothesis testing, error exponent, large deviations principle, threshold effect, Lyapunov exponent.

¹D. Bajović and D. Vukobratović are with the Department of Power, Electronic and Communications Engineering, Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia (e-mail: {dbajovic, dejanv}@uns.ac.rs).
²K. He, L. Stanković and V. Stanković are with the Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, G1 1XW, UK (e-mail: {kanghang.he, vladimir.stankovic, lina.stankovic}@strath.ac.uk).

December 27, 2017 DRAFT

arXiv:1712.09061v1 [cs.IT] 25 Dec 2017


I. INTRODUCTION

The problem of detecting a signal hidden in noise is investigated. The signal to be detected is characterised as having a constant magnitude in any one state and can transition between multiple states over time. Each occurrence of a particular state has a random duration, modelled as a discrete random variable which takes values in a finite set of integers, according to a certain probability mass function associated with that state. For each given state, the durations of its occurrences over time are independent and identically distributed random variables, independent of the durations of other states.

Our main motivation for studying the described model comes from the non-intrusive appliance load monitoring (NILM) problem, i.e., detecting one or more particular appliance states, each of unknown duration, within an aggregate power signal, as obtained from smart meters. With the large-scale roll-out of smart meters worldwide, there has been increased interest in NILM, i.e., disaggregating the total household energy consumption measured by the smart meter down to appliance level using purely software tools [1]. NILM can enrich energy feedback, and it can support smart home automation [2], appliance retrofit decisions, and demand response measures [3].

Despite significant research efforts in developing efficient NILM algorithms (see [3], [4], [5], [6], [7] and references therein), NILM is still a challenge, especially at low sampling rates, in the order of seconds and minutes. One obstacle is the lack of standardised performance measures and appropriate theoretical bounds on the detectability of appliance usage, which could help estimate the performance of various algorithms. A particularly challenging problem is the detection of multi-state appliances, i.e., appliances whose power consumption switches through several different values over one appliance runtime. Examples of such appliances are a dishwasher or a washing machine, where the chosen program or setting, and possibly also the appliance load (e.g., with the washing machine), determines the duration that the appliance spends in each state. The difficulty there arises from the fact that the program and the load, unknown from the perspective of NILM, are non-deterministic, i.e., they vary each time the same appliance is run, which makes it difficult to detect which state the appliance is in. The aggregate signal minus the appliance load is considered noise for the detection problem.

The above model is also representative of signals occurring in a range of other applications. In econometrics, examples of duration signals include marital or employment status, or, in general, the time an individual spends in a certain state [8]. Further examples from econometrics are the time to currency alignment or the time to transactions in the stock market [9]. In communication systems theory, pulse-duration modulated (PDM) signals, which transmit information encoded into the pulse duration, have two possible signal states: the positive-value state is a pulse whose duration is proportional to the information symbol to be encoded, and the zero-value state lies between any two pulses. The probability distribution of the state duration is then controlled by the probability distribution on the set of information symbols to be transmitted. Further binary-state examples are random telegraph signals, where the signal switches between two values in a random manner¹, and the activity pattern of a certain mobile user in a cellular communication system.

In this paper, we are interested in deriving optimal detection tests for detecting multi-state signals with random duration structure hidden in noise. We consider binary models, where occurrences of two possible states are interleaved in time. Further, we are interested in characterizing the performance of optimal detection tests measured in terms of the Neyman-Pearson error exponent. Works on detecting multi-state signals hidden in noise most related to ours include [10], [12] and [13]. However, in contrast to the random duration model that we propose, these references model multi-state signals in noise as hidden Markov chains. Reference [10] considers random telegraph signals modelled as binary Markov chains and derives the corresponding optimal detection test in the form of a product of certain measurement-defined matrices. Reference [12] considers detection of a random walk on a graph, and derives bounds on the error exponent for the Neyman-Pearson detection test. Reference [13] uses the method of types to generalize the results from [12] to a non-homogeneous setting where different nodes have different signal-to-noise ratios (SNR) with respect to the walk. Furthermore, reference [13] proves that the derived bound on the error exponent has a convex optimization form.

Contributions. In this paper, we show that the optimal detection test, seemingly combinatorial in nature, admits a simple, linear recursion form of a product of matrices of dimension equal to the sum of the duration spreads of the two states. Using the preceding result, we show that the Neyman-Pearson error exponent for this problem is given by the top Lyapunov exponent [14] for the matrices that define the recursion. The matrices have the structure of an interleaved random diagonal and a (sparse) constant component that defines the transitions from one state pattern to another. Thus, we reveal that a similar structural effect as with the error exponent for hidden Markov processes occurs here as well [10], [13]. Finally, using the theory of large deviations [15], we derive a lower bound on the error exponent and demonstrate by numerical simulations that the derived bound is very close to the true error exponent.

Paper outline. Section II states the problem setup and Section III gives the preliminaries. Section IV gives the main results on the form of the optimal likelihood ratio test. Section V provides the lower bound on the error exponent, while Section VI proves this result. Finally, numerical results are given in Section VII and Section VIII concludes the paper.

¹We remark that there are other stochastic models in the literature for the random telegraph signal, e.g., the Poisson model, or the hidden Markov chain model [10], [11].

Notation. For an arbitrary integer $n$, $S^{n-1}$ denotes the probability simplex in $\mathbb{R}^n$; $e_1$ denotes the first canonical vector (the $n$-dimensional vector with $1$ only in the first position, and zeros in all other positions), and $\mathbf{1}$ the vector of all ones, where we remark that the dimension should be clear from the context; $A_0$ denotes the lower shift matrix (the $0/1$ matrix with ones only on the first subdiagonal). We denote the Gaussian distribution of mean value $\mu$ and standard deviation $\sigma$ by $\mathcal{N}(\mu, \sigma^2)$; by $p[1, n]$ an arbitrary distribution over the first $n$ integers; by $U[1, n]$ the uniform distribution over the first $n$ integers; $\log$ denotes the natural logarithm.

II. PROBLEM SETUP

We consider the problem of detecting a signal corrupted by noise that randomly switches from one state $m$ to another, where $m = 1, 2, \ldots, M$, and in each state the signal has a certain magnitude $\mu_m$. The duration that the signal spends in a given state $m$ is modelled as a discrete random variable on a given support set $[1, \Delta_m]$, with a certain probability mass function (pmf) defined by a vector $p_m \in S^{\Delta_m - 1}$. In this work, we consider the case $M = 2$ and assume that for each state $m$ we know the corresponding value of the observed signal $\mu_m$. Without loss of generality, we will assume that $\mu_2 > \mu_1 \geq 0$. For each sampling time $t = 1, 2, \ldots$, let $S^t = \{S_1, \ldots, S_t\}$ denote the sequence of states until time $t$ of the signal that we wish to detect, where for each $k = 1, \ldots, t$, $S_k \in \{1, 2\}$; similarly, we denote $S^\infty = \{S_1, S_2, \ldots\}$. We assume that, with probability one, the first state is $S_1 \equiv 1$, and, for the purpose of analysis, we set $S_0 \equiv 2$. Let $X_k$ denote the signal measurement at sample time $k$, $k = 1, \ldots, t$, and, for each $t$, collect all measurements up to time $t$ in the vector $X^t = (X_1, \ldots, X_t)$. We assume that each measurement is corrupted by zero-mean additive Gaussian noise $\mathcal{N}(0, \sigma^2)$, with standard deviation $\sigma > 0$.

The sequence of switching times. For the sequence of states $S_1, S_2, \ldots$, we define the sequence of times $\{T_1, T_2, \ldots\}$ at which the signal switches from one state to another, i.e.,

$$T_{i+1} = \max\{k \geq T_i + 1 : S_k = S_{T_i + 1}\}, \quad \text{for } i = 0, 1, 2, \ldots \quad (1)$$

where we set $T_0 \equiv 0$. We call each time window $[T_i + 1, T_{i+1}]$, $i = 0, 1, 2, \ldots$, a phase, and note that during any phase the sequence $S^\infty$ stays in the same state. Since $S_1 \equiv 1$, all odd-numbered intervals $[T_0 + 1, T_1], [T_2 + 1, T_3], \ldots$, where the ordering is with respect to the order of appearance, are state-1 phases, and all even-numbered intervals $[T_1 + 1, T_2], [T_3 + 1, T_4], \ldots$ are state-2 phases.


Random duration model. For $n = 1, 2, \ldots$, we denote by $D_{1,n}$ the difference process

$$D_{1,n} = T_{2n-1} - T_{2n-2}, \quad (2)$$

or, in words, for each $n$, $D_{1,n}$ is the duration of the $n$-th state-1 phase in the sequence $S^\infty$. We assume that the durations of state-1 phases are independent and identically distributed (i.i.d.), with support set of all integers in the finite interval $[1, \Delta_1]$, and with pmf given by the vector $p_1 = (p_{11}, p_{12}, \ldots, p_{1\Delta_1}) \in S^{\Delta_1 - 1}$. Similarly, we define

$$D_{2,n} = T_{2n} - T_{2n-1} \quad (3)$$

to be the duration of the $n$-th state-2 phase in the sequence $S_1, S_2, \ldots$, for $n = 1, 2, \ldots$; we assume that the $D_{2,n}$'s are i.i.d., with support set of all integers in the interval $[1, \Delta_2]$, and pmf given by the vector $p_2 = (p_{21}, p_{22}, \ldots, p_{2\Delta_2}) \in S^{\Delta_2 - 1}$. We also assume that the durations of state-1 and state-2 phases are mutually independent.

Hypothesis testing problem. Using the preceding definitions, we model the signal detection problem as the following binary hypothesis testing problem:

$$H_0: \; X_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2) \quad (4)$$

$$H_1: \; X_k \,|\, S^t \overset{\text{indep.}}{\sim} \begin{cases} \mathcal{N}(\mu_1, \sigma^2), & \text{if } S_k = 1 \\ \mathcal{N}(\mu_2, \sigma^2), & \text{if } S_k = 2 \end{cases}, \quad \text{for } k = 1, \ldots, t,$$

where $D_{1,n} \sim p_1[1, \Delta_1]$ are i.i.d., $D_{2,n} \sim p_2[1, \Delta_2]$ are i.i.d., the $D_{1,n}$'s and $D_{2,n}$'s are independent, and $S_1 \equiv 1$. We remark that the model above easily generalizes to the case when the signals $X_k$ are under both hypotheses shifted by some $\mu_0 \in \mathbb{R}$, i.e., when, under $H = H_0$, $X_k \sim \mathcal{N}(\mu_0, \sigma^2)$ and, under $H = H_1$, $X_k \sim \mathcal{N}(\mu_{S_k} + \mu_0, \sigma^2)$; see the example of the appliance detection problem later in this section. The latter hypothesis testing problem reduces to the one in (4) by means of the change of variables $Y_k = X_k - \mu_0$.
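To make the setup concrete, the following Python sketch draws a state sequence and measurements under $H_1$ from the random duration model; the parameter values ($\Delta = 3$, uniform duration pmfs, $\mu_1 = 2$, $\mu_2 = 5$, $\sigma = 10$) mirror the paper's Figure 1 setup, while the function name and code organization are our own illustrative choices.

```python
import numpy as np

def simulate_h1(t, mu, p, sigma, rng):
    """Draw (S^t, X^t) under H1: state-1 and state-2 phases alternate,
    starting from state 1, with phase durations drawn i.i.d. from the
    pmfs p[0], p[1] on {1, ..., Delta}."""
    states = []
    m = 0                                        # S_1 = 1 (index 0) with probability one
    while len(states) < t:
        d = rng.choice(len(p[m]), p=p[m]) + 1    # random duration of this phase
        states.extend([m] * d)
        m = 1 - m                                # phases of the two states interleave
    states = np.array(states[:t])                # the last phase may be truncated at t
    x = np.array(mu)[states] + sigma * rng.normal(size=t)  # add N(0, sigma^2) noise
    return states, x

# parameters mirroring the simulation setup of Fig. 1
rng = np.random.default_rng(0)
p = [np.ones(3) / 3, np.ones(3) / 3]             # Delta = 3, uniform duration pmfs
states, x = simulate_h1(200, mu=[2.0, 5.0], p=p, sigma=10.0, rng=rng)
```

Every window of $\Delta + 1$ consecutive samples must then contain at least one state switch, since no phase lasts longer than $\Delta$.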

Illustration: Multiphase appliance detection. Suppose that we wish to detect the event that a certain appliance in a household is switched on. We consider classes of appliances whose signature signals exhibit a multistate (multiphase) type of behavior, such as switching from high to low signal values, where the durations of phases of the same signal level can be different across a single appliance run-time and also across different run-times of the same appliance. Examples of appliances whose signatures fall into this class are, e.g., a dishwasher and a washer-dryer. This problem can be modelled by the hypothesis testing problem (4), where $\mu_1$ corresponds to the appliance consumption when in the low state and $\mu_2$ corresponds to the appliance consumption when in the high state. In this scenario, there is an underlying baseline load, which can also be modelled as a Gaussian random variable of expected value $\mu_0$ and standard deviation $\sigma$. Since the same baseline load is present both under $H_0$ and $H_1$, to cast the described appliance detection problem in the format given in (4), we simply subtract the value $\mu_0$ from the observed consumption signal $X_k$.

Likelihood ratio test and Neyman-Pearson error exponent. We denote the probability laws corresponding to $H_0$ and $H_1$ by $P_0$ and $P_1$, respectively. Similarly, the expectations with respect to $P_0$ and $P_1$ are denoted by $E_0$ and $E_1$, respectively. The probability density functions of $X^t$ under $H_1$ and $H_0$ are denoted by $f_{1,t}(\cdot)$ and $f_{0,t}(\cdot)$. It will also be of interest to introduce the conditional probability density function of $X^t$ given $S^t = s^t$ (i.e., the likelihood functions), which we denote by $f_{1,t|S^t}(\cdot|s^t)$, for any $s^t$. Finally, the likelihood ratio at time $t$ is denoted by $L_t$, and at a given realization of $X^t$ it is computed as $L_t(X^t) = \frac{f_{1,t}(X^t)}{f_{0,t}(X^t)}$.

It is well known that the optimal detection test (both in the Neyman-Pearson and Bayes sense) for problem (4) is the likelihood ratio test. Conditioning on the state realizations until time $t$, $S^t = s^t$, and denoting shortly $P(s^t) = P_1(S^t = s^t)$, we have

$$L_t(X^t) = \sum_{s^t \in \mathcal{S}^t} P(s^t)\, \frac{f_{1,t|S^t}(X^t|s^t)}{f_{0,t}(X^t)} = \sum_{s^t \in \mathcal{S}^t} P(s^t)\, \frac{\prod_{k=1}^{t} \frac{1}{\sqrt{2\pi}\sigma}\, e^{-\frac{(\mu_{s_k} - X_k)^2}{2\sigma^2}}}{\prod_{k=1}^{t} \frac{1}{\sqrt{2\pi}\sigma}\, e^{-\frac{X_k^2}{2\sigma^2}}}. \quad (5)$$

In this paper our goal is to find a computationally tractable form for the optimal, likelihood ratio test and also to characterize its asymptotic performance when the number of samples $X_k$ grows large. In particular, with respect to performance characterization, we wish to compute the error exponent for the probability of a miss, under a given bound $\alpha$ on the probability of false alarm:

$$\lim_{t \to +\infty} -\frac{1}{t} \log P^{\alpha}_{\mathrm{miss},t} =: \zeta, \quad (6)$$

where $P^{\alpha}_{\mathrm{miss},t}$ is the minimal probability of a miss among all decision tests that have probability of false alarm bounded by $\alpha$. By results from detection theory, e.g., [16], [17], the $\zeta$ in (6) is given by the asymptotic Kullback-Leibler rate in (7), provided that this limit exists:

$$\zeta = \lim_{t \to +\infty} -\frac{1}{t} \log L_t(X^t). \quad (7)$$

We prove the existence of the limit in (7) in Lemma 7 in Section V further ahead. An illustration of the identity (6) is given in Figure 1, which clearly shows that both sequences $-\frac{1}{t} \log P^{\alpha}_{\mathrm{miss},t}$ and $-\frac{1}{t} \log L_t(X^t)$ are convergent and, moreover, that they converge to the same value: the asymptotic Kullback-Leibler rate for the two hypotheses defined in (4). For further details on this simulation see Section VII.

Fig. 1: Simulation setup: $\Delta = 3$, $p_1, p_2 \sim U([1, \Delta])$, $\mu_1 = 2$, $\mu_2 = 5$, $\sigma = 10$, $\alpha = 0.01$. The green full line plots the evolution of $-\frac{1}{t} \log L_t$; the blue dotted line plots the evolution of $-\frac{1}{t} \log P^{\alpha}_{\mathrm{miss},t}$, and the red dashed line plots the estimated slope of the probability-of-a-miss values (on the logarithmic scale) calculated for values up to $t = 300$ observations.

III. PRELIMINARIES

In this section we introduce a number of quantities related to the sequences $s^t \in \mathcal{S}^t$, $t = 1, 2, \ldots$, and give certain results pertaining to these quantities that will be useful for our analysis.

Statistics for the durations of phases. For each $t$, we define the sets of discrete times until time $t$ at which the signal was in state 1 and in state 2, which we respectively denote by $\mathcal{T}_1$ and $\mathcal{T}_2$:

$$\mathcal{T}_1(s^t) = \{1 \leq k \leq t : s_k = 1\}, \quad (8)$$
$$\mathcal{T}_2(s^t) = \{1 \leq k \leq t : s_k = 2\}. \quad (9)$$

We denote the cardinalities of $\mathcal{T}_1$ and $\mathcal{T}_2$, respectively, by $\tau_1$ and $\tau_2$, i.e., $\tau_1 \equiv |\mathcal{T}_1|$ and $\tau_2 \equiv |\mathcal{T}_2|$. Note that the functions $\mathcal{T}_1$ and $\mathcal{T}_2$ are, strictly speaking, dependent on time $t$ (this dependence is observed in their domain sets $\mathcal{S}^t$, which clearly change with time $t$). However, for easier readability, we suppress this dependence in the notation, as we also do for all the subsequently defined quantities.


For each $t$ and each $s^t$, we also introduce $N_1$ and $N_2$ to count the number of state-1 and state-2 phases, respectively, in the sequence $s^t$:

$$N_1(s^t) = |\{1 \leq k \leq t : s_{k-1} = 2,\ s_k = 1\}|, \quad (10)$$
$$N_2(s^t) = |\{1 \leq k \leq t : s_{k-1} = 1,\ s_k = 2\}|, \quad (11)$$

where, since the first phase is a state-1 phase, we set $s_0 \equiv 2$. We remark that, for any sequence $s^t$, if the last state $s_t = 2$, then $N_1(s^t) = N_2(s^t)$, and if $s_t = 1$, then $N_1(s^t) = N_2(s^t) + 1$. Finally, $N(s^t)$ is the total number of phases in $s^t$, $N \equiv N_1 + N_2$.

We further define the sets $\mathcal{T}_{mn}(s^t)$ that contain the time indices of the $n$-th state-$m$ phase, $n = 1, \ldots, N_m(s^t)$, $m = 1, 2$. Note that, for each $m = 1, 2$, $\cup_{n=1}^{N_m(s^t)} \mathcal{T}_{mn}(s^t) = \mathcal{T}_m$. We now increase granularity in the counts $N_1$ and $N_2$ and define

$$N_{1d}(s^t) = \sum_{n=1}^{N_1(s^t)} \mathbf{1}_{\{|\mathcal{T}_{1n}| = d\}}(s^t), \quad \text{for } d = 1, \ldots, \Delta_1, \quad (12)$$

$$N_{2d}(s^t) = \sum_{n=1}^{N_2(s^t)} \mathbf{1}_{\{|\mathcal{T}_{2n}| = d\}}(s^t), \quad \text{for } d = 1, \ldots, \Delta_2; \quad (13)$$

i.e., in words, the vectors $(N_{m1}, \ldots, N_{m\Delta_m})$, $m = 1, 2$, represent histograms of state-1 and state-2 phase durations. It is easy to see that $N_m = \sum_{d=1}^{\Delta_m} N_{md}$, for $m = 1, 2$. Also, for each time $t$ and each sequence $s^t$, the total number of state-1 and state-2 occurrences must sum up to $t$, and therefore $\sum_{d=1}^{\Delta_1} d\, N_{1d}(s^t) + \sum_{d=1}^{\Delta_2} d\, N_{2d}(s^t) = t$.

Figure 2 shows an example of signals simulated under hypothesis $H_1$ with $\Delta = 10$, $\mu_1 = 3$, $\mu_2 = 5$ and $\sigma = 0.05$ using the random duration model, for various switching times $T_i$, difference-process durations $D_{k,i}$, and numbers $N_{k,d}$ of state-$k$ phases with fixed duration $d$. We can see from the figure that $D_{1,1} = T_1 - T_0 = 8$, as in eq. (2), and there is only one state-1 phase lasting 8 samples, hence $N_{1,8} = 1$. Again, from eq. (3) we can see from the figure that $D_{2,1} = T_2 - T_1 = 8$ and $D_{2,3} = T_6 - T_5 = 8$. Thus $N_{2,8} = 2$, since there are two state-2 phases lasting 8 samples.

To simplify the notation, let $o(s^t)$ return the duration of the last phase in the sequence $s^t$, and note also that $s_t$ gives the type (state) of the last phase in $s^t$. The next lemma computes the probability of a given sequence $s^t$, $P(s^t) = P_1(S^t = s^t)$.

Lemma 1. For any sequence $s^t$, there holds

$$P(s^t) = \frac{p^+_{s_t o(s^t)}}{p_{s_t o(s^t)}} \prod_{d=1}^{\Delta_1} p_{1d}^{N_{1d}(s^t)} \prod_{d=1}^{\Delta_2} p_{2d}^{N_{2d}(s^t)}, \quad (14)$$

where by $p^+_{ml}$ we shortly denote $p^+_{ml} = p_{ml} + p_{m,l+1} + \ldots + p_{m\Delta_m}$, for $l = 1, 2, \ldots, \Delta_m$ and $m = 1, 2$.
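As a quick sanity check of (14), the sketch below evaluates $P(s^t)$ phase by phase (each completed state-$m$ phase of duration $d$ contributes $p_{md}$, while the last, possibly truncated, phase contributes the tail mass $p^+$), and verifies that the probabilities of all feasible length-$t$ sequences sum to one. The pmfs and helper names are arbitrary illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
Delta = 3
# arbitrary strictly positive duration pmfs p1, p2 on {1, 2, 3}
p = [rng.dirichlet(np.ones(Delta)), rng.dirichlet(np.ones(Delta))]
# tail masses p+_{m,l} = p_{m,l} + ... + p_{m,Delta}
pplus = [np.cumsum(pm[::-1])[::-1] for pm in p]

def prob_of_sequence(durations):
    """P(s^t) per eq. (14), with s^t encoded by its tuple of phase durations
    (odd positions are state-1 phases, even positions state-2 phases).
    The last phase is possibly incomplete, so it contributes p+ instead of p."""
    pr = 1.0
    for i, d in enumerate(durations):
        m = i % 2                                    # 0 -> state 1, 1 -> state 2
        pr *= pplus[m][d - 1] if i == len(durations) - 1 else p[m][d - 1]
    return pr

def compositions(n):
    """All ordered tuples of phase durations in {1, ..., Delta} summing to n."""
    if n == 0:
        yield ()
        return
    for d in range(1, min(Delta, n) + 1):
        for rest in compositions(n - d):
            yield (d,) + rest

t = 7
total = sum(prob_of_sequence(c) for c in compositions(t))
# total == 1: eq. (14) defines a bona fide distribution over feasible sequences
```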


Fig. 2: Example of simulated signals with $\Delta = 10$, $\mu_1 = 3$, $\mu_2 = 5$ and $\sigma = 0.05$, and various $T_i$, $D_{k,i}$, and $N_{k,d}$.

The proof of Lemma 1 is given in the Appendix. Besides the function $P$, which returns the exact probability of occurrence of a sequence $s^t$, it will also be of interest to define a related function $P' : \{1, 2\}^t \mapsto \mathbb{R}$, defined through $P$ by leaving out the first factor in (14), i.e., $P'(s^t) = \frac{p_{s_t o(s^t)}}{p^+_{s_t o(s^t)}} P(s^t)$ (note that the assumption that $p_1, p_2 > 0$ (entrywise) ensures that $P'$ is always well defined). Let $p_{\min} = \min\{p_{md} : m = 1, 2,\ d = 1, \ldots, \Delta_m\}$ and note that, for any $m$ and $d$, $p_{md} \leq p^+_{md} \leq 1$ (this relation can easily be seen from the definition of $p^+_{md}$). Thus, the following relation holds between $P$ and $P'$:

$$P'(s^t) \leq P(s^t) \leq \frac{1}{p_{\min}}\, P'(s^t). \quad (15)$$

For increasing $t$, the two functions have equal exponents, that is, the effect of the factor $\frac{1}{p_{\min}}$ vanishes, and thus in our subsequent analyses we will use the analytically more appealing function $P'$. Further, to simplify the analysis, in what follows we will assume that $\Delta_1 = \Delta_2 =: \Delta$.

We let $\mathcal{S}^t$ denote the set of all feasible sequences of states $s^t$ of length $t$, i.e., the sequences for which $P_1(S^t = s^t) > 0$; we let $C_t$ denote the cardinality of $\mathcal{S}^t$. When $p_1$ and $p_2$ are strictly greater than zero, it can be shown that $C_t$ equals the number of ways in which the integer $t$ can be partitioned with parts bounded by $\Delta$. This number is known as the $\Delta$-generalized Fibonacci number, and is computed via the following recursion:

$$C_t = C_{t-1} + \ldots + C_{t-\Delta}, \quad (16)$$


with the initial condition $C_1 = 1$. The recursion in (16) is linear and hence can be represented in the form $\widetilde{C}_t = A \widetilde{C}_{t-1}$, where $\widetilde{C}_t = [C_t\ C_{t-1}\ \ldots\ C_{t-\Delta+1}]^\top$ and $A$ is a square $\Delta \times \Delta$ matrix; it can be shown that $A = e_1 \mathbf{1}^\top + A_0$, where, we recall, $A_0$ is the lower shift matrix of dimension $\Delta$. The growth rate of $C_t$ is given by the largest zero of the characteristic polynomial of $A$, as the next result, which we borrow from [18], asserts.

Lemma 2. [Asymptotics for the $\Delta$-generalized Fibonacci number [18]] For any $\epsilon > 0$, there exists $t_0 = t_0(\epsilon)$ such that

$$e^{t(\psi - \epsilon)} \leq C_t \leq e^{t(\psi + \epsilon)}, \quad (17)$$

where $\psi$ is the logarithm of the unique positive zero of the polynomial $x^{\Delta} - x^{\Delta - 1} - \ldots - 1$.
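The recursion (16) is cheap to evaluate numerically; the short check below (a sketch, with the illustrative choice $\Delta = 3$) confirms the familiar "tribonacci" counts and that the per-step growth ratio $C_t / C_{t-1}$ approaches the unique positive root of $x^{\Delta} - x^{\Delta-1} - \ldots - 1 = 0$.

```python
import numpy as np

def fib_counts(t, Delta):
    """C_0, ..., C_t via the Delta-generalized Fibonacci recursion
    C_t = C_{t-1} + ... + C_{t-Delta}, with C_0 = 1 (so that C_1 = 1)."""
    C = [1]
    for k in range(1, t + 1):
        C.append(sum(C[max(0, k - Delta):k]))
    return C

Delta = 3
C = fib_counts(30, Delta)
# unique positive root of x^3 - x^2 - x - 1 (the "tribonacci constant")
root = max(np.roots([1, -1, -1, -1]).real)
ratio = C[30] / C[29]                       # per-step growth ratio of C_t
```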

A. Sequence types

Duration fractions. For $d = 1, 2, \ldots, \Delta$, let $V_{m,d}$ denote the number of times along a given sequence of states that a state-$m$ phase had length $d$, normalized by the time $t$, i.e.,

$$V_{m,d}(s^t) = \frac{N_{m,d}(s^t)}{t}, \quad m = 1, 2. \quad (18)$$

For each sequence $s^t$, we define its type as the $2 \times \Delta$ matrix $V := [V_1(s^t)^\top; V_2(s^t)^\top]$, where $V_m(s^t) = (V_{m,1}(s^t), \ldots, V_{m,\Delta}(s^t))$, for $m = 1, 2$. Recalling $N_1$ and $N_2$ (10), which, respectively, count the number of state-1 and state-2 phases along $s^t$, we see that $N_m = t\, \mathbf{1}^\top V_m$, $m = 1, 2$.

It will also be of interest to define the fractions of time $\Theta_1$ and $\Theta_2$ that a given sequence of states was in states 1 and 2, respectively:

$$\Theta_m(s^t) = \frac{\tau_m(s^t)}{t}, \quad m = 1, 2. \quad (19)$$

It is easy to verify that $\Theta_m = \sum_{d=1}^{\Delta} d\, V_{m,d}$, for $m = 1, 2$.

Let $\mathcal{V}^t$ denote the set of all $2 \times \Delta$-tuples of feasible occurrences of the type $V$ at time $t$:

$$\mathcal{V}^t = \left\{\nu = (\nu_1, \nu_2) : \nu = V(s^t),\ \text{for some } s^t\right\}. \quad (20)$$

Note that, as they are defined as normalized versions of the quantities $N_{md}(s^t)$, the $V_{md}(s^t)$'s also inherit the properties of the $N_{md}$'s: 1) $\sum_{d=1}^{\Delta} d\, V_{1d}(s^t) + d\, V_{2d}(s^t) = 1$; 2) $0 \leq \mathbf{1}^\top V_1(s^t) - \mathbf{1}^\top V_2(s^t) \leq 1/t$. As $t \to +\infty$, for every $s^t \in \mathcal{S}^t$, the difference between $\mathbf{1}^\top V_1(s^t)$ and $\mathbf{1}^\top V_2(s^t)$ decreases. Motivated by this, we introduce the set

$$\mathcal{V} = \left\{\nu \in \mathbb{R}^{2 \times \Delta}_+ : \mathbf{1}^\top \nu_1 = \mathbf{1}^\top \nu_2,\ q^\top \nu_1 + q^\top \nu_2 = 1\right\}, \quad (21)$$

where $q = (1, 2, \ldots, \Delta)^\top$.


For each $t$ and $\nu \in \mathcal{V}^t$, define the set $\mathcal{S}^t_{\nu}$ that collects all sequences $s^t \in \mathcal{S}^t$ whose type is $\nu$:

$$\mathcal{S}^t_{\nu} = \left\{s^t \in \mathcal{S}^t : V(s^t) = \nu\right\} \quad (22)$$

(note that if $\nu \notin \mathcal{V}^t$, then the set $\mathcal{S}^t_{\nu}$ would be empty). The set $\mathcal{S}^t_{\nu}$ therefore consists of all sequences with the following properties: 1) the first phase is a state-1 phase; 2) the total number of state-1 phases is $\mathbf{1}^\top \nu_1 t$, where the total number of such phases of duration exactly $d$ is given by $\nu_{1,d}\, t$; and 3) the total number of state-2 phases is $\mathbf{1}^\top \nu_2 t$, where the total number of such phases of duration exactly $d$ is given by $\nu_{2,d}\, t$.

Let $C_{t,\nu}$ denote the cardinality of $\mathcal{S}^t_{\nu}$. This number is equal to the number of ways in which one can order $\mathbf{1}^\top \nu_1 t$ state-1 phases (of different durations), where each new ordering has to give rise to a different pattern of state occurrences, times the corresponding number for state-2 phases. Since, for any $d$, any permutation of the $\nu_{m,d}\, t$ phases, each of which is of length $d$, gives the same sequence pattern, $C_{t,\nu}$ is given by the number of permutations with repetitions for state-1 phases times the number of permutations with repetitions for state-2 phases:

$$C_{t,\nu} = \frac{(\mathbf{1}^\top \nu_1 t)!}{(\nu_{1,1} t)! \cdot \ldots \cdot (\nu_{1,\Delta_1} t)!} \cdot \frac{(\mathbf{1}^\top \nu_2 t)!}{(\nu_{2,1} t)! \cdot \ldots \cdot (\nu_{2,\Delta_2} t)!}. \quad (23)$$
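As a sanity check on the counting formula (23), the sketch below enumerates, for small illustrative $t$ and $\Delta$, all feasible sequences (equivalently, their phase-duration compositions), groups them by type, and compares each group's size with the product of permutations-with-repetition counts.

```python
import math
from collections import Counter

Delta, t = 3, 8

def compositions(n):
    """All ordered tuples of phase durations in {1, ..., Delta} summing to n."""
    if n == 0:
        yield ()
        return
    for d in range(1, min(Delta, n) + 1):
        for rest in compositions(n - d):
            yield (d,) + rest

# group feasible sequences by type: the multisets of state-1 phase durations
# (odd positions) and state-2 phase durations (even positions)
by_type = Counter()
for comp in compositions(t):
    by_type[(tuple(sorted(comp[0::2])), tuple(sorted(comp[1::2])))] += 1

def count_formula(d1, d2):
    """Eq. (23): permutations with repetitions for each state's phases."""
    c = 1
    for ds in (d1, d2):
        f = math.factorial(len(ds))
        for k in Counter(ds).values():
            f //= math.factorial(k)
        c *= f
    return c

ok = all(by_type[key] == count_formula(*key) for key in by_type)
```

The group sizes summed over all types also recover the total count $C_t$ from (16).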

From (23) the following result regarding the growth rate of $C_{t,\nu}$ easily follows (e.g., by Stirling's approximation bounds).

Lemma 3. For any $\epsilon > 0$ there exists $t_1 = t_1(\epsilon)$ such that for all $t \geq t_1$

$$e^{t(H(\nu_1) + H(\nu_2) - \epsilon)} \leq C_{t,\nu} \leq e^{t(H(\nu_1) + H(\nu_2) + \epsilon)}, \quad (24)$$

where $H : \mathbb{R}^{\Delta}_+ \mapsto \mathbb{R}$ is defined as

$$H(\lambda) = -\sum_{d=1}^{\Delta} \frac{\lambda_d}{\mathbf{1}^\top \lambda} \log \frac{\lambda_d}{\mathbf{1}^\top \lambda}, \quad (25)$$

where $\lambda_d$ denotes the $d$-th element of an arbitrary vector $\lambda \in \mathbb{R}^{\Delta}_+$.

We end this section by giving some well-known results from the theory of large deviations that we will use in our analysis of the detection problem (4).

B. Varadhan’s lemma and large deviations principle

Large deviations principle.

Definition 4 (Large deviations principle [15] with probability 1). Let $\mu^{\omega}_t$, defined on the Borel sets $\mathcal{B}(\mathbb{R}^D)$, be a sequence of Borel random measures on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Then $\mu^{\omega}_t$, $t = 1, 2, \ldots$, satisfies the large deviations principle with probability one, with rate function $I$, if the following two conditions hold:

1) for every closed set $F$ there exists a set $\Omega^{\star}_F \subseteq \Omega$ with $\mathbb{P}(\Omega^{\star}_F) = 1$, such that for each $\omega \in \Omega^{\star}_F$,

$$\limsup_{t \to +\infty} \frac{1}{t} \log \mu^{\omega}_t(F) \leq -\inf_{x \in F} I(x); \quad (26)$$

2) for every open set $E$ there exists a set $\Omega^{\star}_E \subseteq \Omega$ with $\mathbb{P}(\Omega^{\star}_E) = 1$, such that for each $\omega \in \Omega^{\star}_E$,

$$\liminf_{t \to +\infty} \frac{1}{t} \log \mu^{\omega}_t(E) \geq -\inf_{x \in E} I(x). \quad (27)$$

We give here the version of Varadhan's lemma which involves a sequence of random probability measures and the large deviations principle (LDP) with probability one.

Lemma 5 (Varadhan's lemma [15]). Suppose that the random sequence of measures $\mu^{\omega}_t$ satisfies the LDP with probability one, with rate function $I$; see Definition 4. Then, if for a function $F$ the tail condition below holds with probability one,

$$\lim_{B \to +\infty} \limsup_{t \to +\infty} \frac{1}{t} \log \int_{x : F(x) \geq B} e^{t F(x)}\, d\mu^{\omega}_t(x) = -\infty, \quad (28)$$

then, with probability one,

$$\lim_{t \to +\infty} \frac{1}{t} \log \int_{x} e^{t F(x)}\, d\mu^{\omega}_t(x) = \sup_{x \in \mathbb{R}^D} \left\{F(x) - I(x)\right\}. \quad (29)$$

IV. LINEAR RECURSION FOR THE LLR AND THE LYAPUNOV EXPONENT

From (5) and (14), it is easy to see that the likelihood ratio can be expressed through the defined quantities as:

$$L_t(X^t) = \sum_{s^t \in \mathcal{S}^t} P(s^t)\, e^{\sum_{m=1}^{2} \left(\frac{\mu_m}{\sigma^2} \sum_{k \in \mathcal{T}_m(s^t)} X_k - \tau_m(s^t) \frac{\mu_m^2}{2\sigma^2}\right)}$$

$$= \sum_{s^t \in \mathcal{S}^t} \frac{p^+_{s_t, o(s^t)}}{p_{s_t, o(s^t)}}\, e^{\sum_{m=1}^{2} \sum_{d=1}^{\Delta_m} N_{md}(s^t) \log p_{md}} \times e^{\sum_{m=1}^{2} \left(\frac{\mu_m}{\sigma^2} \sum_{k \in \mathcal{T}_m(s^t)} X_k - \tau_m(s^t) \frac{\mu_m^2}{2\sigma^2}\right)}. \quad (30)$$

The expression in (30) is combinatorial, and its straightforward implementation would require computing $C_t \approx e^{\psi t}$ summands. This is prohibitive when the observation interval $t$ is large. In this paper, we unveil a simple, linear recursion form for the likelihood ratio $L_t(X^t)$, for $t = 1, 2, \ldots$. We give this result in the next lemma. To shorten the notation, we introduce the functions $f_m : \mathbb{R} \mapsto \mathbb{R}$, defined by $f_m(x) := \frac{1}{\sigma^2} \mu_m x - \frac{1}{2\sigma^2} \mu_m^2$, for $x \in \mathbb{R}$ and $m = 1, 2$. Recall that $e_1$ denotes the first canonical vector in $\mathbb{R}^{\Delta}$ (the $\Delta$-dimensional vector with $1$ only in the first position, and zeros in all other positions), and $\mathbf{1}$ denotes the vector of all ones in $\mathbb{R}^{\Delta}$.


Lemma 6. Let $\Lambda_k = [(\Lambda^1_k)^\top, (\Lambda^2_k)^\top]^\top$ evolve according to the following recursion:

$$\Lambda_{k+1} = A_{k+1} \Lambda_k, \quad (31)$$

with the initial condition $\Lambda_1 = [e^{f_1(X_1)} e_1^\top, e^{f_2(X_1)} e_1^\top]^\top$, and where, for $k \geq 2$, the matrix $A_k = [A^{11}_k\ A^{12}_k; A^{21}_k\ A^{22}_k]$ is defined by

$$A^{11}_k = e^{f_1(X_k)} A_0, \quad A^{12}_k = e^{f_1(X_k)} e_1 p_2^\top, \quad A^{21}_k = e^{f_2(X_k)} e_1 p_1^\top, \quad A^{22}_k = e^{f_2(X_k)} A_0, \quad (32)$$

and $A_0$ is, we recall, the lower shift matrix of dimension $\Delta$. Then the likelihood ratio $L_t(X^t)$ is, for each $t \geq 1$, computed by

$$L_t(X^t) = \sum_{d=1}^{\Delta} p^+_{1d} \Lambda^1_{t,d} + p^+_{2d} \Lambda^2_{t,d}, \quad (33)$$

where $\Lambda^m_{t,d}$ is the $d$-th element of $\Lambda^m_t$, for $d = 1, \ldots, \Delta$ and $m = 1, 2$.

Remark. We note that the matrix $A_k$ can be further decomposed as

$$A_k = D_k M_0, \quad (34)$$

$$D_k = \mathrm{diag}\left(\left[e^{f_1(X_k)} \mathbf{1}^\top, e^{f_2(X_k)} \mathbf{1}^\top\right]^\top\right), \quad k = 1, 2, \ldots, \quad (35)$$

$$M_0 = \begin{bmatrix} A_0 & e_1 p_2^\top \\ e_1 p_1^\top & A_0 \end{bmatrix}, \quad (36)$$

i.e., $D_k$ is a random diagonal matrix of size $2\Delta$, modulated by the $k$-th measurement $X_k$, and $M_0$ is a sparse, constant matrix of the same dimension, which defines the transitions from the current state pattern to the one in the next time step.

Proof intuition. The intuition behind this recursive form is the following. We break the sum in (30) into sequences $s^t$ whose last phases are of the same type. For sequences that end with state $m = 1$, $\Lambda^1_{t,d}$ represents the contribution to the overall likelihood ratio $L_t(X^t)$ of all such sequences whose last phase is of length $d$, and similarly for $\Lambda^2_{t,d}$. Once the vectors $\Lambda^1_t$ and $\Lambda^2_t$ are defined, their update is simple. Consider the value $\Lambda^1_{t+1,d}$, where $d > 1$; this value corresponds to the likelihood ratio contribution of all sequences $s^{t+1}$ that end with a state-1 phase of duration $d$. Since $d > 1$, the only possible way to get a sequence of that form is to have a sequence at time $t$ that ends with the same state, where the duration of the last phase is $d - 1$. This translates to the update $\Lambda^1_{t+1,d} = e^{f_1(X_{t+1})} \Lambda^1_{t,d-1}$, where the choice of $f_1$ in the exponent is due to the fact that the last state is $s_{t+1} = 1$; see also the first line in (32). On the other hand, if $d = 1$, then the state at time $t$ must have been $m = 2$. The duration of this previous phase could have been anything from $d = 1$ to $d = \Delta$. Hence $\Lambda^1_{t+1,1}$ is computed as the sum $\Lambda^1_{t+1,1} = \sum_{d=1}^{\Delta} p_{2d}\, e^{f_1(X_{t+1})} \Lambda^2_{t,d}$, where the probabilities $p_{2d}$ are used to mark that the previous phase is completed; see the second line in (32). The analysis for $\Lambda^2_{t+1,d}$ is similar. The formal proof of Lemma 6 is given in the Appendix.
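The recursion of Lemma 6 is easy to check numerically against the combinatorial sum (30) for small $t$. The sketch below uses the decomposition (34)-(36); as an assumption consistent with $S_1 \equiv 1$ (only state-1 sequences are feasible at time 1), the state-2 block of the initial vector $\Lambda_1$ is set to zero. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
Delta, t = 3, 9
mu, sigma = np.array([2.0, 5.0]), 1.5
p = [rng.dirichlet(np.ones(Delta)) for _ in range(2)]
pplus = [np.cumsum(pm[::-1])[::-1] for pm in p]         # tail masses p+_{md}

def f(m, x):                                            # f_m(x) from Section IV
    return mu[m] * x / sigma**2 - mu[m]**2 / (2 * sigma**2)

x = rng.normal(size=t)                                  # arbitrary measurements

# constant transition matrix M0, eq. (36)
A0 = np.diag(np.ones(Delta - 1), -1)                    # lower shift matrix
e1 = np.zeros(Delta); e1[0] = 1.0
M0 = np.block([[A0, np.outer(e1, p[1])],
               [np.outer(e1, p[0]), A0]])

# recursion (31): Lambda_{k+1} = D_{k+1} M0 Lambda_k
Lam = np.concatenate([np.exp(f(0, x[0])) * e1, np.zeros(Delta)])
for k in range(1, t):
    D = np.repeat([np.exp(f(0, x[k])), np.exp(f(1, x[k]))], Delta)
    Lam = D * (M0 @ Lam)
L_rec = np.concatenate(pplus) @ Lam                     # eq. (33)

# brute force: sum over all feasible sequences, eqs. (14) and (30)
def compositions(n):
    if n == 0:
        yield ()
        return
    for d in range(1, min(Delta, n) + 1):
        for rest in compositions(n - d):
            yield (d,) + rest

L_bf = 0.0
for comp in compositions(t):
    states = [i % 2 for i, d in enumerate(comp) for _ in range(d)]
    pr = 1.0
    for i, d in enumerate(comp):
        pr *= pplus[i % 2][d - 1] if i == len(comp) - 1 else p[i % 2][d - 1]
    L_bf += pr * np.exp(sum(f(s, xi) for s, xi in zip(states, x)))
```

The two values agree to machine precision, while the recursion costs only $O(t \Delta^2)$ operations instead of enumerating $C_t \approx e^{\psi t}$ sequences.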

A. Error exponent $\zeta$ as a Lyapunov exponent

From Lemma 6 we see that $L_t$ can be represented as a linear function of the matrix product $\Pi_t := A_t \cdot \ldots \cdot A_1$:

$$L_t = p^{+\top} \Pi_t \Lambda_0, \quad (37)$$

where the $A_k$ are matrices of the form (32), and $p^+ = \left[p_1^{+\top}, p_2^{+\top}\right]^\top$, where the $d$-th entry of $p^+_m$ equals $p^+_{md}$, for $m = 1, 2$, $d = 1, 2, \ldots, \Delta$. Each $A_k$ is modulated by the measurement $X_k$ obtained at time $k$. Since the $X_k$'s, $k = 1, 2, \ldots$, are i.i.d., it follows that the matrices $A_k$ are i.i.d. as well. Applying a well-known result from the theory of random matrices, see Theorem 2 in [19], to the sequence $A_k$, it follows that the sequence of negative values of the normalized log-likelihood ratios $-\frac{1}{t} \log L_t$, $t = 1, 2, \ldots$, converges to the Lyapunov exponent of the matrix product $\Pi_t$. This result is given in Lemma 7 and proven in the Appendix.

Lemma 7. With probability one,

$$\lim_{t \to +\infty} \frac{1}{t} \log \|\Pi_t\| = \lim_{t \to +\infty} \frac{1}{t} E_0\left[\log \|\Pi_t\|\right], \quad (38)$$

and thus, with probability one,

$$\zeta = \lim_{t \to +\infty} -\frac{1}{t} \log \|\Pi_t\| = \lim_{t \to +\infty} -\frac{1}{t} E_0\left[\log L_t\right]. \quad (39)$$

Lemma 7 asserts that the error exponent for the hypothesis testing problem (4) equals the top Lyapunov exponent for the sequence of products $\Pi_t$. Computation of the Lyapunov exponent (e.g., for i.i.d. matrices) is a well-known problem in random matrix theory and the theory of random dynamical systems, proven to be very difficult to solve; see, e.g., [14]. We instead search for tractable lower bounds that tightly approximate $\zeta$. We base our method for approximating $\zeta$ on the right-hand side identity in (39).
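Although no closed form is available, the top Lyapunov exponent in (39), and hence $\zeta$, can be estimated by simulating the matrix product under $H_0$ and renormalizing at each step to avoid numerical underflow; this power-iteration-style estimate is a standard numerical approach, sketched below with the illustrative parameters of Fig. 1 ($\Delta = 3$, uniform duration pmfs, $\mu_1 = 2$, $\mu_2 = 5$, $\sigma = 10$).

```python
import numpy as np

rng = np.random.default_rng(3)
Delta = 3
mu, sigma = np.array([2.0, 5.0]), 10.0
p = [np.ones(Delta) / Delta, np.ones(Delta) / Delta]    # uniform duration pmfs

A0 = np.diag(np.ones(Delta - 1), -1)                    # lower shift matrix
e1 = np.zeros(Delta); e1[0] = 1.0
M0 = np.block([[A0, np.outer(e1, p[1])],
               [np.outer(e1, p[0]), A0]])               # eq. (36)

def lyapunov_zeta(t, rng):
    """Estimate zeta = -lim (1/t) log ||Pi_t|| under H0 by tracking the action
    of the random products A_k = D_k M0 on a vector, renormalizing each step."""
    v = np.ones(2 * Delta)
    log_norm = 0.0
    for _ in range(t):
        xk = sigma * rng.normal()                       # X_k ~ N(0, sigma^2) under H0
        d = np.repeat(np.exp(mu * xk / sigma**2 - mu**2 / (2 * sigma**2)), Delta)
        v = d * (M0 @ v)                                # v <- A_k v
        n = np.linalg.norm(v)
        log_norm += np.log(n)
        v /= n
    return -log_norm / t

zeta_hat = lyapunov_zeta(100_000, rng)
# zeta_hat should respect the guaranteed bound zeta >= mu1^2 / (2 sigma^2) = 0.02
```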


V. MAIN RESULT

Our first step for computing the limit in (39) is a natural one. Since $\mu_1 \geq 0$ is the guaranteed signal level (recall that $\mu_2 > \mu_1 \geq 0$), we assume that the signal was at all times in state 1, and remove the corresponding components of the signal-to-noise ratio (SNR), $\frac{\mu_1^2}{2\sigma^2}$, and the signal sum $\sum_{k=1}^{t} X_k$ from the likelihood ratio. This manipulation then gives us a lower bound on the error exponent. By doing so, we arrive at a problem equivalent to problem (4) just with $\mu_1 = 0$. Mathematically, we have

$$L_t(X^t) = \sum_{s^t \in \mathcal{S}^t} P(s^t)\, e^{\frac{\mu_1}{\sigma^2} \left(\sum_{k=1}^{t} X_k - \sum_{k \in \mathcal{T}_2(s^t)} X_k\right) - (t - \tau_2(s^t)) \frac{\mu_1^2}{2\sigma^2}} \times e^{\frac{\mu_2}{\sigma^2} \sum_{k \in \mathcal{T}_2(s^t)} X_k - \tau_2(s^t) \frac{\mu_2^2}{2\sigma^2}}$$

$$= e^{\frac{\mu_1}{\sigma^2} \sum_{k=1}^{t} X_k - t \frac{\mu_1^2}{2\sigma^2}} \times \sum_{s^t \in \mathcal{S}^t} P(s^t)\, e^{\frac{1}{\sigma^2} \sum_{k \in \mathcal{T}_2(s^t)} (\mu_2 - \mu_1) X_k - \tau_2(s^t) \frac{\mu_2^2 - \mu_1^2}{2\sigma^2}}. \quad (40)$$

Taking the logarithm, dividing by $t$, and computing the expectation with respect to hypothesis $H_0$, we get

$$\frac{1}{t} E_0\left[\log L_t(X^t)\right] = -\frac{\mu_1^2}{2\sigma^2} + \frac{1}{t} E_0\left[\log \sum_{s^t \in \mathcal{S}^t} P(s^t)\, e^{\frac{1}{\sigma^2} \sum_{k \in \mathcal{T}_2(s^t)} (\mu_2 - \mu_1) X_k - \tau_2(s^t) \frac{\mu_2^2 - \mu_1^2}{2\sigma^2}}\right], \quad (41)$$

where we used that $E_0[X_k] = 0$, for all $k$; see (4). Taking the limit as $t \to +\infty$, we obtain

$$\zeta = \frac{\mu_1^2}{2\sigma^2} + \eta, \quad (42)$$

where $\eta$ is given by the following limit
$$\eta = \lim_{t\to+\infty} -\frac{1}{t}E_0\left[\log \sum_{s^t\in\mathcal{S}^t} P(s^t)\, e^{\frac{1}{\sigma^2}\sum_{k\in T_2(s^t)} (\mu_2-\mu_1) X_k - \tau_2(s^t)\frac{\mu_2^2-\mu_1^2}{2\sigma^2}}\right], \quad (43)$$

the existence of which is guaranteed by (39) in Lemma 7. From now on, we focus on computing $\eta$. Before we proceed, we simplify the expression for $\eta$ by replacing the term $P(s^t)$ with its analytically more appealing proxy $P_0(s^t)$; see (15). Applying inequality (15) in (43) and using the fact that $\frac{1}{t}\log p_{\min} \to 0$ as $t\to+\infty$, we obtain that the limit in (43) does not change when we replace $P(s^t)$ with $P_0(s^t)$, i.e.,

$$\eta = \lim_{t\to+\infty} -\frac{1}{t}E_0\left[\log \sum_{s^t\in\mathcal{S}^t} P_0(s^t)\, e^{\frac{1}{\sigma^2}\sum_{k\in T_2(s^t)} (\mu_2-\mu_1) X_k - \tau_2(s^t)\frac{\mu_2^2-\mu_1^2}{2\sigma^2}}\right]. \quad (44)$$


For $\lambda\in\mathbb{R}^\Delta$ and $p\in S^{\Delta-1}$, introduce the relative entropy function $D(\lambda\|p) := \sum_{d=1}^{\Delta} \frac{\lambda_d}{\mathbf{1}^\top\lambda}\log\frac{\lambda_d}{\mathbf{1}^\top\lambda\, p_d}$.
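In code, $D(\lambda\|p)$ is just the Kullback-Leibler divergence between the normalization of $\lambda$ and $p$; a small sketch, with the usual convention $0\log 0 = 0$:

```python
import math

def relative_entropy(lam, p):
    """D(lambda||p): KL divergence between lambda / (1^T lambda) and the
    pmf p, with the convention 0 log 0 = 0."""
    s = sum(lam)
    return sum((l / s) * math.log(l / (s * pd))
               for l, pd in zip(lam, p) if l > 0)
```

Note that $D(\lambda\|p)$ depends on $\lambda$ only through its normalization, so $D(c\lambda\|p) = D(\lambda\|p)$ for any $c>0$, and it vanishes exactly when the normalization of $\lambda$ equals $p$.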

Theorem 8. There holds $\eta + \frac{\mu_1^2}{2\sigma^2} \le \zeta$, where $\eta$ is the optimal value of the following optimization problem
$$\begin{array}{ll} \text{minimize} & G(\nu,\xi) \\ \text{subject to} & H(\nu_1) + H(\nu_2) \ge \dfrac{\xi^2}{2\theta_2\sigma^2} \\ & \theta_2 = q^\top\nu_2 \\ & \nu\in\mathcal{V},\; \xi\in\mathbb{R}, \end{array} \quad (45)$$
where $G(\nu,\xi) = D(\nu_1\|p_1) + D(\nu_2\|p_2) + \frac{\theta_2}{2\sigma^2}\left(\frac{\xi}{\theta_2}-(\mu_2-\mu_1)\right)^2 + \frac{\theta_2\,\mu_1(\mu_2-\mu_1)}{\sigma^2}$, for $\nu\in\mathbb{R}^{2\Delta}_+$, $\xi\in\mathbb{R}$.

Guaranteed error exponent. Since each of the terms in the objective function of (45) is non-negative, its optimal value is lower bounded by $0$. Using relation (42), we obtain that the value of the error exponent is lower bounded by the value of the SNR in state 1, $\frac{\mu_1^2}{2\sigma^2}$, i.e.,
$$\zeta \ge \frac{\mu_1^2}{2\sigma^2}. \quad (46)$$

The preceding bound holds for any choice of the parameters $\Delta$, $p_1$, $p_2$, $\mu_1$, and $\mu_2$. This result is very intuitive, as it mathematically formalizes the reasoning that, no matter which configuration of states occurs, signal level $\mu_1$ is always guaranteed, and hence the corresponding value of the error exponent, $\frac{\mu_1^2}{2\sigma^2}$, is ensured. In that sense, any appearance of state 2 (i.e., signal level $\mu_2>\mu_1$) can only increase the error exponent.

Special case $\mu_1=0$ and detectability condition. When the signal level in state 1 equals zero then, since the statistics of $X_k$ for $S_k=1$ is the same as its statistics under $H_0$, we can effectively gain information on the state of nature $H_1$ only when state $S_k=2$ occurs. Denoting $\mu=\mu_2$, optimization problem (45) then simplifies to:
$$\begin{array}{ll} \text{minimize} & D(\nu_1\|p_1) + D(\nu_2\|p_2) + \dfrac{\theta_2}{2\sigma^2}\left(\dfrac{\xi}{\theta_2}-\mu\right)^2 \\ \text{subject to} & H(\nu_1) + H(\nu_2) \ge \dfrac{\xi^2}{2\theta_2\sigma^2} \\ & \theta_2 = q^\top\nu_2 \\ & \nu\in\mathcal{V},\; \xi\in\mathbb{R}. \end{array} \quad (47)$$

From (47) we obtain the following condition for the detectability of the process $S_k$:
$$H(p_1) + H(p_2) \ge \frac{q^\top p_2}{q^\top p_1 + q^\top p_2}\,\frac{\mu^2}{2\sigma^2}, \quad (48)$$

i.e., if the inequality above holds, then the optimal value of optimization problem (47) is zero. To see why this holds, note that the point $(\nu_1,\nu_2,\xi)\in\mathbb{R}^{2\Delta+1}$, where $\nu_m = p_m/(q^\top p_1 + q^\top p_2)$, $m=1,2$, and $\xi = \mu\, q^\top p_2/(q^\top p_1 + q^\top p_2)$, at which the cost function of (47) vanishes, belongs under condition (48) to the constraint set of (47). Thus, under condition (48), the lower bound on the error exponent $\eta$ is zero, indicating that the process $S_k$ is not detectable. To further illustrate this condition, note that the left-hand side corresponds to the entropy of the process $S_k$, and the right-hand side corresponds to the expected, i.e., long-run, SNR of the measured signal ($q^\top p_2/(q^\top p_1 + q^\top p_2)$ is the expected fraction of time that the process spends in state 2, and $\frac{\mu^2}{2\sigma^2}$ is the SNR for this state). Condition (48) therefore asserts that, if the entropy of the process $S_k$ is too high compared to the expected, or long-run, SNR, then it is not possible to detect its presence. Intuitively, if the dynamics of the phase durations is too stochastic, then it is not possible to estimate the locations of the state-2 occurrences in order to perform the likelihood ratio test. However, on the other hand, if the SNR is very high (e.g., the level $\mu$ is high compared to the noise variance $\sigma^2$), then, whenever state 2 occurs, the signal makes a sharp increase and can therefore be easily detected. The condition in this sense quantitatively characterizes the threshold between the two physical quantities which makes detection possible.
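For the uniform duration pmfs $p_1, p_2 \sim \mathcal{U}([1,\Delta])$ used in Section VII, both sides of (48) have closed forms: $H(p_1)+H(p_2) = 2\log\Delta$ and $q^\top p_2/(q^\top p_1+q^\top p_2) = 1/2$, so (48) reduces to $2\log\Delta \ge \mu^2/(4\sigma^2)$, i.e., $\sigma \ge \sigma^\star = \mu/(2\sqrt{2\log\Delta})$. A sketch of this specialization (the closed form is our own rearrangement of (48), not stated in this form in the text):

```python
import math

def detectability_threshold(mu, delta):
    """Noise level sigma* above which condition (48) holds for uniform
    duration pmfs on {1, ..., Delta}, so the process is undetectable:
    2 log Delta >= mu^2 / (4 sigma^2)  <=>  sigma >= mu / (2 sqrt(2 log Delta))."""
    return mu / (2.0 * math.sqrt(2.0 * math.log(delta)))

def condition_48_uniform(mu, delta, sigma):
    """Condition (48) for uniform pmfs: entropy >= long-run SNR."""
    lhs = 2.0 * math.log(delta)                  # H(p1) + H(p2)
    rhs = 0.5 * mu ** 2 / (2.0 * sigma ** 2)     # expected state-2 fraction times SNR
    return lhs >= rhs
```

With $\mu=1$ and $\Delta=2$ this gives $\sigma^\star \approx 0.4247$, the value quoted in the numerical results of Section VII.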

A. Reformulation of (47)

In this subsection we show that optimization problem (47) admits a simplified form, obtained by suppressing the dependence on $\xi$ through an inner minimization over this variable. To simplify the notation, introduce $H(\nu) = H(\nu_1) + H(\nu_2)$ and $R(\nu) = q^\top\nu_2\,\frac{\mu^2}{2\sigma^2}$; note that the function $R$ has the physical meaning of the expected SNR of the $S_t$ process that we wish to detect, for a given sequence type $\nu$.

Lemma 9. Suppose that $H(p_1)+H(p_2) < \frac{q^\top p_2}{q^\top p_1+q^\top p_2}\,\frac{\mu^2}{2\sigma^2}$. Then, optimization problem (47) is equivalent to the following optimization problem:
$$\begin{array}{ll} \text{minimize} & D(\nu_1\|p_1) + D(\nu_2\|p_2) + \left(\sqrt{H(\nu)}-\sqrt{R(\nu)}\right)^2 \\ \text{subject to} & H(\nu) \le R(\nu) \\ & \nu\in\mathcal{V}. \end{array} \quad (49)$$

Proof. Fix $\nu\in\mathcal{V}$. To remove the dependence on $\xi$ in (47), for any given fixed $\nu\in\mathcal{V}$, we need to solve
$$\begin{array}{ll} \text{minimize} & \dfrac{\theta_2}{2\sigma^2}\left(\dfrac{\xi}{\theta_2}-\mu\right)^2 \\ \text{subject to} & H(\nu) \ge \dfrac{\xi^2}{2\theta_2\sigma^2},\quad \xi\in\mathbb{R}, \end{array} \quad (50)$$
where, as before, we denote $\theta_2 = q^\top\nu_2$. Since $\mu>0$, and the constraint set is defined only through the square of $\xi$, the optimal solution of (50) is achieved for $\xi\ge 0$. Thus, (50) is equivalent to
$$\begin{array}{ll} \text{minimize} & \dfrac{\theta_2}{2\sigma^2}\left(\dfrac{\xi}{\theta_2}-\mu\right)^2 \\ \text{subject to} & 0 \le \xi \le \sigma\sqrt{2\theta_2 H(\nu)}. \end{array} \quad (51)$$


The solution of (51) is given by: 1) $\xi^\star = \theta_2\mu$, if $\theta_2\mu \le \sigma\sqrt{2\theta_2 H(\nu)}$; and 2) $\xi^\star = \sigma\sqrt{2\theta_2 H(\nu)}$, otherwise. Hence, to solve (47) we can partition its constraint set $\mathcal{V} = \mathcal{V}_1 \cup \mathcal{V}_2$ according to these two cases, where $\mathcal{V}_1 = \left\{\nu\in\mathcal{V} : H(\nu) \ge \frac{\theta_2\mu^2}{2\sigma^2}\right\}$ and $\mathcal{V}_2 = \left\{\nu\in\mathcal{V} : H(\nu) \le \frac{\theta_2\mu^2}{2\sigma^2}\right\}$, solve the corresponding two optimization problems, and finally find the minimum of the two obtained optimal values.
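The closed-form solution of (51) is simply the unconstrained optimum $\xi=\theta_2\mu$ projected onto the feasible interval $[0, \sigma\sqrt{2\theta_2 H(\nu)}]$, which can be verified against a grid search. A sketch with hypothetical parameter values:

```python
import math

def xi_star(theta2, mu, H, sigma):
    """Minimizer of (51): the unconstrained optimum theta2*mu when it is
    feasible, otherwise the right endpoint of the interval."""
    right = sigma * math.sqrt(2.0 * theta2 * H)
    return min(theta2 * mu, right)

def obj(xi, theta2, mu, sigma):
    """Objective of (50)/(51)."""
    return theta2 / (2.0 * sigma ** 2) * (xi / theta2 - mu) ** 2

def grid_min(theta2, mu, H, sigma, n=20000):
    """Brute-force minimizer over the feasible interval, for checking."""
    right = sigma * math.sqrt(2.0 * theta2 * H)
    grid = [right * k / n for k in range(n + 1)]
    return min(grid, key=lambda x: obj(x, theta2, mu, sigma))
```

The two parameter sets in the test below exercise the interior-optimum case and the boundary case, respectively.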

Consider first the case $\nu\in\mathcal{V}_1$. Since in this case $\xi^\star=\theta_2\mu$, plugging this value into (51), we have that optimization problem (47) with $\mathcal{V}$ reduced to $\mathcal{V}_1$ simplifies to:
$$\begin{array}{ll} \text{minimize} & D(\nu_1\|p_1) + D(\nu_2\|p_2) \\ \text{subject to} & \nu\in\mathcal{V}_1. \end{array} \quad (52)$$
If $H(p) \ge \frac{q^\top p_2}{q^\top p_1+q^\top p_2}\,\frac{\mu^2}{2\sigma^2}$, then the point $\frac{1}{q^\top p_1+q^\top p_2}\,p$ belongs to $\mathcal{V}_1$, where $p=(p_1,p_2)$, and hence the optimal solution to (52) equals $\frac{1}{q^\top p_1+q^\top p_2}\,p$, with the corresponding optimal value equal to $0$. Suppose now that $H(p) < \frac{q^\top p_2}{q^\top p_1+q^\top p_2}\,\frac{\mu^2}{2\sigma^2}$. We show that in this case the solution to (52) must be at the boundary of the constraint set, in the set of points $\left\{\nu\in\mathcal{V} : H(\nu) = \frac{\theta_2\mu^2}{2\sigma^2}\right\}$.

We prove the above claim. Since the entropy function $H$ (see eq. (25)) is concave, the constraint set $\mathcal{V}_1$ is convex, and since the KL divergence $D$ is convex, we conclude that the problem in (52) is convex. Also, it can be shown that a Slater point exists [20]. Therefore, the solution to (52) is characterized by the corresponding Karush-Kuhn-Tucker (KKT) conditions:
$$\begin{array}{l} (1+\lambda)\log\dfrac{\nu_{1d}}{\mathbf{1}^\top\nu_1} - \log p_{1d} = 0, \quad \text{for } d=1,\ldots,\Delta \\[2mm] (1+\lambda)\log\dfrac{\nu_{2d}}{\mathbf{1}^\top\nu_2} - \log p_{2d} + \lambda d\,\dfrac{\mu^2}{2\sigma^2} = 0, \quad \text{for } d=1,\ldots,\Delta \\[2mm] H(\nu) \ge q^\top\nu_2\,\dfrac{\mu^2}{2\sigma^2} \\[2mm] \lambda \ge 0 \\[2mm] \lambda\left(H(\nu) - q^\top\nu_2\,\dfrac{\mu^2}{2\sigma^2}\right) = 0 \\[2mm] \nu\in\mathcal{V}. \end{array} \quad (53)$$

From the fourth and fifth conditions, we have that either $\lambda=0$, or $\lambda>0$ and $H(\nu) = q^\top\nu_2\,\frac{\mu^2}{2\sigma^2}$. Suppose that $\lambda=0$. Then, from the first two KKT conditions we have that the solution $\nu$ must satisfy $\nu_{md}/\mathbf{1}^\top\nu_m = p_{md}$, for $m=1,2$, $d=1,\ldots,\Delta$. However, this contradicts the third condition (recall that we assumed that $H(p) < \frac{q^\top p_2}{q^\top p_1+q^\top p_2}\,\frac{\mu^2}{2\sigma^2}$). Therefore, the solution to (52) must belong to the set $\left\{\nu\in\mathcal{V} : H(\nu) = q^\top\nu_2\,\frac{\mu^2}{2\sigma^2}\right\}$. Since this set intersects the set $\mathcal{V}_2$, we conclude that, when $H(p) < \frac{q^\top p_2}{q^\top p_1+q^\top p_2}\,\frac{\mu^2}{2\sigma^2}$, the optimal solution to (47) is found by optimizing over the smaller set $\mathcal{V}_2\subseteq\mathcal{V}$, i.e., (47) is equivalent to
$$\begin{array}{ll} \text{minimize} & D(\nu_1\|p_1) + D(\nu_2\|p_2) + \dfrac{\theta_2}{2\sigma^2}\left(\dfrac{\xi^\star}{\theta_2}-\mu\right)^2 \\ \text{subject to} & \nu\in\mathcal{V}_2, \end{array} \quad (54)$$


where $\xi^\star(\nu) = \sigma\sqrt{2\theta_2 H(\nu)}$. Simple algebraic manipulations reveal that the third term in the objective above equals $\left(\sqrt{H(\nu)}-\sqrt{R(\nu)}\right)^2$. Finally, the set $\mathcal{V}_2$ is precisely the constraint set in (49), and hence the claim of the lemma follows.

VI. PROOF OF THEOREM 8

Sum of conditionals as an expectation. For each $s^t\in\mathcal{S}^t$, introduce
$$X_{s^t} = \frac{1}{t}\sum_{k\in T_2(s^t)} X_k, \quad (55)$$
and note that, for each $s^t$ and under $H=H_0$, $X_{s^t}$ is a Gaussian random variable with mean zero and variance equal to $\tau_2(s^t)\sigma^2/t^2 = \theta_2(s^t)\sigma^2/t$. The idea is to view the sum in (44) as an expectation of a certain function $g_X:\mathcal{S}^t\mapsto\mathbb{R}$ defined over the set $\mathcal{S}^t$ of all possible sequences $s^t$, parameterized by the random family (i.e., vector) $X = \left(X_{s^t} : s^t\in\mathcal{S}^t\right)$. More precisely, consider the probability space with the set of outcomes $\mathcal{S}^t$ in which an element $s^t$ of $\mathcal{S}^t$ is drawn uniformly at random, hence with probability $1/C_t$, where, we recall, $C_t = |\mathcal{S}^t|$; denote the corresponding expectation by $E_U$. We see that the sum under the logarithm in (44) equals
$$\sum_{s^t\in\mathcal{S}^t} P_0(s^t)\, e^{\frac{t(\mu_2-\mu_1)}{\sigma^2}X_{s^t} - \tau_2(s^t)\frac{\mu_2^2-\mu_1^2}{2\sigma^2}} = C_t \sum_{s^t\in\mathcal{S}^t} \frac{1}{C_t}\, g_X(s^t) = C_t\, E_U\left[g_X(s^t)\right], \quad (56)$$
where it is easy to see that $g_X(s^t) = P_0(s^t)\, e^{\frac{t(\mu_2-\mu_1)}{\sigma^2}X_{s^t} - \tau_2(s^t)\frac{\mu_2^2-\mu_1^2}{2\sigma^2}}$, for $s^t\in\mathcal{S}^t$.

Using further the type $V$ defined in Subsection III-A, we can express $g_X(s^t)$ as
$$g_X(s^t) = e^{\frac{t(\mu_2-\mu_1)}{\sigma^2}X_{s^t} - t\,\Theta_2(s^t)\frac{\mu_2^2-\mu_1^2}{2\sigma^2} + t\sum_{m=1}^{2}\sum_{d=1}^{\Delta} V_{md}(s^t)\log p_{md}}. \quad (57)$$

Induced measure. We see that the function $g_X$ depends on $s^t$ only through the type $V$ of the sequence and the values of the vector $X$. More precisely, define $F:\mathbb{R}^{2\Delta}\times\mathbb{R}\mapsto\mathbb{R}$ as
$$F(\nu,\xi) = \frac{\mu_2-\mu_1}{\sigma^2}\,\xi - \theta_2\,\frac{\mu_2^2-\mu_1^2}{2\sigma^2} + \sum_{m=1}^{2}\sum_{d=1}^{\Delta} \nu_{md}\log p_{md}. \quad (58)$$
Then, for any $s^t$, $g_X(s^t) = e^{tF(V(s^t),X_{s^t})}$. For each vector $X$, let $Q^X_t:\mathcal{B}_{\mathbb{R}^{2\Delta+1}}\mapsto\mathbb{R}$ denote the probability measure induced by $(V(s^t),X_{s^t})$, for the assumed uniform measure on $\mathcal{S}^t$:
$$Q^X_t(B) := \frac{\sum_{s^t\in\mathcal{S}^t} \mathbf{1}_{\{(V,X)\in B\}}(s^t)}{C_t}, \quad (59)$$

for arbitrary $B\in\mathcal{B}_{\mathbb{R}^{2\Delta+1}}$. It is easy to verify that $Q^X_t$ is indeed a probability measure. Also, we note that, for any fixed $t$ and $X$, $Q^X_t$ is discrete, supported on the discrete set $\left\{(V(s^t),X_{s^t}) : s^t\in\mathcal{S}^t\right\}$; note that the latter set is a subset of $\mathcal{V}_t \times \cup_{s^t\in\mathcal{S}^t}\{X_{s^t}\}$, that is, the Cartesian product of the set of all feasible types at time $t$ with the set of all elements of the vector $X$.

Let $E_Q$ denote the expectation with respect to the measure $Q^X_t$. Then, we have $E_U\left[g_X(S^t)\right] = E_Q\left[e^{tF(V,X)}\right]$. Going back to (56), and using the result of Lemma 2, we obtain for $\eta$ given in (44):
$$\eta = -\log\psi + \lim_{t\to+\infty} -\frac{1}{t}E_0\left[\log E_Q\left[e^{tF(V,X)}\right]\right], \quad (60)$$
where, we recall, $E_0$ is the expectation with respect to the probability $P_0$ that corresponds to the $H_0$ state of nature, under which the measurements $X_k$, and hence the vector $X$, are generated.

If the measures $Q^X_t$ were sufficiently nice that they satisfied the LDP and the moderate growth condition (28), then one could apply Varadhan's lemma to compute the exponential growth of the expectation on the right-hand side of (60). However, the measures $Q^X_t$ are very difficult to analyze due to the correlations between different elements of $X$, which couple the indicator functions in (59). Hence, we resort to an upper bound on $\eta$, which we derive by replacing the vector $X$ with a vector $Z$ having the same statistical properties, but with the added feature that its elements are mutually independent. More precisely, for each $t$ we introduce a family of independent Gaussian variables $Z = \left(Z_{s^t} : s^t\in\mathcal{S}^t\right)$. Further, for each $s^t$ the corresponding element of the family, $Z_{s^t}$, is Gaussian with the same mean and variance as $X_{s^t}$: expected value equal to $0$, and variance equal to $\mathrm{Var}[Z_{s^t}] = \theta_2(s^t)\sigma^2/t$. Denote by $P$ and $E$, respectively, the probability function and the expectation corresponding to the family $\left(Z_{s^t} : s^t\in\mathcal{S}^t\right)$, $t=1,2,\ldots$. Then the following result holds; the proof is based on Slepian's lemma [21] and can be found in the Appendix.

Lemma 10. For each $t$, there holds
$$E\left[\log E_Q\left[e^{tF(V,Z)}\right]\right] \ge E_0\left[\log E_Q\left[e^{tF(V,X)}\right]\right], \quad (61)$$
where the inner left-hand-side expectation is with respect to the measure $Q^Z_t$ and the inner right-hand-side expectation is with respect to the measure $Q^X_t$.

The next result asserts that $Q^Z_t$ satisfies the LDP with probability one and computes the corresponding rate function. To simplify the notation, denote $q = (1,2,\ldots,\Delta)^\top$.

Theorem 11. For every measurable set $G$, the sequence of measures $Q^Z_t$, $t=1,2,\ldots$, with probability one satisfies the LDP upper bound (26) and the LDP lower bound (27), with the same rate function $I:\mathbb{R}^{2\Delta+1}\mapsto\mathbb{R}$, equal for all sets $G$, which for $\nu\in\mathcal{V}$ such that $H(\nu_1)+H(\nu_2) \ge J_\nu(\xi)$ is given by
$$I(\nu,\xi) = \log\psi - H(\nu_1) - H(\nu_2) + J_\nu(\xi), \quad (62)$$
and equals $+\infty$ otherwise, where, for any $\nu\in\mathcal{V}$, the function $J_\nu:\mathbb{R}\mapsto\mathbb{R}$ is defined as $J_\nu(\xi) := \frac{\xi^2}{2\,q^\top\nu_2\,\sigma^2}$.

The proof of Theorem 11 is given in the Appendix.

Having the large deviations principle for the sequence $Q^Z_t$, we can invoke Varadhan's lemma to compute the limit of the scaled values in (60). Applying Lemma 5 (the details of the moderate growth condition (28) for $Q^Z_t$ are given in the Appendix), we obtain that, with probability one,
$$\lim_{t\to+\infty} \frac{1}{t}\log E_Q\left[e^{tF(V,Z)}\right] = \sup_{(\nu,\xi)}\; F(\nu,\xi) - I(\nu,\xi). \quad (63)$$

It can be shown that the sequence under the preceding limit is uniformly integrable; the proof of this result is very similar to the proof of an analogous result in the context of hidden Markov models, given in Appendix E of [12]; hence we omit it here. Thus, the limit of the sequence values and the limit of their expected values coincide, i.e.,
$$\lim_{t\to+\infty} \frac{1}{t}E\left[\log E_Q\left[e^{tF(V,Z)}\right]\right] = \lim_{t\to+\infty} \frac{1}{t}\log E_Q\left[e^{tF(V,Z)}\right]. \quad (64)$$

Combining with (60), (61), and (63), we finally obtain
$$\eta \ge -\log\psi - \sup_{(\nu,\xi)\in\mathbb{R}^{2\Delta+1}}\; F(\nu,\xi) - I(\nu,\xi). \quad (65)$$

It remains to show that the value of the above supremum equals the value of optimization problem (45). Using the definition of $I$, we have that $I(\nu,\xi) = +\infty$ for any $(\nu,\xi)$ such that $H(\nu) < J_\nu(\xi)$ or such that $\nu\notin\mathcal{V}$. Since the supremum is surely not achieved at these points, the set $\mathbb{R}^{2\Delta+1}$ in (65) can be replaced by $\left\{(\nu,\xi)\in\mathcal{V}\times\mathbb{R} : H(\nu) \ge J_\nu(\xi)\right\}$. Using the definitions of $F$ and $I$, we have
$$F(\nu,\xi) - I(\nu,\xi) = \sum_{m=1}^{2}\sum_{d=1}^{\Delta}\left(\nu_{md}\log p_{md} - \nu_{md}\log\nu_{md}\right) + \frac{\mu_2-\mu_1}{\sigma^2}\,\xi - \theta_2\,\frac{\mu_2^2-\mu_1^2}{2\sigma^2} - \frac{1}{\theta_2}\,\frac{\xi^2}{2\sigma^2} - \log\psi. \quad (66)$$

Cancelling the term $\log\psi$ in the preceding equation against the one in (65), and recognizing that $\sum_{d=1}^{\Delta}\left(\nu_{md}\log p_{md} - \nu_{md}\log\nu_{md}\right) = -D(\nu_m\|p_m)$, we see that problem (45) is equivalent to the one in (65). This completes the proof of Theorem 8.

VII. NUMERICAL RESULTS

In this section we report numerical results that demonstrate the tightness of the developed performance bounds. We also illustrate our methodology on the problem of detecting a single run of a dishwasher, where we use real-world data to estimate the state values of the dishwasher.

In the first set of simulations, we consider the setup in which $\mu_1>0$, and we compare the error exponents obtained via simulations with the guaranteed lower bound (46). We simulate a two-state signal, $X_t$, as an i.i.d. Gaussian random variable with standard deviation $\sigma$ and means $\mu_1=2$ and $\mu_2=5$ in states 1 and 2, respectively. The duration of each state is random, uniformly distributed between $1$ and $\Delta=3$. The observation interval is $t\in[1,T]$, where $T=200$. In the absence of the signal, the data are distributed according to the Gaussian distribution with mean $\mu_0=0$ and the same standard deviation $\sigma$.

To estimate the receiver operating characteristic (ROC) curves, we use $J=100000$ Monte Carlo simulation runs for each hypothesis. For each hypothesis and each simulation run, we compute the values $L_t(X^t)$, for $t=1,2,3,\ldots,T$, using the linear recursion from Lemma 6. Then, for each $t$, to obtain the corresponding ROC curve, we first find the minimal and maximal values of $L_t$ across the $J$ runs for each hypothesis $m$, denoted $\underline{L}_{t,m}$ and $\overline{L}_{t,m}$, respectively, and change the detection threshold $\gamma$ with a small step size from $\underline{L}_{t,1}-\beta$ to $\overline{L}_{t,0}+\beta$, where $\beta$ is a carefully chosen bound. For each $t$ and $\gamma$, the probability of false alarm $P_{\mathrm{fa}}$, or false positive, i.e., wrongly declaring that the signal is present, is calculated as
$$P^{\gamma}_{\mathrm{fa},t} = \frac{\sum_{j=1}^{J} \mathbf{1}\left(L_t(X^t_{(j)}) \ge \gamma\right)}{J},$$
where $\mathbf{1}(\cdot)$ is the indicator function that returns $1$ if the corresponding condition is true and $0$ otherwise, and $X^t_{(j)}$ is the $j$-th realisation of the sequence $X^t$ under $H_0$. The probability of a miss, $P_{\mathrm{miss}}$, or false negative, that is, declaring that the signal is not present though it is, is calculated as:
$$P^{\gamma}_{\mathrm{miss},t} = \frac{\sum_{j=1}^{J} \mathbf{1}\left(L_t(X^t_{(j)}) < \gamma\right)}{J}.$$

We set the bound $\alpha=0.01$ and find $P^{\alpha}_{\mathrm{miss},t} = P^{\gamma^\star}_{\mathrm{miss},t}$, where $\gamma^\star$ is the threshold that resulted in the highest probability of a miss satisfying $P^{\gamma^\star}_{\mathrm{fa},t} \le \alpha$.

To investigate the dependence of the slope on the SNR, we fix the signal levels $\mu_1$ and $\mu_2$ and the pmfs $p_1$ and $p_2$ as described above, and we vary the standard deviation of the noise, $\sigma$. For each value of $\sigma$, we compute the values of $P^{\alpha}_{\mathrm{miss},t}$, for $t=1,\ldots,T$, and apply linear regression to the sequence of values $-\log P^{\alpha}_{\mathrm{miss},t}$ over all observation times $t$ for which the probability of a miss was non-zero. This gives an estimate of the error exponent (i.e., the slope) of the probability of a miss for a fixed value of $\sigma$, which we denote by $S_\sigma$.
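The slope estimate $S_\sigma$ is an ordinary least-squares fit of $-\log P^{\alpha}_{\mathrm{miss},t}$ against $t$; a minimal sketch of the fitting step:

```python
def ls_slope(ts, ys):
    """Ordinary least-squares slope of ys regressed on ts."""
    n = len(ts)
    tbar = sum(ts) / n
    ybar = sum(ys) / n
    num = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys))
    den = sum((t - tbar) ** 2 for t in ts)
    return num / den

# On exactly linear data the fit recovers the slope.
slope = ls_slope(list(range(1, 51)), [0.3 * t + 2.0 for t in range(1, 51)])
```

In the experiments, the pairs $(t, -\log P^{\alpha}_{\mathrm{miss},t})$ take the place of the synthetic data above.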

Figure 3 plots the probability of a miss (on a logarithmic scale) vs. the number of samples $t$ for five different values of $\sigma$, namely $\sigma=10,15,20,25,30$. We observe that for large observation intervals $t$ the curves are close to linear, as predicted by the theory; see Lemma 7. Further, as $\sigma$ increases, the magnitude of the slope decreases, becoming very close to $0$ for large values of $\sigma$. Figure 4 compares the error exponent $S_\sigma$ obtained from simulations with the theoretical bound calculated using (46). The theoretical curve is plotted as a red dashed line, while the numerical curve $S_\sigma$ is plotted as a blue full line. For comparison, we also plot the curve $\mu_2^2/(2\sigma^2)$, which corresponds to the best possible error exponent for the studied setup, obtained when the signal stays at the higher signal value $\mu_2>\mu_1$ throughout the whole observation interval; this curve is plotted as a green dotted line. It can be seen from the figure that the numerical error-exponent curve is at all points sandwiched between the lower-bound (46) curve $\mu_1^2/(2\sigma^2)$ and the curve $\mu_2^2/(2\sigma^2)$. Also, the difference between the numerical error exponent and the lower bound (46) decreases as $\sigma$ increases, with the difference becoming negligible for large $\sigma$, showing that our bound is tight for large values of $\sigma$.

Fig. 3: Simulation setup: $\Delta=3$, $p_1,p_2\sim\mathcal{U}([1,\Delta])$, $\mu_1=2$, $\mu_2=5$, $\alpha=0.01$. Evolution of the probability of a miss, on a logarithmic scale, for $\sigma=5,10,15,20,25$.

In the second set of experiments, we consider the setup where the signal level in state 1 is zero, $\mu_1=0$, and $\mu_2=\mu=1$; similarly to the previous setup, we consider uniform distributions $p_1,p_2\sim\mathcal{U}([1,\Delta])$, with $\Delta=2$. We compare the numerical error exponent with the one obtained as a solution to optimization problem (49). To solve (49), we apply a random search over $10^6$ different vectors from the set $\mathcal{V}$, and pick the point which gives the smallest value of the objective (and satisfies the constraint in (49)).
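The random search is a simple filter-and-minimize loop. In the sketch below the set $\mathcal{V}$ is not reproduced (its definition is in Subsection III-A), so `solve_49` takes an arbitrary iterable of candidate pairs $(\nu_1,\nu_2)$; the entropy term is assumed to have the unnormalized form $H(\nu_m) = -\sum_d \nu_{md}\log\nu_{md}$ (eq. (25) is not included in this excerpt), and the test values are hypothetical:

```python
import math

def entropy(v):
    # assumed form of H(nu_m); 0 log 0 taken as 0
    return -sum(x * math.log(x) for x in v if x > 0)

def kl(lam, p):
    # D(lambda||p) as defined before Theorem 8
    s = sum(lam)
    return sum((l / s) * math.log(l / (s * pd)) for l, pd in zip(lam, p) if l > 0)

def solve_49(candidates, p1, p2, mu, sigma):
    """Search for (49): keep the smallest objective value among candidate
    types satisfying the constraint H(nu) <= R(nu)."""
    q = [float(d) for d in range(1, len(p2) + 1)]   # q = (1, ..., Delta)^T
    best = math.inf
    for nu1, nu2 in candidates:
        H = entropy(nu1) + entropy(nu2)
        R = sum(qi * x for qi, x in zip(q, nu2)) * mu ** 2 / (2.0 * sigma ** 2)
        if H > R:
            continue                                 # infeasible candidate
        val = kl(nu1, p1) + kl(nu2, p2) + (math.sqrt(H) - math.sqrt(R)) ** 2
        best = min(best, val)
    return best
```

In the paper's experiment the candidates are $10^6$ random draws from $\mathcal{V}$; any sampler of $\mathcal{V}$ can be plugged in for `candidates`.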

Fig. 4: Simulation setup: $\Delta=3$, $p_1,p_2\sim\mathcal{U}([1,\Delta])$, $\mu_1=2$, $\mu_2=5$, $\alpha=0.01$; $\sigma$ varies from $5$ to $50$. The blue full line plots the numerical error exponent estimated from the slope of $\log P^{\alpha}_{\mathrm{miss},t}$ vs. $\sigma$. The red dashed line plots the theoretical bound $\mu_1^2/(2\sigma^2)$ in (46). The green dotted line plots the function $\mu_2^2/(2\sigma^2)$.

Figure 5 plots the probability of a miss vs. the number of samples $t$ for $5$ different values of $\sigma$, in the interval from $0.2$ to $0.6$. Again, we can observe that linearity emerges with the increase of $\sigma$. Figure 6, top, compares the error exponent estimated from the slopes in Figure 5 with the theoretical bound calculated by solving (49). We can see from the plot that the two lines are very close to each other. In fact, the numerical values are slightly below the lower-bound values. This seemingly contradictory effect is a consequence of the following. As the probability-of-a-miss curves have a concave shape in this simulation setup (which can be observed in Figure 5), their slopes continuously increase with the length of the observation interval. As a consequence, the linear fit performed over the whole observation interval underestimates the slope, as it also tries to fit the region of values where the concavity is more prominent. To further investigate this effect, we performed linear fitting of the probability-of-a-miss curves only over a region of higher values of $t$, where the emergence of linearity is already evident. In particular, for each value of $\sigma$, we apply linear fitting on $[\frac{4}{5}t_{\max}, t_{\max}]$, where $t_{\max}$ is the maximal $t$ for which the probability of a miss is non-zero, and we plot the results in Figure 6, bottom. It can be seen from the figure that the numerical curve moves closer to the theoretical curve, indicating that the bound in (49) is very tight or even exact. Finally, it can be seen from Figure 6 (top and bottom) that the value of $\sigma$ for which the error exponent equals zero matches the threshold predicted by the theory, $\sigma^\star = \mu/(2\sqrt{2\log\Delta}) = 0.4247$, obtained from detectability condition (48).

Fig. 5: Simulation setup: $\Delta=2$, $p_1,p_2\sim\mathcal{U}([1,\Delta])$, $\mu_1=0$, $\mu_2=1$, $\alpha=0.01$. Plots of the probability of a miss, on a logarithmic scale, for $\sigma=0.3, 0.33, 0.37, 0.4, 0.45$.

In the final set of simulations, we demonstrate the applicability of the results for estimating the number of samples needed to detect an appliance run from smart meter data. To do so, we use measurements of a dishwasher from the REFIT dataset [2]. The REFIT dataset contains 2 years of appliance measurements from 20 houses. The monitored dishwasher is a two-state appliance, with mean power values of $\mu_1=2200\,$W and $\mu_2=66\,$W, and standard deviations of $\sigma_1=36.6\,$W and $\sigma_2=18.2\,$W, in states 1 and 2, respectively. The mean value of the background noise, which is also the base load in that house, is $\mu_0=90\,$W, with standard deviation $\sigma_0=16.6\,$W. We down-sampled the dishwasher data with $\Delta=10$ to simulate the influence of noise, including base load and unknown appliances, on detecting the appliance. The simulation results are shown in Figure 7 as plots of $P^{\alpha}_{\mathrm{miss},t}$ vs. $t$ for several values of $\sigma$ between the measured $\sigma_1$ and $\sigma_2$. As expected, the probability of a miss decreases as the number of samples $t$ increases. Furthermore, the number of samples needed for successful detection is about $10$.

Fig. 6: Simulation setup: $\Delta=2$, $p_1,p_2\sim\mathcal{U}([1,\Delta])$, $\mu_1=0$, $\mu_2=1$, $\alpha=0.01$; $\sigma$ varies from $0.2$ to $0.6$. The blue full line plots the numerical error exponent estimated from the slope of $\log P^{\alpha}_{\mathrm{miss},t}$ vs. $\sigma$ by linear fitting. Top: linear fitting performed on the whole interval $[1,t_{\max}]$; bottom: linear fitting performed on $[\frac{4}{5}t_{\max}, t_{\max}]$. The red dashed line plots the theoretical bound calculated by solving (49).

Fig. 7: Simulation setup: $\Delta=10$, $p_1,p_2\sim\mathcal{U}([1,\Delta])$, $\mu_1=66$, $\mu_2=2200$, $\sigma=90$, $\alpha=0.01$. Plots of the probability of a miss for 5 different $\sigma$ values.

VIII. CONCLUSION

We studied the problem of detecting a multi-state signal hidden in noise, where the durations of state occurrences vary over time in a nondeterministic manner. We modelled such a process via a random duration model that, for each state, assigns a (possibly distinct) probability mass function to the duration of each occurrence of that state. Assuming Gaussian noise and a process with two possible states, we derived the optimal likelihood ratio test and showed that it has the form of a linear recursion of dimension equal to the sum of the duration spreads of the two states. Using this result, we showed that the Neyman-Pearson error exponent is equal to the top Lyapunov exponent for the linear recursion, the exact computation of which is a well-known hard problem. Using the theory of large deviations, we provided a lower bound on the error exponent. We demonstrated the tightness of the bound with numerical results. Finally, we illustrated the developed methodology in the context of NILM, applying it to the problem of detecting multi-state appliances from the aggregate power consumption signal.

APPENDIX

Proof of Lemma 1. Fix an arbitrary sequence $s^t$. Let $n_1=N_1(s^t)$, $n_2=N_2(s^t)$, and $n=N(s^t)$ denote, respectively, the number of state-1 phases, the number of state-2 phases, and the total number of phases in $s^t$. Let the durations of the state-1 phases (in order of appearance) in $s^t$ be $d_{11}, d_{12}, \ldots, d_{1n_1}$, and the durations of the state-2 phases be $d_{21}, d_{22}, \ldots, d_{2n_2}$. Recall that $o(s^t)$ denotes the duration of the last phase in $s^t$. Then, if $s_t=1$, we have
$$P(s^t) = P_1(S^t=s^t) = P_1\left(D_{11}=d_{11}, D_{21}=d_{21}, \ldots, D_{1n_1}\ge d_{1n_1}\right) = \prod_{l=1}^{n_1-1} P_1(D_{1l}=d_{1l})\; P_1(D_{1n_1}\ge d_{1n_1}) \prod_{l=1}^{n_2} P_1(D_{2l}=d_{2l}), \quad (67)$$

where the second equality follows from the fact that the last phase is in state 1 and that, with knowledge of the process only up to time $t$, it is not certain whether this last phase lasts longer than $d_{1n_1}$, i.e., stretches beyond time $t$; the last equality follows from the fact that the $D_{mn}$'s are i.i.d. for each $m$ and mutually independent for different $m$. Adding the missing factor $P_1(D_{1n_1}=d_{1n_1})$ to the product, and dividing the middle term in (67) by the same factor, yields
$$P(s^t) = \frac{P_1(D_{1n_1}\ge d_{1n_1})}{P_1(D_{1n_1}=d_{1n_1})} \prod_{m=1}^{2}\prod_{l=1}^{n_m} P_1(D_{ml}=d_{ml}). \quad (68)$$

A similar formula can be obtained for the case $s_t=2$. Note now that, for every $d=1,\ldots,\Delta_m$, $P_1(D_{mn_m}\ge d) = p_{md} + p_{m,d+1} + \ldots + p_{m\Delta} =: p^+_{md}$, for $m=1,2$. Grouping, for each state, the product terms with equal durations, and denoting $n_{1d}=N_{1d}(s^t)$, for $d=1,\ldots,\Delta_1$, and $n_{2d}=N_{2d}(s^t)$, for $d=1,\ldots,\Delta_2$, we obtain that
$$P(s^t) = \frac{p^+_{m,o(s^t)}}{p_{m,o(s^t)}} \prod_{d=1}^{\Delta_1} p_{1d}^{n_{1d}} \prod_{d=1}^{\Delta_2} p_{2d}^{n_{2d}}, \quad (69)$$
where $m=s_t$. This completes the proof of the lemma.

Proof of Lemma 6. Consider (30) and note that $P(s^t)$ can be expressed as $P(s^t) = p^+_{s_t,o(s^t)}\, P_0(s^{t-o(s^t)})$, where, we recall, $o:\mathcal{S}^t\mapsto\mathbb{Z}$ is an integer-valued function which returns the duration of the last phase in a sequence $s^t$. We break the sum in (30) as follows:
$$L_t(X^t) = \sum_{m=1}^{2}\sum_{d=1}^{\Delta} p^+_{md} \sum_{s^t\in\mathcal{S}^t:\, s_t=m,\, o(s^t)=d} P_0(s^{t-d})\, e^{\sum_{k=1}^{t} f_{s_k}(X_k)}. \quad (70)$$

To prove the lemma, it suffices to show that, for each $m$, $d$, $t$, the $\Lambda^m_{t,d}$'s are equal to the corresponding summands in (70),
$$\Sigma^m_{t,d} := \sum_{s^t\in\mathcal{S}^t:\, s_t=m,\, o(s^t)=d} P_0(s^{t-d})\, e^{\sum_{k=1}^{t} f_{s_k}(X_k)}. \quad (71)$$
To prove this claim, fix $m=1$. For $t=1$, it is easy to see that $\Sigma^1_{1,1} = e^{f_1(X_1)}$, and, since for $t=1$ there cannot be sequences with a last phase longer than $1$, we have $\Sigma^1_{1,d} = 0$ for all $2\le d\le\Delta$. Analogous identities can be derived for $m=2$. Thus, we have proved that, for $t=1$, the summands $\Sigma^m_{t,d} = \Lambda^m_{t,d}$, for each $d$ and $m$.

Consider now an arbitrary fixed $t\ge 2$. Consider $m=1$ and $d=1$. This pair of parameter values corresponds to sequences that end with a state-1 phase of length $1$. We thus obtain that $s_{t-1}=2$, and we can represent this set of sequences as:
$$\left\{s^t\in\mathcal{S}^t : s_t=1,\; o(s^t)=1\right\} = \bigcup_{l=1}^{\Delta}\left\{(s^{t-1},1) : s^{t-1}\in\mathcal{S}^{t-1},\; s_{t-1}=2,\; o(s^{t-1})=l\right\}. \quad (72)$$


Hence, we can write $\Sigma^1_{t,1}$ as follows:
$$\Sigma^1_{t,1} = e^{f_1(X_t)} \sum_{l=1}^{\Delta}\; \sum_{s^{t-1}\in\mathcal{S}^{t-1}:\, s_{t-1}=2,\, o(s^{t-1})=l} p_{2l}\, P_0(s^{t-1-l})\, e^{\sum_{k=1}^{t-1} f_{s_k}(X_k)} = e^{f_1(X_t)} \sum_{l=1}^{\Delta} p_{2l}\, \Sigma^2_{t-1,l}, \quad (73)$$
where in the first equality we used that, when $o(s^{t-1})=l$, $P_0(s^{t-1}) = p_{2l}\, P_0(s^{t-1-l})$, and the last equality follows from the definition of $\Sigma^2_{t,l}$, $l=1,2,\ldots,\Delta$, in (71).

Consider now $m=1$ and $d\ge 2$. Since the last $d$ states must be state 1, we can represent this set of sequences as:
$$\left\{s^t\in\mathcal{S}^t : s_t=s_{t-1}=\ldots=s_{t-d+1}=1,\; o(s^t)=d\right\} = \left\{(s^{t-1},1) : s^{t-1}\in\mathcal{S}^{t-1},\; s_{t-1}=\ldots=s_{t-1-(d-1)+1}=1,\; o(s^{t-1})=d-1\right\}. \quad (74)$$
Thus, we can write $\Sigma^1_{t,d}$ as follows:
$$\Sigma^1_{t,d} = e^{f_1(X_t)} \sum_{s^{t-1}\in\mathcal{S}^{t-1}:\, s_{t-1}=1,\, o(s^{t-1})=d-1} P_0(s^{t-1-(d-1)})\, e^{\sum_{k=1}^{t-1} f_{s_k}(X_k)} = e^{f_1(X_t)}\, \Sigma^1_{t-1,d-1}, \quad (75)$$
where we note that in the first equality we used that $P_0(s^{t-d}) = P_0(s^{(t-1)-(d-1)})$.

Representing (73) and (75) in matrix form (we remark that the derivations for $m=2$ are analogous), we recover recursion (31). Since we proved that the initial conditions are equal, i.e., $\Sigma^m_1 = \Lambda^m_1$, for $m=1,2$, we have proved that $\Sigma^m_t = \Lambda^m_t$ for all $t$, which proves the claim of the lemma.

Proof of Lemma 7. To prove the claim, we apply Theorem 2 from [19]. Note that since the matrices $A_k$ are i.i.d., they are stationary and ergodic, and hence they are also metrically transitive; see, e.g., [22]. Therefore the assumptions of the theorem are fulfilled. We now show that the condition of the theorem holds, i.e., we show that
$$E_0\left[\log^+\|A_k\|\right] < +\infty, \quad (76)$$
where $\log^+ = \max\{\log, 0\}$. It is easy to verify that $\|A_k\| \le e^{\max_{m=1,2}|f_m(X_k)|}\, C_{M_0}$, where $C_{M_0} = \|M_0\|$. Thus, we have
$$\log^+\|A_k\| \le \log^+\left(C_{M_0}\, e^{\max_{m=1,2}|f_m(X_k)|}\right) \le \log^+ C_{M_0} + \max_{m=1,2}|f_m(X_k)| \le \log^+ C_{M_0} + |f_1(X_k)| + |f_2(X_k)|. \quad (77)$$
Since $X_k$ is Gaussian, and $f_1$ and $f_2$ are linear functions, $f_1(X_k)$ and $f_2(X_k)$ are Gaussian. Therefore, the expectation of the right-hand side of the preceding equation is finite (which can be seen by bounding $E_0[|f_1(X_k)|] \le \sqrt{E_0\left[f_1^2(X_k)\right]} < +\infty$, and similarly for $f_2$). Hence, condition (76) follows. By Theorem 2 from [19] we therefore have that
$$\lim_{t\to+\infty} \frac{1}{t}\log\|\Pi_t\| = \lim_{t\to+\infty} \frac{1}{t}E_0\left[\log\|\Pi_t\|\right], \quad (78)$$
which proves (38). To prove (39), we note that $L_t = p^{+\top}\,\Pi_t\,\mathbf{1}_{2\Delta}$, where $p^+ > 0$. Thus, there exist constants $c$ and $C$ such that $c\|\Pi_t\| \le L_t \le C\|\Pi_t\|$ [23]. The claim now follows from the preceding sandwich relation between $L_t$ and $\|\Pi_t\|$.

Proof of Theorem 11. Fix $t\ge 1$ and fix $\nu\in\mathcal{V}_t$. For $D\subseteq\mathbb{R}$, introduce
$$Q^Z_{t,\nu}(D) := \frac{\sum_{s^t\in\mathcal{S}^t_\nu} \mathbf{1}_{\{Z_{s^t}\in D\}}}{C_{t,\nu}}, \quad (79)$$
where, we recall, $C_{t,\nu}$ is the number of type-$\nu$ feasible sequences of length $t$. Let $B=C\times D$ be a box in $\mathbb{R}^{2\Delta+1}$, where $C$ is a box in $\mathbb{R}^{2\Delta}$ and $D=[a,b]$ is an interval in $\mathbb{R}$. Then, we have
$$Q^Z_t(B) = \sum_{\nu\in\mathcal{V}_t\cap C} \frac{C_{t,\nu}}{C_t}\, Q^Z_{t,\nu}(D). \quad (80)$$

From (80) it follows that, for each $t$ and any $\nu\in\mathcal{V}_t$, there holds
$$\frac{C_{t,\nu}}{C_t}\, Q^Z_{t,\nu}(D) \le Q^Z_t(B). \quad (81)$$

Further, note that, for each $\nu\in\mathcal{V}_t$, the corresponding elements of the random vector $Z$, $\left(Z_{s^t} : V(s^t)=\nu\right)$, are i.i.d. Gaussian, with mean $0$ and variance equal to $q^\top\nu_2\,\sigma^2/t$. Thus, $Q^Z_{t,\nu}(D)$ is binomial with $C_{t,\nu}$ trials and probability of success $E\left[Q^Z_{t,\nu}(D)\right] = q_{t,\nu}(D)$ equal to
$$q_{t,\nu}(D) = \int_{a\le x\le b} \frac{\sqrt{t}}{\sqrt{2\pi\, q^\top\nu_2}\,\sigma}\, e^{-\frac{t x^2}{2\, q^\top\nu_2\,\sigma^2}}\, dx. \quad (82)$$
Using the well-known bounds on the Q-function [24], the following bounds on $q_{t,\nu}(D)$, for an arbitrary interval $D$, are straightforward to show.


Lemma 12. Fix $\epsilon > 0$. Then, for any $D=[a,b]$, $a<b$, there holds
$$e^{-t\epsilon}\, e^{-t \inf_{a\le\eta\le b} J_\nu(\eta)} \le q_{t,\nu}(D) \le e^{t\epsilon}\, e^{-t \inf_{a\le\eta\le b} J_\nu(\eta)}, \quad (83)$$
for each $\nu\in\mathcal{V}_t$ and all $t$ sufficiently large.
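The exponential decay rate in (83) can be observed numerically: for the Gaussian in (82), $-\frac{1}{t}\log q_{t,\nu}(D)$ approaches $\inf_{\eta\in D} J_\nu(\eta)$ as $t$ grows. A sketch, where $v$ stands for $q^\top\nu_2$ and the tail is computed via $Q(x)=\frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$ to avoid catastrophic cancellation at large $t$:

```python
import math

def q_interval(a, b, v, sigma, t):
    """q_{t,nu}(D) for D = [a, b]: mass that N(0, v*sigma^2/t) assigns to D,
    computed as a difference of upper tails Q(a/s) - Q(b/s)."""
    s = math.sqrt(v) * sigma / math.sqrt(t)
    Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2.0))
    return Q(a / s) - Q(b / s)

# For D = [0.5, 1] and v = sigma = 1, the rate should approach
# inf_eta eta^2 / 2 = 0.5^2 / 2 = 0.125 as t grows.
t = 4900
rate = -math.log(q_interval(0.5, 1.0, 1.0, 1.0, t)) / t
```

The residual gap between `rate` and the limit is the polynomial prefactor of the Gaussian tail, which is exactly the $e^{\pm t\epsilon}$ slack allowed in (83).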

We next show that the random measures $Q^Z_{t,\nu}$ approach their expected values $q_{t,\nu}$ as $t$ increases.

Lemma 13. Fix an arbitrary $\epsilon > 0$.

1) With probability one,
$$Q^Z_{t,\nu}(D) \le q_{t,\nu}(D)\, e^{t\epsilon}, \quad (84)$$
for all $\nu\in\mathcal{V}_t$, for all $t$ sufficiently large.

2) Let $\nu_t\in\mathcal{V}_t$, $t=1,2,\ldots$, be a sequence of types converging to $\nu^\star\in\mathcal{V}$. Then, with probability one, for all $t$ sufficiently large,
$$Q^Z_{t,\nu_t}(D) \ge q_{t,\nu^\star}(D)\,(1-\epsilon). \quad (85)$$

The proof of part 1 of Lemma 13 can be obtained by considering separately the two cases: 1) $\inf_{(\nu,\xi)\in B} J_\nu(\xi) - H(\nu) < 0$, and 2) $\inf_{(\nu,\xi)\in B} J_\nu(\xi) - H(\nu) \ge 0$. In each of the two cases the claim can be obtained by a corresponding application of Markov's inequality to a conveniently defined sequence of sets in $\Omega$. In case 1), we use $A_t = \left\{\omega : Q^Z_{t,\nu}(D) \ge q_{t,\nu}(D)\, e^{t\epsilon}, \text{ for some } \nu\in C\cap\mathcal{V}_t\right\}$. Applying the union bound, together with the fact that the cardinality of $\mathcal{V}_t$ is polynomial in $t$ ($|\mathcal{V}_t| \le (t+1)^{2\Delta}$), we obtain from condition 1) that the probabilities $P(A_t)$ decay exponentially with $t$. The claim in part 1 then follows by the Borel-Cantelli lemma. Similar arguments can be derived for case 2), where in place of the set $A_t$, the set $B_t = \left\{\omega : \sum_{s^t\in\mathcal{S}^t_\nu} \mathbf{1}_{\{Z_{s^t}\in D\}}(s^t) \ge 1, \text{ for some } \nu\in C\cap\mathcal{V}_t\right\}$ is used. For details we refer the reader to Section V-A in [13].

By defining $C_t = \left\{\omega : \left|\frac{Q^Z_{t,\nu_t}(D)}{q_{t,\nu_t}(D)} - 1\right| \ge \epsilon, \text{ for some } \nu\in C\cap\mathcal{V}_t\right\}$ and applying Chebyshev's inequality, the proof of part 2 can be derived similarly to that of part 1. For details, see the proof of Lemma 13 in [13].

Having the preceding technical results, we are now ready to prove the LDP for the sequence $Q^Z_t$. We first prove the LDP upper bound, and then turn to the LDP lower bound.

Proof of the LDP upper bound. We break the proof of the LDP upper bound into the following steps. In the first step, we show that the LDP upper bound holds with probability one for all boxes in $\mathbb{R}^{2\Delta+1}$. In the second step, we extend the claim to all compact sets via the standard finite-cover argument [15]. Finally, in the third step, we move from compact sets to closed sets by using the fact that $I$ has compact support.


Step 1: LDP for boxes. Let $B=C\times D$ be an arbitrary closed box in $\mathbb{R}^{2\Delta+1}$, where $C$ is a box in $\mathbb{R}^{2\Delta}$ and $D$ is a closed interval in $\mathbb{R}$. To prove the LDP upper bound for the box $B$, we need to show that there exists a set $\Omega^\star_1=\Omega^\star_1(B)$ which has probability one, $P(\Omega^\star_1)=1$, such that for every $\omega\in\Omega^\star_1$ there holds

$$\limsup_{t\to+\infty}\frac{1}{t}\log Q^Z_t(B)\leq -I(B), \quad (86)$$

where $I(B):=\inf_{(\nu,\xi)\in B} I(\nu,\xi)$. To this end, fix $\epsilon>0$. Applying Lemma 2, Lemma 3, Lemma 12, and part 1 of Lemma 13, together with (80), we have

$$Q^Z_t(B)\leq \sum_{\nu\in C\cap\mathcal{V}_t} e^{4\epsilon t}\, e^{-t\inf_{\xi\in D} J_{\nu}(\xi)+tH(\nu)-t\log\psi} \quad (87)$$

$$\leq |\mathcal{V}_t|\, e^{4\epsilon t}\, e^{-t\log\psi}\, e^{-t\inf_{\nu\in C\cap\mathcal{V}}\left(\inf_{\xi\in D} J_{\nu}(\xi)-H(\nu)\right)}, \quad (88)$$

which holds with probability one for all $t$ sufficiently large. Dividing by $t$, taking the limit $t\to+\infty$, and letting $\epsilon\to 0$, the upper bound for boxes follows.
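The final limit can be sanity-checked numerically: on the exponential scale, the polynomial cardinality $|\mathcal{V}_t|\leq(t+1)^{2\Delta}$ contributes nothing, so $\frac{1}{t}\log\big((t+1)^{2\Delta}e^{4\epsilon t-Kt}\big)\to 4\epsilon-K$. A minimal sketch; `Delta`, `eps`, and `K` are illustrative stand-ins, not values from the paper.

```python
import math

# (1/t) * log( (t+1)^(2*Delta) * exp(4*eps*t) * exp(-K*t) )
#   = 2*Delta*log(t+1)/t + 4*eps - K  ->  4*eps - K  as t -> infinity,
# i.e. the polynomial prefactor |V_t| is negligible on the exponential scale.
Delta, eps, K = 2, 0.01, 1.0

def normalized_log(t):
    return (2 * Delta * math.log(t + 1) + 4 * eps * t - K * t) / t

vals = [normalized_log(t) for t in (10, 1000, 1_000_000)]
limit = 4 * eps - K
print(vals, limit)
```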

Step 2: LDP for compact sets. The extension of the upper bound to all compact sets in $\mathbb{R}^{2\Delta+1}$ can be done by picking an arbitrary compact set $F$, covering it with a family of boxes of the form as in Step 1 (one box, of conveniently chosen size, assigned to each point of $F$), and finally extracting a finite subcover of $F$. As this is a standard argument in the proof of LDP upper bounds, we omit the details here and refer the reader to [15] (see, e.g., the proof of Cramér's theorem in $\mathbb{R}^d$, Chapter 2.2.2 in [15]).

Step 3: LDP for closed sets. Since the rate function has compact domain, the LDP upper bound for compact sets implies the LDP upper bound for closed sets. This completes the proof of the upper bound.

Proof of the LDP lower bound. Let $U$ be an arbitrary open set in $\mathbb{R}^{2\Delta+1}$. To prove the LDP lower bound we need to show that there exists a set $\Omega^\star_2=\Omega^\star_2(U)$ which has probability one, $P(\Omega^\star_2)=1$, such that for every $\omega\in\Omega^\star_2$ there holds

$$\liminf_{t\to+\infty}\frac{1}{t}\log Q^Z_t(U)\geq -I(U). \quad (89)$$

Since $I$ is non-negative at any point of its domain, $I(U)$ is either a finite non-negative number or $+\infty$. In the latter case the lower bound holds trivially, hence we focus on the case $I(U)<+\infty$.

For any point $\nu\in\mathcal{V}$, we define a sequence of types $\nu_t\in\mathcal{V}_t$ converging to $\nu$ by picking, for each $t\geq 1$, an arbitrary closest neighbor of $\nu$ in the set $\mathcal{V}_t$,$^2$ i.e.,

$$\nu_t\in\operatorname{Argmin}_{\nu'\in\mathcal{V}_t}|\nu'-\nu|. \quad (90)$$

$^2$Since $\mathcal{V}_t$ gets denser with $t$, the sequence $\nu_t$ indeed converges to $\nu$.
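The selection rule (90) can be sketched as follows; the one-dimensional grid $\{0,1/t,\dots,1\}$ below is only an illustrative stand-in for the set $\mathcal{V}_t$ of types.

```python
# Pick the closest neighbor of nu in a discrete grid that densifies with t
# (an illustrative one-dimensional stand-in for the set V_t of (90)).
def closest_type(nu, t):
    grid = [k / t for k in range(t + 1)]
    return min(grid, key=lambda v: abs(v - nu))

nu = 0.318
# The approximation error is at most 1/(2t), so nu_t -> nu as t grows.
errors = [abs(closest_type(nu, t) - nu) for t in (10, 100, 1000)]
print(errors)
```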


Now note that, since $I(U)$ is an infimal value, for any $\delta>0$ there must exist $(\nu,\xi)\in U$ such that $I(\nu,\xi)\leq I(U)+\delta$. If for $(\nu,\xi)$ there holds $H(\nu)-J_{\nu}(\xi)>0$, we assign $\nu^\star=\nu$ and $\xi^\star=\xi$. Otherwise, we can decrease $\xi$ in absolute value to a new point $\xi'$ such that $(\nu,\xi')$ still belongs to $U$ (note that this is feasible because $U$ is open), and for which the strict inequality $H(\nu)-J_{\nu}(\xi')>0$ holds. Assigning $\xi^\star=\xi'$, we prove the existence of $(\nu^\star,\xi^\star)\in U$ such that

$$I(\nu^\star,\xi^\star)\leq I(U)+\delta \quad (91)$$

$$H(\nu^\star)-J_{\nu^\star}(\xi^\star)>0. \quad (92)$$

Let $\nu_t$ denote the sequence of points obtained from (90) converging to $\nu^\star$. Since $U$ is open, there exists a box $B$ centered at $(\nu^\star,\xi^\star)$ that entirely belongs to $U$. This implies that there exists a closed interval $D\subseteq\mathbb{R}$ such that, for sufficiently large $t$, $\{\nu_t\}\times D\subseteq U$. By the inequality in (81), it follows that

$$Q^Z_t(U)\geq Q^Z_t(\{\nu_t\}\times D)=\frac{C_{t,\nu_t}}{C_t}\,Q^Z_{t,\nu_t}(D).$$

Combining the lower bound on $q_{t,\nu_t}(D)$ from Lemma 12 with part 2 of Lemma 13, we obtain that, for sufficiently large $t$,

$$Q^Z_t(U)\geq q_{t,\nu_t}(D)\,(1-\epsilon)\,\frac{C_{t,\nu_t}}{C_t}\geq e^{-3\epsilon t}\,e^{-t\inf_{\xi\in D}J_{\nu^\star}(\xi)+tH(\nu_t)-t\log\psi}\,(1-\epsilon).$$

Taking the logarithm and dividing by $t$, we obtain

$$\frac{1}{t}\log Q^Z_t(U)\geq -3\epsilon-\inf_{\xi\in D}J_{\nu^\star}(\xi)+H(\nu_t)-\log\psi+\frac{\log(1-\epsilon)}{t}. \quad (93)$$

As $t\to+\infty$, $\nu_t\to\nu^\star$, and by the continuity of $H$ we have that $H(\nu_t)\to H(\nu^\star)$. Thus, taking the limit in (93) yields

$$\liminf_{t\to+\infty}\frac{1}{t}\log Q^Z_t(U)\geq -3\epsilon-\inf_{\xi\in D}J_{\nu^\star}(\xi)+H(\nu^\star)-\log\psi\geq -3\epsilon-I(\nu^\star,\xi^\star),$$

where in the last inequality we used the fact that $\xi^\star\in D$. The latter bound holds for all $\epsilon>0$, and hence taking the supremum over all $\epsilon>0$ yields

$$\liminf_{t\to+\infty}\frac{1}{t}\log Q^Z_t(U)\geq -I(\nu^\star,\xi^\star)\geq -\inf_{(\nu,\xi)\in U}I(\nu,\xi)-\delta.$$

Recalling that $\delta$ was chosen arbitrarily, the lower bound is proven.

Proof of Lemma 10. For reference, we state here Slepian's lemma, which we use in our proof.


Lemma 14 (Slepian's lemma [21]). Let the function $\phi:\mathbb{R}^L\mapsto\mathbb{R}$ satisfy

$$\lim_{\|x\|\to+\infty}\phi(x)\,e^{-\alpha\|x\|^2}=0,\quad\text{for all }\alpha>0. \quad (94)$$

Suppose that $\phi$ has non-negative mixed derivatives,

$$\frac{\partial^2\phi}{\partial x_l\,\partial x_m}\geq 0,\quad\text{for }l\neq m. \quad (95)$$

Then, for any two independent zero-mean Gaussian vectors $X$ and $Z$ taking values in $\mathbb{R}^L$ such that $\mathbb{E}_X[X_l^2]=\mathbb{E}_Z[Z_l^2]$ and $\mathbb{E}_X[X_lX_m]\geq\mathbb{E}_Z[Z_lZ_m]$, there holds $\mathbb{E}_X[\phi(X)]\geq\mathbb{E}_Z[\phi(Z)]$, where $\mathbb{E}_X$ and $\mathbb{E}_Z$, respectively, denote the expectation operators on the probability spaces on which $X$ and $Z$ are defined.

Proof. For each fixed $t$ define the function $\phi_t:\mathbb{R}^{C_t}\mapsto\mathbb{R}$,

$$\phi_t(x):=-\log\sum_{s^t\in\mathcal{S}_t}e^{\gamma_{s^t}(x_{s^t})}, \quad (96)$$

where $x_{s^t}$ is the element of the vector $x=\left(x_{s^t}:s^t\in\mathcal{S}_t\right)\in\mathbb{R}^{C_t}$ whose index is $s^t$, and where each function $\gamma_{s^t}$ is defined through the function $g_X$, given in (56), as $\gamma_{s^t}(x_{s^t}):=\log\left(g_X(s^t)\right)$. Since each $\gamma_{s^t}(x_{s^t})$, $s^t\in\mathcal{S}_t$, grows linearly in $x$, condition (94) is fulfilled.

Further, it is straightforward to show that the mixed second partial derivative of $\phi_t$ is given by

$$\frac{\partial^2\phi_t}{\partial x_{s^t}\,\partial x_{{s^t}'}}=\frac{(\mu_2-\mu_1)^2}{\sigma^4}\,\frac{e^{\gamma_{s^t}(x_{s^t})+\gamma_{{s^t}'}(x_{{s^t}'})}}{\left(\sum_{s^t\in\mathcal{S}_t}e^{\gamma_{s^t}(x_{s^t})}\right)^2}, \quad (97)$$

which is always non-negative, and hence condition (95) is also fulfilled.
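Non-negativity of the mixed partials of a function of the form (96) can be checked numerically with finite differences. The sketch below is illustrative: it uses linear $\gamma_s(u)=c_0u+b_s$ with a common slope `c0` (a stand-in for $(\mu_2-\mu_1)/\sigma^2$) and arbitrary intercepts `b`, none of which are values from the paper.

```python
import math

c0 = 1.4                 # common slope, stand-in for (mu2 - mu1)/sigma^2
b = [0.3, -0.8, 0.5]     # illustrative per-index intercepts

def phi(x):
    # phi(x) = -log sum_s exp(gamma_s(x_s)), with gamma_s(u) = c0*u + b_s
    return -math.log(sum(math.exp(c0 * xi + bi) for xi, bi in zip(x, b)))

def mixed_partial(x, l, m, h=1e-4):
    # central finite difference for d^2 phi / (dx_l dx_m), l != m
    def f(dl, dm):
        y = list(x)
        y[l] += dl
        y[m] += dm
        return phi(y)
    return (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h * h)

x = [0.2, -0.5, 0.9]
vals = [mixed_partial(x, l, m) for l in range(3) for m in range(3) if l != m]
print(vals)  # all entries non-negative, in line with (97)
```

Note that the common slope matters: with slopes of opposite signs the mixed partials could be negative, which is why (97) carries the squared factor $(\mu_2-\mu_1)^2$.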

We next verify the conditions of the lemma on the vectors $X$ and $Z$. Since, for the same sequence $s^t$, the corresponding $X_{s^t}$ and $Z_{s^t}$ have the same Gaussian distribution (of mean zero and variance equal to $q^\top V_2(s^t)\sigma^2/t$), there holds $\mathbb{E}_0[X_{s^t}^2]=\mathbb{E}[Z_{s^t}^2]$. Further, it is easy to see that $\mathbb{E}_0[X_{s^t}X_{{s^t}'}]=\sum_{k:\,s_k=s'_k}\sigma^2\geq 0$. On the other hand, since $Z_{s^t}$ and $Z_{{s^t}'}$ are independent for $s^t\neq {s^t}'$, and they are both zero mean, we have $\mathbb{E}[Z_{s^t}Z_{{s^t}'}]=0$. Therefore, the last condition of Slepian's lemma is fulfilled. Hence, the claim of Lemma 10 follows.

Proof of the moderate growth of $Q^Z_t$.

Proof. The conditions that define the set $\mathcal{V}$ imply that $1/(2\Delta)\leq q^\top\nu_2\leq 1$ and $\nu\geq 0$, and hence $\mathcal{V}$ is compact. Further, the condition $H(\nu_1)+H(\nu_2)\geq\frac{1}{q^\top\nu_2}\,\xi^2$, which defines the domain of the rate function $I$, implies $2\log\Delta\geq\xi^2$. Thus, $\xi$ must be bounded in order for $I$ to be finite, which, combined with the fact that $\nu$ must belong to the compact set $\mathcal{V}$, shows that $I$ has compact domain. Let $B_0$ be a box that contains the domain of $I$. Since the function $F$ is continuous, it achieves a maximum on $B_0$, which we denote by $M_0:=\max_{(\nu,\xi)\in B_0}F(\nu,\xi)$. It follows that for each $M\geq M_0$, with probability one, the integral in (28) equals zero for all $t$ sufficiently large. Thus, condition (28) is fulfilled.
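The step from the domain condition to $2\log\Delta\geq\xi^2$ rests on the elementary fact that the entropy of a pmf supported on at most $\Delta$ points is at most $\log\Delta$; combined with $q^\top\nu_2\leq 1$, this bounds $\xi^2$ by $2\log\Delta$. A quick numerical sanity check of the entropy bound, with an illustrative $\Delta$:

```python
import math
import random

def entropy(p):
    # Shannon entropy (in nats) of a probability mass function
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

random.seed(1)
Delta = 5
for _ in range(1000):
    w = [random.random() for _ in range(Delta)]
    p = [wi / sum(w) for wi in w]        # random pmf on Delta points
    assert entropy(p) <= math.log(Delta) + 1e-12

# The uniform pmf attains the maximum, log(Delta)
uniform = [1 / Delta] * Delta
print(entropy(uniform), math.log(Delta))
```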


REFERENCES

[1] G. W. Hart, Nonintrusive Appliance Load Data Acquisition Method: Progress Report. Massachusetts Institute of Technology, Energy Laboratory and Electric Power Research Institute, 1984. [Online]. Available: https://books.google.co.uk/books?id=gYlYtwAACAAJ

[2] D. Murray, L. Stankovic, and V. Stankovic, "An electrical load measurements dataset of United Kingdom households from a two-year longitudinal study," Scientific Data, vol. 4, p. 160122, 2017.

[3] J. Liao, G. Elafoudi, L. Stankovic, and V. Stankovic, "Non-intrusive appliance load monitoring using low-resolution smart meter data," in 2014 IEEE International Conference on Smart Grid Communications (SmartGridComm), Nov. 2014, pp. 535–540.

[4] K. He, L. Stankovic, J. Liao, and V. Stankovic, "Non-intrusive load disaggregation using graph signal processing," IEEE Transactions on Smart Grid, vol. PP, no. 99, pp. 1–1, 2017.

[5] B. Zhao, L. Stankovic, and V. Stankovic, "On a training-less solution for non-intrusive appliance load monitoring using graph signal processing," IEEE Access, vol. 4, pp. 1784–1799, 2016.

[6] O. Parson, S. Ghosh, M. J. Weal, and A. Rogers, "Non-intrusive load monitoring using prior models of general appliance types," in AAAI, 2012.

[7] J. Z. Kolter and T. Jaakkola, "Approximate inference in additive factorial HMMs with application to energy disaggregation," in Artificial Intelligence and Statistics, 2012, pp. 1472–1482.

[8] C. Uggen, "Work as a turning point in the life course of criminals: A duration model of age, employment, and recidivism," American Sociological Review, vol. 65, no. 4, pp. 529–546, Aug. 2000.

[9] J. R. Russell and R. F. Engle, "A discrete-state continuous-time model of financial transactions prices and times: The autoregressive conditional multinomial-autoregressive conditional duration model," Journal of Business and Economic Statistics, vol. 23, no. 2, pp. 166–180, Apr. 2005.

[10] M. Ting, A. O. Hero, D. Rugar, C.-Y. Yip, and J. A. Fessler, "Near-optimal signal detection for finite-state Markov signals with application to magnetic resonance force microscopy," IEEE Transactions on Signal Processing, vol. 54, no. 6, pp. 2049–2062, June 2006.

[11] M. Ting and A. O. Hero, "Detection of a random walk signal in the regime of low signal to noise ratio and long observation time," in 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, vol. 3, May 2006.

[12] A. Agaskar and Y. M. Lu, "Optimal detection of random walks on graphs: Performance analysis via statistical physics," Apr. 2015, http://arxiv.org/abs/1504.06924.

[13] D. Bajović, J. M. F. Moura, and D. Vukobratović, "Detecting random walks on graphs with heterogeneous sensors," July 2017, http://arxiv.org/abs/1707.06900.

[14] J. N. Tsitsiklis and V. D. Blondel, "The Lyapunov exponent and joint spectral radius of pairs of matrices are hard—when not impossible—to compute and to approximate," Mathematics of Control, Signals and Systems, vol. 10, no. 1, pp. 31–40, Mar. 1997. [Online]. Available: http://dx.doi.org/10.1007/BF01219774

[15] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. Boston, MA: Jones and Bartlett, 1993.

[16] Y. Sung, L. Tong, and H. V. Poor, "Neyman-Pearson detection of Gauss-Markov signals in noise: closed-form error exponent and properties," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1354–1365, Apr. 2006.

[17] P.-N. Chen, "General formulas for the Neyman-Pearson type-II error exponent subject to fixed and exponential type-I error bounds," IEEE Transactions on Information Theory, vol. 42, no. 1, pp. 316–323, Jan. 1996.

[18] I. Flores, "Direct calculation of k-generalized Fibonacci numbers," Fibonacci Quarterly, vol. 5, no. 3, pp. 259–266, 1967.


[19] H. Furstenberg and H. Kesten, "Products of random matrices," Annals of Mathematical Statistics, vol. 31, no. 2, pp. 457–469, June 1960. [Online]. Available: http://dx.doi.org/10.1214/aoms/1177705909

[20] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, United Kingdom: Cambridge University Press, 2004.

[21] O. Zeitouni, "Gaussian Fields," Lecture notes, Courant Institute, New York, Mar. 2016.

[22] C. Shalizi, "Stochastic Processes," Lecture notes, Carnegie Mellon University, 2007.

[23] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, United Kingdom: Cambridge University Press, 1990.

[24] A. F. Karr, Probability, Springer Texts in Statistics. New York: Springer-Verlag, 1993.
