Automatica 43 (2007) 1418–1425

www.elsevier.com/locate/automatica

Brief paper

Parameter identification for nonlinear systems: Guaranteed confidence regions through LSCR ☆

Marco Dalai^a, Erik Weyer^b, Marco C. Campi^{a,∗}

^a Department of Electrical Engineering and Automation, University of Brescia, Via Branze 38, 25123 Brescia, Italy

^b Department of Electrical and Electronic Engineering, The University of Melbourne, Parkville VIC 3010, Australia

Received 6 June 2006; received in revised form 27 October 2006; accepted 19 January 2007

Available online 19 June 2007

Abstract

In this paper we consider the problem of constructing confidence regions for the parameters of nonlinear dynamical systems. The proposed

method uses higher order statistics and extends the LSCR (leave-out sign-dominant correlation regions) algorithm for linear systems introduced

in Campi and Weyer [2005, Guaranteed non-asymptotic confidence regions in system identification. Automatica 41(10), 1751–1764. Extended

version available at http://www.ing.unibs.it/~campi]. The confidence regions contain the true parameter value with a guaranteed probability

for any finite number of data points. Moreover, the confidence regions shrink around the true parameter value as the number of data points

increases. The usefulness of the proposed approach is illustrated on some simple examples.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Confidence sets; Finite sample results; Nonlinear system identification

1. Introduction

It is well known that a model of a dynamical system is of

limited use if no quality tag which describes the accuracy of

the model is attached. Confidence regions for the system pa-

rameters are commonly used as quality tags, and asymptotic

theory is widely used for the construction of such regions. However,

in practice one always has a finite number of samples,

and—even though the asymptotic theory delivers sensible re-

sults in many cases—there are also examples (Garatti, Campi,

& Bittanti, 2004) where it fails when applied to a finite num-

ber of data points. Thus, there is a need for techniques which

deliver confidence regions with guaranteed probabilities when

only a finite number of data points are available.

☆ This paper was not presented at any IFAC meeting. This paper was

recommended for publication in revised form by Associate Editor Antonio

Vicino under the direction of Editor Torsten Söderström.

∗Corresponding author. Tel.: +390303715458; fax: +39030380014.

E-mail addresses: marco.dalai@ing.unibs.it (M. Dalai),

e.weyer@ee.unimelb.edu.au (E. Weyer), marco.campi@ing.unibs.it

(M.C. Campi).

0005-1098/$-see front matter © 2007 Elsevier Ltd. All rights reserved.

doi:10.1016/j.automatica.2007.01.016

In Campi and Weyer (2005) a method called LSCR (leave-out

sign-dominant correlation regions) was proposed for finding

confidence regions to which the parameters of a linear system

belong with guaranteed probability. See also Campi and Weyer

(2006) for a comprehensive presentation of LSCR. LSCR ex-

tends earlier work by Hartigan (1969, 1970) to a dynamical

system setting, and it has two important features: first, the prob-

ability that the confidence region contains the true parameters

is guaranteed for any finite amount of data samples; second, the

confidence region concentrates around the true parameter value

when the number of samples increases. In Campi and Weyer

(2005), second order statistics were explored for the construc-

tion of the confidence regions. In the present paper, we consider

nonlinear systems. It is well known (see for example, Ljung,

2001 for a general discussion, or Subba Rao, 1981 for the par-

ticular case of bilinear systems) that second order statistics are

insufficient for the identification of nonlinear systems. Here we

show that it is possible to extend the framework of LSCR to

higher order statistics, and hence to consider the problem of

nonlinear system identification within this setting.

The focus of this paper is on time series, that is the system

to be identified has no exogenous inputs which are measured.

The outline of the paper is as follows. In the next section, we

motivate the use of higher order statistics for nonlinear systems.

Section 3 contains the procedure for the construction of the

confidence region, and the properties of this procedure are also

studied. In Section 4 a simulation example using a bilinear

system is presented before conclusions are given in Section 5.

2. A simple nonlinear example: from second to higher

order statistics

This section illustrates the problems encountered when the

standard LSCR procedure of Campi and Weyer (2005) using

second order statistics is applied to a nonlinear system.

Consider the system

$$y_t = \theta_0 (y_{t-1}^2 - 1) + w_t, \tag{1}$$

where $\theta_0$ is the parameter value to be identified and $w_t$ is an independent sequence of Gaussian variables with zero mean and unit variance. We use the standard LSCR algorithm for construction of a confidence region for $\theta_0$. To this end, we first rewrite the system with a generic parameter $\theta$, $y_t = \theta(y_{t-1}^2 - 1) + w_t$, and then compute the associated optimal predictor, $\hat{y}_t(\theta) = \theta(y_{t-1}^2 - 1)$, and the prediction error, $\epsilon_t(\theta) = y_t - \hat{y}_t(\theta)$. LSCR constructs a confidence region based on an empirical evaluation of the correlations $E[\epsilon_t(\theta)\epsilon_{t+r}(\theta)]$, $r \geq 1$. In Campi and Weyer (2005) it is shown that $\theta_0$ is the only value of $\theta$ for which these correlations are zero in the case of linear ARMA systems and, consequently, the obtained confidence region shrinks around the true parameter value $\theta = \theta_0$ as the number of data points grows. Here we show that $E[\epsilon_t(\theta)\epsilon_{t+r}(\theta)] = 0$ does not imply $\theta = \theta_0$ for the system in (1), i.e. second order statistics do not suffice.

Suppose that the true parameter value is $\theta_0 = 0$. Then $y_t = w_t$, and we have

$$\epsilon_t(\theta) = y_t - \hat{y}_t(\theta) = w_t - \theta(w_{t-1}^2 - 1).$$

Thus,

$$E[\epsilon_t(\theta)\epsilon_{t+r}(\theta)] = E[(w_t - \theta(w_{t-1}^2 - 1))(w_{t+r} - \theta(w_{t+r-1}^2 - 1))]. \tag{2}$$

For $r \geq 2$, $E[\epsilon_t(\theta)\epsilon_{t+r}(\theta)] = 0$ for any value of $\theta$ since $w_t$ and $(w_{t-1}^2 - 1)$ are zero mean random variables, and the products in (2) only contain terms with different time indices. For $r = 1$ we have $E[\epsilon_t(\theta)\epsilon_{t+1}(\theta)] = -\theta E[w_t(w_t^2 - 1)] = -\theta(E[w_t^3] - E[w_t]) = 0$. So, $E[\epsilon_t(\theta)\epsilon_{t+r}(\theta)] = 0$ for any $r \geq 1$ and any value of $\theta$. This implies that it is not possible to establish the true value of $\theta$ from the conditions $E[\epsilon_t(\theta)\epsilon_{t+r}(\theta)] = 0$. In turn, following the analysis in Campi and Weyer (2005), we see that the confidence region obtained by using the standard LSCR algorithm does not shrink around $\theta_0$ when the number of samples increases.

We complete this example by showing that the true value $\theta_0$ can indeed be determined by using higher order statistics. Take for example the condition $E[\epsilon_t^2(\theta)\epsilon_{t+1}(\theta)] = 0$. Expanding the product and using independence, the only contribution with nonzero mean is $-\theta E[w_t^2(w_t^2 - 1)]$, so that

$$E[\epsilon_t^2(\theta)\epsilon_{t+1}(\theta)] = \theta(E[w_t^2] - E[w_t^4]) = \theta(1 - 3) = -2\theta. \tag{3}$$

Thus, $E[\epsilon_t^2(\theta_0)\epsilon_{t+1}(\theta_0)] = 0$ since $\theta_0 = 0$, while $E[\epsilon_t^2(\theta)\epsilon_{t+1}(\theta)] \neq 0$ for any $\theta \neq \theta_0$.

So, in order to construct confidence regions that shrink around $\theta_0$, higher order statistics must be utilized. In the next section we generalize the LSCR method to this case.
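The two moment calculations of this section can be checked numerically. The following is a minimal Monte Carlo sketch, assuming $\theta_0 = 0$ and standard Gaussian noise as in the example; the function name `residual_moments`, the sample size, and the seed are illustrative choices that do not come from the paper.

```python
import numpy as np

def residual_moments(theta, n=200_000, seed=0):
    """Empirical E[eps_t eps_{t+1}] and E[eps_t^2 eps_{t+1}] when theta_0 = 0."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n)                   # theta_0 = 0, hence y_t = w_t
    eps = y[1:] - theta * (y[:-1] ** 2 - 1.0)    # prediction error eps_t(theta)
    second = np.mean(eps[:-1] * eps[1:])         # second order statistic, lag 1
    third = np.mean(eps[:-1] ** 2 * eps[1:])     # third order statistic
    return second, third

second, third = residual_moments(0.5)
print(second, third)   # second should be close to 0, third close to -2 * 0.5
```

The second order estimate stays near zero for every trial value of $\theta$, whereas the third order estimate tracks $-2\theta$, which is the behavior that Eqs. (2) and (3) predict.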

3. Extension of LSCR to higher order statistics

Consider a nonlinear system $S_0$ which maps a non-measured noise process $w_t$ into a measured signal $y_t$. Furthermore, assume that $S_0$ belongs to a parameterized system class $\{S_\theta\}$, that is $S_0 = S_{\theta_0}$ for some $\theta_0$. The noise $w_t$ is an independent sequence of random variables, whose distribution is symmetric around zero. Apart from this, we make no other assumptions on $w_t$. The distribution of $w_t$ can as well be time-varying. We aim at finding a confidence region for the parameter vector $\theta_0$ by observing the output $y_t$.

The LSCR method in Campi and Weyer (2005) constructs, for every value of $\theta$, a sequence $w_t(\theta)$ such that for the true parameter $\theta_0$ we have that $w_t(\theta_0) = w_t$. Then, roughly speaking, the confidence region for $\theta_0$ is obtained by choosing the values of $\theta$ for which $w_t(\theta)$ resembles an independent process. For linear systems, one can take $w_t(\theta) = \epsilon_t(\theta)$, the prediction error, since $\epsilon_t(\theta_0) = w_t$, see Campi and Weyer (2005).

The case of nonlinear systems requires some extra care because $\epsilon_t(\theta_0) \neq w_t$ and $\epsilon_t(\theta_0)$ is not even an independent process in general. To see this, consider, e.g. the system class $y_t = \theta y_{t-1} + y_{t-1} w_t$. The optimal predictor is $\hat{y}_t(\theta) = \theta y_{t-1}$; but $y_t - \hat{y}_t(\theta_0) = y_{t-1} w_t$ is not an independent sequence!

In order to obtain a sequence $w_t(\theta)$ such that $w_t(\theta_0) = w_t$, we can proceed in a different way by resorting to system inversion instead of constructing the prediction error, see Fig. 1. For linear systems these two approaches coincide since constructing the prediction error is the same as inverting the system. In the example above we let $w_t(\theta) = (y_t - \theta y_{t-1})/y_{t-1}$, so that $w_t(\theta_0) = w_t$ as long as $y_{t-1} \neq 0$. System inversion is used as a basic building block in the algorithm presented below.
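The difference between the prediction error and the inverted system can be seen on the system class $y_t = \theta y_{t-1} + y_{t-1} w_t$ used above. The sketch below is illustrative only: the function names `simulate` and `invert`, the initial value $y_0 = 1$, the value $\theta_0 = 0.5$ and the short horizon of 50 samples are assumptions made here, not choices from the paper.

```python
import numpy as np

def simulate(theta0, w, y0=1.0):
    """Simulate y_t = theta0*y_{t-1} + y_{t-1}*w_t starting from y0 (assumed nonzero)."""
    y = np.empty(len(w) + 1)
    y[0] = y0
    for t in range(1, len(y)):
        y[t] = theta0 * y[t - 1] + y[t - 1] * w[t - 1]
    return y

def invert(theta, y):
    """System inversion: w_t(theta) = (y_t - theta*y_{t-1}) / y_{t-1}, valid when y_{t-1} != 0."""
    return (y[1:] - theta * y[:-1]) / y[:-1]

rng = np.random.default_rng(1)
w = rng.standard_normal(50)
y = simulate(0.5, w)
w_hat = invert(0.5, y)              # inversion at the true parameter
pred_err = y[1:] - 0.5 * y[:-1]     # prediction error at the true parameter
print(np.allclose(w_hat, w), np.allclose(pred_err, y[:-1] * w))
```

At the true parameter the inverse returns the noise itself, while the prediction error equals $y_{t-1} w_t$ and therefore carries the dependence on past outputs.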

Before proceeding we formally introduce our working as-

sumptions.

Assumptions.

(i) The observed data $y_t$ are obtained as output of a causal system $S_0$ whose input is an independent noise sequence $w_t$ symmetrically distributed around zero, i.e. $y_t = S_0(w_\tau, \tau \leq t)$.

(ii) The system $S_0$ belongs to a system model class $\{S_\theta\}$, i.e. there exists a value $\theta_0$ of the parameter such that $S_{\theta_0} = S_0$.

(iii) The systems in $\{S_\theta\}$ are invertible with a causal inverse, i.e. for every $\theta$ there exists an inverse system $S_\theta^{-1}$ such that $S_\theta^{-1}(y_\tau(\theta), \tau \leq t) = w_t$, where $y_t(\theta) = S_\theta(w_\tau, \tau \leq t)$.

Fig. 1. Scheme for the extraction of $w_t(\theta)$ (block diagram: $w_t \rightarrow S_0 \rightarrow y_t \rightarrow S_\theta^{-1} \rightarrow w_t(\theta)$).

The assumptions state that the model class consists of causal

systems which are also causally invertible and that the true data

generating system belongs to the model class.

3.1. Construction of the confidence region

We next describe the algorithm for the construction of the

confidence region.

Algorithm.

(A.1). Compute $w_t(\theta) = S_\theta^{-1}(y_\tau, \tau \leq t)$ for $t = 1, 2, \ldots, K$.

(A.2). Choose an integer $s \geq 0$ and let $e = (e_0, e_1, \ldots, e_s)$ be a vector of nonnegative integers such that at least one of the $e_j$, $0 \leq j \leq s$, is odd (the way $e$ should be chosen is discussed later). For every $t = 1, 2, \ldots, K - s = N$, compute

$$f_{t,e}(\theta) = \prod_{j=0}^{s} w_{t+j}(\theta)^{e_j}.$$

(A.3). Let $I^N = \{1, \ldots, N\}$ and consider a collection $G^N$ of different subsets $I_i^N \subseteq I^N$, $i = 1, \ldots, M$, forming a group under the symmetric difference operation (i.e. $(I_i^N \cup I_j^N) - (I_i^N \cap I_j^N) \in G^N$ if $I_i^N, I_j^N \in G^N$). Suppose, without loss of generality, that $I_M^N$ is the zero element of the group $G^N$: $I_M^N = \emptyset$, the empty set. Compute

$$g_{i,e}^N(\theta) = \frac{1}{\# I_i^N} \sum_{k \in I_i^N} f_{k,e}(\theta), \quad i = 1, \ldots, M - 1$$

(# stands for “number of elements in the set”).

(A.4). Select an integer $q$ in the interval $[1, (M+1)/2)$ and find the confidence region $\Theta_e^N$ where at least $q$ of the $g_{i,e}^N(\theta)$ functions are bigger than zero and at least $q$ are smaller than zero.
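Steps (A.2) to (A.4) can be sketched in a few lines once the sequence $w_t(\theta)$ from (A.1) and the incidence matrix of the group are available. The code below is a minimal illustration under stated assumptions: the names `f_stat` and `in_region` are hypothetical, the incidence matrix is passed in without its empty-set row, and the tiny group with $M = 4$ and $N = 3$ is a toy choice made here, far smaller than anything used in practice.

```python
import numpy as np

def f_stat(w_theta, e):
    """Step (A.2): f_{t,e} = prod_j w_{t+j}^{e_j} for t = 1..N with N = K - s."""
    w_theta = np.asarray(w_theta, dtype=float)
    if not any(ej % 2 == 1 for ej in e):
        raise ValueError("at least one exponent e_j must be odd")
    s = len(e) - 1
    n = len(w_theta) - s
    f = np.ones(n)
    for j, ej in enumerate(e):
        f *= w_theta[j:j + n] ** ej
    return f

def in_region(w_theta, e, incidence, q):
    """Steps (A.3)-(A.4): keep theta if at least q of the g_i are > 0 and at least q are < 0.

    incidence is an (M-1) x N 0/1 array; row i marks the indices in the
    nonempty subset I_i (the empty set I_M is left out).
    """
    incidence = np.asarray(incidence)
    f = f_stat(w_theta, e)
    g = incidence @ f / incidence.sum(axis=1)
    return bool((g > 0).sum() >= q and (g < 0).sum() >= q)

# Toy group on {1, 2, 3}: {1,2}, {1,3}, {2,3} plus the empty set (M = 4).
toy_incidence = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
w_theta = [1.0, -2.0, 3.0, 1.0]      # hypothetical values of w_t(theta), K = 4
print(f_stat(w_theta, (1, 1)))                       # [-2. -6.  3.]
print(in_region(w_theta, (1, 1), toy_incidence, 1))  # True
```

In an actual application `in_region` would be evaluated on a grid of $\theta$ values, recomputing $w_t(\theta)$ by system inversion for each of them.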

The intuitive idea behind the algorithm is as follows. For the true parameter vector $\theta_0$, $w_t(\theta_0) = w_t$ is an independent sequence symmetrically distributed around zero. Since at least one $e_j$ is odd, $f_{t,e}(\theta_0)$ is a zero mean random variable. Moreover, when $\theta = \theta_0$, the functions $g_{i,e}^N(\theta)$, $i = 1, \ldots, M - 1$, are sums of zero mean random variables. It is therefore unlikely that nearly all of them are positive or that nearly all of them are negative. Based on this observation we exclude the regions in parameter space where the $g_{i,e}^N(\theta)$ functions take on positive or negative values too many times.

Note that the construction of $\Theta_e^N$ does not require any knowledge of the characteristics of the noise $w_t$. The Algorithm lets the data speak for themselves and constructs the region $\Theta_e^N$ correspondingly: $\Theta_e^N$ does depend on the noise level, but this is through data only, not through a priori assumptions.

The next theorem says that the Algorithm always produces a region that contains $\theta_0$ with a probability chosen by the user.

Theorem 1. The region $\Theta_e^N$ constructed above has the property that

$$P[\theta_0 \in \Theta_e^N] = 1 - 2q/M.$$

Fig. 2. Some of the $g_{i,e}^N(\theta)$ functions obtained with $e = (2,1)$, $N = 1000$ (horizontal axis: $\theta$; vertical axis: $g_{i,e}(\theta)$).

Proof. See Appendix A.1. □

Thus, the user controls the probability that $\theta_0 \in \Theta_e^N$ via the choice of $q$.

In general, in order to determine a confidence region of suitable shape we may want to intersect several regions $\Theta_e^N$ obtained with different $e$ vectors. If we have $h$ vectors $e_1, e_2, \ldots, e_h$, then the confidence region is given by

$$\Theta^N = \bigcap_{l=1}^{h} \Theta_{e_l}^N. \tag{4}$$

Theorem 2. The region $\Theta^N$ constructed above has the property that

$$P[\theta_0 \in \Theta^N] \geq 1 - 2hq/M. \tag{5}$$

Proof. The proof follows from Theorem 1. The inequality in (5) is due to possible overlaps between the events $\theta_0 \notin \Theta_{e_l}^N$, $l = 1, \ldots, h$. □
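As a quick numerical illustration of Theorems 1 and 2, take $M = 256$ and $q = 12$, the values used in the examples of this paper; the case $h = 2$ is a hypothetical choice added here only to show how the bound in (5) degrades when two regions are intersected.

```latex
\[
1 - \frac{2q}{M} = 1 - \frac{24}{256} = 0.90625,
\qquad
1 - \frac{2hq}{M}\bigg|_{h=2} = 1 - \frac{48}{256} = 0.8125 .
\]
```

So a single statistic gives a region with probability slightly above 90%, while keeping the same guarantee for an intersection of two regions would require a smaller $q$ or a larger $M$.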

To make the procedure more concrete, we next apply it to

the example in Section 2.

Example 3. Suppose we want to find a 90% confidence region for $\theta_0$. Since $y_t = \theta(y_{t-1}^2 - 1) + w_t$, let $w_t(\theta) = y_t - \theta(y_{t-1}^2 - 1)$. Note that, in this example, $w_t(\theta) = \epsilon_t(\theta)$, the prediction error. In Section 2, we established that $E[\epsilon_t(\theta)^2 \epsilon_{t+1}(\theta)] = 0$ only for $\theta = \theta_0$. Motivated by this observation we take $e = (2,1)$.

We simulated the system with $N = 1000$ and constructed the group $G^N$ as explained in Appendix A.3 with $M = 256$. We discarded the parameter values where fewer than $q = 12$ functions out of the $M = 256$ functions were positive or fewer than 12 functions were negative. Fig. 2 shows some of the obtained $g_{i,e}^N(\theta)$ functions. The confidence interval for $\theta_0$ turned out to be $[-0.05, 0.03]$.

Fig. 3. Ninety percent confidence regions with $e = (2,1)$ for increasing $N$ (horizontal axis: $N$; vertical axis: $\Theta^N$).

Fig. 4. Some of the $g_{i,e}^N(\theta)$ functions obtained with $e = (1,1)$, $N = 1000$ (horizontal axis: $\theta$; vertical axis: $g_{i,e}(\theta)$).

The $g_{i,e}^N(\theta)$ functions are estimates of the third order statistic $E[w_t^2(\theta) w_{t+1}(\theta)]$. From Eq. (3), $E[w_t^2(\theta) w_{t+1}(\theta)] = -2\theta$, so the $g_{i,e}^N(\theta)$ functions cut the $\theta$ axis near $\theta = 0$ and the confidence region is a neighborhood of $0 = \theta_0$.

As $N$ increases one expects that the $g_{i,e}^N(\theta)$ functions become better and better approximations of the function $-2\theta$. Correspondingly, the confidence interval is expected to shrink around $\theta = 0$. In Fig. 3 the confidence intervals obtained for increasing values of $N$ are shown, and the trend is that the length of the intervals decreases as $N$ increases. Section 3.2 provides a general study of the convergence properties of the algorithm.

As a comparison, Fig. 4 shows some of the $g_{i,e}^N(\theta)$ functions obtained using the second order statistic $E[w_t(\theta) w_{t+1}(\theta)]$, i.e. by choosing $e = (1,1)$. As we noticed in Section 2, $E[w_t(\theta) w_{t+1}(\theta)] = 0$ for all values of $\theta$; hence the $g_{i,e}^N(\theta)$ functions are all flat along the $\theta$ axis, and the confidence interval does not shrink around $\theta = 0$. □

Theorems 1 and 2 are very general and apply to any statistic

as described in point A.2 of the algorithm. Consequently, the

probability of the obtained region is always guaranteed. On the

other hand, the effect of the used statistics shows up in the

shape of the obtained region. Determining suitable statistics is

a problem for which no general guidelines can be given, and

the user should choose the statistics based on an analysis of the

system class at hand. See also Section 4 for an example.

3.2. Asymptotic behavior

When $N \to \infty$, we would like the confidence region $\Theta^N$ given by (4) to shrink around the true value $\theta_0$. In this section, we discuss general conditions for this to happen.

We need the following additional assumptions (while some of these assumptions can be relaxed, we have preferred to maintain them to avoid very technical mathematical derivations).

Assumptions.

(iv) The input noise $w_t$ is independent and identically distributed (i.i.d.).

(v) For every $\theta$ the considered statistics are in $L^1$, i.e. $E[|f_{t,e_l}(\theta)|] < \infty$, $l = 1, 2, \ldots, h$, and $\theta_0$ is the only solution to the set of conditions $E[f_{t,e_l}(\theta)] = 0$, $l = 1, 2, \ldots, h$.

(vi) The groups $G^N$ are constructed as explained in Appendix A.3, the value of $M$ is fixed and the value of $N$ is increasing.

Theorem 4. Under the hypotheses above, for every fixed $\theta \neq \theta_0$,

$$P[\exists \bar{N} \mid \theta \notin \Theta^N, \forall N > \bar{N}] = 1.$$

Proof. See Appendix A.2.

In other words, Theorem 4 says that any $\theta \neq \theta_0$ is eliminated from $\Theta^N$ starting at some $\bar{N}$ with probability 1.

Remark 5. The Algorithm in Section 3.1 can be generalized so that the assumption of Theorem 4 that $E[|f_{t,e_l}(\theta)|] < \infty$, $l = 1, 2, \ldots, h$, is certainly satisfied. In points A.2 and A.3 of the Algorithm, the $f_{t,e}(\theta)$ functions can be replaced by more general expressions. By inspection of the proof of Theorem 1, we see that the only property of $f_{t,e}(\theta)$ used is that $f_{t,e}(\theta)$ is a function of $w_t(\theta), w_{t+1}(\theta), \ldots, w_{t+s}(\theta)$ which is even or odd in all arguments and odd in at least one argument. For example, suppose $s = 2$ and $e = (2,1,2)$, then $f_{t,e}(\theta) = w_t(\theta)^2 w_{t+1}(\theta) w_{t+2}(\theta)^2$. This function is even in $w_t(\theta)$ and $w_{t+2}(\theta)$ and odd in $w_{t+1}(\theta)$. However, other functions than monomials exhibit the same odd–even structure. For example, the function $\tanh^2(w_t(\theta)) \tanh(w_{t+1}(\theta)) \tanh^2(w_{t+2}(\theta))$, where $\tanh$ is the hyperbolic tangent, can be used and Theorem 1 still holds. This observation makes it easier to satisfy the first part of Assumption (v) where it is required that $E[|f_{t,e}(\theta)|] < \infty$ since such a condition is automatically satisfied by considering bounded functions such as $\tanh^2(w_t(\theta)) \tanh(w_{t+1}(\theta)) \tanh^2(w_{t+2}(\theta))$.
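The bounded statistic mentioned in Remark 5 is straightforward to compute. The sketch below, with the hypothetical function name `f_tanh`, evaluates it along a sequence of values $w_t(\theta)$ and illustrates the odd–even structure on a single triple of arbitrary numbers chosen here for illustration.

```python
import numpy as np

def f_tanh(w_theta):
    """tanh^2(w_t) * tanh(w_{t+1}) * tanh^2(w_{t+2}) for t = 1..K-2."""
    h = np.tanh(np.asarray(w_theta, dtype=float))
    return h[:-2] ** 2 * h[1:-1] * h[2:] ** 2

a, b, c = 0.7, -1.3, 2.1                 # arbitrary illustrative values
base = f_tanh([a, b, c])[0]
print(base, f_tanh([a, -b, c])[0])       # flipping the middle argument flips the sign
print(base, f_tanh([-a, b, -c])[0])      # flipping the outer arguments changes nothing
```

Because every factor lies in $(-1, 1)$, the statistic is bounded and its absolute moment is finite whatever the distribution of the noise.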

4. Application example: a simple bilinear system

Here we illustrate the proposed approach on a bilinear sys-

tem, see Bruni, Di Pillo, and Koch (1974), Fnaiech and Ljung

(1987), Mohler and Kolodziej (1980), Priestley (1991), and

Subba Rao (1981).

Consider the system

$$y_t = \theta_0 y_{t-2} w_{t-1} + w_t, \tag{6}$$

where $w_t$ is i.i.d. with symmetric distribution around zero and with unit variance. This system has been studied in detail in Terdik and Máth (1998). By iterating (6), it is easy to see that the output $y_t$ can, for any $q \geq 1$, be written as

$$y_t = \sum_{k=0}^{q-1} \theta_0^k w_{t-2k} \prod_{j=1}^{k} w_{t-2j+1} + \theta_0^q y_{t-2q} \prod_{j=1}^{q} w_{t-2j+1}. \tag{7}$$

Note that the product $\prod_{j=1}^{q} w_{t-2j+1}$ has second order moment equal to 1 for any $q$:

$$E\left[\left(\prod_{j=1}^{q} w_{t-2j+1}\right)^2\right] = \prod_{j=1}^{q} E[w_{t-2j+1}^2] = 1.$$

Thus, if $|\theta_0| < 1$, by letting $q \to \infty$ in (7) we can take

$$y_t = \sum_{k=0}^{\infty} \theta_0^k w_{t-2k} \prod_{j=1}^{k} w_{t-2j+1} \tag{8}$$

as a candidate stationary solution. A calculation omitted here shows that the series on the right-hand side of (8) is indeed convergent in the $L^2$-sense as well as almost surely, the limit is stationary and it satisfies the system Eq. (6). We will refer to this stationary solution in what follows.

A simulation with $\theta_0 = 0.2$ and $w_t$ normally distributed with zero mean and unit variance was carried out. A confidence region was then constructed as explained next.

Following the procedure in the previous section, $w_t(\theta)$ was obtained by applying the inverse system $S_\theta^{-1}$ ($|\theta| < 1$) to the output $y_t$, which can be done by solving the recursive relation

$$w_t(\theta) = y_t - \theta y_{t-2} w_{t-1}(\theta).$$
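The recursive inversion can be sketched as follows. This is an illustration under assumptions that are not in the paper: the function names are hypothetical, and both the simulation and the inversion are started from zero pre-sample values (so that $y_t = w_t$ for the first two samples) rather than from the stationary solution (8).

```python
import numpy as np

def simulate_bilinear(theta0, w):
    """Simulate y_t = theta0*y_{t-2}*w_{t-1} + w_t with zero pre-sample values."""
    y = np.array(w, dtype=float)          # first two samples: y_t = w_t
    for t in range(2, len(w)):
        y[t] = theta0 * y[t - 2] * w[t - 1] + w[t]
    return y

def invert_bilinear(theta, y):
    """Solve w_t(theta) = y_t - theta*y_{t-2}*w_{t-1}(theta) forward in time."""
    w_hat = np.array(y, dtype=float)      # first two samples: w_t(theta) = y_t
    for t in range(2, len(y)):
        w_hat[t] = y[t] - theta * y[t - 2] * w_hat[t - 1]
    return w_hat

rng = np.random.default_rng(2)
w = rng.standard_normal(500)
y = simulate_bilinear(0.2, w)
print(np.allclose(invert_bilinear(0.2, y), w))   # inversion at theta_0 recovers the noise
print(np.allclose(invert_bilinear(0.0, y), y))   # at theta = 0 the inverse returns y itself
```

With matching initial conditions the inverse at the true parameter reproduces the noise sequence, and for $\theta = 0$ it reduces to the identity.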

Note that for $\theta = 0$ we have $w_t(0) = y_t$. This has important consequences: after some cumbersome calculations it is possible to show that $y_t$ satisfies $E[y_t y_{t+r}] = 0$ for every $r > 0$ and $E[y_t y_{t+r} y_{t+l}] = 0$ for every $l \geq r \geq 0$ except for $(r,l) = (1,2)$. Since $w_t(0) = y_t$, this implies that, independently of the true parameter $\theta_0$, the value $\theta = 0$ is a solution of the equations $E[w_t(\theta) w_{t+r}(\theta)] = 0$ for every $r > 0$ and $E[w_t(\theta) w_{t+r}(\theta) w_{t+l}(\theta)] = 0$ for every $l \geq r \geq 0$ with $(r,l) \neq (1,2)$. So it is clear that the only possible statistic (up to third order) is $E[w_t(\theta) w_{t+1}(\theta) w_{t+2}(\theta)]$. Indeed, this choice turns out to be an effective one since it can be shown that the only solution to $E[w_t(\theta) w_{t+1}(\theta) w_{t+2}(\theta)] = 0$ is the true parameter $\theta = \theta_0$.

Fig. 5. Some of the $g_{i,e}^N(\theta)$ functions obtained with $e = (1,1,1)$, $N = 1000$ (horizontal axis: $\theta$; vertical axis: $g_{i,e}(\theta)$).

Fig. 6. Ninety percent confidence regions with $e = (1,1,1)$ for increasing $N$ (horizontal axis: $N$; vertical axis: $\Theta^N$).

Following the above reasoning, we selected $e = (1,1,1)$ in point A.2 in Section 3.1. The group $G^N$ was constructed as in Appendix A.3 with $M = 256$, and the functions $g_{i,e}^N(\theta)$, for $i = 1, 2, \ldots, M - 1$, are given by

$$g_{i,e}^N(\theta) = \frac{1}{\# I_i^N} \sum_{k \in I_i^N} w_k(\theta) w_{k+1}(\theta) w_{k+2}(\theta).$$

Some of the $g_{i,e}^N(\theta)$ functions obtained with $N = 1000$ are shown in Fig. 5. The corresponding 90% confidence region for $\theta_0$ turned out to be $[0.11, 0.21]$. In Fig. 6 the confidence regions for different values of $N$ are plotted.

5. Conclusion

In this paper we have derived a method for construction of

confidence regions for the parameters of nonlinear systems.

The obtained confidence regions have guaranteed probability

to contain the true parameter value for any finite number of

data points. Moreover, the confidence regions shrink around

the true $\theta_0$ under natural assumptions on the data generating

system and the model class provided the higher order statistics

are suitably chosen.

Acknowledgments

Research partly supported by MIUR under the project “New

Methods for Identification and Adaptive Control for Industrial

Systems”, by EC contract IST-2000-28304 “SPATION” and by

the Australian Research Council under the Discovery Grant DP

0558579.

Appendix A. Proofs

A.1. Proof of Theorem 1

The proof is similar to the proof of Theorem 2.1 in Campi

and Weyer (2005), the only difference being that Proposition

A.1 in appendix A of Campi and Weyer (2005) is replaced by

Proposition 6 below. Throughout, we omit to indicate explicitly the dependence on $N$ and we, e.g. write $G$ for $G^N$ and $I_i$ for $I_i^N$.

Proposition 6. Let $w_t$ be a sequence of independent random variables with symmetric distribution around zero. Let $I = \{1, \ldots, N\}$, and let $G$ be a collection of subsets $I_i \subseteq I$, $i = 1, \ldots, M$, forming a group under the symmetric difference operation (i.e. $I_i \triangle I_j := (I_i \cup I_j) - (I_i \cap I_j) \in G$ if $I_i, I_j \in G$). Choose an integer $s \geq 0$ and let $e = (e_0, e_1, \ldots, e_s)$ be a vector of nonnegative integers such that at least one $e_j$ is odd. For every $t \in I$, let $W_t = \prod_{j=0}^{s} w_{t+j}^{e_j}$. Pick any $\bar{I} \in G$; then, the set of variables

$$\left\{ \sum_{k \in I_i} W_k, \; i = 1, \ldots, M \right\} \tag{A.1}$$

has the same joint $M$-dimensional distribution as the set of variables

$$\left\{ \sum_{k \in I_i} W_k - \sum_{k \in \bar{I}} W_k, \; i = 1, \ldots, M \right\}, \tag{A.2}$$

provided that the order of the variables is suitably rearranged.

Proof. The idea of the proof is to introduce new variables $\tilde{w}_t = -w_t$ for some of the $w_t$ and to rewrite these $w_t$ as $-\tilde{w}_t$ in (A.2) in such a way that the set (A.2) is written as (A.1) with some of the $w_t$ replaced with $\tilde{w}_t$. As $w_t$ is symmetrically distributed around 0, $w_t$ and $\tilde{w}_t$ will have the same distribution and (A.2) and (A.1) will have the same joint $M$-dimensional distribution.

Consider the whole set of elements

$$W_1, W_2, W_3, \ldots, W_N. \tag{A.3}$$

We scan these elements from left to right and we rewrite some of them in the new notation. Starting from $W_1$, we do not change anything until we find an element, say $W_{\bar{k}}$, in the set $\{W_k, k \in \bar{I}\}$. Recall that

$$W_{\bar{k}} = w_{\bar{k}}^{e_0} w_{\bar{k}+1}^{e_1} \cdots w_{\bar{k}+s}^{e_s}.$$

Let $p$ be the maximum integer such that $e_p$ is odd and define $\tilde{w}_{\bar{k}+p} = -w_{\bar{k}+p}$. Then rewrite $W_{\bar{k}}$ as

$$W_{\bar{k}} = -w_{\bar{k}}^{e_0} w_{\bar{k}+1}^{e_1} \cdots \tilde{w}_{\bar{k}+p}^{e_p} \cdots w_{\bar{k}+s}^{e_s}.$$

We next substitute the old variable $w_{\bar{k}+p}$ with the new one $-\tilde{w}_{\bar{k}+p}$ in all other elements $W_k$ of the sequence (A.3) where the variable $w_{\bar{k}+p}$ shows up. The important thing to note is that the substitution of $w_{\bar{k}+p}$ with $-\tilde{w}_{\bar{k}+p}$ does not introduce any “minus” sign in front of the elements $W_k$ with $k < \bar{k}$. In fact, if $w_{\bar{k}+p}$ is contained in an element $W_{k'}$ with $k' < \bar{k}$ then, by construction, this $w_{\bar{k}+p}$ is raised to an even exponent. Thus, with this substitution only the signs of the $W_k$ for $k > \bar{k}$ can be affected. We continue with our procedure and check the sign of $W_{\bar{k}+1}$, $W_{\bar{k}+2}$ and so on. If the generic element $W_k$ has sign “+” and $k \in \bar{I}$, or if $W_k$ has sign “−” and $k \notin \bar{I}$, we substitute the variable $w_{k+p}$ with $-\tilde{w}_{k+p}$, stopping the procedure when all the $W_k$ have been scanned. (See Example 7 at the end of the proof for an example of this procedure.)

Set $v_k = w_k$ if $w_k$ has not been substituted and $v_k = \tilde{w}_k$ if $w_k$ has been substituted. Define the new elements $V_k = \prod_{j=0}^{s} v_{k+j}^{e_j}$. If $k \in \bar{I}$ we have $W_k = -V_k$, while if $k \notin \bar{I}$, $W_k = V_k$. Now, the $i$th element of (A.2) is given by

$$\sum_{k \in I_i - \bar{I}} W_k - \sum_{k \in \bar{I} - I_i} W_k = \sum_{k \in I_i - \bar{I}} V_k + \sum_{k \in \bar{I} - I_i} V_k = \sum_{k \in I_i \triangle \bar{I}} V_k. \tag{A.4}$$

As $G$ is a group under the symmetric difference, the set $\{I_i \triangle \bar{I}, \; i = 1, \ldots, M\}$ coincides with the set $\{I_i, \; i = 1, \ldots, M\}$. This means that (A.2) can be written, by reordering the elements and using (A.4), as

$$\left\{ \sum_{k \in I_i} V_k, \; i = 1, \ldots, M \right\}. \tag{A.5}$$

But, for every $k$, $v_k$ and $w_k$ have the same distribution and, as the $w_k$ are independent, so are the $v_k$. Thus, for every $k$, $W_k$ and $V_k$ have the same distribution and, more generally, the set of variables in (A.5) has the same joint $M$-dimensional distribution as the set of variables in (A.1). □

Example 7. For the sake of clarity we give a simple example illustrating the procedure explained in the proof for the substitutions of the $w_k$ with $-\tilde{w}_k$. Set $I = \{1, \ldots, 7\}$, $s = 3$, $e = (1,0,3,2)$ and $\bar{I} = \{2,4,5\}$. The sequence of elements $W_k$ is

$$w_1 w_3^3 w_4^2, \quad w_2 w_4^3 w_5^2, \quad w_3 w_5^3 w_6^2, \quad w_4 w_6^3 w_7^2, \quad w_5 w_7^3 w_8^2, \quad w_6 w_8^3 w_9^2, \quad w_7 w_9^3 w_{10}^2.$$

We consider these elements from left to right. As $1 \notin \bar{I}$, we skip $W_1$. Then we find that $2 \in \bar{I}$. Here, $p = 2$, so that we substitute $w_4$ with $-\tilde{w}_4$ obtaining

$$w_1 w_3^3 \tilde{w}_4^2, \quad -w_2 \tilde{w}_4^3 w_5^2, \quad w_3 w_5^3 w_6^2, \quad -\tilde{w}_4 w_6^3 w_7^2, \quad w_5 w_7^3 w_8^2, \quad w_6 w_8^3 w_9^2, \quad w_7 w_9^3 w_{10}^2.$$

Note that the substitution of $w_4$ has not changed the sign of $W_1$. Continuing, we skip $W_3$ and $W_4$ because their signs are already correct (there is a “+” in front of $W_3$ and $3 \notin \bar{I}$, and there is a “−” in front of $W_4$ and $4 \in \bar{I}$). We stop again at $W_5$, which is written without a “−” while $5 \in \bar{I}$. Thus, we substitute $w_7$ with $-\tilde{w}_7$ obtaining

$$w_1 w_3^3 \tilde{w}_4^2, \quad -w_2 \tilde{w}_4^3 w_5^2, \quad w_3 w_5^3 w_6^2, \quad -\tilde{w}_4 w_6^3 \tilde{w}_7^2, \quad -w_5 \tilde{w}_7^3 w_8^2, \quad w_6 w_8^3 w_9^2, \quad -\tilde{w}_7 w_9^3 w_{10}^2.$$

Finally, we skip $W_6$ and we stop at $W_7$ because there is a “−”, but $7 \notin \bar{I}$. Thus we change $w_9$ with $-\tilde{w}_9$ obtaining

$$w_1 w_3^3 \tilde{w}_4^2, \quad -w_2 \tilde{w}_4^3 w_5^2, \quad w_3 w_5^3 w_6^2, \quad -\tilde{w}_4 w_6^3 \tilde{w}_7^2, \quad -w_5 \tilde{w}_7^3 w_8^2, \quad w_6 w_8^3 \tilde{w}_9^2, \quad \tilde{w}_7 \tilde{w}_9^3 w_{10}^2,$$

and the procedure is completed.
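The scanning procedure of the proof can be written as a short routine that only tracks which variables have been substituted. This is a sketch with a hypothetical function name, `substituted_indices`; it returns the set of indices $k$ for which $w_k$ is replaced by $-\tilde{w}_k$ and is run here on the data of Example 7.

```python
def substituted_indices(n, e, i_bar):
    """Scan W_1..W_n and return the indices k whose w_k is replaced by -w~_k."""
    odd_positions = [j for j, ej in enumerate(e) if ej % 2 == 1]
    p = max(odd_positions)                    # largest j with e_j odd
    flipped = set()
    for k in range(1, n + 1):
        # W_k picks up one factor -1 for each substituted w_{k+j} with odd e_j.
        n_minus = sum(1 for j in odd_positions if (k + j) in flipped)
        has_minus = n_minus % 2 == 1
        # A "-" sign is wanted exactly when k belongs to I-bar.
        if (k in i_bar) != has_minus:
            flipped.add(k + p)
    return flipped

print(substituted_indices(7, (1, 0, 3, 2), {2, 4, 5}))   # the set {4, 7, 9}
```

The result matches the three substitutions of $w_4$, $w_7$ and $w_9$ carried out by hand in the example.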

A.2. Proof of Theorem 4

We will prove that with probability 1 the functions $g_{i,e_l}^N(\theta)$, $i = 1, \ldots, M - 1$, tend to $E[f_{t,e_l}(\theta)]$ when $N$ goes to infinity. For $\theta \neq \theta_0$ there is an $l$ such that $E[f_{t,e_l}(\theta)] \neq 0$ (see assumption (v)), and for that value of $l$, when $N \to \infty$ all the $g_{i,e_l}^N(\theta)$, $i = 1, \ldots, M - 1$, will have the same sign as $E[f_{t,e_l}(\theta)]$. Consequently, $\theta$ will be discarded from $\Theta^N$ for $N$ large enough, as stated in the theorem.

Take an element $I_i^N$ in the group $G^N$. For a fixed $i$, we consider the elements in $I_i^N$ for increasing $N$. Note first that $I_i^N$ is a set increasing with $N$, i.e. $I_i^{N_1} \subseteq I_i^{N_2}$ if $N_1 \leq N_2$. Let $N = n(M-1)$, for $n = 1, 2, \ldots$, that is we restrict attention to $N$ that are multiples of $(M-1)$ (the case of generic $N$'s easily follows). The set $I_i^{n(M-1)}$ can be decomposed as

$$I_i^{n(M-1)} = \bigcup_{j \in I_i^{M-1}} \{j, j + (M-1), \ldots, j + (n-1)(M-1)\},$$

so focusing on subsets of regularly spaced indices. We now have

$$g_{i,e_l}^{n(M-1)}(\theta) = \frac{1}{\# I_i^{n(M-1)}} \sum_{k \in I_i^{n(M-1)}} f_{k,e_l}(\theta) = \frac{1}{n \cdot \# I_i^{M-1}} \sum_{j \in I_i^{M-1}} \sum_{r=0}^{n-1} f_{j+r(M-1),e_l}(\theta) = \frac{1}{\# I_i^{M-1}} \sum_{j \in I_i^{M-1}} \frac{1}{n} \sum_{r=0}^{n-1} f_{j+r(M-1),e_l}(\theta). \tag{A.6}$$

We want to show that

$$\frac{1}{n} \sum_{r=0}^{n-1} f_{j+r(M-1),e_l}(\theta) \to E[f_{t,e_l}(\theta)] \quad \text{a.s.}, \tag{A.7}$$

for any $j$, so concluding the proof.

$w_t$ is an i.i.d. process and hence it is strict sense stationary and ergodic. Since $w_t(\theta) = S_\theta^{-1}(y_\tau, \tau \leq t)$ and $y_t = S_0(w_\tau, \tau \leq t)$ we have that $w_t(\theta)$ is a function of $w_t, w_{t-1}, \ldots$, etc. and $f_{j+r(M-1),e_l}(\theta)$ is a function of $w_{j+r(M-1)+s}, w_{j+r(M-1)+s-1}, \ldots$, etc. Thus $f_{j+r(M-1),e_l}(\theta)$ inherits from $w_t$ the property of being strict sense stationary and ergodic, so that (A.7) follows from the Birkhoff–Khinchin theorem (see Shiryaev, 1991, Theorem 3 in Section 3, Chapter 5).

The reader may be interested in noting that the splitting of (A.6) in a double summation formula is necessary because, even though $f_{t,e_l}(\theta)$ is stationary, $f_{k,e_l}(\theta)$, $k \in I_i^{n(M-1)}$, is in general not a stationary sequence due to the irregular sampling.

A.3. Group construction

Given a set $I^N = \{1, 2, \ldots, N\}$ and an integer $M = 2^m$, we use the following extension of Gordon's method (Gordon, 1974) for constructing a collection $G^N$ of $M$ subsets $I_i^N$, $i = 1, \ldots, M$, which is a group under the symmetric difference.

(1) Generate an $M \times (M-1)$ matrix $Q_{M-1}$ using Gordon's construction (Gordon, 1974). That is, let $R(1) = [1]$, and recursively compute ($k = 2, 3, \ldots, m$)

$$R(k) = \begin{bmatrix} R(k-1) & R(k-1) & 0 \\ R(k-1) & J - R(k-1) & e \\ 0^T & e^T & 1 \end{bmatrix},$$

where $J$ and $e$ are, respectively, a matrix and a vector of all ones and $0$ is a vector of all zeros. Then let

$$Q_{M-1} = \begin{bmatrix} R(m) \\ 0^T \end{bmatrix}.$$

(2) Construct the matrix

$$Q = [Q_{M-1} \;\; Q_{M-1} \;\; \cdots \;\; Q_{M-1}]$$

by listing enough $Q_{M-1}$ matrices so that $Q$ has at least $N$ columns and then extract the submatrix $Q^N$ of $Q$ containing the first $N$ columns of $Q$. The so obtained $Q^N$ is the incidence matrix of $G^N$, i.e. the matrix with generic element $Q^N(i,j) = 1$ if $j \in I_i^N$ and zero otherwise.
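The two steps of the construction translate directly into array operations. The following is a minimal sketch: the function names `gordon_R` and `incidence_matrix` are hypothetical, and the block layout of $R(k)$ is the one stated in step (1) as read from the text.

```python
import numpy as np

def gordon_R(m):
    """R(m) from the recursion in step (1): a (2^m - 1) x (2^m - 1) 0/1 matrix."""
    R = np.array([[1]], dtype=int)
    for _ in range(2, m + 1):
        n = R.shape[0]
        col0 = np.zeros((n, 1), dtype=int)
        col1 = np.ones((n, 1), dtype=int)
        top = np.hstack([R, R, col0])
        mid = np.hstack([R, 1 - R, col1])          # J - R(k-1) is the 0/1 complement
        bot = np.hstack([col0.T, col1.T, np.ones((1, 1), dtype=int)])
        R = np.vstack([top, mid, bot])
    return R

def incidence_matrix(m, N):
    """Step (2): M x N incidence matrix with M = 2^m; the last row is the empty set."""
    R = gordon_R(m)
    Q_block = np.vstack([R, np.zeros((1, R.shape[1]), dtype=int)])   # Q_{M-1}
    reps = -(-N // Q_block.shape[1])               # ceil(N / (M - 1))
    return np.tile(Q_block, reps)[:, :N]

Q = incidence_matrix(3, 10)                        # M = 8 subsets of {1, ..., 10}
rows = {tuple(int(v) for v in r) for r in Q}
closed = all(tuple(int(v) for v in np.bitwise_xor(a, b)) in rows for a in Q for b in Q)
print(Q.shape, len(rows), closed)
```

The elementwise XOR of two incidence rows is the incidence row of the symmetric difference of the two subsets, so the final check confirms on this small case that the rows form a group of $M$ distinct subsets.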

References

Bruni, C., Di Pillo, G., & Koch, G. (1974). Bilinear systems: An appealing

class of nearly linear systems in theory and applications. IEEE Transactions

on Automatic Control, AC-19, 334–348.

Campi, M. C., & Weyer, E. (2005). Guaranteed non-asymptotic confidence

regions in system identification. Automatica, 41(10), 1751–1764. Extended

version available at http://www.ing.unibs.it/~campi.

Campi, M. C., & Weyer, E. (2006). Identification with finitely many data

points: the LSCR approach, semi-plenary presentation. In Proceedings of

the 14th IFAC Symposium on system identification, (pp. 46–64) SYSID

2006, Newcastle, Australia.

Fnaiech, F., & Ljung, L. (1987). Recursive identification of bilinear systems.

International Journal of Control, 45(2), 453–470.

Garatti, S., Campi, M. C., & Bittanti, S. (2004). Assessing the quality

of identified models through the asymptotic theory—when is the result

reliable?. Automatica, 40(8), 1319–1332.

Gordon, L. (1974). Completely separating groups in subsampling. Annals of

Statistics, 2, 572–578.

Hartigan, J. A. (1969). Using subsample values as typical values. Journal of

American Statistical Association, 64, 1303–1317.

Hartigan, J. A. (1970). Exact confidence intervals in regression problems

with independent symmetric errors. Annals of Mathematical Statistics, 41,

1992–1998.

Ljung, L. (2001). Estimating linear time-invariant models of nonlinear time-

varying systems. European Journal of Control, 7, 203–219.

Mohler, R. R., & Kolodziej, W. J. (1980). An overview of bilinear system

theory and applications. IEEE Transactions Systems Man and Cybernetics,

SMC-10, 683–688.

Priestley, M. (1991). Non-linear and non-stationary time series analysis. New

York: Academic press.

Shiryaev, A. N. (1991). Probability. (2nd ed.), New York: Springer.

Subba Rao, T. (1981). On the theory of bilinear time series models. Journal

of Royal Statistical Society Series B, 43, 244–255.

Terdik, G. Y., & Máth, J. (1998). A new test of linearity of time series based

on the bispectrum. Journal of Time Series Analysis, 19(6), 737–753.

Marco Dalai was born in 1979 in Manerbio,

Italy. He obtained his Dr. Eng. degree in Elec-

tronic Engineering in 2003 from the University

of Brescia, Italy, and, since 2004, he has been

a Ph.D. student in Information Engineering at

the Department of Electronics for Automation

of this same university.

Erik Weyer received the Siv. Ing. degree in

1988 and the Ph.D. in 1993, both from the

Norwegian Institute of Technology, Trondheim,

Norway. From 1994 to 1996 he was a Research

Fellow at the University of Queensland, and

since 1997 he has been with the Department

of Electrical and Electronic Engineering, the

University of Melbourne, where he is currently

a Senior Lecturer. His research interests are in

the area of system identification and control.

Marco Claudio Campi is Professor of Auto-

matic Control at the University of Brescia, Italy.

He was born in Tradate, Italy, on December 7,

1963. In 1988, he received the Doctor degree

in electronic engineering from the Politecnico

di Milano, Milano, Italy. From 1988 to 1989,

he was a Research Assistant at the Department

of Electrical Engineering of the Politecnico di

Milano. From 1989 to 1992, he worked as a

Researcher at the Centro di Teoria dei Sistemi

of the National Research Council (CNR) in Mi-

lano. Since 1992, he has been with the Univer-

sity of Brescia, Italy.

Marco Campi is an Associate Editor of Systems and Control Letters, and a

past Associate Editor of Automatica and the European Journal of Control.

He serves as Chair of the Technical Committee IFAC on Stochastic Systems (SS)

and is a member of the Technical Committee IFAC on Modeling, Identifica-

tion and Signal Processing (MISP) and of the Technical Committee IFAC on

Cost Oriented Automation. Moreover, he is a Distinguished Lecturer under

the IEEE Control Systems Society (CSS) Program. His doctoral thesis was

awarded the “Giorgio Quazza” prize as the best original thesis for year 1988.

He has held visiting and teaching positions at many universities and institu-

tions including the Australian National University, Canberra, Australia; the

University of Illinois at Urbana-Champaign, USA; the Centre for Artificial

Intelligence and Robotics, Bangalore, India; the University of Melbourne,

Australia; the Kyoto University, Japan.

The research interests of Marco Campi include: system identification, stochas-

tic systems, adaptive and data-based control, robust convex optimization,

robust control and estimation, and learning theory.