
A Two-step Method For Nonlinear Polynomial Model Identification Based on Evolutionary Optimization

Yu CHENG, Lan WANG and Jinglu HU

Graduate School of Information, Production and Systems, Waseda University

Hibikino 2-7, Wakamatsu-ku, Kitakyushu-shi, Fukuoka, JAPAN

Email: chengyu0930@fuji.waseda.jp, wanglan@fuji.waseda.jp, jinglu@waseda.jp

Abstract

A two-step identification method for nonlinear polynomial models using an Evolutionary Algorithm (EA) is proposed in this paper; the method is able to select a parsimonious structure from a very large pool of model terms. In a nonlinear polynomial model, the number of candidate monomial terms increases drastically as the order of the polynomial model increases, and it is impossible to obtain an accurate model structure directly even with state-of-the-art algorithms. The proposed method first carries out a pre-screening process to select a reasonable number of important monomial terms based on an importance index. In the next step, an EA is applied to determine the set of significant terms to be included in the polynomial model. In this way, the whole identification algorithm is implemented very efficiently. Numerical simulations are carried out to demonstrate the effectiveness of the proposed identification method.

1. Introduction

System identification is concerned with building a model from input-output observations of a system, creating a mathematical description of the system [1]. In recent years, various NARX (Nonlinear AutoRegressive with eXogenous inputs) models have been reported in the literature [2]. As a polynomial expansion of NARX models, nonlinear polynomial models have shown great potential in approximating complex nonlinear input-output relationships, and have attracted much attention because they are linear-in-parameters [3], [4], [6].

However, nonlinear polynomial model identification remains a difficult task, because a very large pool of model terms has to be considered initially [4], [7], from which a useful model is then generated based on the parsimony principle of selecting the smallest adequate model [8]. What is more, the number of candidate terms grows drastically with the order of the model and the maximum delays of the input and output signals.

Several well-known methods for identifying nonlinear polynomial models have been proposed in the last few years. The Orthogonal Least-Squares (OLS) method is usually considered an effective approach [4], [5]. However, it has been pointed out that the algorithm cannot guarantee that the resultant model is globally optimal [9]. In addition, it becomes prohibitively expensive as its cost increases superlinearly with the number of candidates.

In recent years, Genetic Algorithm (GA) based methods have been proposed extensively [1], [10], [11]; they can effectively search many local optima and thereby increase the possibility of finding the global optimum. However, GA based approaches have so far worked on the full set of all possible terms. When the model order and the maximum delays of the input and output signals are large, the search becomes time-consuming and easily gets trapped in a local optimum.

Since a GA based approach is efficient when the search space is small, it is natural to combine such an approach with a pre-screening process, in which a low-accuracy identification method is used to reduce the search space [12]. Based on this consideration, a two-step method for nonlinear polynomial model identification is proposed. First, the candidate pool is roughly reduced according to the importance index of all the terms. If the candidate pool is still too large, a further selection is performed with the help of the AP clustering method [13] on the correlation coefficients between each pair of monomial terms, and the important terms in each cluster are selected. In the second step, single- and multi-objective GA are applied to determine the nonlinear polynomial model. To distinguish this from methods that use a GA directly, the term Evolutionary Algorithm (EA) is used to denote both the single- and multi-objective GA applied in the second step.

The paper is organized as follows: Section 2 briefly describes the background of the problem to be solved. Section 3 discusses the two-step identification approach in detail. Section 4 provides numerical simulations, and Section 5 presents the discussions and conclusions.

2. Background

Consider the following SISO NARX system, whose input-output relation is described by

$$y(t) = g(\varphi(t)) + e(t) \qquad (1)$$

978-1-4244-5612-3/09/$26.00 ©2009 IEEE


$$\varphi(t) = [y(t-1) \cdots y(t-n_y) \; u(t-1) \cdots u(t-n_u)]^T = [x_1(t) \cdots x_{n_y}(t) \; x_{n_y+1}(t) \cdots x_n(t)]^T \qquad (2)$$

where u(t) ∈ R, y(t) ∈ R and e(t) ∈ R are the system input, the system output and a zero-mean stochastic noise at time t (t = 1, 2, ...), respectively. g(·) : R^{n = n_u + n_y} → R is an unknown continuous function (black-box) describing the dynamics of the system under study, and φ(t) ∈ R^n is the regression vector composed of delayed input-output data. n_u and n_y are the unknown maximum delays of the input and output, respectively (n_u and n_y denote the order of the system).

When identifying the system, we use the following nonlinear polynomial model with an order of q:

$$y(t) = y_0 + \sum_{i_1=1}^{n} \alpha_{i_1} x_{i_1}(t) + \sum_{i_1=1}^{n}\sum_{i_2=i_1}^{n} \alpha_{i_1 i_2} x_{i_1}(t) x_{i_2}(t) + \cdots + \sum_{i_1=1}^{n}\cdots\sum_{i_q=i_{q-1}}^{n} \alpha_{i_1 \cdots i_q} x_{i_1}(t)\cdots x_{i_q}(t) + e(t) \qquad (3)$$
$$= y_0 + y_1(t) + \cdots + y_i(t) + \cdots + y_M(t) + e(t) \qquad (4)$$
$$= \Phi^T(t)\Theta + e(t) \qquad (5)$$

where M is the maximum number of candidate terms, y_i(t) is the i-th monomial term, and the vectors Θ and Φ(t) are defined as follows:

$$\Theta = [y_0, \alpha_{i_1}, \alpha_{i_1 i_2}, \alpha_{i_1 \cdots i_q}; \cdots]^T$$
$$\Phi(t) = [1, x_{i_1}(t), x_{i_1}(t)x_{i_2}(t), \cdots, x_{i_1}(t)\cdots x_{i_q}(t); \cdots]^T$$
$$i_1, i_2, \cdots, i_q = 1, \cdots, n.$$

Although the polynomial model (3) is linear-in-parameters, it consists of a huge number of candidate monomial terms. When the order q of the polynomial model and the dimension n of the regression vector φ(t) increase, the maximum number of candidate terms increases drastically as

$$M = \sum_{i=0}^{q} n_i \qquad (6)$$

where

$$n_i = \frac{n_{i-1}(n_y + n_u + i - 1)}{i}, \quad n_0 = 1. \qquad (7)$$
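As a quick numerical illustration, the count in (6)-(7) can be evaluated by the following sketch; the function name `candidate_count` is ours, not the paper's, and note that the recurrence includes the order-0 constant term in the total.

```python
def candidate_count(n: int, q: int) -> int:
    """Number of monomial terms of order <= q in n regressors, per Eqs. (6)-(7)."""
    m, n_i = 1, 1                          # i = 0: n_0 = 1 (the constant term)
    for i in range(1, q + 1):
        n_i = n_i * (n + i - 1) // i       # Eq. (7): n_i = n_{i-1}(n + i - 1)/i
        m += n_i                           # Eq. (6): M = sum of n_i
    return m

# With n_u = n_y = 4 (n = 8) and q = 5, as in Example 1 of Section 4:
print(candidate_count(8, 5))   # 1287 terms, constant term included
```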

Experience shows that, provided the significant terms in the model can be detected, models with about 10 terms are usually sufficient to capture the dynamics of highly nonlinear SISO processes [4]. Most candidate terms are either redundant or make very little contribution to the system output, and can therefore be removed from the model.

3. Two-Step Identification Scheme

3.1. Pre-screening Step

In the pre-screening step, we select important terms from a huge number of candidates according to an importance index. Although this is a low-accuracy method, it ensures that the reduced candidate pool is large enough to contain all the true terms.

3.1.1. Importance Index. Let ρ_i denote the correlation coefficient between the monomial term y_i(t) and the system output y(t), calculated by [12]

$$\rho_i = \frac{\sum_{t=1}^{N}(y_i(t) - \bar{y}_i)(y(t) - \bar{y})}{\sqrt{\sum_{t=1}^{N}(y_i(t) - \bar{y}_i)^2 \sum_{t=1}^{N}(y(t) - \bar{y})^2}} \qquad (8)$$

where \bar{y}_i and \bar{y} are the means of y_i(t) and y(t), respectively.

Needless to say, an important term should have a large value of |ρ_i|; however, a monomial term with a large |ρ_i| is not necessarily an important term, because if y_i(t) has a large value of |ρ_i|, then a term y_j(t) containing y_i(t) as a factor also has a large value of |ρ_j|. The importance index is therefore given by

$$\mathcal{I}_i = \frac{|\rho_i|}{e^{Or_i}} \qquad (9)$$

where Or_i is the order of the i-th term. A reasonable number of terms can then be selected according to the importance index.
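A minimal sketch of the index in (8)-(9), assuming the term sequence and output are available as NumPy arrays; the function name `importance_index` is our own, not the paper's.

```python
import numpy as np

def importance_index(y_i: np.ndarray, y: np.ndarray, order: int) -> float:
    """|rho_i| / exp(Or_i): correlation of term y_i(t) with the output y(t),
    discounted exponentially by the term's order, per Eqs. (8)-(9)."""
    rho = np.corrcoef(y_i, y)[0, 1]       # Eq. (8)
    return abs(rho) / np.exp(order)       # Eq. (9)

# A first-order term is penalized less than a higher-order term with the
# same correlation to the output, which is the point of the order discount.
```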

3.1.2. Further Selection Using the AP Clustering Method. In some cases, the candidate pool selected according to the importance index is still very large, because the correlation coefficient between monomial terms and the system output cannot reflect term importance accurately. For instance, in Example 2 of the simulation section, about two thousand terms are selected from 6434 candidates according to the importance index. This is still too large for EAs to optimize directly in an efficient way. Based on this consideration, we further reduce the candidate pool size according to the relationship between each pair of terms: similar terms are clustered, and only the important ones in each cluster are selected.

Affinity Propagation (AP) [13] is a recently introduced algorithm for exemplar-based clustering. The goal of the algorithm is to find good partitions of the data and to associate each partition with its most prototypical data point (exemplar), such that the similarity of points to their exemplar is maximized and the overall cost of making a point an exemplar is minimized. The similarity between a given pair of candidate terms i and j is defined as

$$s(i,j) = \frac{\sum_{t=1}^{N}(y_i(t) - \bar{y}_i)(y_j(t) - \bar{y}_j)}{\sqrt{\sum_{t=1}^{N}(y_i(t) - \bar{y}_i)^2 \sum_{t=1}^{N}(y_j(t) - \bar{y}_j)^2}} \qquad (10)$$

2009 World Congress on Nature & Biologically Inspired Computing (NaBIC 2009)


where \bar{y}_i and \bar{y}_j are the means of y_i(t) and y_j(t), respectively. According to the importance index proposed before, a reasonable portion of the important terms in each cluster is selected, and the candidate pool is reduced effectively.
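The similarity in (10) is the pairwise correlation between term sequences, so the similarity matrix can be sketched as below; the name `term_similarity` and the toy data are our own, and the AP step itself could be delegated to, for example, scikit-learn's `AffinityPropagation` with a precomputed affinity.

```python
import numpy as np

def term_similarity(terms: np.ndarray) -> np.ndarray:
    """Eq. (10): s(i, j) is the correlation coefficient between the
    candidate term sequences y_i(t) and y_j(t) (rows of `terms`)."""
    return np.corrcoef(terms)

# Toy candidate terms built from one regressor sequence x(t):
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
terms = np.vstack([x, x**2, x**3, np.abs(x)])
S = term_similarity(terms)
# x and x**3 are strongly correlated and would tend to fall in one cluster,
# e.g. via sklearn.cluster.AffinityPropagation(affinity="precomputed").fit(S)
```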

3.2. EA Based Identification Step

In the second step, EAs are applied in the search space containing only the important terms selected according to the importance index, to determine the model structure. Both single-objective and multi-objective optimization methods are implemented.

3.2.1. Coding For Nonlinear Polynomial Model Identification. When a GA is applied directly to a search space containing a huge number of candidates, considerable effort and computational cost must be spent on the encoding in order to keep the search efficient [10], [11]. In our case, however, the search space contains only the selected important terms and is rather small, so a simple real-valued encoding scheme can be used. Since each selected important monomial term is assigned a number (No.), this number is used directly as a gene of the chromosome. For example, a chromosome with a fixed length of 8, defined by

00012345

represents a polynomial model consisting of monomial terms No. 1, No. 2, No. 3, No. 4 and No. 5, where "0" denotes an empty term.
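A minimal sketch of this decoding step; `decode` is our own illustrative helper, not part of the paper's implementation.

```python
def decode(chromosome: list[int]) -> list[int]:
    """Return the monomial term numbers a chromosome switches on;
    gene value 0 marks an empty slot."""
    return sorted({g for g in chromosome if g != 0})

# The fixed-length chromosome 00012345 from the text:
print(decode([0, 0, 0, 1, 2, 3, 4, 5]))   # [1, 2, 3, 4, 5]
```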

3.2.2. Single-Objective Optimization Method. The fitness function for EA optimization is defined based on the BIC (Bayesian Information Criterion) of the identified polynomial model [12]:

$$FIT = Q - BIC \qquad (11)$$
$$BIC = N \cdot \ln\Big(\frac{1}{N}\sum_{t=1}^{N}[y(t) - \Phi(t)\Theta]^2\Big) + L \cdot \ln(N)$$

where the elements of Φ(t) and Θ contain only the significant terms described in the chromosome, L is the number of significant terms in the model, and Q is an appropriately chosen constant.
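The fitness (11) can be sketched as follows, assuming the regressor matrix for the chromosome's terms has already been assembled; the name `bic_fitness` and the default value of Q are our own choices, not the paper's.

```python
import numpy as np

def bic_fitness(phi: np.ndarray, y: np.ndarray, q_const: float = 1000.0) -> float:
    """FIT = Q - BIC of Eq. (11). `phi` is the (N, L) regressor matrix
    holding only the monomial terms switched on by a chromosome."""
    theta, *_ = np.linalg.lstsq(phi, y, rcond=None)   # least-squares parameters
    n, l = phi.shape
    mse = np.mean((y - phi @ theta) ** 2)             # one-step-ahead error
    bic = n * np.log(mse) + l * np.log(n)             # fit term + size penalty
    return q_const - bic
```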

3.2.3. Multi-objective Optimization Method. There are two objectives to be minimized, which conflict with each other: the mean square error of the one-step-ahead prediction, and the size of the model that forms the nonlinear polynomial description of the system under study:

$$obj_1 = \sum_{t=1}^{N}[y(t) - \Phi(t)\Theta]^2 \qquad (12)$$
$$obj_2 = L \qquad (13)$$

where Θ is the least-squares estimate of the parameters of Φ(t), which contains only the selected significant terms of the model, and L is the length of the model structure.

With the help of NSGA-II [14], all the non-dominated solutions with different model sizes can be acquired. In other words, the optimization yields a set of possible solutions rather than a single one, which increases the possibility of finding the best-approximating model.
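NSGA-II itself is beyond a short sketch, but the non-dominated filtering it performs over the two objectives of (12)-(13) can be illustrated as below; `non_dominated` is our own helper applied to hypothetical (MSE, model size) pairs.

```python
def non_dominated(models: list[tuple[float, int]]) -> list[tuple[float, int]]:
    """Keep the (mse, size) pairs not dominated by any other pair,
    i.e. no other model is at least as good in both objectives."""
    return [m for m in models
            if not any(o[0] <= m[0] and o[1] <= m[1] and o != m for o in models)]

# Hypothetical candidates: small models with higher error, larger models
# with lower error; dominated entries are filtered out.
candidates = [(0.90, 2), (0.40, 5), (0.35, 8), (0.50, 5), (1.2, 3)]
print(non_dominated(candidates))   # [(0.9, 2), (0.4, 5), (0.35, 8)]
```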

In order to select, from the set of non-dominated polynomial models, the best solution with both a parsimonious structure and a low prediction error, validation data [2] is applied in the simulations to ensure generalization.

4. Numerical Simulations

To show the efficiency of the proposed two-step identification method, we consider the following two examples. The examples are the same as those used in Refs. [10], [12], and no other method can solve these two examples directly, because the original term pool is so large that it exceeds our computational capacity (CPU: Intel Core2 Duo T9400 (2.53 GHz), RAM: 3 GB) under Matlab 7.6. We compare our results with those of Ref. [10] and with the results from OLS as well. For the second step of this research, the specifications are given in Table 1.

Table 1. Parameter set for EAs

                        Example 1   Example 2
Maximum generation      300         600
Population              60          500
Chromosome length       10          10
Tournament size         2           2
Crossover probability   0.7         0.7
Mutation probability    0.3         0.3

4.1. Training Data

As training data, 500 and 1000 input-output samples are generated from the two systems, respectively, by exciting each system with a random input sequence. Figure 1 shows the first 300 samples of the training data for Example 2.

4.2. Nonlinear Polynomial System Identification With Heavy Noise

Example 1: The system is governed by a polynomial model, described by

$$y(t) = -0.5y(t-2) + 0.7u(t-1)y(t-1) + 0.6u^2(t-2) + 0.2y^3(t-1) - 0.7u^2(t-2)y(t-2) + e(t)$$

where e(t) ∈ (0, 0.2) is white Gaussian noise, with amplitude between -1.2 and 1.2.


Figure 1. Training data for Example 2: y(t) and u(t), both within [-1, 1], over t = 0 to 300.

Assume the maximum input-output delays for identification are n_u = 4 and n_y = 4, and the maximum order of the polynomial model is 5; the maximum number of candidate terms is then 1286.

As mentioned before, 500 input-output samples are used for training, and 400 important terms are selected according to the importance index calculated by (9). Table 2 shows part of the table formed by the selected important terms, sorted by the importance index I_i.

Table 2. Table formed with the selected important terms based on I_i for Example 1.

No.   Monomial term               Order   I_i
1     y(t-2)                      1       0.1932
2     y(t-4)                      1       0.0824
...   ...                         ...     ...
8     u^2(t-2)                    2       0.0408
9     u(t-1)y(t-1)                2       0.0398
...   ...                         ...     ...
399   y^2(t-1)y(t-3)u^2(t-3)      5       0.0014
400   y^2(t-1)y^2(t-3)y(t-4)      5       0.0014

In the second step, single-objective and multi-objective optimization methods are used to identify the model structure. When the multi-objective method is applied, a new validation set of 200 samples, with input magnitude between -1.2 and +1.2, is generated in order to choose the model size from the set of non-dominated solutions; the solution with the least validation mean square error is selected and considered the right model structure. Comparing with the results from OLS [4] and the GA method [10], all algorithms obtain the correct model structure when there is no noise in the input data, and the polynomial model we obtained is

$$\hat{y}(t) = 0.5988u^2(t-2) - 0.5039\hat{y}(t-2) - 0.6780\hat{y}(t-2)u^2(t-2) + 0.1823\hat{y}^3(t-1) + 0.7007\hat{y}(t-1)u(t-1). \qquad (14)$$

However, when heavy noise is added, the OLS method cannot find the right model structure, although it runs very fast. Compared with the GA method, TSO (two-step single-objective optimization) and TMO (two-step multi-objective optimization) find the correct model structure with much less time cost. The experimental results are shown in Table 3.

Table 3. Experiment results and comparison for Example 1 with heavy noise.

Algorithm   Model structure   Time(s)   Monte Carlo test
OLS         Wrong             0.7662    \
GA          Correct           > 600     \
TSO         Correct           6.5762    49/100
TMO         Correct           34.5774   99/100

The correct polynomial model in Table 3 is

$$\hat{y}(t) = 0.6922u^2(t-2) - 0.4989\hat{y}(t-2) - 0.6545\hat{y}(t-2)u^2(t-2) + 0.1914\hat{y}^3(t-1) + 0.6486\hat{y}(t-1)u(t-1).$$

For the Monte Carlo tests in Table 3, 100 consecutive trials were run with the same parameters for TSO and TMO. The results show that TSO works faster than TMO, but identifies the correct structure only 49 times out of 100, in contrast with 99 times for TMO.

4.3. Complicated Nonlinear System Identification

Example 2: The system is a nonlinear rational model studied by Narendra in 1990:

$$y(t) = f[y(t-1), y(t-2), y(t-3), u(t-1), u(t-2)]$$

where

$$f[x_1, x_2, x_3, x_4, x_5] = \frac{x_1 x_2 x_3 x_5 (x_3 - 1) + x_4}{1 + x_2^2 + x_3^2}$$

and a white noise e(t) ∈ (0, 0.1) is further added.
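A sketch of simulating this rational system, under our own assumptions that the excitation is a uniform random sequence on [-1, 1] and that the 0.1 figure is the noise standard deviation.

```python
import numpy as np

def f(x1, x2, x3, x4, x5):
    """The Narendra rational nonlinearity from the text."""
    return (x1 * x2 * x3 * x5 * (x3 - 1) + x4) / (1 + x2 ** 2 + x3 ** 2)

def simulate_example2(n_samples: int = 1000, seed: int = 0):
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1.0, 1.0, n_samples)        # assumed input range
    e = 0.1 * rng.standard_normal(n_samples)     # assumed noise std = 0.1
    y = np.zeros(n_samples)
    for t in range(3, n_samples):
        y[t] = f(y[t - 1], y[t - 2], y[t - 3], u[t - 1], u[t - 2]) + e[t]
    return u, y

u, y = simulate_example2()
```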

This is believed to be a very complex and strongly nonlinear system, and few methods can determine the nonlinear polynomial model directly and efficiently. Assume the maximum input-output delays for approximating this system are n_u = 2 and n_y = 3, and the maximum order of the polynomial model is 8; the maximum number of candidate terms is then 6434.

In the pre-screening step, 2000 terms are selected roughly by the importance index. However, this is still too large for EAs to search. Therefore, the AP clustering based further selection method is applied. Because clustering is also a rough term


selection method, 515 important terms are selected from all 111 automatically clustered classes. Then EAs are applied to find the approximate model. In the multi-objective optimization method, 500 validation samples are generated, with magnitude between -1.2 and 1.2.

Table 4 illustrates the model structures acquired by the different algorithms, where the result of OLS1 is obtained by applying OLS directly to the 2000 important term candidates.

Table 4. Experiment results of Example 2

Algorithm   Model structure (term No.)   Order   Time(s)
OLS1        1, 12, 20, 263, 1596         7       5.1100
OLS2        1, 12, 20, 239, 302          5       903.0165
GA          1, 12, 20, 50, 239           4       > 7200
TSO         1, 12, 20, 239, 302          5       969.1272
TMO         1, 12, 20, 201, 399          5       1579.3047

In contrast, the result of OLS2 is generated from the candidates remaining after AP clustering, and it is the same as the one generated by TSO. The numbers in the model structures refer to the corresponding monomial terms listed in Table 5, which were numbered according to the importance index (9). The rank of the importance index of each monomial term within its class is also given in Table 5 where the clustering method applies. For instance, 1/11 means the term ranks first among all 11 terms of its class, and "none" means the term was not generated through the clustering method. Each selected monomial term is seen to rank near the top of its corresponding class.

The result of OLS1 contains a monomial term of high order, and is therefore not considered a good model for real-world application. In contrast, the nonlinear polynomial model identified by GA is

$$y(t) = 0.9062u(t-1) - 0.2208y^2(t-3)u(t-1) - 0.4395y(t-2)u(t-1)u(t-3) + 0.0885y(t-2)u^2(t-2) - 0.5184y(t-1)y(t-2)y(t-3)u(t-2).$$

The nonlinear polynomial model identified by TSO and OLS2 is

$$y(t) = 0.9085u(t-1) - 0.2075y^2(t-3)u(t-1) - 0.4693y(t-2)u(t-1)u(t-3) - 0.4932y(t-1)y(t-2)y(t-3)u(t-2) + 0.1609y(t-1)u(t-2)u^3(t-3).$$

The nonlinear polynomial model identified by TMO is

$$y(t) = 0.9151u(t-1) - 0.2998y^2(t-3)u(t-1) - 0.4218y(t-2)u(t-1)u(t-3) - 0.4855y^2(t-1)y(t-3)u(t-3) + 0.3563y(t-1)y^2(t-3)u(t-2)u(t-3).$$

Table 5. Monomial terms for the results of Example 2

No.    Monomial term                    Rank
1      u(t-1)                           1/11
12     y(t-2)u(t-1)u(t-3)               2/36
20     y^2(t-3)u(t-1)                   1/28
50     y(t-2)u^2(t-2)                   none
201    y^2(t-1)y(t-3)u(t-3)             3/21
239    y(t-1)y(t-2)y(t-3)u(t-2)         4/21
263    y(t-1)y(t-3)u(t-2)u(t-3)         none
302    y(t-1)u(t-2)u^3(t-3)             5/22
399    y(t-1)y^2(t-3)u(t-2)u(t-3)       6/30
1596   y(t-1)y(t-2)u^3(t-2)u^2(t-3)     none

Table 6. Results comparison of Example 2 (mean square errors)

                OLS1      OLS2 and TSO   GA        TMO
training data   11.5621   11.6198        11.9047   11.3497
test data       inf       0.5451         0.8711    0.0951

To test the obtained polynomial models, 800 input-output samples are generated as test data, with the input described by

$$u(t) = \begin{cases} 0.8\sin(2\pi t/250) + 0.2\sin(2\pi t/25) & \text{if } t \le 500 \\ \sin(2\pi t/250) & \text{otherwise.} \end{cases} \qquad (15)$$
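The test input (15) can be generated directly as below; indexing t from 1 is our assumption, and the helper name `test_input` is ours.

```python
import numpy as np

def test_input(n_samples: int = 800) -> np.ndarray:
    """Eq. (15): two-tone sinusoid for t <= 500, single tone afterwards."""
    t = np.arange(1, n_samples + 1)
    return np.where(
        t <= 500,
        0.8 * np.sin(2 * np.pi * t / 250) + 0.2 * np.sin(2 * np.pi * t / 25),
        np.sin(2 * np.pi * t / 250),
    )

u = test_input()
print(u.shape)   # (800,)
```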

Figure 2 shows the simulation of the polynomial models obtained by the four different algorithms on the test data. The solid line is the true system output, and the dashed line is the polynomial model output. We find that the polynomial model obtained by TMO represents the system remarkably well.

In order to compare the identification results of the four methods, the one-step-ahead mean square error on the training data and the full-prediction mean square error on the test data are calculated and compared. From Table 6, we see that the two-step evolutionary identification method proposed in this research approximates the nonlinear system well; in particular, the full-prediction error of the TMO algorithm on the test data is much smaller than that of the other algorithms. Although the GA method can also generate a good model, as its authors report, it costs a terribly long time for the search.

5. Discussions and Conclusions

In this paper, we propose a two-step identification method for nonlinear polynomial models drawn from a very large pool of terms. In both numerical examples, the well-known OLS algorithm could not cope with such a huge candidate pool directly. Even when OLS can be used after the pre-screening process, it cannot deal well with heavily noisy data sets, because it is not a global optimization method. Although the GA method can find globally optimal solutions, the time cost over such a big search space is prohibitive. In contrast, because the two-step identification method combines the merits of correlation analysis and EA, we
