A Two-step Method for Nonlinear Polynomial Model Identification Based on
Evolutionary Optimization
Yu CHENG, Lan WANG and Jinglu HU
Graduate School of Information, Production and Systems, Waseda University
Hibikino 27, Wakamatsuku, Kitakyushushi, Fukuoka, JAPAN
Email: chengyu0930@fuji.waseda.jp, wanglan@fuji.waseda.jp, jinglu@waseda.jp
Abstract
A two-step identification method for nonlinear polynomial
models using an Evolutionary Algorithm (EA) is proposed in
this paper; the method is able to select a parsimonious
structure from a very large pool of model terms. In a
nonlinear polynomial model, the number of candidate monomial
terms increases drastically as the order of the polynomial
model increases, and it is practically impossible to obtain
the accurate model structure directly, even with
state-of-the-art algorithms. The proposed method first
carries out a pre-screening process to select a reasonable
number of important monomial terms based on an importance
index. In the next step, an EA is applied to determine the
set of significant terms to be included in the polynomial
model. In this way, the whole identification algorithm is
implemented very efficiently. Numerical simulations are
carried out to demonstrate the effectiveness of the proposed
identification method.
1. Introduction
System identification is concerned with building a model
from input-output observations of a system, creating a
mathematical description of the system [1]. In recent years,
various NARX (nonlinear AutoRegressive with eXogenous
inputs) models have been reported in the literature [2]. As
a polynomial expansion of NARX models, nonlinear polynomial
models have shown great potential in approximating complex
nonlinear input-output relationships, and have attracted
much attention because they are linear in the parameters
[3], [4], [6].
However, nonlinear polynomial model identification remains
a difficult task, because a very large pool of model terms
has to be considered initially [4], [7], from which a useful
model is then generated based on the parsimony principle of
selecting the smallest adequate model [8]. Moreover, the
number of candidate terms grows drastically as the order of
the model and the maximum delays of the input and output
signals increase.
Several well-known methods for identifying nonlinear
polynomial models have been proposed in the last few years.
The Orthogonal Least-Squares (OLS) method is usually
considered an effective approach [4], [5]. However, it has
been pointed out that the algorithm cannot guarantee a
globally optimized model [9]. In addition, it becomes
prohibitively expensive, as its cost increases superlinearly
with the number of candidates.
In recent years, Genetic Algorithm (GA) based methods have
been proposed extensively [1], [10], [11]; a GA can
effectively search many local optima and thereby increases
the possibility of finding the global optimum. However,
existing GA-based approaches search over all candidate
terms. When the model order and the maximum delays of the
input and output signals are large, they become
time-consuming and easily get trapped in a local optimum.
Since a GA-based approach is efficient when the search
space is small, it is natural to combine it with a
pre-screening process, in which a low-accuracy
identification method is used to reduce the search space
[12]. Based on this consideration, a two-step method for
nonlinear polynomial model identification is proposed.
First, the candidate pool is coarsely reduced according to
the importance index of all terms. If the candidate pool is
still too large, a further selection is performed with the
help of the AP clustering method [13], applied to the
correlation coefficients between each pair of monomial
terms, and the important terms in each cluster are selected.
In the second step, single- and multi-objective GAs are
applied to determine the nonlinear polynomial model. To
distinguish the approach from methods using GA directly, the
term Evolutionary Algorithm (EA) is used here to cover both
the single- and multi-objective GAs applied in the second
step.
The paper is organized as follows: Section 2 briefly
describes the background of the problem to be solved.
Section 3 discusses the two-step identification approach in
detail. Section 4 provides numerical simulations, and
Section 5 presents the discussion and conclusions.
2. Background
Consider the following SISO NARX system, whose input-output
relation is described by

푦(푡) = 푔(휑(푡)) + 푒(푡)   (1)
978-1-4244-5612-3/09/$26.00 ©2009 IEEE
휑(푡) = [푦(푡 − 1) ⋯ 푦(푡 − 푛푦) 푢(푡 − 1) ⋯ 푢(푡 − 푛푢)]푇
     = [푥1(푡) ⋯ 푥푛푦(푡) 푥푛푦+1(푡) ⋯ 푥푛(푡)]푇   (2)
where 푢(푡) ∈ 푅, 푦(푡) ∈ 푅 and 푒(푡) ∈ 푅 are the system input,
the system output and a zero-mean stochastic noise at time
푡 (푡 = 1, 2, ⋯), respectively. 푔(⋅) : 푅푛 → 푅, with
푛 = 푛푢 + 푛푦, is an unknown continuous function (black box)
describing the dynamics of the system under study, and
휑(푡) ∈ 푅푛 is the regression vector composed of delayed
input-output data. 푛푢 and 푛푦 are the unknown maximum delays
of the input and output, respectively (they determine the
order of the system).
When identifying the system, we use the following nonlinear
polynomial model of order 푞:

푦(푡) = 푦0 + ∑_{푖1=1}^{푛} 훼푖1 푥푖1(푡)
         + ∑_{푖1=1}^{푛} ∑_{푖2=푖1}^{푛} 훼푖1푖2 푥푖1(푡)푥푖2(푡) + ⋯
         + ∑_{푖1=1}^{푛} ⋯ ∑_{푖푞=푖푞−1}^{푛} 훼푖1⋯푖푞 푥푖1(푡)⋯푥푖푞(푡) + 푒(푡)   (3)
     = 푦0 + 푦1(푡) + ⋯ + 푦푖(푡) + ⋯ + 푦푀(푡) + 푒(푡)   (4)
     = Φ(푡)Θ푇 + 푒(푡)   (5)
where 푀 is the maximum number of candidate terms, 푦푖(푡) is
the 푖-th monomial term, and the vectors Θ and Φ(푡) are
defined as follows:
Θ = [푦0,훼푖1,훼푖1푖2,훼푖1⋅⋅⋅푖푞;⋅⋅⋅]푇
Φ(푡) = [1,푥푖1(푡),푥푖1(푡)푥푖2(푡),⋅⋅⋅ ,푥푖1(푡)⋅⋅⋅푥푖푞(푡);⋅⋅⋅]푇
푖1,푖2,⋅⋅⋅ ,푖푞= 1,⋅⋅⋅ ,푛.
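Since model (3) is linear in the parameter vector Θ, for any fixed term set Θ has a closed-form least-squares estimate (this estimate is what the fitness and objective functions in Section 3 rely on). A minimal sketch, where the toy system, data sizes and term choices are invented for illustration:

```python
import numpy as np

# Hypothetical illustration: once a set of monomial terms is fixed, the model
# is linear in Theta, so Theta has a closed-form least-squares estimate.
rng = np.random.default_rng(0)
N = 200
u = rng.uniform(-1.0, 1.0, N)
y = np.zeros(N)
for t in range(2, N):
    # toy system: y(t) = 0.5*y(t-1) + 0.8*u(t-1)^2 + small noise
    y[t] = 0.5 * y[t - 1] + 0.8 * u[t - 1] ** 2 + 0.01 * rng.standard_normal()

# Regressor matrix Phi: columns are the selected monomial terms
# [1, y(t-1), u(t-1)^2] evaluated for t = 2 .. N-1.
t = np.arange(2, N)
Phi = np.column_stack([np.ones(len(t)), y[t - 1], u[t - 1] ** 2])
Theta, *_ = np.linalg.lstsq(Phi, y[t], rcond=None)
print(Theta)  # close to [0, 0.5, 0.8]
```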
Although the polynomial model (3) is linear in the
parameters, it contains a huge number of candidate monomial
terms. As the order 푞 of the polynomial model and the
dimension 푛 of the regression vector 휑(푡) increase, the
maximum number of candidate terms grows drastically:
푀 = ∑_{푖=0}^{푞} 푛푖   (6)

where

푛푖 = 푛푖−1(푛푦 + 푛푢 + 푖 − 1)/푖 ,   푛0 = 1.   (7)
Experience shows that, provided the significant terms in the
model can be detected, models with about 10 terms are
usually sufficient to capture the dynamics of highly
nonlinear SISO processes [4]. Most candidate terms are
either redundant or contribute very little to the system
output, and can therefore be removed from the model.
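The growth of 푀 can be checked with a short sketch of Eqs. (6)-(7); the function name below is ours, and the count includes the constant term:

```python
from math import comb

def candidate_count(n_u: int, n_y: int, q: int) -> int:
    """Number of candidate monomial terms M per Eqs. (6)-(7):
    n_i = n_{i-1} * (n_y + n_u + i - 1) / i, n_0 = 1, M = sum_{i=0}^{q} n_i."""
    n = n_y + n_u
    n_i, M = 1, 1  # i = 0 term (the constant)
    for i in range(1, q + 1):
        n_i = n_i * (n + i - 1) // i  # always divides exactly
        M += n_i
    return M

# n_i equals C(n + i - 1, i), the number of degree-i monomials in n
# variables, so M = C(n + q, q).
print(candidate_count(4, 4, 5))  # 1287 with the constant term included
```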
3. Two-Step Identification Scheme
3.1. Pre-screening Step
In the pre-screening step, we select important terms from a
huge number of candidates according to the importance index.
Although this is a low-accuracy method, the reduced
candidate pool can be kept large enough to contain all the
true terms.
3.1.1. Importance Index. Let 휌푖 denote the correlation
coefficient between the monomial term 푦푖(푡) and the system
output 푦(푡), calculated by [12]

휌푖 = ∑_{푡=1}^{푁} (푦푖(푡) − 푦푖)(푦(푡) − 푦) / √( ∑_{푡=1}^{푁} (푦푖(푡) − 푦푖)² ⋅ ∑_{푡=1}^{푁} (푦(푡) − 푦)² )   (8)
where 푦푖 and 푦 are the means of 푦푖(푡) and 푦(푡),
respectively. Needless to say, an important term should have
a large value of ∣휌푖∣; however, a monomial term with a large
∣휌푖∣ is not necessarily important, because if 푦푖(푡) has a
large ∣휌푖∣, then any term 푦푗(푡) that contains 푦푖(푡) as a
factor will also tend to have a large correlation with the
output. The importance index is therefore defined as

ℐ푖 = ∣휌푖∣ / exp(푂푟푖)   (9)

where 푂푟푖 is the order of the 푖-th term. A reasonable number
of terms can then be selected according to the importance
index.
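Eqs. (8)-(9) can be sketched as follows; the function name and test data are ours, not from the paper:

```python
import numpy as np

def importance_index(term_series: np.ndarray, y: np.ndarray, order: int) -> float:
    """Importance index of Eq. (9): |rho_i| / exp(Or_i), where rho_i is the
    correlation coefficient (8) between the monomial term and the output."""
    yi = term_series - term_series.mean()
    yc = y - y.mean()
    rho = (yi @ yc) / np.sqrt((yi @ yi) * (yc @ yc))
    return abs(rho) / np.exp(order)

# Invented illustration: y depends linearly on x, so the order-1 term x is
# penalized less than the order-2 term x**2 even though both correlate
# strongly with y.
rng = np.random.default_rng(1)
x = rng.uniform(0.5, 1.5, 500)
y = 2.0 * x + 0.05 * rng.standard_normal(500)
print(importance_index(x, y, order=1), importance_index(x**2, y, order=2))
```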
3.1.2. Further Selection Using the AP Clustering Method.
In some cases, the candidate pool selected according to the
importance index is still very large, because the
correlation coefficient between the monomial terms and the
system output cannot reflect term importance accurately. For
instance, in Example 2 of the simulation section, the number
of terms selected from 6434 candidates according to the
importance index is still about two thousand, which is too
large for EAs to optimize directly in an efficient way. This
motivates reducing the candidate pool further according to
the relationship between each pair of terms: similar terms
are clustered, and only the important ones in each cluster
are selected.
Affinity Propagation (AP) [13] is a recently introduced
algorithm for exemplar-based clustering. Its goal is to find
good partitions of the data and associate each partition
with its most prototypical data point (exemplar), such that
the similarity between points and their exemplar is
maximized while the overall cost of making a point an
exemplar is minimized. The similarity between a given pair
of candidate terms 푖 and 푗 is defined as
푠(푖, 푗) = ∑_{푡=1}^{푁} (푦푖(푡) − 푦푖)(푦푗(푡) − 푦푗) / √( ∑_{푡=1}^{푁} (푦푖(푡) − 푦푖)² ⋅ ∑_{푡=1}^{푁} (푦푗(푡) − 푦푗)² )   (10)
2009 World Congress on Nature & Biologically Inspired Computing (NaBIC 2009)
where 푦푖 and 푦푗 are the means of 푦푖(푡) and 푦푗(푡),
respectively. According to the importance index introduced
above, a reasonable fraction of the important terms in each
cluster is selected, and the candidate pool is thereby
reduced effectively.
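Since 푠(푖, 푗) in Eq. (10) is just the correlation coefficient between the time series of two candidate terms, the whole similarity matrix can be computed in one call; a sketch with invented candidate terms:

```python
import numpy as np

# The similarity s(i, j) of Eq. (10) is the correlation coefficient between
# the time series of two candidate terms, so the full similarity matrix is
# np.corrcoef of the stacked term series. Terms below are made up.
rng = np.random.default_rng(2)
N = 300
u = rng.uniform(-1.0, 1.0, N)
y = rng.uniform(-1.0, 1.0, N)
terms = np.vstack([u, u**2, y, y * u, y**2])  # one row per candidate term
S = np.corrcoef(terms)                        # S[i, j] = s(i, j)

# S (or a rescaled version of it) could then be fed to an exemplar-based
# clustering routine, e.g. sklearn.cluster.AffinityPropagation with
# affinity="precomputed", keeping only top-ranked terms of each cluster.
print(S.shape)  # (5, 5)
```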
3.2. EA-Based Identification Step
In the second step, EAs are applied in the reduced search
space, which contains only the important terms selected
according to the importance index, to determine the model
structure. Both a single-objective and a multi-objective
optimization method are implemented.
3.2.1. Coding for Nonlinear Polynomial Model Identification.
When GA is applied directly to a search space containing a
huge number of candidates, much effort and computing cost
must be spent on encoding in order to keep the search
efficient [10], [11]. In our case, however, the search space
contains only the selected important terms and is rather
small, so a simple direct encoding scheme can be used. Since
each selected important monomial term is assigned a number
(No.), this number is used directly as a gene of the
chromosome. For example, a chromosome with a fixed length of
8, defined by

0 0 0 1 2 3 4 5

represents a polynomial model consisting of monomial terms
No. 1, No. 2, No. 3, No. 4 and No. 5, where "0" denotes an
empty term.
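A possible decoding of this scheme, as a hypothetical helper (the paper gives no code):

```python
def decode(chromosome):
    """Map a fixed-length chromosome of term numbers to a model structure.
    Gene 0 denotes an empty slot; duplicate genes collapse to a single term.
    (Hypothetical helper illustrating the encoding described in the text.)"""
    return sorted({gene for gene in chromosome if gene != 0})

# The example chromosome 0 0 0 1 2 3 4 5 from the text:
print(decode([0, 0, 0, 1, 2, 3, 4, 5]))  # [1, 2, 3, 4, 5]
```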
3.2.2. Single-Objective Optimization Method. The fitness
function for EA optimization is defined based on the BIC
(Bayesian Information Criterion) of the identified
polynomial model [12]:

퐹퐼푇 = 푄 − 퐵퐼퐶   (11)

퐵퐼퐶 = 푁 ⋅ ln( (1/푁) ∑_{푡=1}^{푁} [푦(푡) − Φ(푡)Θ]² ) + 퐿 ⋅ ln(푁)

where the elements of Φ(푡) and Θ contain only the
significant terms described in the chromosome, 퐿 is the
number of significant terms in the model, and 푄 is an
appropriately chosen constant.
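Eq. (11) can be sketched as follows; the value of Q here is an arbitrary assumption, since the paper only calls it "appropriate":

```python
import math

def bic_fitness(residuals, n_terms, Q=1000.0):
    """Fitness of Eq. (11): FIT = Q - BIC, with
    BIC = N * ln((1/N) * sum(e(t)^2)) + L * ln(N).
    Q is a constant offset; its value here is an arbitrary assumption."""
    N = len(residuals)
    mse = sum(e * e for e in residuals) / N
    bic = N * math.log(mse) + n_terms * math.log(N)
    return Q - bic

# Smaller residuals raise the fitness; extra terms lower it via L*ln(N).
few = bic_fitness([0.01] * 100, n_terms=5)
many = bic_fitness([0.01] * 100, n_terms=8)
print(few > many)  # True: same error, fewer terms => higher fitness
```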
3.2.3. Multi-objective Optimization Method. Two conflicting
objectives are minimized: the mean square error of the
one-step-ahead prediction, and the size of the nonlinear
polynomial model describing the system under study:

표푏푗1 = ∑_{푡=1}^{푁} [푦(푡) − Φ(푡)Θ]²   (12)

표푏푗2 = 퐿   (13)

where Θ is the least-squares parameter estimate, Φ(푡)
contains only the selected significant terms, and 퐿 is the
length of the model structure (the number of terms).
With the help of NSGA-II [14], all the non-dominated
solutions with different model sizes can be acquired. In
other words, the optimization result is a set of possible
solutions rather than a single one, which increases the
possibility of finding the best-approximating model.

To select, from the set of non-dominated polynomial models,
the solution with both a parsimonious structure and a low
prediction error, independent validation data [2] is used to
ensure generalization.
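The validation-based final selection can be sketched as follows; the helper name, front and error values below are all invented:

```python
# Hypothetical sketch of the final selection step: NSGA-II returns a set of
# non-dominated solutions with different sizes; the one with the smallest
# mean square error on independent validation data is kept.
def select_model(front, validation_mse):
    """front: list of model structures (e.g. term-number tuples);
    validation_mse: maps each structure to its MSE on validation data."""
    return min(front, key=lambda model: validation_mse[model])

front = [(1,), (1, 12), (1, 12, 20)]                 # made-up Pareto front
val = {(1,): 0.9, (1, 12): 0.2, (1, 12, 20): 0.25}   # made-up validation MSEs
print(select_model(front, val))  # (1, 12)
```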
4. Numerical Simulations
To show the efficiency of the proposed two-step
identification method, we consider the following two
examples. The examples are the same as those used in Refs.
[10] and [12]; no other method can solve these two examples
directly, because the huge original term pool exceeds our
computational resources (CPU: Intel Core2 Duo T9400
(2.53 GHz), RAM: 3 GB, Matlab 7.6). We compare our results
with those of Ref. [10] and with the results of OLS as well.
The specifications of the second step in this research are
given in Table 1.
Table 1. Parameter settings for the EAs

                         Example 1   Example 2
Maximum generation         300         600
Population                  60         500
Chromosome length           10          10
Tournament size              2           2
Crossover probability      0.7         0.7
Mutation probability       0.3         0.3
4.1. Training Data
As the training data, 500 and 1000 input-output data points
are sampled from the two systems, respectively, when each
system is excited by a random input sequence. Figure 1 shows
the first 300 points of the training data for Example 2.
4.2. Nonlinear Polynomial System Identification with Heavy
Noise

Example 1: The system is governed by a polynomial model,
described by

푦(푡) = −0.5푦(푡 − 2) + 0.7푢(푡 − 1)푦(푡 − 1) + 0.6푢²(푡 − 2)
        + 0.2푦³(푡 − 1) − 0.7푢²(푡 − 2)푦(푡 − 2) + 푒(푡)

where 푒(푡) ∈ (0, 0.2) is white Gaussian noise with amplitude
between −1.2 and 1.2.
Figure 1. Training data for Example 2: output 푦(푡) and input
푢(푡) over the first 300 samples.
Assume the maximum input-output delays for identification
are 푛푢 = 4 and 푛푦 = 4, and the maximum order of the
polynomial model is 5; the maximum number of candidate terms
is then 1286.
As mentioned before, 500 input-output data points are
sampled for training, and 400 important terms are selected
according to the importance index calculated by (9). Table 2
shows part of the resulting table of selected important
terms, sorted by the importance index ℐ푖.
Table 2. Selected important terms for Example 1, sorted by ℐ푖.

No.    Monomial term                    Order    ℐ푖
1      푦(푡 − 2)                          1      0.1932
2      푦(푡 − 4)                          1      0.0824
...    ...                              ...     ...
8      푢(푡 − 2)²                         2      0.0408
9      푢(푡 − 1)푦(푡 − 1)                 2      0.0398
...    ...                              ...     ...
399    푦(푡 − 1)²푦(푡 − 3)푢(푡 − 3)²      5      0.0014
400    푦(푡 − 1)²푦(푡 − 3)²푦(푡 − 4)      5      0.0014
In the second step, the single-objective and multi-objective
optimization methods are used to identify the model
structure. When the multi-objective method is applied, a new
validation set of 200 points, with input magnitude between
−1.2 and +1.2, is sampled in order to choose the model size
from the set of non-dominated solutions; the solution with
the least validation mean square error is selected as the
model structure. Comparing with the results of OLS [4] and
the GA method [10], all the algorithms obtain the correct
model structure when there is no noise in the data, and the
polynomial model we obtain is
ŷ(푡) = 0.5988푢²(푡 − 2) − 0.5039ŷ(푡 − 2) − 0.6780ŷ(푡 − 2)푢²(푡 − 2)
        + 0.1823ŷ³(푡 − 1) + 0.7007ŷ(푡 − 1)푢(푡 − 1).   (14)
However, when heavy noise is added, the OLS method cannot
find the right model structure, although it runs very fast.
Compared with the GA method, TSO (two-step single-objective
optimization) and TMO (two-step multi-objective
optimization) find the correct model structure with a much
lower time cost. The experimental results are shown in
Table 3.
Table 3. Experimental results and comparison for Example 1
with heavy noise.

Algorithm   Model structure   Time(s)    Monte Carlo test
OLS         Wrong              0.7662    n/a
GA          Correct           > 600      n/a
TSO         Correct            6.5762    49/100
TMO         Correct           34.5774    99/100
The correct polynomial model in Table 3 is

ŷ(푡) = 0.6922푢²(푡 − 2) − 0.4989ŷ(푡 − 2) − 0.6545ŷ(푡 − 2)푢²(푡 − 2)
        + 0.1914ŷ³(푡 − 1) + 0.6486ŷ(푡 − 1)푢(푡 − 1).
For Table 3, 100 consecutive trials were run with the same
parameters in each Monte Carlo test for TSO and TMO. The
results show that TSO runs faster than TMO, but identifies
the correct structure in only 49 of 100 trials, compared
with 99 of 100 for TMO.
4.3. Complicated Nonlinear System Identification

Example 2: The system is a nonlinear rational model studied
by Narendra in 1990:

푦(푡) = 푓[푦(푡 − 1), 푦(푡 − 2), 푦(푡 − 3), 푢(푡 − 1), 푢(푡 − 2)]

where

푓[푥1, 푥2, 푥3, 푥4, 푥5] = ( 푥1푥2푥3푥5(푥3 − 1) + 푥4 ) / ( 1 + 푥2² + 푥3² )

and a white noise 푒(푡) ∈ (0, 0.1) is further added.
This is widely regarded as a very complex and strongly
nonlinear system, and few methods can determine a nonlinear
polynomial model for it directly and efficiently. Assuming
the maximum input-output delays for approximating this
system are 푛푢 = 2 and 푛푦 = 3, and the maximum order of the
polynomial model is 8, the maximum number of candidate terms
is 6434.
In the pre-screening step, 2000 terms are selected roughly
by the importance index. However, this is still too large a
set for EAs to search, so the AP-clustering-based further
selection method is applied. Because clustering is also a
rough term-selection method, 515 important terms are
selected from the 111 automatically clustered classes. Then
EAs are applied to find the approximate model. In the
multi-objective optimization method, 500 validation data
points are sampled, with magnitude between −1.2 and 1.2.
Table 4 lists the model structures acquired by the different
algorithms.
Table 4. Experimental results of Example 2

Algorithm   Model structure         Order   Time(s)
OLS1        1, 12, 20, 263, 1596      7        5.1100
OLS2        1, 12, 20, 239, 302       5      903.0165
GA          1, 12, 20, 50, 239        4     > 7200
TSO         1, 12, 20, 239, 302       5      969.1272
TMO         1, 12, 20, 201, 399       5     1579.3047
The OLS1 result is obtained by applying OLS directly to the
2000 important-term candidates, whereas the OLS2 result is
generated from the candidates obtained after AP clustering;
it coincides with the result of TSO. The numbers in the
model-structure column denote the corresponding monomial
terms listed in Table 5, numbered according to the
importance index (9). The rank of the importance index of
each monomial term within its cluster is also given in
Table 5 when the clustering method is applied: for instance,
1/11 means the term ranks first among all 11 terms of its
class, and "none" means the term was not obtained through
the clustering method. Each selected monomial term is ranked
near the top of its class.
The OLS1 result contains a monomial term of high order, and
is therefore not considered a good model for real-world
application. In contrast, the nonlinear polynomial model
identified by GA is

푦(푡) = 0.9062푢(푡 − 1) − 0.2208푦²(푡 − 3)푢(푡 − 1) − 0.4395푦(푡 − 2)푢(푡 − 1)푢(푡 − 3)
        + 0.0885푦(푡 − 2)푢²(푡 − 2) − 0.5184푦(푡 − 1)푦(푡 − 2)푦(푡 − 3)푢(푡 − 2).

The nonlinear polynomial model identified by TSO and OLS2 is

푦(푡) = 0.9085푢(푡 − 1) − 0.2075푦²(푡 − 3)푢(푡 − 1) − 0.4693푦(푡 − 2)푢(푡 − 1)푢(푡 − 3)
        − 0.4932푦(푡 − 1)푦(푡 − 2)푦(푡 − 3)푢(푡 − 2) + 0.1609푦(푡 − 1)푢(푡 − 2)푢³(푡 − 3).

The nonlinear polynomial model identified by TMO is

푦(푡) = 0.9151푢(푡 − 1) − 0.2998푦²(푡 − 3)푢(푡 − 1) − 0.4218푦(푡 − 2)푢(푡 − 1)푢(푡 − 3)
        − 0.4855푦²(푡 − 1)푦(푡 − 3)푢(푡 − 3) + 0.3563푦(푡 − 1)푦²(푡 − 3)푢(푡 − 2)푢(푡 − 3).
Table 5. Monomial terms for the results of Example 2

No.     Monomial term                            Rank
1       푢(푡 − 1)                                 1/11
12      푦(푡 − 2)푢(푡 − 1)푢(푡 − 3)               2/36
20      푦²(푡 − 3)푢(푡 − 1)                       1/28
50      푦(푡 − 2)푢²(푡 − 2)                       none
201     푦²(푡 − 1)푦(푡 − 3)푢(푡 − 3)              3/21
239     푦(푡 − 1)푦(푡 − 2)푦(푡 − 3)푢(푡 − 2)      4/21
263     푦(푡 − 1)푦(푡 − 3)푢(푡 − 2)푢(푡 − 3)      none
302     푦(푡 − 1)푢(푡 − 2)푢³(푡 − 3)              5/22
399     푦(푡 − 1)푦²(푡 − 3)푢(푡 − 2)푢(푡 − 3)     6/30
1596    푦(푡 − 1)푦(푡 − 2)푢³(푡 − 2)푢²(푡 − 3)    none
Table 6. Comparison of results for Example 2 (mean square error)

                 OLS1      OLS2 and TSO     GA        TMO
training data   11.5621      11.6198      11.9047   11.3497
test data          inf        0.5451       0.8711    0.0951
To test the obtained polynomial models, 800 input-output
data points are sampled as test data, with the input
described by

푢(푡) = 0.8 sin(2휋푡/250) + 0.2 sin(2휋푡/25)    if 푡 ≤ 500
푢(푡) = sin(2휋푡/250)                           otherwise.   (15)
Figure 2 shows the simulated outputs of the polynomial
models obtained by the four algorithms on the test data. The
solid line is the true system output, and the dashed line is
the polynomial model output. Notably, the polynomial model
obtained by TMO represents the system quite well.
To compare the identification results of the four methods,
the one-step-ahead mean square error on the training data
and the full-prediction mean square error on the test data
are calculated. From Table 6, the two-step evolutionary
identification method proposed in this research approximates
the nonlinear system well; in particular, the full
prediction error of the TMO algorithm on the test data is
much lower than that of the other algorithms. Although the
GA method can also generate a good model, as its authors
reported, the search takes an extremely long time.
5. Discussion and Conclusions
In this paper, we propose a two-step identification method
for nonlinear polynomial models selected from a very large
pool of terms. In both numerical examples, the well-known
OLS algorithm could not cope with such a huge candidate pool
directly. Even when OLS is applied after the pre-screening
process, it cannot handle heavily noisy data well, because
it is not a global optimization method. Although the GA
method can find globally optimal solutions, its time cost
over such a large search space is prohibitive. In contrast,
because the two-step identification method combines the
merits of correlation analysis and EA, we