Non uniform noisy data training using Wavelet neural network
based on sampling theory
EHSAN HOSSAINI ASL, MEHDI SHAHBAZIAN, KARIM SALAHSHOOR
Department of Automation and Instrumentation
Petroleum University of Technology
South Khosro, Sattarkhan Street, Tehran, Iran
IRAN
e.hossaini.asl@gmail.com, shahbazian_m@yahoo.com , salahshoor@yahoo.com
Abstract: - Global convergence and overfitting are the main problems in neural network training. One of the newer methods of overcoming these problems is the sampling-theory-based training of wavelet neural networks. In this paper this method is improved for the training of wavelet neural networks on non-uniform, noisy data. The improvements include a method for finding the appropriate feedback matrix and the addition of early stopping and wavelet thresholding to the training procedure. Two experiments are conducted, for a one-dimensional and a two-dimensional function. The results demonstrate the satisfactory performance of this algorithm in reducing the generalization error, reducing the complexity of the wavelet neural network and, above all, avoiding overfitting.

Key-Words: wavelet neural network; sampling theory; overfitting; early stopping; feedback matrix; wavelet thresholding; non-uniform data
1 Introduction
Neural networks have proven to be a powerful tool for modeling nonlinear systems from numerical data [19,20]. Generalization capability is the ultimate criterion for measuring the validity of identified models. Overfitting is one of the most important problems in neural network learning; model complexity and the noise content of the training data are its two major sources.
Wavelet neural networks, which use wavelets as basis functions, have been found to have various interesting properties, including fast training and good generalization performance [2]. Various methods have been proposed for the structure selection and training of wavelet neural networks [1,2]. Recently, training of wavelet neural networks based on sampling theory has been proposed by Zhang [1]. This algorithm is based on the limited frequency band of wavelet networks, in which the input weights are determined by the sampling period or by the frequency band of the target function (if available). This approach has been shown to converge globally and to avoid overfitting for noise-free, equally spaced samples.
In many practical situations only a finite number of samples of the target function is known, and no a priori information about the frequency content or frequency band of the target function is available. Without such information about the target signal (or the noise), relying on the sampling period to find the input weights of the neural network may result in a complex structure and serious overfitting. To overcome this problem, a suitable model selection approach for complexity control and overfitting prevention should be used.
In the case of noisy data, wavelet thresholding and early stopping are two helpful techniques for suppressing overfitting by preventing the noise from being learned by the wavelet neural network. For wavelet thresholding, various methods have been presented by Donoho and Silverman for denoising uniform data [6,7] and non-uniform data [8,9,10,11,21] in the presence of white and colored noise. In wavelet thresholding, the wavelet basis functions whose coefficients are smaller than
specified thresholds are eliminated because they essentially represent noise. This approach has proven to be a powerful method for wavelet-domain denoising. In the early stopping technique, which is a general approach in neural network modeling, the training data are divided into several sets for training the network and validating its generalization capability. In this technique, the training course is stopped at a certain iteration before a minimum training error is reached; the stopping iteration is decided by cross-validation.

Using only the sampling period to determine the input weights of a wavelet neural network may result in a very large number of basis functions and in overfitting. In this article, we show that using wavelet thresholding and/or early stopping in conjunction with the sampling information of the given data results in a less complex network with better noise removal.
The remainder of this article is organized as follows. Section 2 briefly reviews the theory of wavelet networks and their training based on sampling theory. In Section 3, the new approaches developed for improving the training of wavelet neural networks based on sampling theory are presented; this section has two parts, the first explaining a new method for constructing the appropriate feedback matrix and the second describing the developments in the training of the wavelet network. In Section 4, simulation results for both one- and two-dimensional target functions are presented.
2 Wavelet network and sampling
theory
2.1 Review of wavelet neural network
In neural network learning, to take full advantage of the orthonormality of the basis functions and of localized learning, we need a set of basis functions that are both local and orthogonal. Wavelets are a family of localized basis functions that have found many applications across science and engineering [2,3]. Wavelets are universal approximators that can be used to approximate any arbitrary multidimensional nonlinear function. They have many powerful mathematical properties, such as orthonormality, locality in the time and frequency domains, different degrees of smoothness, fast implementations and effective compact support.
Wavelets are usually introduced in the multiresolution framework developed by Mallat [3]. We focus on wavelet networks constructed from a multiresolution analysis (MRA) [3]. Consider a function $f(x)$ in $L^2(\mathbb{R})$, where $L^2(\mathbb{R})$ denotes the vector space of all measurable, square-integrable one-dimensional functions. In addition, let $V_j$ be the vector space containing all possible approximations of $f(x)$ at resolution $j$. Then the ladder of spaces $V_j,\ j\in\mathbb{Z}$, represents the successive resolution levels of $f(x)$. The properties of these spaces are as follows:

1. (Nested) $V_j \subseteq V_{j+1}, \quad \forall j \in \mathbb{Z}$   (1)

2. $f(x) \in V_j \Leftrightarrow f(x-k) \in V_j, \quad \forall (j,k) \in \mathbb{Z}^2$   (2)

3. (Density) $\mathrm{clos}_{L^2}\big(\bigcup_{j\in\mathbb{Z}} V_j\big) = L^2(\mathbb{R})$   (3)

4. (Separation) $\bigcap_{j\in\mathbb{Z}} V_j = \{0\}$   (4)

5. (Scaling) the function $f(x)$ belongs to $V_j$ if and only if the function $f(2^{-j}x)$ belongs to $V_0$   (5)

6. (Basis) There exists a function $\phi \in L^2$ (called a scaling function or a father wavelet), with $\phi_{j,k} = 2^{j/2}\,\phi(2^j x - k)$, such that $\{\phi_{j,k};\ k \in \mathbb{Z}\}$ is a basis for $V_j$.   (6)

The function $\phi$ is called the scaling function of the multiresolution analysis (MRA). A family of scaling functions of the MRA is expressed as:

$$\phi_{j,k}(x) = 2^{j/2}\,\phi(2^j x - k), \qquad j,k \in \mathbb{Z}   \quad (7)$$

where $2^j$ and $k$ correspond to the dilation and translation factors of the scaling function, respectively, while $2^{j/2}$ is an energy normalisation factor.

Let $W_j$ be the orthogonal complement of $V_j$ in $V_{j+1}$ ($V_j \oplus W_j = V_{j+1}$). Then the orthonormal basis functions corresponding to the $W_j$'s, named wavelets and denoted by $\psi_{j,k}$, can be easily
obtained from the $\phi_{j,k}$'s [3]. A family of wavelets may be represented as:

$$\psi_{j,k}(x) = 2^{j/2}\,\psi(2^j x - k), \qquad j,k \in \mathbb{Z}   \quad (8)$$

with $2^j$, $k$ and $2^{j/2}$ being the dilation, translation and normalisation factors of the wavelets, respectively. Next, $L^2(\mathbb{R})$ can be expressed as:

$$L^2(\mathbb{R}) = \overline{\bigcup_{j\in\mathbb{Z}} V_j} = \cdots \oplus W_{-1} \oplus W_0 \oplus W_1 \oplus \cdots = \bigoplus_{j\in\mathbb{Z}} W_j   \quad (9)$$

where $W_j \perp W_m$ for $j \neq m$.

Fig.1 illustrates the relation between the $V_j$ and $W_j$ spaces in the MRA ($V_{j+1} = V_j \oplus W_j$).

Fig.1 Embedded spaces $V_j$ for the multiresolution representation of $L^2(\mathbb{R})$

Equation (9) indicates that the wavelet basis generates a decomposition of the $L^2$ space. It shows that any $L^2$ function can be uniformly approximated by a wavelet series:

$$f(x) = \sum_{j=-\infty}^{+\infty} \sum_{k=-\infty}^{+\infty} d_{j,k}\,\psi_{j,k}(x)   \quad (10)$$

If we start from the approximation of the function at resolution $j=0$, then:

$$f(x) = f_0(x) + \sum_{j=0}^{+\infty} \sum_{k=-\infty}^{+\infty} d_{j,k}\,\psi_{j,k}(x)   \quad (11)$$

where

$$f_0(x) = \sum_{k=-\infty}^{+\infty} a_{0,k}\,\phi_{0,k}(x)   \quad (12)$$

We can conclude that any function $f(x) \in L^2$ can be written as a unique linear combination of wavelets of different resolutions. This means that $f(x) = \cdots + g_{-1}(x) + g_0(x) + g_1(x) + \cdots$, where $g_j(x) \in W_j$ is unique. Since $V_j = W_{j-1} + W_{j-2} + \cdots$ and the spaces $V_j$ can be generated by the scaling function $\phi(x) \in L^2$, there exists

$$f_j(x) = \sum_{k=-\infty}^{\infty} c_{j,k}\,\phi(2^j x - k) = \sum_{k=-\infty}^{\infty} c_{j,k}\,\phi_{j,k}   \quad (13)$$

such that $\| f_j(x) - f(x) \| \rightarrow 0$ as $j \rightarrow \infty$. In fact, formula (13) is just the representation of a wavelet network with three layers. On a compact interval of interest, formula (13) can be written as:

$$f_j(x) = \sum_{k=I_0}^{I_1} c_{j,k}\,\phi_{j,k}(x)   \quad (14)$$

where $\phi_{j,k} = \phi(2^j x - k)$. A wavelet network is realized by taking the $c_{j,k}$'s as the output weights, the $2^j$'s as the input weights and $\phi(x - k)$ as the activation function.
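As a concrete illustration of expansion (14), the following minimal sketch (ours, not the authors' implementation) evaluates a single-resolution wavelet network for an arbitrary scaling function passed in as a callable; the hat function used in the usage lines is only a stand-in for the cardinal scaling function used later in the paper.

```python
import numpy as np

def wavelet_network(x, c, j, k_values, phi):
    """Single-resolution wavelet network of Eq. (14):
    f_j(x) = sum_k c_{j,k} * phi(2**j * x - k)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for c_k, k in zip(c, k_values):
        out += c_k * phi(2.0**j * x - k)   # 2**j: input weight, k: translation
    return out

# Toy usage with a hat function standing in for the scaling function.
hat = lambda u: np.clip(1.0 - np.abs(u), 0.0, None)
x = np.linspace(0.0, 1.0, 5)
print(wavelet_network(x, c=[0.5, 1.0, 0.5], j=2, k_values=[0, 1, 2], phi=hat))
```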
A variety of approaches have been proposed for determining the wavelet network parameters, namely the input weights $2^j$ and the output weights $c_{j,k}$. Here we use the approach based on sampling theory proposed by Zhang [1] for specifying the appropriate resolution $j$.
2.2 Sampling theory
Since we use sampling theory for the training of the wavelet network, we briefly introduce some of its aspects; for a fuller discussion we refer the reader to [17,18]. An analog signal can be discretized simply by recording its sample values $\{f(nT)\}_{n\in\mathbb{Z}}$ at interval $T$. An approximation of $f(x)$ at any $x \in \mathbb{R}$ may be recovered by interpolating these samples.

If the samples are taken with a constant period $T$, the discretized target function is represented as:

$$f_d(x) = \sum_{n=-\infty}^{+\infty} f(nT)\,\delta(x - nT)   \quad (15)$$

The Fourier transform of the discrete signal obtained by sampling $f$ at intervals $T$ is:

$$\hat f_d(w) = \frac{1}{T} \sum_{k=-\infty}^{+\infty} \hat f\!\left(w - \frac{2\pi k}{T}\right)   \quad (16)$$

If the support of $\hat f$ is included in $[-\pi/T,\ \pi/T]$, then

$$f(x) = \sum_{k=-\infty}^{+\infty} f(kT)\,\mathrm{sinc}\!\left(\frac{x - kT}{T}\right)   \quad (17)$$
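The reconstruction formula (17) is easy to exercise numerically. The sketch below is only an illustration of the interpolation step and assumes the normalised sinc kernel $\mathrm{sinc}(u) = \sin(\pi u)/(\pi u)$, which is what numpy.sinc computes; the sample signal in the usage lines is arbitrary.

```python
import numpy as np

def sinc_interpolate(samples, T, x):
    """Reconstruct a band-limited signal from its uniform samples f(kT)
    via Eq. (17): f(x) = sum_k f(kT) * sinc((x - kT) / T)."""
    k = np.arange(len(samples))
    x = np.asarray(x, dtype=float)
    u = (x[:, None] - k[None, :] * T) / T       # (x - kT) / T for every pair
    return np.sinc(u) @ np.asarray(samples, dtype=float)

# Usage: samples of sin(t) taken every T = 0.5, evaluated off the grid.
T = 0.5
samples = np.sin(np.arange(0.0, 10.0, T))
print(sinc_interpolate(samples, T, x=np.array([0.25, 1.3, 7.77])))
```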
On the other hand, the frequency band of the wavelet network described in the previous section satisfies:

$$\int_{-\infty}^{\infty} \big|\tilde f_j(w)\big|^2\,dw \;\le\; \int_{-2^j b_w}^{2^j b_w} \big|\tilde f_j(w)\big|^2\,dw \;+\; 2^{-j}\varepsilon \sum_k \big|c_{j,k}\big|^2   \quad (18)$$

So the energy of the wavelet network is well concentrated in the frequency band

$$\big[-2^j b_w,\ 2^j b_w\big]   \quad (19)$$

The parameter $b_w$ depends only on the scaling function. Formula (19) means that the frequency band of the wavelet network can be controlled by the input weights.
Suppose the $\{kT, f_s(kT)\}$ are training data with $\sum_k |f_s(kT)|^2 < +\infty$. Then, by the sampling theorem, there exists a unique function $f(x)$ in the Paley-Wiener space $P_T$ that interpolates all of the training data. On the other hand, a wavelet network is a function in $L^2(\mathbb{R})$, so a wavelet network represents a function in $P_T$ if its Fourier transform has support included in $[-\pi/T,\ \pi/T]$.

This means that a network $f_j(x)$ whose Fourier transform has support in $[-\pi/T,\ \pi/T]$ is complex enough to recover a band-limited function. Since the regularity of a function is related to the asymptotic decay of its Fourier transform, a band-limited function is always "smoother" than other functions. Consequently, the training results obtained when the frequency band of the wavelet network is limited to the interval $[-\pi/T,\ \pi/T]$ are always more regular than those obtained when the band is limited to other intervals, so in our algorithm we limit the frequency band of the wavelet network to $[-\pi/T,\ \pi/T]$. According to (18), the input weights can then be calculated from:

$$2^j = \frac{\pi}{b_w\,T}   \quad (20)$$

This formula is a consequence of a distinguishing property of the wavelet (scaling) function, namely that its energy is well localized in the frequency domain.
To construct the structure of the wavelet network, the energy concentration of the wavelet in the time domain should be exploited. In the wavelet network, the $k$-th node has the following input-output function:

$$S_{out} = \phi(2^j S_{in} - k)   \quad (21)$$

where $S_{in}$ is the input, the $2^j$'s are the input weights, $k$ is the threshold (translation) of the $k$-th node and $\phi(\cdot)$ is the scaling function. If the support of the scaling function is limited to $[0,\ N_\phi]$, then the $k$-th node of the network has the support:

$$\big[2^{-j}k,\ 2^{-j}(N_\phi + k)\big]   \quad (22)$$

Assuming the domain of interest for estimating the function is the interval $[a,b]$, the translations are found from:

$$2^j a - N_\phi \le k \le 2^j b   \quad (23)$$
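Formulas (20) and (23) together determine the structure of a single sub-wavelet network. The following sketch, under our own assumptions (the resolution $j$ is rounded up to an integer, and the values of $b_w$ and $N_\phi$ used in the example call are placeholders rather than values from the paper), computes the resolution and the admissible translations.

```python
import numpy as np

def network_structure(T, b_w, a, b, N_phi):
    """Structure selection from the sampling period.
    Eq. (20): 2**j = pi / (b_w * T)  ->  j = log2(pi / (b_w * T))
    Eq. (23): 2**j * a - N_phi <= k <= 2**j * b"""
    j = int(np.ceil(np.log2(np.pi / (b_w * T))))    # round up so the band [-pi/T, pi/T] is covered
    k_min = int(np.ceil(2.0**j * a - N_phi))
    k_max = int(np.floor(2.0**j * b))
    return j, list(range(k_min, k_max + 1))

# Example: sampling period 0.5 on the interval [-10, 10] (b_w and N_phi are placeholders).
j, ks = network_structure(T=0.5, b_w=1.0, a=-10.0, b=10.0, N_phi=4)
print(j, len(ks))
```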
Many methods have been proposed for training the output weights of a wavelet neural network by minimizing the error function:

$$J(f_s, f) = \sum_{n=1}^{N} \big| f_s(x_n) - f(x_n) \big|^2   \quad (24)$$

where the $\{x_n, f_s(x_n)\}$ are the samples and the $f(x_n)$ are the outputs of the approximator. Without any additive term, this cost function is widely used in network training because of its convenient implementation.

Three commonly used methods are the direct solution method, the iterative method and the inner product method. In this paper, the iterative method is employed for training the output weights. In this method the output weights are calculated as follows:

$$E^{(k+1)} = F_s - \Phi_{N\times M}\, C^{(k+1)}   \quad (25)$$

$$C^{(k+1)} = C^{(k)} + A\, E^{(k)}   \quad (26)$$

The column vector $E^{(k)}$ denotes the interpolation error of the wavelet network at the $k$-th iteration, the column vector $C^{(k)}$ represents the output weights at the $k$-th iteration and the matrix $A$ is the feedback matrix. The values of the elements of the feedback matrix indicate how strongly the error at each data point affects the output weights. The matrix $\Phi_{N\times M}$ is

$$\Phi_{N\times M} = \begin{bmatrix} \phi_{j,I_0}(x_1) & \cdots & \phi_{j,k}(x_1) & \cdots & \phi_{j,I_1}(x_1) \\ \vdots & & \vdots & & \vdots \\ \phi_{j,I_0}(x_N) & \cdots & \phi_{j,k}(x_N) & \cdots & \phi_{j,I_1}(x_N) \end{bmatrix}   \quad (27)$$

where the subscripts $I_0 = \lceil 2^j a - N_\phi \rceil$ and $I_1 = \lfloor 2^j b \rfloor$ denote, respectively, the minimum and the maximum of the translation $k$ obtained from (23).
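The iteration (25)-(26) is straightforward to code. The sketch below is a minimal illustration of that update loop (not the authors' implementation); it assumes $\Phi$, the sample vector and the feedback matrix have already been built.

```python
import numpy as np

def train_output_weights(Phi, f_s, A, n_iter=100):
    """Iterative output-weight training of Eqs. (25)-(26).
    Phi : (N, M) scaling-function values at the data points, Eq. (27)
    f_s : (N,)   target samples F_s
    A   : (M, N) feedback matrix"""
    C = np.zeros(Phi.shape[1])
    E = f_s - Phi @ C                     # initial interpolation error
    mse_history = []
    for _ in range(n_iter):
        C = C + A @ E                     # Eq. (26)
        E = f_s - Phi @ C                 # Eq. (25)
        mse_history.append(np.mean(E**2))
    return C, mse_history
```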
The structure of a wavelet network with two sub-wavelet networks is depicted in Fig.2.

Fig.2 Structure of a wavelet neural network with two sub-networks

Zhang [1] has shown that a feedback matrix can always be found that makes the iterative course converge to a fixed point. However, there is no unique method for constructing the feedback matrix. In the next section, an intuitive method for finding this matrix is proposed.
3 Modifications in the training of
wavelet network
We propose three modifications that improve the training of the wavelet network; they are discussed in the following sections.
3.1 Determination of appropriate
feedback matrix
In this part, an intuitive approach for finding the appropriate feedback matrix is proposed. In this method the feedback matrix is constructed from the $\Phi$ matrix, using the receptive field of each node (scaling function) in the wavelet network. Each output weight is trained only with the training data that lie in the receptive field of its node. However, these training data should not all have the same influence: training points at which the scaling function takes a larger value should have a stronger effect on the training of the output weight.

On the other hand, the elements of the feedback matrix represent the effect of each training point on the output weight of each node. Since the values of the $\Phi$ matrix give the amplitude of the scaling function at each data point, the feedback matrix should be constructed from the $\Phi$ matrix. We use the fourth-order cardinal scaling function, shown in Fig.3, as the activation function.
Fig.3 Fourth order cardinal scaling function at scale=1
According to Fig.3, the scaling function has its largest amplitude in the vicinity of zero; training data lying in this vicinity therefore have the strongest effect on training the corresponding output weight. We can thus define different levels of effect for the training data at each node. The procedure for finding the appropriate feedback matrix is as follows (a code sketch is given after the list):

1. Generate the raw feedback matrix: $A = \Phi^{T}$.

2. Define the levels of effect by partitioning the amplitude of the scaling function at levels $K_1, K_2, \ldots$ as depicted in Fig.3 (about 2 or 3 levels is appropriate).

3. Assign the values $a$, $b$ and $c$ that represent the value of effect for each level. These values can be found by trial and error.
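The following is a minimal sketch of our reading of this procedure: the raw matrix $\Phi^{T}$ is quantised into bands of amplitude, and each band is replaced by its value of effect. The settings in the usage lines are those of the first panel of Fig.4; the random $\Phi$ is only a stand-in.

```python
import numpy as np

def feedback_matrix(Phi, levels, values):
    """Build the feedback matrix from the Phi matrix (Section 3.1).
    Phi    : (N, M) scaling-function values at the data points
    levels : descending levels of effect, e.g. [K1, K2]
    values : values of effect per band, e.g. [a, b, c]"""
    A = np.full(Phi.shape, values[-1], dtype=float)          # smallest effect everywhere
    # Assign bands from the weakest upwards so stronger bands overwrite weaker ones.
    for level, value in zip(reversed(levels), reversed(values[:-1])):
        A[Phi >= level] = value
    return A.T                                               # shape (M, N), as in A = Phi^T

# Usage with the settings of the first panel of Fig.4; Phi is only a random stand-in.
Phi = np.random.rand(50, 8)
A = feedback_matrix(Phi, levels=[0.4, 0.1], values=[0.3, 0.01, 0.001])
```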
The performance of the algorithm is illustrated by training a sinusoidal function. In this example, different combinations of levels of effect and values of effect are applied during the training course, and the convergence of the four resulting training courses is compared. The results are shown in Fig.4.

Fig.4 Comparison of MSE versus number of iterations for different feedback matrices (panels, top to bottom: $K_1=0.4,\ K_2=0.1,\ a=0.3,\ b=0.01,\ c=0.001$; $K_1=0.3,\ K_2=0.1,\ a=0.3,\ b=0.01,\ c=0.001$; $K_1=0.4,\ K_2=0.1,\ a=0.4,\ b=0.01,\ c=0.001$; $K_1=0.3,\ a=0.3,\ b=0.01,\ c=0.001$)

Comparing the four panels of Fig.4 shows that, with an appropriate assignment of the levels of effect and values of effect, the MSE converges to zero as the number of iterations increases.
3.2 Improvements in the algorithm for training on non-uniform data

For uniformly sampled data, training the wavelet neural network based on sampling theory gives
quite acceptable results. However, for non-uniform data the algorithm encounters severe problems such as a high overfitting error and deviation of the estimated function from the actual target function. Here we use some available techniques to overcome this problem; they are described in detail in the following sections.
3.2.1 Optimum number of sub-wavelet networks

For training on non-uniform data, the domain of interest for the estimation is divided into clusters in which the sampling rate is approximately uniform. The criterion proposed by Zhang in [1] is:

$$D = \sup_{k\in\mathbb{Z}} |x_k - kT| < 0.25\,T   \quad (28)$$

In this formula, $x_k$ denotes the $k$-th training point, $T$ denotes the approximate sampling rate and $D$ denotes the maximum distance between a training point in the cluster and the corresponding point of the uniform sampling grid. The simulations show that, because this criterion generates a multitude of sub-wavelet networks, the noise content is learned by the wavelet network, which causes an overfitting error in the estimation. The simulations also show that a better criterion for generating the clusters over the domain of interest is:

$$D = \sup_{k\in\mathbb{Z}} |x_k - kT| < 0.99\,T   \quad (29)$$

Using formula (29), the number of sub-wavelet networks is reduced, which results in less noise content in the estimate produced by the wavelet neural network.
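The sketch below is our own reading of how criterion (29) can be used to split a non-uniform data set into clusters (one sub-wavelet network per cluster); the grouping strategy, the choice of $T$ and the data in the usage lines are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def split_into_clusters(x, T, factor=0.99):
    """Group sorted sample locations into clusters: the k-th point of a
    cluster must lie within factor*T of the k-th node of a uniform grid
    of spacing T anchored at the first point of the cluster (Eq. (29))."""
    x = np.sort(np.asarray(x, dtype=float))
    clusters, current = [], [x[0]]
    for xi in x[1:]:
        k = len(current)                          # grid index this sample would take
        if abs(xi - (current[0] + k * T)) < factor * T:
            current.append(xi)
        else:                                     # too far from the grid: start a new cluster
            clusters.append(np.array(current))
            current = [xi]
    clusters.append(np.array(current))
    return clusters

# Usage: 200 non-uniform points on [-10, 10]; T is taken as roughly the average spacing.
x = np.random.uniform(-10.0, 10.0, 200)
groups = split_into_clusters(x, T=0.1, factor=0.99)
print(len(groups))
```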
The performance of the newly proposed criterion is illustrated by the training of an exponential function in Fig.5.

Fig.5 Comparison of the estimates obtained with the two criteria (panels, top to bottom: noise-free target function; estimate of the sub-wavelet networks using $D < 0.25T$; estimate of the sub-wavelet networks using $D < 0.99T$)
3.2.2 Application of wavelet thresholding
Wavelet thresholding is an effective way of removing noise content from the training data [6,7,8].

Hard and soft thresholding can be employed for this purpose. Hard thresholding is the usual process of setting to zero the output weights whose absolute values are lower than the threshold. Soft thresholding is an extension of hard thresholding: it first sets to zero the output weights whose absolute values are lower than the threshold and then shrinks the nonzero weights towards 0. Fig.6 depicts a ramp signal thresholded at an amplitude of 0.4.
Fig.6 Comparison between hard and soft thresholding
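The two rules are summarised in the short sketch below (a generic illustration, not taken from the paper); the threshold 0.009 in the usage lines is the value chosen later in Section 4, while the coefficient vector is arbitrary.

```python
import numpy as np

def hard_threshold(c, thr):
    """Zero every coefficient whose magnitude is below the threshold."""
    return np.where(np.abs(c) < thr, 0.0, c)

def soft_threshold(c, thr):
    """Zero small coefficients and shrink the remaining ones towards 0."""
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)

# Usage: thresholding a set of trained output weights at 0.009.
c = np.array([0.5, -0.004, 0.02, -0.0005, 0.15])
print(hard_threshold(c, 0.009))
print(soft_threshold(c, 0.009))
```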
There are several approaches for determining the value of the threshold. Stein's unbiased risk estimate (SURE), the fixed-form threshold, a mixture of these two methods and the minimax estimation principle are four methods presented in the literature [6,7,8,9,10,11].

In addition, there are different types of noise, such as white noise, unscaled white noise and non-white noise, which can be treated by different variants of wavelet thresholding. These wavelet thresholding methods can be added to the training procedure of the wavelet network to reduce the effect of overfitting. In the simulations (Section 4) it is shown that defining a threshold improves the estimation performance.
3.2.3 Applying early stopping

The early stopping technique is widely used in neural network training to reduce the effect of noise and overfitting. With this technique, the training course stops as soon as the test error begins to increase. In the simulations (Section 4) it is shown that this technique strongly reduces the overfitting error.
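Combining early stopping with the iterative update (25)-(26) gives a training loop like the minimal sketch below; this is our own illustration of the technique, not the authors' code, and it assumes a separate test set is available.

```python
import numpy as np

def train_with_early_stopping(Phi_tr, f_tr, Phi_te, f_te, A, max_iter=200):
    """Iterate Eqs. (25)-(26) and stop as soon as the test MSE rises."""
    C = np.zeros(Phi_tr.shape[1])
    best_C, best_test_mse = C.copy(), np.inf
    for it in range(max_iter):
        E = f_tr - Phi_tr @ C                 # Eq. (25): training error
        C = C + A @ E                         # Eq. (26): weight update
        test_mse = np.mean((f_te - Phi_te @ C) ** 2)
        if test_mse > best_test_mse:          # test error starts to increase: stop
            return best_C, it
        best_C, best_test_mse = C.copy(), test_mse
    return best_C, max_iter
```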
4 Simulation
In this section we demonstrate the performance of the modifications proposed above. The section has two parts: in the first, a one-dimensional target function is employed, and in the second a two-dimensional target function is used.

4.1 Learning a one-dimensional function

4.1.1 The target function

For comparison purposes, we use the function used in [1]. The target function is:
$$f_s(t) = \sum_{i=-3}^{3} e^{-(t-i)^2}   \quad (30)$$
The interval $[-10, 10]$ is taken as the domain of interest. The target function is shown in Fig.7.

Fig.7 Target function
To produce non-uniform data, sample locations with a uniform random distribution are used. A noise function $n(t)$ with a Gaussian distribution is added to the training data. The noise has zero mean and a variance that varies with time:

$$\sigma^2(t) = \big(0.005 + 0.009\,|t|\big)^2   \quad (31)$$

The noise function is depicted in Fig.8.
Fig.8 The noise function
Now the training data are generated. For
generating the test data, we use the same
method.
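For concreteness, the data generation can be sketched as below. This follows the reconstruction of Eq. (30) given above and the noise model (31); the sample count and the random seed are arbitrary choices of ours, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(t):
    """Target function of Eq. (30): a sum of Gaussian bumps."""
    return sum(np.exp(-(t - i) ** 2) for i in range(-3, 4))

def make_data(n=400):
    """Non-uniform samples on [-10, 10] with noise of variance (0.005 + 0.009|t|)^2."""
    t = np.sort(rng.uniform(-10.0, 10.0, n))
    sigma = 0.005 + 0.009 * np.abs(t)
    return t, target(t) + rng.normal(0.0, sigma)

t_train, y_train = make_data()
t_test, y_test = make_data()   # test data generated the same way
```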
At each iteration, the wavelet network is evaluated on the test data, and if the mean square error on the test data begins to increase, the training course is stopped. The objective of this simulation is to compare the performance of the method of [1] with the improved method described in the previous sections. First, the appropriate feedback matrix for the training is found.
4.1.2 Results of training
After some iterations, the values obtained for the levels of effect and the values of effect are:

$$k_1 = 0.4,\quad k_2 = 0.1,\qquad a = 0.3,\quad b = 0.01,\quad c = 0   \quad (32)$$

In the simulations, the method described in [1] is called method I and the improved method described in this paper is called method II. Method II comprises the reduction of the number of sub-wavelet networks, the application of wavelet thresholding and the addition of early stopping to the training procedure. The structures of the wavelet networks generated by the two methods are shown in Table 1.
Table 1. Comparison of the two wavelet networks

            No. of sub-wavelet networks   No. of nodes
Method I                 81                    413
Method II                 7                     72
Table 2. Error statistics of the two methods

            Maximum absolute error   Mean absolute error   Root mean square error
Method I          0.20019                 0.043885               0.058885
Method II         0.11919                 0.026758               0.036794
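The three statistics reported in Tables 2 and 4 can be computed as in the trivial sketch below (our own helper, shown only to make the reported quantities precise).

```python
import numpy as np

def error_statistics(f_true, f_est):
    """Maximum absolute error, mean absolute error and RMSE over the test points."""
    err = np.abs(np.asarray(f_true, dtype=float) - np.asarray(f_est, dtype=float))
    return {"max_abs_error": err.max(),
            "mean_abs_error": err.mean(),
            "rmse": np.sqrt(np.mean(err ** 2))}
```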
According to Table 1, the wavelet network of method II is much smaller, both in the number of sub-wavelet networks and in the number of nodes. The training course of method I stopped after 200 iterations, whereas in method II it stopped after 8 iterations. The wavelet threshold value chosen in method II is 0.009. The estimation results are depicted in Fig.9 and Fig.10. They indicate that, in addition to the reduction in the number of sub-wavelet networks, the number of training iterations and the overfitting error are significantly reduced. Comparing the two figures, it is clear that, besides the noise reduction, the function estimated by method II is smoother than that of method I.
Fig.9 Estimation of wavelet network with method I
Fig.10 Estimation of wavelet network with method II
4.2 Learning a two-dimensional function

4.2.1 The target function

The target function used in this training course is defined on the domain $[-10,10]\times[-10,10]$ and is:
$$f_s = e^{-0.25(x^2+y^2)} + e^{-0.25\left(\sqrt{x^2+y^2}\,-\,1\right)^2} + e^{-0.25\left(\sqrt{x^2+y^2}\,-\,2\right)^2}   \quad (33)$$
The noise function has a Gaussian distribution with zero mean and a variance that varies with position:

$$\sigma^2 = 10^{-4} + 10^{-3}\rho^2   \quad (34)$$

The target function and the noise function are shown in Fig.11 and Fig.12.

Fig.11 The target function

Fig.12 The noise function
-10 -8 -6 -4 -2 0 2 4 6 8 10
-0.2
0
0.2
0.4
0.6
0.8
1
1.2
-10 -8 -6 -4 -2 0 2 4 6 8 10
-0.2
0
0.2
0.4
0.6
0.8
1
1.2
-10 -5 0510
-10
-5
0
5
10
0
0.2
0.4
0.6
0.8
1
x
y
f
s
(x,y)
z
-10 -5 0510
-10
-5
0
5
10
-0.5
0
0.5
x
y
n(x,y)
z
Table 4. Error statistics of the two methods

            Maximum absolute error   Mean absolute error   Root mean square error
Method I          0.45125                 0.060125               0.080736
Method II         0.33816                 0.045253               0.062437
4.2.2 Results of training
After trial and error, the levels of effect and the values of effect of the feedback matrix are found to be:

$$k_1 = 0.35,\qquad a = 0.22,\quad b = 0   \quad (35)$$

Table 3 compares the structures of the wavelet networks generated by the two methods.
Table 3. Comparison of the two wavelet networks

            No. of sub-wavelet networks   No. of nodes
Method I                 81                  170569
Method II                10                    7114
According to Table 3, the wavelet network generated by method II is much smaller than the first one. This leads to less computation and a correspondingly shorter training course.

Fig.13 to Fig.16 depict the results of the two methods. As these figures show, the results of method II contain less noise, so its performance in rejecting noise is better than that of the method I of [1].

Fig.13 Estimation of wavelet network with method I

Fig.14 Error surface of method I

Fig.15 Estimation of wavelet network with method II

Fig.16 Error surface of method II
The results of method I were obtained after 100 iterations, whereas those of method II were obtained after 15 iterations. The wavelet threshold value chosen in this training course is 0.009. Table 4 also shows that the maximum absolute error, mean absolute error and root mean square error are all decreased by reducing the number of sub-wavelet networks and the number of nodes and by applying the early stopping technique. Therefore, method II prevents the wavelet neural network from over-training on the noise content of the target function during the training course.
5. Conclusion
Training a wavelet neural network based on sampling theory has been shown to perform well for uniform data. For non-uniform noisy data, however, it encounters severe problems such as overfitting, a large number of sub-wavelet networks and a long training course. This paper has proposed modifications to this approach to improve its performance: a new method for determining the feedback matrix, a reduction of the number of sub-wavelet networks and the use of complexity-control techniques, namely wavelet thresholding and early stopping, to overcome overfitting. The simulation results show that applying these modifications yields a smaller wavelet network with faster training and less overfitting. The main remaining problem of wavelet networks based on sampling theory is that, in practice, no information about the noise is used in determining the input weights. We therefore recommend further research on including information about the target function and/or the noise in the determination of the input and output weights of the wavelet network.
References:
[1] Zhiguo Zhang, Learning algorithm of wavelet network based on sampling theory, Neurocomputing, January 2007.
[2] Daubechies I., Orthonormal Bases of
Compactly Supported Wavelets, Com. Pure
Appl. Math., Vol.XLI, 1988, pp.909-996.
[3] S.G. Mallat, A theory for multi-resolution
signal decomposition: the wavelet
representation, IEEE Trans. Pattern Analysis
Mach. Int., Vol.11, No.7, 1989, pp.674-693.
[4] B.R. Bakshi, G. Stephanopoulos, Wave-net:
a multiresolution hierarchical neural network
with localized learning, AIChE Journal,Vol.39,
No.1, 1993, pp.57-81.
[5] A.A. Safavi, Wavelet neural networks and
multiresolution analysis with applications to
process systems engineering, PhD thesis,
Department of Chemical Engineering, The
University of Sydney, Australia, January 1996.
[6] D.L. Donoho, De-noising by soft-thresholding, IEEE Trans. Inform. Theory, Vol.41, 1995, pp.613-627.
[7] D.L.Donoho, I.M.Johnstone, Ideal spatial
adaptation by wavelet shrinkage, Biometrika 81 ,
1994, pp.425-455.
[8] D.L. Donoho, I.M. Johnstone, Adapting to unknown smoothness via wavelet shrinkage, J. Amer. Statist. Assoc., Vol.90, 1995, pp.1200-1224.
[9] D.L. Donoho, I.M.Johnstone, Minimax
estimation via wavelet shrinkage, Ann. Statist.,
Vol.26, 1998, pp.879-921.
[10] A.Kovac, B.W.Silverman, Extending the
scope of wavelet regression methods by
coefficient-dependent thresholding, J. Amer.
Statist. Assoc., Vol.95, 2000, pp.172-183.
[11] M. Jansen, G.Nason, B. Silverman,
Scattered data smoothing by empirical Bayesian
shrinkage of second generation wavelet
coefficients, IX Proceedings of the SPIE,
vol.4478, 2001, pp.87-97.
[12] I.M.Johnstone, B.W. Silverman, Needles
and straw in haystacks: empirical Bayes
estimates of possibly sparse sequences, Ann.
Statist., Vol.32, No.4, 2004, pp.1594-1649.
[13] I.M.Johnstone, B.W. Silverman, Empirical
Bayes selection of wavelet thresholds, Ann.
Statist., Vol.33, 2005.
[14] H.Q.Thuan, S. Rudy, Effective neural
network pruning using crossvalidation, in:
Proceedings of the International Joint
Conference on Neural Networks, 2005, pp. 972–
977.
[15] Van Der, N.T.Merwe, A.J. Hoffman,
Developing an efficient cross validation strategy
to determine classifier performance (CVCP),
Proc. Int. Jt. Conf. Neural Networks, Vol.3
2001, pp.1663–1668.
[16] C.E.Vasios, G.K.Matsopoulos, E.M.
Ventouras, K.S. Nikita, N.Uzunoglu, Cross-
validation and neural network architecture
selection for the classification of intracranial
current sources, Seventh Seminar on Neural
Network Applications in Electrical Engineering,
Proceedings NEUREL, 2004, pp. 151–158.
[17] F. Marvasti (Ed.), Nonuniform Sampling: Theory and Practice, Kluwer Academic/Plenum Publishers, Dordrecht/New York, 2001.
[18] J. Zhang, G.G. Walter, Y. Miao, Wavelet neural networks for function learning, IEEE Trans. Signal Process., Vol.43, No.6, 1995, pp.1485-1496.
[19] Z. Zainuddin, O. Pauline, Function
approximation using artificial neural networks,
WSEAS Trans. On Mathematics, 2008.
[20] P. Radonja, S. Stankovic, D. Drazic, B.
Matovic, Development and Construction of
Generalized Process Models by using Neural
Networks and Multiple Linear Regression,
WSEAS Trans. On systems, Vol.6, No.287,
2007.
[21] V. Niola, R. Oliviero, G. Quaremba, A
neural network application for signal denoising,
WSEAS Trans. on signal processing, Vol.2,
No.88, 2006.