Non-uniform noisy data training using wavelet neural network based on sampling theory
EHSAN HOSSAINI ASL, MEHDI SHAHBAZIAN, KARIM SALAHSHOOR
Department of Automation and Instrumentation
Petroleum University of Technology
South Khosro, Sattarkhan Street, Tehran, IRAN
e.hossaini.asl@gmail.com, shahbazian_m@yahoo.com , salahshoor@yahoo.com
Abstract: - Global convergence and overfitting are the main problems in neural network training. One of the newer methods to overcome these problems is sampling theory applied to the training of wavelet neural networks. In this paper this method is improved for training wavelet neural networks on non-uniform and noisy data. The improvements include a method for finding an appropriate feedback matrix and the addition of early stopping and wavelet thresholding to the training procedure. Two experiments are conducted, for a one- and a two-dimensional function. The results establish the satisfactory performance of this algorithm in reducing the generalization error, reducing the complexity of the wavelet neural network and, above all, avoiding overfitting.
Key-Words: wavelet neural network; sampling theory; overfitting; early stopping; feedback matrix; wavelet thresholding; non-uniform data
1 Introduction
Neural networks have proven to be a powerful tool for modeling nonlinear systems from numerical data [19,20]. Generalization capability is the ultimate criterion for measuring the validity of identified models, and overfitting is one of the most important problems in neural network learning. The complexity of the model and the noise content of the training data are the two major sources of this problem.
Wavelet neural networks, which use wavelets as basis functions, have been found to have various interesting properties, including fast training and good generalization performance [2]. Various
methods have been proposed for structure
selection and training of wavelet neural
networks [1,2]. Recently, training of wavelet neural networks based on sampling theory has been proposed by Zhang [1]. This algorithm is based on the limited frequency band of wavelet networks, in which the input weights are determined by the sampling period or by the frequency band of the target function (if available). This approach has
been shown to have global convergence and
avoid overfitting for non-noisy equi-spaced
samples.
In many practical situations, only a finite number of samples of the target function is known and there is no a priori information about the frequency content and frequency band of the target function. Without using information about the target signal (or the noise), relying on the sampling period to find the input weights of the neural network may result in a complex structure and serious overfitting. To overcome this problem, a suitable model selection approach for complexity control and prevention of overfitting should be used.
In the case of noisy data, wavelet thresholding and early stopping are two helpful techniques for suppressing overfitting by preventing the noise from being trained into the wavelet neural network. In the wavelet thresholding technique, various methods have been presented by Donoho and Silverman for denoising uniform data [6,7] and non-uniform data [8,9,10,11,21] in the case of white and colored noise. In these techniques, the wavelet basis functions with coefficients smaller than
specified thresholds will be eliminated because
they essentially represent noise. This approach
has been proven to be a powerful method in
wavelet domain denoising. In the early stopping
technique, which is a general approach in neural
network modeling, the training data are divided
into several sets for training of networks and
validation of generalization capability. In this
technique, the training course is stopped at a certain iteration, before the training error reaches its minimum. The stopping iteration is decided by cross-validation.
Using only the sampling period to determine the input weights of a wavelet neural network may result in a very large number of basis functions and overfitting. In this article, we show that using wavelet thresholding and/or early stopping in conjunction with the sampling information of the given data results in a less complex network with better noise removal.
This article is organized as follows. Following this introduction, section 2 briefly reviews the theory of wavelet networks and their training based on the sampling theory. In section 3, new approaches developed for improving wavelet neural network training based on sampling theory are presented. This section includes two parts: the first explains a new method for constructing an appropriate feedback matrix and the second explains developments in training the wavelet network. In section 4, simulation results for both one- and two-dimensional target functions are presented.
2 Wavelet network and sampling
theory
2.1 Review of wavelet neural network
In neural network learning, in order to take full advantage of the orthonormality of the basis functions and of localized learning, we need a set of basis functions that are local and orthogonal. Wavelets are a family of localized basis functions that have found many applications in large areas of science and engineering [2,3]. Wavelets are universal approximators that can be used to approximate any arbitrary multidimensional nonlinear function. They have many powerful mathematical properties such as orthonormality, locality in time and frequency domains, different degrees of smoothness, fast implementations, and effective compact support.
Wavelets are usually introduced in a
multiresolution framework developed by Mallat
[3]. We focus on the wavelet networks
constructed from a multiresolution analysis
(MRA) [3]. Consider a function $f(x)$ in $L^2(\mathbb{R})$, where $L^2(\mathbb{R})$ denotes the vector space of all measurable, square-integrable one-dimensional functions. In addition, let $V_j$ be the vector space containing all possible approximations of $f(x)$ at resolution $j$. Then the ladder of spaces $V_j$, $j \in \mathbb{Z}$, represents the successive resolution levels for $f(x)$. The properties of these spaces are as follows:

1. (Nested) $V_j \subset V_{j+1}$, $j \in \mathbb{Z}$   (1)

2. (Translation) $f(x) \in V_j \Leftrightarrow f(x - 2^{-j}k) \in V_j$, $(j,k) \in \mathbb{Z}^2$   (2)

3. (Density) $\overline{\bigcup_{j\in\mathbb{Z}} V_j} = L^2(\mathbb{R})$   (3)

4. (Separation) $\bigcap_{j\in\mathbb{Z}} V_j = \{0\}$   (4)

5. (Scaling) the function $f(x)$ belongs to $V_j$ if and only if the function $f(2^{-j}x)$ belongs to $V_0$   (5)

6. (Basis) There exists a function $\phi \in L^2(\mathbb{R})$ (called a scaling function or a father wavelet), with $\phi_{j,k}(x) = 2^{j/2}\,\phi(2^j x - k)$, such that $\{\phi_{j,k};\ k \in \mathbb{Z}\}$ is a basis for $V_j$   (6)
The function $\phi$ is called a scaling function of the multiresolution analysis (MRA). A family of scaling functions of the MRA is expressed as:

$\phi_{j,k}(x) = 2^{j/2}\,\phi(2^j x - k)$, $j,k \in \mathbb{Z}$   (7)

where $2^j$ and $k$ correspond to the dilation and translation factors of the scaling function, respectively, while $2^{j/2}$ is an energy normalization factor.
Let $W_j$ be the orthogonal complement of $V_j$ in $V_{j+1}$ ($V_j \oplus W_j = V_{j+1}$). Then the orthonormal basis functions corresponding to the $W_j$s, named wavelets and denoted by $\psi_{j,k}$, can be easily
obtained from the $\phi_{j,k}$s [3]. A family of wavelets may be represented as:

$\psi_{j,k}(x) = 2^{j/2}\,\psi(2^j x - k)$, $j,k \in \mathbb{Z}$   (8)

with $2^j$, $k$ and $2^{j/2}$ being the dilation, translation and normalization factors of the wavelets, respectively. Next, $L^2(\mathbb{R})$ can be expressed as:

$L^2(\mathbb{R}) = \overline{\bigcup_{j\in\mathbb{Z}} V_j} = \cdots \oplus W_{-1} \oplus W_0 \oplus W_1 \oplus \cdots = \bigoplus_{j\in\mathbb{Z}} W_j$   (9)

where $W_j \perp W_m$ for $j \neq m$.
Fig.1 illustrates the relation between the $V_j$ and $W_j$ spaces in the MRA, where $V_{j+1} = V_j \oplus W_j$.
Fig.1 Embedded spaces $V_j$ for the multiresolution representation of $L^2(\mathbb{R})$
Equation (9) indicates that the wavelet basis generates a decomposition of the $L^2(\mathbb{R})$ space. It shows that any $L^2(\mathbb{R})$ function can be uniformly approximated by a wavelet series:

$f(x) = \sum_{j=-\infty}^{+\infty} \sum_{k=-\infty}^{+\infty} d_{j,k}\,\psi_{j,k}(x)$   (10)

If we start from the approximation of the function at resolution $j = 0$, then:

$f(x) = f_0(x) + \sum_{j=0}^{+\infty} \sum_{k=-\infty}^{+\infty} d_{j,k}\,\psi_{j,k}(x)$   (11)

where

$f_0(x) = \sum_{k=-\infty}^{+\infty} a_{0,k}\,\phi_{0,k}(x)$   (12)

We can conclude that any function $f(x) \in L^2(\mathbb{R})$ can be written as a unique linear combination of wavelets of different resolutions. This means that $f(x) = \cdots + g_{-1}(x) + g_0(x) + g_1(x) + \cdots$, where each $g_j(x) \in W_j$ is unique. Since $V_j = W_{j-1} \oplus W_{j-2} \oplus \cdots$ and the spaces $V_j$ can be generated by the scaling function $\phi(x) \in L^2(\mathbb{R})$, there exists

$f_{net}(x) = \sum_{k=-\infty}^{+\infty} c_{j,k}\,\phi(2^j x - k) = \sum_{k=-\infty}^{+\infty} c_{j,k}\,\phi_{j,k}(x)$   (13)

such that $\|f(x) - f_{net}(x)\| \to 0$ as $j \to \infty$. In fact, formula (13) is just the representation of a wavelet network with three layers. On a compact interval of interest, formula (13) can be written as:

$f_{net}(x) = \sum_{k \in I} c_{j,k}\,\phi_{j,k}(x)$   (14)

where $\phi_{j,k} = \phi(2^j x - k)$. A wavelet network is realized by taking the $c_{j,k}$s as the output weights, the $2^j$s as the input weights and $\phi(2^j x - k)$ as the activation function.
A variety of approaches has been proposed for determining the wavelet network parameters, such as the input weights $2^j$ and the output weights $c_{j,k}$. Here we use the approach based on sampling theory proposed by Zhang [1] to specify the appropriate resolution $j$.
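To make formula (14) concrete, the following Python sketch evaluates a single-resolution wavelet network. The cubic B-spline used here is only a stand-in for the fourth-order cardinal scaling function employed later in the paper, and all names are illustrative:

```python
import numpy as np

def phi(x):
    # Cubic B-spline with compact support [0, 4] (N = 4); a stand-in
    # scaling function, not the paper's fourth-order cardinal function.
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    m = (x >= 0) & (x < 1); y[m] = x[m]**3 / 6
    m = (x >= 1) & (x < 2); y[m] = (-3*x[m]**3 + 12*x[m]**2 - 12*x[m] + 4) / 6
    m = (x >= 2) & (x < 3); y[m] = (3*x[m]**3 - 24*x[m]**2 + 60*x[m] - 44) / 6
    m = (x >= 3) & (x < 4); y[m] = (4 - x[m])**3 / 6
    return y

def f_net(x, c, j, ks):
    # Formula (14): f_net(x) = sum_{k in I} c_{j,k} * phi(2^j x - k);
    # c holds the output weights, 2^j is the common input weight.
    return sum(ck * phi(2.0**j * np.asarray(x, dtype=float) - k)
               for ck, k in zip(c, ks))
```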
2.2 Sampling theory
Since we use sampling theory for the training of the wavelet network, we briefly introduce some of its aspects; for a fuller discussion we refer the reader to [17,18]. An analog signal can be discretized simply by recording its sample values $f(nT)$, $n \in \mathbb{Z}$, at interval $T$. An approximation of $f(x)$ at any $x \in \mathbb{R}$ may be recovered by interpolating these samples.
If the samples are taken with a constant period $T$, the sampled target function is represented as:

$f_d(x) = \sum_{n} f(nT)\,\delta(x - nT)$   (15)

The Fourier transform of the discrete signal obtained by sampling $f$ at intervals $T$ is:

$\hat{f}_d(\omega) = \frac{1}{T}\sum_{k=-\infty}^{+\infty} \hat{f}\!\left(\omega - \frac{2k\pi}{T}\right)$   (16)

If the support of $\hat{f}$ is included in $[-\pi/T,\ \pi/T]$, then

$f(x) = \sum_{k=-\infty}^{+\infty} f(kT)\,\mathrm{sinc}\!\left(\frac{x - kT}{T}\right)$   (17)

where $\mathrm{sinc}(u) = \sin(\pi u)/(\pi u)$.
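A minimal sketch of the interpolation formula (17), relying on NumPy's normalized sinc convention ($\mathrm{sinc}(u) = \sin(\pi u)/(\pi u)$); the signal and numerical values are assumptions for illustration:

```python
import numpy as np

def shannon_interp(x, samples, T):
    # Formula (17): f(x) = sum_k f(kT) * sinc((x - kT)/T),
    # exact when the spectrum of f lies in [-pi/T, pi/T].
    k = np.arange(len(samples))
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.array([np.sum(samples * np.sinc((xi - k * T) / T)) for xi in x])

T = 0.5
k = np.arange(64)
f_kT = np.sin(2 * np.pi * 0.3 * k * T)   # band-limited test signal, 0.3 Hz < 1/(2T)
print(shannon_interp(10.21, f_kT, T))    # approximately sin(2*pi*0.3*10.21)
```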
On the other hand, the frequency band of the wavelet network described in the previous section is obtained as follows:

$\int_{-\infty}^{+\infty} |\hat{f}_{net}(\omega)|^2\,d\omega - \int_{-2^j b}^{+2^j b} |\hat{f}_{net}(\omega)|^2\,d\omega \le 2^{-j}\varepsilon \sum_k c_{j,k}^2$   (18)

so the energy of the wavelet network is well concentrated in the frequency band:

$[-2^j b,\ 2^j b]$   (19)

The parameter $b$ depends only on the scaling function. Formula (19) means that the frequency band of the wavelet network can be controlled by the input weights.
Suppose the $\{kT, f(kT)\}$ are training data with $\sum_k |f(kT)|^2 < +\infty$. Then, by the sampling theorem, there exists a unique function $f(x)$ in the Paley-Wiener space $P_T$ that interpolates all the training data. On the other hand, a wavelet network is a function in $L^2(\mathbb{R})$, so a wavelet network represents a function in $P_T$ if its Fourier transform has support included in $[-\pi/T,\ \pi/T]$. This means that a network $f_{net}(x)$ whose Fourier transform has support in $[-\pi/T,\ \pi/T]$ is complex enough to recover a band-limited function. Since the regularity of a function is related to the asymptotic decay of its Fourier transform, a band-limited function is always "smoother" than other functions. This means that the training results obtained when the frequency band of the wavelet network is limited to the interval $[-\pi/T,\ \pi/T]$ are always more regular than the results obtained when the band is limited to other intervals; we therefore limit the frequency band of the wavelet network to $[-\pi/T,\ \pi/T]$ in our new algorithm.
According to (18), the input weights can be calculated using the following formula:

$2^j = \pi / (b \times T)$   (20)

This formula is a consequence of a distinguishing property of the wavelet: its energy is well localized in the frequency domain.
To construct the structure of the wavelet network, the energy concentration of the wavelet in the time domain should be employed. In the wavelet network, the $k$th node has the following input-output function:

$S_{out} = \phi(2^j S_{in} - k)$   (21)

where $S_{in}$ is the input, the $2^j$s are the input weights, $k$ is the $k$th threshold and $\phi(\cdot)$ is the scaling function. If the support of the scaling function is limited to $[0, N]$, then the $k$th node of the network has the following support:

$[2^{-j}k,\ 2^{-j}(N + k)]$   (22)

Assuming the domain of interest for estimation of the function is the interval $[a, b]$, the translations are found as follows:

$2^j a - N \le k \le 2^j b$   (23)
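Formulas (20), (22) and (23) fix the whole structure of the network once $T$, the band constant $b$ of the scaling function, the support length $N$ and the domain $[a, b]$ are known. A small sketch (the numerical values, including $b = 1$, are assumptions for illustration):

```python
import numpy as np

def structure(T, b_const, a_dom, b_dom, N=4):
    # Formula (20): 2^j = pi / (b * T); round j up so the network band
    # [-2^j b, 2^j b] still covers [-pi/T, pi/T].
    j = int(np.ceil(np.log2(np.pi / (b_const * T))))
    # Formula (23): translations k with 2^j a - N <= k <= 2^j b.
    k_min = int(np.ceil(2.0**j * a_dom - N))
    k_max = int(np.floor(2.0**j * b_dom))
    return j, np.arange(k_min, k_max + 1)

j, ks = structure(T=0.5, b_const=1.0, a_dom=-10.0, b_dom=10.0)
print(j, len(ks))   # resolution level and number of nodes
```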
Many methods have been proposed for training the output weights of a wavelet neural network based on minimizing the error function:

$J(f, f_{net}) = \sum_{i=1}^{n} |f(x_i) - f_{net}(x_i)|^2$   (24)

where the $\{x_i, f(x_i)\}$ are samples and the $f_{net}(x_i)$ are outputs of the approximator. Without any additive term, this cost function is widely used in the training of networks because of its convenient implementation.
Three commonly used methods are the direct solution method, the iterative method and the inner product method. In this paper, the iterative method is employed for training the output weights. In this method the output weights are calculated as follows:

$E^{(k+1)} = F - \Phi_{n \times m}\,C^{(k+1)}$   (25)

$C^{(k+1)} = C^{(k)} + A\,E^{(k)}$   (26)

The column vector $E^{(k)}$ denotes the error of interpolation by the wavelet network at the $k$th iteration, the column vector $C^{(k)}$ represents the output weights at the $k$th iteration, $F$ is the vector of training targets and the matrix $A$ is the feedback matrix. The values of the elements of the feedback matrix indicate how much the error at each data point affects the output weights. The $\Phi_{n \times m}$ matrix is

$\Phi_{n \times m} = \begin{pmatrix} \phi_{j,I_1}(x_1) & \cdots & \phi_{j,I_2}(x_1) \\ \vdots & & \vdots \\ \phi_{j,I_1}(x_n) & \cdots & \phi_{j,I_2}(x_n) \end{pmatrix}$   (27)

where the subscripts $I_1 = 2^j a - N$ and $I_2 = 2^j b$ denote, respectively, the minimum and maximum of the translation $k$ obtained from (23).
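The iteration (25)-(26) is straightforward to implement once $\Phi$ and $A$ are available. In the sketch below, $A$ may be any $m \times n$ matrix for which the iteration converges; the scaled transpose used as a default is a plain gradient step, not the paper's construction (Section 3.1 builds $A$ from the $\Phi$ matrix instead):

```python
import numpy as np

def train_output_weights(Phi, F, A=None, n_iter=100):
    # Phi: n-by-m matrix of formula (27) (n data points, m nodes),
    # F: length-n target vector, A: m-by-n feedback matrix.
    if A is None:
        A = 0.5 * Phi.T / np.linalg.norm(Phi, 2)**2   # convergent default step
    C = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        E = F - Phi @ C          # formula (25): interpolation error
        C = C + A @ E            # formula (26): feedback update
    return C
```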
The structure of a wavelet network with two sub wavelet networks is depicted in Fig.2.
Fig.2 Structure of wavelet neural network with two sub networks
Zhang [1] has shown that one can always find a feedback matrix that causes the iterative course to converge to a fixed point. However, there is no unique method for constructing the feedback matrix. In the next section, an intuitive method for finding this matrix is proposed.
3 Modifications in the training of
wavelet network
We propose three modifications for improving the training of the wavelet network; they are discussed in the following subsections.
3.1 Determination of appropriate
feedback matrix
In this part, an intuitive approach for finding an appropriate feedback matrix is proposed. In this method the feedback matrix is constructed from the $\Phi$ matrix, using the receptive field of each node (scaling function) in the wavelet network: each output weight is trained only with the training data that lie in the receptive field of its node. However, these training data should not all have the same effect on the training of the output weight: the training data at whose locations the scaling function has a larger value have a stronger effect.
On the other hand, the elements of the feedback matrix represent the effect of each training datum on the training of the output weight of each node. Since the values of the $\Phi$ matrix represent the amplitude of the scaling function at each data point, the feedback matrix should be constructed from the $\Phi$ matrix. We use the fourth-order cardinal scaling function, shown in Fig.3, as the activation function.
Fig.3 Fourth-order cardinal scaling function at scale = 1, with the levels of effect K1, K2, K3 and the values of effect a, b, c marked
According to Fig.3, the domain where the scaling function has large amplitude is the vicinity of zero. In other words, the training data that lie in this vicinity have an intense effect on the training of the corresponding output weight. We can therefore define different levels of effect for the training data at each node, and the procedure for finding an appropriate feedback matrix is as follows (a sketch is given after the list):
1. Generate the raw feedback matrix: $A = \Phi$.
2. Define the levels of effect by partitioning the amplitude of the scaling function with the levels $K_1, K_2, \ldots$ as depicted in Fig.3 (about 2 or 3 levels is appropriate).
3. Assign the values $a$, $b$ and $c$ that represent the value of effect for each level. These values can be found by trial and error.
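A minimal sketch of steps 1-3, assuming the raw matrix is the transpose of $\Phi$ so that the dimensions of (26) match; the thresholds and values shown are only the ones quoted for Fig.4:

```python
import numpy as np

def feedback_matrix(Phi, K=(0.4, 0.1), vals=(0.3, 0.01, 0.001)):
    # Step 1: raw feedback matrix from the phi matrix (transposed so that
    # A @ E in formula (26) is an m-vector -- a dimensional assumption).
    A = np.abs(Phi.T)
    K1, K2 = K
    a, b, c = vals
    # Steps 2-3: replace each entry by the value of effect of the level
    # its amplitude falls into (two levels K1 > K2 here).
    return np.where(A >= K1, a, np.where(A >= K2, b, c))
```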
The performance of the algorithm is illustrated by training a sinusoidal function. In this example, different sets of levels of effect and values of effect are applied in the training course, and the convergence of the four resulting training courses is compared in Fig.4.
Fig.4 Comparison of MSE versus number of iterations for different feedback matrices, with parameter sets (K1=0.4, K2=0.1, a=0.3, b=0.01, c=0.001), (K1=0.3, K2=0.1, a=0.3, b=0.01, c=0.001), (K1=0.4, K2=0.1, a=0.4, b=0.01, c=0.001) and (K1=0.3, a=0.3, b=0.01, c=0.001)
Comparing the four panels of Fig.4, it can be concluded that, with an appropriate assignment of the levels of effect and values of effect, the MSE converges to zero as the number of iterations increases.
3.2 Improvement in the algorithm for training non-uniform data
For uniformly sampled data, training the wavelet neural network based on sampling theory shows
quite acceptable results. For non-uniform data, however, the algorithm encounters severe problems such as a high overfitting error and deviation of the estimated function from the actual target function. Here we use several available techniques to overcome this problem; they are described in detail in the following sections.
3.2.1 Optimum number of sub wavelet networks
For training on non-uniform data, the domain of interest for estimation is divided into clusters in which the sampling rate is approximately uniform. The criterion proposed by Zhang in [1] is:

$D = \sup_{k\in\mathbb{Z}} |x_k - kT| < 0.25 \times T$   (28)

In this formula, $x_k$ denotes the training datum at the $k$th point, $T$ denotes the approximate sampling rate and $D$ denotes the maximum distance between the training points of a cluster and the corresponding uniformly spaced points. When this method is implemented in simulation, it will be shown that, because a multitude of sub wavelet networks is generated, noise content is trained into the estimate of the wavelet network, which causes overfitting error. In the simulations it is shown that a better criterion for generating appropriate clusters over the domain of interest is:

$D = \sup_{k\in\mathbb{Z}} |x_k - kT| < 0.99 \times T$   (29)
Using formula (29), the number of sub wavelet networks is optimized, which results in less noise content in the estimate of the wavelet neural network. The performance of the newly proposed criterion is shown for the training of an exponential function in Fig.5.
Fig.5 Comparison of estimations for the two clustering criteria: the noise-free target function, the estimation of the sub wavelet networks using D < 0.25T, and the estimation using D < 0.99T
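One reading of criterion (29) is the following greedy clustering, which starts a new cluster (and hence a new sub wavelet network) whenever a sample drifts more than 0.99T from the uniform grid anchored at the cluster start; setting tol=0.25 recovers rule (28). This is an illustrative sketch, not the paper's code:

```python
import numpy as np

def clusters(x, T, tol=0.99):
    # Split sorted sample locations into clusters in which every point
    # stays within tol*T of a uniform grid anchored at the cluster start,
    # i.e. sup_k |x_k - kT| < tol*T inside each cluster (formulas (28)/(29)).
    x = np.sort(np.asarray(x, dtype=float))
    out, start = [], 0
    for i in range(1, len(x)):
        if abs((x[i] - x[start]) - (i - start) * T) >= tol * T:
            out.append(x[start:i])
            start = i
    out.append(x[start:])
    return out
```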
3.2.2 Application of wavelet thresholding
Wavelet thresholding is an effective way of removing noise content from the training data [6,7,8].
Hard and soft thresholding can be employed for this purpose. Hard thresholding is the usual process of setting to zero the output weights whose absolute values are lower than the threshold. Soft thresholding is an extension of hard thresholding: it first sets to zero the output weights whose absolute values are lower than the threshold, and then shrinks the nonzero weights towards zero. Fig.6 depicts a ramp signal thresholded at the amplitude 0.4.
Fig.6 Comparison between hard and soft thresholding: the original, hard-thresholded and soft-thresholded ramp signal
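Both rules act elementwise on the vector of output weights; a short sketch with the 0.4 threshold of Fig.6 (the weight values are made up for illustration):

```python
import numpy as np

def hard_threshold(w, t):
    # Zero every weight whose absolute value is below the threshold t.
    return np.where(np.abs(w) < t, 0.0, w)

def soft_threshold(w, t):
    # Zero the small weights, then shrink the survivors towards 0 by t.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w = np.array([0.8, -0.3, 0.05, -0.6])
print(hard_threshold(w, 0.4))   # [ 0.8  0.   0.  -0.6]
print(soft_threshold(w, 0.4))   # [ 0.4 -0.   0.  -0.2]
```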
There are several approaches to determining the value of the threshold. Stein's unbiased risk estimate (SURE), the fixed-form threshold, a mixture of these two methods and the minimax estimation principle are four methods presented in the literature [6,7,8,9,10,11].
In addition, there are different types of noise, such as white noise, unscaled white noise and non-white noise, that can be treated by different types of wavelet thresholding. These methods of wavelet thresholding can be added to the training procedure of the wavelet network to reduce the effect of overfitting. In the simulations (section 4), it is shown that defining a threshold improves the estimation performance.
3.2.3 Application of early stopping
The early stopping technique is widely used in neural network training to reduce the effect of noise and overfitting. It stops the training course when the test error begins to increase. In the simulations (section 4), it is shown that this technique strongly reduces the overfitting error.
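Combined with the update (25)-(26), early stopping amounts to monitoring the test error inside the training loop; a sketch under the same dimensional assumptions as before:

```python
import numpy as np

def train_early_stopping(Phi_tr, F_tr, Phi_te, F_te, A, max_iter=200):
    # Iterate (25)-(26) on the training set, but stop as soon as the
    # mean square error on the test set starts to rise, and return the
    # weights of the best iteration seen so far.
    C = np.zeros(Phi_tr.shape[1])
    best_C, best_mse = C.copy(), np.inf
    for _ in range(max_iter):
        C = C + A @ (F_tr - Phi_tr @ C)
        mse = np.mean((F_te - Phi_te @ C) ** 2)
        if mse > best_mse:
            break                     # test error began to increase
        best_C, best_mse = C.copy(), mse
    return best_C
```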
4 Simulation
In this section, we demonstrate the performance of the proposed modifications described previously. The section includes two parts: in the first part a one-dimensional target function is employed, and in the second part a two-dimensional one.
4.1 Learning a one-dimensional function
4.1.1 The target function
For comparison purposes, we use the function that is used in [1]. The target function is:

$f_s(t) = \sum_{k=-3}^{3} e^{-(t-6k)^2}$   (30)

and the interval $[-10, 10]$ is taken as the domain of interest. The shape of the target function is shown in Fig.7.
Fig.7 Target function
To produce non-uniform data, random sampling points with a uniform distribution are employed. A noise function $n(t)$ with Gaussian distribution is added to the training data. The noise has zero mean and a variance that varies with time:

$\sigma^2(t) = 0.005 + 0.009\,|t|$   (31)

The noise function is depicted in Fig.8.
Fig.8 The noise function
In this way the training data are generated; the test data are generated by the same method.
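A sketch of the data generation for this experiment (the sample count of 200 is an assumed value; the paper does not state it):

```python
import numpy as np

rng = np.random.default_rng(0)

def target(t):
    # Formula (30): a train of Gaussian bumps centred at t = 6k.
    return sum(np.exp(-(t - 6 * k) ** 2) for k in range(-3, 4))

n = 200
t_train = rng.uniform(-10, 10, n)      # non-uniform sample locations
var = 0.005 + 0.009 * np.abs(t_train)  # formula (31): time-varying variance
y_train = target(t_train) + rng.normal(0.0, np.sqrt(var))

t_test = rng.uniform(-10, 10, n)       # test data, same construction
y_test = target(t_test) + rng.normal(0.0, np.sqrt(0.005 + 0.009 * np.abs(t_test)))
```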
In each iteration, the wavelet network is tested with the test data, and if the mean square error begins to increase, the training course stops. The objective of this simulation is to compare the performance of the method in [1] with the improved one described in the previous sections. First, an appropriate feedback matrix for the training must be found.
4.1.2 Results of training
After some trials, the values obtained for the levels of effect and the values of effect are as follows:

$k_1 = 0.4$, $k_2 = 0.1$, $a = 0.3$, $b = 0.01$, $c = 0$   (32)

In the simulations, the method described in [1] is called method I and the improved one described in this paper is called method II. Method II comprises the reduction of the number of sub wavelet networks and the addition of wavelet thresholding and early stopping to the training procedure. The structures of the wavelet networks generated by the two methods are shown in Table 1.
Table 1 Comparison of the two wavelet networks

              No. of sub wavelet networks    No. of nodes
Method I                  81                     413
Method II                  7                      72
Table 2 Statistical errors of the two methods

              Maximum absolute error    Mean absolute error    Root mean square error
Method I            0.20019                  0.043885                0.058885
Method II           0.11919                  0.026758                0.036794
According to Table 1, the wavelet network of method II is much smaller in both the number of sub wavelet networks and the number of nodes. The training course of method I stopped after 200 iterations, whereas in method II the training course stopped after 8 iterations. The wavelet threshold value in method II was chosen as 0.009. The estimation results are depicted in Fig.9 and Fig.10. These results indicate that, in addition to the reduction of the number of sub wavelet networks, the number of training iterations and the overfitting error are significantly reduced. Comparing the two figures, it is clear that, besides the noise reduction, the estimated function of method II is smoother than that of method I.
Fig.9 Estimation of wavelet network with method I
Fig.10 Estimation of wavelet network with method II
4.2 Learning a two-dimensional function
4.2.1 The target function
The target function used for this course of training is defined on the domain $[-10,10] \times [-10,10]$ and is as follows:

$f_s(x,y) = e^{-0.25(x^2+y^2)} + e^{-0.25(x^2+y^2-10)^2} + e^{-0.25(x^2+y^2-20)^2}$   (33)

The noise function has a Gaussian distribution with zero mean and variable variance:

$\sigma^2 = 10^{-4} + 10^{-3}\rho^2$, where $\rho^2 = x^2 + y^2$   (34)

The shapes of the target function and the noise function are shown in Fig.11 and Fig.12.
Fig.11 The target function
Fig.12 The noise function
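For completeness, a sketch of the two-dimensional data generation under the reconstruction of (33)-(34) given above (the grid of 50 x 50 sample sites is an assumed value):

```python
import numpy as np

rng = np.random.default_rng(1)

def target2d(x, y):
    # Formula (33): concentric Gaussian ridges at rho^2 = 0, 10, 20.
    r2 = x**2 + y**2
    return (np.exp(-0.25 * r2)
            + np.exp(-0.25 * (r2 - 10) ** 2)
            + np.exp(-0.25 * (r2 - 20) ** 2))

n = 50
x = rng.uniform(-10, 10, (n, n))    # non-uniform sample sites
y = rng.uniform(-10, 10, (n, n))
var = 1e-4 + 1e-3 * (x**2 + y**2)   # formula (34): variance grows with rho^2
z = target2d(x, y) + rng.normal(0.0, np.sqrt(var))
```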
Table 4 Statistical errors of the two methods

              Maximum absolute error    Mean absolute error    Root mean square error
Method I            0.45125                  0.060125                0.080736
Method II           0.33816                  0.045253                0.062437
4.2.2 Results of training
After trial and error, the levels of effect and values of effect of the feedback matrix are found as follows:

$k_1 = 0.35$, $a = 0.22$, $b = 0$   (35)

Table 3 compares the structures of the wavelet networks generated by the two methods.
Table 3 Comparison of the two wavelet networks

              No. of sub wavelet networks    No. of nodes
Method I                  81                   170569
Method II                 10                     7114
According to Table 3, it is clear that the wavelet network generated by method II is much smaller than the first one, which leads to less computation and a shorter training course.
Fig.13 to Fig.16 depict the results of the two methods. As is clear in these figures, the results of method II contain less noise, so its noise rejection is better than that of method I in [1].
Fig.13 Estimation of wavelet network with method I
Fig.14 Error surface of method I
Fig.15 Estimation of wavelet network with method II
Fig.16 Error surface of method II
The results of method I were achieved after 100 iterations, while those of method II were obtained after 15 iterations. The wavelet threshold value chosen in this course of training is 0.009. Table 4 also shows that the maximum absolute error, the mean absolute error and the root mean square error are all decreased by reducing the number of sub wavelet networks and the number of nodes and by applying the early stopping technique. Therefore, method II prevents the wavelet neural network from overtraining the noise content of the target function during the training course.
5 Conclusion
Training wavelet neural networks based on sampling theory has been shown to perform well for uniform data. For non-uniform noisy data, however, it encounters severe problems such as overfitting, a large number of sub wavelet networks and a long training course. This paper has proposed modifications to this approach to improve its performance: a new method for determining the feedback matrix, a reduction of the number of sub wavelet networks, and the use of complexity-control techniques, namely wavelet thresholding and early stopping, to overcome overfitting. The simulation results showed that applying these modifications yields a smaller wavelet network with faster training and less overfitting. The main remaining problem of wavelet networks based on sampling theory is that, in practice, no information about the noise is employed in the determination of the input weights. We therefore recommend further research on including information about the target function and/or the noise in the determination of the input and output weights of the wavelet network.
References:
[1] Z. Zhang, Learning algorithm of wavelet network based on sampling theory, Neurocomputing, 2007.
[2] I. Daubechies, Orthonormal bases of compactly supported wavelets, Comm. Pure Appl. Math., Vol.XLI, 1988, pp.909-996.
[3] S.G. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Anal. Mach. Intell., Vol.11, No.7, 1989, pp.674-693.
[4] B.R. Bakshi, G. Stephanopoulos, Wave-net: a multiresolution hierarchical neural network with localized learning, AIChE Journal, Vol.39, No.1, 1993, pp.57-81.
[5] A.A. Safavi, Wavelet neural networks and
multiresolution analysis with applications to
process systems engineering, PhD thesis,
Department of Chemical Engineering, The
University of Sydney, Australia, January 1996.
[6] D.L. Donoho, De-noising by soft-thresholding, IEEE Trans. Inform. Theory, Vol.41, 1995, pp.613-627.
[7] D.L. Donoho, I.M. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika, Vol.81, 1994, pp.425-455.
[8] D.L. Donoho, I.M. Johnstone, Adapting to unknown smoothness via wavelet shrinkage, J. Amer. Statist. Assoc., Vol.90, 1995, pp.1200-1224.
[9] D.L. Donoho, I.M. Johnstone, Minimax estimation via wavelet shrinkage, Ann. Statist., Vol.26, 1998, pp.879-921.
[10] A. Kovac, B.W. Silverman, Extending the scope of wavelet regression methods by coefficient-dependent thresholding, J. Amer. Statist. Assoc., Vol.95, 2000, pp.172-183.
[11] M. Jansen, G. Nason, B. Silverman, Scattered data smoothing by empirical Bayesian shrinkage of second generation wavelet coefficients, Proceedings of the SPIE, Vol.4478, 2001, pp.87-97.
[12] I.M. Johnstone, B.W. Silverman, Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences, Ann. Statist., Vol.32, No.4, 2004, pp.1594-1649.
[13] I.M. Johnstone, B.W. Silverman, Empirical Bayes selection of wavelet thresholds, Ann. Statist., Vol.33, 2005.
[14] T.Q. Huynh, R. Setiono, Effective neural network pruning using cross-validation, in: Proceedings of the International Joint Conference on Neural Networks, 2005, pp.972-977.
[15] N.T. Van Der Merwe, A.J. Hoffman, Developing an efficient cross validation strategy to determine classifier performance (CVCP), Proc. Int. Jt. Conf. Neural Networks, Vol.3, 2001, pp.1663-1668.
[16] C.E. Vasios, G.K. Matsopoulos, E.M. Ventouras, K.S. Nikita, N. Uzunoglu, Cross-validation and neural network architecture selection for the classification of intracranial current sources, Seventh Seminar on Neural Network Applications in Electrical Engineering, Proceedings NEUREL, 2004, pp.151-158.
[17] F. Marvasti (Ed.), Nonuniform Sampling: Theory and Practice, Kluwer Academic/Plenum Publishers, Dordrecht/New York, 2001.
[18] J. Zhang, G.G. Walter, Y. Miao, Wavelet neural networks for function learning, IEEE Trans. Signal Process., Vol.43, No.6, 1995, pp.1485-1496.
[19] Z. Zainuddin, O. Pauline, Function approximation using artificial neural networks, WSEAS Trans. on Mathematics, 2008.
[20] P. Radonja, S. Stankovic, D. Drazic, B. Matovic, Development and construction of generalized process models by using neural networks and multiple linear regression, WSEAS Trans. on Systems, Vol.6, No.287, 2007.
[21] V. Niola, R. Oliviero, G. Quaremba, A neural network application for signal denoising, WSEAS Trans. on Signal Processing, Vol.2, No.88, 2006.
A Wave-Net is an artificial neural network with one hidden layer of nodes, whose basis functions are drawn from a family of orthonormal wavelets. The good localization characteristics of the basis functions, both in the input and frequency domains, allow hierarchical, multiresolution learning of input-output maps from experimental data. Furthermore, Wave-Nets allow explicit estimation for global and local prediction error-bounds, and thus lend themselves to a rigorous and explicit design of the network. This article presents the mathematical framework for the development of Wave-Nets and discusses the various aspects of their practical implementation. Computational complexity arguments prove that the training and adaptation efficiency of Wave-Nets is at least an order of magnitude better than other networks. In addition, it presents two examples on the application of Wave-Nets; (a) the prediction of a chaotic time-series, representing population dynamics, and (b) the classification of experimental data for process fault diagnosis.